Last week, two authors, Paul Tremblay and Mona Awad, filed a lawsuit against OpenAI, claiming that their copyrighted books were used without consent to train the company’s artificial intelligence chatbot, ChatGPT. The authors argue that ChatGPT generates “very accurate summaries” of their works, which they say would be possible only if the chatbot had been trained on their books, in violation of copyright law.
OpenAI, the San Francisco-based research company behind ChatGPT, has not yet responded to CNBC’s request for comment. The lawyers representing Tremblay and Awad likewise did not immediately respond.
ChatGPT is an advanced text generation model that responds to written prompts in a highly creative and sophisticated manner, surpassing previous chatbot technologies developed in Silicon Valley. OpenAI, led by Sam Altman and supported by Microsoft, trained the chatbot on an extensive dataset, but the precise details of the training data have not been disclosed. OpenAI has said that the data included web content, archived books, and Wikipedia.
The lawsuit, filed in a San Francisco federal court, alleges that a significant portion of OpenAI’s training data consists of copyrighted materials, including books written by Tremblay and Awad. However, proving how and where ChatGPT acquired this information, as well as demonstrating financial damages suffered by the authors, may present a challenge.
The complaint includes the summaries generated by ChatGPT as exhibits and acknowledges that the chatbot occasionally makes mistakes. Nevertheless, Awad and Tremblay contend that the rest of the summaries accurately reflect their works, which they say indicates that ChatGPT retained knowledge of their books from its training dataset.
The complaint further states that ChatGPT did not reproduce any of the copyright management information provided by the authors with their published works.
As the lawsuit unfolds, it remains to be seen how the court will evaluate the claims made by the authors and determine the implications for OpenAI’s use of copyrighted material in training its AI models.