Meta CEO Mark Zuckerberg has been accused of allowing the Llama AI team to train on pirated ebooks and articles. Kadrey v. Meta is one of many cases against tech giants developing AI that accuse them of training models on copyrighted works without permission. Meta argues that its training is protected by fair use, but many creators reject this argument.
In unredacted documents filed with the U.S. District Court for the Northern District of California, plaintiffs in Kadrey v. Meta, including authors Sarah Silverman and Ta-Nehisi Coates, reveal that Meta’s use of a data set called LibGen for Llama-related training was approved by Mark Zuckerberg.
LibGen, a “links aggregator” that provides access to copyrighted works from publishers, has been sued, ordered to shut down, and fined tens of millions of dollars for copyright infringement. Meta’s testimony reveals that Zuckerberg cleared the use of LibGen to train at least one of Meta’s Llama models despite concerns within the company.
Meta reportedly cut corners to gather data for its AI: it was hiring contractors in Africa to aggregate book summaries and considering buying publisher Simon & Schuster. However, executives concluded that negotiating licenses would take too long and reasoned that fair use was a solid defense.
The filing alleges that Meta may have tried to conceal its alleged infringement by stripping attribution from LibGen data. According to the plaintiffs, Meta engineer Nikolay Bashlykov wrote a script to remove copyright information from ebooks in LibGen, and copyright markers were also allegedly stripped from science journal articles and “source metadata” in the training data used for Llama. The filing suggests that this was done not just for training purposes but also to conceal infringement.
The plaintiffs also accuse Meta of copyright infringement for torrenting LibGen, that is, obtaining it through a peer-to-peer file-sharing network. Meta’s head of generative AI, Ahmad Al-Dahle, is alleged to have cleared the path for torrenting LibGen despite concerns from some Meta research engineers. The plaintiffs’ counsel argues that Meta’s decision to bypass lawful methods of acquiring books and become a knowing participant in an illegal torrenting network serves as proof of copyright infringement.
The case against Meta is far from decided, but the judge presiding over the case, Judge Thomas Hixson, rejected Meta’s request to redact large portions of the filing, stating that the sealing request is not designed to protect against the disclosure of sensitive business information but to avoid negative publicity.