Facebook-parent Meta CEO Mark Zuckerberg has used YouTube's battle against piracy to defend his company's use of copyrighted ebooks for AI training. According to newly released excerpts (seen by TechCrunch) from a deposition taken in the Kadrey v. Meta lawsuit, Zuckerberg's defence highlights the ongoing legal debate over whether using copyrighted content for AI training counts as "fair use", a claim disputed by many copyright holders.
This is one of many cases where authors and IP holders are challenging AI companies over their use of copyrighted material.
What Zuckerberg said to defend Meta
During his deposition, Zuckerberg said:
“For example, YouTube, I think, may end up hosting some stuff that people pirate for some period of time, but YouTube is trying to take that stuff down. And the vast majority of the stuff on YouTube, I would assume, is kind of good and they have the license to do.”
What Zuckerberg said about Meta using LibGen
While a full transcript remains unavailable, excerpts from Zuckerberg's deposition shed light on his views regarding copyright and fair use in AI development. Zuckerberg seems to defend Meta's use of the "LibGen" dataset, a collection of copyrighted ebooks, to train its Llama AI models.
LibGen, a self-described "links aggregator," provides access to copyrighted works from major publishers and has faced numerous lawsuits and fines for copyright infringement. Despite concerns within Meta's AI teams about the legal implications, Zuckerberg allegedly approved the use of LibGen for training at least one Llama model.
The plaintiffs' counsel, representing authors such as Sarah Silverman and Ta-Nehisi Coates, highlighted internal concerns within Meta about using the LibGen dataset. According to a legal filing, Meta employees referred to LibGen as a "data set we know to be pirated" and warned that its use "may undermine [the company’s] negotiating position with regulators."
However, during his deposition, Zuckerberg claimed he "hadn't really heard of" LibGen. He said:
“I get that you’re trying to get me to give an opinion of LibGen, which I haven’t really heard of. It’s just that I don’t have knowledge of that specific thing.”
When questioned by David Boies, one of the plaintiffs' attorneys, about the company's use of the LibGen dataset, Zuckerberg defended it, arguing that prohibiting the use of such resources for AI training would be unreasonable.
“So would I want to have a policy against people using YouTube because some of the content may be copyrighted? No. [T]here are cases where having such a blanket ban might not be the right thing to do,” he said.
“You know, [if there’s] someone who’s providing a website and they’re intentionally trying to violate people’s rights … obviously it’s something that we would want to be cautious about or careful about how we engaged with it or maybe even prevent our teams from engaging with it,” Zuckerberg added.