
OpenAI is under fire following allegations that its GPT-4o model may have been trained on copyrighted, paywalled content from O’Reilly Media without authorization. The claims stem from a recent study published by the AI Disclosures Project, a watchdog initiative co-founded by tech author Tim O’Reilly and economist Ilan Strauss. The research suggests that GPT-4o shows a high level of familiarity with O’Reilly’s proprietary content, raising concerns about possible intellectual property violations.
Researchers applied DE-COP, a membership inference technique designed to detect whether copyrighted content was part of a model’s training data. It works by testing whether the model can pick out a verbatim passage from among paraphrased versions of it. Examining nearly 14,000 paragraphs from 34 O’Reilly books, the study reported an AUROC (Area Under the Receiver Operating Characteristic Curve) score of 82% for GPT-4o on the paywalled content. An AUROC of 50% corresponds to chance, so the researchers read this score as strong evidence that the content was likely included in training.
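For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen passage the model has seen receives a higher recognition score than a randomly chosen passage it has not. The Python sketch below illustrates only that definition; it is not the study's code, and the `auroc` helper and the score lists are hypothetical values made up for illustration.

```python
def auroc(member_scores, nonmember_scores):
    """Fraction of (member, non-member) pairs in which the member
    scores higher; ties count as half a win."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical recognition scores (not data from the study):
# higher means the model more often identified the verbatim passage.
suspected_seen = [0.62, 0.48, 0.71, 0.55]
known_unseen = [0.30, 0.52, 0.27, 0.41]

print(auroc(suspected_seen, known_unseen))
```

A value near 0.5 would mean the two groups are indistinguishable, while values approaching 1.0 mean the model recognizes the suspected passages far more reliably than the unseen ones.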
The finding is particularly troubling given that O’Reilly Media confirmed it has no licensing agreement with OpenAI. By comparison, GPT-3.5 Turbo, an earlier OpenAI model, showed significantly less recognition of the same content, suggesting OpenAI may have increased its use of non-public data in its newer models.
The findings arrive at a time of growing industry scrutiny around how large language models are trained. Copyright and data transparency have become flashpoints in the broader debate on ethical AI development. As companies scale AI capabilities, questions remain about consent, fair use, and the rights of content creators.
OpenAI has not yet publicly addressed the allegations. The AI Disclosures Project has called for greater transparency and regulation, urging AI developers to disclose training data sources and secure appropriate licenses.
The controversy underscores the ongoing tension between rapid AI innovation and the foundational need to respect intellectual property laws.
- OpenAI Faces Scrutiny Over Use of Paywalled Books in GPT-4o Training - April 23, 2025