AI Giants Accused of Potentially Violating Copyrights on YouTube Videos

A recent report from The New York Times suggests that both OpenAI and Google used text transcribed from YouTube videos to train their AI models, raising concerns about potential copyright violations.

The report describes how these companies sought to gather as much training data as possible for their AI systems, which raises questions about compliance with platform policies and about the ethics of the practice.

Allegations Against OpenAI:

According to The New York Times, OpenAI transcribed over one million hours of YouTube videos using its Whisper speech recognition tool.

The transcribed text was then reportedly used to train GPT-4, OpenAI's latest AI model. The report also says that OpenAI's president, Greg Brockman, was involved in the effort, and that the company allegedly used YouTube videos and podcasts to train its earlier AI systems as well.

Response from Google:

Google acknowledges that it uses YouTube content to train AI models, but it maintains that it does so only with content from creators who have agreed to such use.

However, the report says that some people within Google were aware of OpenAI's practices but did not intervene, possibly because Google itself was using YouTube videos to train its AI models.

Google’s Policy Adjustments:

The report also highlights changes Google made to its privacy policy in June 2023 that broadened the range of publicly available content it could use to train its AI models and products.

Google says these changes were made for clarity and transparency, but they have raised concerns that data from services such as Google Docs and Google Sheets could be used without explicit user consent.

Reactions and Further Inquiries:

Engadget has reached out to both Google and OpenAI for comment. Separately, Google spokesperson Matt Bryant clarified that Google's use of data from publicly available sources, including under the amended privacy policy language, is contingent on user consent, particularly when testing experimental features.