General Discussion
How Tech Giants Cut Corners to Harvest Data for A.I. (NYT)
https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?smid=nytcore-ios-share&referringSource=articleShare&sgrp=c-cb
OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
By Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant
Reporting from San Francisco, Washington and New York
April 6, 2024
-snip-
So OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.
Some OpenAI employees discussed how such a move might go against YouTube's rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are independent of the video platform.
Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI's president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world's most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot.
-snip-
At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.
-snip-
Much, much more at the link. The story of how AI companies, including Google, trained the AI they now hope to make billions, even trillions, from is one of grand theft, and they all knew it was grand theft.
As Justine Bateman said (the quote is in the article), "This is the largest theft in the United States, period."
It's a very long article, but well worth reading in its entirety to understand that these companies were well aware they were breaking laws and violating intellectual property rights, but they chose to do so anyway as they've engaged in a crazy AI arms race to have the biggest and best AI models.
They had meetings about this being unethical and illegal. They chose to do it anyway.
I've said repeatedly in posts here that GenAI, generative AI, is FUNDAMENTALLY unethical.
Big Tech essentially set out to steal as much of our culture and knowledge as possible, for the purpose of selling it back to us, with no real intention of ever compensating all the people they stole from.
And IF you use GenAI tools like ChatGPT, Midjourney and Copilot (with the exception of a few GenAI models trained on legally licensed datasets, and there's dispute about whether some of those are truly legal), you're basically saying you're okay with that theft.
erronis
(15,486 posts)
If they get caught, they pay a few million $ in a fine that doesn't imply any malfeasance. What's a few fewer lattes for the jet set?
My understanding and some knowledge of these companies (and governments) is that once they get the data, they never, ever, delete it completely. They may remove some of it from some caches or some local storage. It is backed up for millennia and will be recalled and reused whenever it serves their purposes.
highplainsdem
(49,140 posts)
The companies need much more widespread adoption of their AI tools by paying customers to make them profitable. Those tools should be rejected.
If someone is forced to use illegally trained GenAI in their job, and they can't immediately find a more ethical job, that might be some excuse.
But there's no such excuse for people making individual choices to use it, for amusement or profit, if they're aware of the theft it's based on.
highplainsdem
(49,140 posts)
snot
(10,549 posts)
Whisper is an idiot and no wonder AI is, too.
highplainsdem
(49,140 posts)
That earlier thread, from last October: https://www.democraticunderground.com/100218338820
Last night on Twitter:
Link to tweet
Link to tweet