Project Analyzing Human Language Usage Shuts Down Because 'Generative AI Has Polluted the Data'
Source: 404 Media
The creator of an open source project that scraped the internet to determine the ever-changing popularity of different words in human language usage says that they are sunsetting the project because generative AI spam has poisoned the internet to a level where the project no longer has any utility.
Wordfreq is a program that tracked the ever-changing ways people used more than 40 different languages by analyzing millions of sources across Wikipedia, movie and TV subtitles, news articles, books, websites, Twitter, and Reddit. The system could be used to analyze changing language habits as slang and popular culture changed and language evolved, and was a resource for academics who study such things. In a note on the project's GitHub, creator Robyn Speer wrote that the project will not be updated anymore.
"Generative AI has polluted the data," she wrote. "I don't think anyone has reliable information about post-2021 language usage by humans."
She said that open web scraping was an important part of the project's data sources "and now the web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies."
-snip-
Read more: https://www.404media.co/project-analyzing-human-language-usage-shuts-down-because-generative-ai-has-polluted-the-data/
Generative AI pollutes our information ecosystem, whether through something like what this article describes, AI search returning inaccurate, often false and even defamatory results, or searches for an artist's work that bury their real art under AI slop copying their style. And the problem will only worsen as genAI is used more and more.
It's as damaging to our information ecosystem as pollutants and climate change are to our planet, but instead of the harm building up over decades, it has already done enormous damage in just a couple of years.