Fri Mar 17, 2023, 12:35 PM
highplainsdem (43,131 posts)
Wired: GPT-4 Will Make ChatGPT Smarter but Won't Fix Its Flaws
https://www.wired.com/story/gpt-4-openai-will-make-chatgpt-smarter-but-wont-fix-its-flaws/
However, GPT-4 suffers from the same problems that have bedeviled ChatGPT and cause some AI experts to be skeptical of its usefulness—including tendencies to “hallucinate” incorrect information, exhibit problematic social biases, and misbehave or assume disturbing personas when given an “adversarial” prompt.
“While they’ve made a lot of progress, it’s clearly not trustworthy,” says Oren Etzioni, a professor emeritus at the University of Washington and the founding CEO of the Allen Institute for AI. “It’s going to be a long time before you want any GPT to run your nuclear power plant.”

And OpenAI's Greg Brockman admits it isn't trustworthy: https://techcrunch.com/2023/03/15/interview-with-openais-greg-brockman-gpt-4-isnt-perfect-but-neither-are-you/

But GPT-4 has serious shortcomings. Like GPT-3, the model “hallucinates” facts and makes basic reasoning errors. In one example on OpenAI’s own blog, GPT-4 describes Elvis Presley as the “son of an actor.” (Neither of his parents were actors.)
-snip-

“We spent a lot of time trying to understand what GPT-4 is capable of,” Brockman said. “Getting it out in the world is how we learn. We’re constantly making updates, include a bunch of improvements, so that the model is much more scalable to whatever personality or sort of mode you want it to be in.”

The early real-world results aren’t that promising, frankly. Beyond the Adversa AI tests, Bing Chat, Microsoft’s chatbot powered by GPT-4, has been shown to be highly susceptible to jailbreaking. Using carefully tailored inputs, users have been able to get the bot to profess love, threaten harm, defend the Holocaust and invent conspiracy theories.

-snip-

“Really figuring out GPT-4’s tone, the style and the substance has been a great focus for us,” Brockman said. “I think we’re starting to understand a little bit more of how to do the engineering, about how to have a repeatable process that kind of gets you to predictable results that are going to be really useful to people.”

So they're "starting to understand a little bit more...about how to have a repeatable process that kind of gets you to predictable results that are going to be really useful to people."

This is why people comparing LLMs to calculators as useful tools that should be widely adopted are dead wrong. Calculators were and are reliable. It was not necessary to tell users, as OpenAI does, that you can't trust the new tool, especially for anything important, and that you should carefully check every result you get. The people manufacturing calculators weren't doing interviews talking about how they were beginning to understand calculators a bit better, with the goal of getting to "a repeatable process that kind of gets you to predictable results that are going to be really useful."

Yet OpenAI is pushing its LLMs' use in business, science, etc. as hard as it can. It's as if a calculator manufacturer had admitted, "We know 11x13 isn't really 26 or 1,113 or -2, but look how FAST it offered those answers. Just check its math, and make sure you don't use it for anything important."

Businesses are falling for the AI hype anyway, telling employees to rely on it more to greatly increase productivity. From Bloomberg: https://www.bloomberg.com/opinion/articles/2023-03-16/openai-s-gpt-4-could-turn-work-into-a-hyperproductive-hellscape#xj4y7vzkg - Archive page at https://archive.ph/gbTQD

"McMillan says the effort will also further enrich the relationship between Morgan Stanley advisors and their clients by enabling them to assist more people more quickly."

How much more quickly? A spokesperson for Morgan Stanley tells me its advisers can now do in seconds what they used to do in half an hour, such as looking at an analyst’s note to advise a client on the performance of certain companies and their shares.

and

That is what partly happened to professional translators and interpreters. As artificial intelligence tools like Google Translate and DeepL grew in popularity among business customers, many translators feared they would be replaced. Instead, they were expected to increase their output.

Before the advent of translation tools, a professional would be expected to translate between 1,000 and 2,000 words a day, according to Nuria Llanderas, who has been a professional interpreter for more than 20 years. “Now they are expected to manage 7,000,” she says. Her industry peers have predicted more AI systems will start supporting them on simultaneous translation, but that could also mean more work for the human translators in practice, checking that the machine’s output isn’t wrong.

Notice that there doesn't seem to be any recognition here that checking accuracy takes time. And they're using AI that not only should be checked but must be checked in any situation where errors can cause harm.

I think it's a safe bet that some businesses - maybe a lot of businesses - using OpenAI's confident-sounding but often wrong or outright hallucinating software have run into problems with it, problems causing harm or potentially causing harm to the business and/or its employees and/or its customers. I also think it's a safe bet that businesses running into those problems will NOT want to make them public. They'll probably complain to OpenAI about them, but OpenAI won't want to make them public, either. We're not likely to hear about the failures until they're bad enough that whistleblowers risk job loss and lawsuits to come forward.

OpenAI will probably hope all the warnings about the unreliability of its products will protect it from lawsuits. But businesses harmed by it are likely to test that. I just hope a lot of innocent individuals aren't harmed first, as OpenAI conducts what amounts to a giant beta test on our society and economy.
3 replies, 256 views
Replies to this discussion thread
Author | Time | Post
highplainsdem | Friday | OP
Casady1 | Friday | #1
highplainsdem | Friday | #2
old as dirt | Friday | #3
Response to highplainsdem (Original post)
Fri Mar 17, 2023, 12:38 PM
Casady1 (1,667 posts)
1. My brother and I work in high tech
My current company has very good AI, probably the best in our vertical. The first thing we tell people is that AI is way oversold in its capabilities.
Response to Casady1 (Reply #1)
Fri Mar 17, 2023, 12:50 PM
highplainsdem (43,131 posts)
2. Kudos to your company for not overselling it!
Response to highplainsdem (Original post)
Fri Mar 17, 2023, 05:54 PM
old as dirt (1,935 posts)
3. I prefer DeepL to Google Translate.
(I forget why, but I had a reason.)
Lately, I think I like ChatGPT over DeepL.

"That is what partly happened to professional translators and interpreters. As artificial intelligence tools like Google Translate and DeepL grew in popularity among business customers, many translators feared they would be replaced."