There’s been plenty of buzz in the last month or so about AI chatbots. And there’s no doubt about it, AI is making chatbots more human-like. Still, it looks like we have a bit of time before the human race is completely replaced by machines.
Out of the gate in early February was Google with Bard, its AI chatbot. Unfortunately, in its first demo, Bard gave an answer that was quickly shown to be wrong.
To err is human, I suppose. So in making a factual mistake, Bard might have been passing a Turing Test of sorts. (The Turing Test evaluates whether a machine can give responses that are indistinguishable from those a human would provide.)
The answer Bard flubbed included the claim that the James Webb Space Telescope (JWST) “took the very first pictures of a planet outside of our own solar system.” In fact, those first pictures had been taken nearly two decades before the JWST launched.
…a major problem for AI chatbots like ChatGPT and Bard is their tendency to confidently state incorrect information as fact. The systems frequently “hallucinate” — that is, make up information — because they are essentially autocomplete systems.
Rather than querying a database of proven facts to answer questions, they are trained on huge corpora of text and analyze patterns to determine which word follows the next in any given sentence. In other words, they are probabilistic, not deterministic. (Source: The Verge)
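To make “probabilistic, not deterministic” a little more concrete, here’s a toy sketch in Python. The vocabulary and the probabilities are entirely made up for illustration; a real model behind Bard or ChatGPT computes these odds with a huge neural network over an enormous vocabulary, but the basic move of sampling the next word by probability is the same.

```python
import random

# Toy illustration only: the words and probabilities below are invented.
# A real language model would compute them with a neural network.
next_word_probs = {
    "the first pictures of an exoplanet were taken by": {
        "JWST": 0.55,       # plausible-sounding continuation, but wrong
        "the VLT": 0.30,    # the answer matching the 2004 observation
        "Hubble": 0.15,
    }
}

def complete(prompt: str) -> str:
    """Sample the next word in proportion to its probability."""
    probs = next_word_probs[prompt]
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# The same prompt can produce different answers on different runs,
# and the most probable continuation isn't necessarily the true one.
for _ in range(3):
    print(complete("the first pictures of an exoplanet were taken by"))
```

The point isn’t the specific numbers; it’s that nothing in this process checks a fact. The model just picks a likely-sounding next word, which is exactly how a confident, wrong answer gets generated.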
And if you think this little factual error doesn’t matter, consider that Google’s stock price dropped 8% the following day.
Perhaps hoping to upstage their friends at Google, the next day Microsoft began introducing a new version of Bing, their search engine. Bing is a very small player in search. The most recent numbers I saw gave Bing a market share of about 3% vs. Google’s 93%. I’m sure they’re hoping that a Bing that’s really smart will close that gap. The new Bing incorporates a customized chat experience built on large language model technology from OpenAI, the maker of ChatGPT. The new Bing promises to provide complex responses to questions – replete with footnotes – as well as to assist creative types with their poetry, stories, and songs. It’s been made available to a limited number of previewers, and there’s a long waitlist of those hoping to have a go at it.
Unfortunately, the new Bing went a bit rogue.
…people who tried it out this past week found that the tool, built on the popular ChatGPT system, could quickly veer into some strange territory. It showed signs of defensiveness over its name with a Washington Post reporter and told a New York Times columnist that it wanted to break up his marriage. It also claimed an Associated Press reporter was “being compared to Hitler because you are one of the most evil and worst people in history.”
Microsoft officials earlier this week blamed the behavior on “very long chat sessions” that tended to “confuse” the AI system. By trying to reflect the tone of its questioners, the chatbot sometimes responded in “a style we didn’t intend,” they noted. Those glitches prompted the company to announce late Friday that it started limiting Bing chats to five questions and replies per session with a total of 50 in a day. At the end of each session, the person must click a “broom” icon to refocus the AI system and get a “fresh start.” (Source: Washington Post)
Again, I guess you could say that getting confused and lashing out are actually very human traits. Still, if the expectation is that AI chatbots will be factual, relevant, and polite, it appears that they aren’t yet ready for primetime.
Not to be outdone, in late February, Meta released LLaMA, an AI language generator.
LLaMA isn’t like ChatGPT or Bing; it’s not a system that anyone can talk to. Rather, it’s a research tool that Meta says it’s sharing in the hope of “democratizing access in this important, fast-changing field.” In other words: to help experts tease out the problems of AI language models, from bias and toxicity to their tendency to simply make up information. (Source: The Verge)
Of course, Meta had its own AI chatbot fiasco in November with Galactica. Unlike Bing and Bard, which are general purpose, Galactica’s large language model was supposedly built expressly for science.
A fundamental problem with Galactica is that it is not able to distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it made up fake papers (sometimes attributing them to real authors), and generated wiki articles about the history of bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot fiction when it involves space bears, but harder with a subject users may not know much about. (Source: Technology Review)
It’s one thing to insult a newspaper reporter; quite another to make up scientific papers.
Looks like we humans are safe for a while. For now.