One night last week, the law professor Jonathan Turley got a troubling email. As part of a research study, a fellow lawyer in California had asked the AI chatbot ChatGPT to generate a list of legal scholars who had sexually harassed someone. Turley’s name was on the list.
The chatbot, created by OpenAI, said Turley had made sexually suggestive comments and attempted to touch a student while on a class trip to Alaska, citing a March 2018 article in The Washington Post as the source of the information. The problem: No such article existed. There had never been a class trip to Alaska. And Turley said he’d never been accused of harassing a student.
A regular commentator in the media, Turley had sometimes asked for corrections in news stories. But this time, there was no journalist or editor to call — and no way to correct the record.
“It was quite chilling,” he said in an interview with The Post. “An allegation of this kind is incredibly harmful.”
Turley’s experience is a case study in the pitfalls of the latest wave of language bots, which have captured mainstream attention with their ability to write computer code, craft poems and hold eerily humanlike conversations. But this creativity can also be an engine for erroneous claims; the models can misrepresent key facts with great flourish, even fabricating primary sources to back up their claims.
As largely unregulated artificial intelligence software such as ChatGPT, Microsoft’s Bing and Google’s Bard begins to be incorporated across the web, its propensity to generate potentially damaging falsehoods raises concerns about the spread of misinformation — and novel questions about who’s responsible when chatbots mislead.
“Because these systems respond so confidently, it’s very seductive to assume they can do everything, and it’s very difficult to tell the difference between facts and falsehoods,” said Kate Crawford, a professor at the University of Southern California at Annenberg and senior principal researcher at Microsoft Research.
In a statement, OpenAI spokesperson Niko Felix said, “When users sign up for ChatGPT, we strive to be as transparent as possible that it may not always generate accurate answers. Improving factual accuracy is a significant focus for us, and we are making progress.”
Today’s AI chatbots work by drawing on vast pools of online content, often scraped from sources such as Wikipedia and Reddit, to stitch together plausible-sounding responses to almost any question. They’re trained to identify patterns of words and ideas to stay on topic as they generate sentences, paragraphs and even whole essays that may resemble material published online.
But just because they’re good at predicting which words are likely to appear together doesn’t mean the resulting sentences are always true; the Princeton University computer science professor Arvind Narayanan has called ChatGPT a “bulls--- generator.” While their responses often sound authoritative, the models lack reliable mechanisms for verifying the things they say. Users have posted numerous examples of the tools fumbling basic factual questions or even fabricating falsehoods, complete with realistic details and fake citations.
On Wednesday, Reuters reported that Brian Hood, regional mayor of Hepburn Shire in Australia, is threatening to file the first defamation lawsuit against OpenAI unless it corrects false claims that he had served time in prison for bribery.
Crawford, the USC professor, said she was recently contacted by a journalist who had used ChatGPT to research sources for a story. The bot suggested Crawford and offered examples of her relevant work, including an article title, publication date and quotes. All of it sounded plausible, and all of it was fake.
Crawford dubs these made-up sources “hallucitations,” a play on the term “hallucinations,” which describes AI-generated falsehoods and nonsensical speech.
“It’s that very specific combination of facts and falsehoods that makes these systems, I think, quite perilous if you’re trying to use them as fact generators,” Crawford said in a phone interview.
Microsoft’s Bing chatbot and Google’s Bard chatbot both aim to give more factually grounded responses, as does a new subscription-only version of ChatGPT that runs on an updated model, called GPT-4. But they all still make notable slip-ups. And the major chatbots all come with disclaimers, such as Bard’s fine-print message below each query: “Bard may display inaccurate or offensive information that doesn’t represent Google’s views.”