Tech giant OpenAI has touted its AI-powered transcription tool Whisper as having human-level robustness and accuracy.
But Whisper has a major drawback: It has a tendency to make up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the made-up text — known in the industry as hallucinations — could include racist commentary, violent rhetoric and even imagined medical treatments.
Experts said such fabrications are problematic because Whisper is used in a range of industries around the world to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.
More worrying, they said, is a rush by medical centers to use Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.”
The full extent of the problem is difficult to know, but researchers and engineers said they often encountered Whisper’s hallucinations in their work. A University of Michigan researcher studying public meetings, for example, said he found hallucinations in eight out of every 10 audio transcripts he inspected, before he started trying to improve the model.
A machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer said he found hallucinations in almost every one of the 26,000 transcriptions he made with Whisper.
The problems persist even with well-recorded, short audio clips. A recent study by computer scientists revealed 187 hallucinations in the more than 13,000 clear audio samples they examined.
That trend would lead to tens of thousands of faulty transcriptions across millions of recordings, researchers said.