OpenAI open-sources Whisper, a multilingual speech recognition system

Speech recognition remains a challenging problem in AI and machine learning. In a step toward solving it, OpenAI today open-sourced Whisper, an automatic speech recognition system that the company claims enables “robust” transcription in multiple languages as well as translation from those languages into English.

Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different, according to OpenAI, is that it was trained on 680,000 hours of multilingual and "multitask" data collected from the web, which led to improved recognition of unique accents, background noise and technical jargon.

“The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI wrote in the GitHub repo for Whisper, from which several versions of the system can be downloaded. “[The models] show strong ASR results in ~10 languages. They may exhibit additional capabilities … if fine-tuned on certain tasks like voice activity detection, speaker classification or speaker diarization but have not been robustly evaluated in these areas.”

Whisper has its limitations, particularly in the area of text prediction. Because the system was trained on a large amount of “noisy” data, OpenAI cautions Whisper might include words in its transcriptions that weren’t actually spoken — possibly because it’s both trying to predict the next word in audio and trying to transcribe the audio itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from a higher error rate when it comes to speakers of languages that aren’t well-represented in the training data.

That last bit is nothing new to the world of speech recognition, unfortunately. Biases have long plagued even the best systems: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors with users who are white (an average word error rate of about 19%) than with users who are Black (about 35%).

Despite this, OpenAI sees Whisper’s transcription capabilities being used to improve existing accessibility tools.

“While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation,” the company continues on GitHub. “The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications … [W]e hope the technology will be used primarily for beneficial purposes, making automatic speech recognition technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication.”

The release of Whisper isn’t necessarily indicative of OpenAI’s future plans. While increasingly focused on commercial efforts like DALL-E 2 and GPT-3, the company is pursuing several purely theoretical research threads, including AI systems that learn by observing videos.