AI

WaveOne aims to make video AI-native and turn streaming upside down

Comment

WaveOne logo on a background of blurry trees that get clearer towards the right.
Image Credits: WaveOne

Video has worked the same way for a long, long time. And because of its unique qualities, video has been largely immune to the machine learning explosion upending industry after industry. WaveOne hopes to change that by taking the decades-old paradigm of video codecs and making them AI-powered — while somehow avoiding the pitfalls that would-be codec revolutionizers and “AI-powered” startups often fall into.

The startup has until recently limited itself to showing its results in papers and presentations, but with a recently raised $6.5M seed round, they are ready to move towards testing and deploying their actual product. It’s no niche: video compression may seem a bit in the weeds to some, but there’s no doubt it’s become one of the most important processes of the modern internet.

Here’s how it’s worked pretty much since the old days when digital video first became possible. Developers create a standard algorithm for compressing and decompressing video, a codec, which can easily be distributed and run on common computing platforms. This is stuff like MPEG-2, H.264, and that sort of thing. The hard work of compressing a video can be done by content providers and servers, while the comparatively lighter work of decompressing is done on the end user’s machines.

This approach is quite effective, and improvements to codecs (which allow more efficient compression) have led to the possibility of sites like YouTube. If videos were 10 times bigger, YouTube would never have been able to launch when it did. The other major change was beginning to rely on hardware acceleration of said codecs — your computer or GPU might have an actual chip in it with the codec baked in, ready to perform decompression tasks with far greater speed than an ordinary general-purpose CPU in a phone. Just one problem: when you get a new codec, you need new hardware.

Mac-optimized TensorFlow flexes new M1 and GPU muscles

But consider this: many new phones ship with a chip designed for running machine learning models, which like codecs can be accelerated, but unlike them the hardware is not bespoke for the model. So why aren’t we using this ML-optimized chip for video? Well, that’s exactly what WaveOne intends to do.

I should say that I initially spoke with WaveOne’s cofounders, CEO Lubomir Bourdev and CTO Oren Rippel, from a position of significant skepticism despite their impressive backgrounds. We’ve seen codec companies come and go, but the tech industry has coalesced around a handful of formats and standards that are revised in a painfully slow fashion. H.265, for instance, was introduced in 2013, but years afterwards its predecessor, H.264, was only beginning to achieve ubiquity. It’s more like the 3G, 4G, 5G system than version 7, version 7.1, etc. So smaller options, even superior ones that are free and open source, tend to get ground beneath the wheels of the industry-spanning standards.

This track record for codecs, plus the fact that startups like to describe practically everything is “AI-powered,” had me expecting something at best misguided, at worst scammy. But I was more than pleasantly surprised: In fact WaveOne is the kind of thing that seems obvious in retrospect and appears to have a first-mover advantage.

The first thing Rippel and Bourdev made clear was that AI actually has a role to play here. While codecs like H.265 aren’t dumb — they’re very advanced in many ways — they aren’t exactly smart, either. They can tell where to put more bits into encoding color or detail in a general sense, but they can’t, for instance, tell where there’s a face in the shot that should be getting extra love, or a sign or trees that can be done in a special way to save time.

But face and scene detection are practically solved problems in computer vision. Why shouldn’t a video codec understand that there is a face, then dedicate a proportionate amount of resources to it? It’s a perfectly good question. The answer is that the codecs aren’t flexible enough. They don’t take that kind of input. Maybe they will in H.266, whenever that comes out, and a couple years later it’ll be supported on high-end devices.

So how would you do it now? Well, by writing a video compression and decompression algorithm that runs on AI accelerators many phones and computers have or will have very soon, and integrating scene and object detection in it from the get-go. Like Krisp.ai understanding what a voice is and isolating it without hyper-complex spectrum analysis, AI can make determinations like that with visual data incredibly fast and pass that on to the actual video compression part.

Image Credits: WaveOne

Variable and intelligent allocation of data means the compression process can be very efficient without sacrificing image quality. WaveOne claims to reduce the size of files by as much as half, with better gains in more complex scenes. When you’re serving videos hundreds of millions of times (or to a million people at once), even fractions of a percent add up, let alone gains of this size. Bandwidth doesn’t cost as much as it used to, but it still isn’t free.

Understanding the image (or being told) also lets the codec see what kind of content it is; a video call should prioritize faces if possible, of course, but a game streamer may want to prioritize small details, while animation requires yet another approach to minimize artifacts in its large single-color regions. This can all be done on the fly with an AI-powered compression scheme.

There are implications beyond consumer tech as well: A self-driving car, sending video between components or to a central server, could save time and improve video quality by focusing on what the autonomous system designates important — vehicles, pedestrians, animals — and not wasting time and bits on a featureless sky, trees in the distance, and so on.

Content-aware encoding and decoding is probably the most versatile and easy to grasp advantage WaveOne claims to offer, but Bourdev also noted that the method is much more resistant to disruption from bandwidth issues. It’s one of the other failings of traditional video codecs that missing a few bits can throw off the whole operation — that’s why you get frozen frames and glitches. But ML-based decoding can easily make a “best guess” based on whatever bits it has, so when your bandwidth is suddenly restricted you don’t freeze, just get a bit less detailed for the duration.

Example of different codecs compressing the same frame.

These benefits sound great, but as before the question is not “can we improve on the status quo?” (obviously we can) but “can we scale those improvements?”

“The road is littered with failed attempts to create cool new codecs,” admitted Bourdev. “Part of the reason for that is hardware acceleration; even if you came up with the best codec in the world, good luck if you don’t have a hardware accelerator that runs it. You don’t just need better algorithms, you need to be able to run them in a scalable way across a large variety of devices, on the edge and in the cloud.”

That’s why the special AI cores on the latest generation of devices is so important. This is hardware acceleration that can be adapted in milliseconds to a new purpose. And WaveOne happens to have been working for years on video-focused machine learning that will run on those cores, doing the work that H.26X accelerators have been doing for years, but faster and with far more flexibility.

Of course, there’s still the question of “standards.” Is it very likely that anyone is going to sign on to a single company’s proprietary video compression methods? Well, someone’s got to do it! After all, standards don’t come etched on stone tablets. And as Bourdev and Rippel explained, they actually are using standards — just not the way we’ve come to think of them.

Before, a “standard” in video meant adhering to a rigidly defined software method so that your app or device could work with standards-compatible video efficiently and correctly. But that’s not the only kind of standard. Instead of being a soup-to-nuts method, WaveOne is an implementation that adheres to standards on the ML and deployment side.

They’re building the platform to be compatible with all the major ML distribution and development publishers like TensorFlow, ONNX, Apple’s CoreML, and others. Meanwhile the models actually developed for encoding and decoding video will run just like any other accelerated software on edge or cloud devices: deploy it on AWS or Azure, run it locally with ARM or Intel compute modules, and so on.

It feels like WaveOne may be onto something that ticks all the boxes of a major b2b event: it invisibly improves things for customers, runs on existing or upcoming hardware without modification, saves costs immediately (potentially, anyhow) but can be invested in to add value.

Perhaps that’s why they managed to attract such a large seed round: $6.5 million, led by Khosla Ventures, with $1M each from Vela Partners and Incubate Fund, plus $650K from Omega Venture Partners and $350K from Blue Ivy.

Right now WaveOne is sort of in a pre-alpha stage, having demonstrated the technology satisfactorily but not built a full-scale product. The seed round, Rippel said, was to de-risk the technology, and while there’s still lots of R&D yet to be done, they’ve proven that the core offering works — building the infrastructure and API layers comes next and amounts to a totally different phase for the company. Even so, he said, they hope to get testing done and line up a few customers before they raise more money.

The future of the video industry may not look a lot like the last couple decades, and that could be a very good thing. No doubt we’ll be hearing more from WaveOne as it migrates from lab to product.

More TechCrunch

The TechCrunch team runs down all of the biggest news from the Apple WWDC 2024 keynote in an easy-to-skim digest.

Here’s everything Apple announced at the WWDC 2024 keynote, including Apple Intelligence, Siri makeover

Hello and welcome back to TechCrunch Space. What a week! In the same seven-day period, we watched Boeing’s Starliner launch astronauts to space for the first time, and then we…

TechCrunch Space: A week that will go down in history

Elon Musk’s posts seem to misunderstand the relationship Apple announced with OpenAI at WWDC 2024.

Elon Musk threatens to ban Apple devices from his companies over Apple’s ChatGPT integrations

“We’re looking forward to doing integrations with other models, including Google Gemini, for instance, in the future,” Federighi said during WWDC 2024.

Apple confirms plans to work with Google’s Gemini ‘in the future’

When Urvashi Barooah applied to MBA programs in 2015, she focused her applications around her dream of becoming a venture capitalist. She got rejected from every school, and was told…

How Urvashi Barooah broke into venture after everyone told her she couldn’t

Slack CEO Denise Dresser is speaking at TechCrunch Disrupt 2024.

Slack CEO Denise Dresser is coming to TechCrunch Disrupt this October

Apple kicked off its weeklong Worldwide Developers Conference (WWDC 2024) event today with the customary keynote at 1 p.m. ET/10 a.m. PT. The presentation focused on the company’s software offerings…

Watch the Apple Intelligence reveal, and the rest of WWDC 2024 right here

Apple’s SDKs (software development kits) have been updated with a variety of new APIs and frameworks.

Apple brings its GenAI ‘Apple Intelligence’ to developers, will let Siri control apps

Older iPhones or iPhone 15 users won’t be able to use these features.

Apple Intelligence features will be available on iPhone 15 Pro and devices with M1 or newer chips

Soon, Siri will be able to tap ChatGPT for “expertise” where it might be helpful, Apple says.

Apple brings ChatGPT to its apps, including Siri

Apple Intelligence will have an understanding of who you’re talking with in a messaging conversation.

Apple debuts AI-generated … Bitmoji

To use InSight, Apple TV+ subscribers can swipe down on their remote to bring up a display with actor names and character information in real time.

Apple TV+ introduces InSight, a new feature similar to Amazon’s X-Ray, at WWDC 2024

Siri is now more natural, more relevant and more personal — and it has new look.

Apple gives Siri an AI makeover

The company has been pushing the feature as integral to all of its various operating system offerings, including iOS, macOS and the latest, VisionOS.

Apple Intelligence is the company’s new generative AI offering

In addition to all the features you can find in the Passwords menu today, there’s a new column on the left that lets you more easily navigate your password collection.

Apple is launching its own password manager app

With Smart Script, Apple says it’s making handwriting your notes even smoother and straighter.

Smart Script in iPadOS 18 will clean up your handwriting when using an Apple Pencil

iOS’ perennial tips calculating app is finally coming to the larger screen.

Calculator for iPad does the math for you

The new OS, announced at WWDC 2024, will allow users to mirror their iPhone screen directly on their Mac and even control it.

With macOS Sequoia, you can mirror your iPhone on your Mac

At Apple’s WWDC 2024, the company announced MacOS Sequoia.

Apple unveils macOS Sequoia

“Messages via Satellite,” announced at Apple’s WWDC 2024 keynote, works much like the SOS feature does.

iPhones will soon text via satellite

Apple says the new design will lead to less time searching for photos.

Apple revamps its Photos app for iOS 18

Users will be able to lock an app when they hand over their phone.

iOS 18 will let you hide and lock apps

Apple’s WWDC 2024 keynote was packed, including a number of key new updates for iOS 18. One of the more interesting additions is Tap to Cash, which is more or…

Tap to Cash lets you pay by touching iPhones

In iOS 18, Apple will now support long-requested functionality, like the ability to set app icons and widgets wherever you want.

iOS 18 will finally let you customize your icons and unlock them from the grid

As expected, this is a pivotal moment for the mobile platform as iOS 18 is going to focus on artificial intelligence.

Apple unveils iOS 18 with tons of AI-powered features

Apple today kicked off what it promised would be a packed WWDC 2024 with a handful of visionOS announcements. At the top of the list is the ability to turn…

visionOS can now make spatial photos out of 3D images

The Apple Vision Pro is now available in eight new countries.

Apple to release Vision Pro in international markets

VisionOS 2 will come to Vision Pro as a free update later this year.

Apple debuts visionOS 2 at WWDC 2024

The security firm said the attacks targeting Snowflake customers is “ongoing,” suggesting the number of affected companies may rise.

Mandiant says hackers stole a ‘significant volume of data’ from Snowflake customers

French startup Kelvin, which uses computer vision and machine learning to make it easier to audit homes for energy efficiency, has raised $5.1M.

Kelvin wants to help save the planet by applying AI to home energy audits