
Exafunction aims to reduce AI dev costs by abstracting away hardware



The most sophisticated AI systems today are capable of impressive feats, from directing cars through city streets to writing human-like prose. But they share a common bottleneck: hardware. Developing systems on the bleeding edge often requires a huge amount of computing power. For example, creating DeepMind’s protein structure-predicting AlphaFold took a cluster of hundreds of GPUs. Further underlining the challenge, one source estimates that developing AI startup OpenAI’s language-generating GPT-3 system using a single GPU would’ve taken 355 years.

New techniques and chips designed to accelerate certain aspects of AI system development promise to (and, indeed, already have) cut hardware requirements. But developing with these techniques calls for expertise that can be tough for smaller companies to come by. At least, that’s the assertion of Varun Mohan and Douglas Chen, the co-founders of infrastructure startup Exafunction. Emerging from stealth today, Exafunction is developing a platform to abstract away the complexity of using hardware to train AI systems.

“Improvements [in AI] are often underpinned by large increases in … computational complexity. As a consequence, companies are forced to make large investments in hardware to realize the benefits of deep learning. This is very difficult because the technology is improving so rapidly, and the workload size quickly increases as deep learning proves value within a company,” Chen told TechCrunch in an email interview. “The specialized accelerator chips necessary to run deep learning computations at scale are scarce. Efficiently using these chips also requires esoteric knowledge uncommon among deep learning practitioners.”

With $28 million in venture capital, $25 million of which came from a Series A round led by Greenoaks with participation from Founders Fund, Exafunction aims to address what it sees as the symptom of the expertise shortage in AI: idle hardware. GPUs and the aforementioned specialized chips used to “train” AI systems (that is, to feed them the data they learn from in order to make predictions) are frequently underutilized. Because they complete some AI workloads so quickly, they sit idle while they wait for other components of the hardware stack, like processors and memory, to catch up.
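As a rough illustration of how that idle time arises, consider a toy model in which each training step needs the rest of the stack to prepare a batch and the GPU to compute on it, with the two stages overlapping perfectly. The step then takes as long as the slower stage. The function name and the timings below are hypothetical and do not describe Exafunction's or anyone else's system.

```python
def gpu_utilization(data_prep_ms: float, gpu_compute_ms: float) -> float:
    """Fraction of each step the GPU spends computing.

    Toy model (an assumption, not a measurement): data preparation and GPU
    compute overlap perfectly, so a step lasts as long as the slower stage.
    """
    if data_prep_ms < 0 or gpu_compute_ms <= 0:
        raise ValueError("timings must be positive")
    step_ms = max(data_prep_ms, gpu_compute_ms)
    return gpu_compute_ms / step_ms


# Hypothetical timings: the GPU finishes in 15 ms but waits 100 ms for data.
print(f"{gpu_utilization(100.0, 15.0):.0%}")  # prints 15%
```

In this model, a faster GPU only lowers utilization further; the fix has to come from the slower stage or from sharing the GPU across workloads.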

Lukas Biewald, the founder of AI development platform Weights & Biases, reports that nearly a third of his company’s customers average less than 15% GPU utilization. Meanwhile, in a 2021 survey commissioned by Run:AI, which competes with Exafunction, just 17% of companies said that they were able to achieve “high utilization” of their AI resources while 22% said that their infrastructure mostly sits idle.

The costs add up. According to Run:AI, 38% of companies had an annual budget for AI infrastructure — including hardware, software and cloud fees — exceeding $1 million as of October 2021. OpenAI is estimated to have spent $4.6 million training GPT-3.
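A back-of-the-envelope sketch shows why low utilization inflates those bills: if a GPU is busy only part of the time it is paid for, the price of each useful hour is the hourly rate divided by the utilization. The hourly rate below is a made-up placeholder rather than a quoted price, and the function name is hypothetical.

```python
def cost_per_useful_gpu_hour(hourly_rate_usd: float, utilization: float) -> float:
    """Effective price of one hour of actual GPU work.

    Assumes the GPU is billed for every hour regardless of how busy it is.
    """
    if hourly_rate_usd < 0:
        raise ValueError("hourly rate must be non-negative")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate_usd / utilization


ASSUMED_RATE_USD = 3.00  # placeholder hourly price, not a real quote

for utilization in (0.15, 0.50, 0.90):
    cost = cost_per_useful_gpu_hour(ASSUMED_RATE_USD, utilization)
    print(f"{utilization:.0%} utilized -> ${cost:.2f} per useful GPU-hour")
```

At 15% utilization, the level cited above, each useful hour costs roughly 6.7 times the sticker rate, whatever that rate happens to be.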

“Most companies operating in deep learning go into business so they can focus on their core technology, not to spend their time and bandwidth worrying about optimizing resources,” Mohan said via email. “We believe there is no meaningful competitor that addresses the problem that we’re focused on, namely, abstracting away the challenges of managing accelerated hardware like GPUs while delivering superior performance to customers.”

Seed of an idea

Prior to co-founding Exafunction, Chen was a software engineer at Facebook, where he helped to build the tooling for devices like the Oculus Quest. Mohan was a tech lead at autonomous delivery startup Nuro responsible for managing the company’s autonomy infrastructure teams.

“As our deep learning workloads [at Nuro] grew in complexity and demandingness, it became apparent that there was no clear solution to scale our hardware accordingly,” Mohan said. “Simulation is a weird problem. Perhaps paradoxically, as your software improves, you need to simulate even more iterations in order to find corner cases. The better your product, the harder you have to search to find fallibilities. We learned how difficult this was the hard way and spent thousands of engineering hours trying to squeeze more performance out of the resources we had.”


Exafunction customers connect to the company’s managed service or deploy Exafunction’s software in a Kubernetes cluster. The technology dynamically allocates resources, moving computation onto “cost-effective hardware” such as spot instances when available.

Mohan and Chen demurred when asked about the Exafunction platform’s inner workings, preferring to keep those details under wraps for now. But they explained that, at a high level, Exafunction leverages virtualization to run AI workloads even with limited hardware availability, ostensibly leading to better utilization rates while lowering costs.

Exafunction’s reluctance to reveal information about its technology, including whether it supports cloud-hosted accelerator chips like Google’s tensor processing units (TPUs), is cause for some concern. But to allay doubts, Mohan, without naming names, said that Exafunction is already managing GPUs for “some of the most sophisticated autonomous vehicle companies and organizations at the cutting edge of computer vision.”

“Exafunction provides a platform that decouples workloads from acceleration hardware like GPUs, ensuring maximally efficient utilization — lowering costs, accelerating performance, and allowing companies to fully benefit from hardware …  [The] platform lets teams consolidate their work on a single platform, without the challenges of stitching together a disparate set of software libraries,” he added. “We expect that [Exafunction’s product] will be profoundly market-enabling, doing for deep learning what AWS did for cloud computing.”

Growing market

Mohan might have grandiose plans for Exafunction, but the startup isn’t the only one applying the concept of “intelligent” infrastructure allocation to AI workloads. Beyond Run:AI — whose product also creates an abstraction layer to optimize AI workloads — Grid.ai offers software that allows data scientists to train AI models across hardware in parallel. For its part, Nvidia sells AI Enterprise, a suite of tools and frameworks that lets companies virtualize AI workloads on Nvidia-certified servers. 

But Mohan and Chen see a massive addressable market despite the crowded field. In conversation, they positioned Exafunction’s subscription-based platform not only as a way to bring down barriers to AI development but also as a way for companies facing supply chain constraints to “unlock more value” from hardware on hand. (In recent years, for a range of reasons, GPUs have become hot commodities.) There’s always the cloud, but, to Mohan’s and Chen’s point, it can drive up costs. One estimate found that training an AI model using on-premises hardware is up to 6.5x cheaper than the least costly cloud-based alternative.

“While deep learning has virtually endless applications, two of the ones we’re most excited about are autonomous vehicle simulation and video inference at scale,” Mohan said. “Simulation lies at the heart of all software development and validation in the autonomous vehicle industry … Deep learning has also led to exceptional progress in automated video processing, with applications across a diverse range of industries. [But] though GPUs are essential to autonomous vehicle companies, their hardware is frequently underutilized, despite their price and scarcity. [Computer vision applications are] also computationally demanding, [because] each new video stream effectively represents a firehose of data — with each camera outputting millions of frames per day.”

Mohan and Chen say that the capital from the Series A will be put toward expanding Exafunction’s team and “deepening” the product. The company will also invest in optimizing AI system runtimes “for the most latency-sensitive applications” (e.g., autonomous driving and computer vision).

“While currently we are a strong and nimble team focused primarily on engineering, we expect to rapidly build the size and capabilities of our org in 2022,” Mohan said. “Across virtually every industry, it is clear that as workloads grow more complex (and a growing number of companies wish to leverage deep-learning insights), demand for compute is vastly exceeding [supply]. While the pandemic has highlighted these concerns, this phenomenon, and its related bottlenecks, is poised to grow more acute in the years to come, especially as cutting-edge models become exponentially more demanding.”
