Exafunction aims to reduce AI dev costs by abstracting away hardware

The most sophisticated AI systems today are capable of impressive feats, from directing cars through city streets to writing human-like prose. But they share a common bottleneck: hardware. Developing systems on the bleeding edge often requires a huge amount of computing power. For example, creating DeepMind’s protein structure-predicting AlphaFold took a cluster of hundreds of GPUs. Further underlining the challenge, one source estimates that training OpenAI’s language-generating GPT-3 system on a single GPU would have taken 355 years.

New techniques and chips designed to accelerate certain aspects of AI system development promise to (and, indeed, already have) cut hardware requirements. But developing with these techniques calls for expertise that can be tough for smaller companies to come by. At least, that’s the assertion of Varun Mohan and Douglas Chen, the co-founders of infrastructure startup Exafunction. Emerging from stealth today, Exafunction is developing a platform to abstract away the complexity of using hardware to train AI systems.

“Improvements [in AI] are often underpinned by large increases in … computational complexity. As a consequence, companies are forced to make large investments in hardware to realize the benefits of deep learning. This is very difficult because the technology is improving so rapidly, and the workload size quickly increases as deep learning proves value within a company,” Chen told TechCrunch in an email interview. “The specialized accelerator chips necessary to run deep learning computations at scale are scarce. Efficiently using these chips also requires esoteric knowledge uncommon among deep learning practitioners.”

With $28 million in venture capital, $25 million of which came from a Series A round led by Greenoaks with participation from Founders Fund, Exafunction aims to address what it sees as the symptom of the expertise shortage in AI: idle hardware. GPUs and the aforementioned specialized chips used to “train” AI systems — i.e., feed the data that the systems can use to make predictions — are frequently underutilized. Because they complete some AI workloads so quickly, they sit idle while they wait for other components of the hardware stack, like processors and memory, to catch up.

Lukas Biewald, the founder of AI development platform Weights & Biases, reports that nearly a third of his company’s customers average less than 15% GPU utilization. Meanwhile, in a 2021 survey commissioned by Run:AI, which competes with Exafunction, just 17% of companies said that they were able to achieve “high utilization” of their AI resources while 22% said that their infrastructure mostly sits idle.

The costs add up. According to Run:AI, 38% of companies had an annual budget for AI infrastructure — including hardware, software and cloud fees — exceeding $1 million as of October 2021. OpenAI is estimated to have spent $4.6 million training GPT-3.

“Most companies operating in deep learning go into business so they can focus on their core technology, not to spend their time and bandwidth worrying about optimizing resources,” Mohan said via email. “We believe there is no meaningful competitor that addresses the problem that we’re focused on, namely, abstracting away the challenges of managing accelerated hardware like GPUs while delivering superior performance to customers.”

Seed of an idea

Prior to co-founding Exafunction, Chen was a software engineer at Facebook, where he helped to build the tooling for devices like the Oculus Quest. Mohan was a tech lead at autonomous delivery startup Nuro responsible for managing the company’s autonomy infrastructure teams.

“As our deep learning workloads [at Nuro] grew in complexity and demandingness, it became apparent that there was no clear solution to scale our hardware accordingly,” Mohan said. “Simulation is a weird problem. Perhaps paradoxically, as your software improves, you need to simulate even more iterations in order to find corner cases. The better your product, the harder you have to search to find fallibilities. We learned how difficult this was the hard way and spent thousands of engineering hours trying to squeeze more performance out of the resources we had.”

Exafunction customers connect to the company’s managed service or deploy Exafunction’s software in a Kubernetes cluster. The technology dynamically allocates resources, moving computation onto “cost-effective hardware” such as spot instances when available.

Mohan and Chen demurred when asked about the Exafunction platform’s inner workings, preferring to keep those details under wraps for now. But they explained that, at a high level, Exafunction leverages virtualization to run AI workloads even with limited hardware availability, ostensibly leading to better utilization rates while lowering costs.

Exafunction’s reluctance to reveal information about its technology — including whether it supports cloud-hosted accelerator chips like Google’s tensor processing units (TPUs) — is cause for some concern. But to allay doubts, Mohan, without naming names, said that Exafunction is already managing GPUs for “some of the most sophisticated autonomous vehicle companies and organizations at the cutting edge of computer vision.”

“Exafunction provides a platform that decouples workloads from acceleration hardware like GPUs, ensuring maximally efficient utilization — lowering costs, accelerating performance, and allowing companies to fully benefit from hardware … [The] platform lets teams consolidate their work on a single platform, without the challenges of stitching together a disparate set of software libraries,” he added. “We expect that [Exafunction’s product] will be profoundly market-enabling, doing for deep learning what AWS did for cloud computing.”

Growing market

Mohan might have grandiose plans for Exafunction, but the startup isn’t the only one applying the concept of “intelligent” infrastructure allocation to AI workloads. Beyond Run:AI — whose product also creates an abstraction layer to optimize AI workloads — Grid.ai offers software that allows data scientists to train AI models across hardware in parallel. For its part, Nvidia sells AI Enterprise, a suite of tools and frameworks that lets companies virtualize AI workloads on Nvidia-certified servers. 

But Mohan and Chen see a massive addressable market despite the crowdedness. In conversation, they positioned Exafunction’s subscription-based platform not only as a way to bring down barriers to AI development but also as a way for companies facing supply chain constraints to “unlock more value” from hardware on hand. (In recent years, for a range of different reasons, GPUs have become hot commodities.) There’s always the cloud, but, to Mohan’s and Chen’s point, it can drive up costs. One estimate found that training an AI model using on-premises hardware is up to 6.5x cheaper than the least costly cloud-based alternative.

“While deep learning has virtually endless applications, two of the ones we’re most excited about are autonomous vehicle simulation and video inference at scale,” Mohan said. “Simulation lies at the heart of all software development and validation in the autonomous vehicle industry … Deep learning has also led to exceptional progress in automated video processing, with applications across a diverse range of industries. [But] though GPUs are essential to autonomous vehicle companies, their hardware is frequently underutilized, despite their price and scarcity. [Computer vision applications are] also computationally demanding, [because] each new video stream effectively represents a firehose of data — with each camera outputting millions of frames per day.”

Mohan and Chen say that the capital from the Series A will be put toward expanding Exafunction’s team and “deepening” the product. The company will also invest in optimizing AI system runtimes “for the most latency-sensitive applications” (e.g., autonomous driving and computer vision).

“While currently we are a strong and nimble team focused primarily on engineering, we expect to rapidly build the size and capabilities of our org in 2022,” Mohan said. “Across virtually every industry, it is clear that as workloads grow more complex (and a growing number of companies wish to leverage deep-learning insights), demand for compute is vastly exceeding [supply]. While the pandemic has highlighted these concerns, this phenomenon, and its related bottlenecks, is poised to grow more acute in the years to come, especially as cutting-edge models become exponentially more demanding.”
