
Synthesis AI raises $17M to generate synthetic data for computer vision


Synthesis AI, a startup developing a platform that generates synthetic data to train AI systems, today announced that it raised $17 million in a Series A funding round led by 468 Capital, with participation from Sorenson Ventures, Strawberry Creek Ventures, Bee Partners, PJC, iRobot Ventures, Boom Capital and Kubera Venture Capital. CEO and founder Yashar Behzadi says that the proceeds will be put toward product R&D, growing the company’s team and expanding research, particularly in the area of mixed real and synthetic data.

Synthetic data, or data that’s created artificially rather than captured from the real world, is coming into wider use in data science as the demand for AI systems grows. The benefits are obvious: While collecting real-world data to develop an AI system is costly and labor intensive, a theoretically infinite amount of synthetic data can be generated to fit any criteria. For example, a developer could use synthetic images of cars and other vehicles to develop a system that can differentiate between makes and models.

Unsurprisingly, Gartner predicts that 60% of the data used for the development of AI and analytics projects will be synthetic by 2024. One survey called the use of synthetic data “one of the most promising general techniques on the rise in [AI].”

But synthetic data has limitations. While it can mimic many properties of real data, it isn’t an exact copy. And the quality of synthetic data is dependent on the quality of the algorithm that created it.

Behzadi, of course, asserts that Synthesis has taken meaningful steps toward overcoming these technical hurdles. A former scientist at IT government services firm SAIC and the creator of PopSlate, a smartphone case with a built-in E Ink display, Behzadi founded Synthesis AI in 2019 with the goal of, in his words, “solving the data issue in AI and transform[ing] the computer vision paradigm.”

“As companies develop new hardware, new models or expand their geographic and customer base, new training data is required to ensure models perform adequately,” Behzadi told TechCrunch via email. “Companies are also struggling with ethical issues related to model bias and consumer privacy in human-centered products. It is clear that a new paradigm is required to build the next generation of computer vision.”

In most AI systems, labels (which can come in the form of captions or annotations) are used during the development process to “teach” the system to recognize certain objects. Teams normally have to painstakingly add labels to real-world images, but synthetic tools like Synthesis’ eliminate the need, at least in theory.

Synthesis’ cloud-based platform allows companies to generate synthetic image data with labels using a combination of AI, procedural generation and VFX rendering technologies. For customers developing algorithms to tackle challenges like recognizing faces and monitoring drivers, for instance, Synthesis generated roughly 100,000 “synthetic people” spanning different genders, ages, BMIs, skin tones and ethnicities. Through the platform, data scientists could customize the avatars’ poses as well as their hair, facial hair, apparel (e.g., masks and glasses), and environmental aspects like the lighting and even the “lens type” of the virtual camera.
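To make the "labels come for free" idea concrete, here is a minimal Python sketch of procedural generation. It is not Synthesis AI's API or method: the attribute names, the values and the helpers `generate_specs` and `coverage` are all hypothetical, chosen only to mirror the kinds of controls described above. The point it illustrates is that when a scene is sampled from a known parameter space, the parameters themselves serve as the labels, so no human annotation step is needed.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical attribute space. These names and values are illustrative
# assumptions, not taken from Synthesis AI's product.
ATTRIBUTES = {
    "age_group": ["18-29", "30-49", "50-69", "70+"],
    "accessory": ["none", "glasses", "mask"],
    "lighting": ["daylight", "indoor", "low_light"],
    "lens_type": ["wide", "standard", "telephoto"],
}


@dataclass
class SceneSpec:
    """One synthetic sample: an id plus the parameters that define it."""
    sample_id: int
    labels: dict


def generate_specs(n, seed=0):
    """Sample n scene specifications from the attribute space.

    A real pipeline would hand each spec to a renderer. Here we stop at
    the spec, because the sampled parameters already are the labels.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    specs = []
    for i in range(n):
        labels = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
        specs.append(SceneSpec(sample_id=i, labels=labels))
    return specs


def coverage(specs, attribute):
    """Count how often each value of one attribute appears in the dataset."""
    return Counter(spec.labels[attribute] for spec in specs)


if __name__ == "__main__":
    dataset = generate_specs(1000, seed=42)
    print(dataset[0])
    print(coverage(dataset, "lighting"))
```

A check like `coverage` is one simple way a team might confirm that every value it cares about actually shows up in the generated set. It says nothing about whether the rendered images are realistic enough to train on.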

“Leading companies in the AR, VR and metaverse space are using our diverse digital humans and accompanying rich set of 3D facial and body landmarks to build more realistic and emotive avatars,” Behzadi said. “[Meanwhile,] our smartphone and consumer device customers are using synthetic data to understand the performance of various camera modules … Several of our customers are building a car driver and occupant sensing system. They leveraged synthetic data of thousands of individuals in the car cabin across various situations and environments to determine the optimal camera placement and overall configuration to ensure the best performance.”

One of Synthesis AI’s digital avatars. Image Credits: Synthesis AI

It’s worth pointing out that some of the domains Synthesis endorses, like facial recognition and “emotion sensing,” are controversial. Gender and racial biases are a well-documented phenomenon in facial analysis, attributable to shortcomings in the datasets used to train the algorithms. (Generally speaking, an algorithm developed using images of people with homogeneous facial structures and colors will perform worse on “face types” to which it hasn’t been exposed.) Recent research highlights the consequences, showing that some production systems classify emotions expressed by Black people as more negative. Computer vision-powered tools like Zoom’s virtual backgrounds and Twitter’s automatic photo cropping, too, have historically disfavored people with darker skin.

But Behzadi is of the optimistic belief that Synthesis can reduce these biases by generating examples of data — e.g., diverse faces — that’d otherwise go uncollected. He also claims that Synthesis’ synthetic data confers privacy and fair use advantages, mainly in that it’s not tied to personally identifiable information (although some research disagrees) and isn’t copyrighted (unlike many of the images on the public web).

“In addition to creating more capable models, Synthesis is focused on the ethical development of AI by reducing bias, preserving privacy and democratizing access … [The platform] provides perfectly labeled data on-demand at orders of magnitude increased speed and reduced cost compared to human-in-the-loop labeling approaches,” Behzadi said. “AI is driven by high-quality labeled data. As the AI space shifts from model-centric to data-centric AI, data becomes the key competitive driving force.”

Indeed, synthetic data — depending on how it’s applied — has the potential to address many of the development challenges plaguing companies attempting to operationalize AI. Recently, MIT researchers found a way to classify images using synthetic data. Nvidia researchers have explored a way to use synthetic data created in virtual environments to train robots to pick up objects. And nearly every major autonomous vehicle company uses simulation data to supplement the real-world data they collect from cars on the road.

But again, not all synthetic data is created equal. Datasets need to be transformed in order to make them useable by the systems that create synthetic data, and assumptions made during the transformations can lead to undesirable results. A STAT report found that Watson Health, IBM’s beleaguered life sciences division, often gave poor and unsafe cancer treatment advice because the platform’s models were trained using erroneous, synthetic patient records rather than real data. And in a January 2020 study, researchers at Arizona State University showed that an AI system trained on a dataset of images of professors could create highly realistic synthetic faces — but synthetic faces that were mostly male and white, because it amplified biases contained in the original dataset.

Matthew Guzdial, an assistant computer science professor at the University of Alberta, points out that Synthesis’ own white paper acknowledges that training a model on synthetic data alone generally causes it to do a worse job.

“I don’t see anything that really stands out here [with Synthesis’ platform]. It’s pretty standard, synthetic-datawise. In some cases they’re able to use synthetic data in combination with real data to help a model usefully generalize,” he told TechCrunch via email. “[G]enerally I steer my students away from using synthetic data as I find that it’s too easy to introduce bias that actually makes your end model worse … Since synthetic data is generated in some algorithmic fashion (e.g., with a function), the easiest thing for a model to learn is to just replicate the behavior of that function, rather than the actual problem you’re trying to approximate.”


Robin Röhm, the co-founder of data analytics platform Apheris, argues that quality checks should be developed for every new synthetic dataset to prevent misuse. The party generating and validating the dataset must have specific knowledge about how the data will be applied, he says, or run the risk of creating an inaccurate — and possibly harmful — system.

Behzadi agrees in principle — but with an eye toward expanding the number of applications that Synthesis supports, beating back rivals like Mostly AI, Rendered.ai, YData, Datagen and Synthetaic. With over $24 million in financing and Fortune 50 customers in the consumer, metaverse and robotics spaces, Synthesis plans to launch new products targeting new and existing verticals including photo enhancement, teleconferencing, smart homes and smart assistants.

“With an unrivaled breadth and depth of representative human data, Synthesis AI has established itself as the go-to provider for production-level synthetic data … The company has delivered over 10 million labeled images to support the most advanced computer vision companies in the world,” Behzadi said. “Synthesis AI has 20 employees and will be scaling to 50 by the end of the year.”
