This startup is setting a DALL-E 2-like AI free, consequences be damned

Image Credits: Bryce Durbin / TechCrunch

DALL-E 2, OpenAI’s powerful text-to-image AI system, can create photos in the style of cartoonists, 19th-century daguerreotypists, stop-motion animators and more. But it has an important, artificial limitation: a filter that prevents it from creating images depicting public figures and content deemed too toxic.

Now an open source alternative to DALL-E 2 is on the cusp of being released, and it’ll have few — if any — such content filters.

London- and Los Altos-based startup Stability AI this week announced the release of a DALL-E 2-like system, Stable Diffusion, to just over a thousand researchers ahead of a public launch in the coming weeks. A collaboration between Stability AI, media creation company RunwayML, Heidelberg University researchers and the research groups EleutherAI and LAION, Stable Diffusion is designed to run on most high-end consumer hardware, generating 512×512-pixel images in just a few seconds given any text prompt.

Stable Diffusion sample outputs. Image Credits: Stability AI

“Stable Diffusion will allow both researchers and soon the public to run this under a range of conditions, democratizing image generation,” Stability AI CEO and founder Emad Mostaque wrote in a blog post. “We look forward to the open ecosystem that will emerge around this and further models to truly explore the boundaries of latent space.”

But Stable Diffusion’s lack of safeguards compared to systems like DALL-E 2 poses tricky ethical questions for the AI community. Even if the results aren’t perfectly convincing yet, making fake images of public figures opens a large can of worms. And making the raw components of the system freely available leaves the door open to bad actors who could train them on subjectively inappropriate content, like pornography and graphic violence.

Creating Stable Diffusion

Stable Diffusion is the brainchild of Mostaque. After graduating from Oxford with a master’s degree in mathematics and computer science, Mostaque served as an analyst at various hedge funds before shifting gears to more public-facing work. In 2019, he co-founded Symmitree, a project that aimed to reduce the cost of smartphones and internet access for people living in impoverished communities. And in 2020, Mostaque was the chief architect of Collective & Augmented Intelligence Against COVID-19, an alliance to help policymakers make decisions in the face of the pandemic by leveraging software.

He co-founded Stability AI in 2020, motivated both by a personal fascination with AI and what he characterized as a lack of “organization” within the open source AI community.

An image of former U.S. President Barack Obama created by Stable Diffusion. Image Credits: Stability AI

“Nobody has any voting rights except our 75 employees — no billionaires, big funds, governments or anyone else with control of the company or the communities we support. We’re completely independent,” Mostaque told TechCrunch in an email. “We plan to use our compute to accelerate open source, foundational AI.”

Mostaque says that Stability AI funded the creation of LAION-5B, an open source, 250-terabyte dataset containing 5.6 billion images scraped from the internet. (“LAION” stands for Large-scale Artificial Intelligence Open Network, a nonprofit organization with the goal of making AI, datasets and code available to the public.) The company also worked with the LAION group to create a subset of LAION-5B called LAION-Aesthetics, which contains 2 billion AI-filtered images ranked as particularly “beautiful” by testers of Stable Diffusion.

The initial version of Stable Diffusion was based on LAION-400M, the predecessor to LAION-5B. LAION-400M was known to contain depictions of sex, slurs and harmful stereotypes. LAION-Aesthetics attempts to correct for this, but it’s too early to tell to what extent it’s successful.

A collage of images created by Stable Diffusion. Image Credits: Stability AI

In any case, Stable Diffusion builds on research incubated at OpenAI as well as Runway and Google Brain, one of Google’s AI R&D divisions. The system was trained on text-image pairs from LAION-Aesthetics to learn the associations between written concepts and images, like how the word “bird” can refer not only to bluebirds but parakeets and bald eagles, as well as more abstract notions.

At runtime, Stable Diffusion, like DALL-E 2, breaks image generation down into a process of “diffusion.” It starts with pure noise and refines an image over time, making it incrementally closer to a given text description until there’s no noise left at all.
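The noise-to-image loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not Stability AI’s or OpenAI’s code: a short list of numbers stands in for an image, the noise schedule is an arbitrary linear one, the update rule is the deterministic DDIM sampler (one published option, chosen here for brevity), and the hypothetical `oracle_noise` function replaces the trained, text-conditioned neural network by simply knowing the target in advance.

```python
import math
import random


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative signal fractions alpha_bar_t for an assumed linear noise schedule.

    alpha_bar_t near 0 means "almost pure noise"; alpha_bar_t near 1 means "almost clean".
    """
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def oracle_noise(x_t: list[float], target: list[float], alpha_bar: float) -> list[float]:
    """Hypothetical stand-in for the trained, text-conditioned network.

    It cheats by knowing the target, so it returns the exact noise mixed into x_t.
    A real model only sees the prompt and has to *estimate* this noise.
    """
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - s * x0) / r for x, x0 in zip(x_t, target)]


def ddim_sample(target: list[float], num_steps: int = 1000, seed: int = 0) -> list[float]:
    """Deterministic DDIM-style sampling from pure Gaussian noise toward `target`."""
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start: pure noise
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0  # 1.0 means no noise left
        eps = oracle_noise(x, target, a_t)
        # Estimate the clean signal, then re-add slightly less noise than before.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    target = [0.0, 0.25, 0.5, 0.75, 1.0]  # stands in for "what the prompt describes"
    result = ddim_sample(target)
    print([round(v, 6) for v in result])
```

Because the oracle is perfect, the output lands on the target up to floating-point error. In a real system, nearly all of the difficulty lies in training a network to predict that noise from the text prompt; the sampling loop itself stays roughly this simple.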

Boris Johnson wielding various weapons, generated by Stable Diffusion. Image Credits: Stability AI

Stability AI used a cluster of 4,000 Nvidia A100 GPUs running in AWS to train Stable Diffusion over the course of a month. CompVis, the machine vision and learning research group at Ludwig Maximilian University of Munich, oversaw the training, while Stability AI donated the compute power.

Stable Diffusion can run on graphics cards with around 5GB of VRAM. That’s roughly the capacity of mid-range cards like Nvidia’s GTX 1660, priced around $230. Work is underway on bringing compatibility to AMD’s MI200 data center cards and even MacBooks with Apple’s M1 chip (although in the case of the latter, without GPU acceleration, image generation will take as long as a few minutes).

“We have optimized the model, compressing the knowledge of over 100 terabytes of images,” Mostaque said. “Variants of this model will be on smaller datasets, particularly as reinforcement learning with human feedback and other techniques are used to take these general digital brains and make them even smaller and focused.”

Samples from Stable Diffusion. Image Credits: Stability AI

For the past few weeks, Stability AI has allowed a limited number of users to query the Stable Diffusion model through its Discord server, slowly increasing the maximum number of queries to stress-test the system. Stability AI says that more than 15,000 testers have used Stable Diffusion to create 2 million images a day.

Far-reaching implications

Stability AI plans to take a dual approach in making Stable Diffusion more widely available. It’ll host the model in the cloud behind tunable filters for specific content, allowing people to continue using it to generate images without having to run the system themselves. In addition, the startup will release what it calls “benchmark” models under a permissive license that can be used for any purpose — commercial or otherwise — as well as compute to train the models.

That will make Stability AI the first to release an image generation model nearly as high-fidelity as DALL-E 2. While other AI-powered image generators have been available for some time, including Midjourney, NightCafe and Pixelz.ai, none have open sourced their frameworks. Others, like Google and Meta, have chosen to keep their technologies under tight wraps, allowing only select users to pilot them for narrow use cases.

Stability AI will make money by training “private” models for customers and acting as a general infrastructure layer, Mostaque said — presumably with a sensitive treatment of intellectual property. The company claims to have other commercializable projects in the works, including AI models for generating audio, music and even video.

Sand sculptures of Harry Potter and Hogwarts, generated by Stable Diffusion. Image Credits: Stability AI

“We will provide more details of our sustainable business model soon with our official launch, but it is basically the commercial open source software playbook: services and scale infrastructure,” Mostaque said. “We think AI will go the way of servers and databases, with open beating proprietary systems — particularly given the passion of our communities.”

With the hosted version of Stable Diffusion (the one available through Stability AI’s Discord server), Stability AI doesn’t permit every kind of image generation. The startup’s terms of service ban some lewd or sexual material (although not scantily clad figures), hateful or violent imagery (such as antisemitic iconography, racist caricatures, misogynistic and misandrist propaganda), prompts containing copyrighted or trademarked material, and personal information like phone numbers and Social Security numbers. But while Stability AI has implemented a keyword filter in the server similar to OpenAI’s, which prevents the model from even attempting to generate an image that might violate the usage policy, the filter appears to be more permissive than most.

(A previous version of this article implied that Stability AI wasn’t using a keyword filter. That’s not the case; TechCrunch regrets the error.)

A Stable Diffusion generation, given the prompt: “very sexy woman with black hair, pale skin, in bikini, wet hair, sitting on the beach.” Image Credits: Stability AI

Stability AI also doesn’t have a policy against images with public figures. That presumably makes deepfakes fair game (and Renaissance-style paintings of famous rappers), though the model struggles with faces at times, introducing odd artifacts that a skilled Photoshop artist rarely would.

“Our benchmark models that we release are based on general web crawls and are designed to represent the collective imagery of humanity compressed into files a few gigabytes big,” Mostaque said. “Aside from illegal content, there is minimal filtering, and it is on the user to use it as they will.”

An image of Hitler generated by Stable Diffusion. Image Credits: Stability AI

Potentially more problematic are the soon-to-be-released tools for creating custom and fine-tuned Stable Diffusion models. An “AI furry porn generator” profiled by Vice offers a preview of what might come; an art student going by the name of CuteBlack trained an image generator to churn out illustrations of anthropomorphic animal genitalia by scraping artwork from furry fandom sites. The possibilities don’t stop at pornography. In theory, a malicious actor could fine-tune Stable Diffusion on images of riots and gore, for instance, or propaganda.

Already, testers in Stability AI’s Discord server are using Stable Diffusion to generate a range of content disallowed by other image generation services, including images of the war in Ukraine, nude women, an imagined Chinese invasion of Taiwan and controversial depictions of religious figures like the Prophet Muhammad. Doubtless, some of these images are against Stability AI’s own terms, but the company is currently relying on the community to flag violations. Many bear the telltale signs of an algorithmic creation, like disproportionate limbs and an incongruous mix of art styles. But others are passable at first glance. And the tech will continue to improve, presumably.

Nude women generated by Stable Diffusion. Image Credits: Stability AI

Mostaque acknowledged that the tools could be used by bad actors to create “really nasty stuff,” and CompVis says that the public release of the benchmark Stable Diffusion model will “incorporate ethical considerations.” But Mostaque argues that making the tools freely available allows the community to develop countermeasures.

“We hope to be the catalyst to coordinate global open source AI, both independent and academic, to build vital infrastructure, models and tools to maximize our collective potential,” Mostaque said. “This is amazing technology that can transform humanity for the better and should be open infrastructure for all.”

A generation from Stable Diffusion, given the prompt “9/11 2.0 September 11th 2022 terrorist attack.”

Not everyone agrees, as evidenced by the controversy over “GPT-4chan,” an AI model trained on one of 4chan’s infamously toxic discussion boards. AI researcher Yannic Kilcher made GPT-4chan — which learned to output racist, antisemitic and misogynist hate speech — available earlier this year on Hugging Face, a hub for sharing trained AI models. Following discussions on social media and Hugging Face’s comment section, the Hugging Face team first “gated” access to the model before removing it altogether, but not before it was downloaded more than a thousand times.

“War in Ukraine” images generated by Stable Diffusion. Image Credits: Stability AI

Meta’s recent chatbot fiasco illustrates the challenge of keeping even ostensibly safe models from going off the rails. Just days after making its most advanced AI chatbot to date, BlenderBot 3, available on the web, Meta was forced to confront media reports that the bot made frequent antisemitic comments and repeated false claims about former U.S. President Donald Trump winning reelection two years ago.

The publisher of AI Dungeon, Latitude, encountered a similar content problem. Some players of the text-based adventure game, which is powered by OpenAI’s text-generating GPT-3 system, observed that it would sometimes bring up extreme sexual themes, including pedophilia. This was the result of fine-tuning on fiction stories with gratuitous sex. Facing pressure from OpenAI, Latitude implemented a filter and started automatically banning gamers for purposefully prompting content that wasn’t allowed.

BlenderBot 3’s toxicity came from biases in the public websites that were used to train it. It’s a well-known problem in AI: even when fed filtered training data, models tend to amplify biases, such as those in photo sets that portray men as executives and women as assistants. With DALL-E 2, OpenAI has attempted to combat this by implementing techniques, including dataset filtering, that help the model generate more “diverse” images. But some users claim that these techniques have made the model less accurate than before at creating images based on certain prompts.

Stable Diffusion contains little in the way of mitigations besides training dataset filtering. So what’s to prevent someone from generating, say, photorealistic images of protests, pornographic pictures of underage actors, “evidence” of fake moon landings and general misinformation? Nothing, really. But Mostaque says that’s the point.

Given the prompt “protests against the dilma government, brazil [sic],” Stable Diffusion created this image. Image Credits: Stability AI

“A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society … We are taking significant safety measures including formulating cutting-edge tools to help mitigate potential harms across release and our own services. With hundreds of thousands developing on this model, we are confident the net benefit will be immensely positive and as billions use this tech harms will be negated.”

Note: While the images in this article are credited to Stability AI, the company’s terms make it clear that generated images belong to the users who prompted them. In other words, Stability AI doesn’t assert rights over images created by Stable Diffusion.
