D-ID launches ‘Speaking Portrait,’ a way to turn photos into custom, photo-realistic videos

Image Credits: D-ID

The company whose tech powered the sensational MyHeritage app that turned classic family photos into lifelike moving portraits is back with a new implementation of its technology: Transforming still photographs into ultra-realistic video, capable of saying whatever you want.

D-ID’s Speaking Portraits may look like the notorious “deepfakes” that have made headlines over the past couple of years, but the underlying tech is actually quite different, and there’s no training required for basic functionality.

D-ID, which first appeared at TechCrunch Battlefield in 2018 with a very different focus (scrambling facial recognition tech), debuted its new Speaking Portraits product live at TechCrunch Disrupt 2021. The company showed off a number of use cases, including using its new tech to create a multilingual TV anchor capable of expressing various emotions; creating virtual chatbot personas for customer support interactions; developing training courses for professional development use; and creating interactive conversational video ad kiosks.

Both this new product and D-ID’s partnership with MyHeritage, which saw the latter company’s app briefly take over the top of Apple’s App Store charts, are obviously major departures from the company’s initial focus. Up until even May of last year, D-ID was still raising funding based on its earlier approach, but its partnership with MyHeritage debuted in February, followed by a similar deal with GoodTrust after that and a splashy tie-up with Warner Bros. on the Hugh Jackman film “Reminiscence” that allowed fans to insert themselves into its trailer.

D-ID’s pivot might seem more dramatic than most, but from a technical perspective its new focus on bringing photos to life is not so far off from its de-identification software. D-ID CEO and co-founder Gil Perry told me that the company chose the new direction because it was apparent that there’s a very large addressable market when it comes to this kind of application.

Big-name clients like Warner Bros., as well as an App Store-dominating app from a relatively unknown brand, would seem to support that assessment. Speaking Portraits, however, is aimed at clients both big and small, and allows anyone to generate a full HD video from a source image, plus either recorded speech or typed text. D-ID is launching the product with support for English, Spanish and Japanese, but plans to add other languages in the future, too, as customers request support for those.

D-ID offers two basic categories of Speaking Portrait. The first, “Single Portrait,” can be made using just a single still image; the head is animated while the other parts of the image stay static. It also works only with the existing background in the photo.

For a bit more uncanny reality, there’s a “Trained Character” option that requires submitting a 10-minute training video of the character requested, following guidelines supplied by the company. This has the advantage of being able to work against a custom, swappable background, and features some preset animation options for the character’s body and hands.

Check out an example of a Speaking Portrait newscaster generated using the trained character method below to get a sense of how realistic it can be:

The demo that Perry showed us live at Disrupt today was created from a still photo of himself as a child. The photo was mapped to facial expressions performed by a sort of human puppeteer who also voiced the script for what the Speaking Portrait version of Gil ended up saying during the interaction between his current and younger self. You can see a video of how the speaker’s expressions were mirrored by the animated photo below:

Obviously, the ability to create photo-realistic videos from just a single photo that can convincingly deliver any lines you want is a bit of a hair-raising prospect. We’ve already seen far-ranging debates about the ethics of deepfakes, as well as industry efforts to fingerprint and identify cases where AI has generated realistic but artificial results.

Perry said at Disrupt that D-ID is “keen to make sure it’s used for good, not bad,” and that in order to achieve that, the company will issue a pledge at the end of October, alongside partners, that outlines its commitments to “transparency and consent” when it comes to using tech like Speaking Portraits. The purpose of the pledge is to ensure that “users aren’t confused about what they’re seeing and that people involved give their consent.”

While D-ID wants to make assurances in its terms of use and public position on misuse of this kind of tech, Perry says it “can’t do it alone,” which is why he’s calling on others in the ecosystem to join forces in efforts to avoid abuse.
