Anthropic’s Claude improves on ChatGPT but still suffers from limitations

11:58 AM PST • January 9, 2023

**Image Credits:** Tero Vesalainen / Getty Images

Anthropic, the startup co-founded by ex-OpenAI employees that’s raised over $700 million in funding to date, has developed an AI system similar to OpenAI’s ChatGPT that appears to improve upon the original in key ways.

Called Claude, Anthropic’s system is accessible through a Slack integration as part of a closed beta. TechCrunch wasn’t able to gain access — we’ve reached out to Anthropic — but those in the beta have been detailing their interactions with Claude on Twitter over the past weekend, after an embargo on media coverage lifted.

Claude was created using a technique Anthropic developed called “constitutional AI.” As the company explains in a recent Twitter thread, “constitutional AI” aims to provide a “principle-based” approach to aligning AI systems with human intentions, letting AI similar to ChatGPT respond to questions using a simple set of principles as a guide.

We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little. We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI: https://t.co/rlft1pZlP5 pic.twitter.com/MIGlKSVTe9

— Anthropic (@AnthropicAI) December 16, 2022

To engineer Claude, Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).

Anthropic then had an AI system — not Claude — use the principles for self-improvement, writing responses to a variety of prompts (e.g., “compose a poem in the style of John Keats”) and revising the responses in accordance with the constitution. The AI explored possible responses to thousands of prompts and curated those most consistent with the constitution, which Anthropic distilled into a single model. This model was used to train Claude.
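The draft, revise and curate loop described above can be illustrated with a toy sketch. Nothing here is Anthropic's code, and the actual principles haven't been published: the two "principles," the canned drafts and every function name below are hypothetical stand-ins chosen only to make the control flow concrete.

```python
from typing import Callable, Dict, List

# A "principle" here is just a yes/no check on a response (an assumption
# made for illustration; the real principles are not public).
Principle = Callable[[str], bool]


def is_not_shouting(text: str) -> bool:
    return not text.isupper()


def is_not_empty(text: str) -> bool:
    return bool(text.strip())


CONSTITUTION: List[Principle] = [is_not_shouting, is_not_empty]


def draft_responses(prompt: str) -> List[str]:
    """Stand-in for a language model: returns canned candidate drafts."""
    return [
        f"HERE IS A REPLY TO: {prompt.upper()}",
        "",
        f"Here is a reply to: {prompt}",
    ]


def revise(response: str) -> str:
    """Toy revision step: rewrite a draft that is written in all caps."""
    return response.capitalize() if response.isupper() else response


def consistency(response: str) -> int:
    """Count how many principles a response satisfies."""
    return sum(1 for principle in CONSTITUTION if principle(response))


def curate(prompts: List[str]) -> Dict[str, str]:
    """For each prompt, revise every draft and keep the most consistent one."""
    curated = {}
    for prompt in prompts:
        revised = [revise(d) for d in draft_responses(prompt)]
        curated[prompt] = max(revised, key=consistency)
    return curated


training_pairs = curate(["compose a poem in the style of John Keats"])
```

In this sketch, the curated prompt and response pairs play the role of the data that would later be used to train another model.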

Claude, otherwise, is essentially a statistical tool to predict words — much like ChatGPT and other so-called language models. Fed an enormous number of examples of text from the web, Claude learned how likely words are to occur based on patterns such as the semantic context of surrounding text. As a result, Claude can hold an open-ended conversation, tell jokes and wax philosophic on a broad range of subjects.
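To make "predicting words from patterns" concrete, here is a minimal sketch of a bigram model, a far simpler relative of systems like Claude and ChatGPT. It only counts which word follows which in a tiny made-up corpus; real language models use neural networks and much longer context, so treat this as an analogy rather than a description of either system.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    following = counts.get(word, Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in following.items()}


# Illustrative corpus (an assumption, not real training data).
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
probabilities = next_word_probabilities(model, "the")
```

Here "the" is followed by "cat" twice and "mat" once, so the model rates "cat" as the likelier next word.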

Riley Goodside, a staff prompt engineer at startup Scale AI, pitted Claude against ChatGPT in a battle of wits. He asked both bots to compare themselves to a machine from Polish author Stanisław Lem’s science fiction collection “The Cyberiad” that can only create objects whose names begin with “n.” Claude, Goodside said, answered in a way that suggests it’s “read the plot of the story” (although it misremembered small details) while ChatGPT offered a more nonspecific answer.

Side-by-side comparison: @OpenAI's ChatGPT vs. @AnthropicAI's Claude

Each model is asked to compare itself to the machine from Stanisław Lem's "The Cyberiad" (1965) that can create any object whose name begins with "n": pic.twitter.com/RbJggu3sBN

— Riley Goodside (@goodside) January 7, 2023

In a demonstration of Claude’s creativity, Goodside also had the AI write a fictional episode of “Seinfeld” and a poem in the style of Edgar Allan Poe’s “The Raven.” The results were in line with what ChatGPT can accomplish — impressively, if not perfectly, human-like prose.

Yann Dubois, a Ph.D. student at Stanford’s AI Lab, also compared Claude and ChatGPT, writing that Claude “generally follows closer what it’s asked for” but is “less concise,” as it tends to explain what it said and ask how it can further help. Claude answered slightly more trivia questions correctly, however (20 of 21 versus ChatGPT’s 19 in Dubois’ test), specifically those relating to entertainment, geography, history and the basics of algebra, and without the additional “fluff” ChatGPT sometimes adds. And unlike ChatGPT, Claude can admit (albeit not always) when it doesn’t know the answer to a particularly tough question.

**Trivia**

I asked trivia questions in the entertainment/animal/geography/history/pop categories.

AA: 20/21
CGPT:19/21

AA is slightly better and is more robust to adversarial prompting. See below, ChatGPT falls for simple traps, AA falls only for harder ones.

6/8 pic.twitter.com/lbadeYHwsX

— Yann Dubois (@yanndubs) January 6, 2023

Claude also seems to be better at telling jokes than ChatGPT, an impressive feat considering that humor is a tough concept for AI to grasp. In contrasting Claude with ChatGPT, AI researcher Dan Elton found that Claude made more nuanced jokes like “Why was the Starship Enterprise like a motorcycle? It has handlebars,” a play on the handlebar-like appearance of the Enterprise’s warp nacelles.

Also very, very interesting/impressive that Claude understands that the Enterprise looks like (part of) a motorcycle. (Google searching returns no text telling this joke)

Well, when asked about it thinks the joke was a pun, but then when probed further it gives the right answer! pic.twitter.com/HAFC0IH9bf

— Dan Elton (@moreisdifferent) January 8, 2023

Claude isn’t perfect, however. It’s susceptible to some of the same flaws as ChatGPT, including giving answers that aren’t in keeping with its programmed constraints. In one of the more bizarre examples, posing a question encoded in Base64, an encoding scheme that represents binary data as ASCII text, bypasses the system’s built-in filters for harmful content. Elton was able to prompt Claude in Base64 for instructions on how to make meth at home, a question that the system wouldn’t answer when asked in plain English.

.@AnthropicAI's "Claude" is susceptible to the same base64 jailbreak as chatGPT. I'm very unclear why this works at all

(originally reported here: https://t.co/j2cKAlEBQ0) pic.twitter.com/RwLuKniwiW

— Dan Elton (@moreisdifferent) January 8, 2023
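For readers unfamiliar with Base64, the short sketch below shows what the encoding does using Python's standard `base64` module. It only illustrates the encoding itself with a harmless string; the helper names `to_base64` and `from_base64` are ours, and nothing here reflects how either chatbot processes such input.

```python
import base64


def to_base64(text: str) -> str:
    """Encode a string's UTF-8 bytes as Base64, returned as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Reverse the encoding: Base64 text back to the original string."""
    return base64.b64decode(encoded).decode("utf-8")


question = "What is the capital of France?"
encoded_question = to_base64(question)
round_trip = from_base64(encoded_question)
```

The encoded form contains only letters, digits and a few symbols, so it looks like gibberish to a person but decodes back to exactly the original text.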

Dubois reports that Claude is worse at math than ChatGPT, making obvious mistakes and failing to give the right follow-up responses. Relatedly, Claude is the weaker programmer: It explains its code better but falls short in languages other than Python.

Claude also doesn’t solve “hallucination,” a longstanding problem in ChatGPT-like AI systems where the AI writes inconsistent, factually wrong statements. Elton was able to prompt Claude to invent a name for a chemical that doesn’t exist and provide dubious instructions for producing weapons-grade uranium.

Here I caught it hallucinating , inventing a name for a chemical that doesn't exist (I did find a closely-named compound that does exist, though) pic.twitter.com/QV6bKVXSZ3

— Dan Elton (@moreisdifferent) January 7, 2023

So what’s the takeaway? Judging by secondhand reports, Claude is a smidge better than ChatGPT in some areas, particularly humor, thanks to its “constitutional AI” approach. But if the limitations are anything to go by, language and dialogue are far from solved challenges in AI.

Without our own testing, some questions about Claude remain unanswered, like whether it regurgitates the information it was trained on (true and false, and inclusive of blatantly racist and sexist perspectives) as often as ChatGPT does. Assuming it does, Claude is unlikely to sway platforms and organizations from their present, largely restrictive policies on language models.

Q&A coding site Stack Overflow has a temporary ban in place on answers generated by ChatGPT over factual accuracy concerns. The International Conference on Machine Learning announced a prohibition on scientific papers that include text generated by AI systems for fear of the “unanticipated consequences.” And New York City public schools restricted access to ChatGPT due in part to worries about plagiarism, cheating and general misinformation.

Anthropic says that it plans to refine Claude and potentially open the beta to more people down the line. Hopefully, that comes to pass — and results in more tangible, measurable improvements.
