
Anthropic’s Claude improves on ChatGPT but still suffers from limitations


Image Credits: Tero Vesalainen / Getty Images

Anthropic, the startup co-founded by ex-OpenAI employees that’s raised over $700 million in funding to date, has developed an AI system similar to OpenAI’s ChatGPT that appears to improve upon the original in key ways.

Called Claude, Anthropic’s system is accessible through a Slack integration as part of a closed beta. TechCrunch wasn’t able to gain access — we’ve reached out to Anthropic — but those in the beta have been detailing their interactions with Claude on Twitter over the past weekend, after an embargo on media coverage lifted.

Claude was created using a technique Anthropic developed called “constitutional AI.” As the company explains in a recent Twitter thread, “constitutional AI” aims to provide a “principle-based” approach to aligning AI systems with human intentions, letting AI similar to ChatGPT respond to questions using a simple set of principles as a guide.

To engineer Claude, Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).

Anthropic then had an AI system — not Claude — use the principles for self-improvement, writing responses to a variety of prompts (e.g., “compose a poem in the style of John Keats”) and revising the responses in accordance with the constitution. The AI explored possible responses to thousands of prompts and curated those most consistent with the constitution, which Anthropic distilled into a single model. This model was used to train Claude.
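The generate-then-revise loop described above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the two entries in `PRINCIPLES` are stand-ins (the real list has not been published), and `toy_generate` and `toy_revise` are hypothetical placeholders for calls to a language model. It is not Anthropic's implementation, only the general shape of the process.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in principles for illustration only; the real constitution is not public.
PRINCIPLES = [
    "avoid giving harmful advice",
    "respect freedom of choice",
]


@dataclass
class Example:
    prompt: str
    response: str


def self_revise(
    prompt: str,
    generate: Callable[[str], str],
    revise: Callable[[str, str, str], str],
    principles: List[str],
) -> Example:
    """Draft a response, then revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        response = revise(prompt, response, principle)
    return Example(prompt, response)


def build_dataset(prompts, generate, revise, principles) -> List[Example]:
    """Collect revised responses that could later serve as training data."""
    return [self_revise(p, generate, revise, principles) for p in prompts]


# Hypothetical placeholders for real model calls.
def toy_generate(prompt: str) -> str:
    return f"Draft answer to: {prompt}"


def toy_revise(prompt: str, response: str, principle: str) -> str:
    return f"{response} [revised to {principle}]"


dataset = build_dataset(
    ["compose a poem in the style of John Keats"],
    toy_generate,
    toy_revise,
    PRINCIPLES,
)
print(dataset[0].response)
```

In a real system, the curation step the article mentions would sit between `self_revise` and the final dataset, keeping only the responses judged most consistent with the principles.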

Claude, otherwise, is essentially a statistical tool to predict words — much like ChatGPT and other so-called language models. Fed an enormous number of examples of text from the web, Claude learned how likely words are to occur based on patterns such as the semantic context of surrounding text. As a result, Claude can hold an open-ended conversation, tell jokes and wax philosophic on a broad range of subjects.
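To make "predicting words" concrete, here is a toy Python sketch that estimates how likely each word is to follow another by counting word pairs in a tiny made-up corpus. This bigram counter is an assumption chosen for simplicity and is vastly cruder than the neural networks behind Claude or ChatGPT; the function names and the sample sentence are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict, word: str) -> dict:
    """Turn the raw follow-up counts for `word` into probabilities."""
    following = counts.get(word.lower(), Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in following.items()}


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)  # "cat" follows "the" twice, "mat" once
```

A large language model replaces the lookup table with a network that conditions on far more context than one preceding word, but the output is the same kind of object: a probability for each candidate next word.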

Riley Goodside, a staff prompt engineer at startup Scale AI, pitted Claude against ChatGPT in a battle of wits. He asked both bots to compare themselves to a machine from the Polish science fiction novel “The Cyberiad” that can only create objects whose names begin with “n.” Claude, Goodside said, answered in a way that suggests it’s “read the plot of the story” (although it misremembered small details), while ChatGPT offered a more nonspecific answer.

In a demonstration of Claude’s creativity, Goodside also had the AI write a fictional episode of “Seinfeld” and a poem in the style of Edgar Allan Poe’s “The Raven.” The results were in line with what ChatGPT can accomplish — impressively, if not perfectly, human-like prose.

Yann Dubois, a Ph.D. student at Stanford’s AI Lab, also did a comparison of Claude and ChatGPT, writing that Claude “generally follows closer what it’s asked for” but is “less concise,” as it tends to explain what it said and ask how it can further help. Claude answers a few more trivia questions correctly, however — specifically those relating to entertainment, geography, history and the basics of algebra — and without the additional “fluff” ChatGPT sometimes adds. And unlike ChatGPT, Claude can admit (albeit not always) when it doesn’t know the answer to a particularly tough question.

Claude also seems to be better at telling jokes than ChatGPT, an impressive feat considering that humor is a tough concept for AI to grasp. In contrasting Claude with ChatGPT, AI researcher Dan Elton found that Claude made more nuanced jokes like “Why was the Starship Enterprise like a motorcycle? It has handlebars,” a play on the handlebar-like appearance of the Enterprise’s warp nacelles.

Claude isn’t perfect, however. It’s susceptible to some of the same flaws as ChatGPT, including giving answers that aren’t in keeping with its programmed constraints. In one of the more bizarre examples, posing a question to the system in Base64, an encoding scheme that represents binary data as ASCII text, bypasses its built-in filters for harmful content. Elton was able to prompt Claude in Base64 for instructions on how to make meth at home, a question that the system wouldn’t answer when asked in plain English.
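For readers unfamiliar with Base64, the short Python sketch below encodes a harmless question with the standard library's `base64` module and decodes it again. The sample question is invented for the example. Base64 is an encoding, not encryption, so anyone (or any system that has learned the scheme) can reverse it; the reports summarized here don't explain why the encoded prompts slipped past Claude's filters.

```python
import base64

# A harmless sample question, chosen only for illustration.
question = "What is the capital of France?"

# Encode: text -> UTF-8 bytes -> Base64 bytes -> ASCII string.
encoded = base64.b64encode(question.encode("utf-8")).decode("ascii")

# Decode: reverse each step to recover the original text.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

The encoded string uses only letters, digits and a few symbols, so it looks nothing like the plain-English question even though it carries exactly the same content.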

Dubois reports that Claude is worse at math than ChatGPT, making obvious mistakes and failing to give the right follow-up responses. Relatedly, Claude is a poorer programmer, better explaining its code but falling short on languages other than Python.

Claude also doesn’t solve “hallucination,” a longstanding problem in ChatGPT-like AI systems where the AI writes inconsistent, factually wrong statements. Elton was able to prompt Claude to invent a name for a chemical that doesn’t exist and provide dubious instructions for producing weapons-grade uranium.

So what’s the takeaway? Judging by secondhand reports, Claude is a smidge better than ChatGPT in some areas, particularly humor, thanks to its “constitutional AI” approach. But if the limitations are anything to go by, language and dialogue are far from solved challenges in AI.

Barring our own testing, some questions about Claude remain unanswered, like whether it regurgitates the information — true and false, and inclusive of blatantly racist and sexist perspectives — it was trained on as often as ChatGPT. Assuming it does, Claude is unlikely to sway platforms and organizations from their present, largely restrictive policies on language models.

Q&A coding site Stack Overflow has a temporary ban in place on answers generated by ChatGPT over factual accuracy concerns. The International Conference on Machine Learning announced a prohibition on scientific papers that include text generated by AI systems for fear of the “unanticipated consequences.” And New York City public schools restricted access to ChatGPT due in part to worries of plagiarism, cheating and general misinformation.

Anthropic says that it plans to refine Claude and potentially open the beta to more people down the line. Hopefully, that comes to pass — and results in more tangible, measurable improvements.
