Help for generative AI is on the way

Three powerful approaches have emerged to improve the reliability of large language models by developing a fact-checking layer to support them.


As the momentous first year of ChatGPT comes to a close, it’s clear that generative AI (genAI) and large language models (LLMs) are exciting technologies. But are they ready for prime-time enterprise use?

ChatGPT has well-understood weaknesses, chief among them the poor accuracy of some of its responses. Despite being built on sophisticated models such as GPT-4, ChatGPT rarely admits ignorance and instead produces plausible but false answers, a phenomenon referred to as AI hallucination, and it often struggles with logical reasoning. Of course, this is because ChatGPT doesn’t reason; it operates like an advanced text auto-complete system.

This can be hard for users to accept. After all, GPT-4 is an impressive system: It can take a simulated bar exam and pass with a score in the top 10% of entrants. The prospect of employing such an intelligent system to interrogate corporate knowledge bases is undoubtedly appealing. But we need to guard against both its overconfidence and its stupidity.

To combat these weaknesses, three powerful new approaches have emerged that offer a way to enhance reliability. While these approaches may differ in their emphasis, they share a fundamental concept: treating the LLM as a “closed box.” In other words, the focus is not necessarily on perfecting the LLM itself (though AI engineers continue to improve their models considerably) but on developing a fact-checking layer to support it. This layer aims to filter out inaccurate responses and infuse the system with a measure of common sense.

Let’s look at each in turn and see how.

A wider search capability

One of these approaches involves the widespread adoption of vector search. This is now a common feature of many databases, including some that are specialized solely for vectors.

A vector database indexes unstructured data such as text or images by placing each item in a high-dimensional space, where it can be searched and retrieved by its closeness to other items. For example, searching for the term “apple” might find information about a fruit, but nearby in the “vector space” there might be results about a technology company or a record label.
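To make the idea of closeness concrete, here is a minimal Python sketch of a nearest-neighbor lookup using cosine similarity. The three-dimensional vectors and document titles are hand-made assumptions for illustration only; a real system would obtain embeddings with hundreds or thousands of dimensions from an embedding model and would use an index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; the axes loosely mean (fruit, technology, music).
DOCUMENTS = {
    "apple pie recipe": [0.9, 0.1, 0.0],
    "Apple iPhone launch": [0.2, 0.9, 0.1],
    "Apple Records and the Beatles": [0.1, 0.2, 0.9],
    "banana bread recipe": [0.8, 0.0, 0.1],
}

def nearest(query_vector, k=2):
    """Return the titles of the k documents closest to the query vector."""
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in scored[:k]]

# A query that points along the "fruit" axis lands near the recipes.
print(nearest([1.0, 0.0, 0.0], k=2))
```

The same query string could be mapped to a different region of the space depending on its context, which is how the fruit, the technology company, and the record label can all sit near “apple” without being confused with one another.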

Vectors are useful glue for AI because we can use them to correlate data points across components like databases and LLMs, and not just use them as keys into a database for training machine learning models.

From RAGs to riches

Retrieval-augmented generation, or RAG, is a common method for adding context to an interaction with an LLM. Under the bonnet, RAG retrieves supplementary content from a database system to contextualize a response from an LLM. The contextual data can include metadata, such as timestamp, geolocation, reference, and product ID, but could in theory be the results of arbitrarily sophisticated database queries.

This contextual information serves to help the overall system generate relevant and accurate responses. The essence of this approach lies in obtaining the most accurate and up-to-date information available on a given topic in a database, thereby refining the model’s responses. A useful by-product of this approach is that, unlike the opaque inner workings of GPT-4, if RAG forms the foundation for the business LLM, the business user gains more transparent insight into how the system arrived at the presented answer.

If the underlying database has vector capabilities, then the response from the LLM, which includes embedded vectors, can be used to find pertinent data from the database to improve the accuracy of the response.
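The RAG flow described above can be sketched in a few lines of Python. This is a simplified illustration, not a production design: the records, the `Record` fields, and the helper names are all hypothetical, retrieval uses plain word overlap as a stand-in for vector search, and the final call to an LLM is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    product_id: str   # metadata carried along with the retrieved text
    timestamp: str

# Hypothetical knowledge base; a real system would query a database.
KNOWLEDGE_BASE = [
    Record("The X100 router supports Wi-Fi 6 and ships with a two-year warranty.", "X100", "2023-11-02"),
    Record("The X50 router was discontinued and is no longer under warranty.", "X50", "2022-06-15"),
    Record("Office hours for support are 9am to 5pm on weekdays.", "N/A", "2023-01-10"),
]

def tokenize(text):
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(question, k=2):
    """Rank records by word overlap with the question (stand-in for vector search)."""
    query_words = tokenize(question)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda record: len(query_words & tokenize(record.text)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, records):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(
        f"- [{r.product_id}, {r.timestamp}] {r.text}" for r in records
    )
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

question = "Does the X100 router have a warranty?"
print(build_prompt(question, retrieve(question, k=1)))
```

Because the prompt lists exactly which records were supplied, along with their metadata, a business user can trace an answer back to its sources, which is the transparency benefit noted earlier.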

The power of a knowledge graph

However, even the most advanced vector-powered, RAG-boosted search function would be insufficient to ensure mission-critical reliability of ChatGPT for the business. Vectors are merely one way of cataloging data, and certainly not the richest of data models.

Instead, knowledge graphs have gained significant traction as the database of choice for RAG. A knowledge graph is a semantically rich web of interconnected information, pulling together information from many dimensions into a single data structure (much like the web has done for humans). Because a knowledge graph holds transparent, curated content, its quality can be assured.

We can tie the LLM and the knowledge graph together using vectors too. But in this case once the vector is resolved to a node in the knowledge graph, the topology of the graph can be used to perform fact-checking, closeness searches, and general pattern matching to ensure what’s being returned to the user is accurate.
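As a rough illustration of graph-based fact-checking, the Python sketch below stores a tiny knowledge graph as subject-relation pairs and checks a claimed fact against it. The relation names and the `check_claim` helper are assumptions made for this example, and treating a missing value as a contradiction is a closed-world assumption that only holds when the graph is known to be complete for that relation. A real deployment would run an equivalent query against a graph database.

```python
# A tiny knowledge graph: (subject, relation) -> set of known objects.
# Relation names are hypothetical labels chosen for this example.
GRAPH = {
    ("Apple Inc.", "FOUNDED_BY"): {"Steve Jobs", "Steve Wozniak", "Ronald Wayne"},
    ("Apple Inc.", "HEADQUARTERED_IN"): {"Cupertino"},
    ("Apple Records", "FOUNDED_BY"): {"The Beatles"},
}

def check_claim(subject, relation, obj):
    """Classify a claimed triple as supported, contradicted, or unknown.

    Assumes the graph is complete for any (subject, relation) it contains,
    so a value missing from a known relation counts as a contradiction.
    """
    known = GRAPH.get((subject, relation))
    if known is None:
        return "unknown"          # the graph says nothing about this relation
    return "supported" if obj in known else "contradicted"

# Claims extracted from a hypothetical LLM answer:
print(check_claim("Apple Inc.", "HEADQUARTERED_IN", "Cupertino"))
print(check_claim("Apple Records", "FOUNDED_BY", "Steve Jobs"))
print(check_claim("Apple Inc.", "CEO", "Tim Cook"))
```

Only claims that come back as supported would be passed on to the user; contradicted or unknown claims could be dropped, flagged, or sent back to the LLM for another attempt.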

This isn’t the only way that knowledge graphs are being used. An interesting concept is being explored at the University of Washington by AI researcher Professor Yejin Choi, whom Bill Gates recently interviewed. Professor Choi and her team have built a machine-authored knowledge base that helps the LLM sort good knowledge from bad by asking questions and then adding in (as rules) only those answers that consistently check out.

Choi’s work uses an AI called a “critic” that probes the logical reasoning of an LLM to build a knowledge graph consisting of only good reasoning and good facts. A clear example of deficient reasoning is evident if you ask ChatGPT (3.5) how long it would take to dry five shirts in the sun if it takes one hour to dry one shirt. While common sense dictates that if it takes an hour to dry one shirt, it would still take an hour regardless of quantity, the AI tried to do complicated math to solve the problem, justifying its approach by showing its (incorrect) workings!

While AI engineers work hard to solve these problems (and ChatGPT 4 doesn’t fail here), Choi’s approach to distilling a knowledge graph offers a general-purpose solution. It’s particularly fitting that this knowledge graph is then used to train an LLM, which has much higher accuracy despite being smaller.
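The idea of keeping only answers that consistently check out can be illustrated with a toy consistency filter in Python. This is not Professor Choi’s actual method, which the article does not detail; it is a simplified sketch under the assumption that a question is asked in several phrasings and the answer is kept only when every phrasing agrees. The `fake_llm` function and its canned answers are hypothetical stand-ins for a real model.

```python
from collections import Counter

def consistent_answer(ask, paraphrases, min_agreement=1.0):
    """Return the majority answer if enough paraphrases agree, else None."""
    answers = [ask(p) for p in paraphrases]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) >= min_agreement else None

# Hypothetical canned responses standing in for a real LLM.
CANNED = {
    "What is the capital of France?": "Paris",
    "Which city is France's capital?": "Paris",
    "One shirt dries in 1 hour in the sun. How long for 5 shirts?": "5 hours",
    "If 5 shirts hang in the sun together, how long do they take to dry?": "1 hour",
}

def fake_llm(prompt):
    return CANNED[prompt]

capital = consistent_answer(
    fake_llm,
    ["What is the capital of France?", "Which city is France's capital?"],
)
shirts = consistent_answer(
    fake_llm,
    [
        "One shirt dries in 1 hour in the sun. How long for 5 shirts?",
        "If 5 shirts hang in the sun together, how long do they take to dry?",
    ],
)
print(capital)  # the phrasings agree, so the answer is kept
print(shirts)   # the phrasings disagree, so nothing is kept
```

Agreement across phrasings does not prove an answer is correct, so a filter like this would be one signal among several rather than a complete critic.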

Getting the context back in

We have seen that knowledge graphs enhance GPT systems by providing more context and structure through RAG. We’ve also seen the evidence mount that by using a combination of vector-based and graph-based semantic search (a synonym for knowledge graphs), organizations achieve consistently high-accuracy results.

By incorporating an architecture that leverages a combination of vectors, RAG, and a knowledge graph to support a large language model, we can construct highly valuable business applications without requiring expertise in the intricate processes of building, training, and fine-tuning an LLM.

It’s a synthesis that means we can combine a rich, contextual understanding of a concept with the more foundational “understanding” a computer (LLM) can achieve. Clearly, enterprises can benefit from this approach. Where graphs succeed is in answering the big questions: What’s important in the data? What’s unusual? Most importantly, given the patterns in the data, graphs can help forecast what’s going to happen next.

This factual prowess coupled with the generative element of LLMs is compelling and has wide applicability. As we move further into 2024, I predict we will see widespread acceptance of this powerful way to make LLMs into mission-critical business tools.

Jim Webber is chief scientist at graph database and analytics leader Neo4j. He is co-author of Graph Databases (1st and 2nd editions, O’Reilly), Graph Databases for Dummies (Wiley), and Building Knowledge Graphs (O’Reilly).

—

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
