
IT leaders look beyond LLMs for gen AI needs

Feature
May 21, 2024 | 9 mins
Generative AI | IT Strategy

Not all enterprise use cases are best suited for large language models. New multimodal generative AI models and smaller models show promise for niche needs.

Credit: Gorodenkoff / Shutterstock

With the generative AI gold rush in full swing, some IT leaders are finding generative AI’s first-wave darlings — large language models (LLMs) — may not be up to snuff for their more promising use cases.

LLMs, with their advanced ability to comprehend and generate text, have become a near stand-in for generative AI in the collective mindshare. Along with code-generating copilots and text-to-image generators, which leverage a combination of LLMs and diffusion processing, LLMs are at the core of most generative AI experimentation in business today.

But not all problems are best solved using LLMs, some IT leaders say. That realization is opening a next wave of multimodal models that go beyond language to deliver more purposeful results, for example, by handling dynamic tabular data stored in spreadsheets and vector databases, as well as video and audio data.

Multimodal foundation models combine multiple modes, such as text, audio, image, and video, and are capable of generating captions for images or answering questions about images, according to IDC’s Market Glance: Generative Foundation AI Models. Examples include Google Gato, OpenAI GPT-4o, Microsoft LLaVA, Nvidia NeVA, Vicuna, BLIP2, and Flamingo, IDC notes.   

Northwestern Medicine’s Advanced Technologies Group collaborated with Dell’s AI Innovation team to build a proprietary multimodal LLM that can interpret chest X-ray images and generate a summary of key findings. With this model, patients get results almost 80% faster than before. Next, Northwestern and Dell plan to develop an enhanced multimodal LLM for CAT scans and MRIs, as well as a predictive model for the entire electronic medical record.

“The model is very interesting because not a lot of people are using multimodal at this point,” says Dr. Mozziyar Etemadi, an anesthesiologist and medical director of advanced technologies at Northwestern. Etemadi notes that the current model saves radiologists 40% of the time they would otherwise spend writing text notes, with additional time savings from the model’s ability to analyze the imagery. “Models usually are just LLMs, and some text, or Excel, but now we can accommodate images and X-rays. It’s fabulous.”

Putting new models to work

Labor-scheduling SaaS provider MakeShift is another organization looking beyond LLMs to help perform complex predictive scheduling for its healthcare, retail, and manufacturing clients.

“We were using LLMs for chat support for administrators and employees, but when you get into vector data, and large graphical structures with a couple of hundred million rows of inter-related data and you want to optimize towards a predictive model for the future, you can’t get anywhere with LLMs,” says MakeShift CTO Danny McGuinness.

Instead, MakeShift is embracing a new patent-pending model type dubbed the large graphical model (LGM), from MIT startup Ikigai Labs.

“We’re leveraging the large graphical models with complex structured data, establishing those interrelationships, causation, and correlation,” McGuinness says.

MakeShift joins companies such as Medico, HSBC, Spirit Halloween, Taager.com, Future Metals, and WIO in deploying Ikigai Labs’ no-code models for tabular and time-series data. Ikigai Labs — co-founded by Devavrat Shah, director of MIT’s AI and Data Science department, and Vinayak Ramesh — offers AI for tabular data organized in rows and columns. The company has doubled its head count in the past six months and scored a $25 million investment late last year.

Other types of multimodal models with support for video are also emerging for software services that rely heavily on computer vision and video, giving CIOs a raft of new tools to call on to leverage AI models suited to their specific needs.

For MakeShift and its clients, scheduling is a business process made complex by 24-by-7 operations, as well as nuanced requirements brought about by union regulations and collective bargaining agreements. MakeShift engineers started working with Ikigai Labs’ APIs and models last year and are now in full production. “Predictive scheduling to address constantly changing data sets and procedures is made far easier using LGM-based AI,” McGuinness says. And the benefits of MakeShift’s use of AI are beginning to multiply.

“It’s starting to evolve because the AI is learning, and we’ve started to see we can incorporate other types of data into these models,” McGuinness says, noting that some customers are pulling in additional data types to improve scheduling functionality. “One of our retail customers is starting to talk about pulling in weather data. We can start to incorporate public data, such as weather forecasting, proximity to mass transit, and density of people in a store.”

Another benefit of MakeShift’s use of Ikigai’s models is “surfacing scenarios that you would not have thought of, in terms of correlation and causation, and starting to bring up other questions to ask the data,” McGuinness says. “One of our first healthcare customers is looking at other use cases besides historic scheduling, [such as] certain processes and events that have financial transactions involved.”

Of course, LLMs can also handle tabular and other forms of data through markup language, notes Naveen Rao, vice president of AI at Databricks, which acquired his company, Mosaic, last year.

But the rise of alternative models, such as Ikigai’s, and the gray area in terms of what can be readily accomplished with more broadly applicable LLMs underscores the wild west nature of the generative AI market that CIOs currently face.

Going small

Gartner AI analyst Arun Chandrasekaran says the evolution of LLMs into more powerful, multimodal models was to be expected, but he sees such models representing a smaller percentage of business use due to their enormous cost.

“In 2023, it was really dominated by models with text and code,” Chandrasekaran says. “Then we started seeing models with computer vision and seeing the inklings of lots of other modalities such as speech models. But fundamentally, building these models is still enormously expensive in terms of compute and data resources.”

Instead, Chandrasekaran sees many enterprises evolving beyond the LLM by going small.

“These enormously powerful models definitely have a place in several use cases in the enterprise,” he notes. “But we’ll see pricing periodically prioritizing the size of models, because smaller ones are less expensive and good enough for the tasks that enterprises aim to deploy them [for].”

Databricks’ Naveen Rao agrees, noting that building a large model can cost up to $200 million. The vast majority of that cost, according to Rao, is not in the compute power needed but in data labeling and data curation, which determine the performance of models.

Rao, who founded Mosaic to build models that would be more affordable and accessible to any enterprise, believes specialization is the way forward for most.

“It’s just really about specialization versus generalization,” Rao says. “Larger models tend to be trained on lots of tokens or lots of general text and capabilities. Smaller models are a subset and tend to focus on one thing.”

Here, open source can help give CIOs a leg up, Rao says.

“You can either start from scratch and build your own model with your own data or take an existing open-source model, fine-tune and customize it on your data for your own application,” he says.
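The fine-tuning path Rao describes can be sketched in miniature. The snippet below is a hypothetical toy, not a real LLM workflow: a list of "pretrained" weights stands in for an open-source checkpoint, and plain gradient descent adapts those weights to a small domain-specific dataset, illustrating why starting from an existing model beats starting from scratch.

```python
import random

random.seed(0)

# Hypothetical stand-in for weights taken from an open-source checkpoint.
pretrained_w = [1.0, -0.5, 0.25]

# A small, domain-specific dataset: 64 rows of 3 features each,
# generated from the domain's "true" relationship (also hypothetical).
true_w = [1.2, -0.4, 0.1]
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(64)]
y = [sum(w * x for w, x in zip(true_w, row)) for row in X]

def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))

def mse(w):
    """Mean-squared error of weights w on the domain dataset."""
    return sum((predict(w, row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def fine_tune(w_init, lr=0.05, steps=300):
    """Continue training from existing weights via gradient descent."""
    w = list(w_init)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for row, t in zip(X, y):
            err = predict(w, row) - t
            for j, xj in enumerate(row):
                grads[j] += 2 * err * xj / len(y)
        w = [wj - lr * g for wj, g in zip(w, grads)]
    return w

tuned_w = fine_tune(pretrained_w)
print("MSE before fine-tuning:", round(mse(pretrained_w), 6))
print("MSE after fine-tuning: ", round(mse(tuned_w), 6))
```

The general weights are a reasonable starting point but miss the domain's specifics; a few hundred cheap gradient steps on local data close the gap, which is the economic argument for customizing an open-source model rather than building one from scratch.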

Baldor Specialty Foods is one organization that aims to deploy smaller models that its chief information and digital officer believes can be trained for custom solutions without bias or errors.

“I’m going to use smaller models because sometimes [LLMs] hallucinate,” says Satyan Parameswaran, who spent decades in top IT posts at UPS. “You don’t want to be in the business of designing models. You can get a small model from Hugging Face and then customize it for your specific tasks.”

A new equation for generative AI

Several enterprise AI vendors today offer smaller models in AI marketplaces, including C3.ai, Anaplan, Dataiku, and Hugging Face.

As for Ikigai Labs, its self-described LGMs provide a probabilistic representation of tabular, time-stamped data, such as a spreadsheet, says CEO Shah. As the models are trained, they learn relationships between random variables, what data might be missing, and which rows look similar between two spreadsheets, facilitating new insights.

“That means you can now actually start stitching data together,” says Shah, adding that a user can generate new rows in a spreadsheet and “if it’s temporal when you’re doing forecasting, if variables start changing in between, you are detecting change points. You’re capturing anomalies.”

From that, a user can create and generate data from multiple spreadsheets in multiple dimensions. “You do a simulation, or synthetic generation, on your data — and your data only — with large graphical models to get to good and meaningful learning from the data,” Shah says.
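Ikigai has not published its LGM internals, so the following is only a loose, hypothetical illustration of one idea Shah describes: learning a relationship between spreadsheet columns from the complete rows, then using it to fill in a missing cell. A simple least-squares fit stands in for the probabilistic model, and all column names and values are invented.

```python
# A small "spreadsheet": hours scheduled vs. labor cost, with one
# cost cell missing. Hypothetical data for illustration only.
rows = [
    {"hours": 120, "cost": 2400},
    {"hours": 150, "cost": 3000},
    {"hours": 90,  "cost": 1800},
    {"hours": 200, "cost": 4000},
    {"hours": 170, "cost": None},   # missing value to impute
]

complete = [r for r in rows if r["cost"] is not None]

# Least-squares fit of cost = a * hours + b over the complete rows.
n = len(complete)
mean_h = sum(r["hours"] for r in complete) / n
mean_c = sum(r["cost"] for r in complete) / n
cov = sum((r["hours"] - mean_h) * (r["cost"] - mean_c) for r in complete)
var = sum((r["hours"] - mean_h) ** 2 for r in complete)
a = cov / var
b = mean_c - a * mean_h

# Fill the missing cell from the learned column relationship.
for r in rows:
    if r["cost"] is None:
        r["cost"] = a * r["hours"] + b

print(rows[-1])   # {'hours': 170, 'cost': 3400.0}
```

The same fitted relationship could generate plausible new rows, which gestures at the "synthetic generation" and forecasting uses Shah mentions, though a real LGM models many interrelated columns probabilistically rather than one linear pair.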

Naturally, cost will be a major factor in determining the extent to which these models are customized. Even text-only LLMs currently require tremendous compute power. As large chip manufacturers and cloud providers race to develop semiconductors that can supply that compute more amply, enterprises will continue to experiment with, and put into production, large and small models of all types that yield new insights to make their businesses more efficient and innovative.

For now, many enterprises are getting their feet wet with LLMs through experimentation, moving into production as efficiencies are confirmed. Use of large vision models (LVMs) and LGMs remains in its infancy, but early adopters, such as MakeShift’s McGuinness, are seeing payoffs.

“We’re looking to [help our customers] schedule people optimally with the right skill at the right time,” he says. “When you’re forecasting for that, you have to incorporate union agreements where people have seniority, or are moving between locations, or have a different union agreement. Those rules all have to be applied, and [you have to take into account] burnout, overtime costs, and all of those things go into the model.”

Without the aid of AI, the complexity and lift involved in this task is considerable, McGuinness says. But thanks to new multimodal models and smaller models focused on specific tasks, it’s getting easier.