Martin De Saulles
Contributing writer

What is your data strategy for an AI future?

Opinion
Jan 22, 2024 | 5 mins
Artificial Intelligence | CIO | Data Management

Access to sufficient, reliable, and timely data will be a key determinant of success for enterprises over the coming years as AI transforms business workflows.


As enterprises become more data-driven, the old computing adage "garbage in, garbage out" (GIGO) has never been truer. The application of AI to many business processes will only heighten the need to ensure the veracity and timeliness of the data used, whether it is generated internally or sourced externally.
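To make that concrete, the sketch below shows the kind of automated timeliness and completeness check that can sit in front of a reporting or AI pipeline. It is a minimal illustration only: the column names, thresholds, and input file are assumptions, not a recommended standard.

```python
import pandas as pd

# Hypothetical thresholds for a data feed; these values are illustrative
# assumptions, not an industry standard.
MAX_AGE_DAYS = 1        # newest record older than this => feed is stale
MAX_NULL_RATIO = 0.02   # more than 2% missing values in a column => fail


def check_feed_quality(df: pd.DataFrame, timestamp_col: str = "updated_at") -> list[str]:
    """Return a list of human-readable problems found in the feed."""
    problems = []

    # Timeliness: how old is the most recent record?
    newest = pd.to_datetime(df[timestamp_col], utc=True).max()
    age = pd.Timestamp.now(tz="UTC") - newest
    if age > pd.Timedelta(days=MAX_AGE_DAYS):
        problems.append(f"feed is stale: newest record is {age} old")

    # Completeness: share of missing values in each column.
    null_ratios = df.isna().mean()
    for col, ratio in null_ratios[null_ratios > MAX_NULL_RATIO].items():
        problems.append(f"column '{col}' is {ratio:.1%} empty")

    # A rough veracity proxy: exact duplicate rows.
    duplicates = int(df.duplicated().sum())
    if duplicates:
        problems.append(f"{duplicates} duplicate rows found")

    return problems


if __name__ == "__main__":
    orders = pd.read_csv("orders.csv")  # hypothetical input file
    for issue in check_feed_quality(orders):
        print("WARNING:", issue)
```

In practice, checks like these run continuously inside a data-quality or observability platform rather than as a one-off script, but the principle is the same: catch stale, incomplete, or duplicated data before it reaches a model or a decision-maker.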

The costs of bad data

Gartner has estimated that organizations lose an average of $12.9 million a year from using poor-quality data, and IBM calculates that bad data costs the US economy more than $3 trillion a year. Most of these costs relate to the work carried out within enterprises checking and correcting data as it moves through and across departments; IBM believes that half of knowledge workers' time is wasted on these activities.

Apart from these internal costs, there's the greater problem of reputational damage among customers, regulators, and suppliers when organizations act improperly on bad or misleading data. Sports Illustrated and its CEO found this out recently when it was revealed that the magazine had published articles attributed to fake authors with AI-generated profile images. The CEO lost his job, and the parent company, Arena Group, lost 20% of its market value. There have also been several high-profile cases of law firms getting into hot water by citing fake, AI-generated cases as precedent in legal disputes.

The AI black box

Although costly, checking and correcting the data used in corporate decision-making and business operations has become an established practice in most enterprises. Understanding what is going on inside some large language models (LLMs), however, is another matter: how they have been trained, on what data, and whether their outputs can be trusted, particularly given the rate at which they hallucinate. In Australia, for instance, an elected regional mayor has threatened to sue OpenAI over a false claim by the company's ChatGPT that he had served prison time for bribery when, in fact, he had been the whistleblower who reported the criminal activity.

Training an LLM on trusted data and adopting approaches such as iterative querying, retrieval-augmented generation (RAG), or structured reasoning can significantly lessen the danger of hallucinations, but none of these guarantees they won't occur.
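To illustrate the retrieval-augmented generation idea, the sketch below retrieves the most relevant documents from a trusted store and builds a prompt that instructs the model to answer only from that context. It is a toy example: the documents are invented, TF-IDF stands in for the embedding search a production system would use, and the final call to an LLM is left to whatever API the organization has adopted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical trusted knowledge base; in practice this would be a curated,
# versioned document store rather than an in-memory list.
DOCUMENTS = [
    "Invoices must be approved by a regional controller before payment.",
    "Customer data older than 24 months is archived to cold storage.",
    "All supplier contracts are reviewed annually by the legal team.",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query. TF-IDF is a stand-in
    for the embedding search a production RAG system would use."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(DOCUMENTS)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    top = scores.argsort()[::-1][:k]
    return [DOCUMENTS[i] for i in top]


def build_prompt(query: str) -> str:
    """Constrain the model to answer only from retrieved, trusted context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # The finished prompt would be sent to whatever LLM the organization
    # uses; printing it keeps this sketch self-contained.
    print(build_prompt("How often are supplier contracts reviewed?"))
```

The key design point is that the model is asked to refuse rather than improvise when the retrieved context is insufficient, which is where many hallucinations start.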

Training on synthetic data

As companies seek competitive advantage by deploying AI systems, the rewards may go to those with enough relevant proprietary data to train their own models. But what about the majority of enterprises that lack such data? Researchers have predicted that the stock of high-quality text data available for training LLMs will run out before 2026 if current trends continue.

One answer to this impending problem will be increased use of synthetic training data. Gartner estimates that by 2030, synthetic data will overtake real data in AI models. However, returning to the GIGO warning, over-reliance on synthetic data risks accelerating the dangers of inaccurate outputs and poor decision-making: such data is only as good as the models that created it. A longer-term danger may arise from "data inbreeding," in which models trained on sub-standard synthetic data produce outputs that are then fed back into the training of later models.

Moving with caution

The AI genie is out of the bottle. While the widespread digital revolution promised by some overly enthusiastic technology vendors and consultants will take more time to arrive, AI will continue to transform businesses in ways we can't yet imagine. However, access to reliable and trusted data at the scale enterprises need is already a bottleneck, and one that CIOs and other business leaders have to find ways to remedy before it's too late.