For successful AI projects, celebrate your graveyard and be prepared to fail fast

Image of an origami crane and several crumpled pieces of paper to represent success from failure.
Image Credits: Wachiwit / Getty Images

AI teams invest a lot of rigor in defining new project guidelines. But the same is not true for killing existing projects. In the absence of clear guidelines, teams let infeasible projects drag on for months.

They put on a dog and pony show during project review meetings for fear of becoming the messengers of bad news. By streamlining the process to fail fast on infeasible projects, teams can significantly increase their overall success with AI initiatives.

AI projects are different from traditional software projects. They have a lot more unknowns: the availability of the right datasets, whether model training can meet the required accuracy threshold, the fairness and robustness of recommendations in production, and many more.

In order to fail fast, AI initiatives should be managed as a conversion funnel analogous to marketing and sales funnels. Projects start at the top of the five-stage funnel and can drop off at any stage, either to be temporarily put on ice or permanently suspended and added to the AI graveyard. Each stage of the AI funnel defines a clear set of unknowns to be validated with a list of time-bound success criteria.

The AI project funnel has five stages:

Image Credits: Sandeep Uttamchandani
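One way to make the funnel concrete is to track each project's stage and status in code. The following is a minimal sketch, not the author's tooling: the stage names come from the five sections below, while the `Project` class, the `review` function, the status labels and the `recoverable` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum


class Stage(IntEnum):
    # Stage names follow the five funnel stages described in the article.
    PROBLEM_DEFINITION = 1
    DATA_AVAILABILITY = 2
    MODEL_TRAINING = 3
    RESULTS_FAIRNESS = 4
    OPERATIONAL_FITNESS = 5


@dataclass
class Project:
    name: str
    stage: Stage = Stage.PROBLEM_DEFINITION
    # Hypothetical status labels: "active", "on_ice", "graveyard", "production".
    status: str = "active"
    lessons: list = field(default_factory=list)


def review(project, criteria_met, deadline, today, recoverable=False, lesson=""):
    """Advance, pause, or kill a project at a time-bound stage review.

    criteria_met maps each exit criterion of the current stage to True/False.
    """
    if project.status != "active":
        return project
    if all(criteria_met.values()):
        if project.stage is Stage.OPERATIONAL_FITNESS:
            project.status = "production"
        else:
            project.stage = Stage(project.stage + 1)
    elif today > deadline:
        # Past the deadline with unmet criteria: fail fast instead of dragging on.
        project.status = "on_ice" if recoverable else "graveyard"
        if lesson:
            project.lessons.append(lesson)
    return project
```

A project that misses its criteria before the deadline simply stays in its stage; only a missed deadline forces the pause-or-kill decision, which is the "time-bound" part of the success criteria.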

1. Problem definition: “If we build it, will they come?”

This is the top of the funnel. AI projects require significant investments not just during initial development but also during ongoing monitoring and refinement. This makes it important to verify that the problem being solved is truly worth solving, weighing the potential business value against the effort to build. Even if the problem is worth solving, AI may not be required. There might be simpler human-encoded heuristics that solve the problem.

Developing the AI solution is only half the battle. The other half is how the solution will actually be used and integrated. For instance, in developing an AI solution for predicting customer churn, there needs to be a clear understanding of how attrition predictions will be incorporated into the customer support team's workflow. Even a powerful AI project will fail to deliver business value without this level of integration clarity.

To successfully exit this stage, the following statements need to be true:

  • The AI project will produce tangible business value if delivered successfully.
  • There are no cheaper alternatives that can address the problem with the required accuracy threshold.
  • There is a clear path to incorporate the AI recommendations within the existing flow to make an impact.

In my experience, the early stages of a project have a higher ratio of aspiration to ground reality. Killing an ill-formed project can keep teams from building “solutions in search of problems.”

2. Data availability: “We have the data to build it.”

At this stage of the funnel, we have verified the problem is worth solving. We now need to confirm that the data is available to build the perception, learning and reasoning capabilities required in the AI project. Data needs vary based on the type of AI project: the requirements for a project building classification intelligence will be different from those of one providing recommendations or ranking.

Data availability broadly translates to having the right quality, quantity and features. The right quality means that the data samples accurately reflect the phenomenon we are trying to model and satisfy properties such as being independent and identically distributed. Common quality checks involve uncovering data collection errors, inconsistent semantics and errors in labeled samples.

The right quantity refers to the amount of data that needs to be available. A common misconception is that a significant amount of data is always required for training machine learning models. This is not always true. Using pre-trained models and transfer learning, it is possible to get started with very little data. Also, more data does not always mean useful data. For instance, historic data spanning 10 years may not be a true reflection of current customer behavior. Finally, the right features need to be available to build the model. Identifying them is typically iterative and involves ML model design.

To successfully exit this stage, the following statements need to be true:

  • The datasets for the required features are available.
  • The corresponding datasets meet the quality requirements.
  • There are enough historic data samples available in those datasets.
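The three exit criteria above can be turned into a simple automated check. This is a rough sketch under stated assumptions: rows are plain dictionaries, "quality" is approximated only by the share of missing values (real quality checks would also cover collection errors, inconsistent semantics and label errors), and the function name `data_readiness` and the thresholds `min_rows` and `max_missing_rate` are hypothetical.

```python
def data_readiness(rows, required_features, min_rows=1000, max_missing_rate=0.05):
    """Evaluate the stage-2 exit criteria for a list of dict rows.

    Returns a dict mapping each criterion to True or False.
    Quality is approximated here by the missing-value rate only.
    """
    # Every feature name that appears in at least one row.
    available = set().union(*(row.keys() for row in rows))
    missing_features = sorted(set(required_features) - available)

    def missing_rate(feature):
        if not rows:
            return 1.0
        return sum(row.get(feature) is None for row in rows) / len(rows)

    worst_rate = max((missing_rate(f) for f in required_features), default=0.0)
    return {
        "features_available": not missing_features,
        "quality_ok": worst_rate <= max_missing_rate,
        "enough_samples": len(rows) >= min_rows,
    }
```

A project would exit the stage only when all three values are true; a single false value is a concrete, reportable reason to put it on ice.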

In my experience, projects are often put on ice at this stage. The required features are missing, and it may take the application teams several months to gather the datasets.

3. Model training: “The project meets the accuracy thresholds.”

At this stage, we have confirmed the data is available and have iterated on ML model features. Now, it’s time to verify whether a model can actually be built to satisfy the required accuracy threshold.

Training is an iterative process in which different combinations of ML algorithms, model configurations, datasets and input features are tried with the goal of meeting the accuracy threshold. Training is resource-intensive, and with large datasets, infrastructure capacity can become the limiting factor. This stage verifies that it is feasible to build the model using the existing infrastructure resources or within a feasible cloud budget.

During the training phase, there is the potential for “false alarms,” when the team achieves accuracy numbers that are too good to be true. Before getting excited, it is important to check whether the training and validation datasets share duplicate samples. Also, initial tests may look promising but fail to generalize over the entire dataset. Randomizing the dataset before training helps to avoid the roller coaster of accuracy variations.
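Both safeguards are cheap to automate. The sketch below is one possible approach, assuming each sample is a sequence of values; the helper names `row_fingerprint`, `leaked_rows` and `shuffled_split` are hypothetical. It finds exact duplicates only, so near-duplicates (for example, the same record with different whitespace) would need extra normalization.

```python
import hashlib
import random


def row_fingerprint(row):
    """Hash a row so that rows can be compared cheaply.

    Values are compared by their string form, so 1 and "1" count as equal.
    """
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def leaked_rows(train_rows, validation_rows):
    """Return validation rows that also appear verbatim in the training set."""
    train_hashes = {row_fingerprint(row) for row in train_rows}
    return [row for row in validation_rows if row_fingerprint(row) in train_hashes]


def shuffled_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle before splitting so that neither part inherits the file order."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - int(len(rows) * validation_fraction)
    return rows[:cut], rows[cut:]
```

If `leaked_rows` returns anything, the reported validation accuracy is inflated by samples the model has already seen, and the numbers should be recomputed after deduplication.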

To successfully exit this stage, the AI project must meet the required accuracy threshold after training.

4. Results fairness: “Generated results are not garbage in, garbage out.”

We have confirmed the project can meet accuracy thresholds. Now, it’s time to verify that the results generated are actually fair with respect to bias, explainability, and compliance with privacy and data rights regulations.

Ensuring the fairness of AI recommendations is a topic of significant research. Most datasets are inherently biased and may not capture all the available attributes. Understanding the original purpose and assumptions of the dataset is important. Another common form of bias is underrepresentation: for instance, a loan underwriting application that was not trained on a certain category of users or range of incomes. It is important to evaluate model performance not just for overall accuracy but also across various data slices.
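Slice-level evaluation can be sketched in a few lines. The example assumes each evaluation record is a dictionary with `label` and `prediction` fields plus the attribute to slice on; the function names and the `max_gap` threshold are hypothetical, and a gap in accuracy is only one of several possible fairness signals.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Compute accuracy separately for each value of slice_key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        slice_value = record[slice_key]
        total[slice_value] += 1
        correct[slice_value] += int(record["label"] == record["prediction"])
    return {value: correct[value] / total[value] for value in total}


def underperforming_slices(records, slice_key, max_gap=0.1):
    """Return slices whose accuracy trails overall accuracy by more than max_gap."""
    overall = sum(r["label"] == r["prediction"] for r in records) / len(records)
    per_slice = accuracy_by_slice(records, slice_key)
    return {value: acc for value, acc in per_slice.items() if overall - acc > max_gap}
```

A model can clear the overall accuracy threshold and still show up here with a poorly served slice, which is exactly the underrepresentation case described above.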

It is not sufficient for the AI solution to be accurate; it also needs to be explainable, i.e., it must be clear how the algorithm arrived at its conclusions. Several regulated industries using automated decision-making tools are required to provide meaningful information about the generated results to their customers. Explainability can be supported in different forms: result visualization, feature correlations, what-if analysis, model cause-effect interpretability, etc.

To successfully exit this stage, the following statements need to be true:

  • Results have the appropriate checks and bounds for bias and are explainable.
  • The data used by the AI project meets user privacy and compliance regulations such as GDPR and CCPA.

5. Operational fitness: “Is it ready for production?”

The last stage is to confirm operational fitness. Not all projects require the same operational rigor. I divide projects into a 2×2 matrix based on whether the training and inference are online or offline. Offline training and inference are the easiest, while online training requires robust data pipelines and monitoring.

There are three core dimensions of operational fitness: model complexity, data pipeline robustness and retraining governance. Complex models are difficult to maintain and debug in production. The key is striking the right balance between simplicity and accuracy: A simple model may be less accurate, while a complex model may be more accurate but may not generalize to new data samples due to overfitting. Similarly, data pipelines are complex to manage given changing data schemas, quality issues and nonstandard business metrics. Finally, retraining needs to take into account changes in accuracy caused by shifts in data distribution as well as in the semantics of features, aka concept drift.

To successfully exit this stage, the following statements need to be true:

  • Models have been optimized with the right balance between complexity and accuracy.
  • Data pipelines are robust with the required level of monitoring.
  • The right level of data and concept drift monitoring is implemented for model retraining.
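As one illustration of data drift monitoring, the sketch below computes the population stability index (PSI) for a single numeric feature, comparing a reference sample (for example, the training data) with current production data. PSI is a technique chosen here for illustration and is not named in the article; the function name, the bin count and the smoothing floor are assumptions. It detects shifts in an input's distribution, not concept drift, which usually requires tracking accuracy against fresh labels.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the reference range."""
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Values outside the reference range fall into the first or last bin.
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A PSI of zero means the binned distributions match. A commonly quoted rule of thumb treats values above roughly 0.25 as a shift large enough to investigate or to trigger retraining, but the right threshold depends on the feature and should be treated as a tunable assumption.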

To succeed in AI initiatives, teams need to fail fast. The five-stage conversion funnel provides a vocabulary for AI teams to communicate the status of projects to business teams, replacing their black-box perception of these projects with a list of known unknowns. The funnel also helps identify common drop-off stages across projects, which are potential areas of improvement. In a fail-fast culture, the AI graveyard is celebrated for the lessons learned that can be applied to future projects.
