
Is the modern data stack just old wine in a new bottle?


Image Credits: Mikhail Dmitriev / Getty Images

Ashish Kakran

Contributor

Ashish Kakran, principal at Thomvest Ventures, is a product manager/engineer turned investor who enjoys supporting founders with a balance of technical know-how, customer insights, empathy with challenges and market knowledge.


Remember the cable, phone and internet combo offers that used to land in our mailboxes? These offers were highly optimized for conversion, and the type of offer and the monthly price could vary significantly between two neighboring houses or even between condos in the same building.

I know this because I used to be a data engineer and built extract-transform-load (ETL) data pipelines for this type of offer optimization. Part of my job involved unpacking encrypted data feeds, removing rows or columns that had missing data, and mapping the fields to our internal data models. Our statistics team then used the clean, updated data to model the best offer for each household.
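
The cleaning and mapping steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline I actually built: the feed columns, the internal field names in FIELD_MAP and the sample rows are all hypothetical, and the decryption step is left out.

```python
# Hypothetical mapping from feed column names to internal model field names.
FIELD_MAP = {"cust_id": "household_id", "zip": "postal_code", "plan": "current_plan"}


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with missing values, then rename fields to the internal model."""
    clean = []
    for row in rows:
        # Skip any row where a mapped field is absent, None, or an empty string.
        if any(row.get(src) in (None, "") for src in FIELD_MAP):
            continue
        clean.append({dst: row[src] for src, dst in FIELD_MAP.items()})
    return clean


# Made-up sample feed: two of the three rows have missing data.
raw_feed = [
    {"cust_id": "h-1", "zip": "94105", "plan": "internet"},
    {"cust_id": "h-2", "zip": "", "plan": "cable"},     # missing zip
    {"cust_id": "h-3", "zip": "94107", "plan": None},   # missing plan
]
clean_rows = transform(raw_feed)
print(clean_rows)
```

Only the first household survives the cleaning step here, which is the kind of clean, mapped output a statistics team could then model against.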

That was almost a decade ago. Run that process on steroids, on datasets 100 times larger, and you get to the scale that midsize and large organizations are dealing with today.

For example, a single video conferencing call can generate logs that require hundreds of storage tables. The cloud has fundamentally changed the way business is done because it offers seemingly unlimited storage and scalable compute resources at an affordable price.

To put it simply, this is the difference between old and modern stacks:

Image Credits: Ashish Kakran, Thomvest Ventures

Why do data leaders today care about the modern data stack?

Self-service analytics

Citizen developers want access to critical business dashboards in real time. They want automatically updating dashboards built on top of their operational and customer data.

For example, the product team can use real-time product usage and customer renewal data for decision-making. The cloud makes data accessible to everyone, but teams still need self-service analytics rather than legacy, static, on-demand reports and dashboards.

Serving predictions

Once machine learning models are trained and ready to be used, there needs to be an easy way for different teams within an organization to benefit from them. This is typically achieved via a simple URL that accepts requests and returns predictions. Building these microservices and maintaining them is a core challenge when you are serving thousands of HTTP requests per second.
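
A prediction endpoint of this kind can be sketched with nothing but Python's standard library. Everything specific here is an assumption made for illustration: the predict function is a hard-coded rule standing in for a trained model, and the monthly_usage_hours and renewal_probability field names and port 8000 are invented. A service handling thousands of requests per second would need a production-grade server, not this one.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> dict:
    """Stand-in for a trained model: a hypothetical rule, not a real model."""
    hours = features.get("monthly_usage_hours", 0)
    score = 0.8 if isinstance(hours, (int, float)) and hours > 10 else 0.2
    return {"renewal_probability": score}


def handle_body(body: bytes) -> tuple[int, bytes]:
    """Turn a raw JSON request body into (HTTP status, JSON response body)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # also covers malformed JSON and bad encodings
        return 400, json.dumps({"error": "invalid request"}).encode()
    return 200, json.dumps(predict(features)).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Blocks and serves predictions on http://localhost:<port>/."""
    HTTPServer(("localhost", port), PredictionHandler).serve_forever()
```

Calling serve() starts the endpoint, and any team can then POST a JSON object to the URL and read back a prediction. Keeping the request logic in handle_body, separate from the HTTP plumbing, makes it possible to test it without opening a socket.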

Data transformation

Data scientists want to be able to track older versions of data so that they can run experiments and know which version of the data was used for training. This need is giving rise to popular products that are optimized for in-place transformation of data.

Data quality

Some cutting-edge data organizations now prefer a data-centric approach to a model-centric approach. The belief that more data means better results is being replaced by the belief that the quality of data matters more. Typically, trained models are evaluated using two metrics, precision and recall. Precision tells you the proportion of positive identifications that were actually correct, and recall tells you the proportion of actual positives that were correctly identified. Now imagine ensuring data quality for real-time data streams coming at you in a variety of different formats.
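
To make the two metrics concrete, here is a small worked example in Python. The labels are made up for illustration. Precision is computed as true positives divided by all predicted positives, and recall as true positives divided by all actual positives.

```python
def precision_recall(actual: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Compute precision and recall from paired actual/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # true positives
    fp = sum(1 for a, p in pairs if not a and p)    # false positives
    fn = sum(1 for a, p in pairs if a and not p)    # false negatives
    # Guard against division by zero when there are no positives at all.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 5 actual positives, 4 predicted positives, 3 of them correct.
actual = [True, True, True, True, True, False, False, False]
predicted = [True, True, True, False, False, True, False, False]

precision, recall = precision_recall(actual, predicted)
print(precision, recall)  # 0.75 0.6
```

Here three of the four positive predictions were right (precision 0.75), but only three of the five actual positives were found (recall 0.6).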

How do the legacy and modern data stacks compare?

Generally speaking, the modern data stack is about leveraging cloud resources to more effectively analyze complex streaming data.

Image Credits: Ashish Kakran, Thomvest Ventures

Here are a few key trends that enterprises should note:

  • The ETL process is becoming EL(T), which means the data is first dumped, as it is received, in a location such as a data lake. This way, the storage systems don’t complain about the format of the data as it is stored. Once the data is stored, it can be processed in place for analytics. By doing this, the firehose of continuous data can be more effectively managed, processed and analyzed.
  • Data observability has become critical. Data fails silently, and with rapidly evolving data stacks, it is necessary to be able to monitor data and set alerts to fix issues. You don’t want your trained models that teach Spanish to accidentally train on English words or on missing data. No one can visually inspect and fix millions of rows of data.
  • The emergence of the chief data/AI/data and analytics officer. Data is such a complex problem that CIOs now have CDOs/CAOs/CDAOs reporting to them. While we started the 21st century talking about data as a competitive advantage, we are now in a time when unmanaged data becomes toxic. There are regulations about how data can be used, shared or handled. How do you comply with a customer’s request to delete all their data if you don’t even know where and in what form it is stored?
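
The first two trends above can be sketched together in Python: records are loaded exactly as received, transformed afterward, and a simple check raises an alert when too much data is bad. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for a data lake, and the table names, record fields, sample records and 25% rejection threshold are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a data lake or warehouse
conn.execute("CREATE TABLE raw_events (payload TEXT)")
conn.execute("CREATE TABLE clean_events (user_id TEXT, word TEXT, lang TEXT)")


def load(payloads: list[str]) -> None:
    """E and L: store every record as received, with no format checks."""
    conn.executemany("INSERT INTO raw_events VALUES (?)", [(p,) for p in payloads])


def transform() -> int:
    """T: parse the stored records afterward; return how many were rejected."""
    rejected = 0
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        try:
            rec = json.loads(payload)
            row = (rec["user_id"], rec["word"], rec["lang"])
        except (ValueError, KeyError, TypeError):
            rejected += 1  # malformed JSON or missing fields
            continue
        conn.execute("INSERT INTO clean_events VALUES (?, ?, ?)", row)
    return rejected


# Made-up feed for a Spanish-teaching model: one English word, two bad records.
load([
    '{"user_id": "u1", "word": "hola", "lang": "es"}',
    '{"user_id": "u2", "word": "hello", "lang": "en"}',
    '{"user_id": "u3"}',
    "not json at all",
])
rejected = transform()

# Observability: alert on a high rejection rate or on unexpected languages.
total = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
unexpected_lang = conn.execute(
    "SELECT COUNT(*) FROM clean_events WHERE lang != 'es'"
).fetchone()[0]
alert = (rejected / total > 0.25) or unexpected_lang > 0  # hypothetical threshold
print(rejected, total, unexpected_lang, alert)
```

Because the raw records are kept untouched, a failed or changed transformation can be rerun later without asking the source to resend anything.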

Opportunities

Each step of the data analysis process is ripe for disruption. While visionary founders are building cloud-native tools to win emerging data categories, the incumbents have been slower to react. Whether building data pipelines or ML pipelines, organizations today have a variety of open and closed source technologies to choose from.

Image Credits: Ashish Kakran, Thomvest Ventures

Practitioners are spoiled for choice when building enterprise data pipelines.

Image Credits: Ashish Kakran, Thomvest Ventures

The efficient data stack for data engineers, database developers and data scientists changes every four to five years. Companies moved to big data analytics to analyze large datasets in private data centers, and though it promised many benefits, big data remains technically complex to implement. The modern data stack makes this easy by leveraging the scale, reliability and resilience of the cloud.

The rules are being rewritten on how data will be used for competitive advantage, and it won’t be long before the winners emerge. Incumbents are redesigning their legacy software to run on the cloud, but our bet is on nimble teams run by visionary founders.
