Is the modern data stack just old wine in a new bottle?

Image Credits: Mikhail Dmitriev / Getty Images

Ashish Kakran

Contributor
Ashish Kakran, principal at Thomvest Ventures, is a product manager/engineer turned investor who enjoys supporting founders with a balance of technical know-how, customer insights, empathy with challenges and market knowledge.

Remember the cable, phone and internet combo offers that used to land in our mailboxes? These offers were highly optimized for conversion, and the type of offer and the monthly price could vary significantly between two neighboring houses or even between condos in the same building.

I know this because I used to be a data engineer and built extract-transform-load (ETL) data pipelines for this type of offer optimization. Part of my job involved unpacking encrypted data feeds, removing rows or columns that had missing data, and mapping the fields to our internal data models. Our statistics team then used the clean, updated data to model the best offer for each household.

That was almost a decade ago. If you take that process and run it on steroids for datasets 100x larger, you’ll get to the scale that midsized and large organizations are dealing with today.

For example, a single video conferencing call can generate logs that require hundreds of storage tables. Cloud has fundamentally changed the way business is done because of the unlimited storage and scalable compute resources you can get at an affordable price.

To put it simply, this is the difference between old and modern stacks:

Image Credits: Ashish Kakran, Thomvest Ventures

Why do data leaders today care about the modern data stack?

Self-service analytics

Citizen-developers want access to critical business dashboards in real time. They want automatically updating dashboards built on top of their operational and customer data.

For example, the product team can use real-time product usage and customer renewal data for decision-making. Cloud makes data truly accessible to everyone, but that access creates a need for self-service analytics in place of legacy, static, on-demand reports and dashboards.

Serving predictions

Once machine learning models are trained and ready to be used, there needs to be an easy way for different teams within an organization to benefit from them. This is typically achieved via a simple URL that accepts requests and returns predictions. Building these microservices and maintaining them is a core challenge when you are serving thousands of HTTP requests per second.
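To make the "simple URL that accepts requests and returns predictions" concrete, here is a minimal sketch using only Python's standard library. The `/predict` path, the feature names and the fixed weights inside `predict` are all assumptions made for illustration; a real service would load a trained model and would need far more than this to handle thousands of requests per second.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a clamped linear score."""
    weights = {"tenure_months": 0.02, "monthly_spend": 0.01}  # made-up weights
    score = sum((weights.get(name, 0.0) * float(value)
                 for name, value in features.items()), 0.0)
    return {"renewal_score": round(min(max(score, 0.0), 1.0), 4)}


class PredictionHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON object and returns a JSON prediction."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(predict(features)).encode("utf-8")
        except (ValueError, TypeError, AttributeError):
            self.send_error(400, "expected a JSON object of numeric features")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host="127.0.0.1", port=0):
    """Build the server; port=0 lets the operating system pick a free port."""
    return HTTPServer((host, port), PredictionHandler)
```

Calling `make_server(port=8000).serve_forever()` would start the service locally. The single-threaded server here is the simplest possible choice, which is exactly why maintaining such services at scale becomes its own engineering problem.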

Data transformation

Data scientists want to be able to track older versions of data so that they can run experiments and know which version of the data was used for training. This need is driving the popularity of products that are optimized for in-place transformation of data.
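One way to picture version tracking is to identify each snapshot of a dataset by a hash of its contents and record that identifier alongside the training run. The sketch below is an illustration under that assumption only; `dataset_version` and `VersionedDataset` are hypothetical names, and this is not how any particular product implements it.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short content hash that identifies this exact snapshot."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class VersionedDataset:
    """Keeps every committed snapshot so a run can be tied to its data."""

    def __init__(self):
        self._snapshots = {}  # version id -> rows
        self.history = []     # version ids in commit order

    def commit(self, rows):
        version = dataset_version(rows)
        if version not in self._snapshots:
            # Store a deep copy so later edits to `rows` cannot alter history.
            self._snapshots[version] = json.loads(json.dumps(rows))
            self.history.append(version)
        return version

    def checkout(self, version):
        return self._snapshots[version]


store = VersionedDataset()
raw = [{"household": 1, "offer": "triple-play"}, {"household": 2, "offer": None}]
v1 = store.commit(raw)
cleaned = [row for row in raw if row["offer"] is not None]
v2 = store.commit(cleaned)

# Record the data version next to the experiment so it can be reproduced later.
training_run = {"model": "offer-model", "data_version": v2}
```

Because the identifier is derived from the content, committing identical data twice yields the same version, and `store.checkout(v1)` still returns the rows as they were before cleaning.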

Data quality

Some cutting-edge data organizations now prefer a data-centric approach to a model-centric approach. The belief that more data means better results is being replaced by the belief that the quality of data matters more. Typically, trained models are evaluated using two metrics, precision and recall. Precision tells you the proportion of positive identifications that were actually correct, and recall tells you the proportion of actual positives that were correctly identified. Now imagine ensuring data quality for real-time data streams coming at you in a variety of different formats.
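The two definitions translate directly into a few lines of code. This is a minimal sketch for binary labels; the sample labels are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)

    # Precision: of everything flagged positive, how much was right?
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0

    # Recall: of everything actually positive, how much was found?
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return precision, recall


actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(actual, predicted)
```

In this example the model flags three items and two of them are correct, so precision is 2/3, while it finds only two of the four actual positives, so recall is 0.5.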

How do the legacy and modern data stacks compare?

Generally speaking, the modern data stack is about leveraging cloud resources to more effectively analyze complex streaming data.

Image Credits: Ashish Kakran, Thomvest Ventures

Here are a few key trends that enterprises should note:

  • The ETL process is becoming EL(T), which means the data is first dumped as it is received into a location such as a data lake. This way, the storage systems don’t complain about the format of data as it is stored. Once the data is stored, it can be processed in place for analytics. By doing this, the firehose of continuous data can be more effectively managed, processed and analyzed.
  • Data observability has become critical. Data fails silently, and with rapidly evolving data stacks, it is necessary to be able to monitor data and set alerts to fix issues. You don’t want your trained models that teach Spanish to accidentally train on English words or on missing data. One just can’t visually analyze and fix millions of rows of data.
  • The emergence of the chief data/AI/data and analytics officer. Data is such a complex problem that CIOs now have CDOs/CAOs/CDAOs reporting to them. While we started the 21st century talking about data as a competitive advantage, we are now in a time when unmanaged data becomes toxic. There are regulations governing how data can be used, shared or handled. How do you comply with a customer’s request to delete all their data if you don’t even know where and in what form it is stored?
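The EL(T) pattern from the first trend can be sketched in a few lines: land every record untouched, then shape it later inside the same store. The example below uses an in-memory SQLite database as a stand-in for a data lake, and the table names and the `user_id` and `duration_sec` fields are assumptions made for illustration. Counting the records that fail to transform is a small nod to the observability point, since otherwise they would fail silently.

```python
import json
import sqlite3


def extract_and_load(conn, raw_lines):
    """EL: store each record exactly as received, with no schema enforced."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(line,) for line in raw_lines],
    )
    conn.commit()


def transform(conn):
    """(T): build an analytics table from the landed data; return skip count."""
    conn.execute("DROP TABLE IF EXISTS call_minutes")
    conn.execute("CREATE TABLE call_minutes (user_id TEXT, minutes REAL)")
    skipped = 0
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        try:
            event = json.loads(payload)
            row = (str(event["user_id"]), float(event["duration_sec"]) / 60.0)
        except (ValueError, KeyError, TypeError):
            skipped += 1  # malformed or incomplete record: count it, keep going
            continue
        conn.execute("INSERT INTO call_minutes VALUES (?, ?)", row)
    conn.commit()
    return skipped


conn = sqlite3.connect(":memory:")
feed = [
    '{"user_id": "a", "duration_sec": 120}',
    '{"user_id": "b"}',  # missing duration
    'not json at all',   # wrong format, still lands without complaint
]
extract_and_load(conn, feed)
skipped = transform(conn)
```

Loading never rejects a record, so the raw table holds all three lines, while the transform produces one clean row and reports two skipped records that a monitoring job could alert on.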

Opportunities

Each step of the data analysis process is ripe for disruption. While visionary founders are building cloud-native tools to win emerging data categories, the incumbents have been slower to react. Whether building data pipelines or ML pipelines, organizations today have a variety of open and closed source technologies to choose from.

Image Credits: Ashish Kakran, Thomvest Ventures

Practitioners are spoilt for choice when building enterprise data pipelines.

Image Credits: Ashish Kakran, Thomvest Ventures

The efficient data stack for data engineers, database developers and data scientists changes every four to five years. Companies moved to big data analytics to analyze large datasets in private data centers, and though it promised many benefits, big data remains technically complex to implement. The modern data stack makes this easy by leveraging the scale, reliability and resilience of the cloud.

The rules are being rewritten on how data will be used for competitive advantage, and it won’t be long before the winners emerge. Incumbents are redesigning their legacy software to run on the cloud, but our bet is on nimble teams run by visionary founders.
