How to run data on Kubernetes: 6 starting principles

6:30 AM PST • November 25, 2022

Containers in the cloud; kubernetes — **Image Credits:** SerrNovik (opens in a new window) / Getty Images

Sylvain Kalache

Contributor

Sylvain Kalache is the co-founder of Holberton, an edtech company training digital talent in more than 10 countries. An entrepreneur and software engineer, he has worked in the tech industry for more than a decade. Part of the team that led SlideShare to be acquired by LinkedIn, he has written for CIO and VentureBeat.

Do it progressively

Kubernetes is on its way to being as popular as Linux and the de facto way of running any application, anywhere, in a distributed fashion. Using Kubernetes involves learning a lot of technical concepts and vocabulary. For instance, newcomers might struggle with the many Kubernetes logical units such as containers, pods, nodes and clusters.

If you are not running Kubernetes in production yet, don’t jump directly into data workloads. Instead, start with moving stateless applications to avoid losing data when things go sideways.

Understand the limitations and specificities

Once you are familiar with general Kubernetes concepts, dive into the specifics for stateful concepts. For example, because applications may have different storage needs, such as performance or capacity requirements, you must provide the correct underlying storage system.

What the industry generally calls storage “profiles” is termed Storage Classes in Kubernetes. They provide a way to describe the different types of classes a Kubernetes cluster can access. Storage classes can have different quality-of-service levels, such as I/O operations per second per GiB, backup policies or arbitrary policies such as binding modes and allowed topologies.

Another critical component to understand is StatefulSet. It is the Kubernetes API object used to manage stateful applications and offers key features such as:

Stable, unique network identifiers that let you keep track of volume, and allows you to detach and reattach them as you please.
Stable, persistent storage so that your data is safe.
Ordered, graceful deployment and scaling, which is required for many Day 2 operations.

While StatefulSet has been a successful replacement for the infamous PetSet (now deprecated), it is still imperfect and has limitations. For example, the StatefulSet controller has no built-in support for volume (PVC) resizing — which is a major challenge if the size of your application dataset is about to grow above the current allocated storage capacity. There are workarounds, but such limitations must be understood well ahead of time so that the engineering team knows how to handle them.

Come up with a plan

Once you are comfortable with Kubernetes stateful concepts, you can progressively migrate your data workloads in a specific order. This allows you to learn from your mistakes and avoid being overwhelmed, because not all data technologies are equally easy to run on Kubernetes.

Established technologies, such as databases and storage, should be migrated first, and emerging tech, such as AI and ML, should be done last. This is reflected in a recent report, which found database and persistent storage are the two most-run data workloads on Kubernetes. The main reason is the lack of tooling for Day 2 operations. We will explore this in the next section.

Check for operator availability

Moving your stateful workloads to Kubernetes is only half the job — also known as Day 1. Now you need to handle Day 2 operations (one of the most discussed topics at the last KubeCon). This is where things get tricky. There are tons of Day 2 operations that Kubernetes cannot handle natively such as patching and upgrading, backup and recovery, log processing, monitoring, scaling and tuning.

All these operations are application specific. For example, a PostgreSQL and MySQL cluster will require two completely different approaches when picking a new primary server in an HA cluster configuration. Kubernetes cannot possibly know all the application’s specific Day 2 operations. This is where operators come in.

Operators are programmable extensions that perform operations that Kubernetes cannot handle natively. Operators provide intelligent, dynamic management capabilities by extending the functionality of the Kubernetes API. One of the most common uses is conducting these Day 2 operations. These operators aren’t developed by the Kubernetes maintainers but by third-party developers and organizations.

Before moving a data workload to Kubernetes, make sure there is an operator for it. OperatorHub does a great job of indexing them. With 282 operators available on the site, the distribution echoes what we discussed earlier: Some workloads have supporting tools, and some don’t. For example, the database category has 38 operators — there are eight for PostgreSQL alone — while the entire ML/AI category only has seven.

Pick the right level of operator capability

Having an operator for your technology isn’t enough, because they can have different capabilities and often exist at various levels of maturity. The OperatorFramework suggests a capability model that categorizes operators according to their features:

Level 1: Works for basic installation, such as automated application provisioning and configuration management.
Level 2: Supports seamless upgrades, patches and minor version upgrades.
Level 3: Handles the full app and storage lifecycle (backup, failure recovery, etc.).
Level 4: Provides deep insights, metrics, alerts, log processing and workload analysis.
Level 5: Offers automatic horizontal/vertical scaling, auto-config tuning, abnormality detection and scheduling tuning.

When choosing an operator, make sure its capabilities match your needs. If you are unsure which level is right for you, the Data on Kubernetes Report 2022 found that most organizations are looking for operators that are at least at Level 3. Having a backup for your stateful workloads sounds like a good idea.

If you can’t find an operator that matches your needs, don’t worry because most of them are open source. You can extend existing operators’ capabilities with internal development or, even better, contribute to the open source project.

Understand the operator

Operators’ extensibility is their strength, but it’s also their weakness. The lack of standards means they are programmed differently, so you must look at their config files to pick the format you like best.

What’s more, operators may use different technical routes to achieve the same goal. For example, one of the eight PostgreSQL operators, CloudNativePG, does not use StatefulSets, and instead uses its own custom controller. That’s quite unexpected considering that StatefulSets is the foundation for stateful workloads on Kubernetes.

Its developers decided to go with this design because of the inability of StatefulSet to resize PVCs (as we discussed earlier). As the operator documentation explains, picking “different [design directions] lead to other compromises.” So when picking an operator, be sure to understand its implementation and trade-offs, and go with the one you are the most comfortable with.

It’s worth the effort

As you can see, running data on Kubernetes isn’t always easy, but the good news is that it’s worth the hard work: 54% of surveyed organizations attributed more than 10% of their revenue to the fact that they run data on Kubernetes. What’s more, 33% said it has a transformative impact on productivity and another 51% saw a significant positive impact.

As organizations increasingly adopt multicloud infrastructure to optimize their cost and infrastructure performance, Kubernetes has become the tool of choice. With an estimated 66% of countries having some sort of data privacy and consumer rights legislation, which often requires enforcing data sovereignty, companies must increasingly host user data in the countries they operate in. Kubernetes is here to stay.

More TechCrunch

Why being the last company to launch in a category can pay off

Rebecca Szkutak

5 hours ago

When Jordan Nathan launched his DTC nontoxic cookware company, Caraway, in 2019, he knew he was not the only founder trying to sell a new brand of pots and pans…

Why being the last company to launch in a category can pay off

This humanoid robot can drive cars — sort of

Kyle Wiggers

6 hours ago

Out of an abundance of caution, the car took two minutes to turn a corner.

This humanoid robot can drive cars — sort of

Transportation

Ahead of Tesla’s big shareholder vote, let’s re-read the judge’s opinion that got us here

Sean O'Kane

9 hours ago

There has been a silly amount of drama in the run-up to Tesla‘s annual shareholder meeting on Thursday. The company is set to hold a vote on “re-ratifying” the $56…

Ahead of Tesla’s big shareholder vote, let’s re-read the judge’s opinion that got us here

Apps

iOS 18 cracks down on apps asking for full address book access

Sarah Perez

10 hours ago

To give users more control over the contacts an app can and cannot access, the permissions screen has two stages.

iOS 18 cracks down on apps asking for full address book access

Robotics

Generative AI takes robots a step closer to general purpose

Brian Heater

10 hours ago

The push to produce a robotic intelligence that can fully leverage the wide breadth of movements opened up by bipedal humanoid design has been a key topic for researchers.

Generative AI takes robots a step closer to general purpose

Transportation

Ford’s secretive, low-cost EV team is growing with talent from Rivian, Tesla and Apple

Sean O'Kane

10 hours ago

A TechCrunch review of LinkedIn data found that Ford has built this team up to around 300 employees over the last year.

Ford’s secretive, low-cost EV team is growing with talent from Rivian, Tesla and Apple

Transportation

Tern AI wants to reduce reliance on GPS with low-cost navigation alternative

Rebecca Bellan

10 hours ago

The most critical systems of our modern world rely on GPS, from aviation and road networks to emergency and disaster response, from precision farming and power grids to weather forecasting…

Tern AI wants to reduce reliance on GPS with low-cost navigation alternative

Fintech

Fintech Brex abandons co-CEO model, talks IPO, cash burn and plans for a secondary sale

Mary Ann Azevedo

11 hours ago

Since fintech startup Brex’s inception in 2017, its two co-founders Henrique Dubugras and Pedro Franceschi have run the company as co-CEOs. But starting today, the pair told TechCrunch in an…

Fintech Brex abandons co-CEO model, talks IPO, cash burn and plans for a secondary sale

This Week in AI: Apple won’t say how the sausage gets made

Kyle Wiggers

12 hours ago

Hiya, folks, and welcome to TechCrunch’s regular AI newsletter. This week in AI, Apple stole the spotlight. At the company’s Worldwide Developers Conference (WWDC) in Cupertino, Apple unveiled Apple Intelligence,…

This Week in AI: Apple won’t say how the sausage gets made

Fintech

India’s 360 One acquires mutual fund app ET Money for $44M

Manish Singh

12 hours ago

India’s largest wealth manager focused on ultra-high-net-worth individuals, 360 One WAM, has agreed to acquire popular Indian mutual fund investment app ET Money for about $44 million. Earlier called IIFL…

India’s 360 One acquires mutual fund app ET Money for $44M

Helen Toner worries ‘not super functional’ Congress will flub AI policy

Kyle Wiggers

12 hours ago

Helen Toner, a former OpenAI board member and the director of strategy at Georgetown’s Center for Security and Emerging Technology, is worried Congress might react in a “knee-jerk” way where…

Helen Toner worries ‘not super functional’ Congress will flub AI policy

TechCrunch Disrupt 2024

Layoffs Got You Down? Get a Half-Price Expo+ Pass at Disrupt 2024

TechCrunch Events

13 hours ago

Layoffs are tough. This year alone, we’ve already seen 60,000 job cuts across 254 companies according to layoffs.fyi. Looking for ways to grow your network can be even harder during…

Layoffs Got You Down? Get a Half-Price Expo+ Pass at Disrupt 2024

Media & Entertainment

YouTube creators can now test multiple video thumbnails

Lauren Forristal

13 hours ago

YouTube announced this week the rollout of “Thumbnail Test & Compare,” a new tool for creators to see which thumbnail performs the best. The feature first launched to select creators…

YouTube creators can now test multiple video thumbnails

Transportation

Waymo issues second recall after robotaxi hit telephone pole

Rebecca Bellan

14 hours ago

Waymo has voluntarily issued a software recall to all 672 of its Jaguar I-Pace robotaxis after one of them collided with a telephone pole. This is Waymo’s second recall. The…

Waymo issues second recall after robotaxi hit telephone pole

Enterprise

Insight Partners backs Canary Technologies’ mission to elevate hotel guest experiences

Christine Hall

15 hours ago

The hotel guest management technology company’s platform digitizes the hotel guest journey from post-booking through checkout.

Insight Partners backs Canary Technologies’ mission to elevate hotel guest experiences

Here’s everything Apple announced at the WWDC 2024 keynote, including Apple Intelligence, Siri makeover

Christine Hall

15 hours ago

The TechCrunch team runs down all of the biggest news from the Apple WWDC 2024 keynote in an easy-to-skim digest.

Here’s everything Apple announced at the WWDC 2024 keynote, including Apple Intelligence, Siri makeover

Fintech

Lightspeed Venture Partners leads $4.3M seed in automated financial reporting fintech InScope

Christine Hall

16 hours ago

InScope leverages machine learning and large language models to provide financial reporting and auditing processes for mid-market and enterprises.

Venture

Foresite Capital raises $900M sixth fund for investing in life sciences companies

Marina Temkin

16 hours ago

Venture fundraising has been a slog over the last few years, even for firms with a strong track record. That’s Foresite Capital’s experience. Despite having 47 IPOs, 28 M&As and…

Foresite Capital raises $900M sixth fund for investing in life sciences companies

Enterprise

Databricks expands Mosaic AI to help enterprises build with LLMs

Frederic Lardinois

16 hours ago

A year ago, Databricks acquired MosaicML for $1.3 billion. Now rebranded as Mosaic AI, the platform has become integral to Databricks’ AI solutions. Today, at the company’s Data + AI…

Databricks expands Mosaic AI to help enterprises build with LLMs

Commerce

YC grad RetailReady raises $3.3M for an AI warehouse app that hopes to save brands billions

Christine Hall

16 hours ago

RetailReady targets the $40 billion compliance market to help reduce the number of retail compliance losses that shippers incur annually due to incorrectly shipped packages.

YC grad RetailReady raises $3.3M for an AI warehouse app that hopes to save brands billions

Enterprise

Databricks launches LakeFlow to help its customers build their data pipelines

Frederic Lardinois

16 hours ago

Since its launch in 2013, Databricks has relied on its ecosystem of partners, such as Fivetran, Rudderstack, and dbt, to provide tools for data preparation and loading. But now, at…

Databricks launches LakeFlow to help its customers build their data pipelines

TechCrunch Disrupt 2024

Bonus: An extra week to apply to Startup Battlefield 200

TechCrunch Events

16 hours ago

A big shoutout to the early-stage founders who missed the application window for the Startup Battlefield 200 (SB 200) at TechCrunch Disrupt. We have exciting news just for you! You…

Bonus: An extra week to apply to Startup Battlefield 200

Enterprise

Restate raises $7M for its lightweight workflows-as-code platform

Frederic Lardinois

17 hours ago

When one of the co-creators of the popular open source stream-processing framework Apache Flink launches a new startup, it’s worth paying attention. Stephan Ewen was among the founding team of…

Restate raises $7M for its lightweight workflows-as-code platform

Climate

Civic Renewables is rolling up residential solar installers to improve quality and grow the market

Tim De Chant

18 hours ago

With most residential solar panels installed by smaller companies, customer experience can be a mixed bag. To try to address the quality and consistency problem, Civic Renewables is buying small…

Civic Renewables is rolling up residential solar installers to improve quality and grow the market

Venture

Friends & Family Capital, a fund founded by ex-Palantir CFO and son of IVP’s founder, unveils third $118M fund

Marina Temkin

18 hours ago

Small VC firms require deep trust, mutual support and long-term commitment among the partners — a kinship that, in many ways, resembles a family dynamic. Colin Anderson (Palantir’s ex-CFO and…

Friends & Family Capital, a fund founded by ex-Palantir CFO and son of IVP’s founder, unveils third $118M fund

Transportation

Fisker’s troubled Ocean SUV gets its first recall

Sean O'Kane

18 hours ago

Fisker is issuing the first recall for its all-electric Ocean SUV because of problems with the warning lights, according to new information published by the National Highway Traffic Safety Administration…

Fisker’s troubled Ocean SUV gets its first recall

Climate

Gorilla, a Belgian startup that helps energy providers crunch big data, raises $25M

Paul Sawers

19 hours ago

Gorilla, a Belgian company that serves the energy sector with real-time data and analytics for pricing and forecasting, has raised €23 million ($25 million) in a Series B round led…

Gorilla, a Belgian startup that helps energy providers crunch big data, raises $25M

Fabless AI chip makers Rebellions and Sapeon to merge as competition heats up in global AI hardware industry

Kate Park

20 hours ago

South Korea’s fabless AI chip industry saw a slew of fundraising events over the last couple of years as demand for hardware to power AI applications skyrocketed, and it seems…

Fabless AI chip makers Rebellions and Sapeon to merge as competition heats up in global AI hardware industry

Apps

The apps that Apple sherlocked at WWDC 2024

Ivan Mehta

20 hours ago

Here’s a list of third-party apps that were Sherlocked by Apple at this year’s WWDC.

The apps that Apple sherlocked at WWDC 2024

Enterprise

Black Semiconductor nabs $273M in Germany to supercharge how chips work together

Ingrid Lunden

22 hours ago

Black Semiconductor, which is developing a chip-connecting technology based on graphene, has raised $273M in a combination of private and public funding.

How to run data on Kubernetes: 6 starting principles

Sylvain Kalache

More posts from Sylvain Kalache

Do it progressively

Understand the limitations and specificities

Come up with a plan

Check for operator availability

Pick the right level of operator capability

Understand the operator

It’s worth the effort

More TechCrunch

Get the industry’s biggest tech news

TechCrunch Daily News

Startups Weekly

TechCrunch Fintech

TechCrunch Mobility

Tags