How to run data on Kubernetes: 6 starting principles

Sylvain Kalache

Contributor
Sylvain Kalache is the co-founder of Holberton, an edtech company training digital talent in more than 10 countries. An entrepreneur and software engineer, he has worked in the tech industry for more than a decade. Part of the team that led SlideShare to be acquired by LinkedIn, he has written for CIO and VentureBeat.

Kubernetes is fast becoming an industry standard, with up to 94% of organizations deploying their services and applications on the container orchestration platform, per a survey. One of the key reasons companies deploy on Kubernetes is standardization, which lets advanced users double productivity gains.

Standardizing on Kubernetes gives organizations the ability to deploy any workload, anywhere. But there was a missing piece: The technology assumed that workloads were ephemeral, meaning that only stateless workloads could be safely deployed on Kubernetes. However, the community recently changed the paradigm and introduced features such as StatefulSets and Storage Classes, which make running data workloads on Kubernetes possible.

While running stateful workloads on Kubernetes is possible, it is still challenging. In this article, I provide ways to make it happen and explain why it is worth it.

Do it progressively

Kubernetes is on its way to being as popular as Linux and the de facto way of running any application, anywhere, in a distributed fashion. Using Kubernetes involves learning a lot of technical concepts and vocabulary. For instance, newcomers might struggle with the many Kubernetes logical units such as containers, pods, nodes and clusters.

If you are not running Kubernetes in production yet, don’t jump directly into data workloads. Instead, start with moving stateless applications to avoid losing data when things go sideways.

Understand the limitations and specificities

Once you are familiar with general Kubernetes concepts, dive into the concepts specific to stateful workloads. For example, because applications may have different storage needs, such as performance or capacity requirements, you must provide the correct underlying storage system.

What the industry generally calls storage “profiles” is termed Storage Classes in Kubernetes. They provide a way to describe the different classes of storage a Kubernetes cluster can access. Storage Classes can have different quality-of-service levels, such as I/O operations per second per GiB, backup policies or arbitrary policies such as binding modes and allowed topologies.
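To make this concrete, here is a minimal sketch of a Storage Class definition, built as a Python dictionary and printed as JSON (which Kubernetes accepts alongside YAML). The provisioner name `csi.example.com` and the parameter keys `tier` and `iopsPerGiB` are hypothetical; real keys depend entirely on the storage driver you use.

```python
import json


def storage_class(name, provisioner, parameters,
                  binding_mode="WaitForFirstConsumer",
                  zones=None, allow_expansion=True):
    """Build a StorageClass manifest as a plain dict."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Parameters are opaque to Kubernetes; each provisioner defines its own keys.
        "parameters": dict(parameters),
        # Binding mode: "Immediate" or "WaitForFirstConsumer".
        "volumeBindingMode": binding_mode,
        "allowVolumeExpansion": allow_expansion,
    }
    if zones:
        # Allowed topologies restrict where volumes may be provisioned.
        manifest["allowedTopologies"] = [{
            "matchLabelExpressions": [
                {"key": "topology.kubernetes.io/zone", "values": list(zones)}
            ]
        }]
    return manifest


# Hypothetical provisioner and parameter keys, for illustration only.
fast = storage_class(
    name="fast-db",
    provisioner="csi.example.com",
    parameters={"tier": "ssd", "iopsPerGiB": "50"},
    zones=["zone-a", "zone-b"],
)
print(json.dumps(fast, indent=2))
```

A cluster would typically expose several such classes (for example, a fast one for databases and a cheaper one for archives), and each workload picks one by name.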

Another critical component to understand is StatefulSet. It is the Kubernetes API object used to manage stateful applications and offers key features such as:

  • Stable, unique network identifiers that let you keep track of volumes and allow you to detach and reattach them as you please.
  • Stable, persistent storage so that your data is safe.
  • Ordered, graceful deployment and scaling, which is required for many Day 2 operations.
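The features listed above can be sketched with a small StatefulSet definition. The code below is an illustration under assumptions, not a production manifest: the names `db` and `db-headless`, the image tag, the mount path and the `fast-db` storage class are all placeholders, and the DNS names assume the default `cluster.local` cluster domain.

```python
def stateful_set(name, service, image, replicas, storage_class, size):
    """Build a minimal StatefulSet manifest as a plain dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name.
            "serviceName": service,
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
                }]},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def pod_identities(sts, namespace="default", cluster_domain="cluster.local"):
    """Return (pod name, DNS name, PVC name) for every ordinal of the set."""
    name = sts["metadata"]["name"]
    service = sts["spec"]["serviceName"]
    claim = sts["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    identities = []
    for ordinal in range(sts["spec"]["replicas"]):
        pod = f"{name}-{ordinal}"
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod, dns, f"{claim}-{pod}"))
    return identities


# Placeholder names and sizes, for illustration only.
sts = stateful_set("db", "db-headless", "postgres:16", 3, "fast-db", "10Gi")
for pod, dns, pvc in pod_identities(sts):
    print(pod, dns, pvc)
```

Because pod `db-1` is always paired with claim `data-db-1`, a rescheduled pod comes back with the same name and the same data, and pods are created in ordinal order (`db-0`, then `db-1`, and so on).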

While StatefulSet has been a successful replacement for the infamous PetSet (now deprecated), it is still imperfect and has limitations. For example, the StatefulSet controller has no built-in support for volume (PVC) resizing — which is a major challenge if the size of your application dataset is about to grow above the current allocated storage capacity. There are workarounds, but such limitations must be understood well ahead of time so that the engineering team knows how to handle them.
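As an illustration, one commonly described manual workaround is to grow each claim directly, delete the StatefulSet object while leaving its pods running, and then re-create it with the larger size. The sketch below only generates the corresponding `kubectl` commands. It assumes the Storage Class has `allowVolumeExpansion` enabled and that the storage driver supports expansion, and the manifest file name is hypothetical. Treat it as a starting point to test in a non-production cluster, not a definitive procedure.

```python
import json


def pvc_resize_plan(sts_name, claim_template, replicas, new_size, namespace="default"):
    """Return kubectl commands for a manual StatefulSet volume resize."""
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
    commands = []
    # 1. Grow each claim directly; names follow <template>-<statefulset>-<ordinal>.
    for ordinal in range(replicas):
        pvc = f"{claim_template}-{sts_name}-{ordinal}"
        commands.append(f"kubectl -n {namespace} patch pvc {pvc} -p '{patch}'")
    # 2. Delete the StatefulSet object but leave its pods (and data) in place.
    commands.append(
        f"kubectl -n {namespace} delete statefulset {sts_name} --cascade=orphan"
    )
    # 3. Re-create it with the new size in volumeClaimTemplates.
    #    The file name below is hypothetical.
    commands.append(f"kubectl -n {namespace} apply -f {sts_name}-statefulset.yaml")
    return commands


for command in pvc_resize_plan("db", "data", replicas=2, new_size="20Gi"):
    print(command)
```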

Come up with a plan

Once you are comfortable with Kubernetes stateful concepts, you can progressively migrate your data workloads in a specific order. This allows you to learn from your mistakes and avoid being overwhelmed, because not all data technologies are equally easy to run on Kubernetes.

Established technologies, such as databases and storage, should be migrated first, and emerging tech, such as AI and ML, should be migrated last. This is reflected in a recent report, which found that databases and persistent storage are the two most-run data workloads on Kubernetes. The main reason emerging workloads lag behind is the lack of tooling for Day 2 operations. We will explore this in the next section.

Check for operator availability

Moving your stateful workloads to Kubernetes, also known as Day 1, is only half the job. Now you need to handle Day 2 operations (one of the most discussed topics at the last KubeCon). This is where things get tricky. There are tons of Day 2 operations that Kubernetes cannot handle natively, such as patching and upgrading, backup and recovery, log processing, monitoring, scaling and tuning.

All these operations are application specific. For example, a PostgreSQL cluster and a MySQL cluster require two completely different approaches to picking a new primary server in an HA configuration. Kubernetes cannot possibly know every application’s specific Day 2 operations. This is where operators come in.

Operators are programmable extensions that perform operations that Kubernetes cannot handle natively. Operators provide intelligent, dynamic management capabilities by extending the functionality of the Kubernetes API. One of the most common uses is conducting these Day 2 operations. These operators aren’t developed by the Kubernetes maintainers but by third-party developers and organizations.
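Conceptually, an operator runs a control loop: it observes the current state, compares it with the desired state and acts to close the gap. The toy sketch below shows one pass of such a loop for the primary-election example mentioned earlier. It makes no real Kubernetes API calls, and the `Instance` fields, the promotion rule and the action strings are all invented for illustration; real operators encode far more application knowledge.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool
    replication_lag: int  # hypothetical metric: bytes behind the primary


@dataclass
class ClusterState:
    primary: str
    instances: list = field(default_factory=list)


def reconcile(state):
    """One pass of a control loop: the desired state is a healthy primary.

    Returns the list of actions needed to converge (empty if none).
    """
    by_name = {instance.name: instance for instance in state.instances}
    primary = by_name.get(state.primary)
    if primary is not None and primary.healthy:
        return []  # observed state already matches desired state
    candidates = [i for i in state.instances if i.healthy and i.name != state.primary]
    if not candidates:
        return ["alert: no healthy replica to promote"]
    # Application-specific rule: promote the replica that is least behind.
    chosen = min(candidates, key=lambda i: i.replication_lag)
    state.primary = chosen.name
    return [f"promote {chosen.name}", f"repoint replicas to {chosen.name}"]


state = ClusterState(
    primary="pg-0",
    instances=[
        Instance("pg-0", healthy=False, replication_lag=0),
        Instance("pg-1", healthy=True, replication_lag=512),
        Instance("pg-2", healthy=True, replication_lag=64),
    ],
)
print(reconcile(state))
```

The promotion rule is exactly the kind of logic that differs between PostgreSQL and MySQL, which is why it lives in the operator rather than in Kubernetes itself.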

Before moving a data workload to Kubernetes, make sure there is an operator for it. OperatorHub does a great job of indexing them. With 282 operators available on the site, the distribution echoes what we discussed earlier: Some workloads have supporting tools, and some don’t. For example, the database category has 38 operators — there are eight for PostgreSQL alone — while the entire ML/AI category only has seven.

Pick the right level of operator capability

Having an operator for your technology isn’t enough, because operators can have different capabilities and often exist at various levels of maturity. The Operator Framework suggests a capability model that categorizes operators according to their features:

  • Level 1: Works for basic installation, such as automated application provisioning and configuration management.
  • Level 2: Supports seamless upgrades, including patches and minor version upgrades.
  • Level 3: Handles the full app and storage lifecycle (backup, failure recovery, etc.).
  • Level 4: Provides deep insights, metrics, alerts, log processing and workload analysis.
  • Level 5: Offers automatic horizontal/vertical scaling, auto-config tuning, abnormality detection and scheduling tuning.

When choosing an operator, make sure its capabilities match your needs. If you are unsure which level is right for you, the Data on Kubernetes Report 2022 found that most organizations are looking for operators that are at least at Level 3. Having a backup for your stateful workloads sounds like a good idea.
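A simple way to apply this when comparing candidates is to encode the five levels and filter on a minimum. In the sketch below, the level names follow the Operator Framework's own labels for the capability model, while the operator names and their assigned levels are hypothetical.

```python
from enum import IntEnum


class Capability(IntEnum):
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


def shortlist(operators, minimum=Capability.FULL_LIFECYCLE):
    """Keep operators at or above the minimum level, most capable first."""
    return sorted(
        (name for name, level in operators.items() if level >= minimum),
        key=lambda name: operators[name],
        reverse=True,
    )


# Hypothetical operator names and levels, for illustration only.
candidates = {
    "pg-operator-a": Capability.AUTO_PILOT,
    "pg-operator-b": Capability.SEAMLESS_UPGRADES,
    "pg-operator-c": Capability.FULL_LIFECYCLE,
}
print(shortlist(candidates))
```

The default minimum of Level 3 mirrors the report's finding. Raise it if you also need metrics and alerting (Level 4) or automatic scaling and tuning (Level 5).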

If you can’t find an operator that matches your needs, don’t worry because most of them are open source. You can extend existing operators’ capabilities with internal development or, even better, contribute to the open source project.

Understand the operator

Operators’ extensibility is their strength, but it’s also their weakness. The lack of standards means they are programmed differently, so you must look at their config files to pick the format you like best.

What’s more, operators may use different technical routes to achieve the same goal. For example, one of the eight PostgreSQL operators, CloudNativePG, does not use StatefulSets, and instead uses its own custom controller. That’s quite unexpected considering that StatefulSet is the foundation for stateful workloads on Kubernetes.

Its developers decided to go with this design because of the inability of StatefulSet to resize PVCs (as we discussed earlier). As the operator documentation explains, “different [design directions] lead to other compromises.” So when picking an operator, be sure to understand its implementation and trade-offs, and go with the one you are the most comfortable with.

It’s worth the effort

As you can see, running data on Kubernetes isn’t always easy, but the good news is that it’s worth the hard work: 54% of surveyed organizations attributed more than 10% of their revenue to the fact that they run data on Kubernetes. What’s more, 33% said it has a transformative impact on productivity and another 51% saw a significant positive impact.

As organizations increasingly adopt multicloud infrastructure to optimize their cost and infrastructure performance, Kubernetes has become the tool of choice. With an estimated 66% of countries having some sort of data privacy and consumer rights legislation, which often requires enforcing data sovereignty, companies must increasingly host user data in the countries they operate in. Kubernetes is here to stay.
