3 dynamic use cases for Apache Flink and stream processing

We live in a world in motion. Stream processing allows us to record events in the real world so that we can take action or make predictions that will drive better business outcomes.


The real world is made up of people and things in constant motion, and the applications developers build need to reflect this reality. Picture an airport where thousands of flights and passengers arrive and depart daily, and where travelers need to be notified of delays and other changes as quickly as possible. Or a payment network that processes millions of transactions each minute. If we can record and process these events at scale and in real time, we open the door to exciting new applications that can improve efficiency or drive better customer experiences.

Stream processing is the enabler here. It is a data processing technology used to collect, store, and manage continuous streams of data as they are produced or received. Also called event streaming or complex event processing (CEP), stream processing has grown rapidly in recent years because of its ability to simplify data architectures, deliver real-time insights and analytics, and react to time-sensitive events as they happen.

Apache Flink is the de facto standard for stream processing applications. It’s often used in conjunction with Apache Kafka, but Flink is a stand-alone stream processing engine that can be deployed independently. It solves many of the hard problems of distributed stream processing, such as fault tolerance, exactly-once processing, high throughput, and low latency. That’s why companies like Uber and Netflix use Flink for some of their most demanding real-time data needs.
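
To make the fault-tolerance point concrete, here is a minimal sketch of a Flink DataStream job in Java with checkpointing enabled, the mechanism behind Flink’s exactly-once guarantees. The job itself is a toy (the class name and sample elements are invented for illustration), and exact APIs vary slightly across Flink versions.

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MinimalFlinkJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Take a consistent snapshot of all operator state every 10 seconds.
            // On failure, Flink restores the last snapshot and replays the source,
            // so each event affects state exactly once.
            env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

            env.fromElements("payment-1", "payment-2", "payment-3")
               .map(e -> "processed:" + e)
               .print();

            env.execute("minimal-flink-job");
        }
    }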

When we think about stream processing use cases, we can group them into three categories, which we’ll explore with examples below:

  1. Event-driven applications
  2. Real-time analytics
  3. Streaming data pipelines

Event-driven applications

Event-driven applications observe or analyze streams of data and immediately trigger an alert when a certain event or pattern occurs. Fraud detection is among the most common scenarios, where stream processing is used to analyze transaction data and trigger alerts based on suspicious activity, but there are many more possibilities.
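
As a rough illustration of the fraud scenario, the sketch below keys a transaction stream by card ID and uses Flink’s managed keyed state to flag a large charge that immediately follows a tiny “test” charge, a classic fraud signal. The Transaction and Alert types and the dollar thresholds are hypothetical stand-ins, not a real detector.

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical event and alert types for the sketch.
    class Transaction { public String cardId; public double amount; }
    class Alert { public String cardId; public double amount;
        Alert(String cardId, double amount) { this.cardId = cardId; this.amount = amount; } }

    public class FraudDetector extends KeyedProcessFunction<String, Transaction, Alert> {
        // Per-card state, managed and checkpointed by Flink.
        private transient ValueState<Boolean> lastWasSmall;

        @Override
        public void open(Configuration parameters) {
            lastWasSmall = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-was-small", Boolean.class));
        }

        @Override
        public void processElement(Transaction tx, Context ctx, Collector<Alert> out) throws Exception {
            // A tiny "test" charge followed by a large one is a common fraud signal.
            if (Boolean.TRUE.equals(lastWasSmall.value()) && tx.amount > 500.00) {
                out.collect(new Alert(tx.cardId, tx.amount));
            }
            lastWasSmall.update(tx.amount < 1.00);
        }
    }

    // Wiring it up: transactions.keyBy(tx -> tx.cardId).process(new FraudDetector());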

For instance, in retail, as online sales continue to climb, shoppers want to know whether an item is in stock and how long delivery will take before they place an order. If they can’t get this information, or the delivery will take too long, they will often go to a competing site to look for a better deal. Showing an item as in stock and then canceling the order hours or days later because inventory was out of sync with the sales system is an equally bad experience. Retailers therefore need a real-time view of their inventory in all regions so that when new orders come in, they can quickly determine whether an order should be rerouted to a closer warehouse and how long fulfillment will take.

Time is a critical component for these event-driven applications, and Flink is an ideal solution because it offers advanced windowing capabilities that give developers fine-grained control over how time progresses and how data is grouped for processing.
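
Here is a sketch of what those windowing capabilities look like in code: hypothetical inventory events are grouped by item and aggregated in one-minute event-time windows, with a watermark that tolerates five seconds of out-of-order data. (API details such as the Time class vary across Flink versions.)

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class InventoryWindows {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (itemId, quantityChange, eventTimeMillis): invented sale/restock events.
            DataStream<Tuple3<String, Integer, Long>> stockEvents = env.fromElements(
                Tuple3.of("sku-1", -2, 1_000L),
                Tuple3.of("sku-1", 10, 30_000L),
                Tuple3.of("sku-2", -1, 45_000L));

            stockEvents
                // Tell Flink which field carries event time, tolerating 5s of lateness.
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy.<Tuple3<String, Integer, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event, ts) -> event.f2))
                .keyBy(event -> event.f0)                               // group by item
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))   // 1-minute event-time windows
                .sum(1)                                                 // net stock change per item per window
                .print();

            env.execute("inventory-windows");
        }
    }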

Real-time analytics

Also called streaming analytics, this category involves analyzing real-time data streams to generate business insights that inform operational or strategic decisions. Apps that use real-time analytics process data as soon as it arrives from a stream and make timely decisions based on the most current information available.

For example, online food delivery services have become extremely popular, and many provide a dashboard that gives restaurant owners up-to-date information about order volumes, popular menu items, and how fast orders are being delivered. With this information, restaurants can make adjustments on the fly to increase sales and ensure customers receive their orders on time.

Streaming media services are another popular use case for real-time analytics. The big streaming providers capture billions of data points about which shows are popular and who’s watching what. Real-time analytics allows these providers to decide which titles to recommend to a customer next, based on the individual’s viewing history and viewing patterns across the whole customer base. Generating these curated recommendations in real time means users see feeds that adjust almost instantly to their actions.

Flink is ideal for real-time analytics because it’s designed to process large amounts of data with very low, sub-second latency. With interactive queries, a comprehensive set of out-of-the-box functions, and advanced pattern-recognition capabilities, it enables powerful analytics applications.
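
Much of this analytics work can be expressed in Flink SQL. The sketch below, run through Flink’s Table API in Java, computes order volume per restaurant per minute with a tumbling window; the orders table and its schema are hypothetical, and the datagen connector fabricates rows so the example is self-contained.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class OrderAnalytics {
        public static void main(String[] args) {
            TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // A made-up stream of food-delivery orders, generated continuously.
            tableEnv.executeSql(
                "CREATE TABLE orders (" +
                "  restaurant_id INT," +
                "  amount DOUBLE," +
                "  order_time TIMESTAMP(3)," +
                "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND" +
                ") WITH ('connector' = 'datagen', 'rows-per-second' = '5')");

            // Order volume per restaurant per minute, computed continuously.
            tableEnv.executeSql(
                "SELECT restaurant_id, window_start, COUNT(*) AS orders_per_minute " +
                "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' MINUTE)) " +
                "GROUP BY restaurant_id, window_start, window_end").print();
        }
    }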

Streaming data pipelines

Streaming data pipelines continuously ingest data streams from applications and systems and perform joins, aggregations, and transformations to create new, enriched data streams of higher value. Downstream systems can consume these events for their own purposes, starting from the beginning of the stream, the end of the stream, or anywhere in between.

Streaming data pipelines are useful for migrating data from legacy systems, such as a traditional on-prem data warehouse, to more modern, cloud-based platforms that better support event-driven applications and real-time analytics. Legacy systems often contain high-value data but don’t support these more modern application types. A streaming data pipeline can connect these legacy sources to new endpoints, allowing developers to gradually migrate data to a more modern cloud data warehouse while keeping current operations intact.
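
A minimal version of such a pipeline might read change events from a Kafka topic fed by the legacy system, apply a transformation, and write the enriched stream to a new topic that the cloud warehouse consumes. In the sketch below, the topic names, broker address, and trivial map step are placeholders for real cleansing and enrichment logic.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LegacyOrdersPipeline {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Raw change events captured from the legacy system (topic name is hypothetical).
            KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("legacy.orders")
                .setGroupId("orders-pipeline")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

            // Cleaned, enriched stream for the cloud warehouse and other consumers.
            KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                    .setTopic("orders.enriched")
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build())
                .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "legacy-orders")
               .map(String::toUpperCase)   // stand-in for real transformation logic
               .sinkTo(sink);

            env.execute("legacy-orders-pipeline");
        }
    }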

Another important use case for stream processing is machine learning, which is increasingly used to make predictions about real-world events so that businesses can adjust strategies accordingly. Machine learning pipelines can prepare data and stream it to an object storage service where machine learning models are trained. Once trained, the models can be updated continuously and incrementally, refining their predictions in real time to accommodate changes in the real world. These models can then be called in real time to power predictive maintenance or fraud detection scenarios, for example. Stream processing can also power real-time generative AI, helping developers build applications that combine always up-to-date data with tools such as ChatGPT.
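
As a rough sketch of the real-time scoring step, the snippet below loads a trained model once per parallel task and then scores each event as it streams through. The FraudModel here is a toy stand-in; a production job would fetch a real model from object storage and reload it as new versions are trained.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RealTimeScoring {
        // Toy stand-in for a model trained offline and fetched from object storage.
        static class FraudModel {
            static FraudModel loadLatest() { return new FraudModel(); }
            double score(double amount) { return amount > 500 ? 0.9 : 0.1; }  // toy rule
        }

        // Load the model once per parallel task, then score every event in flight.
        static class ScoreFn extends RichMapFunction<Double, String> {
            private transient FraudModel model;
            @Override public void open(Configuration parameters) { model = FraudModel.loadLatest(); }
            @Override public String map(Double amount) {
                return String.format("amount=%.2f risk=%.1f", amount, model.score(amount));
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromElements(12.50, 980.00, 43.10)
               .map(new ScoreFn())
               .print();
            env.execute("real-time-scoring");
        }
    }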

Reacting to the world in real time

In each case, stream processing is used to record events in the real world so that companies can take action or make predictions that will drive better business outcomes. Thanks to the cloud, more systems are connected online and more data is generated, giving a detailed picture of the world and what’s happening in it. Stream processing allows us to harness that data to build powerful applications that respond to these changing events in real time.

Jean-Sebastien Brunner is director of product management at Confluent.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Copyright © 2024 IDG Communications, Inc.