In the AI-driven world, you can no longer afford time amnesia in your software systems.

Event-driven machine learning will enable a new generation of businesses able to make incredibly thoughtful decisions faster than ever, but is your data ready to take advantage of it?

Javier Toledo
The Agile Monkeys’ Journey

--

When you look at the speed of innovation in the AI world, it seems like every business out there will eventually fall into one of two categories: the ones that took advantage of AI and the ones that went out of business because they could no longer compete. And the key differentiating factor is going to be the amount and quality of their data. Here, let me especially highlight the word quality, because knowing what, when, and why something happened in your company will be crucial.

But how can you be certain that you can recover this information all the way back to the moment you first turned your system on? This is exactly the core value of event sourcing: instead of storing only the current state of a system, everything that happens is stored permanently as a long series of immutable events. The state shown to the users is then calculated on demand, but the events are never discarded; they can always be inspected, reprocessed, or forwarded to other places.
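
To make this more concrete, here is a minimal sketch of the idea in TypeScript, using hypothetical order events and a plain in-memory log rather than any particular framework or database:

```typescript
// A minimal event-sourcing sketch (hypothetical types and an in-memory log).
// The log is append-only: events are never updated or deleted, and the
// current state is derived from them on demand.

type OrderEvent =
  | { type: 'OrderCreated'; orderId: string; total: number; at: Date }
  | { type: 'OrderPaid'; orderId: string; at: Date }
  | { type: 'OrderShipped'; orderId: string; at: Date };

interface OrderState {
  orderId: string;
  total: number;
  status: 'created' | 'paid' | 'shipped';
}

const eventLog: OrderEvent[] = [];

// The only way to change the system is to append a new event.
function append(event: OrderEvent): void {
  eventLog.push(event);
}

// The state shown to users is a pure function of the event history.
function currentState(orderId: string): OrderState | undefined {
  return eventLog
    .filter((e) => e.orderId === orderId)
    .reduce<OrderState | undefined>((state, e) => {
      switch (e.type) {
        case 'OrderCreated':
          return { orderId: e.orderId, total: e.total, status: 'created' };
        case 'OrderPaid':
          return state && { ...state, status: 'paid' };
        case 'OrderShipped':
          return state && { ...state, status: 'shipped' };
      }
    }, undefined);
}
```

Replaying the full history through currentState always yields the latest view, and nothing that was ever recorded is lost along the way.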

Not all developers are willing to adopt this way of storing data, though. Event sourcing and CQRS, an intimately related pattern, require a slightly different way of thinking. Software developers who are not used to them tend to fall back to implementing RESTful solutions instead, as they’re perceived as simpler, and at design time it’s hard to predict who will need the extra data, and when, so why go the extra mile? Why make the extra investment in development time and data storage?

In the era of AI, a company can no longer afford to discard a whole dimension of its data, especially when it is as relevant as time.

That is exactly the risk: once you’ve discarded the time-series data, it becomes unrecoverable. We’re sacrificing a dimension of the data that could become highly relevant in the mid-term for a perceived short-term benefit.

Data science teams mitigate this problem using Change Data Capture (CDC). Most databases keep an operations log that provides a reliable source of differential changes. But CDC misses another important variable: the semantics of each change, that is, why the change happened. Saying “the order has been updated” is not the same as saying “the order has been paid”; the second statement is far more relevant from a business analytics point of view.
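
As a rough illustration, compare a generic CDC record with a domain event describing the same change (both payloads are made up for the example and not tied to any specific CDC tool or framework):

```typescript
// A CDC record only tells you which columns changed:
const cdcRecord = {
  table: 'orders',
  operation: 'UPDATE',
  key: { id: 'order-42' },
  before: { status: 'created' },
  after: { status: 'paid' },
  capturedAt: '2023-05-04T10:15:00Z',
};

// A domain event also carries the business meaning of the change:
const domainEvent = {
  type: 'OrderPaid',
  orderId: 'order-42',
  paymentMethod: 'credit_card',
  amount: 149.99,
  occurredAt: '2023-05-04T10:15:00Z',
};
```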

A development team that is aware of the importance of data will always make an effort to generate enough metadata to infer the context. Still, humans make mistakes, and, more frequently, we will happily deprioritize this work when there’s a hard deadline. A team that uses event sourcing generates high-quality data by design, without having to make a conscious decision or put in extra effort.

Event sourcing removes from the development team the responsibility of deciding whether a change must be stored, because the only way to change the system state is to create an event that encapsulates what changed, when, and why.

But the advantages of storing events don’t end there. Events are also structurally different: they can be easily sorted and consumed in progressive chunks, and as they’re immutable, there’s no risk that a piece of data you’ve already processed gets out of sync because another process updates it. All event consumers can work assuming that they only need to process the next event in the queue and won’t miss a byte of data (see the consumer sketch right after the list below). Hence, events enable, among other things, what some authors are starting to describe as event-driven machine learning:

  • Collecting data becomes easier with a real-time flow of events available to data analysts, as opposed to relying on static data dumps that provide a snapshot of a database at a particular moment. The event stream is akin to a live sports broadcast, which can be paused to view specific details in a scene, replayed to analyze interesting moments from the past, or monitored in real-time to observe the most current data.
  • Continual learning becomes possible. This technique emulates the human ability to adapt behavior in a dynamic environment: models are re-trained with recent events to continuously adapt them to changing environments (e.g., Netflix’s recommendation system requires constant fine-tuning in response to the introduction of new shows and evolving viewer preferences).
  • Fine-grained data manipulation enables data-driven decision-making. Analysts can view data from different angles by reprocessing events, uncovering previously inaccessible information. Each event represents a change that can be analyzed at an individual level, such as an individual transaction in a bank account, or aggregated to see a broader picture at a particular time, such as the account balance. Data can also be processed to create new perspectives, like categorizing expenses by type (see the reprocessing sketch after this list).
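
Here is a minimal sketch of such a sequential consumer, assuming a simple in-memory log and hypothetical event shapes:

```typescript
// A sketch of an event consumer over an append-only log.
// Because events are immutable and ordered, the consumer only needs to
// remember how far it has read: it processes the next event in the queue
// and can never miss or re-read a change it has already acknowledged.

interface StoredEvent {
  offset: number;
  type: string;
  payload: unknown;
}

class EventConsumer {
  private offset = 0;

  constructor(private readonly log: StoredEvent[]) {}

  // Process everything appended since the last poll, in order.
  poll(handler: (event: StoredEvent) => void): void {
    while (this.offset < this.log.length) {
      handler(this.log[this.offset]);
      this.offset += 1; // advance only after the event has been handled
    }
  }
}
```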
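
And here is a sketch of the fine-grained reprocessing mentioned in the last point, replaying the same hypothetical account events to obtain both an aggregated balance and a per-category breakdown:

```typescript
// Reprocessing the same event history from different angles
// (hypothetical account events, not tied to any particular framework).

type AccountEvent =
  | { type: 'Deposited'; amount: number; at: string }
  | { type: 'Withdrawn'; amount: number; category: string; at: string };

// Aggregated view: the account balance at a given point in time.
function balanceAt(events: AccountEvent[], until: string): number {
  return events
    .filter((e) => e.at <= until)
    .reduce((sum, e) => (e.type === 'Deposited' ? sum + e.amount : sum - e.amount), 0);
}

// A new perspective over the exact same history: expenses grouped by category.
function expensesByCategory(events: AccountEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    if (e.type === 'Withdrawn') {
      totals[e.category] = (totals[e.category] ?? 0) + e.amount;
    }
  }
  return totals;
}
```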

With today’s tools, using the data in this way can be expensive and still requires the expertise of specialized developers or data scientists, but we expect advances in large language model capabilities to provide natural language interfaces for manipulating and visualizing data. These new data manipulation models will make these techniques available to anyone in an organization, fostering actual data-driven business cultures.

New advances in large language model capabilities will provide natural language interfaces for manipulating and visualizing data, enabling data-driven decisions at every level of an organization.

In short, with all the amazing AI products we’ve seen recently, such as DALL·E or ChatGPT, we’re barely scratching the surface of what’s to come. It’s clear that AI is here to stay and will deeply affect how we conduct business. To stay competitive in this new AI-driven world, companies will need to collect high-quality data of everything that happens in their systems.

Event sourcing captures high-quality data by design, packing the what, why, and when of every change into events that work like the individual frames of a high-definition security camera for your data.

This constant stream of events provides extra benefits. Data can be analyzed or manipulated in real time, and previous states can be replayed. Event-driven machine learning models enable accurate predictions and automatic reactions to complex scenarios, working as a restless security guard that continuously monitors the cameras and finds inefficiencies and risks.

For all these reasons, we believe that companies that adopt event-driven designs and learn to take advantage of event-driven machine learning models will have a strong competitive advantage and lead the market in this new AI-driven generation.

--

Javier Toledo
The Agile Monkeys’ Journey

Cofounder and CTO at The Agile Monkeys. Co-creator of the Booster Framework. Breaking cutting-edge technology remotely from the beautiful Canary Islands.