FlyteInteractive: Interactive development for machine learning models

LinkedIn needed a better way to test and tune machine learning models, so it wrote its own tool that plugs into Visual Studio Code.


Machine learning (ML) is becoming an increasingly important part of the modern application stack. Whether it’s large-scale, public large language models (LLMs) like GPT or small-scale, private models trained on company content, developers need to find ways of including those models in their code.

That means finding ways to test that code, without pushing it to production servers. We can build on some of the MLOps concepts used to manage the underlying models, merging them with familiar devops techniques. However, much of that work is bespoke, with developers constructing their own toolchains and building their own test harnesses. As teams change, the wheel gets reinvented, again and again.

As a result, projects take longer to deliver and key lessons aren’t passed from team to team. There’s no common framework for determining actual performance and the resulting inference costs, making it hard to make the business case for niche machine learning applications, even though they may well be able to deliver significant value to an enterprise.

Using Flyte for ML orchestration in Kubernetes

One route to an “ML devops” future is using workflow orchestration tools, like Flyte, to build and run ML applications, taking advantage of cloud-native platforms and containers to run at scale. There’s a lot of value in this approach, as it gives you a single platform for running inferencing, using technologies like Kubernetes to scale as necessary. But it only supports the operational side of things, while still requiring developers to work with inadequate test data and their own basic cloud infrastructure.

Flyte takes advantage of Kubernetes support for GPU access from containers, pairing container encapsulation of models and supporting code with a Python orchestration pipeline. The underlying orchestration engine is written in Go, giving you compatibility with Kubernetes and the high performance needed for run-time ML operations.

However, out of the box, Flyte still has many of the same problems as other pipeline tools, as it requires developers to complete development before the application can be tested against live data, even when you’re only making a slight change. As a result, local tests are often carried out on a curated, artificial data set, one that’s often a lot smaller than production data sets. Models are also constrained to run on laptops or other development systems, making it hard to determine appropriate quantization levels without multiple passes.

Flyte is used by many companies to host their machine learning pipelines, including Microsoft’s LinkedIn subsidiary. LinkedIn switched to using Flyte a year ago, and soon built cross-organizational tools on top of the framework. That removed some of the bottlenecks, particularly by giving developers a common set of components that reduced the need for bespoke tools. However, the LinkedIn team was still unable to fix all the problems that come with integrating machine learning into a devops environment. One important gap was the lack of interactive development tools.

An interactive developer experience for Flyte

The LinkedIn ML team began work on a tool based on code-server, a browser-accessible build of Visual Studio Code, to fill that gap. Now available as an open source project, FlyteInteractive moves ML development away from current batch development processes toward an approach that’s much closer to modern application development.

FlyteInteractive is, as the name suggests, an interactive development mode for Flyte pipelines. FlyteInteractive allows developers to examine model operations and test different parameters while working in a near-production environment, taking advantage of the isolation provided by Flyte’s Kubernetes containers and the debugging capabilities of the Visual Studio Code remote server.

If you’re using Flyte and you’ve installed the FlyteInteractive components, enabling interactive development is as easy as adding a single line to the Python task that runs your ML job. This one line, a simple @vscode decorator, switches your pipeline from batch mode to interactive mode, deploying a container with a ready-to-use Visual Studio Code Server. You can then connect to the container runtime with VS Code, giving you access to a running ML job using the familiar debugging tools in VS Code.
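In practice, the decorator sits alongside Flyte’s standard @task decorator. The sketch below follows the import and package names used by the open source plugin (flytekitplugins-flyteinteractive); the task body and parameter names are illustrative assumptions, so check the plugin documentation for your installed version:

```python
# A minimal sketch of enabling FlyteInteractive on a Flyte task.
# Assumes `pip install flytekitplugins-flyteinteractive` and a working
# Flyte deployment. The training logic here is a placeholder.
from flytekit import task, workflow
from flytekitplugins.flyteinteractive import vscode


@task
@vscode  # switches this task from batch mode to interactive mode
def train_model(learning_rate: float) -> float:
    # In interactive mode, you can attach VS Code to the running
    # container and set breakpoints inside this function.
    loss = 1.0 / (1.0 + learning_rate)
    return loss


@workflow
def training_pipeline(learning_rate: float = 0.01) -> float:
    return train_model(learning_rate=learning_rate)
```

Removing the @vscode line returns the task to normal batch execution, so the same pipeline definition can serve both development and production.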

The advantage of this approach is that developers can gain access to production-grade environments and real data. There’s even the option of working with models at scale. FlyteInteractive allows you to get a better feel for model performance and accuracy and reduce the risk of error. At the same time, you can work to tune and quantize your models, so that they will make efficient use of resources and keep operational costs to a minimum.

Plus, Visual Studio Code gives you access to Python debuggers and other language-specific tools that can help you examine the running model in more detail. Instead of waiting for the results of a batch job, you can set breakpoints and use the debugger to inspect operations, stepping through and making changes as you go.

FlyteInteractive also gives you access to Jupyter notebooks that contain your model and its code. This lets you use Python’s numerical analysis and visualization tools to explore your test data, understand how accurate your model is, and judge whether the results you are getting are statistically significant.

Getting started with FlyteInteractive

Installing FlyteInteractive is simple enough, as it’s available through the familiar Python pip installer. Once you’ve installed FlyteInteractive and added the @vscode decorator to your task, you can connect to it from Visual Studio Code via a URL in the Flyte console or by using Kubernetes’ command line to open a port on your test pod.

Then all you need to do is use the Run and Debug option in VS Code, taking advantage of its built-in interactive debugger. A debug run inherits the inputs from the task you created. You can use this to make changes to your model on the fly. Be sure to copy those changes to your own development environment, for example as commits to Git, because they will be lost when the interactive session is shut down.

One of the more useful features of FlyteInteractive is a built-in garbage collector. This makes sure your environment isn’t consuming resources without delivering results. One option sets a maximum time to live, while the other sets limits on idle time. If either limit is exceeded, the test container is shut down and deleted. This way you can ensure that failed experiments don’t cost more than they need to.
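The idle limit is set as an argument to the decorator. The sketch below uses the plugin’s documented idle timeout; treat the exact parameter name as an assumption and consult the plugin documentation for your version, including the corresponding maximum time-to-live setting:

```python
# Sketch: capping how long an interactive session can sit idle.
# `max_idle_seconds` follows the FlyteInteractive plugin's documented
# idle limit; verify the name against your installed plugin version.
from flytekit import task
from flytekitplugins.flyteinteractive import vscode


@task
@vscode(max_idle_seconds=3600)  # reclaim the pod after an hour of inactivity
def tune_model() -> None:
    # Interactive tuning work happens here via the attached VS Code
    # session; the container is shut down once the limit is exceeded.
    ...
```

Setting a conservative limit by default keeps forgotten sessions from quietly running up GPU bills overnight.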

Future versions of FlyteInteractive could well provide observability hooks into models that provide data on resource usage to finops tools, allowing teams to quantize models so that they can be tailored not only for performance and accuracy, but also for cost. Adding observability features could help with performance tuning and with debugging an ML pipeline as part of a larger application.

LinkedIn estimates a 96% improvement in finding bugs and running experiments since it began using FlyteInteractive. Before FlyteInteractive, the build and test cycle on live systems could take up to 15 minutes per workflow, with a mere 20% success rate in finding bugs.

If you’re using models from Hugging Face, especially the new generation of small language models, FlyteInteractive should be hugely helpful for getting the right quantization level and the right accuracy for your applications—and no more sitting around waiting to see if a batch job will finish as expected. Tools like FlyteInteractive are going to be an essential part of the ML toolchain.

Copyright © 2024 IDG Communications, Inc.