A first look at Windows AI Studio

Available in an early preview, Microsoft’s AI development environment for the desktop lets you build small language models that run on PCs and mobile devices.


Microsoft used the developer-focused parts of its Ignite 2023 event to introduce a series of AI development tools. Azure AI Studio supports large-scale AI development for cloud-hosted applications, using Azure OpenAI models or others, while Copilot Studio extends the older Power Virtual Agents low-code AI tools with OpenAI-powered “boosts.”

Microsoft also announced a third tool, but it has taken a while for it to arrive on developers’ PCs. That tool is Windows AI Studio, now available in preview. Let’s take a look.

Introducing Windows AI Studio

Windows AI Studio is intended to bring Microsoft and its partners’ library of AI models to PCs, running on GPUs now and eventually on on-board AI accelerators, like the Arm and Intel NPUs in Microsoft’s latest Surface hardware. These NPUs were first delivered in the Surface Laptop Studio 2 that I’m writing this column on. With DirectML support for the integrated Intel NPUs in these and other devices due early in 2024, this option should prove attractive for developers and other users alike.

Windows AI Studio is designed to help you train and customize models, getting them ready for use in your code. Once trained, models can be converted to run on the cross-platform ONNX (Open Neural Network Exchange) runtime, for use in desktop and mobile applications. Delivered as a Visual Studio Code extension, Windows AI Studio brings many different tools and AI models into one place, working alongside the rest of your toolchain, so you can refine models at the same time as you build them into .NET applications.
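To give a sense of where that conversion lands you, here’s a minimal sketch of loading and querying an exported model with the onnxruntime Python package. The file name (model.onnx) and the dummy input shape are my assumptions for illustration; your own export defines its input and output tensors.

# A minimal sketch of running an exported model with ONNX Runtime.
# The file name (model.onnx) and tensor shapes are assumptions; use
# the names and shapes your own export produces.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect the model's declared inputs.
for tensor in session.get_inputs():
    print(tensor.name, tensor.shape, tensor.type)

# Run inference with a dummy input shaped to match the model.
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 128), dtype=np.int64)  # assumed token ID shape
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)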

Windows AI Studio offers an interesting hybrid of Windows and Linux tools, working across both your CPU and GPU, using the Windows Subsystem for Linux (WSL) to host and run models. It’s an approach that does demand capable hardware, with plenty of memory and a recent GPU. You won’t be able to use Windows AI Studio without a discrete GPU, which can be either a workstation-grade card or an external GPU working over a Thunderbolt connection.

Windows AI Studio installation and prerequisites

Windows AI Studio is simple enough to install. You download it from the Visual Studio Marketplace, where you will also find quick-start instructions. Note that by default the Visual Studio Marketplace view in Visual Studio Code is set to install release versions, so you may need to switch the view to pre-release versions. Once you’ve made that change, the download is quick and easy.

There are some important prerequisites. You need an Nvidia GPU and WSL running at least the Ubuntu 18.04 release as its default Linux. Once installed, Windows AI Studio checks for Conda and CUDA support in your WSL environment, so it can use the GPU. If those aren’t installed, Windows AI Studio offers a one-click option to ensure that all the prerequisite libraries are in place.

This uses Visual Studio Code’s remote server options to load and run an installation script. If you want to see it in operation, open Visual Studio Code’s built-in terminal and switch to its Output view. The installation can take a while, as it will download and install the relevant libraries. Expect it to take at least five minutes, and much more if you have an older PC. Windows AI Studio documentation is currently only on GitHub; Microsoft Learn only shows a placeholder page.
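Once the installation completes, a quick check from inside WSL will confirm that CUDA can see your GPU. The short script below assumes PyTorch is installed in the Conda environment; the extension’s own prerequisite checks go further than this.

# A quick sanity check, run inside the WSL Conda environment, that
# CUDA can see the GPU. Assumes PyTorch is installed; the extension's
# own prerequisite checks are more thorough.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))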

Your first model in Windows AI Studio

Once installed, Windows AI Studio adds a new chip-like icon to the Visual Studio Code extensions sidebar. Click it to launch the Windows AI Studio development environment. At launch it checks that your development environment still meets the necessary prerequisites. Once the checks have passed, and any updates have been made to your WSL configuration, the extension loads a What’s New page and populates its actions pane with its current set of features. Four actions are visible in the latest preview release, and more are planned, but only one currently works: model fine-tuning.

Additional planned options include Retrieval Augmented Generation (RAG), a playground for working with Microsoft’s Phi-2 foundation models, and access to a library of ready-to-use models from services like Hugging Face. Working with Phi-2 will allow you to build and train your own small language models, without needing to rely on cloud-hosted services like Azure OpenAI.
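The Phi-2 playground isn’t here yet, but nothing stops you experimenting with the model directly through Hugging Face’s transformers library in the meantime. The sketch below is a stand-in for the playground, not a preview of it; it assumes the transformers and accelerate packages are installed and that you have enough GPU memory for the model.

# A sketch of trying Phi-2 directly via Hugging Face transformers,
# while the Windows AI Studio playground is still unavailable.
# Assumes the transformers and accelerate packages and enough GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # some revisions of the model require this
)

inputs = tokenizer("Write a haiku about NPUs.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))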

RAG support will allow you to take an existing large language model and use it as a foundation for your own custom LLM without completely retraining it on your own data. RAG uses prompt engineering techniques to provide a more comprehensive context for the LLM to elicit more accurate answers. Using RAG, you can push more domain-specific or more up-to-date data into the LLM as part of a prompt, working with external data sources including your own business-specific information.
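The pattern is simple enough to sketch in a few lines. In the illustrative example below, the retrieve() function is a hypothetical stub standing in for whatever index you query, and the prompt shape is mine, not a Windows AI Studio template.

# An illustrative sketch of the RAG pattern: retrieve domain documents,
# then fold them into the prompt as grounding context. The retrieve()
# stub and the prompt layout are hypothetical placeholders.
def retrieve(query: str, top_k: int = 3) -> list[str]:
    # In a real pipeline this queries a vector index; stubbed here.
    return ["(retrieved document text)"] * top_k

def build_prompt(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("What is our returns policy?"))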

Adding tooling for RAG into Windows AI Studio should help you build and test vector indexes and embeddings for your data. Once you have this, you can start to develop search-driven pipelines that will ground your LLM applications, restricting their responses to your own domain using tools like TypeChat, Prompt Flow, and Semantic Kernel.
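Until that tooling arrives, a small vector index is easy to prototype by hand. This sketch uses the sentence-transformers package with brute-force cosine similarity; the model name is a common community default rather than anything Windows AI Studio prescribes, and a production pipeline would use a real vector store.

# A hand-rolled vector index: embed documents, then rank them against
# a query by cosine similarity. Uses the sentence-transformers package;
# the model name is a common default, not a Windows AI Studio choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Returns accepted within 30 days.", "Shipping takes 3-5 days."]

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(["How long do returns take?"], normalize_embeddings=True)

scores = np.dot(doc_vecs, query_vec[0])  # cosine similarity via dot product
print(docs[int(np.argmax(scores))])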

Quantizing a model with QLoRA

For now, however, this early preview release is focused on fine-tuning existing AI models, ready for conversion to ONNX and embedding in WinML projects. It’s worth trying for this feature alone, as local fine-tuning is a key requirement for any custom machine learning product where you want your model running on local hardware, not in the cloud.

To set up a model-tuning environment, you start by choosing a local folder, then you pick a model. The initial selection is small, with five open-source text-generation models available from Microsoft, Hugging Face, Mistral AI, and Meta. Here Microsoft is using the QLoRA tuning method: Quantized Low-Rank Adapters, an approach developed at the University of Washington that has shown impressive results. The initial paper describes a model family that reaches 99.3% of the performance of ChatGPT with only 24 hours of tuning on a single GPU.
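Windows AI Studio drives the process for you, but the same technique is available directly through Hugging Face’s peft and bitsandbytes libraries, which is useful for understanding what the tool is doing. The sketch below shows the shape of a QLoRA setup, loading a 4-bit quantized base model and attaching low-rank adapters; the model name and hyperparameters are illustrative choices, not the extension’s defaults.

# The shape of a QLoRA setup via Hugging Face peft and bitsandbytes:
# load the base model 4-bit quantized, then attach low-rank adapters.
# Model name and hyperparameters are illustrative defaults only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # the NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train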

If we’re to bring generative AI to our computers and handheld devices, this is the type of approach we need. We don’t need the complexity (or size) of a large language model; instead we need the same performance on our own data, in a small language model. QLoRA and similar techniques are a way to build these custom AIs on top of open-source foundational models.

Once you’ve chosen your model, click Configure project to begin setting up the project in both Windows and WSL. You may need to enter a Hugging Face access token, or sign up for access, before you can use a model. Windows AI Studio then presents a set of tuning parameters you can use to refine your model’s performance. For an initial test, simply accept the defaults and wait for the tuned model to be generated. There’s also the option of using additional datasets to improve tuning.

Fine-tuning a model using Olive

Once the model has been generated, you’re prompted to relaunch the Visual Studio Code window in a Windows AI Studio workspace. This switches you from Windows to WSL, ready to use the tools installed during setup. As part of the initial setup of your workspace, Windows AI Studio will install a Prompt Flow extension.

With the model workspace open, you can then use the Visual Studio Code terminal to start the Conda environment used to tune your model. You can now run Olive, using QLoRA on the default content or on your own dataset. This can take some time, so be prepared to wait. Even on a relatively high-end graphics card, tuning will take several hours.
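If you’re curious what’s happening under the hood, Olive can be driven directly from Python as well as from the extension. As I understand it, the package exposes a workflow runner that takes a JSON configuration; the config file name below is hypothetical, and the entry point may vary between Olive releases.

# Driving Olive directly from Python, as the extension does under the
# hood. The config file name is hypothetical, and the entry point may
# differ between Olive releases.
from olive.workflows import run as olive_run

olive_run("qlora_tuning_config.json")  # path to an Olive workflow config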

When the tuning process is complete you can use a simple Gradio web interface to test your trained model, before packaging it and using it in your applications. This is a fun little tool, and worth running before and after tuning so you can see how the process affects interactions.
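Gradio makes this kind of test harness trivial to build for yourself, too. Here’s a minimal sketch of one, wrapping a hypothetical generate() function around whichever model you’ve tuned.

# A minimal Gradio test harness of the kind used here, wrapping a
# hypothetical generate() function around your tuned model.
import gradio as gr

def generate(prompt: str) -> str:
    # Call your tuned model here; stubbed for illustration.
    return f"(model output for: {prompt})"

gr.Interface(fn=generate, inputs="text", outputs="text").launch()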

It’s important to remember that this is a very early release of what is a complex tool. Microsoft has done a lot to simplify working with AI models and tuning tools, but you still need to know what you want from the language model you’re building. There are a lot of variables you can tweak as part of the tuning process, and it pays to understand what each one controls and how it affects the resulting model.

For now, Windows AI Studio may well be a tool for the AI experts. However, it shows a lot of promise. As it evolves, and adds more features, it could easily become an essential part of the Windows development workflow—especially if AI accelerators become a common component in the next generation of PCs.

Copyright © 2023 IDG Communications, Inc.