If you’re a software engineer, you’ve likely heard all about shift left, a practice that can streamline certain aspects of software development.
But shift left isn’t just for developers. It can be equally valuable for site reliability engineers (SREs). Although the main mission of SREs is ensuring the reliability of software after it has been deployed rather than actually developing software, encouraging shift left practices as part of a software development strategy can nonetheless make it easier to optimize reliability.
To prove the point, here’s an overview of what shift left means and why it matters for SREs.
What Does Shift Left Mean?
Shift left is the practice of detecting application problems early in the software development process. The core idea is that instead of waiting until just before deployment to test software for bugs, security issues or other problems, teams should begin testing as early as possible. In other words, they should shift some testing to the “left” of the software development life cycle instead of testing only in the middle, just ahead of deployment. Hence the term “shift left.”
In so doing, they can detect problems early in the development process, at which point they are typically easier to resolve. If you don’t catch a performance, reliability or security bug until just before you plan to push your application release into production, you may have to overhaul a large part of your codebase to fix the issue. In many cases, you’ll have to change not just the specific code that triggered the bug, but also other code that depends on or integrates with the buggy code. But by finding buggy code early, developers can often fix the problem with minimal adjustments to the application as a whole.
Shift left is a high-level concept, and it can be implemented in multiple ways. In some cases, developers may start running performance tests on new code right after they write it, even if it has not yet been integrated into the main codebase. In other cases, shift left testing could mean compiling some parts of the codebase in order to run tests against them before the application as a whole is built and tested in a dev/testing environment. Either way, the result is a testing process that begins earlier than it would under a conventional approach.
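As a rough illustration of the first approach, here is a minimal Python sketch of a performance check that a developer could run against a single function before it is merged. The function under test (`parse_order_ids`), the helper names and the latency budget are all hypothetical, chosen only for the example; a real team would pick budgets based on its own reliability targets.

```python
import statistics
import time


def parse_order_ids(raw: str) -> list[int]:
    """Hypothetical unit under test: parse a comma-separated list of IDs."""
    return [int(part) for part in raw.split(",") if part.strip()]


def median_latency_seconds(func, arg, runs: int = 20) -> float:
    """Call func(arg) several times and return the median wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def within_budget(func, arg, budget_seconds: float, runs: int = 20) -> bool:
    """Return True if the median latency stays at or under the budget."""
    return median_latency_seconds(func, arg, runs) <= budget_seconds


if __name__ == "__main__":
    # Assumed budget for illustration only; not a recommended value.
    LATENCY_BUDGET_SECONDS = 0.05
    payload = ",".join(str(i) for i in range(10_000))
    ok = within_budget(parse_order_ids, payload, LATENCY_BUDGET_SECONDS)
    print("within budget" if ok else "over budget")
```

A check like this could run locally or in a pre-merge job, so that a slow code path is flagged while the change is still small and isolated.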
Importantly, shift left doesn’t mean that teams skip the tests that they would normally perform just prior to deployment. Those tests, which are important for catching bugs that materialize only when testing the application as a whole rather than individual parts of it, typically still happen as part of a shift left strategy. Thus, shift left means that testing begins earlier and coincides with a broader portion of the overall development pipeline, not that testing is simply shifted to the beginning of the pipeline at the expense of mid-pipeline tests.
Shift Left and SREs
Again, the main job of SREs is to manage the reliability of systems. They may spend part of their days working alongside software engineers to help devise architectures and coding strategies that maximize the resiliency of applications against reliability and performance issues. But SREs also devote much of their time to managing what happens post-deployment. They use monitoring and observability tools to detect problems that arise in production environments. They then take the lead in responding to them.
In that sense, SREs may not seem to have much to gain from shift left. Shift left is a strategy that aligns with software development first and foremost, and software development is not the core focus of SREs.
Nonetheless, a shift left strategy can help SREs do their jobs better, for several reasons:
- Fewer problems in production: Most obviously, shift left reduces the rate at which reliability issues reach production environments. In turn, it reduces the number of incidents SREs need to respond to.
- Optimizing software pre-deployment: By exposing reliability issues early in the development process, shift left is one way for SREs to acquire the insights they need to collaborate with developers on building reliability into applications. Shift left could highlight buggy dependencies or suboptimal architectural patterns, for example, which SREs can encourage developers to address.
- Granular reliability insights: Because shift left is a great way to trace software problems to the specific code that triggers them, it provides highly granular visibility into reliability. It can help SREs find the weakest links within an application’s codebase or architecture. These weak spots can be harder to detect when monitoring or observing an application as a whole.
How SREs Can Follow the Shift Left Methodology
Because the implementation of shift left practices is a task that ultimately falls to developers, SREs need to work closely with development teams (and DevOps engineers, if they exist in the organization) to make shift left happen.
The first step in that process is to get buy-in for shift left among developers. SREs can do this by highlighting the ways in which shift left can streamline the work of developers and reduce the complexity of fixing bugs. That’s important to communicate because it ensures that developers understand that shift left benefits everyone, not just SREs.
SREs should also work with developers to identify the best approaches and practices for implementing shift left. A key factor to consider here is where reliability problems most often arise, and how shift left can help to detect them earlier. If most performance problems stem from code written by individual developers, for instance, testing code as soon as it is written may be the best way to detect problems early on. In contrast, in situations where performance issues are triggered most often by environment configurations, the ability to test applications early under different environment variables could be the best way to surface issues early in the development process.
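To make the environment-configuration case concrete, here is a minimal Python sketch that runs the same check under several sets of environment variables. Everything specific in it is an assumption made for illustration: the variable names (`APP_WORKERS`, `APP_CONNS_PER_WORKER`), the `connection_pool_size` function and the connection limit are hypothetical stand-ins for whatever settings and constraints a real application has.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(overrides: dict[str, str]):
    """Apply environment variable overrides, then restore the old values."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in saved.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


def connection_pool_size() -> int:
    """Hypothetical code under test: derive pool size from the environment."""
    workers = int(os.environ.get("APP_WORKERS", "4"))
    per_worker = int(os.environ.get("APP_CONNS_PER_WORKER", "5"))
    return workers * per_worker


# Assumed limit of the production database, for illustration only.
MAX_DB_CONNECTIONS = 100


def failing_configs(configs: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the configurations that would exceed the connection limit."""
    failures = []
    for config in configs:
        with temporary_env(config):
            if connection_pool_size() > MAX_DB_CONNECTIONS:
                failures.append(config)
    return failures


CANDIDATE_CONFIGS = [
    {"APP_WORKERS": "4", "APP_CONNS_PER_WORKER": "5"},
    {"APP_WORKERS": "16", "APP_CONNS_PER_WORKER": "5"},
    {"APP_WORKERS": "32", "APP_CONNS_PER_WORKER": "5"},
]

if __name__ == "__main__":
    for config in failing_configs(CANDIDATE_CONFIGS):
        print("would exceed connection limit:", config)
```

SREs are often the people who know which production limits matter, so they are well placed to supply the list of configurations and thresholds that a check like this exercises.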
Finally, SREs should partner with developers to respond to the issues that shift left reveals. After all, while developers typically don’t pay close attention to what happens post-deployment, SREs have a special ability to understand how problems in development translate to problems in production. SREs may therefore be able to help developers identify the most effective way to fix bugs based on production environment requirements, even if it’s not the quickest or simplest way.
Why SREs Need to Shift Left, Too
Although the shift left methodology has not traditionally been closely associated with SRE, shift left is a crucial tool in the SRE toolbox. By helping teams to detect reliability issues when they are easier to resolve, shift left puts SREs in a stronger position to maximize the overall reliability of applications. At the same time, it reduces the number of production environment issues that SREs have to manage.