During an online SLOconf event today, Nobl9 revealed that the specification it created to enable IT teams to achieve service level objectives (SLOs) is now available under an open source Apache license.
Brian Singer, chief product officer for Nobl9, said the goal of the OpenSLO project is to make a YAML specification format for defining SLOs, along with a parser and basic validation and testing capabilities, more accessible. Achieving that goal is critical at a time when organizations are deploying microservices-based applications whose dependencies often adversely impact performance and availability.
While the concepts of an SLO and a service level agreement (SLA) have been around for a long time, IT teams that have embraced DevOps best practices by hiring site reliability engineers (SREs) are attempting to programmatically ensure that SLOs are attained and, just as importantly, maintained, said Singer. Most SLAs have, historically, not been well enforced. OpenSLO makes it possible to achieve that goal by embedding SLOs within a YAML file that becomes part of a Git-based workflow, he noted.
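To illustrate what basic validation of an SLO definition inside a Git-based workflow might look like, here is a minimal sketch in Python. It is not the OpenSLO parser, and the field names (`name`, `service`, `target`) are hypothetical simplifications rather than the actual OpenSLO schema. It assumes the YAML file has already been loaded into a dictionary.

```python
# Hypothetical, simplified SLO check. The field names below are illustrative
# assumptions and do not reflect the real OpenSLO schema.
REQUIRED_FIELDS = ("name", "service", "target")


def validate_slo(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    # Every definition must carry the fields this sketch treats as mandatory.
    for field in REQUIRED_FIELDS:
        if field not in doc:
            errors.append(f"missing field: {field}")

    # The target is expressed as a ratio, for example 0.999 for "three nines".
    target = doc.get("target")
    if target is not None:
        is_number = isinstance(target, (int, float)) and not isinstance(target, bool)
        if not is_number or not 0 < target < 1:
            errors.append("target must be a number strictly between 0 and 1")

    return errors


if __name__ == "__main__":
    example = {"name": "checkout-availability", "service": "checkout", "target": 0.999}
    problems = validate_slo(example)
    print("valid" if not problems else problems)
```

A check along these lines could run as a step in a continuous integration pipeline, so that a malformed SLO definition is rejected before it is merged.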
Niall Richard Murphy, co-author of the book Site Reliability Engineering, has signed on to be a core OpenSLO contributor working alongside committers from GitLab, Dynatrace and Nobl9. The ultimate goal is to make it simpler to share and migrate portable SLO specifications across multiple application development projects by constructing them using an object model, said Singer.
Nobl9, meanwhile, will continue to operate a software-as-a-service (SaaS) platform for managing SLOs based on OpenSLO. Singer noted there will be an increased need for that platform as OpenSLO becomes more widely employed. The Nobl9 SLO platform is compatible with monitoring platforms such as Datadog, New Relic and Prometheus, and uses data collected from monitoring and observability platforms to calculate acceptable rates of error per service. It can be configured to trigger alerts and even workflows in anticipation of outages.
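The idea of an acceptable rate of error per service is commonly expressed in SRE practice as an error budget, with alerts driven by how fast that budget is being consumed (the burn rate). The sketch below shows that general convention only. It is not a description of how the Nobl9 platform performs the calculation, and the function names and the alert threshold of 2.0 are illustrative assumptions.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Fraction of events allowed to fail, e.g. about 0.001 for a 0.999 target."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO permits; higher values mean it will run out early.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget_fraction(slo_target)


def should_alert(bad_events: int, total_events: int, slo_target: float,
                 threshold: float = 2.0) -> bool:
    """Alert when the budget burns at or above the (assumed) threshold."""
    return burn_rate(bad_events, total_events, slo_target) >= threshold


if __name__ == "__main__":
    # 300 failed requests out of 10,000 against a 99% availability target.
    print(burn_rate(300, 10_000, 0.99))
    print(should_alert(300, 10_000, 0.99))
```

Alerting on burn rate rather than on raw error counts is what allows a warning to fire before the budget is exhausted, which is one way to act in anticipation of an outage.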
Accessible via a CLI, GUI or API, the platform also enables DevOps teams to create business rules and define “facets” of users based on the application experiences required. In addition to tracking groups of users, DevOps teams can identify individual users who are being poorly served. They also can associate contractual obligations with an SLO for a critical business period.
The platform also makes it easier to determine how to make systems more reliable or, when possible, where reliability goals can be lowered to reduce costs. IT organizations should also spend much less time determining the root cause of any issue because a combination of real-time and historical data is more accessible. Today, it’s not uncommon for IT teams to spend weeks looking for the source of an application degradation issue, only to discover the problem can be fixed in a matter of minutes.
It’s still early days as far as adoption of SLO-as-code is concerned, but the days when SLAs or SLOs were, at best, aspirational statements of intent are coming to an end. The expectation is SLOs will soon be maintained via YAML files embedded within application code and distributed across a rapidly expanding enterprise with more interdependencies than any IT team can manually track.