OverOps is adding a series of dashboards to its analytics platform that employ machine learning algorithms to identify which software application anomalies in a pre-production environment are likely to have the most impact on a production environment.
Company CTO Tal Weiss said one of the biggest challenges DevOps teams face is prioritizing which issues to fix first in an application. The Reliability Scoring capability being added to the company’s namesake code analysis platform employs machine learning algorithms to track errors, regressions and slowdowns introduced during every phase of pre-production. In addition to enabling DevOps teams to visually identify the root cause of an issue with a single click, the platform uses that data to determine which issues will have the greatest impact on reliability.
Weiss noted that Reliability Scoring can also be employed to set thresholds for acceptable application performance, which in turn determine whether an anomaly counts as an issue at all. The data collected using that capability can then be fed back into a continuous integration/continuous deployment (CI/CD) platform to set up gates that prevent suspect code from being promoted, he said.
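The article does not describe OverOps’ actual API, but the gating pattern Weiss outlines is a familiar one, and a minimal sketch can make it concrete. In the hypothetical example below, the names `get_reliability_score`, `may_promote`, the threshold value and the build IDs are all illustrative assumptions; a real pipeline would pull the score from the scoring service rather than a hard-coded table, and block promotion when the check fails.

```python
# Hypothetical threshold; a real CI/CD gate would fetch the score from a
# reliability-scoring service rather than a hard-coded lookup table.
RELIABILITY_THRESHOLD = 80.0

def get_reliability_score(build_id: str) -> float:
    """Stand-in for a call to a scoring service (assumption for the sketch)."""
    scores = {"build-123": 91.5, "build-124": 72.0}
    return scores.get(build_id, 0.0)

def may_promote(build_id: str) -> bool:
    """Gate check: True means the build may advance to the next stage."""
    return get_reliability_score(build_id) >= RELIABILITY_THRESHOLD

print(may_promote("build-123"))  # prints True: 91.5 clears the 80.0 threshold
print(may_promote("build-124"))  # prints False: 72.0 is below the threshold
```

In practice the boolean result maps to a process exit code or pipeline step status, which is how most CI/CD systems decide whether to halt a promotion.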
The primary benefit, however, should be a major reduction in the amount of troubleshooting any DevOps team needs to engage in after an application is deployed in production, said Weiss. In addition, the Reliability Dashboards will provide a central mechanism through which developers, testing professionals and IT operations teams can collaboratively track and measure the progress of an application development project, he added.
The Reliability Scoring tool OverOps developed employs “micro-agents” that operate between application code and the hardware. Weiss said those micro-agents not only capture data that was never previously available, but also deduplicate, classify and gate critical issues as they move into staging and production. The Reliability Dashboards visualize that data in Grafana, an open source data visualization platform. DevOps teams can also opt to feed that data into any existing dashboards they already have in place.
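The deduplication Weiss describes is, at its core, a matter of collapsing many raw error events into a smaller set of distinct issues. The sketch below is not OverOps’ implementation; it assumes a simplified fingerprint built from the exception type and the top stack frame, whereas real agents would use much richer signals such as the full stack trace and code version.

```python
from collections import Counter
from typing import NamedTuple

class ErrorEvent(NamedTuple):
    exception: str   # e.g. the exception class name
    top_frame: str   # function at the top of the stack trace
    stage: str       # e.g. "staging" or "production"

def fingerprint(event: ErrorEvent) -> str:
    """Collapse repeated occurrences of the same defect into one key.
    Simplified assumption: exception type plus top stack frame."""
    return f"{event.exception}@{event.top_frame}"

def deduplicate(events: list[ErrorEvent]) -> Counter:
    """Count occurrences per unique issue rather than per raw event."""
    return Counter(fingerprint(e) for e in events)

events = [
    ErrorEvent("NullPointerException", "OrderService.submit", "staging"),
    ErrorEvent("NullPointerException", "OrderService.submit", "production"),
    ErrorEvent("TimeoutError", "PaymentClient.charge", "staging"),
]
issues = deduplicate(events)
# Three raw events collapse into two distinct issues.
print(issues)
```

A count that suddenly spikes for an existing fingerprint signals a known issue escalating in priority, while a fingerprint never seen before signals a genuinely new issue.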
In general, Weiss said the metrics surfaced by the Reliability Dashboards are critical to justifying investments in DevOps over the long term. Organizations expend a lot of time and effort making the transition to DevOps, but most of them don’t really know to what degree they are getting better at building and deploying applications. Reliability Scoring dashboards provide the historical context required to make those assessments, he said.
Far too many DevOps teams are still starved for the data needed to make critical decisions. In the absence of that data, teams address issues as they arise, rather than based on the impact those issues might have on the business. Worse yet, it’s difficult to distinguish genuinely new issues from those that are merely another manifestation of a known issue that has suddenly become a much higher priority.
Regardless of the root cause of any problem, however, DevOps teams would do well to remember that the things that get measured are usually the first things to get fixed.