Buildkite has added an analytics tool to its continuous integration/continuous delivery (CI/CD) platform that identifies flaky tests.
Buildkite co-CEO Keith Pitt said the company’s Test Analytics tool provides continuous performance monitoring and real-time insights for test suites. Poorly written tests only serve to waste DevOps teams’ time, yet rather than eliminating those tests, most teams simply re-run them on the assumption that the code being tested will eventually pass because the test itself is poorly constructed, Pitt noted.
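The re-run workaround Pitt describes often looks something like the sketch below. The function names and the deterministic fail-twice-then-pass behavior are purely illustrative, not anything from Buildkite; real flakiness is nondeterministic.

```python
# Illustrative sketch of the "just re-run it" workaround for flaky tests.
# All names here are hypothetical; real flaky tests fail at random.

attempts = {"n": 0}

def flaky_test() -> bool:
    """Stand-in for a flaky test: fails its first two runs, then passes.
    (Deterministic here for illustration only.)"""
    attempts["n"] += 1
    return attempts["n"] >= 3

def run_with_retries(test, max_attempts: int = 3) -> bool:
    """The common workaround: re-run a failing test until it goes green,
    instead of fixing the underlying flakiness."""
    for _ in range(max_attempts):
        if test():
            return True
    return False

print(run_with_retries(flaky_test))  # → True, but only after two wasted runs
```

Every retry here is compute time and developer wait time spent masking a problem rather than fixing it, which is exactly the waste Test Analytics is meant to surface.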
Buildkite reports that developers on its platform spend more than 800 million seconds (the equivalent of roughly 25 years) manually re-running flaky tests each month.
In addition to being integrated with the Buildkite CI/CD platform, the Test Analytics tool can also be applied to tests residing in CI platforms such as CircleCI, GitHub Actions and Jenkins.
Test Analytics includes a dashboard that identifies the slowest and least reliable tests to help prioritize test remediation efforts. Historical views and trend graphs are also available to provide further insight into recurring issues.
DevOps teams can also set user-defined thresholds that determine when a test counts as slow or unreliable, triggering custom alerts when those thresholds are crossed. Finally, integrations with commonly used testing frameworks help pinpoint where tests waste time, for example, by running database queries against application programming interfaces (APIs) that have timed out.
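A minimal sketch of how user-defined thresholds like these might work. The threshold values, class, and function names below are assumptions for illustration, not Buildkite’s actual API or defaults.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real Test Analytics settings will differ.
SLOW_SECONDS = 2.0        # a test averaging longer than this is "slow"
MIN_RELIABILITY = 0.95    # a pass rate below this is "unreliable"

@dataclass
class TestStats:
    name: str
    mean_duration: float  # seconds
    runs: int
    passes: int

    @property
    def reliability(self) -> float:
        return self.passes / self.runs if self.runs else 0.0

def alerts(stats: TestStats) -> list[str]:
    """Return an alert message for each threshold this test crosses."""
    out = []
    if stats.mean_duration > SLOW_SECONDS:
        out.append(f"{stats.name}: slow ({stats.mean_duration:.1f}s avg)")
    if stats.reliability < MIN_RELIABILITY:
        out.append(f"{stats.name}: unreliable ({stats.reliability:.0%} pass rate)")
    return out

# A test that is both slow and flaky trips both alerts.
print(alerts(TestStats("test_checkout", mean_duration=3.2, runs=100, passes=88)))
```

In practice the thresholds would be tuned per suite; a 4-second integration test may be perfectly normal while a 4-second unit test signals a problem.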
Pitt noted that most tests today are written by developers and software engineers rather than dedicated testing teams. As a result, when a test returns flaky results, the chances are high that the problem lies with the test itself.
Organizations are, of course, trying to determine how far left to shift application testing. Developers can write their own tests, but how thorough and objective those tests are is debatable. There are also certain tests that should apply to every application; organizations can embed those within their DevOps workflows to make sure they are always run.
Obviously, the more testing that occurs, the fewer issues DevOps teams will encounter after an application is deployed. When tests are flaky, however, it becomes easier for developers to justify not running them at all, even though a flaky test might still surface an issue worth addressing. Beyond increasing confidence in test quality, knowing which tests are flaky lets a DevOps team decide which tests should run first. That matters at a time when more organizations are trying to increase developer productivity, which is often undermined by waiting for tests to complete before more code can be written.
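One simple ordering policy, sketched below with made-up suite data: run reliable, fast tests first so developers get a trustworthy signal as early as possible. This is an illustrative assumption about how a team might use flakiness data, not a description of Buildkite’s scheduling.

```python
# Hypothetical suite data: test name -> (pass_rate, mean_seconds).
suite = {
    "test_login":    (1.00, 0.3),
    "test_search":   (0.80, 1.2),   # flaky
    "test_checkout": (0.99, 4.0),
}

# Order by reliability (descending), then by speed (ascending),
# so trustworthy, quick tests deliver feedback first.
ordered = sorted(suite, key=lambda t: (-suite[t][0], suite[t][1]))
print(ordered)  # → ['test_login', 'test_checkout', 'test_search']
```

Teams could just as reasonably invert the policy, running known-flaky tests first to get the noisy failures over with early; the point is that the data makes the choice deliberate.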
Eventually, the overall quality of tests should improve as machine learning algorithms and other forms of data science are applied. In the meantime, however, the onus for improving application testing falls squarely on the engineers who create those tests. The challenge is that many of them may not even be aware of how flaky a test really is until someone shows them empirical data that can’t be disputed.