Incident Review: Meta-Review, August 2020

Honeycomb

At Honeycomb, we’re lucky to work on a product that has a track record of relatively few outages and major incidents, so few that we sometimes fret about how to keep our incident response skills sharp. Our on-call engineer became aware of the issue when he was paged by an SLO burn alert.

DevOps vs Site Reliability Engineering: Concepts, Practices, and Roles

Altexsoft

For over a decade, two similar concepts — DevOps and Site Reliability Engineering (SRE) — have been coexisting in the world of software development. This article explains how DevOps and SRE facilitate building reliable software, where they overlap, how they differ from each other, and when they can efficiently work side by side.

Trending Sources

AI Product Management After Deployment

O'Reilly Media - Ideas

The field of AI product management continues to gain momentum. As the AI product management role advances in maturity, more and more information and advice has become available. One area that has received less attention is the role of an AI product manager after the product is deployed.

Evaluating Splunk On-Call Alternatives

xmatters

Splunk On-Call (formerly VictorOps) is a popular incident response and on-call management platform that allows engineering and operations teams to collaborate with ease and resolve issues faster. These more distinct offerings focus on the initial steps of the incident management lifecycle: detect and respond.

Impact in Production

LaunchDarkly

On April 23, Dylan Etkin, CEO and Founder of sleuth.io, spoke at our Test in Production Meetup on Twitch. Dylan shared what it means to track impact in a production environment, what happens if you don’t, and some actions you can take today to move in the right direction. Watch Dylan’s full talk. “…the

Building a Kubernetes-Based Platform

Daniel Bryant

Kubernetes has been widely adopted as a container manager, and has been running in production across a variety of organisations for several years. As such, it provides a solid foundation on which to support the other three capabilities of a cloud native platform: progressive delivery, edge management, and observability.