The Mechanics of Deploying the Envoy Proxy at Lyft: Lessons Learned for Ambassador

Daniel Bryant
Published in Ambassador Labs
Jun 28, 2018 · 5 min read


There is no denying that the popularity of the Envoy Proxy project has rocketed over the past year. Matt Klein, one of the founders of the project at Lyft, has been a big part of driving Envoy's success by sharing his knowledge in blog posts and at conferences. The rise of surrounding open source projects, such as the Envoy-powered Istio service mesh and the Ambassador API gateway, has also contributed.

The design and technical qualities of Envoy make it an ideal proxy for modern (micro)service-based architectures, as well as for data stores and middleware services that communicate via Layer 7 (L7) protocols. Indeed, this is why the Ambassador Labs team chose to implement the Ambassador API Gateway using the Envoy Proxy.

At last year’s Microservices Virtual Practitioner Summit, Matt gave a great talk about the journey and mechanics of deploying Envoy at Lyft, and I was keen to revisit some of the wisdom shared in that talk and see how it relates to the work being undertaken in the open source Ambassador project.

Envoy Proxy 101

Matt began the talk with an Envoy refresher and reminded everyone of the core features of the cloud native open source edge and service proxy. The out-of-process architecture of Envoy (typically running as a sidecar) allows the project’s developers at Lyft to “do a lot of the really hard [networking and communication] stuff in one place”, and crucially allows application developers to focus on business logic. Envoy is written in modern C++11, which is both fast and productive, and it is a notable exception in the current Golang-focused world of operational utilities and tooling (although Rust is also up-and-coming in this space).

At the core of Envoy is an L3/L4 byte proxy. This can be used for protocols other than HTTP; indeed, the Lyft team runs Envoy in front of all MongoDB TCP connections in their stack. The L7 filter architecture makes it easy to customize and augment Envoy.
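
To make this filter architecture concrete, the following is a minimal, illustrative Envoy configuration (using the modern v3-style YAML rather than the v1 JSON that Lyft used at the time); the listener, cluster, and port values are hypothetical. The L3/L4 listener accepts raw connections, and the http_connection_manager network filter layers L7 (HTTP) routing on top:

# Illustrative sketch only: names, addresses, and ports are hypothetical.
static_resources:
  listeners:
  - name: ingress_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      # L7 processing is added as a network filter on the L3/L4 listener
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: local_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: local_service
    connect_timeout: 0.25s
    type: STRICT_DNS
    load_assignment:
      cluster_name: local_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 3000 }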

The proxy was built in an “HTTP/2 first” fashion, meaning that support for this protocol is excellent, as is the support for gRPC. Envoy Proxy also supports service discovery, active/passive health checking, and advanced load-balancing features such as timeouts, circuit breaking, rate limiting, and request shadowing. A lot of thought was also given to observability, and there is excellent support for monitoring, logging, and distributed tracing.
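
As a hedged sketch of what some of these resilience features look like in practice, here is an illustrative v3-style cluster definition combining active health checking, circuit breaking, and passive health checking (outlier detection); the service name, endpoint, and thresholds are hypothetical values, not Lyft's settings:

# Illustrative cluster sketch; all names and limits are hypothetical.
clusters:
- name: users_service
  connect_timeout: 0.25s
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: users_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: users.internal, port_value: 8080 }
  health_checks:                 # active health checking
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3
    healthy_threshold: 2
    http_health_check: { path: /healthcheck }
  circuit_breakers:              # cap concurrent connections and requests
    thresholds:
    - max_connections: 1024
      max_pending_requests: 1024
      max_requests: 1024
  outlier_detection:             # passive health checking: eject misbehaving hosts
    consecutive_5xx: 5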

It Starts at the Edge

Several years ago, the Lyft architecture was relatively simple and typical of web applications at the time. The primary Lyft application was deployed into the AWS cloud: a PHP monolith stored data in a MongoDB database, and end-user requests were served via an AWS Elastic Load Balancer (ELB). Today Lyft has embraced the (micro)service approach, and the monolith has been split and augmented with multiple services built in a variety of languages.

Envoy runs both as an edge proxy with routing and TLS support (exposed publicly via a TCP ELB) and as an internal service mesh for both services and data stores. Although this architectural journey has provided clear value, Matt cautioned that you cannot go from the “before” architecture to the “after” overnight.

Start with the edge proxy

One of Matt’s biggest quick wins when deploying Envoy Proxy was that it was easy to demonstrate value very quickly through the monitoring data Envoy collects, the enhanced load balancing and routing it provides, and its broad protocol support. I have read exactly the same comment about the benefits of observability when using the Ambassador API gateway. Alex Gervais, a staff software developer at AppDirect, recently wrote about this in the AppDirect blog post “Evolution of the AppDirect Kubernetes Network Infrastructure”:

“With the underlying Envoy component built-in metrics, we have full observability over the API gateway traffic and behavior through our existing Grafana dashboard.”
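
As an illustration of how those built-in metrics are typically exposed, the fragment below is a hedged sketch (not AppDirect's or Lyft's actual configuration): Envoy can push stats to a statsd sink, and it also serves Prometheus-format metrics from its admin endpoint at /stats/prometheus, which is a common way to feed a Grafana dashboard. The addresses and ports are hypothetical:

# Illustrative bootstrap fragment; addresses and ports are hypothetical.
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }   # serves /stats and /stats/prometheus
stats_sinks:
- name: envoy.stat_sinks.statsd
  typed_config:
    "@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
    address:
      socket_address: { address: 127.0.0.1, port_value: 8125 } # local statsd agent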

First Mongo, Second Everywhere

The use of Envoy at Lyft started as a rate limiter for connections between PHP applications and MongoDB. Not only could Envoy throttle traffic to prevent a “death spiral”, but it could also parse MongoDB traffic at L7 and generate interesting metrics and usage statistics.
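
A minimal sketch of that MongoDB setup, written in modern v3-style YAML (Lyft's original deployment used the older v1 JSON format): the mongo_proxy filter parses the MongoDB wire protocol at L7 to emit stats, while tcp_proxy forwards the bytes to the database. The listener and cluster names are hypothetical:

# Illustrative sketch only; names and ports are hypothetical.
listeners:
- name: mongo_listener
  address:
    socket_address: { address: 0.0.0.0, port_value: 27017 }
  filter_chains:
  - filters:
    - name: envoy.filters.network.mongo_proxy     # parses MongoDB traffic at L7 for stats
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.mongo_proxy.v3.MongoProxy
        stat_prefix: mongo
    - name: envoy.filters.network.tcp_proxy       # forwards the raw bytes to the database
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: mongo_tcp
        cluster: mongo_backend                    # defined elsewhere under clusters: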

Because of the organisation’s dependence on MongoDB, Envoy ended up running alongside nearly all services. It was then a small jump to use Envoy, already running as a language-agnostic sidecar, to provide networking functionality for all services, such as ingress buffering, circuit breaking, and observability.

After first using Envoy with AWS ELBs, which handled load balancing and the associated service discovery, Lyft realised that there would be even more value in creating direct connections between Envoy instances. However, a service discovery mechanism would be needed.

Rather than embracing an existing solution like etcd or ZooKeeper, which would require the operation of an additional data store component, the Lyft team built a simple, eventually consistent system. From here, the sky (or at least the edge of the Lyft networking perimeter) was the limit.
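
This approach lives on in Envoy's discovery APIs: instead of hard-coding hosts, a cluster can be told to fetch its endpoints from a discovery service over a simple REST polling interface. The fragment below is a hedged, illustrative v3-style example; the cluster names and refresh interval are hypothetical, and "discovery_service" would itself be defined as a static cluster:

# Illustrative endpoint-discovery sketch; names and timings are hypothetical.
clusters:
- name: users_service
  connect_timeout: 0.25s
  type: EDS
  eds_cluster_config:
    eds_config:
      resource_api_version: V3
      api_config_source:
        api_type: REST                      # simple polling, in the spirit of Lyft's discovery service
        transport_api_version: V3
        cluster_names: [discovery_service]  # the discovery service itself
        refresh_delay: 5s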

Envoy Proxy Configuration and the Control Plane

The initial rollout of Envoy Proxy was so successful within Lyft that the team determined it needed to run everywhere to get the full value of (what would later be called) a service mesh. Configuration was specified using JSON files that were bundled within each Envoy deployment, with one configuration for each type of Envoy: edge/ingress and service-to-service.

However, the team soon realised that these configuration files were tedious to write and update. The solution was to write a tool that automatically generated the configs (a simple form of control plane), which read in templates and inputs, ran them through the Jinja templating engine, and produced the final outputs.
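
The talk does not show Lyft's templates themselves, so the fragment below is purely a hypothetical sketch of the approach: a Jinja template is combined with per-service input data to emit one Envoy cluster definition per service.

# Hypothetical Jinja template fragment, not Lyft's actual tooling.
# "services" would be supplied as input data (e.g. name, host, port per service).
clusters:
{% for service in services %}
- name: {{ service.name }}
  connect_timeout: 0.25s
  type: STRICT_DNS
  load_assignment:
    cluster_name: {{ service.name }}
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: {{ service.host }}, port_value: {{ service.port }} }
{% endfor %}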

Envoy control plane APIs

The Ambassador Labs team found similar issues with generating Envoy configuration files, and the Kubernetes-native Ambassador Edge Stack API gateway was partly written to address this. Developers (or operators) can specify simple Kubernetes annotations that map to the core functions provided by Envoy Proxy, such as routing, authentication, and rate limiting. In effect, Ambassador is another form of control plane for the Envoy Proxy. An example Ambassador config can be seen below:

apiVersion: v1
kind: Service
metadata:
  name: httpbin
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: httpbin_mapping
      prefix: /httpbin/
      service: httpbin.org:80
      host_rewrite: httpbin.org
spec:
  ports:        # minimal port definition (assumed) so the Service object is valid
  - port: 80
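
When this manifest is applied (for example with kubectl apply), Ambassador, which watches the Kubernetes API for getambassador.io/config annotations, translates the Mapping into Envoy route configuration: requests arriving at the /httpbin/ prefix are proxied to httpbin.org, with the Host header rewritten via host_rewrite.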

Get Involved with the Communities!

Matt concluded his talk by inviting anyone interested to get involved with the Envoy community, and to watch out for the development of additional service mesh control planes like Istio. I would also encourage interested engineers to get involved with the Ambassador Labs community, which can be found on the OSS Ambassador Labs Slack.
