How to set up regional failover for Google Cloud serverless NEGs

In this blog post, we show you how to set up highly available regional failover when using serverless network endpoint groups (NEGs). Load balancers traditionally rely on health checks to determine whether a backend is available and ready to receive traffic. However, health checks are not supported for serverless NEGs. Without them, the load balancer keeps sending requests to the backend in the region closest to the client, even when that backend is failing, and those requests fail. So how can we redirect traffic away from an unhealthy NEG?

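To solve this, Google introduced outlier detection. Outlier detection analyzes the HTTP responses returned by each backend and ejects a backend from the load balancing pool when configured error thresholds are met. For serverless NEGs, this functionality is available on backend services of the global external Application Load Balancer, which use the EXTERNAL_MANAGED load balancing scheme. The classic Application Load Balancer does not support it.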

Let’s see how to enable this on our backend service Terraform resource:

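resource "google_compute_backend_service" "backend" {
  ...

  # outlier_detection is a nested block, not a map attribute
  outlier_detection {
    # Eject an unhealthy backend for 30 seconds on its first ejection
    base_ejection_time {
      seconds = 30
    }
    # Eject after five consecutive 5xx responses
    consecutive_errors = 5
  }
}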

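With the configuration above, the backend service ejects a backend after it returns five consecutive 5xx HTTP responses. An ejected backend receives no traffic for 30 seconds before it is returned to the pool. If the same backend is ejected again, it stays out longer, because the actual ejection time is the base ejection time multiplied by the number of times the backend has been ejected.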
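The outlier_detection block accepts more fields than the two used above. The fragment below is a sketch only: the values are illustrative assumptions, not recommendations. Setting enforcing_consecutive_errors explicitly is a precaution so that ejection on consecutive 5xx responses does not depend on the provider or API default. Check the google_compute_backend_service documentation for your provider version before relying on it.

```hcl
resource "google_compute_backend_service" "backend" {
  # ... other arguments and backends as in the full example below

  outlier_detection {
    # How often backends are evaluated for ejection (illustrative value)
    interval {
      seconds = 10
    }

    # First ejection lasts 30 s; repeated ejections of the same backend last longer
    base_ejection_time {
      seconds = 30
    }

    # Consider a backend an outlier after five consecutive 5xx responses...
    consecutive_errors = 5
    # ...and actually eject it in 100% of those cases
    enforcing_consecutive_errors = 100

    # Never eject more than half of the endpoints in the pool at the same time
    max_ejection_percent = 50
  }
}
```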

Now let’s take a look at a full example:

locals {
  locations = toset(["europe-west1", "europe-west4"])
}

resource "google_compute_global_address" "ip" {
  name = "service-ip"
}

resource "google_compute_region_network_endpoint_group" "neg" {
  for_each = toset(local.locations)

  name                  = "neg-${each.key}"
  network_endpoint_type = "SERVERLESS"
  region                = each.key

  cloud_run {
    service = google_cloud_run_service.service[each.key].name
  }
}

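resource "google_compute_backend_service" "backend" {
  name     = "backend"
  protocol = "HTTP"

  # Outlier detection requires the global external Application Load Balancer
  load_balancing_scheme = "EXTERNAL_MANAGED"

  # One backend per regional serverless NEG
  dynamic "backend" {
    for_each = local.locations

    content {
      group = google_compute_region_network_endpoint_group.neg[backend.key].id
    }
  }

  outlier_detection {
    base_ejection_time {
      seconds = 30
    }
    consecutive_errors = 5
  }
}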

resource "google_compute_url_map" "url_map" {
  name            = "url-map"
  default_service = google_compute_backend_service.backend.id
}

resource "google_compute_target_http_proxy" "http_proxy" {
  name    = "http-proxy"
  url_map = google_compute_url_map.url_map.id
}

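resource "google_compute_global_forwarding_rule" "frontend" {
  name       = "frontend"
  target     = google_compute_target_http_proxy.http_proxy.id
  port_range = "80"
  ip_address = google_compute_global_address.ip.address

  # Must match the load balancing scheme of the backend service
  load_balancing_scheme = "EXTERNAL_MANAGED"
}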
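The full example references google_cloud_run_service.service[each.key] without defining it. A minimal sketch of those services, assuming one publicly reachable Cloud Run service per region in local.locations, could look like this. The container image is Google's public Cloud Run sample image used as a placeholder, the resource label "public" is hypothetical, and granting roles/run.invoker to allUsers is an assumption that only fits a service meant to be public.

```hcl
# One Cloud Run service per region; names only need to be unique per region
resource "google_cloud_run_service" "service" {
  for_each = local.locations

  name     = "service"
  location = each.key

  template {
    spec {
      containers {
        # Placeholder: Google's public "hello" sample container
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }
}

# Assumption: the service is public, so allow unauthenticated invocations
resource "google_cloud_run_service_iam_member" "public" {
  for_each = local.locations

  service  = google_cloud_run_service.service[each.key].name
  location = each.key
  role     = "roles/run.invoker"
  member   = "allUsers"
}
```

With both regions deployed, a failing revision in one region should be ejected after five consecutive 5xx responses, and the load balancer then sends that traffic to the remaining region.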

Conclusion

Because serverless NEGs do not support health checks, always configure outlier detection on backend services that use serverless NEGs to stay available during a regional failure. Outlier detection ejects unhealthy backends so that traffic is no longer sent to services in an unavailable region.

