Why Federate ArgoCD?

25 Jan, 2023

The revolution and evolution of GitOps practices in mainstream enterprises stem from giving teams a process to streamline their unique paradigms and practices, with the sole intention of producing more efficient integration, testing, delivery, deployment, analytics, and governance of code. Traditionally, ArgoCD has been deployed as a centralized CD tool within the agile architecture of CI/CD pipelines. This approach is acceptable when ArgoCD serves teams whose end goals are the same, whose use cases are aligned, and who are not concerned about who gets access to the cluster hosting ArgoCD. However, when the use case requires a dynamic ArgoCD platform that can serve the specific needs of every team, while a robust, security-intensive, centralized core prevents those individual and sometimes conflicting use cases from causing unnecessary changes across the entire ecosystem, a federated ArgoCD platform is required.

The concept of federation is a decentralization of power (i.e., decision-making), wherein particular entities are given autonomy over something while still adhering to a core power that binds all the federated entities. Why go through the pain of explaining federation? Because ArgoCD gives teams the power to automate their continuous delivery processes into Kubernetes. What it doesn't explicitly dictate is security, RBAC, autonomy, and who can delete, update, or create ArgoCD resources such as clusters, applications, and projects. This exposes a fundamental security flaw in the centralized approach: if you give a group autonomy to add or delete resources, you implicitly give every person in that group autonomy to add or delete clusters. Furthermore, access to the cluster hosting the ArgoCD components becomes problematic. If multiple teams are to be onboarded, their clusters will need to be added to ArgoCD, which means either giving individuals access so they can add their own clusters or dedicating an entire resource to manage this process.

So why a federated architecture?

The purpose of this architecture is to achieve three central goals.

  1. Have a centralized ArgoCD 'Master' controller wherein all decisions regarding RBAC policies, the addition of clusters, and the deployment and management of the entire ArgoCD platform are handled by this central controller.
  2. To create a robust platform wherein the source of truth for ArgoCD is from Git and not the Kubernetes cluster or end user. Hence, if any component of 'worker' ArgoCD instances is deleted (including namespaces), the Master ArgoCD is more than capable of restoring (self-healing) the desired state. Moreover, this approach achieves a level of security wherein end users, who are not part of the ArgoCD core group, will not have access to the main ArgoCD clusters (master-prod, worker-prod, and worker-uat, etc.).
  3. To give autonomy to each team. Though we have a centralized master controller, it is the master's purpose to deploy a worker ArgoCD on behalf of each team, meaning every team will have its own dedicated ArgoCD instance. This allows teams to innovate on their CI/CD processes in isolation from other teams' unique use cases while maintaining an interconnected ecosystem with other teams, as they are still tied to the same controller. Similarly, the issue of scalability is truly solved, as each team's dedicated ArgoCD instance can manage as many clusters as the team wants without compromising the resource and algorithmic integrity of ArgoCD within its Kubernetes cluster or within the wider ArgoCD ecosystem. A sketch of how such a team boundary can be expressed is shown after this list.
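
To illustrate the third goal, below is a minimal sketch of an ArgoCD AppProject that scopes a single team. The project name, repository URL, cluster endpoint, and namespace pattern are hypothetical placeholders; in our setup, the real project definitions live under the addprojects/ directory of the Git repository described later.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: 47deg-argocd-team1-prod                      # hypothetical team project name
  namespace: argocd
spec:
  description: Boundary for Team1's worker ArgoCD and its workloads
  # Only this team's Git repository may be used as a source.
  sourceRepos:
    - https://github.com/47deg/team1-apps.git        # hypothetical repository
  # Deployments are restricted to the team's own cluster and namespaces.
  destinations:
    - server: https://team1-cluster.example.com:443  # hypothetical cluster endpoint
      namespace: team1-*
  # No cluster-scoped resources may be created from this project.
  clusterResourceWhitelist: []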

Federated ArgoCD Architecture

Explaining the Architecture

The ‘Master’ ArgoCD

As shown below, the master ArgoCD is deployed 'manually', or through automation via any event-driven system, into a dedicated Kubernetes cluster; hence it is not managed by ArgoCD itself. Within the cluster, it has a metadata.namespace: 'argocd', an ingress-nginx-controller, and an ingress rule pointing to a hostname. For this article, we are using the hostname argocd.dev.47deg.com; however, as long as the hostname is registered in a DNS record, we can use the hostname argocd..47deg.com. How? Because the real identity of this hostname is *.47deg.com. Hence, we are leveraging a round-robin ingress architecture wherein traffic is routed based on the subdomain rather than the main domain (47deg.com).
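
As an illustration of that ingress rule, a minimal sketch for the master instance is shown below. The ingress class name, the backend-protocol annotation, and the service port are assumptions rather than values taken from our master-ingress.yaml.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: master-argocd-ingress
  namespace: argocd
  annotations:
    # argocd-server terminates TLS itself, so pass HTTPS straight through.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx              # assumed class of the ingress-nginx-controller
  rules:
    - host: argocd.dev.47deg.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 443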

The Git Repository

The source code of our master ArgoCD is hosted in a Git repository, and all other 'worker' ArgoCD instances are hosted in the same repository. The schema of the repository has been designed intentionally to support smooth continuous deployment of the ArgoCD instances. For example, when the master has been deployed, it will host the application below (the blank and truncated values are template placeholders that are populated per team and environment):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd--
  namespace: argocd
spec:
  destination:
    namespace: argocd-
    server:  # change depending on uat or prod i.e. https://rd-argocd-aks-uat-xxxxxx.xxx.eastus2.azmk8s.io:443
  project: 47deg-argocd--
  source:
    path: argocdv2.0/prod/deploy/
    repoURL: /47deg-argocd.git
    targetRevision: main # git branches can be determined by {{ teamName }}
  syncPolicy:
    automated:
      selfHeal: true

This gives the master ArgoCD the ability to manage all other ArgoCD instances (workers) based on one source of truth. Each time a team wants to be onboarded, the event-driven system will request a few inputs from the end user, as shown below:

## Inputs which are stored in Azure KeyVault ##
# echo <client_secret> | base64. Put the decrypted secret as input.

## Inputs required during automation ##
# This has to be a **unique** name which has not been claimed by any other team.
# i.e. prod, uat - these are the only two options.

# Groups (object_id) are tied to Enterprise Applications. The permissions of the groups below are based on the project.
# 26ed52e9-xxxx-xxxx-xxxx-a4056dff490a = 47deg-ArgoCD-ArgoCD-Dev
# 7be98a90-xxxx-xxxx-xxxx-30a9687f5c88 = 47deg-ArgoCD-Logging-Dev
# 4ceca436-xxxx-xxxx-xxxx-6f079137a4d1 = 47deg-ArgoCD-Monitoring-Dev
# b216ca0f-xxxx-xxxx-xxxx-c90a0c25ba7d = 47deg-ArgoCD-Team1-Dev
# 772af769-xxxx-xxxx-xxxx-f025e3c00db5 = 47deg-ArgoCD-Team2-Dev
# 65a7e9ae-xxxx-xxxx-xxxx-f3989bfb9e94 = fpxxxxxx@47deg.com

# rolepermission can only be 'admin' or 'readonly'. If not stated, it defaults to 'readonly'.
# client_id is found in the Azure enterprise application.

The repository is laid out as follows:
argocdv2.0/
├── master
│   ├── argocd-cm-manager.yml
│   ├── argocd-rbac-cm-manager.yml
│   ├── argocd-secret-manager.yml
│   ├── clusterole.yml
│   ├── configurecluster
│   │   ├── configure-cluster.yml
│   │   └── configure-sa.yml
│   ├── deploy
│   │   └── argocddeploy.yaml
│   ├── master-ingress.yaml
│   ├── master_hpa.yml
│   ├── namespaces-master.yaml
│   └── nginx-controller
│       ├── metricsserver.yml
│       ├── nginx-hpa.yml
│       └── nginxingresscontroller.yaml
└── prod
    ├── addapp
    │   ├── hpa_app.yml
    │   ├── ingress_app.yml
    │   ├── logging_app.yml
    │   ├── namespace_app.yml
    │   └── rbac_app.yml
    ├── addprojects
    │   └── master-project.yml
    ├── addrepo
    │   └── master_git.yaml
    ├── clusterole.yml
    ├── configurecluster
    │   ├── configure-cluster.yml
    │   └── configure-sa.yml
    ├── deploy
    │   └── logging
    │       └── argocddeploy.yaml
    ├── hpa
    │   ├── hpa.yml
    │   └── metricsserver.yml
    ├── ingress
    │   └── argocdingress.yaml
    ├── namespaces
    │   └── namespaces.yaml
    ├── nginx-controller
    │   ├── metricsserver.yml
    │   ├── nginx-hpa.yml
    │   └── nginxingresscontroller.yaml
    └── rbac
        ├── argocd-cm-logging.yml
        ├── argocd-rbac-cm-logging.yml
        └── argocd-secret-logging.yml

Features of ‘Master and worker’ ArgoCD

To solve questions around elasticity, availability, redundancy, and resource integrity, we have designed the entire federated ecosystem to behave independently while relying on a few central core objects within the Kubernetes cluster(s).

Namespace(s): For the 'worker' ArgoCD instances, we use 'multi-tenancy' as a design tool, wherein unique teams (tenants) are all onboarded onto the same platform (ArgoCD). Hence, we have demarcated the cluster by namespace, ensuring that each ArgoCD instance is isolated from the others while co-existing (sharing cluster resources, the static IP, the domain, and the ingress-nginx-controller) with the other ArgoCD instances in the same cluster. This decision keeps the platform cost-effective. However, we have retained a 'core-tenant' design for the 'master' ArgoCD, which resides in its own dedicated cluster to ensure redundancy and availability for the entire ArgoCD ecosystem.

HPA: Horizontal Pod Autoscaling is enabled for all of the core ArgoCD components (argocd-application-controller, argocd-server, argocd-repo-server). The ingress-nginx-controller, which is deployed in all three ArgoCD Kubernetes clusters, is autoscaled in the same way. This ensures elasticity and resource integrity.
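
As a sketch of what such an HPA looks like for the argocd-server, see below; the replica bounds and CPU target are assumptions, not values from our hpa manifests.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: argocd-server-hpa
  namespace: argocd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: argocd-server
  minReplicas: 2                   # assumed lower bound
  maxReplicas: 5                   # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed CPU utilization target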

Pod-AntiAffinity: This has been applied to all ArgoCD components, though specific core components (i.e., the argocd-application-controller, amongst others) have been prioritized with 'weight: 100'. Pod anti-affinity ensures that pods of the same application are not scheduled onto the same node, based on label constraints, so that kube-scheduler spreads the pods across nodes and reduces correlated failures.
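
For illustration, a preferred anti-affinity rule with 'weight: 100' on the application controller could look like the fragment below; the label selector and topology key are assumptions based on ArgoCD's standard labels.

# Fragment of the argocd-application-controller pod template spec.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-application-controller
          topologyKey: kubernetes.io/hostname   # spread replicas across nodes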

Recommended future features

  • Node-affinity based on NodePools. Though this raises questions about the ability to scale, it is an option if costs (or budgets) are not an issue. It guarantees resource integrity and availability, amongst other benefits.
  • Autoscaling of Nodes. This answers the above point of not being able to scale. Again, if cost is not a constraint, it is worth considering when building the infrastructure.
  • Cert-Manager SSL – hardens the cluster and applications.
  • PodDisruptionBudgets – strengthen the core features of elasticity and redundancy; see the sketch after this list.
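
A minimal PodDisruptionBudget sketch for the argocd-server, assuming its standard label and a one-replica minimum, could look like this:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: argocd-server-pdb
  namespace: argocd
spec:
  minAvailable: 1                  # assumed minimum kept during voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server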

Networking of ‘Worker’ ArgoCD(s)

All worker ArgoCD instances are deployed into the same cluster. Hence, teams onboarding to UAT will have their individual ArgoCD instance(s) deployed on 'rd-ArgoCD-AKS-uat-admin', and teams onboarding to PRODUCTION will have theirs deployed on 'rd-ArgoCD-AKS-prod-admin'. Since all worker ArgoCD instances have their own dedicated namespace(s), it was necessary to simplify the design of networking within the Kubernetes cluster. Hence, we have chosen a round-robin approach wherein one internal ingress-nginx-controller 'LoadBalancer' with a static IP orchestrates all HTTPS requests. Azure manages this IP via the Service annotation service.beta.kubernetes.io/azure-load-balancer-internal: "true".
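
A sketch of the controller's Service carrying this annotation is shown below; the namespace, static IP, and selector labels are hypothetical placeholders.

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx                          # assumed namespace
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.0.100                        # hypothetical static internal IP
  ports:
    - name: https
      port: 443
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx           # assumed controller labels
    app.kubernetes.io/component: controller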

Hence, all worker ArgoCD instances share this static IP and domain (47deg.com) within a Kubernetes cluster, and the ingress-nginx-controller routes traffic based on the subdomain(s) declared in the ingress resource(s).

Federated ArgoCD Network

Conclusion

The purpose of federating ArgoCD is to provide credible security for both ArgoCD and the Kubernetes cluster(s). Moreover, a federated architecture fosters an environment of innovation: an ecosystem that allows teams to have autonomy over their ArgoCD instances without impeding the overall progress of others. Lastly, algorithmic and resource integrity is sustained thanks to the nature of decentralization; ArgoCD can be scaled to manage numerous clusters and applications, provided the right resources are available. Ultimately, federation safeguards the core concept of GitOps: a single ‘source of truth’.
