
Service Mesh

What is Service Mesh?

Kubernetes is fantastic for building microservices. But under the hood, the networking is transparent to your applications: services are not easily aware of each other, the communication between them is unencrypted, and tracing network issues inside a Kubernetes cluster can sometimes be very hard. This opens up a lot of possibilities for attacks; you could potentially deploy a service that listens in on the network inside the cluster.

Service Mesh brings copious benefits to your cluster:

  • Security: The communication between pods becomes mTLS-encrypted.
  • Monitoring: A Service Mesh usually provides tools for detailed monitoring and telemetry of the cluster's network.
  • Multi-cluster communication: A Service Mesh can be used for secure communication between services in different clusters (not all Service Meshes support this).
  • Load balancing: A Service Mesh can load-balance between pods/services; some can even detect which pod responds faster and send requests there, all automatically.
  • Service discovery: A Service Mesh can be used for service discovery, so you can find out which pods are running which services.
  • Retries and Timeouts: A Service Mesh can retry requests automatically; you can configure how many retries and which timeouts there should be, and what the "plan B" is if the primary service is not reachable (see the sketch after this list).
  • Ingress: Some Service Mesh implementations ship their own Ingress, or can use an already installed one, like Traefik.
  • Traffic Permissions: A Service Mesh can enforce which pod can talk to which pod. You could use Kubernetes NetworkPolicies for something similar, but this way is a bit more flexible.
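
To make the retries/timeouts point more concrete, here is a minimal sketch of a Linkerd ServiceProfile; the service name and the route are placeholders, not something from this guide:

# Sketch only: per-route retries and timeouts in Linkerd.
# The service name and route below are placeholders.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: webapp.default.svc.cluster.local
  namespace: default
spec:
  routes:
    - name: GET /api/items
      condition:
        method: GET
        pathRegex: /api/items
      isRetryable: true   # failed requests on this route may be retried
      timeout: 300ms      # give up if there is no response within 300 ms
  retryBudget:
    retryRatio: 0.2             # allow at most 20% extra load from retries
    minRetriesPerSecond: 10
    ttl: 10s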

There are many more nice features, depending on which mesh you choose. Sounds pretty cool and useful, right?

Which one?

As with many other things in Kubernetes, there are many Service Meshes to choose from, in various stages of development. Some of them are:

  • Linkerd - My choice for K3s on Raspberry Pi 4; it seems to be the most lightweight and also has a nice UI.
  • Istio - Would be my choice for a normal Kubernetes cluster; the UI is really pleasant.
  • Kuma
  • Consul Connect
  • Maesh - Also an interesting project from the Traefik guys, but I think it's not mature enough yet and has no UI (as far as I know).

And these are just the ones I could name off the top of my head.

Linkerd

I have chosen Linkerd for my K3s cluster, mostly because of its lightness, its features and its very nice UI. It is also quite popular, being the first project to come up with a service mesh, so there is a good chance of finding solutions to potential issues on Slack or on forums.

How does it work?

You can "tag" your service and Linkerd will then load a sidecar container with proxy that will handle all the traffic to and from your pod.

Linkerd diagram

Services:

  • Identity: This service acts as a TLS certificate authority and generates the certificates for mTLS communication between pods.
  • Destination: It dictates the pod proxy's behavior: what it can connect to, how many retries there should be, and so on.
  • Proxy-injector: Every time a pod is created and is tagged for the mesh, it mutates the pod by adding the proxy-init and linkerd-proxy containers to it.
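
To illustrate what that "tagging" looks like in practice, here is a minimal deployment sketch with the linkerd.io/inject annotation on the pod template (the names and image are placeholders):

# Sketch: enable sidecar injection for a deployment (names/image are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        linkerd.io/inject: enabled   # proxy-injector adds proxy-init and linkerd-proxy
    spec:
      containers:
        - name: my-app
          image: nginx:1.21
          ports:
            - containerPort: 80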

Installing Linkerd

Again, there is more than one way to install Linkerd. I'm going with the CLI installation.

On your control node under root, run:

curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh

That will install the linkerd binary. You then need to add the following to the bottom of ~/.bash_profile:

export PATH=$PATH:/root/.linkerd2/bin

You could also export this as-is in the shell, but you would have to do it every time you log in, so I recommend adding it to your ~/.bash_profile file. Log out and back in, and you should be good to go.
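
One way to do that in a single step (assuming a bash login shell on the control node):

# Append the linkerd CLI to PATH and reload the profile
echo 'export PATH=$PATH:/root/.linkerd2/bin' >> ~/.bash_profile
source ~/.bash_profile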

Confirm success with:

root@control01:~# linkerd version
# Your version might be different.
Client version: stable-2.11.3
Server version: unavailable

Do a pre-check:

root@control01:~# linkerd check --pre
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √

All checks should be √; if so, continue with the installation.

# Will generate the install YAML files
linkerd install

# Install directly
linkerd install | kubectl apply -f -

# I prefer to save the YAML and apply it manually or via Argo CD
linkerd install > linkerd_install_2.11.3.yaml
kubectl apply -f linkerd_install_2.11.3.yaml
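
If you want, you can watch the control plane pods come up while you wait (just a sanity check, assuming the default linkerd namespace):

# Watch the Linkerd control plane pods until they are all Running
kubectl get pods -n linkerd -w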

Give it a few minutes to deploy and run the post check:

root@control01:~# linkerd check
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks can be verified
√ cluster networks contains all node podCIDRs

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match

Status check results are √

Hopefully, all checks are marked as √.

If so, you have just deployed the Linkerd service mesh core. You could leave it like this and start using it, but let's get some insight into the mesh with a UI. By default, Linkerd does not touch services that are already running, as that could cause an outage or other issues.

Note

Meshed and un-meshed services can't talk to each other.

Dude where is my UI?

I know, I know, I promised a nice UI for Linkerd, and I will deliver. But it took me a while to get it working correctly. The thing is, linkerd-viz is the UI and metrics deployment for Linkerd. It does come with a full stack, Prometheus and Grafana... but I am already running both on their own (see the Monitoring Guide), and I would not like to add duplicate services just to consume more resources.

To generate the YAML files, run:

linkerd viz install \
  --set prometheus.enabled=false \
  --set grafana.enabled=false \
  --set prometheusUrl=http://192.168.0.205:9090 \
  --set grafana.externalUrl=http://192.168.0.206:3000 \
  --set dashboard.enforcedHostRegexp=".*"

  • prometheusUrl: This is the IP of my Prometheus instance, exposed via a MetalLB service; change it to your own. You could even use the internal Kubernetes name when you only have it as a ClusterIP; it should follow the format <service_name>.<namespace>.svc.cluster.local (I think, I have not tested that yet).
  • grafana.externalUrl: This is the IP of my Grafana instance, exposed via a MetalLB service; change it to your own. You could even use the internal Kubernetes name, same format as above.

Note

It's important for both values to include http:// and the port number, because the pod parses them and expects that format. I had the pod failing to start when I used just an IP.

  • prometheus.enabled: Disable installation of another Prometheus instance.
  • grafana.enabled: Disable installation of another Grafana instance.
  • dashboard.enforcedHostRegexp: Host header validation regex for the dashboard. Helps if exposed via Traefik or other load balancer.

You can either save the output to a YAML file or pipe it directly to kubectl, same as with the core installation. Me, I'm storing the YAML files in Git and applying them via Argo CD. Man, I love that system.
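
If you go the same route as with the core install, it looks roughly like this (the file name is just my convention):

# Save the rendered viz manifests and apply them
linkerd viz install \
  --set prometheus.enabled=false \
  --set grafana.enabled=false \
  --set prometheusUrl=http://192.168.0.205:9090 \
  --set grafana.externalUrl=http://192.168.0.206:3000 \
  --set dashboard.enforcedHostRegexp=".*" > linkerd_viz_2.11.3.yaml
kubectl apply -f linkerd_viz_2.11.3.yaml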

I have also created a small Service to expose the UI on the external IP 192.168.0.215:80, so I can get to it more easily.

apiVersion: v1
kind: Service
metadata:
  name: linkerd-viz-web
  namespace: linkerd-viz
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8084
    - name: admin-http
      port: 9994
      targetPort: 9994
  type: LoadBalancer
  selector:
    component: web
    linkerd.io/extension: viz
  loadBalancerIP: 192.168.0.215
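
Apply it like any other manifest (the file name below is arbitrary). If you don't want an extra LoadBalancer service, the CLI can also tunnel the dashboard to your machine:

# Apply the LoadBalancer service above (file name is arbitrary)
kubectl apply -f linkerd-viz-web.yaml

# Alternative without an external IP: port-forward the dashboard via the CLI
linkerd viz dashboard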

Once deployed, I had to restart the deployments below, because the UI was complaining about pods missing tap configuration.

Linkerd Missing Tap

# Get all deployments in the namespace linkerd
root@control01:~# kubectl get deployment -n linkerd
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
linkerd-destination      1/1     1            1           4d2h
linkerd-identity         1/1     1            1           4d2h
linkerd-proxy-injector   1/1     1            1           4d2h
# Restart the deployments
kubectl rollout restart deployment linkerd-destination -n linkerd
kubectl rollout restart deployment linkerd-identity -n linkerd
kubectl rollout restart deployment linkerd-proxy-injector -n linkerd

After that, no more complaints in the UI.

Grafana and Prometheus

Since we are using our own instances, we need to tell Prometheus where to look for the endpoints and scrape them. I'm very much referring to my own implementation of Prometheus via the Prometheus Operator.

Here is the implementation of the PodMonitors for Prometheus to scrape:

---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app: linkerd
    release: prometheus
    name: linkerd-controller
  name: linkerd-controller
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - linkerd-viz
      - linkerd
  selector:
    matchLabels: {}
  podMetricsEndpoints:
    - interval: 10s
      scrapeTimeout: 10s
      relabelings:
      - sourceLabels:
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: admin-http
      - sourceLabels:
        - __meta_kubernetes_pod_container_name
        action: replace
        targetLabel: component
      # Replace job value
      - sourceLabels:
        - __address__
        action: replace
        targetLabel: job
        replacement: linkerd-controller
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app: linkerd
    release: prometheus
    name: linkerd-service-mirror
  name: linkerd-service-mirror
  namespace: monitoring
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels: {}
  podMetricsEndpoints:
    - interval: 10s
      scrapeTimeout: 10s
      relabelings:
      - sourceLabels:
        - __meta_kubernetes_pod_label_linkerd_io_control_plane_component
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: linkerd-service-mirror;admin-http$
      - sourceLabels:
        - __meta_kubernetes_pod_container_name
        action: replace
        targetLabel: component
      # Replace job value
      - sourceLabels:
        - __address__
        action: replace
        targetLabel: job
        replacement: linkerd-service-mirror
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app: linkerd
    release: prometheus
    name: linkerd-proxy
  name: linkerd-proxy
  namespace: monitoring
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels: {}
  podMetricsEndpoints:
    - interval: 10s
      scrapeTimeout: 10s
      relabelings:
      - sourceLabels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_container_port_name
        - __meta_kubernetes_pod_label_linkerd_io_control_plane_ns
        action: keep
        regex: ^linkerd-proxy;linkerd-admin;linkerd$
      - sourceLabels: [__meta_kubernetes_namespace]
        action: replace
        targetLabel: namespace
      - sourceLabels: [__meta_kubernetes_pod_name]
        action: replace
        targetLabel: pod
      - sourceLabels: [__meta_kubernetes_pod_label_linkerd_io_proxy_job]
        action: replace
        targetLabel: k8s_job
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_job
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_(.+)
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: __tmp_pod_label_$1
      - action: labelmap
        regex: __tmp_pod_label_linkerd_io_(.+)
        replacement:  __tmp_pod_label_$1
      - action: labeldrop
        regex: __tmp_pod_label_linkerd_io_(.+)
      - action: labelmap
        regex: __tmp_pod_label_(.+)
      # Replace job value
      - sourceLabels:
        - __address__
        action: replace
        targetLabel: job
        replacement: linkerd-proxy

Deploy this YAML to the monitoring namespace:

kubectl apply -f linkerd-viz-prometheus.yaml -n monitoring

We need to edit our Prometheus instance to include the PodMonitors. I'm referring to the prometheus.yaml from here: Prometheus

---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-persistant
  namespace: monitoring
spec:
  replicas: 1
  retention: 7d
  resources:
    requests:
      memory: 400Mi
  nodeSelector:
    node-type: worker
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchExpressions:
    - key: name
      operator: In
      values:
      - longhorn-prometheus-servicemonitor
      - kube-state-metrics
      - node-exporter
      - kubelet
      - traefik
  podMonitorSelector:
    matchExpressions:
    - key: name
      operator: In
      values:
      - linkerd-controller
      - linkerd-proxy
      - linkerd-service-mirror
  serviceMonitorNamespaceSelector: {}
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: longhorn-fast
        resources:
          requests:
            storage: 30Gi

I have added the following to prometheus.yaml:

  podMonitorSelector:
    matchExpressions:
    - key: name
      operator: In
      values:
      - linkerd-controller
      - linkerd-proxy
      - linkerd-service-mirror
  serviceMonitorNamespaceSelector: {}

  • podMonitorSelector: Basically tells Prometheus to scrape only the PodMonitors whose name matches the selector.
  • serviceMonitorNamespaceSelector: This part gave me grief; without it, Prometheus could not access pods in different namespaces.

Apply the edited YAML and wait for Prometheus to redeploy; it should be quick. Then go into the Prometheus UI and click Status → Targets. You should see the Linkerd endpoints there.

Prometheus endpoints
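
If you prefer the command line, you can also check that proxy metrics are arriving with a quick query against the Prometheus HTTP API (a sketch using my Prometheus address from above; request_total is one of the metrics exported by the Linkerd proxies):

# Ask Prometheus whether it has scraped any Linkerd proxy metrics yet
curl -s 'http://192.168.0.205:9090/api/v1/query?query=request_total' | head -c 500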

Now we can use Dashboards in Grafana to see the metrics. Here is a list of IDs that worked for me (official dashboards):

  • Linkerd Health - 15486
  • Linkerd Route - 15481
  • Linkerd Namespace - 15478
  • Linkerd Deployment - 15475

That is it for the installation and setup of Linkerd. Explore the UI for now; in the next part we will check how to inject Linkerd into deployments and get secure communication between services. Also, if you want to expose a meshed service via Traefik, we need to set up Traefik for that. And lastly, I'm running OpenFaaS and would like to integrate it there as well, so the communication between the OpenFaaS gateway and the functions is all secured. Where you might not want to use Linkerd is Longhorn, I would guess; encrypting that communication would add latency.

Usage

We have the Linkerd Service Mesh installed, but how do we use it? First, we need some deployment to experiment on; Linkerd provides sample deployments for us. In essence, you only need to tag your deployments with the correct annotation to enable injection of the Linkerd proxy sidecar.
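
As a small preview, injection can also be done ad hoc by piping an existing manifest through linkerd inject (a sketch with a placeholder deployment name):

# Add the linkerd.io/inject annotation to an existing deployment and re-apply it
kubectl get deployment my-app -o yaml | linkerd inject - | kubectl apply -f -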

Warning

This part is still being written, please come back soon :)


Last update: July 22, 2022
