To be completely honest, I'm writing this bit reluctantly, as I don't fully understand this monitoring setup, especially how the components are interconnected. So there might be inaccuracies, or parts of the setup that are not needed. If you know more about this, please correct me in the comments and I will fix it. In the end, though, you will have working monitoring using Prometheus and cool-looking dashboards using Grafana, and hopefully also some understanding of how things work. The method of deployment I have chosen is not the easiest, but it allows more control over what is going to be deployed.
I like Helm, but running something like helm install Prometheus will deploy a full monitoring stack that sprawls over the whole cluster, deploying tons of services and permissions left and right, multiple instances, and so on. (I'm sure it can be set up to your liking, but I guarantee that you will not know how the monitoring works: what is sending what, and where.)
I have purposely skipped the functionality to report issues via email or Teams. I was interested only in monitoring and logging status, not in being actively mailed when a node dies or something. I might integrate this later, though.
I'm going to deploy Prometheus to collect data from various services in my K3s Kubernetes cluster. I assume you have the same setup and services as I do, mainly Longhorn for persistent storage. Data from Prometheus will be displayed in a single instance of Grafana.
Here are some juicy pictures to keep you motivated:
The whole monitoring setup is built from several components. I'll try to explain them as well and as simply as I can, so you don't get lost.
Prometheus Operator is a solo deployment: one instance that will help us provision Prometheus and some of its components. It extends the Kubernetes API, so when we apply some of the YAML files it will look like we are telling Kubernetes to deploy something, but we are actually telling Prometheus Operator to do it for us. Official git: Prometheus Operator
Prometheus is the collector of metrics. It uses something called service monitors, which tell it what it can come and scrape. Prometheus will use persistent storage, and we will specify how long it keeps the data. You can have more than one instance of Prometheus in your cluster, each collecting separate data. Having multiple instances would ensure that if one died, not all of the monitoring is dead. In our case we will have only two: one that comes with OpenFaaS, if you followed my guide, and one that we deploy to collect monitoring data from everything else. This is mainly because Prometheus keeps its stuff in RAM, and that is at a premium on a Raspberry Pi 4.
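To make the storage and retention part a bit more concrete, here is a rough sketch of what a Prometheus object handled by Prometheus Operator could look like. The name, namespace, service account, retention period, memory request and volume size are my assumptions for illustration only, and longhorn is assumed to be the name of your Longhorn storage class.

```yaml
# Sketch only: a Prometheus custom resource that Prometheus Operator turns into a running instance.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-persistent      # hypothetical name
  namespace: monitoring            # hypothetical namespace
spec:
  replicas: 1                      # a single instance, to save RAM on a Raspberry Pi 4
  retention: 7d                    # how long the data is kept (assumed value)
  serviceAccountName: prometheus   # assumes a ServiceAccount with this name already exists
  serviceMonitorSelector: {}       # empty selector: pick up every ServiceMonitor in this namespace
  resources:
    requests:
      memory: 400Mi                # assumed value, tune for your nodes
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: longhorn # assumed Longhorn storage class name
        resources:
          requests:
            storage: 20Gi          # assumed volume size
```

The apiVersion monitoring.coreos.com/v1 is the part of the Kubernetes API that Prometheus Operator adds, which is why applying this file only works once the operator and its custom resource definitions are installed.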
Service monitors, together with the exporters they point at, are the middle step between the data and Prometheus. We will deploy some that are a single deployment, for example the one for the Kubernetes API that collects metrics from the server; the Longhorn service monitor is also a single deployment. Then there is another kind of deployment, the DaemonSet, which deploys a container to each node. node-exporter uses this kind of deployment because it collects underlying OS information per node. Makes sense?
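The two kinds of deployment described above could be sketched like this. Strictly speaking, a ServiceMonitor is a small custom resource that tells Prometheus which Service to scrape, while the exporter is the container that actually exposes the metrics. The Longhorn labels, namespace and port name below are what Longhorn normally uses, but treat them as assumptions and check yours with kubectl get svc -n longhorn-system --show-labels. The monitoring namespace, the image tag and the app label for node-exporter are hypothetical as well.

```yaml
# Sketch 1: a single ServiceMonitor pointing Prometheus at the Longhorn manager service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: monitoring              # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: longhorn-manager          # assumed label on the Longhorn service
  namespaceSelector:
    matchNames:
      - longhorn-system              # assumed Longhorn namespace
  endpoints:
    - port: manager                  # assumed name of the metrics port
---
# Sketch 2: node-exporter as a DaemonSet, so one pod runs on every node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring              # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true              # report the node's network, not the pod's
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.3.1   # assumed tag
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - name: metrics
              containerPort: 9100    # node-exporter's default port
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
```

For Prometheus to scrape node-exporter, this DaemonSet would still need a Service and a ServiceMonitor of its own, built the same way as the Longhorn one.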
Prometheus can do some graphing of data and has its own web UI. However, Grafana is on another level. Grafana, as its name suggests, makes graphs; subtle, I know 🙂. You can create custom dashboards and display data collected by Prometheus. It can display data from multiple Prometheus instances, combine them into a single dashboard, etc.
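As a rough idea of how two Prometheus instances can end up in one Grafana, here is a sketch of a Grafana data source provisioning file. Both URLs are assumptions: the first expects our Prometheus to live in a hypothetical monitoring namespace behind the prometheus-operated service that Prometheus Operator creates, the second expects the OpenFaaS Prometheus to be reachable as prometheus in the openfaas namespace. You can add the same data sources by hand in the Grafana UI instead.

```yaml
# Sketch only: Grafana data source provisioning with two Prometheus instances.
apiVersion: 1
datasources:
  - name: Prometheus                 # the instance we deploy ourselves
    type: prometheus
    access: proxy                    # Grafana's backend makes the requests
    url: http://prometheus-operated.monitoring.svc:9090   # assumed service and namespace
    isDefault: true
  - name: Prometheus-OpenFaaS        # hypothetical name for the OpenFaaS instance
    type: prometheus
    access: proxy
    url: http://prometheus.openfaas.svc:9090              # assumed service and namespace
```

With both data sources defined, each panel in a dashboard can choose which Prometheus it queries, which is how data from both ends up on one screen.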
How is everything connected?
This is a picture of how I think everything is connected in the deployment we are going to do.
Enough talk, let's move on to deploying Prometheus Operator.