The Kubernetes Metrics and Monitoring Architecture
K8S Metric Types
Kubernetes supports three types of metrics:
Resource metrics. Raw resource usage data such as CPU and memory
Custom metrics. Application/service-level metrics, e.g. latency, request rate
External metrics. Metrics from outside the K8S cluster, e.g. Google Cloud Pub/Sub metrics
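As a concrete illustration, here is how the three types appear as metric sources in an HPA spec (autoscaling/v2); the metric names and target values below are hypothetical examples, not recommendations:

```yaml
# Illustrative HPA metric sources; names and targets are made-up examples.
metrics:
  - type: Resource            # resource metrics (CPU/memory, served by metrics-server)
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Pods                # custom metrics (served by a Custom Metrics API adapter)
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: External            # external metrics (served by an External Metrics API adapter)
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
      target:
        type: AverageValue
        averageValue: "30"
```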
The Monitoring Architecture in Kubernetes
The K8S monitoring architecture consists of the resource metrics pipeline and the custom metrics pipeline.
The resource metrics pipeline consists of the Kubernetes core components. It is the pipeline K8S supports out of the box.
The custom metrics pipeline is implemented by third-party applications or service providers.
The goal of this architecture is to provide a stable, versioned API that core Kubernetes components can use, along with a set of abstractions that let custom/third-party monitoring applications easily create and expose custom metrics.
Resource metrics pipeline
The following core components implement the pipeline:
Kubelet. Provides node/pod/container resource usage data (CPU/Memory)
Metrics-server. A cluster-wide aggregator of resource usage data. It scrapes resource usage data from Kubelet through the Summary API. It is registered with the main API server and exposes metrics via the Metrics API.
The Metrics API. The API through which metrics-server serves resource metrics to clients, via the main API server.
The main API server. The Kubernetes main API server
The resource metrics pipeline does not support long-term retention or storage of metrics.
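You can query this pipeline directly with `kubectl top nodes` or `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes`. A trimmed sketch of the latter's response (node name, timestamps, and values are made up):

```json
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": { "name": "node-1" },
      "timestamp": "2021-11-09T21:00:00Z",
      "window": "30s",
      "usage": { "cpu": "250m", "memory": "1024Mi" }
    }
  ]
}
```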
Custom metrics pipeline
Kubernetes also defines the Custom Metrics API and External Metrics API interfaces to support integration with full third-party monitoring solutions (e.g. Prometheus, Datadog, and GCP Stackdriver).
These interfaces enable K8S HPAs to autoscale your workloads based on custom/external metrics by accessing those two API endpoints.
Any third-party monitoring solution can implement an adapter that exposes custom metrics via the above two API interfaces, on top of its central storage backend.
For example, the diagram below shows the high-level architecture of GCP Stackdriver Monitoring (a.k.a. Cloud Operations) in GKE:
Typically, a metric pipeline consists of the following stages:
- Metric generation and exposure via an HTTP endpoint
- Metric scraping
- Metric storage (in local DBs or remote backends)
- Metric aggregation and querying (optional)
As outlined above, all Kubernetes metrics either come from the Resource Metrics Pipeline or Custom Metrics Pipeline, depending on the metric type.
Resource metrics come from the Resource Metrics Pipeline.
Metrics are created by the kubelet.
The metrics-server scrapes metrics from the kubelet and exposes them via the Metrics API.
Custom metrics come from the Custom Metrics Pipeline.
Metrics are generated by applications and/or third-party monitoring solutions and then exposed through an HTTP endpoint.
The HTTP endpoint can be either locally hosted, or accessible remotely.
One example is Prometheus: your application emits metrics in the Prometheus format and exposes them through http://localhost:port.
App Pod (generates metrics) → App Pod (exposes metrics through the local endpoint) → Prometheus server (scraping the metrics from the endpoints) → Prometheus (stores the metrics in its DB)
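For illustration, here is a stdlib-only Python sketch of rendering one counter in the Prometheus text exposition format, which is what such a local endpoint serves; in practice the official prometheus_client library does this for you:

```python
# Render a counter in the Prometheus text exposition format, as an
# application's /metrics endpoint would serve it. Stdlib-only sketch;
# real applications normally use the prometheus_client library.
def render_counter(name, help_text, value, labels=None):
    label_str = ""
    if labels:
        # Labels are rendered as {key="value",...}; sorted for a stable output.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{label_str} {value}\n"
    )

print(render_counter("request_total", "Total HTTP requests.", 1027,
                     labels={"method": "post", "code": "200"}))
# # HELP request_total Total HTTP requests.
# # TYPE request_total counter
# request_total{code="200",method="post"} 1027
```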
Another example is GKE system metrics, where the gke-monitoring-agent exposes the system metrics to the remote endpoint hosted in GCP Cloud Monitoring. This is what’s happening behind the scenes:
gke-monitoring-agent (runs as pods, generates metrics) → Cloud Monitoring API (exposes metrics) → GCP Cloud Monitoring Dashboard (the remote endpoint)
A metrics adapter is needed if you want to view the metrics via the Kubernetes API. The adapter implements the Custom Metrics API interface, scrapes custom metrics, and exposes them through the API. This is required for your HPAs to work with your custom metrics.
External metrics come from external products or services, e.g. Google Cloud Pub/Sub.
A metrics adapter is needed here too. The adapter implements the External Metrics API interface, scrapes metrics, and exposes them via the API.
In the next section, I’ll cover how Horizontal Pod Autoscalers (HPAs) work with different metrics.
Metrics and Time Series
A metric refers to data about the same attribute, collected over time as a series of “data points”.
e.g. “request_total”: Points[(100, 2021-11-09 pm), (50, 2021-11-23 10am), …]
A time series is the data structure underlying a metric; it contains an ordered set of data points.
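A minimal model of this relationship (all names here are mine):

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    timestamp: int   # when the sample was taken (epoch seconds)
    value: float     # the sampled value

@dataclass
class TimeSeries:
    points: list = field(default_factory=list)  # ordered data points

    def add(self, timestamp, value):
        self.points.append(Point(timestamp, value))

# A metric maps a name (plus labels, omitted here) to a time series.
metrics = {"request_total": TimeSeries()}
metrics["request_total"].add(1636491600, 100)
metrics["request_total"].add(1637661000, 50)
print(len(metrics["request_total"].points))  # 2
```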
Latency in a metric pipeline
A pipeline introduces latency between the time metrics are generated by the metric source and the time they are collected by a separate scraper.
For instance, resource metrics have latency between the time they are generated on the node (refreshed by the kubelet once every 15s) and the time they are scraped by the metrics-server (once every 30s).
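A back-of-the-envelope bound using those two intervals (a rough sketch; it ignores network and propagation delays):

```python
# Intervals taken from the text above.
KUBELET_REFRESH_S = 15   # kubelet refreshes resource usage once every 15s
SCRAPE_INTERVAL_S = 30   # metrics-server scrapes once every 30s

# Worst case: metrics-server scrapes a value that is already almost one
# refresh interval old, and a client reads it just before the next scrape,
# so a value can be roughly refresh + scrape interval seconds stale.
worst_case_staleness_s = KUBELET_REFRESH_S + SCRAPE_INTERVAL_S
print(worst_case_staleness_s)  # 45
```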