Prometheus
0. Required Metrics
Container and pod metrics are used to determine recommendations for individual workloads. Node metrics are used to determine cost and overall cluster health.
Flightcrew reads the following metrics:
container_cpu_usage_seconds_total
container_memory_usage_bytes
container_memory_working_set_bytes
kube_node_status_allocatable
kube_node_status_capacity
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_container_status_restarts_total
kube_pod_info
kube_pod_status_phase
kube_pod_status_ready
If you have custom metric names, please contact us for further assistance.
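To check which of these metric names your Prometheus already stores, the POSIX shell sketch below compares the list against the names returned by Prometheus's /api/v1/label/__name__/values endpoint. It is not part of Flightcrew. The names REQUIRED_METRICS and missing_metrics are hypothetical, and the sketch assumes Prometheus is reachable at localhost:9090 (for example through the port-forward described in step 3).

```shell
# Hypothetical list of the metric names Flightcrew reads (copied from the list above).
REQUIRED_METRICS="container_cpu_usage_seconds_total
container_memory_usage_bytes
container_memory_working_set_bytes
kube_node_status_allocatable
kube_node_status_capacity
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_container_status_restarts_total
kube_pod_info
kube_pod_status_phase
kube_pod_status_ready"

# Print each required metric name that does not appear in the JSON read from
# stdin (the response body of /api/v1/label/__name__/values).
missing_metrics() {
  names_json=$(cat)
  for metric in $REQUIRED_METRICS; do
    # Names appear as quoted JSON strings ("kube_pod_info"), so matching the
    # quotes avoids counting longer names that merely share a prefix.
    case "$names_json" in
      *"\"$metric\""*) ;;
      *) printf '%s\n' "$metric" ;;
    esac
  done
}
```

Usage, with the port-forward running: curl --silent 'http://localhost:9090/api/v1/label/__name__/values' | missing_metrics. Empty output means every name was found; any printed name is either not being scraped or is stored under a custom name.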
1. Set up kube-state-metrics
Skip this step if: kube-state-metrics is already installed on your cluster.
Actions: Install kube-state-metrics.
If Prometheus has been installed via Helm, add the following lines to values.yaml:
kube-state-metrics:
  enabled: true
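After editing values.yaml, the change still has to be applied to the release. A minimal sketch, assuming the release was installed from the prometheus-community/prometheus chart and the prometheus-community repo has been added (helm repo add, as shown below). The function name enable_kube_state_metrics and the HELM override variable are hypothetical; the release name and namespace are placeholders for your own.

```shell
# Hypothetical helper: apply an edited values.yaml to an existing release of the
# prometheus-community/prometheus chart. Set HELM to override the helm binary.
enable_kube_state_metrics() {
  release=$1                      # your Helm release name (placeholder)
  namespace=$2                    # namespace Prometheus is installed in (placeholder)
  values_file=${3:-values.yaml}   # file containing kube-state-metrics.enabled: true
  ${HELM:-helm} upgrade "$release" prometheus-community/prometheus \
    --namespace "$namespace" \
    --values "$values_file"
}
```

For example: enable_kube_state_metrics prometheus monitoring. This assumes values.yaml holds all of your overrides, since helm upgrade given a values file (and no --reuse-values) falls back to chart defaults for anything not in that file.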
If Prometheus has been installed another way, kube-state-metrics can be installed as a standalone Helm chart with the following commands:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
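To confirm the standalone install worked, a sketch like the following checks that at least one kube-state-metrics pod has reached the Running phase. It assumes the chart's default pod label app.kubernetes.io/name=kube-state-metrics and the kube-system namespace used above; the function name and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical check: succeed if any kube-state-metrics pod is in the Running
# phase. Set KUBECTL to override the kubectl binary.
kube_state_metrics_running() {
  ns=${1:-kube-system}
  # jsonpath prints the phases of all matching pods, separated by spaces.
  phases=$(${KUBECTL:-kubectl} get pods --namespace "$ns" \
    --selector app.kubernetes.io/name=kube-state-metrics \
    --output jsonpath='{.items[*].status.phase}')
  case " $phases " in
    *" Running "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: kube_state_metrics_running && echo "kube-state-metrics is running". Note that Running does not guarantee the pod is ready to serve metrics.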
2. Set up scrape configs
Skip this step if: Prometheus was installed via the Helm chart. The chart already includes the relevant scrape configs, so you can skip to the next step.
Actions: Add the scrape configs.
- Copy the Prometheus Helm chart's scrape configs into your Prometheus server's configuration.
- Restart Prometheus (kubectl delete pods ...) to apply the new configuration.
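If Prometheus runs as a Deployment, kubectl rollout restart is one alternative to deleting pods by hand. The sketch below restarts it and waits for the rollout to finish. The function name, the 120-second timeout, and the KUBECTL override variable are assumptions; a StatefulSet-based install would use statefulset/ instead of deployment/.

```shell
# Hypothetical helper: restart a Prometheus Deployment so it re-reads its
# configuration, then wait for the rollout. Set KUBECTL to override kubectl.
restart_prometheus() {
  deployment=$1   # Deployment name (placeholder)
  namespace=$2    # namespace Prometheus runs in (placeholder)
  # rollout restart replaces the pods with fresh ones...
  ${KUBECTL:-kubectl} rollout restart "deployment/$deployment" --namespace "$namespace" &&
    # ...and rollout status blocks until they are up (or the timeout expires).
    ${KUBECTL:-kubectl} rollout status "deployment/$deployment" --namespace "$namespace" --timeout=120s
}
```

For example: restart_prometheus prometheus-deployment monitoring.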
Note: if you've configured any scrape configs in values.yaml, these defaults may be overwritten, so move your overrides to extraScrapeConfigs instead. Apply the new configuration using helm uninstall and helm install.
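As an illustration of this note, the sketch below writes a small values file that carries a custom job under extraScrapeConfigs. In the prometheus-community/prometheus chart this value is a multi-line string, hence the "|". The job name my-custom-job, its target, the output file name, and the function name are hypothetical placeholders.

```shell
# Hypothetical helper: write a values override that moves a custom scrape job
# into extraScrapeConfigs (a string value in the chart, so it uses "|").
write_extra_scrape_values() {
  out=${1:-extra-scrape-values.yaml}
  cat > "$out" <<'EOF'
extraScrapeConfigs: |
  - job_name: 'my-custom-job'
    static_configs:
      - targets: ['my-service.my-namespace.svc:8080']
EOF
}
```

The file can then be passed alongside your existing values when reinstalling, e.g. helm install <release> prometheus-community/prometheus -n <namespace> -f values.yaml -f extra-scrape-values.yaml.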
Verify: Examine the scrape configs with the following commands:
kubectl get configmaps -n <namespace-of-prometheus>
kubectl get configmap <configmap-name> -n <namespace-of-prometheus> -o yaml
The scrape_configs should look something like this if correctly configured:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
...
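To spot-check the ConfigMap output, a line-oriented sketch like the one below prints each job_name it finds, so you can confirm that jobs such as kubernetes-pods are present. It is a sed pattern rather than a YAML parser and assumes one job_name per line, as in the excerpt above; the function name is hypothetical.

```shell
# Hypothetical helper: print the value of every "job_name:" line read from
# stdin, with optional leading "- " and optional surrounding quotes stripped.
list_scrape_jobs() {
  sed -n "s/^[[:space:]]*-\{0,1\}[[:space:]]*job_name:[[:space:]]*['\"]\{0,1\}\([^'\"]*\)['\"]\{0,1\}[[:space:]]*\$/\1/p"
}
```

Usage: kubectl get configmap <configmap-name> -n <namespace-of-prometheus> -o yaml | list_scrape_jobs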
3. Verify API Access
Actions: Curl Prometheus directly and make sure the response has the data Flightcrew is looking for.
First, port-forward your Prometheus instance:
# Replace "prometheus-service" and "monitoring" namespace
# if they are named differently in your cluster.
export PROMETHEUS_CONTAINER_PORT=$(kubectl get service prometheus-service --namespace=monitoring -ojsonpath="{.spec.ports[0].port}")
kubectl port-forward "service/prometheus-service" --namespace monitoring "9090:${PROMETHEUS_CONTAINER_PORT}"
Then, in another terminal window, paste the following curl commands to query Prometheus:
# Some example metrics that Flightcrew reads:
curl --silent 'http://localhost:9090/api/v1/query?query=kube_pod_info' | head -c 150
curl --silent 'http://localhost:9090/api/v1/query?query=kube_node_status_allocatable' | head -c 150
curl --silent 'http://localhost:9090/api/v1/query?query=container_cpu_usage_seconds_total' | head -c 150
See a complete list of metrics and their usage in Required Metrics.
Verify: The output should contain data in the "result" section. For example:
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"kube_pod_info","app_kubernetes_io_component":"metrics", ... }}]}}
If instead the list is empty ("result":[]), one of the steps above may not have been followed correctly; see the Troubleshooting steps below.
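This check can be scripted. The sketch below classifies a /api/v1/query response as data, empty, or error by matching on the compact JSON Prometheus returns; it is not a JSON parser, and the function names are hypothetical. It assumes the port-forward from this step is still running on localhost:9090.

```shell
# Hypothetical helper: read a /api/v1/query response on stdin and print
# "error", "empty", or "data".
classify_query_response() {
  body=$(cat)
  # Anything without "status":"success" (including an empty body when curl
  # cannot connect) is treated as an error.
  case "$body" in
    *'"status":"success"'*) ;;
    *) echo error; return ;;
  esac
  case "$body" in
    *'"result":[]'*) echo empty ;;
    *) echo data ;;
  esac
}

# Hypothetical helper: query each metric name given as an argument and print
# "<metric> <classification>" per line.
check_metrics() {
  for metric in "$@"; do
    result=$(curl --silent "http://localhost:9090/api/v1/query?query=$metric" | classify_query_response)
    printf '%s %s\n' "$metric" "$result"
  done
}
```

For example, check_metrics kube_pod_info kube_node_status_allocatable container_cpu_usage_seconds_total prints one line per metric; any line ending in empty or error points to one of the problems listed under Troubleshooting.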
Troubleshooting
Common problems we've seen while configuring Prometheus:
- Ensure Prometheus has been restarted for the config changes to take effect.
- The Service is pointing at the wrong port. Ensure spec.ports.targetPort on the Service matches spec.template.spec.containers.ports.containerPort on the Deployment. The snippet below shows an example where the ports are correctly aligned:
apiVersion: v1
kind: Service
metadata:
  name: ...
  namespace: ...
  annotations: ...
spec:
  selector:
    app: ...
  type: ClusterIP # ClusterIP gives internal DNS
  ports:
    - port: 9090 # The service listens on this port
      targetPort: 9090 # The service communicates with pods on this port
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - ...
          ports:
            - containerPort: 9090 # should match targetPort in the service
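The port comparison can also be done from the command line. This sketch reads the Service's first targetPort and the first containerPort of the Deployment's first container with kubectl jsonpath and compares them. It assumes single-port, single-container objects and a numeric targetPort (a named port such as web would be reported as a mismatch); the function name and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical check: compare a Service's targetPort with a Deployment's
# containerPort. Set KUBECTL to override the kubectl binary.
check_prometheus_ports() {
  service=$1      # Service name (placeholder)
  deployment=$2   # Deployment name (placeholder)
  namespace=$3    # namespace of both objects (placeholder)
  target=$(${KUBECTL:-kubectl} get service "$service" --namespace "$namespace" \
    --output jsonpath='{.spec.ports[0].targetPort}')
  container=$(${KUBECTL:-kubectl} get deployment "$deployment" --namespace "$namespace" \
    --output jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')
  if [ "$target" = "$container" ]; then
    echo "ok: targetPort $target matches containerPort $container"
  else
    echo "mismatch: targetPort=$target containerPort=$container" >&2
    return 1
  fi
}
```

For example: check_prometheus_ports prometheus-service prometheus-deployment monitoring.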