Prometheus

0. Required Metrics

Container and pod metrics are used to determine recommendations for individual workloads. Node metrics are used to determine cost and overall cluster health.

The complete list of metrics that Flightcrew reads is as follows:

  • container_cpu_usage_seconds_total
  • container_memory_usage_bytes
  • container_memory_working_set_bytes
  • kube_node_status_allocatable
  • kube_node_status_capacity
  • kube_pod_container_resource_limits
  • kube_pod_container_resource_requests
  • kube_pod_container_status_restarts_total
  • kube_pod_info
  • kube_pod_status_phase
  • kube_pod_status_ready

If you have custom metric names, please contact us for further assistance.
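
Once Prometheus is reachable (see the port-forward in step 3 below), you can spot-check every required metric at once with a small shell loop; a sketch, assuming Prometheus is forwarded to localhost:9090:

# Queries each required metric and prints the first bytes of the response;
# every line should contain "status":"success" and a non-empty "result".
for metric in \
  container_cpu_usage_seconds_total \
  container_memory_usage_bytes \
  container_memory_working_set_bytes \
  kube_node_status_allocatable \
  kube_node_status_capacity \
  kube_pod_container_resource_limits \
  kube_pod_container_resource_requests \
  kube_pod_container_status_restarts_total \
  kube_pod_info \
  kube_pod_status_phase \
  kube_pod_status_ready; do
  echo "--- ${metric}"
  curl --silent "http://localhost:9090/api/v1/query?query=${metric}" | head --bytes=150
  echo
done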

1. Set up kube-state-metrics

Skip this step if: kube-state-metrics is already installed on your cluster.

Actions: Install kube-state-metrics.

If Prometheus was installed via Helm, add the following lines to values.yaml:

kube-state-metrics:
  enabled: true
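
Then apply the change; a sketch, assuming the release is named prometheus, lives in the monitoring namespace, and uses the prometheus-community/prometheus chart:

# Re-render the chart with the updated values.yaml
helm upgrade prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --values values.yaml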

If Prometheus has been installed another way, kube-state-metrics can be installed as a standalone Helm chart with the following commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
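
Verify: check that the kube-state-metrics pod is running; the label below is the chart's default and may differ if you overrode the release name:

kubectl get pods -n kube-system -l app.kubernetes.io/name=kube-state-metrics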

2. Set up scrape configs

Skip this step if: Prometheus was installed via the Helm chart. The chart already includes the relevant scrape configs, so you can move on to the next step.

Actions: Add the scrape configs.

  1. Copy the Prometheus Helm chart's scrape configs into your Prometheus server's configuration.
  2. Restart Prometheus (kubectl delete pods ...) to apply the new configuration.
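
For example, if your server pods carry the app=prometheus-server label used elsewhere on this page, the restart might look like this (a sketch; adjust the label and namespace to your install):

# The Deployment recreates the pods, which start with the new config
kubectl delete pods -n monitoring -l app=prometheus-server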

Note: if you've defined any scrape configs of your own in values.yaml, they will override these defaults, so move your overrides to extraScrapeConfigs instead. Apply the new configuration with helm uninstall followed by helm install.
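
For example, a custom job moved into extraScrapeConfigs might look like this in values.yaml (the job name and target are placeholders; in the prometheus-community chart, extraScrapeConfigs takes a multi-line string):

extraScrapeConfigs: |
  - job_name: my-custom-app # hypothetical job moved out of scrape_configs
    static_configs:
      - targets: ['my-app.default.svc:8080']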

Verify: Examine the scrape configs with the following commands:

kubectl get configmaps -n <namespace-of-prometheus>
kubectl get configmap <configmap-name> -n <namespace-of-prometheus> -o yaml

The scrape_configs should look something like this if correctly configured:

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
  ...
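
Note that with the kubernetes-pods job above, the keep rule only retains pods that opt in via annotations, so a pod must carry something like the following in its metadata to be scraped:

metadata:
  annotations:
    prometheus.io/scrape: "true" # matched by the "keep" rule above
    prometheus.io/path: /metrics # optional; overrides __metrics_path__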

3. Verify API Access

Actions: Curl Prometheus directly and make sure the response has the data Flightcrew is looking for.

First, port-forward your Prometheus instance:

# Replace "prometheus-service" and "monitoring" namespace
# if they are named differently in your cluster.
export PROMETHEUS_CONTAINER_PORT=$(kubectl get service prometheus-service --namespace=monitoring -ojsonpath="{.spec.ports[0].port}")
kubectl port-forward "service/prometheus-service" --namespace monitoring "9090:${PROMETHEUS_CONTAINER_PORT}"

Then, in another terminal window, paste the following curl commands to query Prometheus:

# Some example metrics that Flightcrew reads from:
curl --silent "http://localhost:9090/api/v1/query?query=kube_pod_info" | head --bytes=150
curl --silent "http://localhost:9090/api/v1/query?query=kube_node_status_allocatable" | head --bytes=150
curl --silent "http://localhost:9090/api/v1/query?query=container_cpu_usage_seconds_total" | head --bytes=150

See a complete list of metrics and their usage in Required Metrics.

Verify: The output should contain data in the "result" section. For example:

{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"kube_pod_info","app_kubernetes_io_component":"metrics", ... }}]}}

If instead the result list is empty and just looks like "result":[], one of the above steps may not have been followed correctly; see the further Troubleshooting steps below.

Troubleshooting

Common problems we've seen while configuring Prometheus:

  1. Ensure Prometheus has been restarted for the config changes to take effect.

  2. The service is pointing at the wrong port - Ensure spec.ports.targetPort on the Service matches spec.template.spec.containers.ports.containerPort. See the snippet below for an example where the ports are correctly aligned:

    apiVersion: v1
    kind: Service
    metadata:
      name: ...
      namespace: ...
      annotations: ...
    spec:
      selector:
        app: ...
      type: ClusterIP # ClusterIP gives internal DNS
      ports:
        - port: 9090       # The service listens on this port
          targetPort: 9090 # The service communicates with pods on this port
          protocol: TCP

    ---

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-deployment
      namespace: monitoring
      labels:
        app: prometheus-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-server
      template:
        metadata:
          labels:
            app: prometheus-server
        spec:
          containers:
            - ...
              ports:
                - containerPort: 9090 # should match targetPort in the service
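
    To compare the two ports directly, something like the following can help (a sketch; prometheus-service and prometheus-deployment are the names used elsewhere on this page, so adjust them to your cluster):

    # Prints the Service's targetPort, then the container's containerPort;
    # the two numbers should match.
    kubectl get service prometheus-service -n monitoring \
      -o jsonpath='{.spec.ports[0].targetPort}{"\n"}'
    kubectl get deployment prometheus-deployment -n monitoring \
      -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}{"\n"}'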