How to integrate Thanos into an existing Prometheus setup

Kubernetes
By Vikrant
January 17, 2019

Problem statement: I have an existing Prometheus setup that uses Ceph RBD for storage. Now I want to introduce Thanos into the environment to take advantage of long-term retention in object storage and downsampling.

Solution: The Prometheus operator provides an official example [1] of introducing Thanos into an existing Prometheus setup, but the latest image at the time of writing (v0.27.0) doesn't contain PR [2], which adds support for specifying the object storage configuration in the manifest.

I built my own images that include this fix. I will show the steps to build them, and I have also pushed the images to my Docker Hub repository in case someone wants to use them for testing.

Let’s start with the lab work. This guide covers the following steps.

  • Build the Prometheus operator and prometheus-config-reloader images.
  • Deploy Ceph on Kubernetes using Rook.
  • Install Prometheus through the operator, using the new operator and config reloader images and Ceph for storage.
  • Introduce the thanos-sidecar into the existing Prometheus setup.
  • Deploy the thanos-query, thanos-store and thanos-compactor components.

Prepare the Prometheus operator and Prometheus config reloader images

  • Start an Ubuntu Vagrant box that has Docker installed.

  • Install Go on Ubuntu by following this link.

  • Clone the Prometheus operator repository into the GOPATH layout.

# mkdir -p ~/prometheus/src/github.com/coreos
# cd ~/prometheus/src/github.com/coreos
# git clone https://github.com/coreos/prometheus-operator
  • Set GOPATH and add the Go binaries to PATH in ~/.profile.
root@vagrant:~# cat ~/.profile
# ~/.profile: executed by Bourne-compatible login shells.

if [ "$BASH" ]; then
  if [ -f ~/.bashrc ]; then
    . ~/.bashrc
  fi
fi

tty -s && mesg n
export GOPATH=$HOME/prometheus
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
  • Issue the following commands to build the operator binary and the two images.
# cd ~/prometheus/src/github.com/coreos/prometheus-operator
# make operator
# make hack/operator-image                      # builds the prometheus operator image
# make hack/prometheus-config-reloader-image    # builds the config reloader image
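To find the tags the build produced (the deployment diff further down uses 0714fe4), you can filter the local image list. This is a minimal sketch that assumes, as that diff suggests, the make targets tag the images under quay.io/coreos/; the function name is my own.

```shell
#!/usr/bin/env bash
# Keep only the two images built by the make targets from
# `docker images --format '{{.Repository}}:{{.Tag}}'` output read on stdin.
# Assumption: the images are tagged under quay.io/coreos/.
list_operator_images() {
  # grep exits 1 when nothing matches; "|| true" keeps that from looking like an error.
  grep -E '^quay\.io/coreos/(prometheus-operator|prometheus-config-reloader):' || true
}
```

Usage: `docker images --format '{{.Repository}}:{{.Tag}}' | list_operator_images`. The tag printed after the colon is the one to put into the operator deployment manifest.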
  • Copy the images from the Vagrant box to your minikube setup. If you are new to Docker, follow this link to copy a Docker image from one machine to another. By default we don't know the root user's credentials in the minikube VM, so create a new user there and use it to copy the images from the Vagrant box.

Tip: the useradd command is not available in minikube, so use adduser to add the user.
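One common way to copy an image without a registry is to stream `docker save` into `docker load` over SSH. The helper below is a sketch under two assumptions: DEST is the user you created in minikube, in user@host form, and that user is allowed to run docker there (otherwise prefix the remote command with sudo). DRY_RUN=1 only prints the pipeline.

```shell
#!/usr/bin/env bash
# Stream a local image into minikube's Docker daemon without a registry:
# `docker save` writes a tar to stdout, the remote `docker load` reads it from stdin.
copy_image() {
  local image="$1" dest="$2"
  if [ -z "$image" ] || [ -z "$dest" ]; then
    echo "usage: copy_image IMAGE USER@HOST" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Show what would run instead of running it.
    echo "docker save $image | ssh $dest docker load"
    return 0
  fi
  docker save "$image" | ssh "$dest" docker load
}
```

Run it once for each of the two images, for example `copy_image quay.io/coreos/prometheus-operator:0714fe4 <user>@$(minikube ip)`.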

You may destroy the Vagrant box after this step.

  • Alternatively, you can pull the images from my Docker Hub repository:

ervikrant06/prometheusoperator:prometheus-config-reloader
ervikrant06/prometheusoperator:prometheusoperator
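The deployment diff below refers to quay.io/coreos/...:0714fe4 with imagePullPolicy: Never, so pulled images must either be re-tagged to those names or the manifest edited to use the Docker Hub names. A sketch of the re-tag route, assuming the Docker Hub tags contain the same builds as the local 0714fe4 images and that the commands run against minikube's Docker daemon (for example after `eval $(minikube docker-env)`); DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Pull the Docker Hub images and re-tag them with the names the manifest expects.
# Assumption: the Docker Hub tags are the same builds as the local 0714fe4 images.
HUB_REPO="ervikrant06/prometheusoperator"
LOCAL_TAG="0714fe4"

pull_and_retag() {
  local run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi
  $run docker pull "$HUB_REPO:prometheusoperator" &&
    $run docker tag "$HUB_REPO:prometheusoperator" "quay.io/coreos/prometheus-operator:$LOCAL_TAG" &&
    $run docker pull "$HUB_REPO:prometheus-config-reloader" &&
    $run docker tag "$HUB_REPO:prometheus-config-reloader" "quay.io/coreos/prometheus-config-reloader:$LOCAL_TAG"
}
```

Re-tagging keeps the manifest unchanged; editing the manifest instead means also dropping imagePullPolicy: Never so the kubelet can pull from Docker Hub.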

Deploy Ceph using Rook

Follow this article to complete the Ceph installation on Kubernetes.

Start the Prometheus operator with the new images and Ceph storage

  • I am using the official kube-prometheus manifests (contrib/kube-prometheus in the prometheus-operator repository) for the installation. For the Ceph-related changes, this link can be used.

  • The following diff shows all the changes, including the new images and the Ceph storage. imagePullPolicy: Never makes Kubernetes use the images copied into minikube instead of pulling them from quay.io. I intentionally set the local retention of Prometheus data to 10m so that I can quickly show Thanos at work.

diff --git a/contrib/kube-prometheus/manifests/0prometheus-operator-deployment.yaml b/contrib/kube-prometheus/manifests/0prometheus-operator-deployment.yaml
index 1ddbae2f..038329f4 100644
--- a/contrib/kube-prometheus/manifests/0prometheus-operator-deployment.yaml
+++ b/contrib/kube-prometheus/manifests/0prometheus-operator-deployment.yaml
@@ -20,8 +20,10 @@ spec:
         - --kubelet-service=kube-system/kubelet
         - --logtostderr=true
         - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
-        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.26.0
-        image: quay.io/coreos/prometheus-operator:v0.26.0
+        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:0714fe4
+        #image: quay.io/coreos/prometheus-operator:v0.26.0
+        image: quay.io/coreos/prometheus-operator:0714fe4
+        imagePullPolicy: Never
         name: prometheus-operator
         ports:
         - containerPort: 8080
diff --git a/contrib/kube-prometheus/manifests/prometheus-prometheus.yaml b/contrib/kube-prometheus/manifests/prometheus-prometheus.yaml
index c16914b0..1ff97a25 100644
--- a/contrib/kube-prometheus/manifests/prometheus-prometheus.yaml
+++ b/contrib/kube-prometheus/manifests/prometheus-prometheus.yaml
@@ -18,6 +18,7 @@ spec:
   resources:
     requests:
       memory: 400Mi
+  retention: 10m
   ruleSelector:
     matchLabels:
       prometheus: k8s
@@ -30,3 +31,13 @@ spec:
   serviceMonitorNamespaceSelector: {}
   serviceMonitorSelector: {}
   version: v2.5.0
+  storage:
+      volumeClaimTemplate:
+        metadata:
+          name: prometheusstorage
+        spec:
+          storageClassName: "rook-ceph-block"
+          accessModes: [ "ReadWriteOnce" ]
+          resources:
+            requests:
+              storage: 2Gi
diff --git a/contrib/kube-prometheus/manifests/prometheus-service.yaml b/contrib/kube-prometheus/manifests/prometheus-service.yaml
index 85b007f8..e954b3af 100644
--- a/contrib/kube-prometheus/manifests/prometheus-service.yaml
+++ b/contrib/kube-prometheus/manifests/prometheus-service.yaml
@@ -13,3 +13,4 @@ spec:
   selector:
     app: prometheus
     prometheus: k8s
+  type: NodePort
diff --git a/example/rbac/prometheus/prometheus.yaml b/example/rbac/prometheus/prometheus.yaml
index 2adaef13..c3d870af 100644
--- a/example/rbac/prometheus/prometheus.yaml
+++ b/example/rbac/prometheus/prometheus.yaml
@@ -10,6 +10,10 @@ spec:
   serviceMonitorSelector:
     matchLabels:
       team: frontend
+  ruleSelector:
+    matchLabels:
+      role: prometheus-rulefiles
+      prometheus: k8s
   alerting:
     alertmanagers:
     - namespace: default
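With a volumeClaimTemplate, each Prometheus replica gets its own Ceph-backed PVC, named <template name>-<StatefulSet name>-<ordinal>. The sketch below lists the names to expect, assuming the operator names the StatefulSet prometheus-k8s (matching the pod names) and keeps the template name prometheusstorage from the diff; `expected_pvcs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the PVC names a StatefulSet creates from a volumeClaimTemplate:
# <template-name>-<statefulset-name>-<ordinal>, one per replica.
expected_pvcs() {
  local template="$1" statefulset="$2" replicas="$3" i
  for ((i = 0; i < replicas; i++)); do
    echo "${template}-${statefulset}-${i}"
  done
}
```

For the two replicas here, `expected_pvcs prometheusstorage prometheus-k8s 2` can be compared with `kubectl -n monitoring get pvc`; each claim should be Bound with storage class rook-ceph-block.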
  • Once the setup is deployed, the following pods are running in the monitoring namespace. Notice the number of containers (3) in each Prometheus pod.
# cd ~/Documents/REPOS/prometheus-operator/contrib/kube-prometheus/manifests
# kubectl get pod -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   0          13m
alertmanager-main-1                    2/2       Running   0          13m
alertmanager-main-2                    2/2       Running   0          13m
grafana-df9bfd765-66f26                1/1       Running   0          13m
kube-state-metrics-687d566cfc-lg5bw    4/4       Running   0          13m
node-exporter-jq2zx                    2/2       Running   0          13m
prometheus-adapter-69466cc54b-zxq6h    1/1       Running   0          13m
prometheus-k8s-0                       3/3       Running   0          1m
prometheus-k8s-1                       3/3       Running   0          1m
prometheus-operator-5c5bbb576b-s8rmw   1/1       Running   0          13m
thanos-compactor-0                     1/1       Running   0          6m
thanos-store-0                         1/1       Running   0          6m
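The READY column reads ready/total, so the number after the slash is the container count to watch. A small sketch (the function name is my own) that extracts it for the Prometheus pods from `kubectl get pod` output on stdin:

```shell
#!/usr/bin/env bash
# Print "<pod> <total containers>" for the prometheus-k8s-N pods.
# READY is the second column, formatted ready/total (e.g. 3/3).
prometheus_container_counts() {
  awk 'NR > 1 && $1 ~ /^prometheus-k8s-[0-9]+$/ { split($2, r, "/"); print $1, r[2] }'
}
```

Usage: `kubectl get pod -n monitoring | prometheus_container_counts`. Before the sidecar is added, each pod should report 3.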

Introduce thanos-sidecar into the existing deployment

  • Create the configuration file with the Swift (OpenStack object storage) connection details.
# cat /Documents/thanos-objstore-config.yaml
type: SWIFT
config:
  auth_url: "http://10.121.xx.xx:5000/v3"
  username: admin
  password: 4019b525ee414a4c
  tenant_name: admin
  container_name: test5
  domain_name: Default
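If you would rather not keep the Swift password in a plain file, the same config can be rendered from environment variables. This is a sketch: the YAML keys match the config above, while the OS_* variable names are an assumption modeled on OpenStack RC files, and THANOS_CONTAINER is a hypothetical override for the container name.

```shell
#!/usr/bin/env bash
# Render the Thanos Swift objstore config from environment variables.
write_objstore_config() {
  local v
  for v in OS_AUTH_URL OS_USERNAME OS_PASSWORD OS_PROJECT_NAME; do
    if [ -z "${!v:-}" ]; then
      echo "write_objstore_config: $v is not set" >&2
      return 1
    fi
  done
  # Values are double-quoted YAML strings; a value containing " or \ would need escaping.
  cat <<EOF
type: SWIFT
config:
  auth_url: "${OS_AUTH_URL}"
  username: "${OS_USERNAME}"
  password: "${OS_PASSWORD}"
  tenant_name: "${OS_PROJECT_NAME}"
  container_name: "${THANOS_CONTAINER:-test5}"
  domain_name: "${OS_USER_DOMAIN_NAME:-Default}"
EOF
}
```

Usage: export the variables (or source your OpenStack RC file), then run `write_objstore_config > thanos-objstore-config.yaml`.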
  • Create a secret from this file.
# kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=/Documents/thanos-objstore-config.yaml
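The secret created above stores the file base64-encoded under the key thanos.yaml, which is the key the Prometheus manifest and the Thanos components reference. As a sketch, the equivalent declarative manifest can be rendered like this (the function name is my own; `kubectl create secret generic` produces a Secret of type Opaque):

```shell
#!/usr/bin/env bash
# Render a Secret equivalent to:
#   kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=FILE
render_objstore_secret() {
  local file="$1" encoded
  if [ ! -r "$file" ]; then
    echo "render_objstore_secret: cannot read $file" >&2
    return 1
  fi
  # Secret data values are base64 without line breaks.
  encoded=$(base64 < "$file" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-config
  namespace: monitoring
type: Opaque
data:
  thanos.yaml: ${encoded}
EOF
}
```

Usage: `render_objstore_secret /Documents/thanos-objstore-config.yaml | kubectl apply -f -`. To check what is stored in the cluster, `kubectl -n monitoring get secret thanos-objstore-config -o jsonpath='{.data.thanos\.yaml}' | base64 -d` prints the config back.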
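  • Reference the secret in the following manifest, which injects the thanos-sidecar into the existing Prometheus deployment. The same manifest also creates the headless thanos-peers Service used for gossip; since it sets no namespace, it is created in the current namespace (default here), which is why the peers address is thanos-peers.default.svc.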
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: thanos-peers
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: cluster
    port: 10900
    targetPort: cluster
  selector:
    # Useful endpoint for gathering all thanos components for common gossip cluster.
    thanos-peer: "true"
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
  labels:
    prometheus: k8s
    app: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s
  podMetadata:
    labels:
      thanos-peer: 'true'
  serviceMonitorSelector: {}
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager
      port: web
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  thanos:
    peers: thanos-peers.default.svc:10900
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config
EOF
  • After a few minutes, the number of containers in each Prometheus pod increases from 3 to 4; the new container is thanos-sidecar.
kubectl get pod -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   2          15h
alertmanager-main-1                    2/2       Running   2          15h
alertmanager-main-2                    2/2       Running   2          15h
grafana-df9bfd765-66f26                1/1       Running   1          15h
kube-state-metrics-687d566cfc-lg5bw    4/4       Running   5          15h
node-exporter-jq2zx                    2/2       Running   2          15h
prometheus-adapter-69466cc54b-zxq6h    1/1       Running   2          15h
prometheus-k8s-0                       4/4       Running   4          15h
prometheus-k8s-1                       4/4       Running   4          15h
prometheus-operator-5c5bbb576b-s8rmw   1/1       Running   2          15h
thanos-compactor-0                     1/1       Running   1          15h
thanos-store-0                         1/1       Running   1          15h
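A container count of 4 is a strong hint, but checking the container names confirms that the extra container really is the sidecar. A sketch (hypothetical helper name) that reads the space-separated names printed by `kubectl -n monitoring get pod prometheus-k8s-0 -o jsonpath='{.spec.containers[*].name}'`:

```shell
#!/usr/bin/env bash
# Succeed if a container named thanos-sidecar appears in the
# space-separated container list read from stdin.
has_thanos_sidecar() {
  tr ' ' '\n' | grep -qx 'thanos-sidecar'
}

# Example loop over both replicas (run against the cluster):
#   for p in prometheus-k8s-0 prometheus-k8s-1; do
#     kubectl -n monitoring get pod "$p" -o jsonpath='{.spec.containers[*].name}' |
#       has_thanos_sidecar && echo "$p: sidecar present"
#   done
```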
  • Similarly, we can start thanos-query, thanos-store and thanos-compactor. To learn more about these components, refer to my earlier post about Thanos.
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  labels:
    app: thanos-query
    thanos-peer: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
      thanos-peer: "true"
  template:
    metadata:
      labels:
        app: thanos-query
        thanos-peer: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10902"
    spec:
      containers:
      - name: thanos-query
        # Always use explicit image tags (release or master-<date>-sha) instead of ambiguous `latest` or `master`.
        image: improbable/thanos:v0.2.1
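        args:
        - "query"
        - "--log.level=debug"
        - "--cluster.peers=thanos-peers.default.svc.cluster.local:10900"
        # The Prometheus operator adds the external label prometheus_replica
        # to each replica, so deduplicate on that label.
        - "--query.replica-label=prometheus_replica"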
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        - name: cluster
          containerPort: 10900
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  externalTrafficPolicy: Cluster
  ports:
  - port: 9090
    protocol: TCP
    targetPort: http
    name: http-query
  selector:
    app: thanos-query
  sessionAffinity: None
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  serviceName: "thanos-store"
  replicas: 1
  selector:
    matchLabels:
      app: thanos
      thanos-peer: "true"
  template:
    metadata:
      labels:
        app: thanos
        thanos-peer: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10902"
    spec:
      containers:
      - name: thanos-store
        # Always use explicit image tags (release or master-<date>-sha) instead of ambiguous `latest` or `master`.
        image: improbable/thanos:v0.2.1
        args:
        - "store"
        - "--log.level=debug"
        - "--data-dir=/var/thanos/store"
        - "--cluster.peers=thanos-peers.default.svc.cluster.local:10900"
        - "--objstore.config-file=/creds/swift_access_information/swift_access_information.yaml"
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        - name: cluster
          containerPort: 10900
        volumeMounts:
        - name: swiftconfig
          mountPath: /creds
        - name: data
          mountPath: /var/thanos/store
      volumes:
      - name: data
        emptyDir: {}
        # Secret created earlier from thanos-objstore-config.yaml; it holds the Swift credentials.
      - name: swiftconfig
        secret:
          secretName: thanos-objstore-config
          items:
          - key: thanos.yaml
            path: swift_access_information/swift_access_information.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  serviceName: "thanos-compactor"
  replicas: 1
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
      - name: thanos-compactor
        # Always use explicit image tags (release or master-<date>-sha) instead of ambiguous `latest` or `master`.
        image: improbable/thanos:v0.2.1
        args:
        - "compact"
        - "--log.level=debug"
        - "--data-dir=/tmp/thanos-compact"
        - "--objstore.config-file=/creds/swift_access_information/swift_access_information.yaml"
        - "--sync-delay=5m"
        - "--retention.resolution-raw=10m"
        - "--retention.resolution-5m=1h"
        - "--retention.resolution-1h=2h"
        - "-w"
        volumeMounts:
        - name: swiftconfig
          mountPath: /creds
        - name: data
          mountPath: /tmp/thanos-compact
      volumes:
      - name: data
        emptyDir: {}
      - name: swiftconfig
        secret:
          secretName: thanos-objstore-config
          items:
          - key: thanos.yaml
            path: swift_access_information/swift_access_information.yaml
EOF
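Once thanos-query is running, its Prometheus-compatible HTTP API can be queried through the NodePort Service. This is a sketch under assumptions: HOST is a node IP (for example from `minikube ip`), PORT is the NodePort from `kubectl get svc thanos-query -o jsonpath='{.spec.ports[0].nodePort}'`, and dedup=true is, to my understanding, the query parameter that asks thanos-query to merge series differing only in the replica label. DRY_RUN=1 prints the curl command instead of running it.

```shell
#!/usr/bin/env bash
# Run an instant PromQL query through thanos-query's /api/v1/query endpoint.
thanos_instant_query() {
  local host="$1" port="$2" expr="$3"
  # -G turns the --data-urlencode pairs into URL query parameters.
  local cmd=(curl -sG "http://${host}:${port}/api/v1/query"
             --data-urlencode "query=${expr}" --data-urlencode "dedup=true")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `thanos_instant_query "$(minikube ip)" <nodeport> up` returns the JSON result; with deduplication working, each target should appear once rather than once per Prometheus replica.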
  • After some time, the thanos-sidecar logs show messages indicating that blocks have been shipped to object storage.
kubectl -n monitoring logs prometheus-k8s-0 -c thanos-sidecar  | grep 'shipper.go'
level=warn ts=2019-01-17T06:49:43.528640532Z caller=shipper.go:147 msg="reading meta file failed, removing it" err="unexpected end of JSON input"
level=info ts=2019-01-17T06:49:44.085718976Z caller=shipper.go:201 msg="upload new block" id=01D1D7APWSFJ53Y3BJXT6E57YV
level=info ts=2019-01-17T06:50:13.758337658Z caller=shipper.go:201 msg="upload new block" id=01D1D7APWSFJ53Y3BJXT6E57YV
level=info ts=2019-01-17T06:50:15.095102932Z caller=shipper.go:201 msg="upload new block" id=01D1D7AQPJ594QDFEYRY3W4EYA
level=info ts=2019-01-17T07:00:14.094347108Z caller=shipper.go:201 msg="upload new block" id=01D1D9R173842S2P0YBW9CN4A2
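The same block ID can be logged more than once, as 01D1D7APWSFJ53Y3BJXT6E57YV is above, so counting log lines overstates the number of uploaded blocks. A sketch (the function name is my own) that extracts the unique IDs from the sidecar logs on stdin:

```shell
#!/usr/bin/env bash
# Print the unique block IDs from "upload new block" shipper log lines.
uploaded_block_ids() {
  grep 'msg="upload new block"' |
    sed -n 's/.* id=\([0-9A-Z]*\).*/\1/p' |
    sort -u
}
```

Usage: `kubectl -n monitoring logs prometheus-k8s-0 -c thanos-sidecar | uploaded_block_ids`. For the log excerpt above it yields three distinct blocks.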

thanos-compactor is doing its job too: its logs show the downsampling and retention passes completing.

kubectl logs thanos-compactor-0 -n monitoring | tail -4
level=info ts=2019-01-17T08:23:18.563273933Z caller=compact.go:220 msg="start second pass of downsampling"
level=info ts=2019-01-17T08:23:19.016306884Z caller=compact.go:225 msg="downsampling iterations done"
level=info ts=2019-01-17T08:23:19.016342222Z caller=retention.go:17 msg="start optional retention"
level=info ts=2019-01-17T08:23:20.367448379Z caller=retention.go:46 msg="optional retention apply done"

We can confirm that the objects were shipped to Swift by checking the stats of the Swift container.

[root@packstack1 ~(keystone_admin)]# swift stat test5
               Account: AUTH_45a6706c831c42d5bf2da928573382b1
             Container: test5
               Objects: 6
                 Bytes: 2340
              Read ACL:
             Write ACL:
               Sync To:
              Sync Key:
         Accept-Ranges: bytes
      X-Storage-Policy: Policy-0
         Last-Modified: Wed, 16 Jan 2019 15:06:43 GMT
           X-Timestamp: 1547651202.85613
            X-Trans-Id: tx23b91a63eb124585a4924-005c40292a
          Content-Type: application/json; charset=utf-8
X-Openstack-Request-Id: tx23b91a63eb124585a4924-005c40292a
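To follow the bucket over time (uploads from the sidecar, compaction and retention deletes from the compactor), it can help to pull out just the object count. A sketch with a hypothetical helper name, reading `swift stat <container>` output on stdin:

```shell
#!/usr/bin/env bash
# Print the value of the "Objects:" field from `swift stat <container>` output.
swift_object_count() {
  awk -F': *' '$1 ~ /^ *Objects$/ { print $2 }'
}
```

Usage: `swift stat test5 | swift_object_count`, which prints 6 for the output shown above.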

[1] https://github.com/coreos/prometheus-operator/blob/master/example/thanos/prometheus.yaml
[2] https://github.com/coreos/prometheus-operator/pull/2264#issuecomment-453101463