Create a monitoring cluster with Thanos, Loki and Istio

How to create a monitoring cluster with Thanos and Loki and send all metrics and logs from other clusters to one place, encrypted and authenticated via mTLS with the Istio Service Mesh.

Piotr Gawęda - kubernetes, thanos, loki, istio, grafana, metrics, logs


In this article I will present how to create a monitoring Kubernetes (K8s) cluster that processes metrics and logs from multiple K8s clusters in one place.

I will use Istio as a service mesh to encrypt communication, and the Istio Ingress Gateway to authorize client requests with mutual TLS certificates issued by a custom Certificate Authority (CA).

When you manage a few clusters, you need one place to check metrics and logs.

Your microservices expose metrics, Prometheus scrapes that data and saves it in local storage, and this happens separately on each cluster.

To view those metrics you probably use Grafana dashboards, but how do you make Prometheus from multiple clusters store metrics in one place, and how do you install a log collector on every node and ship the logs to one monitoring server?

The answer is simple: use Thanos for metrics and Loki for logs.

Below I will present how to set up 3 clusters:

  1. Monitoring cluster with Thanos and Loki - collects metrics and logs
  2. Client cluster 0 - sends metrics and logs to the monitoring cluster
  3. Client cluster 1 - sends metrics and logs to the monitoring cluster

Diagram of our deployment: Cluster-0 and Cluster-1 each run Promtail (collecting K8s logs) and Prometheus with a Thanos Sidecar behind a local Istio Gateway. Both clusters send logs and metrics over mTLS to the Istio Gateway of the monitoring cluster, which saves logs in Loki and metric blocks in Minio (S3). Thanos on the monitoring cluster queries and downsamples metrics from Minio and pulls live metrics directly from the sidecars, again through the Istio Gateways.

In the diagram above you see 3 clusters. The monitoring cluster has an Istio Ingress Gateway. What is an ingress? An ingress is the entry point for all incoming requests. In the Istio Gateway, requests are decrypted, authenticated and routed.

Clusters 0 and 1 run a Promtail daemon and a Prometheus pod with a Thanos Sidecar. Promtail is installed on every node of the Kubernetes cluster and sends logs over mTLS to the Loki distributor on the monitoring cluster through the Istio Gateway. The Istio Gateway checks that the TLS certificate was issued by our CA; if so it passes the request to Loki, otherwise it drops the connection.

On the other hand, the Thanos Sidecar installed in the Prometheus pod of clusters 0 and 1 pushes metric blocks to Minio through the Istio Ingress Gateway on the monitoring cluster. The Istio Gateway verifies that requests from the Thanos Sidecar carry a TLS certificate issued by the Root CA.

Thanos on the monitoring cluster queries historical metrics from Minio and live metrics directly from the Thanos Sidecars on cluster-0 and cluster-1. These requests are encrypted with mTLS and again pass through the Istio Gateways.

The whole stack will be installed on your local machine thanks to Kind (Kubernetes in Docker). If your laptop has little memory, please install it on a cloud instance; 16 GB of memory is required.

All clusters will run on your machine, emulated by Docker containers.

Diagram: the monitoring cluster, Cluster 0 and Cluster 1 each run in their own virtual container inside Docker on your notebook or cloud instance.

Required software

Please check that you have installed the software listed below, in the versions we tested this deployment with.

All of these tools are written in Go, so you can simply download a binary and start this tutorial.
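As a quick sanity check, you can verify that the tools used throughout this tutorial (docker, kind, kubectl, helm, helmfile, kubectx, k9s, openssl, curl) are on your PATH; this is just a sketch, versions are not pinned here:

# Quick sanity check of the tools used in this tutorial (versions not pinned here)
for tool in docker kind kubectl helm helmfile kubectx k9s openssl curl; do
  command -v "$tool" >/dev/null || echo "Missing: $tool"
done
kind version
kubectl version --client
helm version --short
helmfile --version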

1. System limits

On Linux (e.g. Ubuntu, or inside Docker) you have to increase the inotify limits. Without the lines below you cannot run 3 virtual clusters on one machine.

#https://kind.sigs.k8s.io/docs/user/known-issues/
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
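
Optionally, you can persist these limits so they survive a reboot; a minimal sketch (the file name under /etc/sysctl.d is arbitrary):

# Optional: persist the inotify limits across reboots
cat <<EOF | sudo tee /etc/sysctl.d/99-kind-inotify.conf
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=512
EOF
sudo sysctl --system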

2. Clone our blog GitHub repository

Download the prepared Helm charts and change into the tutorial directory:

git clone https://github.com/gawsoftpl/gawsoft.com
cd gawsoft.com/blog/thanos-loki-istio-cluster/

2. Create clusters

If you are creating the environment not on localhost but in a cloud that provides a load balancer, skip this paragraph and go to Step 3, Prepare certificates for mTLS.

2.1 Create clusters

The commands below will create two client clusters and the main monitoring cluster, all on one machine thanks to Docker containers and kind (K8s in Docker).

2.1.1 Create the first client (cluster-0). This cluster will generate metrics and push them to the monitoring cluster.

kind create cluster --config client/kind/kind-cluster.yaml --name=cluster-0

After about 60 seconds you should have a Kubernetes cluster running in a Docker container. To test it, execute the command below:

kubectl get pod -A --context kind-cluster-0
#Output:

#NAMESPACE            NAME                                              READY   STATUS    RESTARTS   AGE
#kube-system          coredns-565d847f94-k966x                          1/1     Running   0          4m58s
#kube-system          coredns-565d847f94-ml9b2                          1/1     Running   0          4m58s
#kube-system          etcd-cluster-1-control-plane                      1/1     Running   0          5m10s
#kube-system          kindnet-4jbnt                                     1/1     Running   0          4m53s
#kube-system          kindnet-92skm                                     1/1     Running   0          4m58s
#kube-system          kube-apiserver-cluster-1-control-plane            1/1     Running   0          5m10s
#kube-system          kube-controller-manager-cluster-1-control-plane   1/1     Running   0          5m12s
#kube-system          kube-proxy-2gtjd                                  1/1     Running   0          4m53s
#kube-system          kube-proxy-tn2xv                                  1/1     Running   0          4m58s
#kube-system          kube-scheduler-cluster-1-control-plane            1/1     Running   0          5m10s
#local-path-storage   local-path-provisioner-684f458cdd-d7wdn           1/1     Running   0          4m58s

2.1.2 Create second client (cluster-1) and monitoring cluster

Now we repeat the command with different parameters to create cluster-1 and the monitoring cluster:

# Create cluster-1
kind create cluster --config client/kind/kind-cluster.yaml --name=cluster-1

# Create monitoring cluster
kind create cluster --config monitoring-cluster/kind/kind-cluster.yaml

Check that all clusters are created:

kind get clusters
# Output:
# kind get clusters
# cluster-0
# cluster-1
# monitoring

Check that all pods are running; if not, wait 5-10 minutes. The time depends on how fast your internet connection can download the container images.

kubectl get pod -A --context kind-cluster-0
kubectl get pod -A --context kind-cluster-1
kubectl get pod -A --context kind-monitoring

You can also view all 3 clusters in one window using tools like Terminator and k9s:

k9s --context=kind-monitoring
k9s --context=kind-cluster-0
k9s --context=kind-cluster-1

2.2 Set up a load balancer for ingress with MetalLB

Very important: all of the commands below have to be run in the same terminal.

Because communication goes through an Istio Ingress with a load balancer, you will send requests to the ingress endpoint, not to a NodePort.

More info about the difference between NodePort and LoadBalancer services can be found here.

To set up the ingress load balancer in our local network, first check the network subnet used by the kind clusters:

KIND_SUBNET=`docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}' kind`
echo $KIND_SUBNET

In my environment the subnet is 172.19.0.0/16, but yours can be different, so run the commands below to prepare the endpoints for our clusters:

CLUSTER_0_IP=`bin/generate-ip-from-cidr.sh $KIND_SUBNET 135`
CLUSTER_1_IP=`bin/generate-ip-from-cidr.sh $KIND_SUBNET 136`
MONITORING_CLUSTER_IP=`bin/generate-ip-from-cidr.sh $KIND_SUBNET 137`
CLUSTER_0_ENDPOINT="${CLUSTER_0_IP}.nip.io"
CLUSTER_1_ENDPOINT="${CLUSTER_1_IP}.nip.io"
MONITORING_ENDPOINT="${MONITORING_CLUSTER_IP}.nip.io"

echo "Cluster-0 endpoint: $CLUSTER_0_ENDPOINT"
echo "Cluster-1 endpoint: $CLUSTER_1_ENDPOINT"
echo "Monitoring cluster endpoint: $MONITORING_ENDPOINT"

# Output:
# Cluster-0 endpoint: 172.19.0.135.nip.io
# Cluster-1 endpoint: 172.19.0.136.nip.io
# Monitoring cluster endpoint: 172.19.0.137.nip.io
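
The bin/generate-ip-from-cidr.sh helper from the repo simply combines the subnet prefix with the host octet you pass in. A minimal sketch of the idea (not the exact script from the repository, and it assumes the network part fits in the first three octets) could look like this:

#!/bin/bash
# Sketch only - the real helper is bin/generate-ip-from-cidr.sh in the repo.
# Usage: ./generate-ip-from-cidr.sh 172.19.0.0/16 135  ->  172.19.0.135
CIDR="$1"
HOST="$2"
PREFIX=$(echo "$CIDR" | cut -d'/' -f1 | cut -d'.' -f1-3)
echo "${PREFIX}.${HOST}"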

What is nip.io? When the clusters start sending requests, they first need DNS to resolve hostnames to IP addresses.

To keep things simple we will not set up a DNS server; instead I use nip.io, a dead simple wildcard DNS for any IP address.

nip.io resolves a host name to the IP address embedded in it, for example:

dig prometheus.10.10.10.10.nip.io # -> resolve to 10.10.10.10
dig thanos.10.10.10.10.nip.io # -> resolve to 10.10.10.10
dig thanos.20.20.20.20.nip.io # -> resolve to 20.20.20.20

Install the load balancer (MetalLB) on each cluster. Run these commands in the same terminal, so that the CLUSTER_0_IP, CLUSTER_1_IP and MONITORING_CLUSTER_IP variables are still set.

Cluster-0

kubectl --context kind-cluster-0 apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.5/config/manifests/metallb-native.yaml
kubectl --context kind-cluster-0 wait --for=condition=ready pod -l app=metallb -n metallb-system

cat <<EOF | kubectl apply --context kind-cluster-0  -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - ${CLUSTER_0_IP}-${CLUSTER_0_IP}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: empty
  namespace: metallb-system
EOF

Cluster-1

kubectl --context kind-cluster-1 apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.5/config/manifests/metallb-native.yaml
kubectl --context kind-cluster-1 wait --for=condition=ready pod -l app=metallb -n metallb-system

cat <<EOF | kubectl apply --context kind-cluster-1  -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - ${CLUSTER_1_IP}-${CLUSTER_1_IP}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: empty
  namespace: metallb-system
EOF

Monitoring-cluster

kubectl --context kind-monitoring apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.5/config/manifests/metallb-native.yaml
kubectl --context kind-monitoring wait --for=condition=ready pod -l app=metallb -n metallb-system

cat <<EOF | kubectl apply --context kind-monitoring  -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - ${MONITORING_CLUSTER_IP}-${MONITORING_CLUSTER_IP}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: empty
  namespace: metallb-system
EOF

Great, you now have 3 clusters with a load balancer installed, ready to serve requests via the ingress.

After this step all pods should be running; the IPs in your clusters can be different.
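
Optionally, you can confirm that MetalLB is running and picked up the address pools (IPAddressPool is a standard MetalLB v0.13 resource):

# Optional check: MetalLB pods and the address pools we just applied
kubectl --context kind-monitoring get pods -n metallb-system
kubectl --context kind-monitoring get ipaddresspools.metallb.io -n metallb-system
kubectl --context kind-cluster-0 get ipaddresspools.metallb.io -n metallb-system
kubectl --context kind-cluster-1 get ipaddresspools.metallb.io -n metallb-system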

3. Prepare certificates for mTLS

Create directory for certificates

mkdir certs

3.1 Generate Certificate Authority (CA)

If you don't know what a CA is, please read the first 3 paragraphs of the previous post: https://gawsoft.com/blog/mtls-how-to-setup-server-and-client-in-nodejs.html or watch this YouTube video:

https://www.youtube.com/watch?v=x_I6Qc35PuQ

For production you should use an Intermediate CA and keep the Root CA on an offline machine, or use a PKI infrastructure to sign certificates. In this example we focus on how to set up Kubernetes clusters with Istio, Thanos and Loki to collect metrics and logs and authorize requests.

In a real-life deployment you would install cert-manager and HashiCorp Vault, but that would be too much to start with.

# Generate CA certificate, and CA key
openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -subj '/O=example Inc./CN=example.com' -keyout certs/ca.key -out certs/ca.crt

We will use this CA key to issue the TLS certificates for cluster-0 and cluster-1. Thanks to Istio, the monitoring cluster will check that a certificate is issued by our Root CA; if not, the request will be rejected.
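
If you want to double-check what was generated, openssl can print the subject and validity period of the new Root CA:

# Optional: inspect the generated Root CA certificate
openssl x509 -in certs/ca.crt -noout -subject -issuer -dates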

3.2 Create TLS certificate and key for Cluster-0

Generate a private key for Cluster-0 and prepare the certificate signing request config file:

cat<<EOF > certs/cluster-0-v3.ext
subjectKeyIdentifier   = hash
authorityKeyIdentifier = keyid:always,issuer:always
basicConstraints       = CA:TRUE
keyUsage               = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign
issuerAltName          = issuer:copy
subjectAltName         = DNS:prom-cluster.${CLUSTER_0_ENDPOINT}, DNS:*.nip.io
EOF
openssl req -out certs/cluster-0.csr -newkey rsa:2048 -nodes -keyout certs/cluster-0.key -subj "/CN=prom-cluster.${CLUSTER_0_ENDPOINT}/O=example organization"

Create the certificate and sign it with our Root CA private key:

openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-0.csr -out certs/cluster-0.crt -extfile certs/cluster-0-v3.ext

Create a certificate for Promtail (the service that sends logs to the monitoring cluster):

openssl req -out certs/cluster-0-promtail.csr -newkey rsa:2048 -nodes -keyout certs/cluster-0-promtail.key -subj "/CN=promtail-cluster-0/O=example organization"
openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-0-promtail.csr -out certs/cluster-0-promtail.crt

Now our certs directory looks like this:

certs
├── ca.crt
├── ca.key
├── cluster-0.crt
├── cluster-0.csr
├── cluster-0.key
├── cluster-0-promtail.crt
├── cluster-0-promtail.csr
├── cluster-0-promtail.key
└── cluster-0-v3.ext

Validate that our certificates are signed by the CA:

openssl verify -CAfile certs/ca.crt certs/cluster-0.crt
openssl verify -CAfile certs/ca.crt certs/cluster-0-promtail.crt
# Output:
# certs/cluster-0.crt: OK
# certs/cluster-0-promtail.crt: OK

3.3 Create TLS certificates and keys for Cluster-1 and the Monitoring Cluster

We have to repeat the step above, but with a different CN (Common Name).

Generate a private key and certificate for Cluster-1:

cat<<EOF > certs/cluster-1-v3.ext
subjectAltName         = DNS:prom-cluster.${CLUSTER_1_ENDPOINT}, DNS:*.nip.io
EOF
openssl req -out certs/cluster-1.csr -newkey rsa:2048 -nodes -keyout certs/cluster-1.key -subj "/CN=prom-cluster.${CLUSTER_1_ENDPOINT}/O=example organization"
openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-1.csr -out certs/cluster-1.crt -extfile certs/cluster-1-v3.ext

Create a certificate for Promtail:

openssl req -out certs/cluster-1-promtail.csr -newkey rsa:2048 -nodes -keyout certs/cluster-1-promtail.key -subj "/CN=promtail-cluster-1/O=example organization"
openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-1-promtail.csr -out certs/cluster-1-promtail.crt

Generate a private key and certificate for the monitoring cluster:

cat<<EOF > certs/cluster-monitoring-v3.ext
subjectAltName         = DNS:*.${MONITORING_ENDPOINT}, DNS:*.nip.io
EOF
openssl req -out certs/cluster-monitoring.csr -newkey rsa:2048 -nodes -keyout certs/cluster-monitoring.key -subj "/CN=*.${MONITORING_ENDPOINT}/O=example organization"
openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-monitoring.csr -out certs/cluster-monitoring.crt -extfile certs/cluster-monitoring-v3.ext

Validate that our certificates are signed by the CA:

openssl verify -CAfile certs/ca.crt certs/cluster-1.crt
openssl verify -CAfile certs/ca.crt certs/cluster-1-promtail.crt
openssl verify -CAfile certs/ca.crt certs/cluster-monitoring.crt
# Output:
# certs/cluster-1.crt: OK
# certs/cluster-1-promtail.crt: OK
# certs/cluster-monitoring.crt: OK

4. Install helm charts

What will we build? Below you can see a detailed schema of all components used to build the central monitoring cluster.

Detailed schema: on Cluster-0 and Cluster-1, Prometheus with a Thanos Sidecar and Promtail push live metrics and logs through the local Istio Ingress, and the sidecars save metric blocks to S3. On the monitoring main cluster, the Istio Ingress forwards pushed logs to the Loki Distributor (backed by the Loki Ingester and Loki Querier) and sidecar data into the Minio object store. Grafana queries the Thanos Query Frontend (with a Redis cache), which queries Thanos Query; that fans out to the per-cluster Thanos Query and Thanos Store components and fetches live metrics from the sidecars, while per-cluster Thanos Compactors compress old metrics in the object store.

Because setting up our clusters requires multiple Helm charts, I will use helmfile to make the work easier. This free software can declaratively deploy Kubernetes manifests, Kustomize configs and charts as Helm releases in one shot.

What is helmfile in one sentence? It installs multiple Helm charts from one file.

Specially for this article I prepared a helmfile.yaml for each cluster:

client/helmfile/helmfile.yaml # For Cluster-0 and Cluster-1
monitoring-cluster/helmfile/helmfile.yaml

The "client" directory contains all files required to start sending metrics and logs from your client cluster. You can of course customize the parameters.

All files are available in our github repo.

To add a third cluster you would (this is not required now; see the sketch after the list below):

  • create a new bucket in Minio on the monitoring cluster
  • generate a new TLS cert and key
  • sign the new cert with the Root CA
  • install client/helmfile/helmfile.yaml
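
For example, the certificates for a hypothetical cluster-2 would follow exactly the same openssl pattern as in step 3 (CLUSTER_2_ENDPOINT below is a placeholder you would define yourself):

# Sketch for a hypothetical third cluster (CLUSTER_2_ENDPOINT is a placeholder)
cat<<EOF > certs/cluster-2-v3.ext
subjectAltName         = DNS:prom-cluster.${CLUSTER_2_ENDPOINT}, DNS:*.nip.io
EOF
openssl req -out certs/cluster-2.csr -newkey rsa:2048 -nodes -keyout certs/cluster-2.key -subj "/CN=prom-cluster.${CLUSTER_2_ENDPOINT}/O=example organization"
openssl x509 -req -sha256 -days 365 -CA certs/ca.crt -CAkey certs/ca.key -set_serial 0 -in certs/cluster-2.csr -out certs/cluster-2.crt -extfile certs/cluster-2-v3.ext
# Then: create a monitoring-cluster-2 bucket in Minio, add a matching objstore secret,
# and run the client helmfile with TENANT_ID="cluster-2".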

4.1 Install Helm charts in Cluster-0

Prepare secrets with:

  • credentials for the object storage where metrics will be saved (Minio S3 in this example)
  • an mTLS certificate for authentication (connections to the Loki ingress are encrypted and, thanks to mTLS, requests are additionally authenticated)

Create namespaces:

kubectl --context kind-cluster-0 create ns monitoring || true
kubectl --context kind-cluster-0 create ns istio-system || true

Create a secret for the object bucket storage. This bucket is hosted on the monitoring cluster.

# Create secrets for you object bucket, where metrics will be stored
# This article is only tutorial so we don't encrypt traffic for minio
cat <<EOF | kubectl apply --context kind-cluster-0 -n monitoring -f -
apiVersion: v1
kind: Secret
metadata:
  name: cluster-objstore-secret
stringData:
  objstore.yml: |
    type: s3
    config:
      bucket: monitoring-cluster-0
      endpoint: minio.${MONITORING_ENDPOINT}:80
      access_key: admin
      secret_key: adminadmin
      insecure: true
EOF

Create a secret with the TLS certificate for the Thanos Sidecar ingress. That certificate is used to encrypt traffic and is checked by the Istio gateway when the monitoring cluster connects to the Thanos Sidecar.

# Create secret with certificate for sidecar to authorize request in monitoring cluster
kubectl create --context kind-cluster-0 -n istio-system \
secret generic thanos-sidecar-cluster-tls \
--from-file=tls.key=certs/cluster-0.key \
--from-file=tls.crt=certs/cluster-0.crt \
--from-file=ca.crt=certs/ca.crt

Create a secret with the certificate for Promtail. That certificate is used to encrypt traffic and authenticate against the monitoring cluster.

# Create secret with certificate for promtail to authorize request in monitoring cluster
kubectl create --context kind-cluster-0 secret generic \
promtail-cluster-tls -n monitoring \
--from-file=tls-key=certs/cluster-0-promtail.key \
--from-file=tls-cert=certs/cluster-0-promtail.crt \
--from-file=ca-cert=certs/ca.crt

Now for the main process: installing the Helm releases. You have to set the required environment variables:

  • LOKI_ENDPOINT - the address Promtail sends logs to
  • TENANT_ID - the ID of your client cluster

During this process, pods will be restarted a couple of times.

kubectx kind-cluster-0
LOKI_ENDPOINT="https://loki.${MONITORING_ENDPOINT}/loki/api/v1/push" TENANT_ID="cluster-0" helmfile \
--kube-context kind-cluster-0 \
-f client/helmfile/helmfile.yaml sync

Wait for all pods to be ready and verify the installation:

kubectl get pod --context kind-cluster-0 -n monitoring
#Output:
#NAME                                                   READY   STATUS    RESTARTS      AGE
#prometheus-kube-prometheus-operator-768c4bcb68-67h8m   2/2     Running   1 (53s ago)   55s
#prometheus-kube-state-metrics-d4d674cb7-zxzg9          2/2     Running   2 (51s ago)   55s
#prometheus-prometheus-kube-prometheus-prometheus-0     4/4     Running   0             52s
#prometheus-prometheus-node-exporter-dtbxk              1/1     Running   0             54s
#prometheus-prometheus-node-exporter-kzmd8              1/1     Running   0             50s
#promtail-nlc7b                                         2/2     Running   0             24s
#promtail-rwp8w                                         2/2     Running   0             23s

The final step is to create the Istio Gateway and VirtualService. In our test project we authorize requests with mTLS: the client has to present a TLS certificate issued by our CA.

cat <<EOF | kubectl apply --context kind-cluster-0 -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: nip-io
spec:
  selector:
    istio: gateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*.nip.io"
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "*.nip.io"
      tls:
        mode: MUTUAL # Setup mTLS for ingress gateway
        credentialName: thanos-sidecar-cluster-tls
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: thanos-sidecar
spec:
  gateways:
    - nip-io
  hosts:
    - "prom-cluster.${CLUSTER_0_ENDPOINT}"
  http:
    - name: "prometheus-sidecar-http"
      match:
        - uri:
            prefix: "/metrics"
      route:
        - destination:
            host: prometheus-kube-prometheus-thanos-discovery.monitoring.svc.cluster.local
            port:
              number: 10902

    - name: "prometheus-sidecar-grpc"
      match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: prometheus-kube-prometheus-thanos-discovery.monitoring.svc.cluster.local
            port:
              number: 10901
EOF

Check that the Thanos Sidecar returns code 200. This endpoint is unencrypted and unauthenticated (only for testing, do not expose it like this in production).

curl -H "Host: prom-cluster.${CLUSTER_0_ENDPOINT}" \
    -s -o /dev/null -w "Return code: %{http_code}" \
    --resolve "prom-cluster.${CLUSTER_0_ENDPOINT}:80:${CLUSTER_0_IP}" \
    "http://prom-cluster.${CLUSTER_0_ENDPOINT}/metrics"
# Output:
# Return code: 200

Now it is time to test the gRPC port with mTLS authentication. It should return HTTP code 503:

curl -H "Host: prom-cluster.${CLUSTER_0_ENDPOINT}" \
    -s -o /dev/null -w "Return code: %{http_code}" \
    --cacert certs/ca.crt --cert certs/cluster-monitoring.crt --key certs/cluster-monitoring.key \
    --resolve "prom-cluster.${CLUSTER_0_ENDPOINT}:443:${CLUSTER_0_IP}" \
    "https://prom-cluster.${CLUSTER_0_ENDPOINT}/"
# Output:
# Return code: 503
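
You can also check the negative case: because the gateway runs in MUTUAL mode, a client that presents no certificate (or one not signed by our CA) should fail already during the TLS handshake, so curl reports an SSL error instead of an HTTP code:

# Negative test: no client certificate -> the TLS handshake should be rejected
curl -H "Host: prom-cluster.${CLUSTER_0_ENDPOINT}" \
    --cacert certs/ca.crt \
    --resolve "prom-cluster.${CLUSTER_0_ENDPOINT}:443:${CLUSTER_0_IP}" \
    "https://prom-cluster.${CLUSTER_0_ENDPOINT}/" \
    || echo "Rejected as expected"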

If you have problems with the installation, remove all Helm releases and start from the beginning. To destroy the helmfile deployment use:

#kubectx kind-cluster-0
#LOKI_ENDPOINT="https://loki.${MONITORING_ENDPOINT}/loki/api/v1/push" TENANT_ID="cluster-0" helmfile \
#--kube-context kind-cluster-0 \
#-f client/helmfile/helmfile.yaml destroy

4.2 Install helm charts in Cluster-1

We have to run the same operations as in the previous step, but with different values:

kubectl --context kind-cluster-1 create ns monitoring || true
kubectl --context kind-cluster-1 create ns istio-system || true
cat <<EOF | kubectl apply --context kind-cluster-1 -n monitoring -f -
apiVersion: v1
kind: Secret
metadata:
  name: cluster-objstore-secret
stringData:
  objstore.yml: |
    type: s3
    config:
      bucket: monitoring-cluster-1
      endpoint: minio.${MONITORING_ENDPOINT}:80
      access_key: admin
      secret_key: adminadmin
      insecure: true
EOF
# Create secret with certificate for sidecar to authorize request in monitoring cluster
kubectl create --context kind-cluster-1 -n istio-system \
secret generic thanos-sidecar-cluster-tls \
--from-file=tls.key=certs/cluster-1.key \
--from-file=tls.crt=certs/cluster-1.crt \
--from-file=ca.crt=certs/ca.crt
# Create secret with certificate for promtail to authorize request in monitoring cluster
kubectl create --context kind-cluster-1 secret generic \
promtail-cluster-tls -n monitoring \
--from-file=tls-key=certs/cluster-1-promtail.key \
--from-file=tls-cert=certs/cluster-1-promtail.crt \
--from-file=ca-cert=certs/ca.crt
kubectx kind-cluster-1
LOKI_ENDPOINT="https://loki.${MONITORING_ENDPOINT}/loki/api/v1/push" TENANT_ID="cluster-1" helmfile \
--kube-context kind-cluster-1 \
-f client/helmfile/helmfile.yaml sync
cat <<EOF | kubectl apply --context kind-cluster-1 -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: nip-io
spec:
  selector:
    istio: gateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*.nip.io"
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "*.nip.io"
      tls:
        mode: MUTUAL # Setup mTLS for ingress gateway
        credentialName: thanos-sidecar-cluster-tls
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: thanos-sidecar
spec:
  gateways:
    - nip-io
  hosts:
    - "prom-cluster.${CLUSTER_1_ENDPOINT}"
  http:
    - name: "prometheus-sidecar-http"
      match:
        - uri:
            prefix: "/metrics"
      route:
        - destination:
            host: prometheus-kube-prometheus-thanos-discovery.monitoring.svc.cluster.local
            port:
              number: 10902

    - name: "prometheus-sidecar-grpc"
      match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: prometheus-kube-prometheus-thanos-discovery.monitoring.svc.cluster.local
            port:
              number: 10901
EOF

Now it is time to test the gRPC port with mTLS authentication. It should return HTTP code 503:

curl -H "Host: prom-cluster.${CLUSTER_1_ENDPOINT}" \
    -s -o /dev/null -w "Return code: %{http_code}" \
    --cacert certs/ca.crt --cert certs/cluster-monitoring.crt --key certs/cluster-monitoring.key \
    --resolve "prom-cluster.${CLUSTER_1_ENDPOINT}:443:${CLUSTER_1_IP}" \
    "https://prom-cluster.${CLUSTER_1_ENDPOINT}/"

4.3 Set up helm charts for Monitoring Cluster (main cluster)

Time for the final part. Now we will set up the main cluster that collects all metrics and logs. All client clusters will connect to this environment.

4.3.1 Create namespaces

kubectl --context kind-monitoring create ns storage || true
kubectl --context kind-monitoring create ns istio-system || true
kubectl --context kind-monitoring create ns clients || true

4.3.2 Create secret for object storage

On cluster-0 and cluster-1, the Thanos Sidecar pushes metrics to object storage.

So we have to create secrets with credentials for that S3 storage.

Because our S3 is hosted in the same cluster, we simply use the local service endpoint.

cat <<EOF | kubectl apply --context kind-monitoring -n clients -f -
apiVersion: v1
kind: Secret
metadata:
  name: cluster-0-objstore-secret
stringData:
  objstore.yml: |
    type: s3
    config:
      bucket: monitoring-cluster-0
      endpoint: "minio.storage.svc.cluster.local:9000" # K8s local endpoint
      access_key: admin
      secret_key: adminadmin
      insecure: true
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-1-objstore-secret
stringData:
  objstore.yml: |
    type: s3
    config:
      bucket: monitoring-cluster-1
      endpoint: "minio.storage.svc.cluster.local:9000" # K8s local endpoint
      access_key: admin
      secret_key: adminadmin
      insecure: true
---
EOF

Now Thanos Query on the monitoring cluster will send requests to:

  • Cluster-0 Thanos Sidecar
  • Cluster-1 Thanos Sidecar

So we need to configure TLS for Thanos Query Cluster-0 and Thanos Query Cluster-1:

# Create secret with certificate for sidecar to authorize request in monitoring cluster
kubectl create --context kind-monitoring -n clients \
secret generic thanos-query-cluster-0 \
--from-file=tls-key=certs/cluster-monitoring.key \
--from-file=tls-cert=certs/cluster-monitoring.crt \
--from-file=ca-cert=certs/ca.crt

kubectl create --context kind-monitoring -n clients \
secret generic thanos-query-cluster-1 \
--from-file=tls-key=certs/cluster-monitoring.key \
--from-file=tls-cert=certs/cluster-monitoring.crt \
--from-file=ca-cert=certs/ca.crt

Our cluster also hosts the Loki gateway, which is encrypted and requires authentication. Add the secret for the Istio ingress:

# Create secret with certificate for sidecar to authorize request in monitoring cluster
kubectl create --context kind-monitoring -n istio-system \
secret generic wildcard-endpoint \
--from-file=tls.key=certs/cluster-monitoring.key \
--from-file=tls.crt=certs/cluster-monitoring.crt \
--from-file=ca.crt=certs/ca.crt

We have added all required secrets, so now it is time to install the Helm charts. Very important when setting THANOS_SIDECAR_ENDPOINT: do not add https://; the value is a gRPC endpoint.

4.3.3 Install Helm charts

kubectx kind-monitoring
export CLUSTER_0_THANOS_SIDECAR_ENDPOINT="prom-cluster.${CLUSTER_0_ENDPOINT}:443"
export CLUSTER_1_THANOS_SIDECAR_ENDPOINT="prom-cluster.${CLUSTER_1_ENDPOINT}:443"
helmfile \
--kube-context kind-monitoring \
-f monitoring-cluster/helmfile/helmfile.yaml sync

If you have problems with the installation, remove the deployment and start from the beginning:

#kubectx kind-monitoring
#export CLUSTER_0_THANOS_SIDECAR_ENDPOINT="prom-cluster.${CLUSTER_0_ENDPOINT}:443"
#export CLUSTER_1_THANOS_SIDECAR_ENDPOINT="prom-cluster.${CLUSTER_1_ENDPOINT}:443"
#helmfile \
#--kube-context kind-monitoring \
#-f monitoring-cluster/helmfile/helmfile.yaml destroy

4.3.4 Install Istio manifests

Create the Loki gateway via Istio, encrypted with mTLS, and a non-encrypted Minio endpoint. Minio will be served on plain port 80, Loki on 443.

cat <<EOF | kubectl apply --context kind-monitoring -n loki -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: loki
spec:
  selector:
    istio: gateway # use istio default controller
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "loki.${MONITORING_ENDPOINT}"
      tls:
        mode: MUTUAL # Setup mTLS for ingress gateway
        credentialName: wildcard-endpoint
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: loki
spec:
  gateways:
    - loki
  hosts:
    - "loki.${MONITORING_ENDPOINT}"
  http:
    - name: "loki"
      match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: loki-loki-distributed-distributor.loki.svc.cluster.local
            port:
              number: 3100
EOF
cat <<EOF | kubectl apply --context kind-monitoring -n thanos -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: minio
spec:
  selector:
    istio: gateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "minio.${MONITORING_ENDPOINT}"
        - "dashboard-minio.${MONITORING_ENDPOINT}"
      tls:
        httpsRedirect: false
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: minio
spec:
  gateways:
    - minio
  hosts:
    - "minio.${MONITORING_ENDPOINT}"
  http:
    - name: "minio"
      match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: minio.storage.svc.cluster.local
            port:
              number: 9000
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: minio-dashboard
spec:
  gateways:
    - minio
  hosts:
    - "dashboard-minio.${MONITORING_ENDPOINT}"
  http:
    - name: "miniodashboard"
      match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: minio.storage.svc.cluster.local
            port:
              number: 9001

EOF

4.3.5 Verify installation

Test Minio; it should return error code 403:

curl -H "Host: minio.${MONITORING_ENDPOINT}" \
    -s -o /dev/null -w "Return code: %{http_code}" \
    "http://minio.${MONITORING_ENDPOINT}/"

Test the Loki ingress; it should return 405:

curl -H "Host: loki.${MONITORING_ENDPOINT}" \
    -s -o /dev/null -w "Return code: %{http_code}" \
    --cacert certs/ca.crt --cert certs/cluster-0.crt --key certs/cluster-0.key \
    --resolve "loki.${MONITORING_ENDPOINT}:443:${MONITORING_CLUSTER_IP}" \
    "https://loki.${MONITORING_ENDPOINT}/loki/api/v1/push"

The final list of pods in the monitoring cluster looks like this:

kubectl --context kind-monitoring get pod -A
# Output:
#NAMESPACE            NAME                                                    READY   STATUS    RESTARTS      AGE
#clients              cluster-0-thanos-bucketweb-84f587fbf4-d7wzb             1/1     Running   0             13m
#clients              cluster-0-thanos-compactor-6766f65585-g52wm             1/1     Running   2 (13m ago)   13m
#clients              cluster-0-thanos-query-67fcff756c-2h8z8                 1/1     Running   0             13m
#clients              cluster-0-thanos-query-frontend-7dbfb868cf-9q6v9        1/1     Running   0             13m
#clients              cluster-0-thanos-storegateway-0                         1/1     Running   3 (13m ago)   13m
#clients              cluster-1-thanos-bucketweb-769ff88849-fkbvz             1/1     Running   0             13m
#clients              cluster-1-thanos-compactor-695b4cf855-4n494             1/1     Running   2 (13m ago)   13m
#clients              cluster-1-thanos-query-6f958cf8fd-2ss5f                 1/1     Running   0             13m
#clients              cluster-1-thanos-query-frontend-64fc7f44d8-w24bq        1/1     Running   0             13m
#clients              cluster-1-thanos-storegateway-0                         1/1     Running   3 (13m ago)   13m
#istio-system         istio-gateway-5ff4dc958-pbjg4                           1/1     Running   0             2m29s
#istio-system         istiod-6bdff644fd-5tvgd                                 1/1     Running   0             13m
#kube-system          coredns-565d847f94-np7cw                                1/1     Running   0             10h
#kube-system          coredns-565d847f94-ts42m                                1/1     Running   0             10h
#kube-system          etcd-monitoring-control-plane                           1/1     Running   0             10h
#kube-system          kindnet-m5gdt                                           1/1     Running   0             10h
#kube-system          kindnet-pznbn                                           1/1     Running   0             10h
#kube-system          kindnet-z5vr4                                           1/1     Running   0             10h
#kube-system          kube-apiserver-monitoring-control-plane                 1/1     Running   0             10h
#kube-system          kube-controller-manager-monitoring-control-plane        1/1     Running   0             10h
#kube-system          kube-proxy-jjq8m                                        1/1     Running   0             10h
#kube-system          kube-proxy-vbc27                                        1/1     Running   0             10h
#kube-system          kube-proxy-zp7h4                                        1/1     Running   0             10h
#kube-system          kube-scheduler-monitoring-control-plane                 1/1     Running   0             10h
#local-path-storage   local-path-provisioner-684f458cdd-kffwj                 1/1     Running   0             10h
#loki                 loki-loki-distributed-distributor-6f47cd4d84-qgt9c      1/1     Running   0             13m
#loki                 loki-loki-distributed-ingester-0                        1/1     Running   0             13m
#loki                 loki-loki-distributed-querier-0                         1/1     Running   0             13m
#loki                 loki-loki-distributed-query-frontend-58478d9844-twhb2   1/1     Running   0             13m
#metallb-system       controller-8689779bc5-mfhmg                             1/1     Running   0             9h
#metallb-system       speaker-7xfbb                                           1/1     Running   0             9h
#metallb-system       speaker-ch27k                                           1/1     Running   0             9h
#metallb-system       speaker-cmq8m                                           1/1     Running   0             9h
#monitoring           prometheus-grafana-5c969f576d-47zjc                     4/4     Running   0             8m34s
#storage              minio-57655695b9-zrsq8                                  2/2     Running   0             13m
#thanos               redis-master-0                                          2/2     Running   0             2m32s
#thanos               thanos-query-85f9bfbd84-9sgf5                           2/2     Running   0             2m37s
#thanos               thanos-query-frontend-59b79dd4d7-2kq2j                  2/2     Running   0             2m37s

Go to the Thanos Query service to check that all stores are green with the label UP. First create a port-forward:

kubectl port-forward svc/thanos-query -n thanos 9090:9090

Open a browser, visit http://localhost:9090 and click on the Stores tab. If everything is green, the installation of the clusters is complete.
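
Instead of the UI you can also ask the Thanos Query HTTP API directly; with the port-forward above still running, the /api/v1/stores endpoint lists the registered store endpoints:

# With the port-forward still running: list the store endpoints Thanos Query knows about
curl -s http://localhost:9090/api/v1/stores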

6. Execute tests

Run Loki Canary on each cluster. This small service will generate logs.

6.1 Test loki

kubectl --context kind-cluster-0 apply -f tests/loki-canary.yml
kubectl --context kind-cluster-1 apply -f tests/loki-canary.yml
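
To confirm the canaries started, a simple check is to look for their pods on both client clusters (the exact labels depend on tests/loki-canary.yml, so a plain grep is the safest option here):

# Check that the loki-canary pods are running on both client clusters
kubectl --context kind-cluster-0 get pods -A | grep -i canary
kubectl --context kind-cluster-1 get pods -A | grep -i canary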

6.2 Test Thanos

I created bash scripts around thanosbench that automatically generate Prometheus blocks and upload them to the pods:

# Generate sample metrics and load them into Prometheus in cluster-0
bin/generate-datas.sh kind-cluster-0 cluster-0
bin/load-datas.sh kind-cluster-0 cluster-0

# Generate sample metrics and load them into Prometheus in cluster-1
bin/generate-datas.sh kind-cluster-1 cluster-1
bin/load-datas.sh kind-cluster-1 cluster-1

6.3 Run Grafana

Now for the final step: run Grafana to query logs and metrics. First create a port-forward for the Grafana service:

kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80

Open http://localhost:3000 in your browser and log in with the default credentials:

user: admin
password: prom-operator

In the Helm files I configured Grafana to come with data sources already installed for Cluster-0, Cluster-1 and the overall Thanos view; the Loki source is set up as well, so you don't have to do anything.
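
If you want a first query to try in Explore, something minimal works well; the PromQL below is standard, while the Loki label selector is an assumption that depends on the labels Promtail attaches in these charts:

# PromQL against the Thanos data sources: every scraped target should report 1
up

# LogQL against the Loki data source (the "app" label is an assumption;
# adjust it to the labels you see in the Grafana label browser)
{app="loki-canary"}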


Video: the final Grafana view.

Conclusion

Above I presented how to set up one monitoring cluster for all clusters in a company. All traffic between the Thanos Sidecars and Thanos Query is encrypted with mutual TLS; for that I used the Istio service mesh.

I did not encrypt traffic to Minio, because in production you will use AWS, Google, Azure or Wasabi object storage, and the additional secrets and certificates would create problems for users who are just beginning their journey with K8s. For this test we only encrypt and authenticate traffic to the Thanos Sidecar and the Loki distributor.

As you can see, much of the trouble is signing certificates and creating new secrets. In a real-world deployment you should use a PKI (Public Key Infrastructure) with cert-manager. For the PKI you can use HashiCorp Vault or AWS Private Certificate Authority.

Such an infrastructure with Istio and cert-manager would generate short-lived certificates automatically issued by an Intermediate CA via cert-manager. You can read more about intermediate mTLS on our blog: here.

See the example diagram:

Diagram: a Root CA signs an Intermediate CA for each cluster; on Cluster-0 and Cluster-1, Cert Manager registers the Intermediate CA with Istio, signs certificates with it, and injects them into the pods.
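
For illustration, with cert-manager installed, an intermediate CA stored in a Kubernetes secret can be exposed as an issuer roughly like this (a hedged sketch, not part of this tutorial's manifests; the secret name is a placeholder):

# Sketch only: a cert-manager ClusterIssuer backed by an intermediate CA
# stored in the secret "intermediate-ca" in the cert-manager namespace.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: intermediate-ca-issuer
spec:
  ca:
    secretName: intermediate-ca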

Now you can install Alertmanager and create alerting rules, but that part is left to you.

Cleanup

Remove the test clusters:

kind delete cluster --name=monitoring
kind delete cluster --name=cluster-0
kind delete cluster --name=cluster-1