12.3 Troubleshooting the Metrics Server

If the Kubernetes Metrics Server does not reach the READY 1/1 state, run the following commands to inspect the pod and its logs:
kubectl describe pod <metrics-server-pod> -n kube-system
kubectl logs <metrics-server-pod> -n kube-system
If you see errors such as:
Readiness probe failed: HTTP probe failed with statuscode: 500
and:
E0907 13:07:50.937308       1 scraper.go:140] "Failed to scrape node" err="Get \"https://X.X.X.X:10250/metrics/resource\": x509: cannot validate certificate for X.X.X.X because it doesn't contain any IP SANs" node="worker-node1"
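then the kubelet serving certificate on the node does not list the node's IP address as a Subject Alternative Name (SAN), and you may need to install a valid certificate for your Kubernetes cluster.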
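The check for this particular error can be scripted. The following is a minimal sketch, not part of the documented procedure: has_ip_san_error is a hypothetical helper name, and it matches only on the message text shown in the log line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for illustration): reads Metrics Server
# log text on stdin and succeeds if it contains the x509 "IP SANs" error.
has_ip_san_error() {
  grep -q "doesn't contain any IP SANs"
}

# Example usage, assuming cluster access and a real pod name in place of
# the placeholder:
#   kubectl logs <metrics-server-pod> -n kube-system | has_ip_san_error \
#     && echo "kubelet certificate has no IP SANs"
```

The helper only reads text from standard input, so it can also be run against a saved log file.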
For testing purposes only, you can work around this issue by disabling verification of the kubelet certificate:
  1. Delete the Kubernetes Metrics Server by running the following command:
    kubectl delete -f $WORKDIR/kubernetes/hpa/components.yaml
  2. Edit the $WORKDIR/kubernetes/hpa/components.yaml file and locate the args: section. Add --kubelet-insecure-tls to the arguments. For example:
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
     ...
  3. Redeploy the Kubernetes Metrics Server by running the following command:
    kubectl apply -f $WORKDIR/kubernetes/hpa/components.yaml
  4. Run the following command and make sure the READY status shows 1/1:
    kubectl get pods -n kube-system | grep metric
    The output should look similar to the following:
    metrics-server-d9694457-mf69d           1/1     Running   0             40s
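
The readiness check in the last step can also be automated. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS); metrics_server_ready is a hypothetical helper name, not part of kubectl or the Metrics Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for illustration): reads `kubectl get pods`
# output on stdin and succeeds only if a metrics-server pod is listed with
# READY 1/1 and STATUS Running.
metrics_server_ready() {
  awk '$1 ~ /^metrics-server/ && $2 == "1/1" && $3 == "Running" { found = 1 }
       END { exit !found }'
}

# Example usage, assuming cluster access:
#   kubectl get pods -n kube-system | metrics_server_ready \
#     && echo "Metrics Server is ready"
```

The helper returns a non-zero exit status when no matching pod is ready, so it can be used in a retry loop or a script condition.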