Understanding Kubernetes Resource Utilization: Unveiling the Mystery of Memory and CPU Usage Beyond 100%

Ever had a moment of surprise while checking the metrics of your node during a Kubernetes experiment? I definitely did! I was scratching my head when I noticed resource utilization going beyond 100%. It got me curious, so I dug deeper to figure out what was going on. In this article, I'll share my findings and demystify how Kubernetes manages to go above and beyond the 100% mark in resource usage.

As you may already know, kubectl top can be used to get the point-in-time resource utilization of the nodes and pods in the cluster.

$ kubectl top node
NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
aritra-node-1        878m         94%     6973Mi          102%
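
Under the hood, kubectl top reads these numbers from the Metrics API (typically served by metrics-server). If you want to look at the raw values yourself, you can query the API directly. The snippet below is only a sketch: it assumes metrics-server is installed and jq is available, and the output shown is illustrative.

$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq '.items[] | {name: .metadata.name, usage: .usage}'
{
  "name": "aritra-node-1",
  "usage": {
    "cpu": "878354148n",
    "memory": "7140352Ki"
  }
}

The cpu value is reported in nanocores (878354148n is roughly 878m) and the memory value is the node's working set (7140352Ki = 6973Mi), which is exactly what kubectl top prints.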

After some digging, I found some discussions on GitHub that explain this behavior.

  • The CPU and memory utilization of a node (the absolute numbers) is the actual resource usage on the node, as reported by the Metrics API, which gets its data from cAdvisor. This can include resource usage from processes running outside of pods, which is also why the sum of the resources consumed by the pods running on the node (kubectl top pod) may not match the aggregate utilization shown by kubectl top node.
  • The percentages in the kubectl top node output above are calculated against the node's allocatable resources, not its actual capacity.

    For example:

    Memory% = node_memory_working_set_bytes * 100 / allocatable memory
    
  • Node capacity vs. allocatable (source: Kubernetes documentation):

    allocatable = node capacity - kube-reserved - system-reserved - eviction-threshold

    Allocatable is a logical concept that guides pod scheduling in Kubernetes: the scheduler assigns pods to nodes based on the node's allocatable memory/CPU and the pods' requests and limits. It is therefore possible for pods, especially those without limits, to end up using resources beyond allocatable. However, once usage crosses the eviction threshold, such pods may be evicted if the node comes under memory pressure.

  • From Kubernetes version 1.23, there is an option to report percentage utilization based on node capacity instead. The flag is false by default and can be specified explicitly:

    kubectl top node --show-capacity=true

      a@cloudshell:~$ kubectl top node 
      NAME                                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
      gke-cluster-1-aritra-default-pool-cc5f2b3e-kzpe   163m         17%    1062Mi          37%       
      gke-cluster-1-aritra-default-pool-cc5f2b3e-mwgg   156m         16%    1012Mi          36%       
      gke-cluster-1-aritra-default-pool-cc5f2b3e-p0sr   162m         17%    1001Mi          35%       
      a@cloudshell:~$ kubectl top node --show-capacity=true
      NAME                                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
      gke-cluster-1-aritra-default-pool-cc5f2b3e-kzpe   163m         8%     1062Mi          27%       
      gke-cluster-1-aritra-default-pool-cc5f2b3e-mwgg   156m         7%     1012Mi          25%       
      gke-cluster-1-aritra-default-pool-cc5f2b3e-p0sr   162m         8%     1001Mi          25%
    

As you can see, the absolute resource utilization is the same; only the percentages differ depending on whether they are calculated against allocatable or against node capacity. The sketch below shows how to reproduce these numbers by hand.
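
You can pull the node's capacity and allocatable values from the node object and divide the usage reported by kubectl top by each of them. This is a minimal sketch, assuming the node reports memory in Ki and using illustrative capacity/allocatable values for the first node above; shell integer arithmetic truncates the result.

$ NODE=gke-cluster-1-aritra-default-pool-cc5f2b3e-kzpe
$ kubectl get node "$NODE" -o jsonpath='{.status.capacity.memory} {.status.allocatable.memory}'
4025940Ki 2900032Ki

# Memory% against allocatable (the kubectl top node default); 1062Mi = 1062*1024 Ki
$ echo $(( 1062 * 1024 * 100 / 2900032 ))
37

# Memory% against capacity (what --show-capacity reports)
$ echo $(( 1062 * 1024 * 100 / 4025940 ))
27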

  • This discussion doesn't apply to kubectl top pod, which doesn't display percentage utilization.

  • On the other hand, kubectl describe node displays the total requests and limits of the pods on the node, and those percentages are also calculated against allocatable, as illustrated below.
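
    For reference, this is roughly what that part of the kubectl describe node output looks like; the request/limit figures below are illustrative placeholders, and the percentages are relative to allocatable:

      $ kubectl describe node gke-cluster-1-aritra-default-pool-cc5f2b3e-kzpe | grep -A 7 "Allocated resources"
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource           Requests      Limits
        --------           --------      ------
        cpu                563m (59%)    1152m (122%)
        memory             696Mi (24%)   1114Mi (39%)
        ephemeral-storage  0 (0%)        0 (0%)
        hugepages-2Mi      0 (0%)        0 (0%)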

Today, many observability tools have alerts set up on these metrics. To make the most of these tools and respond with appropriate actions, it's crucial to understand exactly how the numbers are calculated.