How to Enable and Query Control Plane Logs in AKS with Azure Monitor

Reference: Microsoft documentation

In Kubernetes, the Control Plane is responsible for managing the state of the cluster, including scheduling, scaling, and deploying applications. Control Plane logs are a set of logs that provide insight into the operation of the Kubernetes Control Plane components. These logs can be used to diagnose issues and monitor the health of the Control Plane.

The Control Plane logs are generated by various Kubernetes components such as API server, audit, authenticator, controller manager, and scheduler. Each log type corresponds to a component of the Kubernetes Control Plane. For example, API server logs contain information about the API server component that exposes the Kubernetes API. Audit logs provide a record of the individual users, administrators, or system components that have affected your cluster.

As a managed service, AKS operates the control plane for customers. Customers can configure Control Plane Logging through Diagnostic settings to collect these logs.

Categories of Control Plane Logs in AKS

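As of this writing, the official Microsoft documentation lists around ten control plane log categories. They include kube-apiserver, kube-audit, kube-audit-admin, kube-controller-manager, kube-scheduler, cluster-autoscaler, cloud-controller-manager, guard, and the CSI controllers (csi-azuredisk-controller, csi-azurefile-controller, csi-snapshot-controller).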

💡
Audit logs (the kube-audit category) record every request, so they can be verbose and impose significant costs. Enable this category only if your use case requires it.

Configure Control Plane Logs in AKS

Here are the steps to configure control plane logs in Azure Kubernetes Service (AKS):

  1. Open the Azure portal and navigate to your AKS cluster.

  2. Click on the Monitoring tab on the Overview page.

  3. Click on Diagnostic settings.

  4. Click on Add diagnostic setting.

  5. In the Name field, enter a name for your diagnostic setting.

  6. Under Destination details, select Log Analytics as the destination type. This is by far the most common option and is also used by Container Insights. You can also send the logs to a third-party tool (such as Splunk) using the Send to partner solution option.

  7. Choose an existing Log Analytics workspace or create a new one.

  8. Under Logs, select the control plane logs you want to collect.

  9. NOTE: Under the destination table for the Log Analytics workspace, there are two options: Azure diagnostics and Resource specific. The implications of choosing one over the other are discussed later in the article.

  10. Click on Save to save your diagnostic setting.

The same setting can also be created with the Azure CLI, as shown below. More details are available in the documentation.

az monitor diagnostic-settings create \
  --name AKS-Diagnostics \
  --resource /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.ContainerService/managedClusters/my-cluster \
  --logs '[
    {"category": "kube-audit", "enabled": true},
    {"category": "kube-audit-admin", "enabled": true},
    {"category": "kube-apiserver", "enabled": true},
    {"category": "kube-controller-manager", "enabled": true},
    {"category": "kube-scheduler", "enabled": true},
    {"category": "cluster-autoscaler", "enabled": true},
    {"category": "cloud-controller-manager", "enabled": true},
    {"category": "guard", "enabled": true},
    {"category": "csi-azuredisk-controller", "enabled": true},
    {"category": "csi-azurefile-controller", "enabled": true},
    {"category": "csi-snapshot-controller", "enabled": true}
  ]' \
  --workspace /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/myresourcegroup/providers/microsoft.operationalinsights/workspaces/myworkspace
💡
Diagnostic settings do not require an agent on the user's nodes to collect these logs, so collecting them has no impact on node performance or resources.
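
The --logs argument in the CLI command above is a JSON array with one entry per category, which is easy to mistype by hand. Below is a minimal sketch, assuming a POSIX shell, of a helper that builds that array from a list of category names. The name build_logs_json is hypothetical and not part of the Azure CLI.

```shell
# Hypothetical helper: print the JSON array expected by --logs,
# with one enabled entry per category name passed as an argument.
build_logs_json() (
  json=""
  for category in "$@"; do
    # ${json:+, } adds a separator only after the first entry.
    json="${json}${json:+, }{\"category\": \"${category}\", \"enabled\": true}"
  done
  printf '[%s]\n' "$json"
)

# Example: a smaller set that skips the verbose kube-audit category.
build_logs_json kube-audit-admin kube-apiserver guard
```

The output can then be passed as --logs "$(build_logs_json kube-audit-admin kube-apiserver guard)". To my knowledge, adding --export-to-resource-specific true to the same az monitor diagnostic-settings create command selects the Resource Specific destination table from Step 9, but check the CLI reference for your version before relying on it.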

Querying Control Plane Logs in AKS

Depending on the configuration in Step 9, the Control Plane logs are stored either in the AzureDiagnostics table (Azure Diagnostics mode) or in the AKSControlPlane, AKSAudit, and AKSAuditAdmin tables (Resource Specific mode). The benefit of Resource Specific mode is that its tables can be configured as Basic logs, which can reduce costs considerably. Within a table, the logs are split by the Category field.
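
Since the table to query depends on the destination mode, it can help to make the mapping explicit. The sketch below assumes that, in Resource Specific mode, kube-audit lands in AKSAudit, kube-audit-admin in AKSAuditAdmin, and every other category in AKSControlPlane. The function names table_for_category and starter_query are hypothetical.

```shell
# Hypothetical helper: print the Log Analytics table that holds a category.
# Usage: table_for_category <mode> <category>
# where <mode> is "azure-diagnostics" or "resource-specific".
table_for_category() {
  if [ "$1" = "azure-diagnostics" ]; then
    echo "AzureDiagnostics"   # one shared table for every category
    return
  fi
  case "$2" in
    kube-audit)       echo "AKSAudit" ;;
    kube-audit-admin) echo "AKSAuditAdmin" ;;
    *)                echo "AKSControlPlane" ;;  # assumed for all other categories
  esac
}

# Print a starter KQL query for a category in the given mode.
starter_query() (
  table="$(table_for_category "$1" "$2")"
  case "$table" in
    AKSAudit|AKSAuditAdmin)
      # Dedicated audit tables hold a single category, so no filter is added.
      printf '%s\n| take 100\n' "$table" ;;
    *)
      printf '%s\n| where Category == "%s"\n| take 100\n' "$table" "$2" ;;
  esac
)

starter_query resource-specific kube-apiserver
```

The printed query can be pasted into the Logs blade of the workspace. The same call with azure-diagnostics as the mode produces the AzureDiagnostics form of the query.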

To query Control Plane logs in Azure Kubernetes Service (AKS) on the portal, you can follow these steps:

  1. Open the Azure portal and navigate to your AKS cluster.

  2. Click on the Monitoring tab on the Overview page.

  3. Click on Logs.

    💡
    This works only if Container Insights is enabled on the AKS cluster. Otherwise, navigate in the portal to the Log Analytics workspace configured earlier in Diagnostic settings and query the same information there.
  4. Choose the Audit/Diagnostic option under All Queries, which offers some sample queries to get started. Alternatively, use Find in table to query individual tables.

💡
In this case, the sample queries will not work because we configured Azure Diagnostics mode, whereas the sample queries all target the Resource Specific tables.
  5. I therefore wrote my own query for the API server logs:
AzureDiagnostics
| where Category == "kube-apiserver"
| take 100

Using Control Plane Logs for Diagnosis in AKS

Some common use cases for Control Plane logs:

  • Monitoring authentication issues

  • Latency issues with the API server: here, metrics alone are not sufficient because they do not include the user agent, which is critical for the investigation

  • Scheduling issues with the scheduler or cluster-autoscaler
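
For the API server latency case, a query along the following lines could show which user agents send the most requests. It is only a sketch. It assumes Azure Diagnostics mode, the kube-audit category enabled, and that the raw audit event JSON is stored in the log_s column. The helper name requests_by_user_agent_query is hypothetical, and the az invocation is left commented out because it needs a signed-in CLI and a real workspace ID.

```shell
# Hypothetical helper: print a KQL query that counts audit events
# per user agent and verb, keeping the ten busiest combinations.
requests_by_user_agent_query() {
  cat <<'KQL'
AzureDiagnostics
| where Category == "kube-audit"
| extend event = parse_json(log_s)
| summarize requests = count() by userAgent = tostring(event.userAgent), verb = tostring(event.verb)
| top 10 by requests
KQL
}

requests_by_user_agent_query

# To run it from the CLI (the workspace GUID below is a placeholder):
# az monitor log-analytics query \
#   --workspace "<workspace-guid>" \
#   --analytics-query "$(requests_by_user_agent_query)" \
#   --timespan PT1H
```

Swapping the Category filter for guard or kube-scheduler gives a starting point for the authentication and scheduling cases, although those logs are not audit events, so the parse_json step would need to be adapted.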

Links for documentation