Debugging AKS Certificate Issues: When Let’s Encrypt Rate Limits Strike Your Kubernetes Cluster

A complete guide to understanding, diagnosing, and solving certificate provisioning failures in Azure Kubernetes Service

Aravindan Thangaiah

20 Nov 2025

The Problem That Stopped Our Deployment

Picture this: You’re working on a Kubernetes deployment in Azure Kubernetes Service (AKS), everything seems configured correctly, but suddenly your Application Gateway Ingress Controller (AGIC) starts throwing errors:
Unable to find the secret associated to secretId: [test-lab/test-lab]
Source: azure/application-gateway ingress-appgw-deployment-79d86b4bf4-kczvg
Count: 4
Your ingress is configured, cert-manager is installed, but somehow your TLS secrets aren’t being created. Sound familiar? You’re not alone.

The Investigation Journey

Step 1: Check if Secrets Exist
The first step in any Kubernetes mystery is verification. Let’s check if our secrets actually exist:
kubectl get secrets --all-namespaces | Select-String "test-lab"
Result: Nothing. The secret doesn’t exist anywhere in the cluster.
Step 2: Examine the Ingress Configuration
Looking at our ingress resource, everything appeared correct:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-lab
  namespace: test-lab
spec:
  rules:
  - host: test-lab.test.com
    http:
      paths:
      - backend:
          service:
            name: test-lab
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - test-lab.test.com
    secretName: test-lab  # cert-manager should create this secret
The ingress was referencing a secretName: test-lab, but that secret didn’t exist. Since we’re using cert-manager, it should automatically create this secret.
Step 3: Check Certificate Resources
This is where the real detective work began:
kubectl get certificates -n test-lab
Output:
NAME       READY   SECRET     AGE
test-lab   False   test-lab   12m
api-test   False   test       12m
Both certificates were stuck in False state for 12 minutes. Something was preventing cert-manager from successfully provisioning our certificates.

The Root Cause Discovery

The breakthrough came when we described the certificate resource:
kubectl describe certificate test-lab -n test-lab

The smoking gun:

Message: The certificate request has failed to complete and will be retried:
Failed to wait for order resource "test-lab-1-324203979" to become ready:
order is in "errored" state: Failed to create Order:
429 urn:ietf:params:acme:error:rateLimited:
too many certificates (5) already issued for this exact set of identifiers in the last 168h0m0s, retry after 2025-09-16 21:22:00 UTC

Eureka! We had hit Let’s Encrypt’s rate limiting.

Understanding Let’s Encrypt Rate Limits

What Exactly Is Rate Limiting?
Let’s Encrypt implements several rate limits to prevent abuse and ensure their free service remains available to everyone. The specific limit we encountered is called “Duplicate Certificate Limit”.

The Specific Rate Limit We Hit

  • Limit: 5 certificates per exact set of identifiers (domain names) per week
  • Window: Rolling 7-day (168-hour) period
  • Scope: Exact same domain name(s) in the certificate
  • Reset: When the oldest certificate ages out of the 7-day window
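To make the rolling window concrete, here is a minimal Python sketch of the arithmetic, assuming the limit behaves exactly as described in the bullets above (5 certificates per 168 hours). The function name `next_issuance_time` and its parameters are ours for illustration. Let's Encrypt's actual accounting may differ, so treat the retry-after time in the ACME error as authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def next_issuance_time(
    issued_at: Iterable[datetime],
    now: datetime,
    limit: int = 5,
    window: timedelta = timedelta(hours=168),
) -> datetime:
    """Estimate when another certificate could be requested.

    Assumes a simple rolling window: only issuances newer than
    `window` count against `limit`.
    """
    # Keep only issuances that still fall inside the rolling window.
    recent = sorted(t for t in issued_at if now - t < window)
    if len(recent) < limit:
        return now  # A slot is free right away.
    # Enough of the oldest issuances must age out to drop below the limit.
    return recent[len(recent) - limit] + window
```

Under this model, five issuances with the oldest at 2025-09-09 21:22 UTC would free a slot at 2025-09-16 21:22 UTC, which is consistent with the retry time in our error message.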

Why This Happens in Development Environments

Common Scenarios Leading to Rate Limit Hits:
  • Rapid Development Iterations
    • Frequent cluster rebuilds during testing
    • Multiple deployment attempts while debugging configurations
    • Testing different cert-manager configurations
  • Configuration Mistakes
    • Incorrect ingress annotations causing cert-manager to repeatedly retry
    • Missing or misconfigured ClusterIssuers
    • DNS validation failures leading to retry loops
  • Kubernetes-Specific Issues
    • cert-manager losing state during cluster operations
    • Secrets getting accidentally deleted
    • Namespace recreation removing existing certificates

The Solution Toolkit

Immediate Solutions
Option 1: Wait It Out (Production Approach)

The error message tells us exactly when the rate limit resets:

retry after 2025-09-16 21:22:00 UTC
For production environments, waiting is often the most appropriate solution.
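If you want to schedule a retry rather than watch the clock, the timestamp can be pulled out of the error text. The following is a small Python sketch, assuming the message keeps the "retry after YYYY-MM-DD HH:MM:SS UTC" shape we saw in our cluster; the helper names `parse_retry_after` and `seconds_until` are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches the timestamp format seen in our rate-limit error message.
RETRY_AFTER = re.compile(r"retry after (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")


def parse_retry_after(message: str) -> Optional[datetime]:
    """Return the retry time as an aware UTC datetime, or None if absent."""
    match = RETRY_AFTER.search(message)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def seconds_until(retry_at: datetime, now: datetime) -> float:
    """Seconds left to wait; zero once the retry time has passed."""
    return max(0.0, (retry_at - now).total_seconds())
```

Pass the message from kubectl describe certificate to `parse_retry_after`, then compare the result with `datetime.now(timezone.utc)`.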
Option 2: Use a Different Domain (Quick Fix)
Change your domain to bypass the rate limit:
spec:
  rules:
  - host: testlab-v2.test.com

  tls:
  - hosts:
    - testlab-v2.test.com
    secretName: testlab-v2
Option 3: Switch to Let’s Encrypt Staging
For development environments, use the staging environment:

metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-staging"

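The staging environment has much higher rate limits (30,000 certificates per week). Keep in mind that staging certificates are signed by an untrusted test CA, so browsers and clients will show trust warnings; use them only for development and testing.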
Option 4: Create Temporary Self-Signed Certificates
Unblock your application immediately:
# Generate a self-signed certificate valid for 30 days
openssl req -x509 -nodes -days 30 -newkey rsa:2048 \
  -keyout test-lab.key -out test-lab.crt \
  -subj "/CN=test-lab.test.com"
# Create the TLS secret that the ingress references
kubectl create secret tls test-lab \
  --cert=test-lab.crt --key=test-lab.key \
  -n test-lab

Prevention Strategies

1. Environment Separation

Use different subdomains for different environments:

  • api-dev.yourdomain.com – Development
  • api-staging.yourdomain.com – Staging
  • api.yourdomain.com – Production
2. Development Best Practices
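apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com  # placeholder; this and the fields below are assumed values, adjust for your cluster
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: azure/application-gateway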
3. Certificate Backup and Restore
Backup your working certificates:
kubectl get secret test-lab -n test-lab -o yaml > test-lab-backup.yaml
Restore when needed:
kubectl apply -f test-lab-backup.yaml
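One caveat: an exported secret carries server-populated metadata such as resourceVersion and uid, which may cause conflicts when you apply the backup to a rebuilt cluster. Below is a minimal Python sketch that strips those fields from a JSON export. The function name `portable_secret` and the exact list of fields to drop are our assumptions, so review the output before relying on it.

```python
import json

# Metadata the API server fills in; assumed safe to drop from a backup.
SERVER_FIELDS = (
    "resourceVersion",
    "uid",
    "creationTimestamp",
    "managedFields",
    "ownerReferences",
)


def portable_secret(secret_json: str) -> str:
    """Return the secret manifest without server-populated metadata."""
    secret = json.loads(secret_json)
    metadata = secret.get("metadata", {})
    for field in SERVER_FIELDS:
        metadata.pop(field, None)  # Ignore fields that are not present.
    return json.dumps(secret, indent=2)
```

To use it, export with kubectl get secret test-lab -n test-lab -o json, pass the text through `portable_secret`, and save the result to a file for kubectl apply -f, which accepts JSON as well as YAML.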
4. Monitoring and Alerting
Set up monitoring for certificate status:
kubectl get certificates --all-namespaces -o custom-columns=NAMESPACE:.metadata.n
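For alerting, it can help to turn certificate status into something a script can act on. Here is a hedged Python sketch that reads the JSON form of the certificate list and reports every certificate whose Ready condition is not "True". The function name `not_ready_certificates` is hypothetical, and the sketch relies only on the standard status.conditions structure that cert-manager writes on Certificate resources.

```python
import json
from typing import List, Tuple


def not_ready_certificates(kubectl_json: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, message) for each certificate not Ready."""
    items = json.loads(kubectl_json).get("items", [])
    problems = []
    for item in items:
        meta = item.get("metadata", {})
        conditions = item.get("status", {}).get("conditions", [])
        # Find the Ready condition, if cert-manager has reported one yet.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is not None and ready.get("status") == "True":
            continue
        if ready is None:
            message = "no Ready condition reported"
        else:
            message = ready.get("message", "")
        problems.append((meta.get("namespace"), meta.get("name"), message))
    return problems
```

Feed it the output of kubectl get certificates --all-namespaces -o json and alert whenever the returned list is non-empty.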

Diagnostic Commands Cheat Sheet

Here’s your troubleshooting toolkit:
# Check whether the secret exists (PowerShell; on Linux/macOS use grep instead of Select-String)
kubectl get secrets --all-namespaces | Select-String "your-secret-name"
# Check certificate status
kubectl get certificates -n your-namespace
# Get detailed certificate information (MOST IMPORTANT COMMAND)
kubectl describe certificate your-cert-name -n your-namespace
# Describe all certificates in a namespace
kubectl describe certificates -n your-namespace
# Get detailed certificate information in YAML format
kubectl get certificate your-cert-name -n your-namespace -o yaml
# Check certificate requests and their details
kubectl get certificaterequests -n your-namespace
kubectl describe certificaterequests -n your-namespace
# Check cert-manager logs for errors
kubectl logs -n cert-manager -l app=cert-manager --tail=50
# Check ACME challenges (for Let’s Encrypt debugging)
kubectl get challenges -n your-namespace
kubectl describe challenges -n your-namespace
# Check ClusterIssuers and their status
kubectl get clusterissuers
kubectl describe clusterissuer your-issuer-name
# Check Issuers (namespace-scoped)
kubectl get issuers -n your-namespace
kubectl describe issuer your-issuer-name -n your-namespace
# Find all ingress resources using a specific domain
kubectl get ingress --all-namespaces -o yaml | Select-String "your-domain.com" -Context 3
# Check orders (ACME-specific resources)
kubectl get orders -n your-namespace
kubectl describe orders -n your-namespace

Key Takeaways

  1. Always check certificate status first when TLS secrets are missing
  2. Let’s Encrypt rate limits are real and hit development environments frequently
  3. Use staging environment for development to avoid production rate limits
  4. Domain separation is crucial for multi-environment setups
  5. Monitor certificate health as part of your operational practices
  6. Backup working certificates before making changes

Conclusion

Certificate management in Kubernetes can be tricky, especially when external services like Let’s Encrypt impose rate limits. The key is understanding the tools at your disposal and having a systematic approach to diagnosis. Remember: that cryptic “secret not found” error might actually be a rate limiting issue in disguise. Always dig deeper with kubectl describe commands to get the full picture. The next time you see certificate provisioning failures, you’ll know exactly where to look and how to resolve them quickly.