Debugging AKS Certificate Issues: When Let’s Encrypt Rate Limits Strike Your Kubernetes Cluster
A complete guide to understanding, diagnosing, and solving certificate provisioning failures in Azure Kubernetes Service
Aravindan Thangaiah
20 Nov 2025
The Problem That Stopped Our Deployment
Picture this: You’re working on a Kubernetes deployment in Azure Kubernetes Service (AKS), everything seems configured correctly, but suddenly your Application Gateway Ingress Controller (AGIC) starts throwing errors:
Unable to find the secret associated to secretId: [test-lab/test-lab]
Source: azure/application-gateway ingress-appgw-deployment-79d86b4bf4-kczvg
Count: 4
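These events come from the AGIC pod itself, so the controller logs are a good place to see them at the source. Assuming the AKS add-on's default deployment in kube-system (the deployment name is inferred from the pod name in the event above):
# Tail the Application Gateway Ingress Controller logs
kubectl logs -n kube-system deployment/ingress-appgw-deployment --tail=50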
Your ingress is configured, cert-manager is installed, but somehow your TLS secrets aren’t being created. Sound familiar? You’re not alone.
The Investigation Journey
Step 1: Check if Secrets Exist
The first step in any Kubernetes mystery is verification. Let’s check if our secrets actually exist:
kubectl get secrets --all-namespaces | Select-String "test-lab"
Result: Nothing. The secret doesn’t exist anywhere in the cluster.
Step 2: Examine the Ingress Configuration
Looking at our ingress resource, everything appeared correct:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-lab
  namespace: test-lab
spec:
  rules:
    - host: test-lab.test.com
      http:
        paths:
          - backend:
              service:
                name: test-lab
                port:
                  number: 80
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - test-lab.test.com
      secretName: test-lab # cert-manager should create this secret
The ingress was referencing a secretName: test-lab, but that secret didn’t exist. Since we’re using cert-manager, it should automatically create this secret.
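For cert-manager's ingress-shim to pick the ingress up at all, the ingress also needs to say which issuer to use; that is done with an annotation in the ingress metadata, something like the following (the issuer name here is a placeholder, use whatever your ClusterIssuer is actually called):
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod # placeholder - must match an existing ClusterIssuer
With that in place, cert-manager creates a Certificate resource for each entry under tls and stores the issued certificate in the secret named by secretName.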
Step 3: Check Certificate Resources
This is where the real detective work began:
kubectl get certificates -n test-lab
Output:
NAME READY SECRET AGE
test-lab False test-lab 12m
api-test False test 12m
Both certificates had been stuck in a False state for 12 minutes. Something was preventing cert-manager from successfully provisioning our certificates.
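cert-manager works through a chain of resources for each Certificate (CertificateRequest, then Order, then Challenge), so listing them together quickly shows where issuance is stuck:
kubectl get certificaterequests,orders,challenges -n test-lab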
The Root Cause Discovery
The breakthrough came when we described the certificate resource:
kubectl describe certificate test-lab -n test-lab
The smoking gun:
Message: The certificate request has failed to complete and will be retried:
  Failed to wait for order resource "test-lab-1-324203979" to become ready:
  order is in "errored" state: Failed to create Order:
  429 urn:ietf:params:acme:error:rateLimited: too many certificates (5)
  already issued for this exact set of identifiers in the last 168h0m0s,
  retry after 2025-09-16 21:22:00 UTC
Eureka! We had hit Let’s Encrypt’s rate limiting.
Understanding Let’s Encrypt Rate Limits
What Exactly Is Rate Limiting?
Let’s Encrypt implements several rate limits to prevent abuse and ensure their free service remains available to everyone. The specific limit we encountered is called “Duplicate Certificate Limit”.
The Specific Rate Limit We Hit
- Limit: 5 certificates per exact set of identifiers (domain names) per week
- Window: Rolling 7-day (168-hour) period
- Scope: Exact same domain name(s) in the certificate
- Reset: When the oldest certificate ages out of the 7-day window
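If you want a rough idea of how many certificates have recently been issued for a name before you retry, public certificate-transparency logs are a quick check. A minimal sketch against crt.sh (bash, requires curl and jq; CT entries include precertificates, so the count overstates the exact Let's Encrypt tally):
# Count recent CT log entries for the exact host name
curl -s "https://crt.sh/?q=test-lab.test.com&output=json" | jq 'length'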
Why This Happens in Development Environments
Common Scenarios Leading to Rate Limit Hits:
- Rapid Development Iterations
  - Frequent cluster rebuilds during testing
  - Multiple deployment attempts while debugging configurations
  - Testing different cert-manager configurations
- Configuration Mistakes
  - Incorrect ingress annotations causing cert-manager to repeatedly retry
  - Missing or misconfigured ClusterIssuers
  - DNS validation failures leading to retry loops
- Kubernetes-Specific Issues
  - cert-manager losing state during cluster operations
  - Secrets getting accidentally deleted
  - Namespace recreation removing existing certificates
The Solution Toolkit
Immediate Solutions
Option 1: Wait It Out (Production Approach)
The error message tells us exactly when the rate limit resets:
retry after 2025-09-16 21:22:00 UTC
For production environments, waiting is often the most appropriate solution.
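cert-manager keeps retrying on its own with back-off, so no action is strictly required. Once the window has passed, you can also nudge a specific certificate manually if you have the cmctl CLI installed (certificate name and namespace from our example):
# Ask cert-manager to re-issue the certificate immediately
cmctl renew test-lab -n test-lab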
Option 2: Use a Different Domain (Quick Fix)
Change your domain to bypass the rate limit:
spec:
  rules:
    - host: test-lab-v2.test.com
  tls:
    - hosts:
        - test-lab-v2.test.com
      secretName: test-lab-v2
Option 3: Switch to Let’s Encrypt Staging
For development environments, point the ingress at a staging issuer instead. With cert-manager's ingress-shim, that is just a change to the issuer annotation (the name should match the staging ClusterIssuer shown under Prevention Strategies below):
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
The staging environment has much higher rate limits (30,000 certificates per week).
Option 4: Create Temporary Self-Signed Certificates
Unblock your application immediately:
# Generate a self-signed certificate
openssl req -x509 -nodes -days 30 -newkey rsa:2048 \
  -keyout test-lab.key -out test-lab.crt \
  -subj "/CN=test-lab.test.com"
# Create the secret
kubectl create secret tls test-lab \
  --cert=test-lab.crt --key=test-lab.key \
  -n test-lab
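It's worth confirming the temporary secret is in place and noting its expiry so it doesn't linger past the 30 days. From a bash shell, that might look like (note the escaped dot in the jsonpath key, which kubectl requires):
# Inspect the subject and expiry of the certificate stored in the secret
kubectl get secret test-lab -n test-lab -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate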
Prevention Strategies
1. Environment Separation
Use different subdomains for different environments:
- api-dev.yourdomain.com – Development
- api-staging.yourdomain.com – Staging
- api.yourdomain.com – Production
2. Development Best Practices
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # cert-manager also requires an account email, key secret and solver; values below are placeholders
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: azure/application-gateway # adjust to match your ingress controller
3. Certificate Backup and Restore
Backup your working certificates:
kubectl get secret test-lab -n test-lab -o yaml > test-lab-backup.yaml
Restore when needed:
kubectl apply -f test-lab-backup.yaml
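If several certificates live in the namespace, you can back them all up in one pass; a bash sketch that filters on the standard kubernetes.io/tls secret type:
# Back up every TLS secret in the namespace to its own YAML file
for s in $(kubectl get secrets -n test-lab --field-selector type=kubernetes.io/tls -o name); do
  kubectl get "$s" -n test-lab -o yaml > "$(basename "$s")-backup.yaml"
done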
4. Monitoring and Alerting
Set up monitoring for certificate status, for example with a custom-columns view (pick whichever columns you care about):
kubectl get certificates --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[0].status,SECRET:.spec.secretName
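For something closer to alerting, a small check that fails whenever any certificate is not Ready can run in CI or a cron job; a bash sketch:
# Exit non-zero if any certificate in the cluster reports a non-True Ready condition
if kubectl get certificates --all-namespaces \
     -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
   | grep -qv '^True$'; then
  echo "One or more certificates are not Ready"
  exit 1
fi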
Diagnostic Commands Cheat Sheet
Here’s your troubleshooting toolkit:
# Check if the secret exists anywhere in the cluster
kubectl get secrets --all-namespaces | Select-String "your-secret-name"
# Check certificate status
kubectl get certificates -n your-namespace
# Get detailed certificate information (MOST IMPORTANT COMMAND)
kubectl describe certificate your-cert-name -n your-namespace
# Describe all certificates in a namespace
kubectl describe certificates -n your-namespace
# Get detailed certificate information in YAML format
kubectl get certificate your-cert-name -n your-namespace -o yaml
# Check certificate requests and their details
kubectl get certificaterequests -n your-namespace
kubectl describe certificaterequests -n your-namespace
# Check cert-manager logs for errors
kubectl logs -n cert-manager -l app=cert-manager --tail=50
# Check ACME challenges (for Let's Encrypt debugging)
kubectl get challenges -n your-namespace
kubectl describe challenges -n your-namespace
# Check ClusterIssuers and their status
kubectl get clusterissuers
kubectl describe clusterissuer your-issuer-name
# Check Issuers (namespace-scoped)
kubectl get issuers -n your-namespace
kubectl describe issuer your-issuer-name -n your-namespace
# Find all ingress resources using a specific domain
kubectl get ingress --all-namespaces -o yaml | Select-String "your-domain.com" -Context 3
# Check orders (ACME-specific resources)
kubectl get orders -n your-namespace
kubectl describe orders -n your-namespace
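While cert-manager retries, watching the Certificate resources saves re-running the commands above:
# Watch certificate status update in real time
kubectl get certificates -n your-namespace -w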
Key Takeaways
- Always check certificate status first when TLS secrets are missing
- Let’s Encrypt rate limits are real and hit development environments frequently
- Use staging environment for development to avoid production rate limits
- Domain separation is crucial for multi-environment setups
- Monitor certificate health as part of your operational practices
- Backup working certificates before making changes
Conclusion
Certificate management in Kubernetes can be tricky, especially when external services like Let’s Encrypt impose rate limits. The key is understanding the tools at your disposal and having a systematic approach to diagnosis.
Remember: that cryptic “secret not found” error might actually be a rate limiting issue in disguise. Always dig deeper with kubectl describe commands to get the full picture.
The next time you see certificate provisioning failures, you’ll know exactly where to look and how to resolve them quickly.