Owner: Engineering Team | Last Updated: 2026-01-30 | Status: Current
Step-by-step guide for investigating production issues.
production/ecs/{cluster}/{service}

| Symptom | Likely Cause | Investigation |
|---|---|---|
| 500 errors spike | Backend exception | Sentry → stack trace |
| Slow responses | Database or model latency | CloudWatch metrics |
| Auth failures | Token/OAuth issue | Sentry + user reports |
| UI broken | Frontend bug | Clarity replay + Sentry |
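
The triage table above can also be expressed as a small lookup, which may be handy in an on-call script or chat bot. The following is a minimal sketch, not an existing tool: the symptom keys, the `TriageHint` type, and the `triage` and `log_path` helpers are hypothetical names introduced for illustration. It also assumes that `production/ecs/{cluster}/{service}` is a path template filled in per cluster and service; the guide does not say what the path refers to.

```python
from dataclasses import dataclass

# Template copied verbatim from this guide. What it names (for example a log
# group) is an assumption; the guide does not state it.
LOG_PATH_TEMPLATE = "production/ecs/{cluster}/{service}"


@dataclass(frozen=True)
class TriageHint:
    """One row of the symptom table: likely cause and where to look first."""
    likely_cause: str
    investigation: str


# Rows mirror the symptom table; the dictionary keys are hypothetical short names.
TRIAGE = {
    "500_errors_spike": TriageHint("Backend exception", "Sentry → stack trace"),
    "slow_responses": TriageHint("Database or model latency", "CloudWatch metrics"),
    "auth_failures": TriageHint("Token/OAuth issue", "Sentry + user reports"),
    "ui_broken": TriageHint("Frontend bug", "Clarity replay + Sentry"),
}


def log_path(cluster: str, service: str) -> str:
    """Fill the path template for a given cluster and service."""
    return LOG_PATH_TEMPLATE.format(cluster=cluster, service=service)


def triage(symptom: str) -> TriageHint:
    """Return the hint for a known symptom key, or raise ValueError."""
    try:
        return TRIAGE[symptom]
    except KeyError:
        known = ", ".join(sorted(TRIAGE))
        raise ValueError(f"Unknown symptom {symptom!r}; known keys: {known}") from None


if __name__ == "__main__":
    hint = triage("slow_responses")
    print(f"Likely cause: {hint.likely_cause}; start with: {hint.investigation}")
    # "main" and "api" are placeholder cluster and service names.
    print(log_path("main", "api"))
```

Keeping the rows in one dictionary means the script and the table can be compared line by line when either one changes.
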

| Date | Author | Change |
|---|---|---|
| 2026-01-30 | Admin | Initial creation |