AeroMexico operates 340 microservices supporting flight operations, reservations, digital ticketing, and loyalty programmes across AWS and on-premises infrastructure. Applicare reduced their MTTR from 4.5 hours to 11 minutes and transformed their digital customer experience.
Modern airline operations are extraordinarily complex from a software perspective. AeroMexico's digital platform handles flight reservations, real-time seat inventory, baggage tracking, loyalty point calculations, payment processing, and digital ticketing — all simultaneously, all with zero tolerance for downtime during peak travel periods.
Their 340 microservices span AWS infrastructure and on-premises data centres, with hundreds of inter-service dependencies. When something went wrong, and in a system this complex something eventually always did, diagnosis meant a war room with 8+ engineers, 5 separate monitoring dashboards, and an average of 4.5 hours before the root cause was identified.
During peak travel periods like Christmas and Semana Santa, a 4.5-hour outage on the digital ticketing platform could affect 50,000+ passengers and cause significant revenue loss. The pressure to resolve incidents faster was existential.
Applicare's entity graph auto-discovered all 340 microservices and their dependencies within the first 12 hours of deployment. No manual CMDB configuration. No infrastructure-as-code parsing. The graph identified relationships between services that the team hadn't formally documented — including several circular dependency risks that were immediately addressed.
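To make the idea of a circular dependency risk concrete, here is a minimal sketch of cycle detection over a service dependency map using depth-first search. It is an illustration only, not Applicare's implementation: the `find_cycles` helper, the service names, and the example cycle are all hypothetical.

```python
from typing import Dict, List

def find_cycles(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Return dependency cycles found by a depth-first search.

    `deps` maps each service to the services it calls (hypothetical data).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour: Dict[str, int] = {}
    for service, targets in deps.items():
        colour[service] = WHITE
        for target in targets:
            colour[target] = WHITE

    cycles: List[List[str]] = []
    path: List[str] = []

    def visit(node: str) -> None:
        colour[node] = GREY
        path.append(node)
        for nxt in deps.get(node, []):
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we closed a loop
                cycles.append(path[path.index(nxt):] + [nxt])
            elif colour[nxt] == WHITE:
                visit(nxt)
        path.pop()
        colour[node] = BLACK

    for node in list(colour):
        if colour[node] == WHITE:
            visit(node)
    return cycles

# Hypothetical dependency map with one loop between three services
example_deps = {
    "ticketing": ["inventory"],
    "inventory": ["reservations"],
    "reservations": ["loyalty"],
    "loyalty": ["inventory"],
    "payments": [],
}
cycles_found = find_cycles(example_deps)
```

In this made-up map, `cycles_found` holds a single loop running from `inventory` through `reservations` and `loyalty` back to `inventory`. A real discovery system would build the map from observed traffic rather than a hand-written dictionary.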
With the complete causal graph in place, ArcIn could traverse the full dependency chain when diagnosing incidents. A slowdown in digital ticketing could be immediately correlated with the inventory availability service, which depended on the reservation database, which was experiencing connection pool pressure from a concurrent loyalty points calculation batch job.
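The traversal described above can be sketched as a walk from the symptomatic service through its unhealthy dependencies. This is a simplified illustration under stated assumptions, not ArcIn's algorithm: the `trace_root_cause` function, the service identifiers, and the idea of a precomputed `unhealthy` set are hypothetical.

```python
from typing import Dict, List, Set

def trace_root_cause(
    symptom: str,
    deps: Dict[str, List[str]],
    unhealthy: Set[str],
) -> List[str]:
    """Follow unhealthy dependencies downstream from the symptom.

    Returns the longest chain of unhealthy services; its last element is
    the deepest candidate for the root cause.
    """
    best = [symptom]
    stack = [[symptom]]
    while stack:
        chain = stack.pop()
        if len(chain) > len(best):
            best = chain
        for dep in deps.get(chain[-1], []):
            # Only follow dependencies that look unhealthy, and avoid loops
            if dep in unhealthy and dep not in chain:
                stack.append(chain + [dep])
    return best

# Hypothetical slice of the graph, mirroring the chain in the text
deps = {
    "digital-ticketing": ["inventory-availability", "payments"],
    "inventory-availability": ["reservation-db"],
    "loyalty-batch": ["reservation-db"],
}
unhealthy = {"digital-ticketing", "inventory-availability", "reservation-db"}

chain = trace_root_cause("digital-ticketing", deps, unhealthy)
# Services that share the suspected root cause, e.g. a concurrent batch job
co_tenants = sorted(
    s for s, targets in deps.items()
    if chain[-1] in targets and s not in chain
)
```

Here `chain` ends at `reservation-db`, and `co_tenants` lists the other callers of that database, which is how a concurrent batch job would show up as a suspect.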
Within the first month, ArcIn identified a performance pattern in AeroMexico's digital ticketing flow that had been causing intermittent slowdowns for 18 months. The ticketing service was performing N+1 queries against the seat inventory database — executing one query per available seat class rather than batching — a pattern that only manifested under load during busy booking periods.
The fix was straightforward once identified: a single query change reduced ticketing API response times by 340 ms at peak load. Identifying it took Applicare 47 seconds; the team had been chasing the symptom for 18 months.
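For readers unfamiliar with the N+1 pattern, the sketch below contrasts one query per seat class with a single batched query. It uses an in-memory SQLite database purely for illustration; the `seat_inventory` schema, the flight identifier, and both function names are assumptions, not AeroMexico's actual code.

```python
import sqlite3

# Hypothetical schema and data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seat_inventory (flight TEXT, seat_class TEXT, available INTEGER)"
)
conn.executemany(
    "INSERT INTO seat_inventory VALUES (?, ?, ?)",
    [("FLIGHT-1", "economy", 42), ("FLIGHT-1", "premium", 7), ("FLIGHT-1", "business", 3)],
)

def availability_n_plus_one(conn: sqlite3.Connection, flight: str) -> dict:
    """N+1 pattern: one query to list seat classes, then one query per class."""
    classes = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT seat_class FROM seat_inventory WHERE flight = ?",
            (flight,),
        )
    ]
    result = {}
    for seat_class in classes:
        row = conn.execute(
            "SELECT available FROM seat_inventory WHERE flight = ? AND seat_class = ?",
            (flight, seat_class),
        ).fetchone()
        result[seat_class] = row[0]
    return result

def availability_batched(conn: sqlite3.Connection, flight: str) -> dict:
    """Batched alternative: a single query returns every seat class at once."""
    rows = conn.execute(
        "SELECT seat_class, available FROM seat_inventory WHERE flight = ?",
        (flight,),
    )
    return dict(rows)

slow = availability_n_plus_one(conn, "FLIGHT-1")
fast = availability_batched(conn, "FLIGHT-1")
```

Both functions return the same mapping, but the first issues one round trip per seat class plus one to list them. With few rows the difference is invisible, which is consistent with a pattern that only shows up under load.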
Three months after deployment, AeroMexico experienced a major incident during a peak booking period. A database configuration change had inadvertently reduced the connection pool size for the reservation service. Within 60 seconds, checkout was degrading for all users.
ArcIn identified the root cause in 47 seconds and surfaced it directly to the on-call engineer in Slack. IntelliTune automatically restored the connection pool configuration in 380 ms. Total user-facing impact: 4 minutes. Total engineer involvement: one person, one Slack message. No war room. No bridge call.
"Before Applicare, diagnosing a p99 regression meant a war room with 8 engineers, 5 dashboards, and 4.5 hours of manual correlation. Now ArcIn tells us the root cause and the fix in under 60 seconds. Every time."