AIOps·Mar 18, 2025·8 min read

How we reduced false positive alerts by 94% with per-entity behavioural baselines

Applicare Engineering Team

The alert fatigue problem nobody talks about honestly

Most observability vendors will tell you their tool reduces alert noise. What they won't tell you is how many alert rules you need to write to get there — and how quickly those rules become stale as your infrastructure changes.

IntelliSense takes a fundamentally different approach. Instead of rules, it learns normal. Here's exactly how we reduced false positive alerts by 94% across Enterprise customers — and why the architecture behind it matters.

94% false positive reduction · 200+ Enterprise customers · 0 alert rules needed

Why threshold-based alerting fails at scale

Traditional monitoring works by letting you set thresholds: "alert when CPU exceeds 80%." This sounds reasonable until you realise that a CPU spike during a batch job at 2am is normal, while the same spike during checkout on a Friday afternoon is a crisis.

The threshold doesn't know the difference. So you either set it too low and get flooded with alerts, or set it too high and miss real incidents. Most teams end up with hundreds of rules, a dedicated engineer to maintain them, and an on-call team that's learned to ignore pages.
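The blindness described above is easy to see in code. A minimal sketch of a static threshold rule (the 80% figure is from the example above; everything else is hypothetical):

```python
# Hypothetical sketch of threshold-based alerting: the rule has no
# notion of context, so a normal 2am batch-job spike and a Friday
# checkout crisis look identical to it.
CPU_THRESHOLD = 80.0  # percent

def should_alert(cpu_percent: float) -> bool:
    # Fires on the number alone -- time of day, day of week, and
    # workload context are invisible to the rule.
    return cpu_percent > CPU_THRESHOLD

# Same reading, very different situations, same verdict:
print(should_alert(85.0))  # batch job at 2am   -> True (noise)
print(should_alert(85.0))  # Friday checkout    -> True (real)
```

Both calls return the same answer because the rule only ever sees the number. Every piece of context a human on-call engineer would use to triage is structurally unavailable to it.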

At one of our largest customers, the previous monitoring tool was generating 340 alerts per week. The on-call team was responding to fewer than 20. The rest were noise — but the team couldn't safely ignore them, because occasionally one of the 340 was real.

[Figure: Applicare IntelliSense — per-entity baselines vs actual metrics]

Per-entity behavioural baselines — how it works

IntelliSense builds a separate baseline for every entity in your environment — every service, every host, every database instance. It learns the normal behaviour pattern for that specific entity, including time-of-day patterns, day-of-week patterns, and correlations with other entities.

A checkout service that processes 10,000 requests per minute on Friday afternoons has a completely different baseline than the same service at 3am on Tuesday. IntelliSense models both — automatically, without any configuration.
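To make the idea concrete, here is an illustrative sketch — not IntelliSense's actual model — of a per-entity baseline bucketed by hour of week, where a value is flagged only if it deviates strongly from that entity's history for that bucket:

```python
# Illustrative sketch only -- not IntelliSense's real algorithm.
# Core idea: one baseline per entity, split into hour-of-week buckets,
# so "normal" is learned separately for Friday 3pm and Tuesday 3am.
from collections import defaultdict
from statistics import mean, stdev

class EntityBaseline:
    """Per-entity baseline: independent history per hour-of-week bucket."""

    def __init__(self):
        # 168 buckets: 24 hours x 7 days, each learned on its own.
        self.samples = defaultdict(list)

    def observe(self, hour_of_week: int, value: float) -> None:
        self.samples[hour_of_week].append(value)

    def is_anomalous(self, hour_of_week: int, value: float, z: float = 3.0) -> bool:
        history = self.samples[hour_of_week]
        if len(history) < 2:
            return False  # too little data to judge yet
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) > z * sigma

# A big Friday-afternoon load is normal for checkout if Fridays always spike:
checkout = EntityBaseline()
friday_3pm = 4 * 24 + 15  # hypothetical hour-of-week index
for rpm in [9800, 10100, 10050, 9950]:
    checkout.observe(friday_3pm, rpm)
print(checkout.is_anomalous(friday_3pm, 10000))  # within baseline -> False
```

The same 10,000 requests per minute would be wildly anomalous in the 3am Tuesday bucket — which is the whole point: "normal" is a property of the entity and the time, not of the metric.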

What "per-entity" actually means in practice

Most anomaly detection tools build a single model for a metric type across all instances. IntelliSense builds one model per entity-metric pair. For a cluster with 340 services, each reporting dozens of metrics, that means thousands of independent models rather than a shared handful.

This is computationally expensive — which is why nobody else does it at this granularity. But it's the only way to make the alerting actually accurate.
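The cost is straightforward to quantify. A hypothetical back-of-the-envelope comparison (the metrics-per-service figure is an assumption for illustration, not a measured number):

```python
# Why per-entity-metric modelling is expensive: the model count scales
# with entities x metrics, not with the number of metric types.
services = 340            # from the cluster example above
metrics_per_service = 25  # assumed figure, for illustration only

shared_models = metrics_per_service              # one model per metric type
per_entity_models = services * metrics_per_service  # one per pair

print(shared_models)      # 25
print(per_entity_models)  # 8500
```

Two orders of magnitude more models to train, store, and keep current — which is exactly the granularity trade-off the paragraph above describes.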

The 94% number — where it comes from

We measured this across all customers who migrated from a threshold-based system to IntelliSense. We compared the alert volume in the 30 days before migration with the 30 days after, controlling for incident rate (we only count alerts that weren't associated with a real user-impacting incident as false positives).
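The methodology above can be sketched in a few lines. This is an assumed representation (field names and figures are hypothetical), not our actual measurement pipeline: an alert counts as a false positive only if it was not associated with a real user-impacting incident.

```python
# Hypothetical sketch of the before/after comparison: only alerts with
# no associated incident count as false positives.
def false_positive_reduction(before_alerts, after_alerts):
    """Each alert is a dict with an 'incident_id' (None if unlinked)."""
    fp_before = sum(1 for a in before_alerts if a["incident_id"] is None)
    fp_after = sum(1 for a in after_alerts if a["incident_id"] is None)
    return 1 - fp_after / fp_before

# Illustrative 30-day windows, not real customer data:
before = [{"incident_id": None}] * 320 + [{"incident_id": "INC-1"}] * 20
after = [{"incident_id": None}] * 19 + [{"incident_id": "INC-2"}] * 20
print(round(false_positive_reduction(before, after), 2))  # -> 0.94
```

Note that incident-linked alerts are excluded from both sides, so a change in incident rate between the two windows doesn't distort the comparison.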

The median reduction was 94%. Some customers saw higher reductions (one saw 98%), some lower (the lowest was 87%) — but every single customer saw a significant reduction.

The most important finding: the real incident detection rate didn't decrease. IntelliSense caught every significant incident that the threshold-based system would have caught — it just stopped alerting on the 94% that didn't need human attention.

What this means for your on-call team

When on-call engineers stop being woken up for noise, something important happens: they start trusting the alerts that do fire. A team that responds to 4 alerts a week and finds a real incident 3.5 times has a very different relationship with their monitoring than a team that responds to 40 alerts and finds a real incident twice.
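The trust gap in that comparison is just alert precision — real incidents found per alert responded to. Using the figures from the paragraph above:

```python
# Alert precision = real incidents / alerts responded to, per week.
quiet_team = 3.5 / 4  # 0.875 -- nearly every page is worth waking up for
noisy_team = 2 / 40   # 0.05  -- 19 out of 20 pages are noise

print(quiet_team, noisy_team)
```

At 87.5% precision, engineers treat a page as an emergency. At 5%, ignoring pages becomes the rational default — which is how real incidents get missed.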

Trust in alerting is the foundation of effective incident response. IntelliSense is designed to earn that trust by being right, not just loud.
