
Google Ads Anomaly Detection: How to Catch Spend Spikes, Conversion Drops, and Tracking Outages Before They Cost You a Week

Built-in alerts are too noisy or too late. The Account Anomaly Detector script is brittle. The stack that actually works in 2026: rolling baselines, severity tiers, and an agent that classifies before it pages.

By B6 Team · Kampaio · May 15, 2026 · 14 min read

TL;DR - Why Google Ads Anomaly Detection Is Broken Out of the Box

If you manage 5 or more Google Ads accounts, you already know the pattern. Monday morning, you open one of the smaller clients, and CPA is up 73% over a 4 day window. You scroll back. Day 1 was fine. Day 2 was fine. Day 3 was when the GA4 container deploy went out and Enhanced Conversions stopped firing on 41% of checkout events. You missed it because nobody paged you. The built-in Google Ads notification panel surfaced two things in that window: a payment method expiring in 60 days, and an "auto-applied recommendation" suggesting a budget raise on the campaign that just stopped converting.

This is the central problem with Google Ads anomaly detection in 2026. The built-in alerts are too noisy or too late. They fire on the things Google chose to monitor (disapprovals, budget caps, policy issues), not on the things that matter to a portfolio manager (CPA drift, conversion-rate breakage, click bombing, bid-algo overreach during Smart Bidding learning). The Account Anomaly Detector script that Google publishes is closer to right, but it is brittle: same-day-of-week mean, hard percentage thresholds, one email a day. It catches the obvious and misses everything subtle.

What you actually need is a stack. Rolling baseline math, severity classification, routing that knows the difference between "wake Sara up" and "log for the weekly digest." This article is the comparison guide for that stack. Built-in alerts vs the script vs commercial monitors (Go-Insights, Promonavigator's collection, Optmyzr) vs an agent-based approach where a dedicated reviewer (Aegis on B6) classifies severity before anything escalates.

The 30 second triage when something feels wrong: pull the last 28 days, check if spend is off baseline, check if conversions are off baseline, check if the ratio between them is off baseline. Two of three drifting in the same direction is real. One of three drifting alone is almost always either tracking or seasonality.
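One way to encode that triage, as a minimal sketch: each input is how far the metric sits from its 28-day baseline, in standard deviations, signed by direction. The function name, the 2σ cutoff, and the handling of two metrics drifting in opposite directions are assumptions for illustration; the rule of thumb above does not specify them.

```python
def triage(spend_z, conversions_z, ratio_z, z_limit=2.0):
    """Apply the two-of-three rule to signed deviations (in sigmas)."""
    drifting = [z for z in (spend_z, conversions_z, ratio_z) if abs(z) > z_limit]
    up = sum(1 for z in drifting if z > 0)
    down = len(drifting) - up

    if max(up, down) >= 2:
        # Two or more metrics off baseline in the same direction.
        return "likely real"
    if len(drifting) == 1:
        # A single metric drifting alone.
        return "check tracking or seasonality first"
    if len(drifting) == 2:
        # Assumption: opposite-direction drift is left to a human.
        return "mixed, inspect manually"
    return "within baseline"


print(triage(2.5, 0.3, 3.1))   # spend and ratio both high
print(triage(0.2, -3.0, 0.4))  # conversions dropped alone
```

The point of keeping this to a few lines is that the triage should be runnable by anyone on the team before the full stack exists.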

Distribution of real (action-required) anomalies across a 12-account portfolio over 12 months. Tracking outages are the single largest category. Click bombing is rare but high-priority when it hits.

The Anatomy of a Real Account Anomaly

The word "anomaly" gets thrown around loosely. Let's be precise. An anomaly is a deviation from a rolling baseline that exceeds an expected range. Three pieces matter: the rolling baseline (not a static threshold), the deviation measure (z-score or percentage), and the expected range (the false-positive budget you accept).

A rolling baseline is the mean of the metric over a trailing window, typically 14 or 28 days, computed for the equivalent slice of time. The trailing 14-day same-hour mean is what you compare today's 10:00 AM spend against. Not yesterday's 10:00 AM, and not the static daily budget. Sara's Tuesday 2 PM should be compared to the 2 PM readings across the trailing 14 days, not to the previous Tuesday alone or to today's account total.
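A same-hour baseline is easy to get subtly wrong, so here is a minimal sketch of the lookup. It assumes hourly readings are already available as a mapping from timestamp to value; the function name `same_hour_baseline` and the synthetic spend figures are illustrative, not part of any Google Ads API.

```python
from datetime import datetime, timedelta
from statistics import mean


def same_hour_baseline(readings, now, window_days=14):
    """Mean of the metric at `now`'s hour-of-day over the trailing window.

    `readings` maps an hourly timestamp to the metric value for that hour.
    Returns None when the window holds no matching hours.
    """
    start = now - timedelta(days=window_days)
    matching = [
        value
        for ts, value in readings.items()
        if start <= ts < now and ts.hour == now.hour
    ]
    return mean(matching) if matching else None


# Synthetic history: 14 days of hourly spend that ramps up through the day.
now = datetime(2026, 5, 12, 10)
history = {}
for day in range(1, 15):
    for hour in range(24):
        ts = (now - timedelta(days=day)).replace(hour=hour)
        history[ts] = 10.0 + hour

baseline = same_hour_baseline(history, now)
print(baseline)  # mean of the fourteen prior 10:00 readings
```

Comparing against the all-hours average instead would mix morning and evening traffic into one number, which is exactly the blur the same-hour slice avoids.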

The deviation measure determines what counts as significant. Two standard deviations from the rolling mean gives you a roughly 5% expected false-positive rate on normally distributed data. Three standard deviations drops that to under 1%. Percentage thresholds are simpler but worse: a 30% jump on a campaign that normally moves 5% day-over-day is a real anomaly, but a 30% jump on a campaign that already moves 25% day-over-day is just Tuesday.
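To make the z-score versus percentage point concrete, here is a small sketch with two synthetic campaigns that share the same mean but not the same volatility. The data and function names are made up for illustration; the tail probabilities come from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist, mean, stdev


def z_score(history, observed):
    """Deviation from the window mean, in sample standard deviations."""
    return (observed - mean(history)) / stdev(history)


def pct_change(history, observed):
    """Deviation from the window mean, as a percentage of the mean."""
    mu = mean(history)
    return (observed - mu) / mu * 100


def two_sided_tail(k):
    """Share of normally distributed points more than k sigmas from the mean."""
    return 2 * (1 - NormalDist().cdf(k))


# Two 14-day histories with the same mean (100) and very different variance.
steady = [100, 103, 98, 101, 97, 102, 99, 100, 104, 96, 101, 99, 98, 102]
volatile = [100, 130, 75, 120, 80, 125, 70, 110, 135, 72, 128, 78, 115, 62]

for name, history in (("steady", steady), ("volatile", volatile)):
    print(name, round(pct_change(history, 130)), "%", round(z_score(history, 130), 1), "sigma")

print(round(two_sided_tail(2), 4), round(two_sided_tail(3), 4))
```

Both campaigns show the same 30% jump, but only the steady one is far outside its own band. That is the whole argument for doing the alert math in sigmas.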

[Figure: Rolling 14-day baseline with ±2σ band (CPA, daily). Series: expected range (±2σ), 14-day rolling mean, observed CPA, anomaly (out of band). Annotation: Severity 1, 3.4σ above baseline. Day-over-day CPA stays inside the expected band for 28 days, then breaks out for 3 consecutive days. That is the signal.]
A rolling baseline with ±2σ band. The first 28 days stay inside the expected range. The 3 spike points are out-of-band by more than 3σ and trigger a severity 1 classification.

In practice, after running anomaly detection across a portfolio for a year, the categories of "real" alerts cluster:

  • Tracking outage (35-45% of real anomalies). Conversion volume drops sharply, often paired with a deploy timestamp. Almost always the highest priority. If you do not solve this first, you will be paged for follow-on anomalies that are downstream of broken data. The diagnostic sequence is in our conversion tracking not working playbook.
  • Bid-algo overreach (15-25%). Smart Bidding in learning period or after a Target CPA/Target ROAS change, CPC climbs 40-80% while conversion volume holds flat. Looks like a problem, is mostly noise, but requires confirmation that you are inside the 2 to 6 week learning window.
  • Seasonal pressure (10-20%). Q4, BFCM, geographic holidays, new entrants in Auction Insights. Real, but not actionable as an alert. Calendar this, do not page on it.
  • Real fraud or click bombing (5-10%). 3x click volume spike inside one hour on a single campaign with no corresponding impression spike. Rare, but when it hits you want to know in minutes.
  • Product or landing page change (10-15%). A price update, an out-of-stock SKU, or a checkout redesign. Conversion rate moves but click volume stays flat. Engineering deploys are correlated with this category more often than agencies admit.
🛡️ Aegis · Risk review
Aegis caught 47 anomalies last week across 12 accounts. 3 needed action: one tracking outage, one click bombing pattern on a small geo campaign, one Smart Bidding strategy that flipped overnight. 44 were noise the team would have spent 2 hours investigating: weekend seasonality, single-keyword variance, and one Performance Max campaign doing what Performance Max campaigns do. Classification is the work. The math is the easy part.

The Detection Stack Compared: Built-in vs Script vs Commercial vs Agent

There are five layers of anomaly detection in the Google Ads ecosystem. They serve different roles. None is a complete solution alone.

Google built-in alerts and recommendations. Surface-level. Disapprovals, payment issues, "your campaign is limited by budget," some auto-applied recommendations. Useful as a floor. Insufficient as a primary signal. In-account notifications and email notifications cover the operational basics (billing, disapprovals, suspension). For MCC users, manager-account notifications are a separate setup that has to be enabled per child account.

The Account Anomaly Detector script. Google's own Apps Script solution, published in the Ads Scripts docs. It compares today's running stats against the average of the same day of week across the prior 26 weeks. Adjustable thresholds per metric in a Google Sheet. Single email per alert per day. Good baseline. Two weaknesses: the same-day-of-week mean breaks badly if the account has structural changes inside the 26 week window (campaign restructure, new product launch, seasonality shift), and the per-metric percentage thresholds do not scale across a portfolio of accounts with different volatility profiles.
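The structural-change weakness is easier to see with numbers. The sketch below is not Google's script; it is a Python illustration of the comparison as described above, simplified to daily totals. The function names, the 30% threshold, and the spend figures are assumptions.

```python
from datetime import date, timedelta
from statistics import mean


def same_weekday_mean(daily, today, weeks=26):
    """Average of the metric on the same weekday over the prior `weeks` weeks."""
    prior_days = [today - timedelta(weeks=k) for k in range(1, weeks + 1)]
    values = [daily[d] for d in prior_days if d in daily]
    return mean(values) if values else None


def breaches(observed, baseline, threshold_pct):
    """True when the observed value is more than threshold_pct off baseline."""
    return abs(observed - baseline) / baseline * 100 > threshold_pct


# An account that doubled its daily spend four weeks ago (e.g. a product launch).
today = date(2026, 5, 12)
daily_spend = {
    today - timedelta(weeks=k): (200.0 if k <= 4 else 100.0)
    for k in range(1, 27)
}

baseline = same_weekday_mean(daily_spend, today)
print(round(baseline, 2))             # still dominated by the old spend level
print(breaches(200.0, baseline, 30))  # the new normal keeps tripping the alert
```

Four weeks after the change, the 26-week mean has barely moved, so a perfectly healthy day at the new spend level still reads as an anomaly.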

The Campaign Anomaly Detector (CAD v2). Open-sourced by Google in 2022 and rewritten in 2023, available on GitHub. Monitors at account level and campaign level, supports configurable past windows and current windows, has a 30-minute execution timeout, and ships with an interactive Google Sheets configuration tab. Closer to what Sara wants. Still rule-based rather than statistical, and the multi-account version requires load balancing across script instances.

Commercial monitors. Go-Insights routes anomaly detection into Slack, Teams, and email with 24/7 monitoring on CPC, spend, impressions, and similar metrics. Promonavigator's anomaly script collection bundles 14 different anomaly tracking scripts ranging from low Quality Score detection to suspicious-click filtering (one of which flags campaigns exceeding "30% invalid clicks during the day"). Optmyzr has a similar alerts layer inside its rule engine. Useful when you want pre-built routing and do not want to maintain the script yourself.

Agent-based detection. The newest layer. An agent runs continuously across the account, computes the rolling baseline, classifies the deviation against learned patterns, and decides whether to escalate, block a related action, or absorb the signal as noise. On B6, this is what Aegis does. The classification step is what separates an agent from a script: a percentage threshold can fire, but it cannot tell you why, and it cannot block a Smart Bidding change that is about to compound the anomaly.

Layer | Detection model | Routing | Best fit
Google built-in | Rule-based notifications, recommendations | In-account + email | 1-account operators, baseline floor
Account Anomaly Detector script | Same-day-of-week mean, % thresholds, 26 wk window | Single email per alert per day | Single account, low setup cost
CAD v2 (GitHub) | Configurable past vs current window thresholds | Sheet log + email | Multi-campaign account, technical owner
Commercial monitor | Mostly rule-based, pre-built integrations | Slack, Teams, email, webhook | Multi-account agency, no internal eng
Agent-based (Aegis on B6) | Statistical + rule overrides + classifier | Severity-tiered routing with action blocking | Multi-account portfolio, autonomy required
The detection stack from least to most sophisticated. Most real-world setups combine layer 1 (built-in floor) with one of layers 2-5. Aegis sits on top by classifying severity and blocking downstream actions, not just by detecting.

Threshold Math: What Should Actually Trigger an Alert

The honest version of "what threshold should I use" is: it depends on the metric and the account volatility. Here is the working set we use on portfolios of 5 to 30 accounts, calibrated to roughly 5% false-positive rate on the alerts that fire.

Threshold matrix
Metric | Rule | Severity
Spend pacing | ±15% intra-day vs trailing 14-day same-hour band, sustained 60 min | S2
CPA | ±25% week-over-week is "look at it", ±50% is escalation | S1 / S2
CTR | 2σ below rolling 28-day mean, campaign level only | S3
Conversion rate | Drop > 30% sustained 24h triggers tracking suspicion first | S1
Click volume | 3x spike in 1h with flat impressions = click bombing | S2
CPC | 2σ above 28-day mean, persistent across 6h window | S3

Spend pacing needs the "sustained 60 minutes" clause to kill 80% of single-blip noise. If pacing is genuinely off, work through the Google Ads not spending full budget playbook to tell pacing problems apart from CPC problems. For CPA, ±25% week-over-week is a "look at it" signal, ±50% is escalation, and the full diagnostic sequence is in our ROAS dropped suddenly walkthrough.
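The "sustained" clause is a small amount of code for a large drop in noise. A minimal sketch, assuming pacing is sampled every 15 minutes and each sample carries its own same-hour baseline; the function name and sampling interval are illustrative choices, not a prescribed setup.

```python
def sustained_breach(samples, band_pct=15.0, required_minutes=60, step_minutes=15):
    """True only if every sample in the last `required_minutes` is out of band.

    `samples` is a list of (observed, baseline) pairs, oldest first,
    one pair per `step_minutes`. Baselines are assumed to be non-zero.
    """
    needed = required_minutes // step_minutes
    if len(samples) < needed:
        return False  # not enough history to call it sustained
    recent = samples[-needed:]
    return all(
        abs(observed - baseline) / baseline * 100 > band_pct
        for observed, baseline in recent
    )


single_blip = [(100, 100), (130, 100), (101, 100), (99, 100)]
real_drift = [(118, 100), (121, 100), (125, 100), (130, 100)]

print(sustained_breach(single_blip))  # one bad sample out of four
print(sustained_breach(real_drift))   # out of band for the full hour
```

The same shape works for the 6-hour CPC persistence rule by changing `required_minutes` and the band check.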

A drop in conversion rate greater than 30% sustained over 24 hours triggers "tracking suspicion" first, not "performance investigation." Nine times out of ten the data is wrong, not the campaign. A 3x click volume spike inside one hour on a single campaign with flat impression growth is click bombing until proven otherwise. Invalid traffic monitoring should already be filtering this, but invalid traffic detection runs after the fact and refunds you; it does not prevent the spend. For CPC, the same standard-deviation logic as CTR applies, plus a rule that the spike must persist across a 6 hour window. CPC fluctuates inside Smart Bidding learning periods routinely. Most of those signals are noise. The CPC too high diagnostic covers the durable CPC pattern.
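The click bombing rule translates almost directly into a check. This is a sketch under stated assumptions: "flat" impressions is taken to mean no more than 20% above the hourly baseline, which is a tolerance picked for illustration, and the function name is hypothetical.

```python
def looks_like_click_bombing(
    clicks,
    baseline_clicks,
    impressions,
    baseline_impressions,
    spike_factor=3.0,
    flat_tolerance=0.20,
):
    """3x click spike in the hour with no matching impression growth."""
    click_spike = clicks >= spike_factor * baseline_clicks
    impressions_flat = impressions <= baseline_impressions * (1 + flat_tolerance)
    return click_spike and impressions_flat


# Clicks tripled while impressions barely moved: suspicious.
print(looks_like_click_bombing(310, 100, 5200, 5000))
# Clicks tripled because impressions tripled too: probably real demand.
print(looks_like_click_bombing(310, 100, 15500, 5000))
```

The impression check is what separates this from a viral spike or a broad-match expansion, where clicks and impressions rise together.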

Two thresholds always cause arguments inside teams. The first is whether to use z-score or percentage. The answer for a Sara-sized portfolio is z-score for the alert math, percentage for the human-readable description in the alert payload. "Spend on Campaign X is 2.4σ above rolling baseline (currently $342 vs expected $185 to $230)" is what the rule engine evaluates. "Spend on Campaign X jumped 78%" is what shows up in the Slack message. The second is whether seasonality should be hand-coded or learned. Hand-coded wins for portfolios under 50 accounts. The hand-coded version is two lines in the rule engine: "between Nov 20 and Dec 26, widen the spend band by 40%."
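Both arguments can be settled in one small function. The sketch below evaluates the rule in sigmas, writes the message in percent, and reads "widen the spend band by 40%" as multiplying the band's half-width by 1.4 inside the holiday window; that reading, the function names, and the example figures are assumptions.

```python
from datetime import date


def band_multiplier(day):
    """Hand-coded seasonality: widen the band by 40% from Nov 20 to Dec 26."""
    in_peak = (11, 20) <= (day.month, day.day) <= (12, 26)
    return 1.4 if in_peak else 1.0


def build_alert(campaign, observed, mu, sigma, day, z_limit=2.0):
    """Evaluate the rule on the sigma band, describe it in percent."""
    half_width = z_limit * sigma * band_multiplier(day)
    low, high = mu - half_width, mu + half_width
    z = (observed - mu) / sigma
    pct = (observed - mu) / mu * 100
    side = "above" if z >= 0 else "below"
    verb = "jumped" if pct >= 0 else "dropped"
    return {
        "fired": not (low <= observed <= high),
        "rule_view": (
            f"Spend on {campaign} is {abs(z):.1f}σ {side} rolling baseline "
            f"(currently ${observed:.0f} vs expected ${low:.0f} to ${high:.0f})"
        ),
        "message": f"Spend on {campaign} {verb} {abs(pct):.0f}%",
    }


june = build_alert("Campaign X", 320, mu=200, sigma=50, day=date(2026, 6, 10))
december = build_alert("Campaign X", 320, mu=200, sigma=50, day=date(2026, 12, 1))

print(june["fired"], june["message"])
print(june["rule_view"])
print(december["fired"])  # same spend, wider seasonal band
```

The same observed spend fires in June and stays quiet in December, which is the behavior the hand-coded seasonality rule is meant to buy.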

Anomaly Severity Tiers and Who Should Get the Page

A good alerting system has four severity tiers and explicit routing rules. The fastest way to burn out a PPC team is to page on every severity-2 event.

  • Severity 1 (critical, pages immediately). Tracking outage suspected (conversions to zero or drop greater than 50% across the account), brand campaign paused, account suspended, payment failure. These wake Sara up. They should be 1 to 3 per month across a 12 account portfolio. More than that and the threshold is wrong.
  • Severity 2 (high, Slack channel within 1 hour). Spend pacing greater than 30% off baseline, CPA spike greater than 50%, click bombing pattern, Smart Bidding strategy switched without notice. These get investigated same day.
  • Severity 3 (medium, daily digest). Drift signals, single-keyword anomalies, single-campaign Auction Insights shifts, CTR slipping over a multi-day window. These go into the daily digest and roll up into Echo's weekly write-up, not into Slack.
  • Severity 4 (noise, archived). Seasonal patterns, expected weekend behavior, single-day blip on a campaign with high natural variance. These get logged so the false-positive rate stays measurable, but they never page anyone.

The mapping matters more than the math. A statistical model that fires 200 severity-2 alerts per week is worse than a dumb threshold that fires 4 severity-1 alerts per week, because the 200 alerts get muted and then the 4 real ones get muted with them.
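The tier-to-destination mapping above is small enough to keep as a lookup table. A minimal sketch, with hypothetical destination names standing in for whatever paging and chat tools you actually use:

```python
from collections import defaultdict

# Severity tier -> destination. The destination labels are placeholders.
ROUTES = {1: "page", 2: "slack", 3: "digest", 4: "archive"}


def route(alerts):
    """Group alert ids by destination. `alerts` is a list of (alert_id, severity)."""
    queues = defaultdict(list)
    for alert_id, severity in alerts:
        queues[ROUTES[severity]].append(alert_id)
    return dict(queues)


# A week with 2 critical, 11 high, 38 medium, and 174 noise events.
week = [
    (f"s{severity}-{i}", severity)
    for severity, count in ((1, 2), (2, 11), (3, 38), (4, 174))
    for i in range(count)
]

queues = route(week)
print({destination: len(ids) for destination, ids in queues.items()})
```

Only the "page" queue interrupts anyone. Everything else is still recorded, which is what keeps the false-positive rate measurable.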

📊 Echo · Reporting
Last week's digest covered 6 accounts. 2 severity 1 events (both tracking-related, both auto-classified by Aegis in under 200 ms), 11 severity 2 events that were investigated and closed inside the day, 38 severity 3 drift signals in the digest table, and 174 severity 4 noise events logged for the false-positive review. Sara saw the 2 severity 1 events in real time. The other 223 went into the weekly write-up without paging anyone.

How Aegis Detects, Classifies, and Routes Anomalies in B6

Aegis is the risk-review and anomaly-detection agent in the B6 multi-agent stack. Its job is to sit between the other agents and the production Google Ads account, classify every proposed change and every observed metric deviation, and either pass, escalate, or block. Aegis is the lead defense layer.

The Aegis loop is rule-augmented statistical. The rolling baseline is computed across the trailing 28 day window per campaign per hour-of-day. Deviations beyond 2σ enter the classifier. The classifier has explicit overrides for known patterns: brand campaign actions are always severity 1, anything touching the conversion tag is always severity 1, anything inside a Smart Bidding learning window gets de-prioritized one tier because volatility there is expected. The output is a severity-tagged alert with a recommended next action.
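A rough sketch of that override logic follows. It is not the Aegis implementation: the `Signal` fields, the idea that each metric carries a default tier from the threshold matrix, and the choice to let the severity 1 overrides win over the learning-window demotion are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    z: float                      # deviation from the rolling baseline, in sigmas
    matrix_tier: int = 3          # default severity for this metric (assumed input)
    is_brand_campaign: bool = False
    touches_conversion_tag: bool = False
    in_learning_window: bool = False


def classify(signal, z_limit=2.0):
    """Return a severity tier 1-4, or None if the signal stays in band."""
    # Explicit overrides: these are severity 1 regardless of the math.
    if signal.is_brand_campaign or signal.touches_conversion_tag:
        return 1
    # Only deviations beyond the band enter the classifier.
    if abs(signal.z) <= z_limit:
        return None
    # Volatility inside a Smart Bidding learning window is expected,
    # so demote by one tier (4 is the lowest tier).
    if signal.in_learning_window:
        return min(signal.matrix_tier + 1, 4)
    return signal.matrix_tier


print(classify(Signal(z=2.6, matrix_tier=2)))
print(classify(Signal(z=2.6, matrix_tier=2, in_learning_window=True)))
print(classify(Signal(z=0.4, is_brand_campaign=True)))
```

Keeping the overrides as plain conditionals at the top makes them auditable, which matters more here than elegance.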

The Sage -> Aegis -> Buzz -> Echo -> User flow on a real bid-change proposal. Aegis is the gate. Risk score >= 80 blocks the action and pages immediately; 50-79 escalates to manual; under 50 auto-applies with a log entry.

In Sprint 5, on a real Goodevas It client account, Aegis raised a risk score of 82/100 on a proposed Buzz bid action. The action was a "logical" bid cut on the top performer in a brand campaign. Aegis blocked it because two anomalies fired in the same minute: the brand campaign pattern (always severity 1) and a tracking-suspicion flag (conversion rate had drifted in the prior 6 hours). Buzz's proposed change would have killed a chunk of the account's revenue while masking the underlying tracking issue. The user got a single notification with the severity classification, the math, and the recommended next action ("verify tracking before reconsidering bid change"). No paging at 2 AM. No 47-alert Slack flood.
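The gate itself is a three-way threshold on the risk score. A minimal sketch using the cut-offs from the flow above (80 and 50); how the score is computed is not covered here, and the function name and return labels are hypothetical.

```python
def gate(risk_score):
    """Map a 0-100 risk score to the action taken on a proposed change."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score >= 80:
        return "block"       # blocked, and the operator is paged
    if risk_score >= 50:
        return "escalate"    # held for manual review
    return "auto_apply"      # applied with a log entry


print(gate(82))  # the Sprint 5 brand bid cut
print(gate(64))
print(gate(12))
```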

The other mascots are part of the chain. Sage feeds Aegis the keyword-level and audience-level signals that statistical baselines need. Buzz is the agent whose proposals Aegis reviews most often, since bid changes are the most frequent action class. Echo writes the incident note in the weekly digest, so the team has a written audit trail of every severity 2 and 3 event without having to scroll Slack.

🐝 Buzz · Bidding
On the Goodevas It account in Sprint 5, I proposed 23 bid changes over two weeks. Aegis blocked 6 of them with hard risk reasons. The 6 included one brand campaign cut (risk score 82) that would have killed 32% of the account's revenue. The other 17 ran clean. Average CPC dropped 18% in 14 days, conversion volume held flat. Every blocked proposal came with a written reason and a recommended fix. That is the loop.

The pitch is not "AI does anomaly detection instead of you." The pitch is that Aegis classifies severity in under 100 milliseconds per event, so you can spend your attention on the 3 to 5 severity-1 events per month that actually need an operator decision. Everything else is logged and digested. See pricing tiers for how the agent layer is packaged, or open a free Buzz audit on one of your accounts to see Aegis classification in action on real data.

Building Your Own Anomaly Detection Layer (When You Can't Buy)

If you cannot or will not buy a commercial layer or move to an agent-based stack, the buildable version is six steps. We have shipped this for clients who wanted to keep the logic in-house. It is not glamorous and it works.

The 6-step homemade anomaly detection flow. Statistical detection (the >2σ check) is the entry gate; the classifier turns a raw signal into a severity-routed alert. The Smart-Bidding-learning-window check is what kills the largest noise category in most accounts.
  1. Rolling baseline metrics view. Pull hourly aggregates per campaign per metric into a store you can query. BigQuery is the right home if you have it. Google Sheets with a Google Ads export macro works for portfolios under 10 accounts. The window is trailing 28 days, recomputed daily.
  2. Standard deviation per metric per campaign. Calculate σ on the trailing window. Store it next to the mean. The pair (μ, σ) is what every threshold check reads from.
  3. Per-campaign threshold calibration. Brand campaigns have lower natural variance than non-brand. Performance Max has different variance than Search. Calibrate per campaign type, not as a global account threshold. The Account Anomaly Detector script is fine as a starting kit, but its account-wide percentage thresholds, set once per metric rather than per campaign, are the reason teams find it noisy.
  4. Severity classifier. The four tier table above, encoded as rules. Brand campaign action gets bumped to severity 1 regardless of math. Anything in Smart Bidding learning gets de-prioritized one tier. Tracking-related flags always go to severity 1.
  5. Routing. Severity 1 to PagerDuty or direct phone. Severity 2 to a dedicated Slack channel with @here. Severity 3 to a daily digest email. Severity 4 to a logged-only sink. The routing layer is the cheapest part to build and the most important to get right.
  6. Weekly false-positive review. Open the past 7 days of alerts. Mark each as "real" or "noise." If your false-positive rate is above 30% on severity 1 or 2, the math is too loose. Above 60% on severity 3 is fine, that tier is supposed to be wide. This review is what keeps the system trustworthy over time.
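Steps 1 and 2 can be sketched as a small in-memory store before you commit to BigQuery. The row shape `(campaign, metric, hour_of_day, value)`, the function names, and the sample CPA figures are assumptions; in a real build the rows would come from your export of the trailing 28 days.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(rows):
    """Compute (mean, sigma) per (campaign, metric, hour_of_day).

    `rows` is an iterable of (campaign, metric, hour_of_day, value) tuples
    drawn from the trailing window. Keys with fewer than two values are skipped.
    """
    grouped = defaultdict(list)
    for campaign, metric, hour, value in rows:
        grouped[(campaign, metric, hour)].append(value)
    return {
        key: (mean(values), stdev(values))
        for key, values in grouped.items()
        if len(values) >= 2
    }


def out_of_band(baselines, key, observed, z_limit=2.0):
    """Threshold check that reads the stored (mean, sigma) pair."""
    mu, sigma = baselines[key]
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_limit


rows = [("brand", "cpa", 14, v) for v in (20, 22, 18, 21, 19)]
rows += [("pmax", "cpa", 14, v) for v in (20, 35, 10, 30, 5)]
baselines = build_baselines(rows)

print(out_of_band(baselines, ("brand", "cpa", 14), 27))  # tight band
print(out_of_band(baselines, ("pmax", "cpa", 14), 27))   # wide band
```

Because each campaign carries its own sigma, the same observed CPA is an anomaly for the brand campaign and routine for Performance Max, which is the per-campaign calibration step 3 asks for.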

A small team can stand this up in a long weekend if BigQuery is already in the stack. The maintenance cost is the weekly false-positive review and the occasional threshold recalibration when an account changes structure.
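The weekly review is mostly counting. A minimal sketch, assuming each alert from the past 7 days has been hand-labeled "real" or "noise"; the 30% limits on severity 1 and 2 come from step 6, severity 3 is left without a limit because that tier is supposed to be wide, and the function names are illustrative.

```python
from collections import Counter

# Noise share above which a tier's thresholds are considered too loose.
LIMITS = {1: 0.30, 2: 0.30}


def false_positive_rates(reviewed):
    """`reviewed` is a list of (severity, label), label being 'real' or 'noise'."""
    totals, noise = Counter(), Counter()
    for severity, label in reviewed:
        totals[severity] += 1
        if label == "noise":
            noise[severity] += 1
    return {severity: noise[severity] / totals[severity] for severity in totals}


def tiers_needing_recalibration(rates, limits=LIMITS):
    """Tiers whose false-positive rate is above the accepted limit."""
    return sorted(s for s, rate in rates.items() if s in limits and rate > limits[s])


reviewed = [
    (1, "real"), (1, "noise"),
    (2, "real"), (2, "real"), (2, "real"), (2, "noise"),
    (3, "noise"), (3, "noise"), (3, "real"),
]
rates = false_positive_rates(reviewed)
print(rates)
print(tiers_needing_recalibration(rates))
```

Logging the rates week over week also shows threshold drift early, before the team starts muting the channel.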

FAQ

What is the Account Anomaly Detector script in Google Ads? Google's first-party Apps Script for anomaly detection, documented in the Google Ads Scripts docs. It compares the current day's running stats (impressions, clicks, conversions, cost) against the average of the same day of week over the prior 26 weeks. Thresholds are configurable per metric in a Google Sheet. It sends a single email per alert per day. It is a fine baseline, but brittle as a primary detection layer for a 10+ account portfolio.

How do I set up alerts for unusual activity in Google Ads? Three layers. (1) Turn on the built-in in-account notifications and email notifications for the operational basics (payment, disapproval, suspension). (2) Deploy the Account Anomaly Detector script or CAD v2 for performance-metric anomalies. (3) Layer a routing tool (Slack via Go-Insights or a commercial monitor, or an agent like Aegis) for severity classification and on-call paging.

What is a normal false-positive rate for ad-account alerts? Aim for under 15% on severity 1 alerts and under 30% on severity 2. Higher on severity 3 is acceptable because that tier is supposed to be wide. If severity 1 false positives exceed 30%, you have either threshold drift, structural change in an account (campaign restructure, product launch) that nobody told the rule engine about, or both.

Is z-score better than percentage thresholds? For alerting math, yes. Z-score adapts to the natural variance of the campaign, so a noisy Performance Max campaign does not page you every Tuesday for routine 25% swings. For the human-readable alert payload, percentage is better because it is faster to parse. Use both: z-score for the rule, percentage for the message.

Does Google have built-in anomaly detection? Partial. The Recommendations page surfaces some performance opportunities and warnings. The Anomalies card in Google Ad Manager is a beta feature on the publisher side, not the advertiser side. In Google Ads proper, you get notifications and recommendations but no true rolling-baseline anomaly engine. That is the gap the Account Anomaly Detector script and the commercial layers exist to fill.

Stop Reacting to Spend Spikes. Detect Them.

Three things to take away. The built-in Google Ads notification system is a floor, not a ceiling. The Account Anomaly Detector script is the cheapest meaningful upgrade and worth deploying even if you plan to layer something on top of it. Severity classification matters more than statistical sophistication: a dumb threshold with the right routing is more useful than a clever model that pages on everything.

If you manage 5 or more accounts and you have ever missed a real anomaly because the team was triaging false positives, the next step is to install a classifier that knows the difference. Run a free Buzz + Aegis audit on one of your accounts and see severity classification on real data. Read-only access, no changes made without your approval, takes 10 minutes.

Anomaly detection is not about the alert. It is about the gap between the moment a problem starts and the moment a human knows. Close the gap.