

Incrementality Testing in Google Ads: What It Measures, How to Run One, and Why Smart Bidding Needs It

Smart Bidding optimizes against last-touch conversions. Lift testing measures causal conversions. The two numbers can disagree by 30 to 50 percent. This is how to run the test that closes the gap.

By B6 Team · Kampaio · May 18, 2026 · 13 min read

TL;DR: What Incrementality Testing Actually Proves

Incrementality testing in Google Ads is a randomized controlled experiment that measures the causal lift of your ads. One matched group is exposed to your campaign (treatment), another sees no impression at all (control). The difference in conversions between the two groups is incremental lift: the conversions your ads actually caused, not the ones that would have happened anyway.

That last clause is where most ad accounts lose money. Smart Bidding optimizes against last-touch conversions. Lift testing measures causal conversions. The two numbers can disagree by 30 to 50 percent. If you have never run a lift test on your account, your Target ROAS is almost certainly mis-calibrated, and the algorithm is happily scaling spend on traffic that would have converted on its own.

Google ships two native ways to run one: user-level Conversion Lift and geo-based incrementality experiments (rebuilt in November 2025 with a $5,000 minimum spend, down from roughly $100,000). You can also run a self-managed geo holdout outside the platform when you want full control. This article covers when to pick which, how to design one that produces a usable answer, and how to feed the result back into Smart Bidding.

What Incrementality Testing Measures (and What It Doesn't)

The thing being measured is causal lift on a defined conversion event over a defined window for a specific exposure. Nothing more. A lift test will not tell you whether your creative is better than your competitor's, will not separate the brand-halo effect from the direct-response effect unless you designed it to, and will not magically reconcile your MMM with your platform attribution. It answers one question cleanly: if these specific impressions had not happened, how many of these specific conversions would still have occurred.

The Google Ads implementation of Conversion Lift uses an intent-to-treat design. The control group does not just see different ads. The control group sees nothing from your campaign, served by the ghost-ad mechanism: Google runs the auction, your bid wins or loses normally, and for control users the impression is simply withheld and logged as a ghost. That preserves the auction dynamics that would have existed and gives you a clean treatment-versus-control comparison.

Three things people often confuse with incrementality and shouldn't:

Attribution tracks which touchpoints correlate with conversions. It says nothing about causation. A lift test of 0 percent on a campaign that gets 100 percent of last-click credit is a real finding.
MMM (marketing mix modeling) estimates channel-level effects across the full mix using regression on aggregate data. It is complementary to lift testing, not a substitute. Google's official measurement framework now positions MMM, incrementality, and attribution as three separate tools that calibrate each other.
A/B testing of creative measures which ad variant performs better given that ads are running. Lift testing measures whether the ads should run at all.

The visual is intuitive: treatment and control trend together until the ads start working. The gap between the curves over the test window is the lift. The dashed control line is what would have happened with no ads. Reading the gap before the conversion window closes is the single most common analytical mistake.

The Two Native Google Ads Lift Products

Google offers two lift products inside the Ads UI and they answer different questions.

Conversion Lift (user-level). Conversion Lift "isn't available for all Google Ads accounts. To use Conversion Lift, contact your Google account representative," per the official help center. When you do get access, the experiment randomizes at the user level using the ghost-ad mechanism described above. Reports return Incremental Conversions, Relative Conversion Lift, Incremental Conversion Value, Incremental Cost Per Action, and Incremental Return on Ad Spend for studies with conversion values. The honest constraint: Conversion Lift requires meaningful user-level data, and post-cookie environments have made user-level studies harder to qualify for. Many mid-market advertisers will not be approved.
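The report metrics named above can be approximated by hand from treatment and control totals. The following is a minimal sketch using the standard definitions of each metric, not Google's actual computation (which the help center does not spell out). The function name, field names, and all numbers are illustrative assumptions. The control group is scaled to the size of the treatment group so unequal splits compare fairly.

```python
from dataclasses import dataclass


@dataclass
class LiftReport:
    incremental_conversions: float
    relative_lift: float          # 0.25 means 25 percent
    incremental_value: float
    incremental_cpa: float
    incremental_roas: float


def lift_report(treat_users, treat_conv, treat_value,
                control_users, control_conv, control_value, cost):
    """Compare treatment against a control scaled to the same number of users."""
    scale = treat_users / control_users
    baseline_conv = control_conv * scale      # conversions expected with no ads
    baseline_value = control_value * scale    # conversion value expected with no ads
    inc_conv = treat_conv - baseline_conv
    inc_value = treat_value - baseline_value
    return LiftReport(
        incremental_conversions=inc_conv,
        relative_lift=inc_conv / baseline_conv,
        incremental_value=inc_value,
        # Cost per incremental conversion is undefined when lift is zero or negative.
        incremental_cpa=cost / inc_conv if inc_conv > 0 else float("inf"),
        incremental_roas=inc_value / cost,
    )


# Made-up study: 90/10 split, $12,000 spent on the treatment group.
report = lift_report(
    treat_users=900_000, treat_conv=1_320, treat_value=105_600.0,
    control_users=100_000, control_conv=120, control_value=9_600.0,
    cost=12_000.0,
)
print(report)
```

In this made-up study the scaled control baseline is 1,080 conversions, so 240 of the 1,320 treatment conversions are incremental, at an incremental CPA of $50 and an incremental ROAS of 1.6.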

Geo-based experiments / incrementality experiments. This is the path Google rebuilt in November 2025. The update brought three changes: the minimum spend dropped from approximately $100,000 per experiment to $5,000, results became "up to 50% more conclusive," and the interface was redesigned with custom test-size controls and configurable confidence levels. Geo experiments work by holding out entire DMAs or regions: ads run as usual in treatment markets, are paused in control markets, and the difference in the conversion rate between matched geos is the lift. The reports return Incremental ROAS, Incremental Conversions, Incremental Conversion Value, and Incremental Cost.

Method	Access	Min spend	Unit	Best for	Limitation
Conversion Lift (user-level)	On request via Google account rep	Not publicly stated (account-scale gated)	Users (ghost-ad mechanism)	Large accounts, well-tracked user-level conversions	Most mid-market accounts not eligible
Geo experiments (Google native)	Self-serve in Ads UI (Nov 2025 update)	$5,000 per experiment	DMAs / geos	Omnichannel businesses, mid-market, offline sales	Requires meaningful geo separation
Self-run geo holdout	Always available	No platform minimum, but need ~1K conversions/arm	Hand-matched DMAs or synthetic control	Custom hypotheses, multi-channel tests	Requires analyst time + matching effort

Three ways to run a lift test on Google Ads. Conversion Lift is the gold standard when you qualify. Geo experiments are now within reach for mid-market accounts after the November 2025 update. Self-run geo holdouts give you the most analytical control but require the most analyst time.

When to pick which. User-level Conversion Lift gives you more precise answers when your account scale qualifies and your conversions are well-tracked at the user level. Geo experiments work better for omnichannel businesses (offline sales, app installs, considered purchases) and for mid-market accounts that cannot get Conversion Lift access. Industry survey data Google cited with the November 2025 update: "80% of senior US marketing analytics professionals report incrementality experiment insights significantly impact revenue growth." That number maps to the audience this article is written for.

How to Design a Lift Test That Actually Answers Your Question

Six steps. Skip any of them and you will produce a number that looks like an answer but is not.

The 6-step design loop. Steps 1, 3, and 6 are where most self-run tests fail. The wait between step 5 and step 6 is the single biggest discipline test for the analyst.

1. Write the hypothesis as a falsifiable statement. "Does Performance Max work" is not a hypothesis, it is a debate. "Performance Max drives at least 10 percent incremental conversions over a no-ads baseline at the current tROAS target" is a hypothesis. The verb is "drives," the threshold is "10 percent," the comparator is "no-ads baseline," the window is implied. Write the rejection criterion before the test runs, not after.
2. Pick the treatment unit. Users for Conversion Lift. Geos for everything else. If you pick geos, your unit of analysis must be DMA-level (or finer), and your statistical power calculation must use the geo as the observation, not the user. This is where most self-run tests die.
3. Match the control. Google handles matching for native Conversion Lift. For a self-run geo holdout, match DMAs on pre-period revenue covariance: take the 12 weeks before the test, compute weekly revenue per DMA, and pair markets so the treatment and control sets have correlated baselines. Synthetic-control methods (weighted combinations of multiple control markets to approximate one treatment market) outperform simple one-to-one matching for accounts under 25 markets.
4. Size the test. A useful rule of thumb: you need at least 1,000 conversions in each arm to detect a 10 percent lift (your minimum detectable effect) at 80 percent statistical power. Smaller accounts (under 200 conversions per week) usually cannot detect lift below 15-20 percent even with a 4-week test. Two options: extend to 6-8 weeks, or accept a wider confidence interval and report the result as directional rather than significant.
5. Choose your conversion window. For e-commerce with short lookbacks (1-7 days), a 4-week test is fine. For considered purchases with 14-30 day lookbacks, the test needs to run at least 6 weeks and the analysis window must close before you read the result.
6. Pre-register the analysis. Decide your significance threshold (usually 90 percent confidence for ad operations decisions, 95 percent for budget reallocations over $100K) and your analysis method before the test runs. If you decide after seeing the data, you are post-hoc fitting and the result is not trustworthy.
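The sizing step can be sanity-checked with a back-of-envelope power calculation. This sketch assumes conversion counts in each arm are independent and roughly Poisson, which is a user-level simplification. Geo tests, where the market is the observation, typically carry more variance, so treat the output as a floor rather than a target. The function name and defaults are illustrative.

```python
from math import ceil
from statistics import NormalDist


def conversions_per_arm(min_lift, confidence=0.90, power=0.80, two_sided=False):
    """Control-arm conversions needed to detect `min_lift` (0.10 = 10 percent).

    Under the Poisson assumption the variance of (treatment - control)
    is about control * (2 + min_lift).
    """
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_power) ** 2 * (2 + min_lift) / min_lift ** 2)


for lift in (0.10, 0.15, 0.20):
    relaxed = conversions_per_arm(lift)                                  # one-sided, 90 percent
    strict = conversions_per_arm(lift, confidence=0.95, two_sided=True)  # two-sided, 95 percent
    print(f"{lift:.0%} lift: {relaxed} per arm (90% one-sided), {strict} per arm (95% two-sided)")
```

Under these assumptions a 10 percent lift at one-sided 90 percent confidence needs a little under 1,000 conversions per arm, in line with the rule of thumb, while a two-sided 95 percent threshold pushes that to roughly 1,650. An account at 200 conversions per week, split evenly, collects about 400 per arm in four weeks, which falls just short of what a 15 percent lift requires.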

🦉 Sage · Research

On a $40K/mo retail account last quarter, the team wanted to kill a YouTube campaign that looked dead on last-click attribution. We ran a 6-week geo lift across 12 DMAs. YouTube delivered 14 percent incremental conversions. Brand search, which looked like a winner on attribution, delivered 6 percent. The budget reallocation paid for the test in 11 days.

What Real Lift Numbers Look Like in Google Ads

The honest range, synthesized from Haus, Fusepoint, and what we have observed on B6 accounts:

Brand search

Typical lift: 5-15% (known DTC) / 20-40% (unknown brands)

Often 30-60% of branded conversions would have happened organically

Performance Max

Typical lift: 8-18% net incremental (bundle hides internal range)

YouTube portion under-credited by attribution

Non-brand search

Typical lift: 25-50% incremental in most accounts

Most reliable performer once brand cannibalization is removed

Display / retargeting

Typical lift: Single-digit or sometimes negative

A real finding, not a tracking artifact: redirect that budget

YouTube

Typical lift: Weak attribution, strong incrementality on new customers

Classic pattern: low last-click ROAS, double-digit incremental lift

A lift result of zero (or negative) is a valid finding, not a failure of the experiment. "This campaign produces no measurable incremental revenue" is genuinely useful: redirect that budget. The Haus quote that captures this best: "Turning off a campaign would only decrease total sales by 30 percent of what Google attributes to it." Sit with that number for a second. For the related diagnostic when ROAS suddenly shifts after a budget move, see our ROAS dropped suddenly walkthrough. For the broader Performance Max diagnostic when lift comes back weak, see Performance Max not converting.

Common Pitfalls That Invalidate Lift Tests

The list of ways a self-run lift test goes wrong is long enough that we run through it every time a team designs one.

Contamination. A user sees ads on one device and is in the control group on another. Cookie loss reassigns users mid-test. Treatment and control geos share a commuter zone (the classic Manhattan-Newark problem). Each contaminates the result, usually toward underestimating lift.

Underpowered tests. Two weeks, 200 conversions per arm, 30 percent reported lift, no significance. The test ran, the number exists, the number is meaningless. We see senior teams ship recommendations off underpowered tests more often than we should.

Reading early. Considered purchases can still convert on day 22, long after a 14-day readout. If you read the lift result before the full lookback closes, you are reading half the story. The conversion window must close before analysis starts.

Seasonality contamination. Comparing a December treatment period against a November pre-period without adjustment will produce 40-percent "lift" that is just Q4 demand. Always include seasonal controls or run during a stable window.
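One common way to apply a seasonal control is a difference-in-differences estimate: use the growth the control markets saw over the same period as the no-ads counterfactual for the treatment markets. The sketch below assumes treatment and control would have trended in parallel without ads, and every number in it is invented for illustration.

```python
def did_lift(treat_pre, treat_test, control_pre, control_test):
    """Relative lift after removing the growth that control markets also saw."""
    expected_growth = control_test / control_pre     # seasonal growth with no ads
    counterfactual = treat_pre * expected_growth     # treatment markets, had ads not run
    return (treat_test - counterfactual) / counterfactual


# Treatment markets: 1,000 conversions in November, 1,400 in December.
# Control markets:   1,000 conversions in November, 1,300 in December.
naive = (1_400 - 1_000) / 1_000      # compares December to November directly
adjusted = did_lift(treat_pre=1_000, treat_test=1_400,
                    control_pre=1_000, control_test=1_300)
print(f"naive: {naive:.1%}, adjusted: {adjusted:.1%}")
# prints "naive: 40.0%, adjusted: 7.7%"
```

Here most of the apparent 40 percent is Q4 demand that the control markets captured without any ads; only the remainder is attributable to the campaign.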

Letting Smart Bidding re-optimize mid-test. If you change tROAS, tCPA, budget, or audience signals during the test, the treatment is no longer stable. Either freeze the campaign settings or accept that the lift you measured is for the average of two different treatments.

Running lift with no challenger. A lift test on a brand campaign with no holdout geo and no creative variant measures nothing. We have seen this pitched as "we are running an incrementality test" three times this year. It was always a non-experiment.

How to Feed Lift Results Back Into Smart Bidding

This is the section most lift articles skip. The output of a lift test is not a slide for the QBR. It is a multiplier you apply to your bidding inputs.

The mechanic is simple. Smart Bidding optimizes against the conversions you send it. If your lift study shows 60 percent of last-click conversions are incremental, then for bidding purposes 40 percent of the reported conversion stream is non-causal. Multiply your conversion value feed (or your conversion count, if you bid on Target CPA) by the incrementality factor (0.6 in this example) before sending it to the bidding algorithm. The result: Smart Bidding starts targeting causal conversions instead of correlated ones.

🦊 Vox · Strategy

We ran lift on a $180K/mo home goods account in February. Brand search returned 42 percent incremental, Performance Max returned 11 percent, Display retargeting returned negative 3 percent. We re-weighted the conversion feed, dropped Display retargeting, moved $14K/mo into upper-funnel YouTube, and held tROAS targets steady on brand and PMax. Revenue grew 8 percent in 60 days on the same total spend.

The operational rules we use on B6 accounts:

Re-run the lift test quarterly. Lift drifts as the market, the creative, and the audience shift.
Apply the multiplier at the conversion-action level, not the account level. Different conversion actions have different incrementality factors.
Keep value-based bidding turned on. The whole point is to feed the algorithm a causally-honest signal.
If lift comes back as zero or negative on a campaign, pause it, redirect the budget, and re-test the campaign you redirected into.
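The re-weighting itself is a small transformation on the conversion feed. Here is a minimal sketch, assuming the feed is a list of rows with an `action` name and a `value`. Those field names, the action names, and the 0.35 factor are hypothetical, and the 0.6 factor is the example from above. How the adjusted values reach Google Ads (for instance, through whatever upload or value-adjustment path your account already uses) is outside the scope of the sketch.

```python
# Hypothetical incrementality factors, one per conversion action,
# each taken from that action's most recent lift study.
INCREMENTALITY = {"purchase": 0.6, "lead_form": 0.35}


def reweight(conversions, factors, default=1.0):
    """Return copies of the conversion rows with value scaled by the action's factor.

    Actions without a measured factor keep their reported value (default=1.0),
    which is itself an assumption worth revisiting once they are tested.
    """
    adjusted = []
    for row in conversions:
        factor = factors.get(row["action"], default)
        adjusted.append({**row, "value": round(row["value"] * factor, 2)})
    return adjusted


feed = [
    {"id": "c1", "action": "purchase", "value": 120.0},
    {"id": "c2", "action": "lead_form", "value": 40.0},
    {"id": "c3", "action": "newsletter", "value": 5.0},   # no lift study yet
]
adjusted_feed = reweight(feed, INCREMENTALITY)
for row in adjusted_feed:
    print(row)
```

Keeping the factors in one table per conversion action makes the quarterly re-test a one-line update rather than a pipeline change.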

This loop is the practical version of Google's recommendation to "combine incrementality testing with AI solutions" in the Think with Google framework. The frame matters. AI bidding is not the problem. The problem is feeding AI bidding a non-causal conversion signal and acting surprised when the algorithm optimizes against the wrong thing. See our AI-powered PPC optimization strategies for the full feedback-loop pattern, and the Quality Score guide for the related Smart Bidding signal-quality discussion.

🐝 Buzz · Bidding

After Vox finished the re-weighting, I retuned tROAS from 4.2 to 3.6 on brand search to absorb the lower causal value and from 2.8 to 3.1 on PMax to push more spend at the campaign with real incremental contribution. CPA held inside 8 percent of the prior 30 days. The bidding algorithm did the work once the signal was honest.

FAQ

What's the difference between incrementality testing and A/B testing? A/B testing compares two versions of a treatment (ad A versus ad B) and tells you which is better. Incrementality testing compares treatment against no-treatment and tells you whether the campaign should run at all. They answer different questions and are not interchangeable.

Does Google Ads have built-in incrementality testing? Yes, two tools. User-level Conversion Lift, available on request via your Google account representative. Geo-based incrementality experiments, rebuilt in November 2025 with a $5,000 minimum spend. Both run inside the Ads UI.

What's the minimum spend to run a lift test? Google's geo-based experiments require $5,000 per experiment as of late 2025. A self-run geo holdout outside the platform has no minimum, but realistically needs at least 1,000 conversions per arm to detect a 10 percent effect.

How long should a lift test run? Four weeks minimum for short-lookback e-commerce. Six to eight weeks for considered purchases or any account under 200 conversions per week. Always wait for the conversion window to close before reading the result.

Can incrementality testing prove Performance Max is working? It can prove whether PMax produces incremental conversions at the campaign level, but because PMax bundles Search, Shopping, Display, and YouTube, the result is a blended number. To isolate components, you need to layer in either a PMax-versus-no-PMax holdout or a channel-level diagnostic. For broader PMax diagnostics, see our Performance Max not converting playbook. For sudden ROAS shifts, see ROAS dropped suddenly. For the deeper Smart Bidding context, see the RSA best practices guide.

Stop Optimizing Against Conversions That Would Have Happened Anyway

Smart Bidding will gladly scale a campaign that adds zero incremental revenue, because Smart Bidding cannot tell the difference between a conversion it caused and a conversion that would have happened in its absence. Lift testing is the only empirical bridge. Without it, you are tuning a multi-million-dollar bidding algorithm on a signal you have never validated.

The B6 stack treats this as the core measurement loop. Sage designs the lift test (treatment unit, control match, sample size, conversion window), runs the analysis when the window closes, and reports the causal numbers with confidence intervals. Vox translates the result into a budget reallocation proposal, telling you which campaigns deserve more spend, which deserve less, and where to redirect the cuts. Buzz retunes tROAS, tCPA, and the conversion value feed so Smart Bidding is targeting causal value. The whole loop runs quarterly, with no QBR deck required.

The cost structure: B6 at $199 a month on the Approval tier, versus Optmyzr at roughly $499 a month for recommendation-only insights, versus the typical agency that will quote $4-8K for one custom incrementality study. Connect your account at /chat and Sage will design a default geo-lift across your top three campaigns inside 5 minutes. Pricing tiers and what each agent does in each tier are at /pricing.
