A/B Test Analysis & Statistical Significance
Use AI to analyze A/B test results, calculate significance, interpret p-values, and detect common mistakes. Practical prompts and SQL for marketing analysts.
The A/B Testing Problem Most Teams Have
Here's what usually happens: someone runs an A/B test, sees that variant B has a 12% higher conversion rate, declares victory, and ships it. Nobody checks if that result is statistically significant. Nobody asks about sample size. Nobody checks for segment-level effects. AI can be your statistics co-pilot — catching the mistakes humans make when they're excited about a result.
Statistical Significance Explained Simply
Statistical significance answers one question: 'Is this result real, or could it be random noise?' A p-value of 0.05 means that if there were truly no difference between variants, you'd still see a result at least this extreme about 5% of the time. Most teams use a 95% confidence level (p < 0.05) as the bar. But there's a lot more nuance to getting this right.
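The arithmetic behind a confidence interval is simple enough to check by hand. Here's a quick Python sketch of the standard Wald interval for a conversion rate (the same p ± 1.96 × SE formula the SQL below computes); the example numbers are illustrative:

```python
import math

def wald_ci(p, n, z=1.96):
    """95% Wald confidence interval for a conversion rate p observed over n visitors."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return round(p - z * se, 4), round(p + z * se, 4)

# Example: a 6.9% conversion rate measured on 4,521 visitors
lo, hi = wald_ci(0.069, 4521)
print(lo, hi)  # 0.0616 0.0764
```

If the two variants' intervals overlap heavily, be skeptical of any claimed winner.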
-- A/B test results summary with confidence intervals
WITH test_results AS (
SELECT
variant,
COUNT(*) AS visitors,
SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END) AS conversions,
ROUND(AVG(CASE WHEN converted = 1 THEN 1.0 ELSE 0.0 END), 4) AS conversion_rate,
ROUND(AVG(revenue), 2) AS avg_revenue_per_visitor
FROM ab_test_events
WHERE test_id = 'pricing_page_v2'
AND timestamp >= '2026-03-01'
AND timestamp < '2026-04-01'
GROUP BY variant
)
SELECT
variant,
visitors,
conversions,
conversion_rate,
avg_revenue_per_visitor,
-- Standard error for proportion
ROUND(SQRT(conversion_rate * (1 - conversion_rate) / visitors), 4) AS std_error,
-- 95% confidence interval
ROUND(conversion_rate - 1.96 * SQRT(conversion_rate * (1 - conversion_rate) / visitors), 4) AS ci_lower,
ROUND(conversion_rate + 1.96 * SQRT(conversion_rate * (1 - conversion_rate) / visitors), 4) AS ci_upper
FROM test_results
ORDER BY variant;
Get a complete A/B test analysis with significance testing and revenue impact
I ran an A/B test on our pricing page. Here are the results:
- Control: 4,521 visitors, 312 conversions (6.9% conversion rate)
- Variant: 4,487 visitors, 351 conversions (7.82% conversion rate)
The test ran for 14 days. Tell me: (1) Is this result statistically significant at 95% confidence? Show the math. (2) What's the confidence interval for the lift? (3) Did we have enough sample size, or should we keep running? (4) What's the expected annual revenue impact if we ship the variant? (5) Are there any red flags I should check before declaring a winner?
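You don't have to take the AI's word for the math. Here's a minimal two-proportion z-test in stdlib Python, run on the exact numbers from the prompt above:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF (erf-based, no SciPy needed)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_z_test(312, 4521, 351, 4487)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Run it and you'll find z is about 1.67 with a p-value near 0.09 — despite a 13% relative lift, this result does not clear the 95% bar, which is exactly the kind of thing teams miss when they eyeball conversion rates.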
The 5 Most Common A/B Testing Mistakes
- Peeking at results early and stopping when it looks good (this inflates false positives massively)
- Not calculating required sample size before starting (your test might need 3x longer than you think)
- Ignoring multiple comparison problems (testing 5 metrics without adjusting significance threshold)
- Not checking for novelty effect (results look great in week 1 and fade by week 4)
- Simpson's paradox across segments — the variant wins overall but loses in every segment (usually a sign of uneven traffic allocation between segments)
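The first mistake on that list is worth seeing with your own eyes. This simulation (pure Python, hypothetical parameters) runs A/A tests where there is no true difference and compares the false-positive rate of checking once at the end against peeking at five interim looks:

```python
import random

random.seed(42)
SIMS, LOOKS = 2000, 5

fixed_hits = peek_hits = 0
for _ in range(SIMS):
    # Under the null hypothesis, the z-statistic at interim look k behaves like
    # the cumulative sum of k independent standard normals divided by sqrt(k).
    cum = 0.0
    peeked = False
    for k in range(1, LOOKS + 1):
        cum += random.gauss(0, 1)
        z = cum / k ** 0.5
        if abs(z) > 1.96:  # "significant" at this look — a peeker would stop here
            peeked = True
    if peeked:
        peek_hits += 1
    if abs(cum / LOOKS ** 0.5) > 1.96:  # significant only at the final look
        fixed_hits += 1

print(f"fixed-horizon false positives: {fixed_hits / SIMS:.1%}")
print(f"peeking false positives:       {peek_hits / SIMS:.1%}")
```

The fixed-horizon rate lands near the nominal 5%, while peeking at five looks roughly triples it — every one of those extra "wins" is pure noise.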
Calculate sample size before you start — the most important step most teams skip
I need to calculate the required sample size for an A/B test. Our current conversion rate is 4.2%. We want to detect a minimum 15% relative improvement (to 4.83%). We want 95% confidence and 80% power. How many visitors per variant do we need? How many days will this take if we get 1,200 visitors per day? Also, what if we only want to detect a 10% improvement — how does that change the requirements?
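If you'd rather compute this yourself, here's a sketch of the standard two-proportion sample-size formula (95% confidence, 80% power) applied to the numbers from the prompt above:

```python
import math

def sample_size_per_variant(p1, rel_lift, alpha_z=1.96, power_z=0.8416):
    """Visitors per variant needed to detect a relative lift over baseline rate p1
    at 95% confidence (two-sided) and 80% power."""
    p2 = p1 * (1 + rel_lift)
    pooled = (p1 + p2) / 2
    num = (alpha_z * math.sqrt(2 * pooled * (1 - pooled))
           + power_z * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

n_15 = sample_size_per_variant(0.042, 0.15)  # detect 4.2% -> 4.83%
n_10 = sample_size_per_variant(0.042, 0.10)  # detect 4.2% -> 4.62%
print(n_15, n_10)

# At 1,200 visitors/day split 50/50, each variant gets 600 visitors/day
print(f"days for 15% MDE: {math.ceil(n_15 / 600)}")
```

Roughly 17–18k visitors per variant for the 15% lift (about a month at this traffic level), and more than double that for a 10% lift — halving your minimum detectable effect doesn't halve your runtime, it multiplies it.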
Segment-Level Analysis
An overall test result hides crucial details. Maybe the variant crushes it for mobile users but tanks on desktop. Maybe enterprise customers love it but SMB customers bounce. AI is great at running these segment-level analyses quickly and flagging where results diverge.
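A segment breakdown is easy to script once you have the counts. This sketch (hypothetical numbers) computes per-segment relative lift and flags any segment moving in the opposite direction from the overall result:

```python
# Hypothetical per-segment results:
# (control_visitors, control_conversions, variant_visitors, variant_conversions)
segments = {
    "mobile":  (2000, 120, 2000, 160),
    "desktop": (2500, 190, 2500, 175),
}

def relative_lift(n_a, c_a, n_b, c_b):
    """Relative change in conversion rate, variant vs. control."""
    return (c_b / n_b) / (c_a / n_a) - 1

overall = relative_lift(
    sum(s[0] for s in segments.values()), sum(s[1] for s in segments.values()),
    sum(s[2] for s in segments.values()), sum(s[3] for s in segments.values()),
)

flagged = []
for name, (n_a, c_a, n_b, c_b) in segments.items():
    lift = relative_lift(n_a, c_a, n_b, c_b)
    print(f"{name}: {lift:+.1%} lift")
    # Flag segments whose lift points the opposite way from the overall result
    if (lift > 0) != (overall > 0):
        flagged.append(name)

print(f"overall: {overall:+.1%}; diverging segments: {flagged}")
```

In this made-up example the variant is up about 8% overall, but desktop is actually down — shipping it would quietly hurt a large chunk of your traffic. Check each diverging segment for significance on its own before acting on it.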
Analyze your most recent A/B test results with AI
Here are the results of our most recent A/B test: [paste variant names, visitor counts, conversion counts, and revenue if available]. The test ran for [X] days. Please: (1) Test for statistical significance, (2) Calculate confidence intervals for the lift, (3) Check if we had adequate sample size, (4) Run segment analysis by device type and new vs. returning visitors if I provide that breakdown, (5) Give me a clear recommendation with caveats. Also flag any red flags you see in the data.