P Value Excel: T.TEST, Z.TEST and Data Analysis ToolPak Guide 2026 June
Calculate p value in Excel with T.TEST, Z.TEST, CHISQ.TEST and the Data Analysis ToolPak. One-tailed vs two-tailed, errors, examples.

You opened Excel, dropped your two sample columns into a sheet, and now you need a p value. Not a chart. Not a fancy regression. Just that one small number that decides whether your boss believes the new landing page actually beat the old one.
Here's the good news. Excel ships with everything you need — four built-in p-value functions, a free Data Analysis ToolPak with t-tests, z-tests and ANOVA, and enough conversion formulas to turn any z-score into a probability. You don't need SPSS. You don't need R. You don't even need Python, although it's nice to have around.
This guide walks through every practical way to calculate a p value Excel can produce, when to pick each one, and how to read the output without fooling yourself. We'll cover T.TEST, Z.TEST, CHISQ.TEST, F.TEST, the manual NORM.S.DIST route, and the ToolPak dialogs that wrap it all into a clickable workflow. By the end, you'll have a checklist for picking the right test and a working template you can paste into any spreadsheet.
One thing first. A p-value is a probability, not a verdict. It tells you how unlikely your data would be if the null hypothesis were true. Below 0.05 is the conventional cutoff for "statistically significant," but the cutoff isn't magic — it's a habit borrowed from Ronald Fisher in the 1920s. Treat it as one piece of evidence among several, alongside effect size and a confidence interval. We'll return to that point near the end.
Use =T.TEST(array1, array2, tails, type) for comparing two means. Set tails=2 for two-tailed and type=3 for Welch's unequal-variance t-test — the safe default when you're not sure variances match. Excel returns the p-value directly with no helper columns.
The four built-in Excel functions that return a p-value
Excel has four functions that hand you a p-value directly, with no helper columns and no Analysis ToolPak. Each one fits a specific kind of hypothesis test. Pick the wrong function and the number is meaningless — pick the right one and you're done in about ten seconds.
T.TEST — the workhorse for comparing two means
If you're comparing two sample means, T.TEST is almost always the right call. Syntax: =T.TEST(array1, array2, tails, type).
tails is either 1 (one-tailed) or 2 (two-tailed). type is 1 for paired samples, 2 for two-sample equal variance, or 3 for two-sample unequal variance — that last one is Welch's t-test, and it's the safest default when you're not sure the variances match.
Example. You ran an A/B test on two checkout flows. Conversion times for version A sit in A2:A41, version B in B2:B41. Drop =T.TEST(A2:A41, B2:B41, 2, 3) into any empty cell.
Excel returns a single number — say, 0.0271. That's your two-tailed p-value for Welch's t-test. Reject the null at α = 0.05, conclude the flows differ in time-to-checkout, write it up. Done. If you want a directional claim, switch tails to 1 — Excel halves the number for you.
Z.TEST — one-sample test against a known population mean
Use Z.TEST when you know (or strongly assume) the population standard deviation and you're testing one sample against a fixed value. Syntax: =Z.TEST(array, μ₀, [sigma]).
Sigma is optional. Leave it off and Excel uses the sample standard deviation, which technically makes it a one-sample t-test in disguise — fine for large samples but check before relying on it.
Example. A factory claims their bolts have a mean tensile strength of 500 MPa with a known population sigma of 12. You measured 30 bolts and pasted them into C2:C31. Test whether your sample mean differs: =Z.TEST(C2:C31, 500, 12).
The output is the upper-tail p-value. If your sample mean is below 500 you'll see a number above 0.5. For a clean two-tailed result use =2*MIN(Z.TEST(C2:C31,500,12), 1-Z.TEST(C2:C31,500,12)). Annoying quirk, but consistent across versions.
CHISQ.TEST — chi-square test for categorical data
When your data is counts in a contingency table — survey responses by gender, defect rates by shift, anything with frequencies — CHISQ.TEST is the function. Syntax: =CHISQ.TEST(actual_range, expected_range).
You need to compute the expected counts first using row totals × column totals ÷ grand total. Build the expected matrix in a parallel block, point CHISQ.TEST at both ranges, and Excel returns the p-value for the chi-square test of independence (or goodness-of-fit if you've only got one row).
Example. A 2×3 table of pass/fail counts across three training programs. Observed counts in A2:C3, expected counts in A6:C7. Formula: =CHISQ.TEST(A2:C3, A6:C7). Returns something like 0.0083 — strong evidence the programs differ in pass rate.
F.TEST — variance comparison
Before you run a two-sample t-test with the equal-variance assumption, it's polite to check that the variances actually are equal. F.TEST does that. Syntax: =F.TEST(array1, array2).
It returns the two-tailed p-value for the null that the two population variances are equal. If F.TEST comes back below 0.05, switch to Welch's (type=3) and stop worrying. If it's well above, you can use either type 2 or type 3 — Welch's is fine even when variances are equal, so it's the lazy-but-correct default.
P Value Excel at a Glance
The Data Analysis ToolPak — clickable t-tests, z-tests and ANOVA
Functions are fast, but they only return one number. The Excel Data Analysis ToolPak wraps each test in a dialog box and dumps a full report — test statistic, degrees of freedom, critical value, both tails of the p-value, and the mean and variance of each sample. For anything you'll show your boss, the ToolPak output is the cleaner choice.
Enabling it takes about ten seconds. File → Options → Add-ins → Manage Excel Add-ins → Go → tick "Analysis ToolPak" → OK. A new Data Analysis button appears at the right edge of the Data ribbon.
Click it and you get a scrollable list with eighteen analyses. The ones that produce p-values are the t-Test variants (equal variance, unequal variance, paired), the z-Test two sample, and the three flavors of ANOVA (single factor, two-factor with replication, two-factor without). Pick the one that matches your design, point it at the data, and read the resulting table.
Each dialog asks for the data range, an alpha (default 0.05), and an output location. Pick "New Worksheet Ply" if you want the report on its own tab. The "P(T<=t) one-tail" and "P(T<=t) two-tail" cells in the output are your p-values — Excel labels both because it doesn't know which one you care about. Match it to your hypothesis.
For normality checks before any of this, the ToolPak doesn't include Shapiro-Wilk. You'll need to roll your own using =NORM.S.INV() against ranked data and visual QQ plots, or just run a quick histogram and trust your eyes for n > 30. If you want a proper Shapiro-Wilk, that's a job for R or Python.
File → Options → Add-ins → Manage Excel Add-ins → Go → tick Analysis ToolPak → OK. A new Data Analysis button appears on the Data ribbon. Eighteen analyses inside, including three flavors of t-test, a z-test, and three ANOVA variants. Each one outputs a full report with both one-tailed and two-tailed p-values, the test statistic, the degrees of freedom, and a critical value comparison.
Converting a z-score to a p-value the manual way
Sometimes you've already calculated a z-statistic — maybe from a textbook problem, a published study, or a custom test the ToolPak doesn't cover. To turn that z into a p-value, use NORM.S.DIST: =NORM.S.DIST(z, TRUE).
The TRUE argument asks for the cumulative distribution — the probability of getting a value at or below z. For a one-tailed test on the upper side, subtract from 1: =1 - NORM.S.DIST(z, TRUE). For two tails, double the smaller tail: =2 * (1 - NORM.S.DIST(ABS(z), TRUE)).
Example. You computed z = 2.17 by hand. The two-tailed p-value is =2*(1-NORM.S.DIST(2.17,TRUE)), which gives roughly 0.0300. Significant at α = 0.05, not at α = 0.01.
The same trick works for t-statistics with T.DIST.2T(ABS(t), df) for two-tailed and T.DIST.RT(t, df) for upper-tail. For chi-square, use CHISQ.DIST.RT(chi2, df). For F, use F.DIST.RT(f, df1, df2). Memorize that .RT means "right tail" and you've covered ninety percent of the distribution functions Excel offers.
Picking the Right P-Value Function
Two-sample mean comparison.
- ▸Args: array1, array2, tails (1 or 2), type (1 paired, 2 equal var, 3 Welch's)
- ▸Returns p-value directly — no helper columns needed
- ▸Use type=3 as your default for independent samples
One-sample test against known population mean.
- ▸Returns upper-tail p only
- ▸Double the smaller tail for two-tailed result
- ▸Pass sigma to use known population SD instead of sample SD
Chi-square test for categorical data.
- ▸Needs observed and expected ranges of identical shape
- ▸Compute expected from row total times column total over grand total
- ▸Works for goodness-of-fit (1 row) and independence (2D table)
Variance equality check before t-tests.
- ▸Two-tailed p for null of equal variances
- ▸Below 0.05 means switch t-test to type=3 (Welch's)
- ▸Above 0.05 means either type=2 or type=3 works fine
One-tailed vs two-tailed — picking the right tails argument
This trips up more analysts than any other detail. A two-tailed test asks whether two values differ in either direction. A one-tailed test asks whether one value is specifically greater than (or less than) the other. The one-tailed p-value is half the two-tailed value, all else equal, which makes it more permissive — which is exactly why reviewers get suspicious.
Rule of thumb. If your hypothesis before looking at the data was "version B converts better," go one-tailed and accept the trade-off — you'll never be able to claim version A was better even if it crushes B. If your hypothesis was "they differ somehow," go two-tailed and stay flexible. Switching tails after seeing the data is p-hacking, and reviewers will catch it.
In Excel's T.TEST function, the tails argument controls this directly. Pass 1 for one-tailed, 2 for two-tailed. There's no separate function — same call, different argument.
Tail and Sample Design Cheatsheet
Use when your hypothesis predicts direction: B converts better than A, the new drug lowers blood pressure, this campaign increased sales. You can only claim significance in the predicted direction — a result in the other direction, no matter how extreme, gives you nothing. Halves the p-value compared to two-tailed, but locks in your prediction. Set tails=1 in T.TEST.
Interpreting the number — what counts as significant
The convention is p < 0.05 means "reject the null hypothesis" and call the result statistically significant. Below 0.01 is sometimes called "highly significant." Above 0.05 means you fail to reject the null — note the wording. You haven't proved the null is true, you just haven't ruled it out.
What 0.05 actually means. If the null hypothesis were exactly true and you repeated your experiment infinitely many times, you'd see data this extreme or more extreme 5% of the time by pure chance. That's it. It's not the probability your hypothesis is wrong, it's not the probability the effect is real, and it's not a measure of effect size. Just a long-run frequency.
For comparison studies where you also want to see how big the effect is, pair the p-value with a Cohen's d calculation or the coefficient of variation. Cohen's d in Excel is =(MEAN1 - MEAN2) / pooled_sd — there's no built-in, but it's a four-cell calculation. Anything above 0.8 is a large effect; 0.5 is medium; 0.2 is small. A p-value of 0.001 with d = 0.05 means a real but tiny effect — interesting statistically, useless practically.
A full worked example — split-test conversion times
Here's the kind of analysis you'll run dozens of times if your job touches product or marketing. Two checkout flows, forty users each, measured time to complete checkout in seconds.
Step one: import data. Paste version A into A2:A41, version B into B2:B41. Label headers in row 1.
Step two: check normality. Build a histogram from the Data > Data Analysis > Histogram dialog. With n=40 you don't need perfect normality — the central limit theorem handles it. Just check there's no obvious skew or bimodality.
Step three: pick the test. Two independent samples, continuous outcome, unknown population variances. That's a two-sample t-test, and Welch's variant (unequal variances) is safest unless you've checked otherwise. Type=3 in T.TEST.
Step four: run F.TEST first. =F.TEST(A2:A41, B2:B41) — if this is well above 0.05 you can defensibly use type=2, but type=3 always works.
Step five: calculate the p-value. =T.TEST(A2:A41, B2:B41, 2, 3). Say it returns 0.0142. Reject the null at α = 0.05, conclude the flows differ.
Step six: report effect size. Pooled standard deviation goes in helper cell D2. Cohen's d: =(AVERAGE(B2:B41)-AVERAGE(A2:A41)) / D2. If d = 0.65, you've got a medium effect — worth shipping.
Step seven: add a 95% confidence interval on the difference. =CONFIDENCE.T(0.05, D2, COUNT(A2:A41)) gives the half-width. Report the difference plus-minus that half-width and you've got a complete write-up.
P-Value Excel Calculation Checklist
- ✓State your null and alternative hypothesis before opening Excel
- ✓Decide one-tailed or two-tailed before looking at the data
- ✓Check sample size — n less than 30 means inspect normality more carefully
- ✓Run F.TEST to decide between t-test type 2 and type 3
- ✓Use Welch's (type=3) as the default when variances are uncertain
- ✓Calculate Cohen's d alongside the p-value for effect size
- ✓Report the test statistic, df, p-value, effect size, and 95% CI together
- ✓Apply Bonferroni correction if running multiple comparisons
Common errors and what they mean
Excel's error codes are unhelpful in isolation but follow a pattern. Three you'll hit constantly.
#NUM! — usually means your values are out of range for the function. T.TEST throws this if one array has zero variance (all identical values), if tails isn't 1 or 2, or if type isn't 1, 2, or 3. Check those first.
#VALUE! — mismatched array sizes for paired tests (type=1 requires equal length), text in a numeric range, or empty cells where Excel expects numbers. Use =COUNT(A2:A41) on each range to verify they match.
#DIV/0! — appears when standard deviation is zero (all values identical). The math literally divides by zero. Inspect your data.
If T.TEST returns exactly 1.0 or exactly 0.0, your data probably has identical means or wildly separated means with no overlap — sanity-check before reporting.
The Bonferroni correction — when you're running many tests
If you run twenty t-tests at α = 0.05, you'll expect one false positive just by chance. The Bonferroni correction is the simplest fix: divide alpha by the number of tests. Twenty tests → α = 0.0025 per test. Or equivalently, multiply each p-value by 20 and compare to 0.05.
In Excel: =MIN(p_value * n_tests, 1). The MIN caps it at 1 because corrected p-values above 1 are nonsense.
Bonferroni is conservative — it controls family-wise error rate aggressively, sometimes too aggressively. For exploratory work with many hypotheses, look at Benjamini-Hochberg (controls false discovery rate instead). The B-H formula is iterative and easier in R, but Excel can do it with a sorted column and an IF ladder if you're patient.
When Excel isn't enough
Excel's statistical functions stop at the basics. The moment you need mixed-effects models, logistic regression with proper diagnostics, survival analysis, or Bayesian anything, you've outgrown spreadsheets. R and Python (with scipy.stats or statsmodels) handle the same t-tests in one line and add hundreds of tests Excel doesn't cover.
SPSS, JMP, and Stata are the commercial alternatives — slicker dialogs, better defaults, expensive licenses. JASP is free, beautifully designed, and runs Bayesian and frequentist tests side by side. If you do statistics weekly, learn one of these.
Excel still wins for the everyday analyst job: drop two columns of numbers into a sheet, get a p-value, write the email, move on. That's a high-frequency, low-stakes task that Excel handles cleanly with the workflow above.
Dynamic-array formulas in Excel 365 make the whole thing nicer. =T.TEST still returns a scalar, but you can wrap it inside LET to define intermediate names, or use it inside FILTER(data, condition) to compute p-values on subsets without helper columns. The Excel functions list has the full set of new dynamic-array tools that pair nicely with the statistical functions when you're running the same test across thirty groups.
Reporting your results — what to include
A clean stat report has five pieces. First, what you tested: "Two-sample t-test comparing checkout time between flows A and B." Second, the test statistic: "t(78) = 2.51" — that's t-statistic and degrees of freedom, both visible in the ToolPak output.
Third, the p-value: "p = 0.014." Report to three decimal places; if Excel gives you 4.7e-08, write "p < 0.001." Fourth, the effect size: "Cohen's d = 0.65 (medium)." Fifth, the confidence interval on the effect: "Mean difference 4.2s, 95% CI [0.9, 7.5]."
Skip any of these and a careful reader will rightfully push back. The p-value alone tells them whether the effect is real — the rest tells them whether it matters.
Ready to test what you've learned? Try our practice quiz on Excel statistical functions — same kinds of problems you'd see on a finance or analyst interview.
Quick reference — which function for which test
Save this map somewhere accessible. Most of the time you'll glance at it, pick the right function, and move on. Two means with independent samples → T.TEST(a, b, tails, 3) for Welch's. Two means with paired samples → T.TEST(a, b, tails, 1).
One mean against a population value → Z.TEST(array, μ₀, sigma). Counts in a contingency table → CHISQ.TEST(observed, expected). Variance equality check → F.TEST(a, b). Three or more means → ToolPak ANOVA: Single Factor.
Converting a z-stat manually → 2*(1-NORM.S.DIST(ABS(z), TRUE)) for two-tailed. Converting a t-stat manually → T.DIST.2T(ABS(t), df). That's the whole map. Eight rules. Keep them handy and you'll handle ninety percent of the hypothesis tests that land on your desk without ever opening another tool.
Excel Questions and Answers
About the Author
Business Consultant & Professional Certification Advisor
Wharton School, University of PennsylvaniaKatherine Lee earned her MBA from the Wharton School at the University of Pennsylvania and holds CPA, PHR, and PMP certifications. With a background spanning corporate finance, human resources, and project management, she has coached professionals preparing for CPA, CMA, PHR/SPHR, PMP, and financial services licensing exams.