How to Do Regression Analysis in Excel

Run regression in Excel three ways: Data Analysis ToolPak, scatter trendline, LINEST. Read R-squared, p-values, residuals like a pro.

Microsoft ExcelBy Katherine LeeMay 18, 202614 min read
How to Do Regression Analysis in Excel

Regression analysis in Excel turns a column of numbers into a real prediction engine. You can fit a line to two variables, or build a multivariate model with five predictors — Excel handles both without a single line of VBA. The trick is knowing which method to reach for and when.

You have three serious options. The Data Analysis ToolPak runs a full statistical regression and prints an ANOVA table, coefficients, standard errors, t-stats, p-values, R-squared, adjusted R-squared, and confidence intervals. A scatter plot with a trendline gives you a fast visual model — equation on the chart, R-squared in the corner, done in twenty seconds. And the formula trio (LINEST, SLOPE, INTERCEPT, RSQ) lets you bake regression coefficients directly into a live worksheet that updates the second your data changes.

Pick the ToolPak when you need diagnostics. Pick the chart when you need a picture. Pick formulas when the model needs to live inside a dashboard. Most analysts end up using all three on the same dataset — and that is the workflow this guide will walk you through.

Why does this matter? Because regression is the workhorse of business analytics. Sales versus marketing spend. Cycle time versus headcount. Customer churn versus support tickets. If two columns have a relationship, regression quantifies it — and Excel makes the math accessible without a stats degree.

Regression in Excel by the Numbers

3Built-in methods: ToolPak, scatter trendline, LINEST family
5Key output statistics: R-square, Adjusted R-square, F, t, p-value
0.05Standard significance threshold for p-values and Significance F
95%Default confidence level for coefficient intervals in ToolPak

Before you touch any tool, get your data in shape. Excel regression expects two things: a column of dependent values (the Y, the thing you are trying to predict) and one or more columns of independent values (the X variables, the predictors). Headers in row 1, numbers below, no merged cells, no blank rows breaking the range. If the data has gaps, fill them or filter them out — regression chokes on missing cells.

A small example. Suppose column A holds monthly ad spend and column B holds monthly revenue across 24 months. Your Y range is B2:B25. Your X range is A2:A25. That's it — your dataset is regression-ready. For multiple regression, line up two or more X columns side by side (say A through C) and Excel will fit them all in one shot.

Microsoft Excel - Microsoft Excel certification study resource

Before You Start

Excel regression needs clean, contiguous data. Headers in row 1, numbers below, no merged cells, no blank rows in the middle of a range. The dependent variable (Y) and independent variables (X) should sit in adjacent columns or at least share the same row range. Fix the data first — half of all regression problems are data problems, not model problems. Watch for sneaky text values masquerading as numbers (LEN test or =ISNUMBER on a sample cell catches them) and for date columns formatted as strings.

The Data Analysis ToolPak is Excel's built-in statistics add-in. It ships with every desktop install of Excel — Windows, Mac, Microsoft 365 — but it's switched off by default. You have to enable it once, then it lives under the Data tab forever.

To turn it on: File > Options > Add-ins. At the bottom of the dialog, set "Manage" to "Excel Add-ins" and click Go. Tick the box next to "Analysis ToolPak" and hit OK. Now look at the Data tab on the ribbon — a new "Data Analysis" button has appeared on the far right. Click it, scroll the list, pick "Regression," and the regression dialog opens.

The dialog asks for an Input Y Range, an Input X Range, and a few options. Confidence Level defaults to 95% (leave it). Tick "Labels" if your first row contains headers. Tick "Residuals" and "Residual Plots" — you will want them for diagnostics. Choose an output location (a new worksheet is cleanest) and hit OK. Excel spits out a multi-section report in about half a second.

ToolPak Output: Five Sections to Read in Order

Regression Statistics

Multiple R, R Square, Adjusted R Square, Standard Error, Observations. The quick health check — glance here first to see if the model has any explanatory power at all before reading any other section.

ANOVA Table

Significance F is the headline. Under 0.05 means the whole model explains real variance. Over 0.05 means your predictors are basically noise and you should rethink your variable choice.

Coefficient Table

Intercept and slope coefficients with standard errors, t-stats, and p-values for each predictor. This is where you build your prediction equation and decide which X variables to keep.

Confidence Intervals

Lower 95% and Upper 95% columns show the plausible range for each true coefficient. If the interval crosses zero, the predictor probably doesn't matter — same conclusion as p over 0.05.

Residual Output

Predicted Y, residuals, standardised residuals. Diagnostic gold for spotting outliers, heteroscedasticity, and the wrong functional form. Always tick Residual Plots in the dialog.

That ToolPak output has five sections, and reading it in order is the difference between insight and confusion. Top block: Regression Statistics. Multiple R is the correlation between your model's predictions and the actual values — closer to 1 is better. R Square is the percentage of variance in Y explained by X. Adjusted R Square is the same number penalised for adding useless variables — it's the honest version. Standard Error is the typical distance between predicted and actual values, in the same units as Y.

Middle block: ANOVA. The Significance F value tells you whether the entire model is statistically meaningful. Under 0.05 means yes, the model explains real variance. Over 0.05 means your predictors are basically noise — pack up and try different variables.

Bottom block: the coefficient table. Each row is one predictor (plus the intercept). The Coefficients column gives you the equation. The P-value column tells you which predictors actually contribute — under 0.05 means significant, over 0.05 means drop it. The 95% confidence interval shows the plausible range each coefficient lives inside.

Excellence Playa Mujeres - Microsoft Excel certification study resource

Three Ways to Run Regression in Excel

Enable Analysis ToolPak via File > Options > Add-ins. Set Manage to Excel Add-ins, click Go, tick Analysis ToolPak, OK. Click Data > Data Analysis > Regression. Set Input Y Range and Input X Range. Tick Labels, Residuals, and Residual Plots. Pick an output cell (a new worksheet is cleanest). Hit OK. Full statistical report appears in under a second — Regression Statistics on top, ANOVA in the middle, coefficient table with p-values and confidence intervals below.

Sometimes you don't need ANOVA. You just want a picture and an equation. The scatter chart trendline does exactly that — in about 15 seconds.

Select both columns of data (Y and X), insert a scatter chart (Insert > Charts > Scatter, first option), and right-click any data point. Pick "Add Trendline." A side panel opens. Choose Linear. Scroll down and tick "Display Equation on chart" and "Display R-squared value on chart." Excel overlays the regression line, prints y = mx + b at the top of the chart, and shows R² right next to it. That's a complete simple linear regression — visualised, fitted, labelled.

The trendline panel also offers Polynomial (for curved relationships), Logarithmic, Exponential, Power, and Moving Average. Polynomial of order 2 fits a parabola — useful when scatter clearly bends. Don't go above order 3 unless you have a real physical reason; high-order polynomials overfit fast.

Let's walk through a worked example so the workflow becomes concrete. Imagine you run a small SaaS company. Column A holds monthly Google Ads spend, ranging from $2,400 to $11,800 across the past 24 months. Column B holds monthly new-trial signups, ranging from 180 to 720. You suspect more spend means more signups — but you need numbers.

Step one. Open the ToolPak's Regression dialog, set Input Y Range to B2:B25 (signups) and Input X Range to A2:A25 (spend). Tick Labels because row 1 has headers. Tick Confidence Level 95%, Residuals, and Residual Plots. Send output to a new worksheet. Hit OK.

Step two. The output drops in. Adjusted R Square is 0.78 — spend explains 78% of the variation in signups after the model-complexity penalty. Significance F is 0.0001 — model is highly significant. The Spend coefficient is 0.054 with a p-value of 0.0001, meaning every extra dollar of ad spend buys 0.054 additional signups, or roughly one signup per $18.50. The intercept is 42 — baseline signups when spend is zero (probably organic traffic).

Step three. Build a prediction cell. =0.054 * D2 + 42 where D2 is a planned spend figure. Drop $8,000 into D2 and the cell shows 474 expected signups. That's regression as a forecasting tool — math validated by ToolPak diagnostics, deployed via a one-line formula.

Charts are great for explaining. Formulas are great for dashboards. When you need regression coefficients to update live as new data arrives, LINEST is the function you want.

LINEST returns an array. The basic syntax is =LINEST(known_y's, known_x's, const, stats). With stats set to TRUE, LINEST returns a 5-row by N-column block: slope coefficients on row 1, standard errors on row 2, R-squared and standard error of Y in row 3, F-statistic and degrees of freedom in row 4, regression sum of squares in row 5. In Excel 365 it spills automatically — type =LINEST(B2:B25, A2:A25, TRUE, TRUE) in one cell and the whole block appears. In older Excel you select the output range first and confirm with Ctrl+Shift+Enter.

For just the slope, =SLOPE(known_y's, known_x's). For just the intercept, =INTERCEPT(known_y's, known_x's). For R-squared alone, =RSQ(known_y's, known_x's). Three quick formulas, three numbers, one line each. The equation y = SLOPE × x + INTERCEPT then predicts any new value of Y from a new X — bake that into a prediction cell and you have a live forecasting model.

One frequently overlooked feature: the FORECAST family of functions. =FORECAST.LINEAR(new_x, known_y, known_x) returns the predicted Y for a new X using the same simple linear regression math as SLOPE and INTERCEPT — but in one tidy formula. Useful when you don't care about reading the equation, you just need the prediction.

=TREND(known_y, known_x, new_x) does the same job and can return multiple predictions at once if you pass an array of new X values. =GROWTH is the exponential counterpart — fits an exponential curve instead of a straight line. These are the quick-and-dirty siblings of LINEST: less detail, less friction, perfect for a forecast column in a sales tracker.

Excel Spreadsheet - Microsoft Excel certification study resource

Pre-Regression Data Checklist

  • Headers sit in row 1, numerical data starts in row 2, no merged cells anywhere
  • No blank rows or empty cells inside either Y or X ranges — fill or filter first
  • Y and X columns share the exact same row count, no off-by-one mismatches
  • Outliers identified with =STDEV-based filter and investigated, not silently kept
  • Data Analysis ToolPak enabled (Data tab shows Data Analysis button on the right)
  • Output destination chosen — new worksheet keeps the report tidy and reusable
  • Residuals and Residual Plots ticked in the dialog before hitting OK
  • For multiple regression, predictor columns are sitting side-by-side in contiguous columns
  • Confidence Level set to 95% (the default) unless your team prefers 90% or 99%

Multiple regression in Excel uses the same three approaches — only the X range expands. In the ToolPak dialog, drag your Input X Range across all predictor columns at once. In LINEST, pass a multi-column range as the second argument: =LINEST(B2:B25, C2:E25, TRUE, TRUE) fits Y against three predictors. The output array grows wider — one coefficient per X variable plus the intercept on the far right.

Read the multiple-regression output the same way as simple regression, with one extra step: check each predictor's p-value individually. A model can have a great overall F-statistic and still contain one predictor with p = 0.8 — that variable is dead weight, drop it and re-run. This is called backward elimination, and it's the standard cleanup pass.

What if the relationship isn't linear? You'll spot it two ways. First, the scatter chart visibly curves. Second, the residual plot from the ToolPak fans, dips, or arcs instead of looking like a random cloud. Both are red flags.

You have several fixes inside Excel. Try a polynomial trendline first — right-click the chart line, change Linear to Polynomial, order 2 or 3. Compare R² before and after. If polynomial fits better and residuals smooth out, you have your shape. Alternatively, transform the data: add a column =LN(A2) and regress Y against the log of X. Logarithmic relationships are common in biology, finance, and decay curves. Or square the X column and regress Y against X-squared for quadratic shapes like fuel-consumption-versus-speed.

For seriously non-linear data, Excel's Solver add-in fits any function you can write — exponential decay, Michaelis-Menten kinetics, logistic growth curves. It's the same enable-via-Add-ins routine as the ToolPak. Solver is overkill for most business datasets, but it's good to know it exists.

ToolPak vs LINEST vs Trendline — Pros and Cons

Pros
  • +
  • +
  • +
  • +
  • +
Cons

Diagnostics matter. R-squared of 0.95 looks great until you plot residuals and find a curve — meaning your linear model is fitting a non-linear relationship and the residuals are systematically wrong at the ends. The ToolPak's "Residual Plots" tick gives you a residual-vs-X chart for each predictor; if those clouds are random, you're fine. If they fan out or curve, the model needs a transformation (log Y, square X, add a polynomial term).

Check for outliers too. One bad data point can drag a regression line halfway across the chart. Sort residuals by absolute value — the top three are your suspects. Investigate them before trusting the model.

Reporting your results matters as much as running them. When you present a regression to a non-technical audience, ditch the ANOVA jargon. Lead with the equation in plain English: "Every extra $1,000 of ad spend produces about 54 new trial signups, plus a baseline of 42 organic signups per month." Follow with the confidence: "The model explains 78% of the month-to-month variation, and the relationship is statistically significant." Show the scatter plot with the trendline overlaid — that single image communicates more than the entire ToolPak output table.

Stick a one-line caveat at the end: "Forecasts assume the relationship continues to hold and apply only inside the historical spend range of $2,400 to $11,800." That single sentence saves you from being quoted on a wildly wrong $50,000-spend prediction six months later.

A common mistake: trusting R-squared too much. R² always goes up when you add predictors, even if those predictors are pure noise. That's why Adjusted R-squared exists — it penalises model complexity. Always quote adjusted R² when comparing models with different numbers of variables.

Another trap: extrapolation. A regression fitted on ad-spend between $1,000 and $10,000 cannot reliably predict revenue at $100,000. Stay inside the X-range of the training data. If you must extrapolate, flag the uncertainty.

A final word on time-series data. If your two columns are dates and values, ordinary regression treats time like any other predictor. That works for trend detection — slope is the per-day change — but it ignores autocorrelation, seasonality, and structural breaks. For real time-series forecasting, Excel offers the Forecast Sheet feature (Data tab > Forecast Sheet) which fits an exponential smoothing model with confidence intervals and seasonality auto-detected. Use regression for "is there a trend" and Forecast Sheet for "what's next month's number."

Want to double-check your Excel skills before tackling a regression project at work? Practice questions on the ToolPak, LINEST, and chart trendlines tend to show up on Excel certifications and pre-employment tests — and the muscle memory carries over directly to live spreadsheets. The same shortcut for opening Add-ins, the same dialog flow for the Regression option, the same syntax for SLOPE and INTERCEPT. Run a few practice quizzes and the real workflow starts to feel automatic.

Wrapping it all together: ToolPak for the deep dive, scatter trendline for the visual, LINEST for the dashboard. Most analyses use the ToolPak first to validate that a relationship exists and that p-values are acceptable, then bake the coefficients into formulas for ongoing use, then chart the result for stakeholders. Three tools, one workflow, every business question that involves two columns of numbers.

One last reminder. Regression doesn't prove causation — it measures association. If ice-cream sales correlate with sunburn cases, regression will give you a clean slope and a happy p-value, but the cause is the sun, not the ice cream. Always pair the math with domain knowledge before drawing conclusions.

Excel Questions and Answers

About the Author

Katherine LeeMBA, CPA, PHR, PMP

Business Consultant & Professional Certification Advisor

Wharton School, University of Pennsylvania

Katherine Lee earned her MBA from the Wharton School at the University of Pennsylvania and holds CPA, PHR, and PMP certifications. With a background spanning corporate finance, human resources, and project management, she has coached professionals preparing for CPA, CMA, PHR/SPHR, PMP, and financial services licensing exams.