Insight Stream Insight Stream | Data Analytics & Visualisation Guide

Linear Regression

Fit a line through two variables to expose relationships and highlight influential outliers.

Fit a regression line to the predictor and response, then inspect the residuals to find observations the model fails to explain. Large residuals or high leverage points reveal influential outliers that can distort the slope and deserve follow-up.

Scatter with regression line and highlighted residual
Regression line through sales vs marketing spend with one influential point marked.

Image source: generate_visuals.py

Code examples

Building a quick regression

Python (scikit-learn)
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simple spend versus sales dataset
data = pd.DataFrame(
    {
        'Spend': [10, 12, 13, 15, 30],
        'Sales': [22, 24, 25, 27, 45],
    }
)

# Fit the regression model
model = LinearRegression().fit(data[['Spend']], data['Sales'])

# Generate predictions and residuals for diagnostics
data['Predicted'] = model.predict(data[['Spend']])
data['Residual'] = data['Sales'] - data['Predicted']
print(data)
Excel
' Regression coefficients and predictions
=LINEST($B$2:$B$6,$A$2:$A$6,TRUE,TRUE)
=FORECAST.LINEAR(A2,$B$2:$B$6,$A$2:$A$6)

' Residuals to assess model fit
=Actual - Predicted
Power BI (DAX trend line)
// Slope of the regression line using covariance / variance
Reg Slope =
    VAR Numerator =
        SUMX(
            'Data',
            ('Data'[Spend] - AVERAGE('Data'[Spend]))
                * ('Data'[Sales] - AVERAGE('Data'[Sales]))
        )
    VAR Denominator =
        SUMX(
            'Data',
            ('Data'[Spend] - AVERAGE('Data'[Spend]))^2
        )
    RETURN
        DIVIDE(Numerator, Denominator)

// Intercept for the regression line
Reg Intercept =
    AVERAGE('Data'[Sales]) - [Reg Slope] * AVERAGE('Data'[Spend])

// Predicted sales using the regression components
Reg Predicted = [Reg Intercept] + [Reg Slope] * AVERAGE('Data'[Spend])

// Residual to highlight outliers from the line
Reg Residual = AVERAGE('Data'[Sales]) - [Reg Predicted]

Key takeaways

  • Regression helps you quantify relationships and spot influential points.
  • Always inspect residuals; they reveal whether the model fits evenly across the range.
  • Explain assumptions (linearity, no hidden factors) before presenting results.

Further reading