CORREL Function: Calculate Correlation Coefficient Between Data Sets

Home » pub » Tutorials » CORREL Function: Calculate Correlation Coefficient Between Data Sets

Two columns of numbers — do they move together? The CORREL function answers with a single number between −1 and +1. A result of +0.967 means a very strong positive relationship: when one variable rises, the other rises proportionally. A result of −0.841 means a strong inverse relationship. A result near zero means little or no linear association exists. This guide covers eight practical examples: basic correlation between two variables, building a 4×4 correlation matrix with conditional formatting, testing statistical significance using the t-test and TDIST, filtering to a specific category with conditional CORREL, rolling correlation to track how the relationship changes over time, finding the strongest pairs across a matrix using LARGE and SMALL, calculating Spearman rank correlation for non-normal or outlier-heavy data, and building a self-updating dashboard that outputs a plain-English label like “r = 0.967 (Very strong positive).” The guide also covers the most important limitation: a high CORREL result does not mean one variable causes the other.

Two columns of numbers. Do they move together? When one rises, does the other tend to rise as well — or fall? The CORREL function answers this question with a single number between −1 and +1. That number is the Pearson correlation coefficient — the most widely used measure of linear association between two datasets. A coefficient of +0.9 means the two variables are strongly positively related. A coefficient of −0.85 means they move strongly in opposite directions. A coefficient near zero means there is little or no linear relationship.

This guide covers the full CORREL syntax, the interpretation scale, and eight practical examples. You will learn how to calculate basic correlation, build a correlation matrix across multiple variables, test whether a correlation is statistically significant, and identify the strongest relationships in a dataset using LARGE and SMALL. Each example also addresses a common pitfall — specifically, the difference between correlation and causation.

Availability: CORREL is available in all Excel versions, including Microsoft 365, Excel 2024, 2021, 2019, 2016, and earlier. It also works in Google Sheets. CORREL calculates the Pearson product-moment correlation coefficient. For the Spearman rank correlation, which is more robust to outliers, use RANK first then apply CORREL to the ranked values.

What Is the Correlation Coefficient?

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. It is calculated by dividing the covariance of the two variables by the product of their standard deviations. The result always falls between −1 and +1. A value of +1 means a perfect positive linear relationship — when one variable increases by any amount, the other increases proportionally. A value of −1 means a perfect negative relationship. A value of 0 means no linear relationship exists.

Importantly, correlation measures only linear association. Two variables can have a strong non-linear relationship — for example, a U-shaped curve — and still produce a correlation near zero. Additionally, correlation does not imply causation. Two variables can be highly correlated because they share a common cause, because one is a time index, or simply by coincidence. Always combine the correlation coefficient with domain knowledge and visual inspection.

How Do You Interpret the Correlation Coefficient?

The scale from −1 to +1 is divided into meaningful bands by convention. These thresholds are guidelines, not strict rules. Social science research uses different cutoffs than engineering or finance. Furthermore, sample size matters — a correlation of 0.30 from 200 observations is more reliable than the same value from 10 observations.

Range	Strength	Direction	Typical interpretation
+0.90 to +1.00	Very strong	Positive	Variables move almost in lockstep. One reliably predicts the other.
+0.70 to +0.89	Strong	Positive	Clear positive relationship. Useful for prediction with caution.
+0.40 to +0.69	Moderate	Positive	Noticeable positive trend. Other factors also at play.
+0.10 to +0.39	Weak	Positive	Slight positive tendency. Unreliable for prediction.
−0.10 to +0.10	Negligible	None	No meaningful linear relationship detectable.
−0.39 to −0.11	Weak	Negative	Slight negative tendency.
−0.69 to −0.40	Moderate	Negative	Noticeable inverse relationship.
−0.89 to −0.70	Strong	Negative	Clear inverse relationship.
−1.00 to −0.90	Very strong	Negative	Variables move almost perfectly in opposite directions.

What Is the CORREL Syntax?

=CORREL(array1, array2)

Argument	Required?	What it does
array1	Required	The first dataset — a range, column, or array of numeric values. Text, logical values, and empty cells are ignored. Must have the same number of data points as array2.
array2	Required	The second dataset. Must be the same length as array1. CORREL returns #N/A if the lengths differ, #DIV/0! if either array is empty or has zero variance.

CORREL vs PEARSON: CORREL and PEARSON produce identical results — they calculate the same Pearson correlation coefficient. CORREL is more commonly used and more memorable. Use whichever you prefer. Both ignore text, logical values (TRUE/FALSE), and empty cells in the same way. Neither requires the data to be sorted.

Examples 1–4: Core Correlation Patterns

Basic correlation — relationship between two variables

The simplest use is passing two column ranges to CORREL. The function returns a single decimal between −1 and +1 describing the linear relationship. The result is symmetric — swapping the two arguments always produces the same value. Below, three variable pairs illustrate strong positive, weak, and strong negative correlations from the same dataset.

Variable pair

CORREL result

Strength

Direction

Ad spend vs Revenue

0.967

Very strong

Positive

Temperature vs Sales

0.312

Weak

Positive

Price vs Units sold

−0.841

Strong

Negative

A2:A20 = ad spend. B2:B20 = revenue. C2:C20 = temperature. D2:D20 = price. E2:E20 = units sold. Basic correlation — ad spend vs revenue: =CORREL(A2:A20, B2:B20) → 0.967 (very strong positive: more spend associates with higher revenue) Symmetric: swapping the arguments gives the same result: =CORREL(B2:B20, A2:A20) → 0.967 (identical — order does not matter) Negative correlation — price vs units sold: =CORREL(D2:D20, E2:E20) → −0.841 (strong negative: higher price associates with fewer units sold) R² from correlation — square the correlation to get explained variance: CORREL² = R², the proportion of variance shared between the two variables. =CORREL(A2:A20, B2:B20)^2 → 0.935 (ad spend explains 93.5% of revenue variation)

Correlation matrix — all pairwise relationships across multiple variables

When you have four or five variables, a correlation matrix shows all pairwise relationships simultaneously. Each cell contains the correlation between the row variable and the column variable. The diagonal is always 1.0 (a variable is perfectly correlated with itself). The matrix is symmetric — the upper-right triangle mirrors the lower-left. Building the matrix with CORREL and absolute references allows the formula to be entered once and copied across the entire grid.

Data columns: A = Revenue, B = Ad spend, C = Headcount, D = Price. A2:A30, B2:B30, C2:C30, D2:D30. Build the matrix in F2:I5. Row headers in E2:E5 = {"Revenue","Ad spend","Headcount","Price"}. Column headers in F1:I1 = {"Revenue","Ad spend","Headcount","Price"}. F2 formula (Revenue vs Revenue) — always 1 on the diagonal: =CORREL($A$2:$A$30, $A$2:$A$30) → 1.000 G2 formula (Revenue vs Ad spend): Use CHOOSE to map column position to the data column. This avoids hard-coding the column reference in each cell. Alternatively, use a direct reference pattern: Row variable fixed with $, column variable fixed with $. =CORREL($A$2:$A$30, $B$2:$B$30) → 0.967 Conditional formatting for the matrix: 1. Select the matrix range F2:I5. 2. Apply two rules: Green fill: cell value >= 0.7 (strong positive correlation). Red fill: cell value <= -0.7 (strong negative correlation). 3. The diagonal (1.0 cells) will also turn green — exclude those if needed.

Statistical significance — is the correlation real or due to chance?

A correlation coefficient is a sample estimate. Even a perfectly random dataset occasionally produces non-zero correlations purely by chance. The t-test converts the correlation and sample size into a t-statistic, which is then converted to a p-value. A p-value below 0.05 means the correlation is statistically significant — unlikely to be a chance result at the 5% significance level.

r = CORREL result stored in D2. n = sample size stored in D3 (= COUNT of either array). t-statistic for the correlation: t = r × SQRT(n-2) / SQRT(1 - r²) df = n - 2 (degrees of freedom) D2 * SQRT(D3-2) / SQRT(1 - D2^2) → e.g. 14.2 for r = 0.967, n = 19 P-value — two-tailed test (standard for correlation significance): TDIST(|t|, df, 2) returns the two-tailed p-value. =TDIST( ABS(D2 * SQRT(D3-2) / SQRT(1-D2^2)), D3-2, 2 ) → 0.0000000003 (p ≪ 0.05: correlation is highly significant) Minimum sample size needed for r = 0.5 to be significant at p < 0.05: This is found by solving for n. Practically, n ≥ 16 for r = 0.5. For r = 0.3, you need approximately n ≥ 44 for significance. Significance label for a dashboard: =IF( TDIST(ABS(D2*SQRT(D3-2)/SQRT(1-D2^2)), D3-2, 2) < 0.05, "Significant (p < 0.05)", "Not significant" ) → "Significant (p < 0.05)"

Conditional correlation — correlation within a filtered subset

CORREL does not filter natively. To calculate the correlation between two variables only for rows that meet a condition — for example, only for a specific region or product category — use CORREL wrapped in IF as an array formula. This is the same approach used for conditional standard deviation and variance. In Excel 365, the formula enters normally; in Excel 2019 and earlier, press Ctrl+Shift+Enter.

A2:A100 = x variable (e.g. ad spend). B2:B100 = y variable (e.g. revenue). C2:C100 = category labels (e.g. region names). Correlation for "North" region only — Excel 365 (no array entry needed): =CORREL( IF(C2:C100="North", A2:A100), IF(C2:C100="North", B2:B100) ) → Correlation of ad spend and revenue for North region only Excel 2019 and earlier — select cell, type formula, Ctrl+Shift+Enter: Curly braces { } appear to confirm array entry. {=CORREL(IF(C2:C100="North",A2:A100),IF(C2:C100="North",B2:B100))} Dynamic category — reference a cell instead of hardcoding the label: D2 = the category to analyse. Change D2 to switch regions. =CORREL( IF(C2:C100=$D$2, A2:A100), IF(C2:C100=$D$2, B2:B100) ) Two-condition filter — region AND product: Both conditions must be true (multiply the IF results). =CORREL( IF((C2:C100="North")*(D2:D100="Widget"), A2:A100), IF((C2:C100="North")*(D2:D100="Widget"), B2:B100) )

Examples 5–8: Applied Correlation Analysis

Rolling Windows, Ranking, and Spearman Correlation

Rolling correlation — tracking how the relationship changes over time

A single correlation coefficient describes the average relationship across the entire history. However, the relationship between two variables often changes over time. A rolling correlation — calculated over a fixed window of recent observations — shows when the relationship strengthens, weakens, or reverses. This is particularly useful in finance, where the correlation between assets changes with market conditions.

A2:A100 = time-ordered x variable. B2:B100 = y variable. Rolling 12-period correlation — starts at row 13 (needs 12 rows of history). In C13, enter and copy down to C100: OFFSET creates a 12-row window ending at the current row. =CORREL( OFFSET(A2, ROW()-14, 0, 12, 1), OFFSET(B2, ROW()-14, 0, 12, 1) ) → Rolling 12-period correlation for each row Simpler alternative — INDIRECT with row numbers: Builds the range address as a string. Easier to adjust the window size. =CORREL( INDIRECT("A"&(ROW()-12)&":A"&(ROW()-1)), INDIRECT("B"&(ROW()-12)&":B"&(ROW()-1)) ) → Same rolling 12-period correlation — change 12 to adjust window size Flag periods where the correlation turns negative: Useful for detecting relationship reversals that warrant investigation. =IF(C13<0, "Correlation reversed", "")

Finding the strongest correlations — LARGE and SMALL across a matrix

When a correlation matrix contains dozens of pairs, identifying the strongest relationships manually is tedious. LARGE and SMALL extract the k-th largest and smallest correlation values from the entire matrix array. Additionally, combining this with INDEX and MATCH allows you to identify not just the strongest correlation value but also which variable pair produced it.

Correlation matrix stored in F2:I5 (4×4 matrix, diagonal = 1.0). Row and column labels in E2:E5 and F1:I1. Strongest positive correlation (excluding the diagonal): LARGE returns the k-th largest value from the matrix. Use k=2 because k=1 gives 1.0 (diagonal self-correlations). =LARGE(F2:I5, 2) → e.g. 0.967 (the highest non-diagonal correlation in the matrix) Strongest negative correlation: SMALL returns the smallest (most negative) value. =SMALL(F2:I5, 1) → e.g. −0.841 (the most negative correlation in the matrix) Count how many pairs have strong positive correlation (>= 0.7): COUNTIF counts cells in the matrix meeting the threshold. =COUNTIF(F2:I5, ">=0.7") - 4 → Subtracts 4 because the 4 diagonal values (1.0) also exceed 0.7 Count pairs with strong negative correlation (<= -0.7): =COUNTIF(F2:I5, "<=-0.7")

Spearman rank correlation — robust correlation for non-normal data

Pearson correlation assumes the relationship is linear and the data is approximately normally distributed. When data contains outliers, is skewed, or follows a non-linear but monotonic pattern, the Spearman rank correlation is more appropriate. Spearman correlation ranks each dataset and then applies Pearson correlation to the ranks. In Excel, this is achieved in two steps: rank with RANK.AVG, then correlate the ranks with CORREL.

A2:A30 = first variable (may contain outliers or skewed values). B2:B30 = second variable. Step 1: Rank each variable (RANK.AVG handles ties by averaging ranks). C2 = rank of A2 in A2:A30. Copy down to C30. =RANK.AVG(A2, $A$2:$A$30, 1) → Rank of A2 within the column (1 = smallest) D2 = rank of B2 in B2:B30. Copy down to D30. =RANK.AVG(B2, $B$2:$B$30, 1) Step 2: Spearman correlation = Pearson correlation of the ranks. =CORREL(C2:C30, D2:D30) → Spearman rank correlation (robust to outliers and non-linearity) One-step Spearman (no helper columns) — Excel 365: RANK on the fly using BYROW or RANK.AVG with the full array. Simpler approach: wrap the ranks inline. =CORREL( RANK.AVG(A2:A30, A2:A30, 1), RANK.AVG(B2:B30, B2:B30, 1) ) → Same Spearman result, no helper columns needed in Excel 365

Correlation dashboard — live coefficient with strength label and chart prep

A self-updating correlation dashboard shows the coefficient, a plain-language strength label, a significance indicator, and the R² value. All cells reference the data directly, so adding new rows updates every figure automatically. Additionally, adding a scatter chart with a trendline provides the visual confirmation that the correlation number alone cannot give.

A2:A50 = x variable. B2:B50 = y variable. All formulas reference the data ranges directly — no intermediate steps. D2 = coefficient: =CORREL(A2:A50, B2:B50) D3 = strength label (plain English): =IF(ABS(D2)>=0.9, "Very strong", IF(ABS(D2)>=0.7, "Strong", IF(ABS(D2)>=0.4, "Moderate", IF(ABS(D2)>=0.1, "Weak", "Negligible")))) D4 = direction label: =IF(D2>0, "Positive", IF(D2<0, "Negative", "None")) D5 = R² (shared variance): =TEXT(D2^2, "0.0%") & " shared variance" → "93.5% shared variance" D6 = full summary sentence: "r = " & TEXT(D2, "0.000") & " (" & D3 & " " & LOWER(D4) & ")" → "r = 0.967 (Very strong positive)"

Common Issues and How to Fix Them

CORREL returns #N/A

#N/A means the two arrays have different numbers of data points. Excel counts the number of values in array1 and array2 after ignoring text, blanks, and logical values — and they must match. Check both ranges for hidden blank rows, accidentally included header rows, or mismatched range sizes. The simplest fix is to make both ranges exactly the same size by selecting the same number of rows for each.

CORREL returns #DIV/0!

#DIV/0! occurs when either array has zero variance — meaning all values in one column are identical. Correlation is undefined when a variable does not vary. Check the data for columns that are all the same value, or for ranges that accidentally include only one unique number. This error also appears when both arrays are empty or contain only non-numeric values after filtering.

The correlation is high but the relationship is not linear

A high Pearson correlation confirms a strong linear relationship. It does not confirm that the relationship exists in the first place — it only measures the linear component. Consequently, always plot the two variables in a scatter chart before drawing conclusions. A scatter chart showing a curved pattern, a cluster of outliers, or a relationship that differs by subgroup all indicate that the correlation coefficient alone is an incomplete description of the data.

Correlation is not causation — this matters in practice: A high CORREL result does not mean that one variable causes the other to change. Both might be driven by a third variable you have not measured. For example, ice cream sales and drowning rates correlate because both rise in summer — temperature is the hidden driver. Before acting on a strong correlation, always investigate the mechanism. Use domain expertise, experimental design, or causal inference methods to move beyond association.

Frequently Asked Questions

What does the CORREL function return?+

CORREL returns the Pearson product-moment correlation coefficient for two datasets. The result is a decimal between −1 and +1. A value close to +1 indicates a strong positive linear relationship. A value close to −1 indicates a strong negative linear relationship. A value near zero indicates little or no linear relationship. The function is equivalent to PEARSON and to taking the square root of the R² value from a simple linear regression of one variable against the other.
What is the difference between CORREL and PEARSON in Excel?+

There is no difference in results — CORREL and PEARSON are identical functions that calculate the same Pearson product-moment correlation coefficient. CORREL is more commonly used because the name is more intuitive. Both functions handle text, logical values, and blank cells the same way (by ignoring them). Choose whichever name you find more readable. Some style guides prefer CORREL for consistency with COVARIANCE functions in the same family.
How do I check if a correlation is statistically significant?+

Convert the correlation to a t-statistic using t = r × SQRT(n−2) / SQRT(1−r²), where r is the CORREL result and n is the number of data points. Then use TDIST(ABS(t), n−2, 2) to get the two-tailed p-value. A p-value below 0.05 means the correlation is statistically significant at the 5% level — unlikely to be a chance result. Larger samples make smaller correlations significant. For n = 100, a correlation of just 0.20 is statistically significant. For n = 10, you need a correlation of about 0.63 to achieve significance.