MSE, MAE, RMSE, R² in Regression: Which Performance Metric Should You Choose?
In machine learning, MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and R² (coefficient of determination, or R-squared) are commonly used metrics to evaluate the performance of regression models. All of them measure error, the difference between the actual values and the predicted values, but they do so differently. The goal is to determine which of these metrics is most appropriate depending on the model and your specific objectives.
2. Which regression models are concerned?
Metrics such as MSE, MAE, RMSE, and R² are widely used in regression models, which predict a continuous numerical quantity. These models include techniques such as:
- linear regression
- decision trees in regression
- neural networks
- random forests
- support vector machines (SVM) adapted for regression (SVR).
These models are used to predict continuous variables, whether they relate to economic data, environmental data, or human behavior.
3. What is an error and why do we talk about it?
An error in a regression model is simply the difference between the observed actual value and the value predicted by the model. This error helps us understand how close our predictions are to the actual data; it is an indicator of the model’s ability to reproduce the observed values. A correct evaluation of these errors is essential to determine whether a model is reliable or needs improvement.
In this context, MSE, MAE, RMSE, and R² measure this error in different ways:
- MSE and RMSE emphasize large errors.
- MAE treats each error equally.
- R² evaluates the proportion of variance explained by the model.
4. MSE (Mean Squared Error)
MSE is calculated by taking the mean of the squared errors, which are the differences between the actual values and the predicted values.
Squaring the errors has a particular impact: it amplifies significant errors.
For example, if an error is 10, it becomes 100 after squaring, whereas an error of 1 remains 1. This means that large errors will have a much stronger influence on the MSE.
- Penalizes large errors: Squaring the errors amplifies their impact, making this metric sensitive to significant prediction mistakes.
- Provides a smooth gradient, making it commonly used in optimization algorithms like gradient descent.
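As a minimal sketch of the computation (assuming NumPy and scikit-learn are available; the arrays below are invented values purely for illustration), the MSE can be computed by hand or with `sklearn.metrics.mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Invented illustration values: actual vs. predicted targets
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE by hand: mean of the squared differences
mse_manual = np.mean((y_true - y_pred) ** 2)

# Same computation via scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both 0.875
```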
5. RMSE (Root Mean Squared Error)
RMSE is simply the square root of the MSE. This brings the scale of the error back to the same units as the original data, making the RMSE easier to interpret. Like the MSE, the RMSE penalizes large errors more heavily, but it does so in a more intuitive format.
Difference from the MSE: The main difference is that the RMSE takes the square root of the MSE. This keeps the MSE’s sensitivity to large errors but expresses the result in the same units as the observed data, unlike the MSE, which is expressed in squared units.
- Retains the sensitivity to large errors of MSE but presents the result in the same units as the target variable.
- More intuitive to interpret compared to MSE (e.g., if the RMSE is 2 hours, it’s directly meaningful).
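Continuing the same sketch (same invented arrays as above), taking the square root of `mean_squared_error` is a simple way to get the RMSE that works across scikit-learn versions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE: square root of the MSE, back in the units of the target variable
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~0.935
```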
6. MAE (Mean Absolute Error)
The MAE (Mean Absolute Error), on the other hand, is calculated by taking the mean of the absolute errors, which means it simply measures the absolute difference between the actual values and the predicted values without squaring the errors.
Here, the errors are treated linearly: an error of 10 weighs exactly ten times as much as an error of 1. Unlike the MSE, there is no squaring involved, so large errors are not amplified. This makes the MAE more robust to outliers, but also less sensitive to large errors.
- Treats all errors equally, regardless of their size.
- Robust to outliers: since it doesn’t square the errors, large errors won’t overly influence the metric.
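Here is the corresponding sketch for the MAE (same invented arrays; note the absence of any squaring):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE by hand: mean of the absolute differences (no squaring)
mae_manual = np.mean(np.abs(y_true - y_pred))

# Same computation via scikit-learn
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 0.75
```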
7. R² (R-squared)
R² measures the proportion of the variance in the data explained by the model. An R² close to 1 indicates that the model explains the variability of the data well, while an R² close to 0 indicates that the model explains very little of the variance.
Difference from MSE, RMSE, and MAE: R² measures how well the model fits the data, whereas the other metrics (MSE, MAE, RMSE) directly measure prediction error. R² is useful for understanding the proportion of variance explained, but it does not provide a direct measurement of error.
- Measures the proportion of variance explained: Indicates how well the model explains the variability in the target data.
- Usually bounded between 0 and 1, although it can become negative when the model performs worse than simply predicting the mean.
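As a sketch (same invented arrays), R² can be computed from its definition, 1 minus the ratio of the residual sum of squares to the total sum of squares, or via `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# R² by hand: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both ~0.724
```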
8. When to use them?
When do you need to use MSE?
Use the MSE when you want to heavily penalize large errors. It is particularly useful in situations where significant errors can have severe consequences and avoiding them is crucial.
Example:
Imagine you are a delivery company predicting the delivery time of packages. You have two packages:
Package A: The model predicts the package will arrive in 3 hours. In reality, it arrives in 3.5 hours. The error is therefore 0.5 hours.
Package B: The model predicts the package will arrive in 3 hours. In reality, it arrives in 6 hours. The error is therefore 3 hours.
The MSE calculates the squared error:
- Error for Package A: (3.5 - 3)² = (0.5)² = 0.25
- Error for Package B: (6 - 3)² = (3)² = 9
Thus, the MSE is:
MSE = (0.25 + 9) / 2 = 4.625
The MSE penalizes large errors much more severely. The error of 3 hours has a significantly stronger impact than the error of 0.5 hours.
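As a quick sketch to reproduce these numbers (plain NumPy; the delivery times are taken straight from the example above):

```python
import numpy as np

# Delivery times in hours, from the example above
actual = np.array([3.5, 6.0])      # observed arrival times
predicted = np.array([3.0, 3.0])   # model predictions

squared_errors = (actual - predicted) ** 2   # [0.25, 9.0]
mse = squared_errors.mean()
print(mse)  # 4.625
```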
Why this is useful:
The MSE’s strong penalty for large errors makes it particularly useful in situations where significant errors (such as long delays) are especially costly or unacceptable. For example, in the delivery sector, minimizing long delays is critical to maintain customer satisfaction and operational efficiency.
When do you need to use RMSE?
RMSE is similar to MSE, but it is more intuitive because it brings the error to the same scale as the original data. RMSE is particularly useful when large errors need to be penalized more significantly but in a unit comparable to the observed values.
Example:
Let’s revisit the delivery company example, but this time using RMSE to make the error more interpretable in the original units (hours in this case).
- Package A: Prediction = 3 hours, Reality = 3.5 hours. Error = 0.5 hours.
- Package B: Prediction = 3 hours, Reality = 6 hours. Error = 3 hours.
We already calculated the MSE as 4.625, but RMSE is simply the square root of the MSE:
RMSE = √4.625 ≈ 2.15
Why it’s useful:
RMSE is in the same units as the data, making it more intuitive to interpret. For example, if the error is 0.5 hours, you know exactly what it means in terms of delivery time.
It penalizes large errors similarly to MSE but offers a more natural interpretation.
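And a one-line check of the figure above (assuming NumPy):

```python
import numpy as np

mse = 4.625            # MSE from the delivery example
rmse = np.sqrt(mse)    # back in hours
print(round(rmse, 2))  # 2.15
```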
When do you need to use MAE?
MAE is preferable when you don’t want a few large errors to dominate the overall evaluation of the model. It is useful if your data contains outliers and you don’t want them to excessively influence the model’s performance. MAE is more stable and provides a fairer evaluation in situations where extreme errors should not be exaggerated.
Example:
Imagine you are a teacher predicting a student’s final grade based on previous test scores. You have two students:
- Student A: Prediction = 80/100, Reality = 90/100. Error = 10 points.
- Student B: Prediction = 80/100, Reality = 40/100. Error = 40 points.
The MAE simply averages the absolute errors:
- Error for Student A: |90 - 80| = 10
- Error for Student B: |40 - 80| = 40
MAE = (10 + 40) / 2 = 25
Why it’s useful:
MAE treats both errors equally. Whether the error is 10 points or 40 points, they have the same weight in the evaluation.
If you don’t want a large error (like Student B’s) to overly skew the evaluation of the model, MAE is ideal. It provides a more balanced average of errors.
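The same check in code (plain NumPy; the grades come from the example above):

```python
import numpy as np

# Final grades out of 100, from the example above
actual = np.array([90, 40])
predicted = np.array([80, 80])

# MAE: mean of the absolute differences; both errors keep a linear weight
mae = np.mean(np.abs(actual - predicted))
print(mae)  # 25.0
```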
When do you need to use R²?
R² is used to measure the proportion of variance explained by the model. A high R² (close to 1) indicates that the model explains the data variance well. However, it can be less reliable in non-linear models or when extreme data points are present.
Example:
Imagine you are predicting a company’s revenue based on advertising investment. The results are as follows:
- Advertising investment (independent variable) and revenue (dependent variable) are measured over a given period.
Suppose your model predicts revenue based on advertising investment, and the R² calculated by the model is 0.85.
This means that 85% of the variance in revenue is explained by the model in relation to advertising investment. The remaining 15% of variance is not explained by the model and could be due to other factors (such as seasonality, economic trends, etc.).
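As a sketch of how such an R² would be obtained in practice (the advertising and revenue figures below are invented purely for illustration and are not meant to reproduce the 0.85 value):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Invented data: advertising investment vs. revenue (arbitrary units)
X = np.array([[10], [20], [30], [40], [50]])
y = np.array([120, 180, 260, 310, 400])

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print(round(r2, 3))  # share of revenue variance explained by ad spend
```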
Why it’s useful:
R² gives an idea of the model’s overall fit to the data.
- An R² close to 1: the model explains almost all the variance in the data (high performance).
- An R² close to 0: the model explains very little of the variance in the data (poor performance).
- A negative R²: the model performs worse than simply predicting the mean, which suggests that it is not suitable for the dataset or is poorly fitted (overfitting, underfitting, using the wrong type of model…).
R² is a global metric that shows how much of the variability in your data is explained by the model.
9. Conclusion
The choice between MSE, MAE, RMSE, and R² depends on the regression problem’s nature and the model’s objectives. Here’s a summary to help you choose:
- MSE: Used when you want to penalize larger errors more. It is particularly useful in applications where large errors can have significant consequences, such as in financial asset price prediction.
- MAE: Preferred when you want a more robust and stable evaluation, where all errors are given equal weight.
- RMSE: Ideal when you want a metric in the same units as the original data while still penalizing large errors more intuitively.
- R²: Used to evaluate the overall quality of the model by measuring the proportion of variance explained by it. It is particularly useful for understanding the fit of a linear model to the data.
Thus, the choice of metric depends on your priorities: robustness to outliers (MAE), penalizing large errors (MSE and RMSE), or understanding the model’s overall fit to the data (R²).
Sirine Amrane