Root mean squared error (RMSE)

Root mean squared error (RMSE) is a measure of the differences between predicted values and observed values in a dataset, commonly used to assess the accuracy of a predictive model.
Updated on Jun 11, 2024

3 key takeaways

  • RMSE quantifies the typical magnitude of the error between predicted and actual values, providing a single measure of predictive accuracy.
  • It is calculated as the square root of the average of the squared differences between predicted and observed values.
  • RMSE is sensitive to large errors, making it useful for identifying models with significant prediction deviations.

What is root mean squared error (RMSE)?

Root mean squared error (RMSE) is a widely used metric in statistics and machine learning to evaluate the accuracy of a predictive model. It measures the typical magnitude of the error between predicted values and actual observed values (the square root of the mean squared error), providing an overall indication of the model’s performance.

RMSE is particularly useful because it penalizes larger errors more than smaller ones, due to the squaring of differences before averaging. This makes RMSE a valuable tool for assessing models where larger errors are particularly undesirable.

How does RMSE work?

RMSE is calculated by taking the square root of the average of the squared differences between predicted and observed values. The formula for RMSE is as follows:

RMSE = sqrt((1/n) * sum((y_i - y_hat_i)^2))

where:

  • n is the number of observations.
  • y_i represents the observed value for the i-th observation.
  • y_hat_i represents the predicted value for the i-th observation.
  • The term (y_i - y_hat_i)^2 represents the squared difference between the observed and predicted values for the i-th observation.
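
The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name rmse and its argument names are assumptions made for this example, not part of any particular package.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    n = len(observed)
    # Sum of the squared differences (y_i - y_hat_i)^2 over all observations.
    squared_error_sum = sum(
        (y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)
    )
    # Divide by n to get the mean, then take the square root.
    return math.sqrt(squared_error_sum / n)
```

The two checks at the top are a design choice for the sketch: mismatched lengths or an empty input would otherwise produce a silently wrong result or a division by zero.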

Example calculation

Consider a simple example with three observed values y = [1, 2, 3] and corresponding predicted values y_hat = [1.1, 1.9, 3.2]. To calculate RMSE:

  1. Calculate the squared differences:
     (1 - 1.1)^2 = 0.01
     (2 - 1.9)^2 = 0.01
     (3 - 3.2)^2 = 0.04

  2. Compute the average of the squared differences: (0.01 + 0.01 + 0.04) / 3 = 0.02

  3. Take the square root of the average: RMSE = sqrt(0.02) ≈ 0.141

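This means the RMSE for this dataset is approximately 0.141, in the same units as y. It indicates the typical size of the prediction errors, and it sits slightly above the mean absolute error (about 0.133) because squaring gives the largest error, 0.2, extra weight.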
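
The same three steps can be checked in Python. The sketch below uses the values from the example; the variable names are chosen for illustration, and the results are rounded before printing because floating-point arithmetic does not represent values such as 0.01 exactly.

```python
import math

y = [1, 2, 3]            # observed values from the example
y_hat = [1.1, 1.9, 3.2]  # predicted values from the example

# Step 1: squared difference for each observation.
squared_diffs = [(obs - pred) ** 2 for obs, pred in zip(y, y_hat)]

# Step 2: average of the squared differences (the mean squared error).
mse = sum(squared_diffs) / len(squared_diffs)

# Step 3: square root of the average.
rmse = math.sqrt(mse)

print([round(d, 2) for d in squared_diffs])  # [0.01, 0.01, 0.04]
print(round(mse, 2))                         # 0.02
print(round(rmse, 3))                        # 0.141
```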

Importance of RMSE

RMSE is an essential metric for evaluating the accuracy of predictive models for several reasons:

Sensitivity to large errors

Because RMSE squares the differences before averaging, it gives more weight to larger errors. This sensitivity makes RMSE particularly useful for identifying models that have significant prediction deviations.
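
A small comparison with mean absolute error (MAE) makes this weighting visible. In the sketch below, both error lists are made-up numbers chosen so that the total absolute error is identical; the helper names are assumptions for this example.

```python
import math


def rmse_from_errors(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def mae_from_errors(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Illustrative errors only: same total absolute error, spread differently.
even_errors = [2, 2, 2, 2]
one_large_error = [0, 0, 0, 8]

print(mae_from_errors(even_errors), rmse_from_errors(even_errors))          # 2.0 2.0
print(mae_from_errors(one_large_error), rmse_from_errors(one_large_error))  # 2.0 4.0
```

MAE treats the two cases as equally good, while RMSE doubles when the same total error is concentrated in a single prediction.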

Comparability

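RMSE is expressed in the same units as the target variable, making it easy to interpret and to compare across models evaluated on the same data. Because it is scale-dependent, RMSE values from datasets whose targets have different units or scales are not directly comparable.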
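
One consequence of RMSE sharing the target variable's units is that it scales with those units. The sketch below uses hypothetical length measurements, first in metres and then converted to centimetres, to show the RMSE growing by the same factor of 100 even though the predictions are no better or worse.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical lengths in metres, for illustration only.
observed_m = [1.0, 2.0, 3.0]
predicted_m = [1.5, 2.5, 2.0]

# The same values expressed in centimetres.
observed_cm = [100 * v for v in observed_m]
predicted_cm = [100 * v for v in predicted_m]

rmse_m = rmse(observed_m, predicted_m)
rmse_cm = rmse(observed_cm, predicted_cm)

# The ratio matches the unit conversion factor.
print(round(rmse_cm / rmse_m, 6))  # 100.0
```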

Overall performance indicator

By summarizing the prediction errors into a single value, RMSE provides a clear and concise measure of a model’s overall accuracy. This helps in comparing different models and selecting the one that best fits the data.

Benefits and limitations of RMSE

Understanding the benefits and limitations of RMSE provides insight into its practical implications and appropriate use cases.

Benefits

  • Comprehensive measure: RMSE provides a single metric that summarizes the overall accuracy of a model, reflecting both the average size of the errors and how much they vary.
  • Interpretability: RMSE is easy to interpret and compare, as it is expressed in the same units as the target variable.
  • Sensitivity to large errors: RMSE’s sensitivity to larger errors makes it useful for identifying models that struggle with significant prediction deviations.

Limitations

  • Sensitivity to outliers: While sensitivity to large errors can be beneficial, it can also be a drawback if the dataset contains outliers. These outliers can disproportionately affect the RMSE, potentially leading to misleading conclusions.
  • Lack of context: RMSE does not provide information about the direction of errors (whether predictions are consistently over- or underestimating), nor does it show how the errors are distributed across the dataset (for example, whether they are concentrated in a few observations).
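
The missing direction information can be illustrated with two sets of predictions that err by the same amount in opposite directions. The values and the helper mean_error (the average of predicted minus observed) are assumptions made for this sketch.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(observed, predicted):
    """Average signed error; positive means over-prediction on average."""
    signed = [y_hat - y for y, y_hat in zip(observed, predicted)]
    return sum(signed) / len(signed)


# Illustrative values only.
observed = [10, 20, 30]
over_predictions = [12, 22, 32]   # always 2 too high
under_predictions = [8, 18, 28]   # always 2 too low

print(rmse(observed, over_predictions), mean_error(observed, over_predictions))    # 2.0 2.0
print(rmse(observed, under_predictions), mean_error(observed, under_predictions))  # 2.0 -2.0
```

Both sets share an RMSE of 2.0, so a signed measure such as the mean error is needed alongside RMSE to reveal a systematic over- or underestimate.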

Examples of RMSE in practice

To better understand RMSE, consider these practical examples that highlight its application in different contexts:

Example 1: Weather forecasting

In weather forecasting, RMSE is used to evaluate the accuracy of temperature predictions. Meteorologists compare predicted temperatures with observed temperatures to calculate the RMSE, helping them assess and improve their forecasting models.

Example 2: Machine learning

In machine learning, RMSE is commonly used to evaluate regression models. For instance, in predicting housing prices, the RMSE provides a measure of how well the model’s predicted prices match the actual prices. Lower RMSE values indicate more accurate models.
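
As a rough sketch of how such a comparison might look, the snippet below scores two sets of predictions against the same actual prices and picks the lower RMSE. All prices are hypothetical figures in thousands, and the names model_a and model_b are placeholders rather than real models.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sale prices in thousands, for illustration only.
actual_prices = [250, 310, 480, 195]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [260, 300, 470, 205],  # off by 10 on every house
    "model_b": [250, 310, 440, 195],  # exact on three, off by 40 on one
}

scores = {name: rmse(actual_prices, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)

print(scores)      # {'model_a': 10.0, 'model_b': 20.0}
print(best_model)  # model_a
```

Both placeholder models have the same mean absolute error here, so the ranking comes entirely from RMSE's heavier penalty on the single large miss.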

Example 3: Financial modeling

In financial modeling, RMSE can be used to assess the accuracy of models predicting stock prices, interest rates, or other financial metrics. By comparing predicted values with actual market data, analysts can gauge the performance of their models and make necessary adjustments.

Understanding RMSE is crucial for evaluating and improving predictive models across various fields. If you’re interested in learning more about related topics, you might want to read about mean absolute error (MAE), R-squared, and other model evaluation metrics.


Arti

AI Financial Assistant

Arti is a specialized AI Financial Assistant at Invezz, created to support the editorial team. He leverages both AI and the Invezz.com knowledge base, understands over 100,000 Invezz-related data points, has read every piece of research, news and guidance we've ever produced, and is trained to never make up new...