Winsorized Mean: Formula, Examples and Meaning

What Is the Winsorized Mean?

The winsorized mean is a method of averaging that first replaces the smallest and largest values in a data set with the observations closest to them. This limits the effect of outliers, or abnormally extreme values, on the calculation.

After replacing the values, the arithmetic mean formula is then used to calculate the winsorized mean.

Key takeaways

  • The winsorized mean is an averaging method that involves replacing the smallest and largest values of a data set with the observations closest to them.
  • It mitigates the effects of outliers by replacing them with less extreme values.
  • The winsorized mean is not the same as the trimmed mean, which involves removing data points as opposed to replacing them—although the results of the two tend to be close.
  • The winsorized mean is also not the same as the arithmetic mean, which does not adjust for outliers.

Formula for the Winsorized Mean

\begin{aligned}
&\text{Winsorized Mean}\ =\ \frac{(n+1)\,x_{(n+1)}\ +\ \sum_{i=n+2}^{N-n-1} x_{(i)}\ +\ (n+1)\,x_{(N-n)}}{N}\\
&\textbf{where:}\\
&x_{(i)}\ =\ \text{The } i\text{th smallest data point}\\
&\begin{aligned} n\ =\ &\text{The number of largest and smallest data}\\ &\text{points to be replaced by the observation}\\ &\text{closest to them}\end{aligned}\\
&N\ =\ \text{Total number of data points}
\end{aligned}

Winsorized means are expressed in two ways. A "kth-order" winsorized mean replaces the k smallest and k largest observations, where k is an integer (k plays the role of n in the formula above). An "X%" winsorized mean replaces a given percentage of values, split evenly between the two ends of the data.

The winsorized mean is achieved by replacing the smallest and largest data points, then summing all the data points and dividing the sum by the total number of data points.
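The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `winsorized_mean` and the parameter `k` (the number of values replaced at each end) are names chosen here for the example.

```python
def winsorized_mean(data, k):
    """Mean after replacing the k smallest and k largest values
    with their nearest remaining neighbors."""
    values = sorted(data)
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(data) / 2")
    if k > 0:
        low = values[k]           # smallest value that is kept
        high = values[n - k - 1]  # largest value that is kept
        values = [low] * k + values[k:n - k] + [high] * k
    # Sum all N points (replaced ones included) and divide by N.
    return sum(values) / n


example = [1, 5, 7, 8, 9, 10, 34]
print(round(winsorized_mean(example, 1), 1))  # 7.7
```

Note that the divisor stays at the full number of data points, which is what separates this from a trimmed mean.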

What Does the Winsorized Mean Tell You?

The winsorized mean is less sensitive to outliers than the arithmetic mean because it replaces them with less extreme values. However, if a distribution has fat tails, replacing only the highest and lowest values will have little influence, because many other extreme values remain and the variability of the data stays high.

One major downside of winsorized means is that they introduce some bias into the data set. Reducing the influence of outliers makes the average more stable, but it also discards information about the underlying data.

Strengths of Winsorized Mean

There are several situations in which the winsorized mean can be a better choice than the arithmetic mean. The general situations are listed below, with more specific examples in a later section. The winsorized mean is often most useful when there are:

  • Outliers in the Dataset. The conventional arithmetic mean can be misleading when your dataset contains outliers, or extreme values that differ considerably from the other data points. The winsorized mean offers a more representative measure of central tendency by reducing the influence of these outliers.
  • Skewed Distributions. The winsorized mean can be useful for datasets with significantly skewed distributions, which may have extreme values and a long tail on one side. Winsorizing reduces the pull of that tail and gives a more reliable estimate of the central tendency.
  • Data with Measurement Errors. Measurement errors can show up as outliers in the data. Using the winsorized mean lessens the effect of these errors on the average.
  • Temporary Value Fluctuations. The winsorized mean can be helpful where brief swings in the data produce extreme values, because it is resistant to these fluctuations. Replacing those short-lived extremes makes the average more stable over time.
  • Limited Sample Size. When the sample is small, each outlier has a greater influence on the conventional mean. In these circumstances, the winsorized mean can offer a more reliable estimate of the central tendency.

Winsorized Mean Level

The winsorization level is crucial for effectively using the winsorized mean. The winsorization level determines the percentage of extreme values to be replaced with less extreme ones. To determine the appropriate winsorization level, explore the data, rely on domain knowledge, conduct sensitivity analyses, and consult experts who are more familiar with what extreme values may look like.

When assessing the winsorization level, understanding the nature of the outliers and the reasons for them can help determine the appropriate level. Outliers can influence the statistical analysis, so a higher winsorization level may be beneficial if they unduly influence results. However, a lower level may be more appropriate if the goal is to preserve some of the data's original characteristics. When picking the level, weigh how important it is to keep the original data's composition.

In many cases, domain knowledge is essential in setting the winsorization level. For any data set, consider what the typical range of values would be. Without historical, implicit knowledge of the industry, it would be much more challenging to identify bad data. In some cases, experimentation is crucial for observing how the winsorized mean changes with varying levels.
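One way to experiment with the level is a simple sensitivity sweep: compute the winsorized mean for several values of k and see how much the result moves. The sketch below assumes a hypothetical helper named `level_sweep` and uses a 20-point sample chosen for illustration; neither is a standard library feature.

```python
def winsorized_mean(data, k):
    """Mean after replacing the k smallest and k largest values."""
    values = sorted(data)
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(data) / 2")
    low, high = values[k], values[n - k - 1]
    # Clamp every point into the range [low, high].
    clamped = [min(max(v, low), high) for v in values]
    return sum(clamped) / n


def level_sweep(data, max_k):
    """Map each k in 0..max_k to the resulting winsorized mean."""
    return {k: winsorized_mean(data, k) for k in range(max_k + 1)}


# Illustrative sample of 20 observations.
data = [2, 4, 7, 8, 11, 14, 18, 23, 23, 27,
        35, 40, 49, 50, 55, 60, 61, 61, 62, 75]

for k, value in level_sweep(data, 4).items():
    share = 100 * 2 * k / len(data)  # percent of all points replaced
    print(f"k={k} ({share:.0f}% replaced): {value:.2f}")
```

If the mean barely changes as k grows, the extremes were not driving the result; a large jump at a particular k suggests that level is where the influential values sit.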

Winsorized Mean and Real World Situations

More specifically, there are a handful of situations or industries where the winsorized mean makes more sense than other measures. These real-world situations include, but aren't limited to, the categories below.

Financial/Investments

Market volatility can have a material impact on financial data. Stock prices, asset returns, and other financial indicators may display extreme levels in the world of finance and investing. The impact of severe price volatility and outliers can be lessened when financial data estimates are computed using the winsorized mean.

Payroll/Salaries

Distributions of salaries or payroll within businesses can occasionally be very skewed. This is especially true in sectors where there is a significant income gap or sectors that materially reward those who have been in the industry long or "penalize" those who are just starting their careers. By minimizing the impact of abnormally high or low incomes, the winsorized mean can assist in providing a more accurate measurement of the typical pay range.

Health Care

Because of rare medical conditions or extreme measurements, medical data may contain outliers. Health-related indicators like blood pressure, cholesterol levels, or patient recovery durations can be easier to interpret once extreme readings are replaced with less extreme ones. For example, a collective average for a group of patients may be more informative if it is not skewed by a few abnormally high or abnormally low readings.

Education

Due to a variety of factors, some students may have unusually high or low test results. These abnormal scores may not be useful when evaluating a specific cohort's performance. Instead, an assessment's average score can be calculated using the winsorized mean, which limits the negative (or positive) effect any single student has on the evaluation of a specific teacher or course.

Customer Satisfaction

On a very similar note, when assessing customer satisfaction ratings, outliers may appear because a tiny percentage of consumers provided extremely positive or negative feedback. In the example above regarding a class, perhaps a single disgruntled student brings down the course evaluation score. The winsorized mean can help reduce the influence of these extreme scores and produce a more realistic picture of overall satisfaction.

Environmental Data

As uncommon events and extreme weather conditions seemingly become more frequent, there may be situations where environmental data with these extremes dampened is useful. For example, consider a measure of average air quality or the amount of water contamination. Abnormally high or low contamination readings in either context may mislead decision-makers about the typical daily situation, and environmental resources may be misallocated as a result.

Example of How to Use Winsorized Mean

Let's calculate the winsorized mean for the following data set: 1, 5, 7, 8, 9, 10, 34. In this example, we use a first-order winsorized mean, in which we replace the single smallest and single largest values with their nearest observations.

The data set now appears as follows: 5, 5, 7, 8, 9, 10, 10. Taking an arithmetic average of the new set produces a winsorized mean of 7.7, or (5 + 5 + 7 + 8 + 9 + 10 + 10) divided by 7. Note that the arithmetic mean would have been higher—10.6. The winsorized mean effectively reduces the influence of the 34 value as an outlier.

Or consider a 20% winsorized mean that takes the top 10% and bottom 10% and replaces them with their next closest value. We will winsorize the following data set: 2, 4, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 62, 75. The two smallest and two largest data points—20% of the 20 data points—will be replaced with their next closest value. Thus, the new data set is as follows: 7, 7, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 61, 61. The winsorized mean is 33.9, or the total of the data (678) divided by the total number of data points (20).
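The percentage-based calculation above can be checked with a short Python sketch. The function name `percent_winsorized_mean` is a hypothetical helper, and it assumes the convention used in this example: the stated percentage is the total share of points replaced, split evenly between the two tails and rounded down. Other sources and software may define the percentage per tail, so check the convention before comparing results.

```python
def percent_winsorized_mean(data, percent):
    """Winsorized mean where `percent` (a whole number such as 20)
    is the total share of points replaced, half in each tail."""
    values = sorted(data)
    n = len(values)
    k = int(n * percent // 200)  # points replaced in each tail
    if 2 * k >= n:
        raise ValueError("percent is too large for this data set")
    low, high = values[k], values[n - k - 1]
    # Clamp the tails to the nearest values that are kept.
    clamped = [min(max(v, low), high) for v in values]
    return sum(clamped) / n


data = [2, 4, 7, 8, 11, 14, 18, 23, 23, 27,
        35, 40, 49, 50, 55, 60, 61, 61, 62, 75]

print(round(percent_winsorized_mean(data, 20), 1))  # 33.9
```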

Winsorized Mean vs. Other Measurements

There are several other common forms of "mean," each of which varies slightly from the winsorized mean. There are also other measurements, such as the median, that give comparable but different information. In general, the winsorized mean is designed to be more resistant to outliers than the arithmetic mean. Those other types of measurement include:

  • Traditional/Arithmetic Mean: The traditional mean, also known as the arithmetic mean, is calculated by summing all the data points in a dataset and dividing by the number of data points. It is sensitive to extreme values and can be significantly affected by outliers.
  • Trimmed Mean: The trimmed mean is another type of robust mean that involves removing a certain percentage of extreme values from both ends. These ends may be referred to as the top and bottom of the data distribution. The trimmed mean retains a specific portion of the data's central values and discards the extreme values, making it more resistant to the influence of outliers compared to the arithmetic mean.
  • Median: The median isn't a mean calculation at all; instead, it represents the middle value of a dataset when arranged in ascending or descending order. Unlike the traditional mean, the median is largely unaffected by extreme values because it only considers the central value(s) in the dataset.
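To make the comparison concrete, the sketch below computes all four measures on one small sample using Python's standard `statistics` module. The helper names `winsorize` and `trim` are illustrative, and both use k = 1 (one value adjusted or removed at each end).

```python
from statistics import mean, median


def winsorize(data, k):
    """Replace the k smallest and k largest values with their neighbors."""
    values = sorted(data)
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(data) / 2")
    low, high = values[k], values[n - k - 1]
    return [min(max(v, low), high) for v in values]


def trim(data, k):
    """Drop the k smallest and k largest values entirely."""
    values = sorted(data)
    return values[k:len(values) - k]


data = [1, 5, 7, 8, 9, 10, 34]

summary = {
    "arithmetic mean": mean(data),
    "winsorized mean (k=1)": mean(winsorize(data, 1)),
    "trimmed mean (k=1)": mean(trim(data, 1)),
    "median": median(data),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On this sample the winsorized mean, trimmed mean, and median land close together, while the arithmetic mean is pulled upward by the single value of 34.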

Can Winsorized Mean Handle Multiple Outliers?

Yes, the winsorized mean can handle multiple outliers within the dataset, provided the winsorization level is set high enough. It replaces the specified number or percentage of extreme values at each end, so any outliers beyond that count are left unchanged. When the level covers all of the outliers, replacing them with less extreme values keeps the winsorized mean less sensitive to their influence.
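A small sketch can show why the level matters when more than one outlier sits in the same tail. The readings below are hypothetical values made up for this illustration, with two high outliers (34 and 90), and `winsorized_mean` is an illustrative helper rather than a library function.

```python
def winsorized_mean(data, k):
    """Mean after replacing the k smallest and k largest values."""
    values = sorted(data)
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(data) / 2")
    low, high = values[k], values[n - k - 1]
    clamped = [min(max(v, low), high) for v in values]
    return sum(clamped) / n


# Hypothetical readings: both 34 and 90 are outliers in the upper tail.
readings = [5, 7, 8, 9, 10, 34, 90]

raw = sum(readings) / len(readings)
first_order = winsorized_mean(readings, 1)   # 90 is replaced by 34
second_order = winsorized_mean(readings, 2)  # 34 and 90 are replaced by 10

print(f"raw mean: {raw:.2f}")
print(f"k=1: {first_order:.2f}")
print(f"k=2: {second_order:.2f}")
```

With k = 1 the largest value is only pulled down to the second outlier, so the mean is still inflated; with k = 2 both outliers are replaced.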

Can Winsorized Mean Be Used with Non-Numeric Data?

Not really. Winsorized mean is primarily designed for numeric data, as it involves replacing numerical outliers with other numerical values. For non-numeric data, such as categorical variables or text data, other robust statistical measures or preprocessing techniques may be more appropriate.

Does Winsorized Mean Preserve Data Variability?

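The winsorized mean retains more of the data's variability than the trimmed mean in one respect: it keeps every observation in the sample, replacing extreme values with values closer to the dataset's central part rather than discarding them. The replaced values still sit at the edges of the distribution, although the range and variance are smaller than in the original data. This makes it a desirable choice in cases where keeping the full sample size and some of the spread is essential.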
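The effect on variability can be checked directly by comparing the spread of the raw, winsorized, and trimmed versions of one sample. The sketch below uses the population standard deviation from Python's `statistics` module; the helper names `winsorize` and `trim` are illustrative, and the comparison holds for this sample rather than as a general proof.

```python
from statistics import pstdev


def winsorize(data, k):
    """Replace the k smallest and k largest values with their neighbors."""
    values = sorted(data)
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(data) / 2")
    low, high = values[k], values[n - k - 1]
    return [min(max(v, low), high) for v in values]


def trim(data, k):
    """Drop the k smallest and k largest values entirely."""
    values = sorted(data)
    return values[k:len(values) - k]


data = [1, 5, 7, 8, 9, 10, 34]
winsorized = winsorize(data, 1)
trimmed = trim(data, 1)

for name, sample in [("raw", data), ("winsorized", winsorized), ("trimmed", trimmed)]:
    print(f"{name}: n={len(sample)}, "
          f"range={max(sample) - min(sample)}, sd={pstdev(sample):.2f}")
```

In this sample the winsorized and trimmed sets span the same range (5 to 10), but the winsorized set keeps all seven observations and has a larger standard deviation than the trimmed set, while both are far less spread out than the raw data.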

How Does Winsorized Mean Impact Hypothesis Testing?

Winsorized mean can affect the results of hypothesis testing by reducing the impact of extreme values on statistical tests. In cases where outliers might lead to erroneous conclusions, winsorized mean can provide more reliable test outcomes, making it a useful tool in hypothesis testing with non-normal or skewed data.

The Bottom Line

Winsorized mean is a statistical measure used to calculate the average of a dataset by replacing a specified percentage of extreme values or outliers with less extreme ones. This approach reduces the impact of outliers, providing a more robust estimate of the central tendency that is less sensitive to extreme values compared to the traditional arithmetic mean.
