Stein’s Paradox: Why the Sample Mean Isn’t Always the…
by Tim Sumner · Sep 2024
To demonstrate the versatility of this technique, I will generate a 6-dimensional data set in which each column is drawn from a different random distribution. Here are the specific distributions and parameters I will be using:
X1 ~ t-distribution (ν = 3)
X2 ~ Binomial (n = 10, p = 0.4)
X3 ~ Gamma (α = 3, β = 2)
X4 ~ Uniform (a = 0, b = 1)
X5 ~ Exponential (λ = 50)
X6 ~ Poisson (λ = 2)
Note that the columns of this data set are independent: no column should be correlated with any other, since each was generated separately. This is not a requirement for using this method; it was done this way for simplicity and to highlight the paradoxical nature of the result. If you’re not entirely familiar with some of these distributions, the sketch below produces a simple visual of each univariate column of the randomly generated data, using a single iteration of 1,000 random draws from each of the distributions above.
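Here is one way such a panel of histograms can be produced in base R. This is a minimal sketch; the article’s original figure may have been generated differently.

set.seed(42)
n <- 1000
X <- cbind(X1 = rt(n, df = 3),
           X2 = rbinom(n, size = 10, prob = 0.4),
           X3 = rgamma(n, shape = 3, rate = 2),
           X4 = runif(n),
           X5 = rexp(n, rate = 50),
           X6 = rpois(n, lambda = 2))
## One histogram per column, arranged in a 2 x 3 grid ##
par(mfrow = c(2, 3))
for (j in 1:6) hist(X[, j], main = colnames(X)[j], xlab = "")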
It should be clear from the histograms above that not all of these variables follow a normal distribution, which implies the data set as a whole is not multivariate normal.
Since the true distribution of each variable is known, we also know its true mean. The mean of this multivariate data set can be expressed as a vector, with each entry representing the mean of the corresponding variable. In this example,

μ = (0, 4, 1.5, 0.5, 0.02, 2)ᵀ

where each entry is the textbook mean of its distribution: the t-distribution with ν = 3 has mean 0, the Binomial has mean np = 4, the Gamma has mean α/β = 1.5, the Uniform has mean (a + b)/2 = 0.5, the Exponential has mean 1/λ = 0.02, and the Poisson has mean λ = 2.
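As a quick sanity check, one can draw a very large sample and compare its empirical means to these values; a minimal sketch:

set.seed(1)
N <- 1e6
sanity <- cbind(rt(N, df = 3), rbinom(N, size = 10, prob = 0.4),
                rgamma(N, shape = 3, rate = 2), runif(N),
                rexp(N, rate = 50), rpois(N, lambda = 2))
round(colMeans(sanity), 3)  # should land near 0, 4, 1.5, 0.5, 0.02, 2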
Knowing the true mean of each variable lets us measure how close the sample mean and the James-Stein estimator each get, with closer being better. Below is the experiment I ran in R, which generates each of the 6 random variables and scores both estimators against the true means using the Mean Squared Error (MSE). The experiment was then run 10,000 times for each of four sample sizes: 5, 50, 500, and 5,000.
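The script calls a james_stein_estimator() function, presumably defined earlier in the article. For reference, a minimal sketch of one common positive-part form, which shrinks toward zero and treats sigma2 as the per-coordinate variance of each sample mean, could look like this; the article’s exact implementation may differ.

james_stein_estimator <- function(Xbar, sigma2) {
  p <- length(Xbar)                    # number of dimensions (here, 6)
  z2 <- sum(Xbar^2 / sigma2)           # squared length after standardizing each coordinate
  shrink <- max(1 - (p - 2) / z2, 0)   # positive-part shrinkage factor toward zero
  shrink * Xbar                        # scalar factor, so it applies on the original scale
}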
set.seed(42)

## Function to calculate Mean Squared Error ##
mse <- function(x, true_value) {
  mean((x - true_value)^2)
}

## True Averages ##
mu <- c(0, 4, 1.5, 0.5, 0.02, 2)

## Store Sample Mean and J.S. Estimator Errors ##
Xbar.MSE <- list(); JS.MSE <- list()

for (n in c(5, 50, 500, 5000)) {  # Testing sample sizes of 5, 50, 500, and 5,000
  for (i in 1:1e4) {              # Performing 10,000 iterations

    ## Six Random Variables ##
    X1 <- rt(n, df = 3)
    X2 <- rbinom(n, size = 10, prob = 0.4)
    X3 <- rgamma(n, shape = 3, rate = 2)
    X4 <- runif(n)
    X5 <- rexp(n, rate = 50)
    X6 <- rpois(n, lambda = 2)
    X <- cbind(X1, X2, X3, X4, X5, X6)

    ## Estimating the Std. Dev. of Each Variable ##
    sigma <- apply(X, MARGIN = 2, FUN = sd)

    ## Sample Mean ##
    Xbar <- colMeans(X)

    ## J.S. Estimator (sigma^2 / n estimates the variance of each sample mean) ##
    JS.Xbar <- james_stein_estimator(Xbar = Xbar, sigma2 = sigma^2 / n)

    Xbar.MSE[[as.character(n)]][i] <- mse(Xbar, mu)
    JS.MSE[[as.character(n)]][i] <- mse(JS.Xbar, mu)
  }
}

sapply(Xbar.MSE, mean) # Avg. Sample Mean MSE
sapply(JS.MSE, mean)   # Avg. James-Stein MSE
From all 40,000 trials (10,000 per sample size), the average MSE for each sample size is computed by running the last two lines. The results can be seen in the table below.
The results of this simulation show that the James-Stein estimator consistently achieves a lower MSE than the sample mean, but that the gap shrinks as the sample size grows.
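For a compact side-by-side view of the two summaries, they can be combined into a single table; a small sketch along these lines:

## Rows: estimator; columns: sample size ##
round(rbind(`Sample Mean` = sapply(Xbar.MSE, mean),
            `James-Stein` = sapply(JS.MSE, mean)), 5)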