This problem based on the calculation of Mean Square Error gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, with a tinge of simulation and code.


Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables such that \(
where \(p_{1}, p_{2}, \ldots, p_{n} \in(0,1)\) are all distinct and unknown. Consider \(X=\sum_{i=1}^{n} X_{i}\) and another random variable \(Y\) which is distributed as Binomial \((n, \bar{p}),\) where \(\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .\) Between \(X\) and \(Y,\) which is a better estimator of \(\sum_{i=1}^{n} p_{i}\) in terms of their respective mean squared errors?




\(E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_{i}\).

\(Y\) ~ Binomial \((n, \bar{p})\)

\( E(Y) = n.\bar{p} = \sum_{i=1}^{n} p_{i}\).

Mean Square Error

If \(T\) is unbiased for \(\theta\), then MSE(\(T) = Var(T)\).

\( MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) \overset{X_{1}, X_{2}, \ldots, X_{n} \text{are independent}}{=} \sum_{i=1}^{n} Var(X_{i}) = \\ \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i – \sum_{i=1}^{n} p_i^2 \)

\( MSE(Y) = Var(Y) = n\bar{p}(1 – \bar{p}) = \sum_{i=1}^{n} p_i – \frac{(\sum_{i=1}^{n} p_i)^2}{n} \)

Observe that \( (\sum_{i=1}^{n} p_i^2)n = (\sum_{i=1}^{n} p_i^2)(\sum_{i=1}^{n} 1) \overset{\text{Cauchy Schwartz Inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2 \).

This results in the fact that \(MSE(\sum_{i=1}^{n} X_{i}) \leq MSE(Y)\).

Therefore, \(\sum_{i=1}^{n} X_{i}\) is a better estimate thatn \(Y\) w.r.t Mean Square Error.

Let’s verify this as usual by simulation.

Computation and Simulation

N = 10
p = runif(10, 0, 1)
X = rep(0,N)
for (j in 1:1000)
  for (i in 1:N) 
    X[i] = rbern(1,p[i])
  Z = sum(X) #sum of Xi random variables
  Y = rbinom(1,N,mean(p)) #Y random variable
  vX = c(vX, Z)
  vY = c(vY, Y)
k = rep(sum(p), 1000)
mse(k, vX) #MSE of Sum Xi #1.57966
mse(k, vY) #MSE of Y  #2.272519

Hence, the theory is verified by this simulation. I hope it helps.