
Mean Square Error | ISI MStat 2019 PSB Problem 5


This post gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, a problem on comparing the mean squared errors of two estimators, followed by a short simulation in R.

Problem

Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables such that \(
\mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)
\)
where \(p_{1}, p_{2}, \ldots, p_{n} \in(0,1)\) are all distinct and unknown. Consider \(X=\sum_{i=1}^{n} X_{i}\) and another random variable \(Y\) which is distributed as Binomial \((n, \bar{p}),\) where \(\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .\) Between \(X\) and \(Y,\) which is a better estimator of \(\sum_{i=1}^{n} p_{i}\) in terms of their respective mean squared errors?

Prerequisites

Solution

Unbiasedness

\(E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_{i}\).

\(Y\) ~ Binomial \((n, \bar{p})\)

\( E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}\).

Hence both \(X = \sum_{i=1}^{n} X_{i}\) and \(Y\) are unbiased for \(\sum_{i=1}^{n} p_{i}\).

Mean Square Error

If \(T\) is unbiased for \(\theta\), then \(MSE(T) = Var(T)\).
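
As a reminder of why this holds, the mean squared error of any estimator \(T\) of \(\theta\) splits into variance plus squared bias. This is a standard identity, written here in the notation of this post:

\( MSE(T) = E\left[(T - \theta)^2\right] = Var(T) + \left(E(T) - \theta\right)^2 \)

When \(E(T) = \theta\), the squared bias term is zero, which leaves \(MSE(T) = Var(T)\).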

\( MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) \overset{X_{1}, X_{2}, \ldots, X_{n} \text{ are independent}}{=} \sum_{i=1}^{n} Var(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2 \)

\( MSE(Y) = Var(Y) = n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{(\sum_{i=1}^{n} p_i)^2}{n} \)

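Observe that \( (\sum_{i=1}^{n} p_i^2)n = (\sum_{i=1}^{n} p_i^2)(\sum_{i=1}^{n} 1) \overset{\text{Cauchy-Schwarz Inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2 \). Equality holds only when all the \(p_i\) are equal, which is ruled out here because the \(p_i\) are distinct (for \(n \geq 2\)).

Dividing by \(n\) gives \(\sum_{i=1}^{n} p_i^2 > \frac{(\sum_{i=1}^{n} p_i)^2}{n}\), so \(MSE(\sum_{i=1}^{n} X_{i}) < MSE(Y)\).

Therefore, \(\sum_{i=1}^{n} X_{i}\) is a better estimator than \(Y\) with respect to Mean Square Error.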
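
The size of the gap can also be written out directly by subtracting the two variance expressions above. This is a short worked step, not part of the original solution:

\( MSE(Y) - MSE(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_i^2 - \frac{(\sum_{i=1}^{n} p_i)^2}{n} = \sum_{i=1}^{n} p_i^2 - n\bar{p}^2 = \sum_{i=1}^{n} (p_i - \bar{p})^2 \)

So the gap is exactly the spread of the \(p_i\) around their mean. As an illustrative example with made-up values \(n = 2\), \(p_1 = 0.2\), \(p_2 = 0.8\): \(MSE(\sum X_i) = 0.16 + 0.16 = 0.32\), \(MSE(Y) = 2(0.5)(0.5) = 0.5\), and the difference \(0.18\) equals \((0.2 - 0.5)^2 + (0.8 - 0.5)^2\).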

Let's verify this as usual by simulation.

Computation and Simulation

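library(statip)   # provides rbern()
library(Metrics)  # provides mse()
N = 10                # number of Bernoulli variables
R = 1000              # number of simulation replications
p = runif(N, 0, 1)    # success probabilities p_1, ..., p_N
X = rep(0, N)
vX = numeric(R)       # simulated values of sum of Xi
vY = numeric(R)       # simulated values of Y
for (j in 1:R)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])
  }
  vX[j] = sum(X)                  # sum of Xi random variables
  vY[j] = rbinom(1, N, mean(p))   # Y random variable
}
k = rep(sum(p), R)    # target value sum(p), repeated to match the vector lengths
# No seed is set, so the values below are from one run and will vary.
mse(k, vX) #MSE of Sum Xi #1.57966
mse(k, vY) #MSE of Y  #2.272519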
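
The simulated values can be compared with the exact variances from the solution. The following is a minimal sketch in base R, not part of the original post: the helper names `mse_sum_exact` and `mse_binom_exact` are hypothetical, and the seed is arbitrary and only there for reproducibility. It also checks that the gap between the two MSEs equals \(\sum_{i=1}^{n} (p_i - \bar{p})^2\), which follows from subtracting the two variance formulas.

```r
# Exact MSEs for a given probability vector p (hypothetical helper names).
mse_sum_exact <- function(p) sum(p * (1 - p))                         # Var(sum of X_i)
mse_binom_exact <- function(p) length(p) * mean(p) * (1 - mean(p))    # Var(Y)

set.seed(1)                    # arbitrary seed, for reproducibility only
p <- runif(10, 0, 1)           # same setup as the simulation: 10 probabilities

mse_x <- mse_sum_exact(p)      # theoretical MSE of sum of X_i
mse_y <- mse_binom_exact(p)    # theoretical MSE of Y
gap <- sum((p - mean(p))^2)    # should match mse_y - mse_x

print(c(mse_x = mse_x, mse_y = mse_y, gap = gap))
```

With 1000 replications the simulated MSEs should land close to these exact values, but not exactly on them, since they are Monte Carlo estimates.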

In this run the simulated MSE of \(\sum_{i=1}^{n} X_{i}\) (about 1.58) is smaller than that of \(Y\) (about 2.27), which agrees with the theory. I hope it helps.
