
Mean Square Error | ISI MStat 2019 PSB Problem 5

This post gives a detailed solution to ISI MStat 2019 PSB Problem 5, a problem on comparing two estimators by their mean squared errors, together with a simulation in R that verifies the result.

Problem

Suppose X_{1}, X_{2}, \ldots, X_{n} are independent random variables such that \mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)
where p_{1}, p_{2}, \ldots, p_{n} \in(0,1) are all distinct and unknown. Consider X=\sum_{i=1}^{n} X_{i} and another random variable Y which is distributed as Binomial (n, \bar{p}), where \bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} . Between X and Y, which is a better estimator of \sum_{i=1}^{n} p_{i} in terms of their respective mean squared errors?

Prerequisites

Linearity of expectation
Variance of Bernoulli and Binomial random variables
Mean squared error of an unbiased estimator
Cauchy-Schwarz inequality

Solution

Unbiasedness

Since each X_i is Bernoulli(p_i), E(X_i) = 1 \cdot p_i + 0 \cdot (1-p_i) = p_i, so by linearity of expectation, E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_{i}.

Since Y \sim \text{Binomial}(n, \bar{p}), we have E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}.

Hence both X and Y are unbiased estimators of \sum_{i=1}^{n} p_i.

Mean Square Error

If T is unbiased for \theta, then MSE(T) = Var(T).
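This is just the bias-variance decomposition of the mean squared error:

MSE(T) = E(T - \theta)^2 = Var(T) + (E(T) - \theta)^2,

and the bias term (E(T) - \theta)^2 vanishes when T is unbiased. So, since both X and Y are unbiased for \sum_{i=1}^{n} p_i, comparing their MSEs reduces to comparing their variances.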

MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} Var(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2, where the second equality uses the independence of X_{1}, X_{2}, \ldots, X_{n}.

MSE(Y) = Var(Y) =  n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{(\sum_{i=1}^{n} p_i)^2}{n}

Observe that n \sum_{i=1}^{n} p_i^2 = (\sum_{i=1}^{n} 1^2)(\sum_{i=1}^{n} p_i^2) \overset{\text{Cauchy-Schwarz inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2. Moreover, equality in the Cauchy-Schwarz inequality holds only if all the p_i are equal, and here the p_i are distinct, so the inequality is strict.

Dividing by n gives \sum_{i=1}^{n} p_i^2 > \frac{(\sum_{i=1}^{n} p_i)^2}{n}, and therefore MSE(\sum_{i=1}^{n} X_{i}) < MSE(Y).

Therefore, \sum_{i=1}^{n} X_{i} is a better estimator than Y with respect to mean square error.
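A quick numerical illustration (with hypothetical probabilities, not from the problem): for p = (0.2, 0.5, 0.9), we get n \sum p_i^2 = 3 \times 1.1 = 3.3, while (\sum p_i)^2 = 1.6^2 = 2.56, so the inequality is indeed strict.

p = c(0.2, 0.5, 0.9)   # hypothetical distinct probabilities
length(p) * sum(p^2)   # n times the sum of squares = 3.3
sum(p)^2               # square of the sum = 2.56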

Let's verify this as usual by simulation.

Computation and Simulation

library(statip)   # for rbern()
library(Metrics)  # for mse()
N = 10
p = runif(N, 0, 1)   # N success probabilities, distinct almost surely
X = rep(0, N)
vX = NULL   # stores the simulated values of sum Xi
vY = NULL   # stores the simulated values of Y
for (j in 1:1000)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])   # Xi ~ Bernoulli(pi)
  }
  Z = sum(X)                  # X = sum of the Xi's
  Y = rbinom(1, N, mean(p))   # Y ~ Binomial(N, pbar)
  vX = c(vX, Z)
  vY = c(vY, Y)
}
k = rep(sum(p), 1000)   # the estimand: sum of the pi's
mse(k, vX)   # MSE of sum Xi; 1.57966 in one run
mse(k, vY)   # MSE of Y; 2.272519 in the same run
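As a further check, the theoretical MSEs derived above can be computed directly from the same p vector (a small sketch reusing N and p from the code above):

sum(p * (1 - p))              # theoretical MSE of sum Xi = sum of pi(1 - pi)
N * mean(p) * (1 - mean(p))   # theoretical MSE of Y = n * pbar * (1 - pbar)

Both simulated MSEs should be close to these theoretical values, with the first strictly smaller than the second.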

Hence, the simulation confirms the theory. I hope this helps.
