
# Mean Square Error | ISI MStat 2019 PSB Problem 5

This post gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, which is based on the calculation of mean squared error, followed by a short simulation in R.

## Problem

Suppose $X_{1}, X_{2}, \ldots, X_{n}$ are independent random variables such that $\mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)$
where $p_{1}, p_{2}, \ldots, p_{n} \in(0,1)$ are all distinct and unknown. Consider $X=\sum_{i=1}^{n} X_{i}$ and another random variable $Y$ which is distributed as Binomial $(n, \bar{p}),$ where $\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .$ Between $X$ and $Y,$ which is a better estimator of $\sum_{i=1}^{n} p_{i}$ in terms of their respective mean squared errors?

## Solution

#### Unbiasedness

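Both $X$ and $Y$ are unbiased for $\sum_{i=1}^{n} p_{i}$. By linearity of expectation,

$E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} E(X_{i}) = \sum_{i=1}^{n} p_{i}$.

$Y \sim$ Binomial $(n, \bar{p})$, so

$E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}$.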

#### Mean Square Error

Since $MSE(T) = Var(T) + (E(T) - \theta)^2$, if $T$ is unbiased for $\theta$, then $MSE(T) = Var(T)$.

$MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) \overset{X_{1}, X_{2}, \ldots, X_{n} \text{ are independent}}{=} \sum_{i=1}^{n} Var(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2$

$MSE(Y) = Var(Y) = n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{(\sum_{i=1}^{n} p_i)^2}{n}$
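As a quick numerical illustration of these two formulas, the following sketch evaluates both variances for one arbitrary probability vector. The vector `p_example` and the variable names are assumptions made for illustration only; they are not part of the original problem.

```r
# Hypothetical probabilities, chosen only for illustration.
p_example <- c(0.1, 0.4, 0.7, 0.9)
n <- length(p_example)

# MSE of X = sum of X_i: sum of the individual Bernoulli variances.
mse_X <- sum(p_example * (1 - p_example))

# MSE of Y ~ Binomial(n, p_bar): n * p_bar * (1 - p_bar).
p_bar <- mean(p_example)
mse_Y <- n * p_bar * (1 - p_bar)

print(c(mse_X = mse_X, mse_Y = mse_Y))
```

For these values the two quantities are 0.63 for $\sum_{i=1}^{n} X_{i}$ and 0.9975 for $Y$.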

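Observe that $(\sum_{i=1}^{n} p_i^2)n = (\sum_{i=1}^{n} p_i^2)(\sum_{i=1}^{n} 1) \overset{\text{Cauchy-Schwarz inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2$, with equality only when all the $p_i$ are equal. Since the $p_i$ are distinct (with $n \geq 2$), the inequality is strict.

Hence $\sum_{i=1}^{n} p_i^2 > \frac{(\sum_{i=1}^{n} p_i)^2}{n}$, which gives $MSE(\sum_{i=1}^{n} X_{i}) < MSE(Y)$.

Therefore, $\sum_{i=1}^{n} X_{i}$ is a better estimator than $Y$ with respect to mean squared error.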
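The size of the gap between the two mean squared errors can also be written out explicitly. The identity below is a worked step that uses only the two variance formulas derived earlier together with $\sum_{i=1}^{n} p_i = n\bar{p}$; it is offered as a supplementary check rather than as part of the original solution.

$$MSE(Y) - MSE\left(\sum_{i=1}^{n} X_{i}\right) = \sum_{i=1}^{n} p_i^2 - \frac{(\sum_{i=1}^{n} p_i)^2}{n} = \sum_{i=1}^{n} p_i^2 - n\bar{p}^2 = \sum_{i=1}^{n} (p_i - \bar{p})^2 \geq 0$$

So $Y$ loses by exactly the total squared spread of the $p_i$ around their mean $\bar{p}$: the more the $p_i$ differ from one another, the larger the advantage of $\sum_{i=1}^{n} X_{i}$.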

Let's verify this as usual by simulation.

## Computation and Simulation

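```r
library(statip)   # rbern(): Bernoulli random draws
library(Metrics)  # mse(): mean squared error

N = 10                # number of Bernoulli variables
R = 1000              # number of simulation replications
p = runif(N, 0, 1)    # success probabilities p_1, ..., p_N
X = rep(0, N)
vX = numeric(R)       # simulated values of X = sum of X_i
vY = numeric(R)       # simulated values of Y ~ Binomial(N, mean(p))
for (j in 1:R)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])   # X_i ~ Bernoulli(p_i)
  }
  vX[j] = sum(X)                  # sum of the X_i random variables
  vY[j] = rbinom(1, N, mean(p))   # Y random variable
}
k = rep(sum(p), R)    # true value of the estimand, sum of p_i
# No seed is set, so the values below vary from run to run.
mse(k, vX)  # MSE of sum X_i; 1.57966 in the original run
mse(k, vY)  # MSE of Y; 2.272519 in the original run
```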

The simulated MSE of $\sum_{i=1}^{n} X_{i}$ is smaller than that of $Y$, which agrees with the theory. I hope it helps.
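The simulated values can also be compared directly with the exact variances from the solution. The following is a minimal sketch in base R under a few assumptions of my own: an arbitrary seed for reproducibility, a larger number of replications than the 1000 used above, and the names `draws`, `theta` and `result`, none of which appear in the original code.

```r
set.seed(2019)  # arbitrary seed, assumed here only for reproducibility

N <- 10        # number of Bernoulli variables
R <- 100000    # number of replications
p <- runif(N)  # success probabilities p_1, ..., p_N

# Each column of the matrix is one replication of (X_1, ..., X_N);
# rbinom recycles p down the columns, so row i uses p[i].
draws <- matrix(rbinom(N * R, size = 1, prob = p), nrow = N)
vX <- colSums(draws)                        # realisations of sum X_i
vY <- rbinom(R, size = N, prob = mean(p))   # realisations of Y

theta <- sum(p)  # quantity being estimated

# Simulated MSE next to the exact variance formula for each estimator.
result <- rbind(
  X = c(simulated = mean((vX - theta)^2), exact = sum(p * (1 - p))),
  Y = c(simulated = mean((vY - theta)^2), exact = N * mean(p) * (1 - mean(p)))
)
print(round(result, 4))
```

Replacing the inner loop with a single vectorised `rbinom` call is a design choice made here for speed; it draws the same Bernoulli variables as the loop-based version.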

