
This post gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, a problem on comparing mean squared errors, and checks the answer with a short simulation in code.

Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables such that \(\mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)\)

where \(p_{1}, p_{2}, \ldots, p_{n} \in(0,1)\) are all distinct and unknown. Consider \(X=\sum_{i=1}^{n} X_{i}\) and another random variable \(Y\) which is distributed as Binomial \((n, \bar{p}),\) where \(\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .\) Between \(X\) and \(Y,\) which is a better estimator of \(\sum_{i=1}^{n} p_{i}\) in terms of their respective mean squared errors?

- Bernoulli and Binomial Distribution
- Basic Estimation Theory (Unbiasedness and Mean Square Error)
- Cauchy-Schwarz Inequality

\(E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_{i}\).

\(Y \sim \text{Binomial}(n, \bar{p})\), so

\( E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}\).

Hence both \(\sum_{i=1}^{n} X_{i}\) and \(Y\) are unbiased for \(\sum_{i=1}^{n} p_{i}\).

If \(T\) is unbiased for \(\theta\), then \(MSE(T) = Var(T)\).
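The reason is the usual decomposition of the mean squared error into variance plus squared bias. A short derivation, written for a generic estimator \(T\) of \(\theta\) with finite variance (an assumption not stated explicitly in the problem, but automatic here since \(X\) and \(Y\) are bounded):

```latex
\begin{aligned}
\mathrm{MSE}(T) &= E\left[(T-\theta)^2\right] \\
&= E\left[(T-E(T))^2\right] + 2\,(E(T)-\theta)\,E\left[T-E(T)\right] + (E(T)-\theta)^2 \\
&= \mathrm{Var}(T) + (E(T)-\theta)^2 .
\end{aligned}
```

The cross term vanishes because \(E[T-E(T)] = 0\), and the last term is zero when \(T\) is unbiased.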

\( MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) \overset{X_{1}, X_{2}, \ldots, X_{n} \text{ are independent}}{=} \sum_{i=1}^{n} Var(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2 \)

\( MSE(Y) = Var(Y) = n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{(\sum_{i=1}^{n} p_i)^2}{n} \)

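Observe that \( (\sum_{i=1}^{n} p_i^2)n = (\sum_{i=1}^{n} p_i^2)(\sum_{i=1}^{n} 1) \overset{\text{Cauchy-Schwarz Inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2 \), with equality only when all the \(p_i\) are equal. Since the \(p_i\) are distinct, the inequality is strict for \(n \geq 2\).

Dividing by \(n\) gives \(\sum_{i=1}^{n} p_i^2 > \frac{(\sum_{i=1}^{n} p_i)^2}{n}\), which results in the fact that \(MSE(\sum_{i=1}^{n} X_{i}) < MSE(Y)\).

Therefore, \(\sum_{i=1}^{n} X_{i}\) is a better estimator than \(Y\) with respect to Mean Square Error.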
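The size of the gap can also be written out directly from the two MSE expressions, without invoking the inequality. The following is a sketch of that algebra using only the quantities defined above, with \(X = \sum_{i=1}^{n} X_i\):

```latex
\begin{aligned}
MSE(Y) - MSE(X)
&= \sum_{i=1}^{n} p_i^2 - \frac{\left(\sum_{i=1}^{n} p_i\right)^2}{n} \\
&= \sum_{i=1}^{n} p_i^2 - n\bar{p}^2 \\
&= \sum_{i=1}^{n} (p_i - \bar{p})^2 \;\geq\; 0 .
\end{aligned}
```

So the gap is \(n\) times the (population) variance of the \(p_i\), and it is zero only when all the \(p_i\) coincide.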

Let's verify this as usual by simulation.

```
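library(statip)   # provides rbern()
library(Metrics)  # provides mse()

N = 10               # number of Bernoulli variables
reps = 1000          # number of simulated replications
p = runif(N, 0, 1)   # the success probabilities p_1, ..., p_N
X = rep(0, N)
vX = numeric(reps)   # simulated values of sum of Xi
vY = numeric(reps)   # simulated values of Y
for (j in 1:reps)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])   # Xi ~ Bernoulli(p_i)
  }
  vX[j] = sum(X)                  # sum of Xi random variables
  vY[j] = rbinom(1, N, mean(p))   # Y random variable
}
k = rep(sum(p), reps)   # the target, sum of p_i
mse(k, vX) # MSE of Sum Xi; the author's run gave 1.57966
mse(k, vY) # MSE of Y; the author's run gave 2.272519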
```

In this run the simulated MSE of \(\sum_{i=1}^{n} X_{i}\) (about 1.58) is smaller than that of \(Y\) (about 2.27), which agrees with the theory. I hope it helps.
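As a cross-check that does not depend on random draws, the two closed-form MSEs can be evaluated directly. The sketch below uses base R only; the helper `exact_mse` and the vector `p_example` are illustrative names and values, not part of the original problem. A small seeded simulation follows for comparison.

```r
# Exact MSEs from the closed-form expressions derived above.
# `exact_mse` and `p_example` are hypothetical names chosen for illustration.
exact_mse <- function(p) {
  n <- length(p)
  p_bar <- mean(p)
  c(mse_x = sum(p * (1 - p)),          # Var(sum of Xi) = sum p_i (1 - p_i)
    mse_y = n * p_bar * (1 - p_bar))   # Var(Y) = n * p_bar * (1 - p_bar)
}

p_example <- c(0.1, 0.5, 0.9)   # distinct probabilities in (0, 1)
res <- exact_mse(p_example)
print(res)

# The gap MSE(Y) - MSE(X) should equal the sum of squared deviations of p.
gap <- res[["mse_y"]] - res[["mse_x"]]
print(all.equal(gap, sum((p_example - mean(p_example))^2)))

# Seeded Monte Carlo estimates of the same two quantities (base R only).
set.seed(1)
reps <- 10000
target <- sum(p_example)
sim_x <- replicate(reps, sum(rbinom(length(p_example), 1, p_example)))
sim_y <- rbinom(reps, length(p_example), mean(p_example))
sim_mse_x <- mean((sim_x - target)^2)
sim_mse_y <- mean((sim_y - target)^2)
print(c(sim_mse_x = sim_mse_x, sim_mse_y = sim_mse_y))
```

Here `rbinom(n, 1, p)` with a vector `p` draws one Bernoulli variable per probability, so the `statip` dependency is not needed for this check. The simulated values should land close to the exact ones, with the usual Monte Carlo noise.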

