This post gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, a problem based on the calculation of Mean Square Error, with a touch of simulation and code.

## Problem

Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables such that

\(\mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)\)

where \(p_{1}, p_{2}, \ldots, p_{n} \in(0,1)\) are all distinct and unknown. Consider \(X=\sum_{i=1}^{n} X_{i}\) and another random variable \(Y\) which is distributed as Binomial \((n, \bar{p}),\) where \(\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .\) Between \(X\) and \(Y,\) which is a better estimator of \(\sum_{i=1}^{n} p_{i}\) in terms of their respective mean squared errors?

### Prerequisites

- Bernoulli and Binomial Distribution
- Basic Estimation Theory (Unbiasedness and Mean Square Error)
- Cauchy-Schwarz Inequality

## Solution

#### Unbiasedness

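By linearity of expectation, \(E(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} E(X_{i}) = \sum_{i=1}^{n} p_{i}\).

Since \(Y \sim\) Binomial \((n, \bar{p})\),

\( E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}\).

Hence both \(X = \sum_{i=1}^{n} X_{i}\) and \(Y\) are unbiased for \(\sum_{i=1}^{n} p_{i}\).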

#### Mean Square Error

Since \(MSE(T) = Var(T) + (E(T) - \theta)^2\), if \(T\) is unbiased for \(\theta\), then \(MSE(T) = Var(T)\).

\( MSE(\sum_{i=1}^{n} X_{i}) = Var(\sum_{i=1}^{n} X_{i}) \overset{X_{1}, X_{2}, \ldots, X_{n} \text{ are independent}}{=} \sum_{i=1}^{n} Var(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2 \)

\( MSE(Y) = Var(Y) = n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{(\sum_{i=1}^{n} p_i)^2}{n} \)

Observe that \( (\sum_{i=1}^{n} p_i^2)n = (\sum_{i=1}^{n} p_i^2)(\sum_{i=1}^{n} 1) \overset{\text{Cauchy-Schwarz Inequality}}{\geq} (\sum_{i=1}^{n} p_i)^2 \).

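Dividing by \(n\) gives \(\sum_{i=1}^{n} p_i^2 \geq \frac{(\sum_{i=1}^{n} p_i)^2}{n}\), and hence \(MSE(\sum_{i=1}^{n} X_{i}) \leq MSE(Y)\). Equality would require all the \(p_i\) to be equal; since the \(p_i\) are distinct (so \(n \geq 2\)), the inequality is in fact strict.

Therefore, \(\sum_{i=1}^{n} X_{i}\) is a better estimator than \(Y\) with respect to Mean Square Error.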
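The size of the gap can also be written down explicitly. Subtracting the two MSE expressions above gives the following identity, which is just a rearrangement of those formulas:

\( MSE(Y) - MSE(\sum_{i=1}^{n} X_{i}) = \sum_{i=1}^{n} p_i^2 - n\bar{p}^2 = \sum_{i=1}^{n} (p_i - \bar{p})^2 \)

As an illustrative check (the values below are chosen only for this example and are not part of the problem), take \(n = 2\), \(p_1 = 0.2\), \(p_2 = 0.8\), so that \(\bar{p} = 0.5\):

\( MSE(\sum_{i=1}^{n} X_{i}) = 0.2(0.8) + 0.8(0.2) = 0.32 \)

\( MSE(Y) = 2(0.5)(0.5) = 0.5 \)

\( \sum_{i=1}^{n} (p_i - \bar{p})^2 = (-0.3)^2 + (0.3)^2 = 0.18 = 0.5 - 0.32 \)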

Let’s verify this as usual by simulation.

## Computation and Simulation

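```
library(statip)   # provides rbern()
library(Metrics)  # provides mse()
N = 10                # number of Bernoulli variables
R = 1000              # number of simulation runs
p = runif(N, 0, 1)    # success probabilities p_1, ..., p_N
X = rep(0, N)
vX = numeric(R)       # simulated values of sum of Xi
vY = numeric(R)       # simulated values of Y
for (j in 1:R)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])       # X_i ~ Bernoulli(p_i)
  }
  vX[j] = sum(X)                # sum of Xi random variables
  vY[j] = rbinom(1, N, mean(p)) # Y ~ Binomial(N, mean of p)
}
k = rep(sum(p), R)    # target value: sum of p_i
# No seed is set, so the values change from run to run.
mse(k, vX) #MSE of Sum Xi, e.g. 1.57966 in one run
mse(k, vY) #MSE of Y, e.g. 2.272519 in one run
```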
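The simulated values can be compared with the exact MSEs given by the formulas in the solution. The following is a minimal sketch in base R; the helper name `exact_mse` and the example vector `p_example` are assumptions made here for illustration and do not appear in the original code.

```
# Exact (theoretical) MSEs for a given probability vector p.
# exact_mse is a hypothetical helper name introduced for this sketch.
exact_mse <- function(p) {
  n <- length(p)
  p_bar <- mean(p)
  c(
    mse_X = sum(p * (1 - p)),         # Var(sum of Xi) = sum of p_i (1 - p_i)
    mse_Y = n * p_bar * (1 - p_bar),  # Var(Y) for Binomial(n, p_bar)
    gap   = sum((p - p_bar)^2)        # MSE(Y) - MSE(sum of Xi)
  )
}

# Illustrative probabilities, chosen only for this example
p_example <- c(0.2, 0.8)
res <- exact_mse(p_example)
print(res)
```

Calling `exact_mse(p)` with the vector `p` drawn in the simulation gives the values that the two `mse()` outputs should approach as the number of runs grows.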

In this run the simulated MSE of \(\sum_{i=1}^{n} X_{i}\) is smaller than that of \(Y\), which agrees with the theory. I hope it helps.
