This post gives a detailed solution to ISI M.Stat 2019 PSB Problem 5, a problem based on the calculation of mean squared error, with a tinge of simulation and code.
Suppose \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables such that \(\mathrm{P}\left(X_{i}=1\right)=p_{i}=1-\mathrm{P}\left(X_{i}=0\right)\)
where \(p_{1}, p_{2}, \ldots, p_{n} \in(0,1)\) are all distinct and unknown. Consider \(X=\sum_{i=1}^{n} X_{i}\) and another random variable \(Y\) which is distributed as Binomial \((n, \bar{p}),\) where \(\bar{p}=\frac{1}{n} \sum_{i=1}^{n} p_{i} .\) Between \(X\) and \(Y,\) which is a better estimator of \(\sum_{i=1}^{n} p_{i}\) in terms of their respective mean squared errors?
Since each \(X_i\) is a Bernoulli\((p_i)\) random variable, linearity of expectation gives \(E\left(\sum_{i=1}^{n} X_{i}\right) = \sum_{i=1}^{n} p_{i}\), so \(X\) is unbiased for \(\sum_{i=1}^{n} p_{i}\).
Similarly, \(Y \sim \text{Binomial}(n, \bar{p})\), so \(E(Y) = n\bar{p} = \sum_{i=1}^{n} p_{i}\), and \(Y\) is unbiased as well.
Recall that if \(T\) is unbiased for \(\theta\), then \(\mathrm{MSE}(T) = \mathrm{Var}(T)\), so for both estimators the MSE reduces to the variance.
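This follows from the bias–variance decomposition of the mean squared error:
\( \mathrm{MSE}(T) = E(T-\theta)^{2} = \mathrm{Var}(T) + \left(E(T)-\theta\right)^{2}, \)
where the second term vanishes when \(E(T) = \theta\).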
\( \mathrm{MSE}\left(\sum_{i=1}^{n} X_{i}\right) = \mathrm{Var}\left(\sum_{i=1}^{n} X_{i}\right) = \sum_{i=1}^{n} \mathrm{Var}(X_{i}) = \sum_{i=1}^{n} p_i(1-p_i) = \sum_{i=1}^{n} p_i - \sum_{i=1}^{n} p_i^2, \)
where the second equality uses the independence of \(X_{1}, X_{2}, \ldots, X_{n}\).
\( \mathrm{MSE}(Y) = \mathrm{Var}(Y) = n\bar{p}(1 - \bar{p}) = \sum_{i=1}^{n} p_i - \frac{\left(\sum_{i=1}^{n} p_i\right)^2}{n}. \)
Observe that \( n\sum_{i=1}^{n} p_i^2 = \left(\sum_{i=1}^{n} p_i^2\right)\left(\sum_{i=1}^{n} 1^2\right) \geq \left(\sum_{i=1}^{n} p_i \cdot 1\right)^2 = \left(\sum_{i=1}^{n} p_i\right)^2 \) by the Cauchy–Schwarz inequality, applied to the vectors \((p_1, \ldots, p_n)\) and \((1, \ldots, 1)\).
Dividing by \(n\) gives \(\sum_{i=1}^{n} p_i^2 \geq \frac{\left(\sum_{i=1}^{n} p_i\right)^2}{n}\), and hence \(\mathrm{MSE}\left(\sum_{i=1}^{n} X_{i}\right) \leq \mathrm{MSE}(Y)\).
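In fact, the gap between the two MSEs has a clean closed form:
\( \mathrm{MSE}(Y) - \mathrm{MSE}\left(\sum_{i=1}^{n} X_{i}\right) = \sum_{i=1}^{n} p_i^2 - \frac{\left(\sum_{i=1}^{n} p_i\right)^2}{n} = \sum_{i=1}^{n} \left(p_i - \bar{p}\right)^2 \geq 0, \)
with equality only when all the \(p_i\) are equal. Since the \(p_i\) are assumed distinct, the inequality is in fact strict.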
Therefore, \(\sum_{i=1}^{n} X_{i}\) is a better estimator than \(Y\) with respect to mean squared error.
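As a quick sanity check, take \(n = 2\) with \(p_1 = 0.2\) and \(p_2 = 0.8\), so \(\bar{p} = 0.5\):
\( \mathrm{MSE}\left(\sum X_i\right) = 0.2 \cdot 0.8 + 0.8 \cdot 0.2 = 0.32, \qquad \mathrm{MSE}(Y) = 2 \cdot 0.5 \cdot 0.5 = 0.5, \)
and the gap \(0.5 - 0.32 = 0.18\) matches \((0.2 - 0.5)^2 + (0.8 - 0.5)^2 = 0.18\).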
Let's verify this as usual by simulation.
library(statip)   # provides rbern()
library(Metrics)  # provides mse()

N = 10
p = runif(N, 0, 1)  # the distinct, unknown p_i
X = rep(0, N)
vX = NULL
vY = NULL
for (j in 1:1000)
{
  for (i in 1:N)
  {
    X[i] = rbern(1, p[i])  # X_i ~ Bernoulli(p_i)
  }
  Z = sum(X)                 # sum of the X_i random variables
  Y = rbinom(1, N, mean(p))  # Y ~ Binomial(N, p-bar)
  vX = c(vX, Z)
  vY = c(vY, Y)
}
k = rep(sum(p), 1000)  # the estimand, sum of the p_i
mse(k, vX)  # MSE of sum X_i; one run gave 1.57966
mse(k, vY)  # MSE of Y; the same run gave 2.272519
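As an aside, the same experiment can be run without loops or extra packages. Here is a minimal base-R sketch under the same setup (the seed and the names reps and B are my choices, not part of the original code); it relies on the fact that rbinom() recycles the probability vector p, so with the matrix filled column by column, row i always uses p[i]:

set.seed(2019)  # for reproducibility; any seed works
N <- 10
reps <- 1000
p <- runif(N, 0, 1)
# Each column of B is one replication of (X_1, ..., X_N)
B <- matrix(rbinom(N * reps, size = 1, prob = p), nrow = N)
vX <- colSums(B)               # reps draws of sum(X_i)
vY <- rbinom(reps, N, mean(p)) # reps draws of Y ~ Binomial(N, p-bar)
mean((vX - sum(p))^2)          # empirical MSE of sum(X_i)
mean((vY - sum(p))^2)          # empirical MSE of Y

Both empirical MSEs should again come out close to their theoretical values, with vX beating vY.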
Either way, the simulation agrees with the theory. I hope it helps.