The ideas of independence, conditional probability, and the information one variable carries about another have always fascinated me. Here are some thoughts on the topic.

When do you think some data is useless?

Some data/information is useless if it plays no role in understanding the hypothesis we are interested in.

We are interested in understanding the following problem.

### \(X\) is some event. \(Y\) is another event. How much information do \(Y\) and \(X\) give about each other?

We can model an event by a random variable. So, let’s reframe the problem as follows.

### \(X\) and \(Y\) are two random variables. How much information do \(Y\) and \(X\) give about each other?

There is something called entropy, but I will not go into that here; instead, I will give a purely probabilistic view. This is where conditional probability marches in. Using the information in \(Y\) means conditioning on \(Y\), so we will study how \(X \mid Y\) behaves.

How does \( X \mid Y\) behave? If \(Y\) has any effect on \(X\), then the distribution of \(X \mid Y\) should change, right?

But if \(Y\) has no effect on \(X\), then \(X \mid Y\) does not change and remains distributed the same as \(X\). Mathematically, this means

## \( X \mid Y \sim X \iff X \perp \!\!\! \perp Y \)

We cannot distinguish the distribution of \(X\) before conditioning from the one after conditioning on \(Y\).
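A quick simulation sketch of this idea: if \(X\) and \(Y\) are independent, conditioning on an event about \(Y\) should leave the distribution of \(X\) unchanged. The specific choice of standard normals and the conditioning event \(\{Y > 1\}\) below are just illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)  # X ~ N(0, 1)
y = rng.standard_normal(n)  # Y ~ N(0, 1), generated independently of X

# Condition on an event about Y, e.g. {Y > 1}
x_given_y = x[y > 1]

# Since X is independent of Y, the conditional distribution of X
# should match its marginal: mean ~ 0, standard deviation ~ 1.
print(np.mean(x), np.std(x))
print(np.mean(x_given_y), np.std(x_given_y))
```

Both pairs of numbers come out approximately equal to \((0, 1)\), as the identity \(X \mid Y \sim X\) predicts.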

## Theorem

\(X\) and \(Y\) are independent \( \iff \) \( f(x,y) = P(X = x \mid Y = y) \) is a function of \(x\) only.

**Proof**

\( \Rightarrow\)

\(X\) and \(Y\) are independent \( \Rightarrow \) \( f(x,y) = P(X = x \mid Y = y) = P(X = x)\) is a function of \(x\) only.

\( \Leftarrow \)

Let \( \Omega \) be the support of \(Y\).

\( P(X = x \mid Y = y) = g(x) \Rightarrow \)

\( P(X = x) = \int_{\Omega} P(X = x \mid Y = y) \cdot P(Y = y)\, dy \)

\( = g(x) \int_{\Omega} P(Y = y)\, dy = g(x) = P(X = x \mid Y = y), \)

so \( P(X = x \mid Y = y) = P(X = x) \), i.e. \(X\) and \(Y\) are independent. \(\blacksquare\)
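The theorem can be checked concretely in the discrete case. Below is a small sketch with a hypothetical joint pmf built as an outer product of marginals (so \(X\) and \(Y\) are independent by construction); the conditional \(P(X = x \mid Y = y)\) then comes out the same for every \(y\), namely \(P(X = x)\).

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x, columns index y.
# Built as an outer product, so X and Y are independent by construction.
px = np.array([0.2, 0.3, 0.5])
py = np.array([0.4, 0.6])
joint = np.outer(px, py)          # joint[x, y] = P(X = x, Y = y)

# Conditional P(X = x | Y = y): divide each column by P(Y = y)
cond = joint / joint.sum(axis=0)  # column y holds P(X | Y = y)

# Per the theorem, every column is the same function of x: P(X = x)
print(cond)
```

Each column of `cond` equals `px`, confirming that the conditional distribution is a function of \(x\) only.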

## Exercises

- \((X, Y)\) is bivariate standard normal with \( \rho = 0.5\). Show that \( 2X - Y \perp \!\!\! \perp Y\).
- \(X, Y, V, W\) are independent standard normals. Show that \( \frac{VX + WY}{\sqrt{V^2+W^2}} \perp \!\!\! \perp (V,W) \).
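A Monte Carlo sanity check of the first exercise (a simulation sketch, not a proof): for jointly normal variables, zero correlation implies independence, so it suffices to check that \(2X - Y\) and \(Y\) are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
rho = 0.5

# Construct (X, Y) bivariate standard normal with correlation rho
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2

u = 2 * x - y
# Cov(2X - Y, Y) = 2*rho - 1 = 0, and (U, Y) is jointly normal,
# so zero correlation implies independence.
corr = np.corrcoef(u, y)[0, 1]
print(corr)  # ~ 0
```

The sample correlation is numerically close to zero, consistent with \(2X - Y \perp\!\!\!\perp Y\).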

## Random Thoughts (?)

#### How to quantify the amount of information contained by a random variable in another random variable?

Information contained in \(X\) = the entropy of the random variable, \(H(X)\), defined by \( H(X) = E[-\log P(X)] \).

Now define the information of \(Y\) contained in \(X\) as \( \lvert H(X) - H(X \mid Y) \rvert \).

Thus, it turns out that \( H(X) - H(X \mid Y) = E_{(X,Y)}\left[\log \frac{P(X \mid Y)}{P(X)}\right] = H(Y) - H(Y \mid X) = D(X,Y) \).

\(D(X,Y)\) = Amount of information contained in \(X\) and \(Y\) about each other.

## Exercise

- Prove that \(H(X) \geq H(f(X))\).
- Prove that \(X \perp \!\!\! \perp Y \Rightarrow D(X,Y) = 0\).
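The second exercise can also be checked numerically (a sketch, not a proof): for an independent joint pmf built as an outer product of the marginals, \(D(X,Y) = H(X) + H(Y) - H(X,Y)\) vanishes.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Independent joint pmf: outer product of hypothetical marginals
px = np.array([0.2, 0.3, 0.5])
py = np.array([0.4, 0.6])
joint = np.outer(px, py)

# D(X, Y) = H(X) + H(Y) - H(X, Y) = 0 under independence
d = entropy(px) + entropy(py) - entropy(joint.ravel())
print(d)  # 0 up to floating-point error
```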

**Note**: This is just a mental construction of mine, and I am not sure whether this measure of information exists in the literature. But I hope I have been able to share some statistical wisdom with you. I believe this is a natural construction, given that the properties above are satisfied. It would be helpful if you could find some existing literature on it and share it with me in the comments.