
The concepts of independence, conditional probability, and the information one quantity carries about another have always fascinated me, so I am sharing some thoughts on them here.

When do you think some data is useless?

Some data or information is useless if it plays no role in understanding the hypothesis we are interested in.

We are interested in understanding the following problem.

### $X$ is some event. $Y$ is another event. How much information do $Y$ and $X$ give about each other?

We can model an event by a random variable. So, let’s reframe the problem as follows.

### $X$ and $Y$ are two random variables. How much information do $Y$ and $X$ give about each other?

There is a notion called entropy, but I will not go into that just yet; instead, I will take a purely probabilistic view. This is where conditional probability marches in. To use the information in $Y$, we condition on $Y$ and ask how $X \mid Y$ behaves.

How does $X \mid Y$ behave? If $Y$ has any effect on $X$, then the distribution of $X \mid Y$ should change, right?

But if $Y$ has no effect on $X$, then $X \mid Y$ does not change and remains the same as $X$. Mathematically, this means

## $X \mid Y \sim X \iff X \perp \!\!\! \perp Y$

Even after conditioning on $Y$, we cannot distinguish the conditional distribution of $X$ from the original one.
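This is easy to see numerically. Here is a quick NumPy sketch (my own illustration, not part of the original argument): for an independent pair, the conditional frequencies of $X$ given $Y = 1$ match the unconditional ones, while for a dependent pair they shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent case: X and Y are separate fair coin flips.
x_ind = rng.integers(0, 2, n)
y_ind = rng.integers(0, 2, n)

# Dependent case: Y copies X with probability 0.9.
x_dep = rng.integers(0, 2, n)
copy = rng.random(n) < 0.9
y_dep = np.where(copy, x_dep, 1 - x_dep)

# Compare P(X = 1) with P(X = 1 | Y = 1) in each case.
p_x = x_ind.mean()
p_x_given_y = x_ind[y_ind == 1].mean()   # stays close to p_x
q_x = x_dep.mean()
q_x_given_y = x_dep[y_dep == 1].mean()   # shifts towards 0.9
```

Conditioning leaves the independent pair's frequencies untouched but visibly distorts the dependent pair's.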

## Theorem

$X$ and $Y$ are independent $\iff$ $f(x,y) = P(X = x \mid Y = y)$ is a function of $x$ alone.

#### Proof

$\Rightarrow$

$X$ and $Y$ are independent $\Rightarrow$ $f(x,y) = P(X = x \mid Y = y) = P(X = x)$ is a function of $x$ alone.

$\Leftarrow$

Let $\Omega$ be the support of $Y$.

$P(X = x \mid Y = y) = g(x) \Rightarrow$

$P(X = x) = \int_{\Omega} P(X = x \mid Y = y)\, P(Y = y)\, dy$

$= g(x) \int_{\Omega} P(Y = y)\, dy = g(x) = P(X = x \mid Y = y)$

So the conditional distribution of $X$ given $Y = y$ coincides with the marginal distribution of $X$ for every $y$, which is exactly independence.
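The discrete analogue of this proof can be checked directly. Below is a small sketch of mine (the particular pmfs are made up for illustration): if every column of conditional probabilities $P(X = x \mid Y = y)$ equals the same function $g(x)$, then marginalising over $Y$ recovers $g$, and the joint pmf factorises.

```python
import numpy as np

# Hypothesis of the theorem: P(X=x | Y=y) = g(x) for every y.
g = np.array([0.3, 0.7])             # g(x), a pmf in x alone
p_y = np.array([0.2, 0.5, 0.3])      # marginal pmf of Y

# Then the joint pmf is P(X=x, Y=y) = g(x) * P(Y=y).
joint = np.outer(g, p_y)

# Marginalising over Y recovers g, exactly as in the proof ...
p_x = joint.sum(axis=1)
# ... and the joint factorises into the product of its marginals,
# which is the definition of independence.
factorises = np.allclose(joint, np.outer(p_x, p_y))
```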

## Exercises

1. $(X,Y)$ is a bivariate standard normal with $\rho = 0.5$, then $2X - Y \perp \!\!\! \perp Y$.
2. $X, Y, V, W$ are independent standard normal, then $\frac{VX + WY}{\sqrt{V^2+W^2}} \perp \!\!\! \perp (V,W)$.
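As a sanity check on Exercise 1 (a Monte Carlo sketch of mine, not a proof): for jointly normal variables, zero correlation is equivalent to independence, and $\mathrm{Cov}(2X - Y,\, Y) = 2\rho - 1 = 0$ when $\rho = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.5

# Sample a bivariate standard normal with correlation rho = 0.5.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Cov(2X - Y, Y) = 2*rho - 1 = 0, so the sample correlation
# should be near zero; for Gaussians that implies independence.
z = 2 * x - y
corr = np.corrcoef(z, y)[0, 1]
```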

## Random Thoughts (?)

#### How to quantify the amount of information contained by a random variable in another random variable?

The information contained in $X$ is measured by its entropy $H(X)$, defined by $H(X) = E[-\log P(X)]$.
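For a discrete random variable this expectation is just a finite sum over the pmf. A minimal implementation (my own helper, using base-2 logs so entropy is in bits):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = E[-log2 P(X)] of a discrete pmf, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

h_fair = entropy([0.5, 0.5])   # a fair coin carries 1 bit
h_sure = entropy([1.0])        # a certain outcome carries no information
```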

Now define the information of $Y$ contained in $X$ as $\lvert H(X) - H(X \mid Y) \rvert$.

Thus, it turns out that $H(X) - H(X \mid Y) = E_{(X,Y)}\left[\log \frac{P(X \mid Y)}{P(X)}\right] = H(Y) - H(Y \mid X) = D(X,Y)$.

$D(X,Y)$ = Amount of information contained in $X$ and $Y$ about each other.

## Exercises

1. Prove that $H(X) \geq H(f(X))$.
2. Prove that $X \perp \!\!\! \perp Y \Rightarrow D(X,Y) = 0$.
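A numerical companion to Exercise 2 (again my own sketch, with made-up marginals): under independence the joint pmf is the outer product of the marginals, and then $D(X,Y) = H(X) + H(Y) - H(X,Y) = 0$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete pmf (any shape), in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Independent X and Y: joint pmf = outer product of the marginals.
p_x = np.array([0.3, 0.7])
p_y = np.array([0.6, 0.4])
joint = np.outer(p_x, p_y)

# D(X, Y) = H(X) + H(Y) - H(X, Y) vanishes under independence.
d = entropy(p_x) + entropy(p_y) - entropy(joint)
```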

Note: This is just a construction of my own, and I am not sure whether this measure of information exists in the literature. But I believe it is a natural construction, given the properties it satisfies, and I hope I have been able to share some statistical wisdom with you. If you come across related literature, please share it with me in the comments.