
# Useless Data, Conditional Probability, and Independence | Cheenta Probability Series

The concepts of independence, conditional probability, and the information contained in data have always fascinated me. Here I share some thoughts on them.


When do you think some data is useless?

Some data/information is useless if it plays no role in understanding the hypothesis we are interested in.

We are interested in understanding the following problem.

### $$X$$ is some event. $$Y$$ is another event. How much information do $$Y$$ and $$X$$ give about each other?

We can model an event by a random variable. So, let’s reframe the problem as follows.

### $$X$$ and $$Y$$ are two random variables. How much information do $$Y$$ and $$X$$ give about each other?

There is a notion called entropy, but I will not go into that here; instead, I will take a purely probabilistic view. This is where conditional probability marches in. Using the information in $$Y$$ means conditioning on $$Y$$, so we will examine how $$X \mid Y$$ behaves.

How does $$X \mid Y$$ behave? If $$Y$$ has any effect on $$X$$, then the distribution of $$X \mid Y$$ should change, right?

But if $$Y$$ has no effect on $$X$$, then $$X \mid Y$$ does not change and remains the same as $$X$$. Mathematically, this means

## $$X \mid Y \sim X \iff X \perp \!\!\! \perp Y$$

Even after conditioning on $$Y$$, we cannot distinguish the conditional distribution of $$X$$ from the original one.
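To make this concrete, here is a small simulation sketch (my own, not from the original post; the values of `rho`, `y0`, and the band width are illustrative choices). We sample a standard bivariate normal pair $$(X, Y)$$ and average the values of $$X$$ for which $$Y$$ lands in a thin band around $$y_0$$: when $$\rho = 0$$ the conditional mean stays near $$E(X) = 0$$, while for $$\rho \neq 0$$ conditioning on $$Y$$ visibly shifts it.

```python
import random

random.seed(0)

def conditional_mean_x(rho, y0=1.0, band=0.1, n=100_000):
    """Estimate E(X | Y ~ y0) for a standard bivariate normal pair."""
    xs = []
    for _ in range(n):
        x = random.gauss(0, 1)
        # Y = rho*X + sqrt(1 - rho^2)*Z has correlation rho with X
        y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
        if abs(y - y0) < band:
            xs.append(x)
    return sum(xs) / len(xs)

print(conditional_mean_x(0.0))  # stays near 0: Y carries no information about X
print(conditional_mean_x(0.8))  # shifts toward rho * y0 = 0.8: conditioning matters
```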

## Theorem

$$X$$ and $$Y$$ are independent $$\iff$$ $$f(x,y) = P(X = x \mid Y = y)$$ is a function of $$x$$ alone.

#### Proof

$$\Rightarrow$$

$$X$$ and $$Y$$ are independent $$\Rightarrow$$ $$f(x,y) = P(X = x \mid Y = y) = P(X = x)$$, which is a function of $$x$$ alone.

$$\Leftarrow$$

Let $$\Omega$$ be the support of $$Y$$.

$$P(X = x \mid Y = y) = g(x) \Rightarrow$$

$$P(X = x) = \int_{\Omega} P(X = x \mid Y = y) \cdot P(Y = y) \, dy$$

$$= g(x) \int_{\Omega} P(Y = y) \, dy = g(x) = P(X = x \mid Y = y),$$

since $$\int_{\Omega} P(Y = y) \, dy = 1$$. (We write $$P(\cdot)$$ loosely for a density in the continuous case; in the discrete case the integral is replaced by a sum.)
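In the discrete case the $$\Leftarrow$$ step can be checked mechanically. Below is a sketch with made-up numbers (not from the post): if the conditional pmf of $$X$$ given $$Y = y$$ is the same function $$g$$ for every $$y$$, then summing out $$y$$ shows the marginal of $$X$$ is $$g$$ itself, and the joint factorises.

```python
from itertools import product

# g: conditional pmf of X given Y = y, deliberately free of y
g = {0: 0.2, 1: 0.5, 2: 0.3}
py = {"a": 0.6, "b": 0.4}                      # marginal pmf of Y
# joint pmf P(X = x, Y = y) = P(X = x | Y = y) * P(Y = y)
joint = {(x, y): g[x] * py[y] for x, y in product(g, py)}

# summing out y recovers g as the marginal of X ...
px = {x: sum(joint[(x, y)] for y in py) for x in g}
assert all(abs(px[x] - g[x]) < 1e-12 for x in g)
# ... and the joint factorises: X and Y are independent
assert all(abs(p - px[x] * py[y]) < 1e-12 for (x, y), p in joint.items())
print("conditional pmf free of y => joint factorises")
```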

## Exercises

1. $$(X,Y)$$ is bivariate standard normal with $$\rho = 0.5$$; then $$2X - Y \perp \!\!\! \perp Y$$.
2. $$X, Y, V, W$$ are independent standard normal; then $$\frac{VX + WY}{\sqrt{V^2+W^2}} \perp \!\!\! \perp (V,W)$$.

## Random Thoughts (?)

#### How to quantify the amount of information contained by a random variable in another random variable?

The information contained in $$X$$ is its entropy $$H(X)$$, defined by $$H(X) = E(-\log P(X))$$.

Now define the information of $$Y$$ contained in $$X$$ as $$|H(X) - H(X|Y)|$$.

Thus, it turns out that $$H(X) - H(X|Y) = E_{(X,Y)} \left( \log \frac{P(X \mid Y)}{P(X)} \right) = H(Y) - H(Y|X) = D(X,Y)$$.

$$D(X,Y)$$ = Amount of information contained in $$X$$ and $$Y$$ about each other.
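Here is a small discrete check of the identity above (the numbers are my own). Using the chain rule $$H(X|Y) = H(X,Y) - H(Y)$$, both sides reduce to $$H(X) + H(Y) - H(X,Y)$$, so $$D(X,Y)$$ is symmetric; in the information-theory literature this quantity is known as the mutual information $$I(X;Y)$$.

```python
from math import log

joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

def H(pmf):
    """Shannon entropy E(-log P) of a pmf given as {value: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

# conditional entropy via the chain rule: H(X|Y) = H(X, Y) - H(Y)
lhs = H(px) - (H(joint) - H(py))   # H(X) - H(X|Y)
rhs = H(py) - (H(joint) - H(px))   # H(Y) - H(Y|X)
print(round(lhs, 6), round(rhs, 6))  # the two sides agree
```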

## Exercise

- Prove that $$H(X) \geq H(f(X))$$.
- Prove that $$X \perp \!\!\! \perp Y \Rightarrow D(X,Y) = 0$$.
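For the second exercise, a numeric sanity check (my own numbers): when the joint pmf factorises, $$P(X = x \mid Y = y) = P(X = x)$$, so every log-ratio inside $$D(X,Y)$$ vanishes.

```python
from math import log

px = {0: 0.3, 1: 0.7}
py = {0: 0.5, 1: 0.5}
# independent pair: joint pmf is the product of the marginals
joint = {(x, y): px[x] * py[y] for x in px for y in py}

# D(X, Y) = E log( P(X|Y) / P(X) ), with P(x|y) = P(x, y) / P(y)
D = sum(p * log((p / py[y]) / px[x]) for (x, y), p in joint.items())
print(abs(D) < 1e-12)  # True: independence kills every log-ratio
```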

Note: This is just a mental construction of mine, and I am not sure whether this measure of information exists in the literature in this exact form. But I believe it is a natural construction, given the properties it satisfies. It would be helpful if you could find some existing literature and share it with me in the comments. I hope I have been able to share some statistical wisdom with you.