Conditions and Chance | ISI MStat 2018 PSB Problem 5

This problem is a cute application of joint distributions and conditional probability. This is Problem 5 from the ISI MStat 2018 PSB.

Problem

Suppose \(X_{1}\) and \(X_{2}\) are identically distributed random variables, not necessarily independent, taking values in \(\{1,2\}\). If \(\mathrm{E}\left(X_{1} X_{2}\right)= \frac{7}{3} \) and \(\mathrm{E}\left(X_{1}\right) = \frac{3}{2},\) obtain the joint distribution of \(\left(X_{1}, X_{2}\right)\).

Prerequisites

Solution

This problem is mainly about crunching the algebra of the given conditions to obtain equations that lead you straight to the solution.

Usually, we go forward from the distributions of \(X_1\) and \(X_2\) to the distribution of \((X_1, X_2)\). Here, we will go backward from the joint distribution of \((X_1, X_2)\) to \(X_1\), \(X_2\), and \(X_1X_2\) with the help of conditional probability.

Let \(p_{ij} = P(X_1 = i, X_2 = j)\), \(i, j \in \{1, 2\}\), denote the joint probabilities.

Now, observe \(p_{21} = p_{12}\) because \(X_1\) and \(X_2\) are identically distributed.

Let's calculate the following:

\(P(X_1 = 1 )= p_{11} + p_{12} = P(X_2 = 1)\)

\(P(X_1 = 2) = p_{12} + p_{22} = P(X_2 = 2)\)

\(E(X_1) = p_{11} + 3p_{12} + 2p_{22} = \frac{3}{2}\)

Now, \(X_1X_2\) can take values {\(1, 2, 4\)}.

\(X_1 = 1, X_2 = 1 \iff X_1X_2 = 1\) \( \Rightarrow P(X_1X_2 = 1) = p_{11}\).

\(X_1 = 2, X_2 = 2 \iff X_1X_2 = 4\) \( \Rightarrow P(X_1X_2 = 4) = p_{22}\).

\(X_1 = 1, X_2 = 2\) or \(X_1 = 2, X_2 = 1 \iff X_1X_2 = 2\) \( \Rightarrow P(X_1X_2 = 2) = 2p_{12}\).

\(E(X_1X_2) = p_{11} + 4p_{12} + 4p_{22} = \frac{7}{3}\).

Now, we need one more condition. Do you see it?

\(p_{11} + 2p_{12} + p_{22} = 1\).

Now, you can solve this system easily to get the solution \( p_{11} = \frac{1}{3}, p_{12} = \frac{1}{6}, p_{22} =\frac{1}{3} \).
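As a numerical cross-check (a sketch, not part of the original solution; it assumes NumPy is available), the three linear conditions can be solved directly:

```python
import numpy as np

# The three conditions as a linear system in (p11, p12, p22):
# total probability, E(X1) = 3/2, and E(X1 * X2) = 7/3.
A = np.array([[1.0, 2.0, 1.0],   # p11 + 2*p12 +   p22 = 1
              [1.0, 3.0, 2.0],   # p11 + 3*p12 + 2*p22 = 3/2
              [1.0, 4.0, 4.0]])  # p11 + 4*p12 + 4*p22 = 7/3
b = np.array([1.0, 3 / 2, 7 / 3])
p11, p12, p22 = np.linalg.solve(A, b)
print(p11, p12, p22)  # ≈ 1/3, 1/6, 1/3
```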

Food for Thought

Now, what do you think: how many expected values will be required if \(X_1\) and \(X_2\) take values in \(\{1, 2, 3\}\)?

What if \(X_1\) and \(X_2\) take values in \(\{1, 2, 3, 4, \ldots, n\}\)?

What if there are \(X_1, X_2, \ldots, X_n\) taking values in \(\{1, 2, 3, 4, \ldots, m\}\)?

This is just another beautiful counting problem.

Enjoy and Stay Tuned!

Application of Cauchy Functional Equations | ISI MStat 2019 PSB Problem 4

This problem is a beautiful application of probability theory and the Cauchy functional equation. This is Problem 4 from the ISI MStat 2019 PSB.

Problem - Application of Cauchy Functional Equations

Let \(X\) and \(Y\) be independent and identically distributed random variables with mean \(\mu>0\) and taking values in {\(0,1,2, \ldots\)}. Suppose, for all \(m \geq 0\)
$$
\mathrm{P}(X=k | X+Y=m)=\frac{1}{m+1}, \quad k=0,1, \ldots, m
$$
Find the distribution of \(X\) in terms of \(\mu\).

Prerequisites

Solution

Let \( P(X = i) = p_i\), where \( \sum_{i=0}^{\infty} p_i = 1 \). Now, let's calculate \(P(X+Y = m)\).

$$P(X+Y = m) = \sum_{i=0}^{m} P(X+Y = m, X = i) = \sum_{i=0}^{m} P(Y = m-i, X = i) = \sum_{i=0}^{m} p_ip_{m-i}$$.

$$P( X = k|X+Y = m) = \frac{P( X = k, X+Y = m)}{P(X+Y = m)} = \frac{P( X = k, Y = m-k)}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{p_kp_{m-k}}{\sum_{i=0}^{m} p_ip_{m-i}} = \frac{1}{m+1}$$.

Hence,$$ \forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0$$.

Thus, we get the following set of equations.

$$ p_0p_2 = p_1^2$$ $$ p_0p_3 = p_1p_2$$ Hence, by the third prerequisite, \(p_0, p_1, p_2, p_3\) are in geometric progression.

Observe that as a result we get \( p_1p_3 = p_2^2 \). Next in line is:

$$ p_1p_4 = p_2p_3.$$ Thus, in the same way, we get that \(p_1, p_2, p_3, p_4\) are in geometric progression.

Hence, by induction, we will get that \(p_k; k \geq 0\) form a geometric progression.

This is only possible if \(X, Y\) ~ Geom(\(p\)) for some \(p\). We need to find \(p\) now; note that here \(X\) counts the number of failures, and \(p\) is the probability of success.

So, \(E(X) = \frac{1-p}{p} = \mu \Rightarrow p = \frac{1}{\mu +1}\).
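The derived identity can be checked numerically. The sketch below (the success probability \(p = 0.3\) is an arbitrary choice) verifies that for i.i.d. failure-counting geometric variables, \(P(X=k \mid X+Y=m) = \frac{1}{m+1}\) for every \(k \leq m\):

```python
# For X, Y i.i.d. with P(X = k) = p * (1-p)^k (failures before first success),
# check that P(X = k | X + Y = m) = 1/(m+1) for every k <= m.
p = 0.3  # arbitrary success probability for the check
q = 1 - p

def pmf(k):
    return p * q ** k

for m in range(8):
    denom = sum(pmf(i) * pmf(m - i) for i in range(m + 1))
    for k in range(m + 1):
        cond = pmf(k) * pmf(m - k) / denom
        assert abs(cond - 1 / (m + 1)) < 1e-12
print("P(X = k | X + Y = m) is uniform for every m checked")
```

The check works exactly because \(p_k p_{m-k} = p^2 q^m\) does not depend on \(k\), which is precisely the geometric progression structure derived above.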

Challenge Problem

So, can you guess what its continuous version will be?

It will be the exponential distribution. Prove it. But, what exactly is governing this exponential structure? What is the intuition behind it?

The Underlying Mathematics and Intuition

Observe that the obtained condition

$$ \forall m \geq 0, p_0p_m =p_1p_{m-1} = \dots = p_mp_0.$$ can be written as follows

Find all functions \(f: \mathbf{N}_0 \to [0,1]\) such that \(f(m)f(n) = f(m+n)\) for all \(m, n\), subject to the restriction that \(\sum_{m=0}^{\infty} f(m) = 1\).

The only solution to this is the geometric progression structure. This is a variant of the Cauchy functional equation. For the continuous case, the solution is the exponential distribution.

Essentially, this is the functional equation that arises, if you march along to prove that the Geometric Random Variable is the only discrete distribution with the memoryless property.

Stay Tuned!

Central Limit Theorem by Simulation ( R Studio)

This post verifies the central limit theorem with the help of simulation in R for the Bernoulli, uniform, and Poisson distributions.

Central Limit Theorem

Mathematically, if \(X_1, X_2, \ldots, X_n\) are random samples taken from a population with mean \(\mu\) and finite variance \(\sigma^2\), and \(\bar{X}\) is the sample mean, then \(Z = \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} \to N(0,1) \) in distribution.

Simulation

Pseudocode

N # Number of trials (population size)
n # Number of simulations
standardized_sample_mean = rep(0,n)
EX #Expectation
VarX #Variance
  for (i in 1:n){
    samp #Sample from any distribution
    sample_mean <- mean(samp) # Sample mean
    standardized_sample_mean[i] <- sqrt(N)*(sample_mean - EX)/sqrt(VarX)
#Standardized Sample Mean
  }
hist(standardized_sample_mean,prob=TRUE) #Histogram
qqnorm(standardized_sample_mean) #QQPlot

Bernoulli \(\frac{1}{2}\)

N <- 2000 # Number of trials (population size)
n <- 1000 # Number of simulations
standardized_sample_mean = rep(0,n)
EX <- 0.5
VarX <- 0.25
  for (i in 1:n){
    samp <- rbinom(N, size = 1, prob = 0.5) # N Bernoulli(1/2) draws
    sample_mean <- mean(samp) # sample mean
    standardized_sample_mean[i] <- sqrt(N)*(sample_mean - EX)/sqrt(VarX)
  }
par(mfrow=c(1,2))
hist(standardized_sample_mean,prob=TRUE)
qqnorm(standardized_sample_mean)

Uniform \((0,1)\)

N <- 2000 # Number of trials (population size)
n <- 1000 # Number of simulations
standardized_sample_mean = rep(0,n)
EX <- 0.5
VarX <- 1/12 # variance of Uniform(0,1)
  for (i in 1:n){
    samp <- runif( N, 0, 1)
    sample_mean <- mean(samp) # sample mean
    standardized_sample_mean[i] <- sqrt(N)*(sample_mean - EX)/sqrt(VarX)
  }
par(mfrow=c(1,2))
hist(standardized_sample_mean,prob=TRUE)
qqnorm(standardized_sample_mean)

Poisson(1)

N <- 2000 # Number of trials (population size)
n <- 1000 # Number of simulations
standardized_sample_mean = rep(0,n)
EX <- 1
VarX <- 1
  for (i in 1:n){
    samp <- rpois(N,1)
    sample_mean <- mean(samp) # sample mean
    standardized_sample_mean[i] <- sqrt(N)*(sample_mean - EX)/sqrt(VarX)
  }
par(mfrow=c(1,2))
hist(standardized_sample_mean,prob=TRUE)
qqnorm(standardized_sample_mean)
Central Limit Theorem by simulation graph

Exercise

Try for other distributions and mixtures and play around and verify yourself.

Stay Tuned! Stay Blessed!

Elchanan Mossel's Dice Paradox | ISI MStat 2018 PSB Problem 6

This problem from ISI MStat 2018 PSB (Problem 6) is known as Elchanan Mossel's Dice Paradox. The problem has a paradoxical flavour, but there is always a way out.

Problem

A fair 6-sided die is rolled repeatedly until a 6 is obtained. Find the expected number of rolls conditioned on the event that none of the rolls yielded an odd number.

Prerequisites

Solution

The Wrong Solution

Let \(X_{1}, X_{2}, \cdots\) be the throws of a die. Let
$$
{T}=\min\{{n: X_{n}=6}\}
$$

Then \(T\) ~ Geo(\(p =\frac{1}{6}\))

But, here it is given that none of the rolls are odd numbers. So,

$$
{T}=\min\{{n: X_{n}=6} | X_n = \text{even}\} = \min\{{n: X_{n}=6} | X_n = \{2, 4, 6\}\}
$$

Then \(T\) ~ Geo(\(p =\frac{1}{3}\)), since there are three possibilities in the reduced (conditional) sample space.

So, \(E(T) =3\).

Obviously, this is false. But you are not seeing why it is false, right? Scroll down!

Where Did It Go Wrong?

It went wrong in reading the given condition of the problem. Observe that we are told that none of the rolls up to the roll on which you got the 6 is odd, not that this also holds for all the rolls beyond that.

So, $$
{T}=\min\{{n: X_{n}=6} | X_n = \text{even}, n \leq T\} = \min\{{n: X_{n}=6} | X_n = \{2, 4, 6\}, n \leq T\}
$$

So, we see that the sample space did not get reduced all along; it got reduced only up to the roll at which we stopped. This is where the paradox marches in.

We tend to think of the experiment as if we were picking only from \( \{ 2, 4, 6\} \) on every roll. No!

The Elegant One Liner Solution

The idea is to think from a different perspective, as is the case with every elegant solution. Let's reconstruct the experiment in a different way. It goes as follows. Remember, we need to exclude the odd numbers, so just throw them away and start anew.

Idea

If you get \(\{1, 3, 5\}\), start counting the number of rolls again from the beginning. Stop when you get a 6. This is the exact formulation of the waiting time to get a 6 without getting any odd number up to that toss. In this experiment, we stop at the first roll whose outcome lies in \(\{1, 3, 5, 6\}\).

Mathematical Form

Let \(\tau\) be the time required to get an outcome different from \(\{2,4\}\). Then \(E(\tau | X_{\tau}=j)\) does not depend on \(j\) for \(j \in \{1,3,5,6\}\), since by symmetry it is the same for each such \(j\). Thus, by the smoothing property of conditional expectation, \(E\left(\tau | X_{\tau}=j\right)=E(\tau)\).

Observe, \(\tau\) ~ Geo( \( p = \frac{4}{6}\)). Hence, \( E(\tau) = \frac{3}{2}\).

The Bigger Bash Solution

\(T=\min \{n: X_{n}=6\}\)
We need to calculate \( \mathbb{E}(T | X_{1}, \cdots, X_T \in \{2,4,6\})\).

For that, we need to find the conditional probabilities \(\mathrm{P}\left(T=k \,|\, X_{1}, \cdots, X_{T} \in \{2,4,6\}\right)\), and that is given by
$$
\frac{\mathrm{P}\left(T=k \cap\left(X_{1}, \cdots, X_{T} \in \{2,4,6\}\right)\right)}{\mathrm{P}\left(X_{1}, \cdots, X_{T} \in \{2,4,6\} \right)}=\frac{\mathrm{P}\left(X_{k}=6,\; X_{1}, \cdots, X_{k-1} \in \{2,4\} \right)}{\mathrm{P}\left(X_{1}, \cdots, X_{T} \in \{2,4,6\} \right)}=\frac{1}{6}\left(\frac{1}{3}\right)^{k-1} \frac{1}{\alpha}
$$
where \(\alpha=\mathrm{P}\left(X_{1}, \cdots, X_{T} \in \{2,4,6\} \right)\). Thus \(T \,|\,\left(X_{1}, \cdots, X_{T} \in \{2,4,6\} \right)\) follows a geometric distribution with parameter \(\frac{2}{3}\), and consequently its expectation is \(\frac{3}{2}\).
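As a sanity check on the conditional pmf above, \(\alpha\) and the conditional expectation can be computed by truncating the geometric series (a quick sketch; the truncation point is arbitrary):

```python
# Truncated-series check of the conditional distribution of T given that
# every roll up to T was even: P(T = k | ...) = (1/6) * (1/3)^(k-1) / alpha.
K = 200  # truncation point; the tail beyond this is negligible
alpha = sum((1 / 6) * (1 / 3) ** (k - 1) for k in range(1, K))
expectation = sum(k * (1 / 6) * (1 / 3) ** (k - 1) / alpha for k in range(1, K))
print(alpha, expectation)  # ≈ 0.25 and 1.5
```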

Stay Tuned! Stay Blessed!


Simulation in Python

import random

times = 0 #number of times a successful (all-even) sequence was rolled
rolls = 0 #total of all number of rolls it took to get a 6, on successful sequences
curr = 0
alleven = True

for x in range(0, 100000):

  num = random.randint(1,6)
  if num % 2 != 0:
    alleven = False
  else:
    if num == 6:
      if alleven:
        times += 1
        rolls += curr + 1
      curr = 0
      alleven = True
    else:
      curr += 1

print(rolls * 1.0 / times)
#1.51506456241

Source: Math StackExchange

Stay Tuned! Stay Blessed!

Intertwined Conditional Probability | ISI MStat 2016 PSB Problem 4

This is an interesting problem intertwining conditional probability and Bernoulli random variables, which gives a sweet and sour taste to Problem 4 of ISI MStat 2016 PSB.

Problem

Let \(X, Y,\) and \(Z\) be three Bernoulli \(\left(\frac{1}{2}\right)\) random variables such that \(X\) and \(Y\) are independent, \(Y\) and \(Z\) are independent, and \(Z\) and \(X\) are independent.
(a) Show that \(\mathrm{P}(X Y Z=0) \geq \frac{3}{4}\).
(b) Show that if equality holds in (a), then $$
Z=
\begin{cases}
1 & \text { if } X=Y, \\
0 & \text { if } X \neq Y\\
\end{cases}
$$

Prerequisites

Solution

(a)

\( P(XYZ = 0) = P( \{ X = 0\} \cup \{Y = 0\} \cup \{Z = 0\}) \)

$$= P(X = 0) + P(Y = 0) + P(Z= 0) - P({ X = 0} \cap {Y = 0}) - P({Y = 0} \cap {Z= 0}) - P({X = 0} \cap {Z= 0}) + P({X = 0} \cap {Y = 0} \cap {Z= 0}). $$

We use the fact that \(X\) and \(Y\) are independent, \(Y\) and \(Z\) are independent, and \(Z\) and \(X\) are independent.

$$= P(X = 0) + P(Y = 0) + P(Z= 0) - P({ X = 0})P({Y = 0}) - P({Y = 0})P({Z= 0}) - P({X = 0})P({Z= 0}) + P({X = 0},{Y = 0},{Z= 0})$$.

\(X, Y,\) and \(Z\) are Bernoulli \(\left(\frac{1}{2}\right)\) random variables. Hence,

\( P(XYZ = 0) = \frac{3}{4} + P({X = 0},{Y = 0},{Z= 0}) \geq \frac{3}{4}\).

(b)

\( P(XYZ = 0) = \frac{3}{4} \iff P({X = 0},{Y = 0},{Z= 0}) = 0 \).

Now, this is just a logical game with conditional probability.

\( P({X = 0} |{Y = 0},{Z= 0}) = 0 \Rightarrow P({Z= 0} |{Y = 0},{X = 1}) = 1\).

\( P({Y = 0} |{X = 0},{Z= 0}) = 0 \Rightarrow P({Z= 0} |{X = 0},{Y = 1}) = 1\).

\( P({Z = 0} |{X = 0},{Y= 0}) = 0 \Rightarrow P({Z = 1} |{X = 0},{Y= 0}) = 1\).

\( P( Z = 0) = P({X = 1},{Y = 0},{Z= 0}) + P({X = 0},{Y = 1},{Z= 0}) + P({X = 1},{Y = 1},{Z= 0}) + P({X = 0},{Y = 0},{Z= 0})\)

\( = \frac{1}{4} + \frac{1}{4} + P({X = 1},{Y = 1},{Z= 0}) \).

Now, \(Z\) is a Bernoulli \(\left(\frac{1}{2}\right)\) random variable. So, \(P(Z = 0) =\frac{1}{2}\) \( \Rightarrow P({X = 1},{Y = 1},{Z= 0}) = 0 \Rightarrow P({Z = 0} | {Y = 1},{X= 1}) = 0 \).

\( P({Z= 0} |{Y = 0},{X = 1}) = 1\).

\(P({Z= 0} |{X = 0},{Y = 1}) = 1\).

\(P({Z = 1} |{X = 0},{Y= 0}) = 1\).

\( P({Z = 1} | {Y = 1},{X= 1}) = 1\).

Hence, $$
Z=
\begin{cases}
1 & \text { if } X=Y, \\
0 & \text { if } X \neq Y\\
\end{cases}
$$.
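Since the equality case pins \(Z\) down as the indicator of \(X = Y\), part (a)'s equality and the pairwise independence can be verified by direct enumeration (a sketch):

```python
from itertools import product

# X, Y i.i.d. Bernoulli(1/2); Z = 1 if X == Y, else 0.
# Each of the four (x, y) outcomes has probability 1/4.
outcomes = [(x, y, int(x == y)) for x, y in product((0, 1), repeat=2)]

p_xyz_zero = sum(1 for x, y, z in outcomes if x * y * z == 0) / 4
assert p_xyz_zero == 3 / 4  # equality in part (a)

# Z is Bernoulli(1/2) and pairwise independent of X (and, by symmetry, of Y):
p_z1 = sum(1 for _, _, z in outcomes if z == 1) / 4
p_x1_z1 = sum(1 for x, _, z in outcomes if x == 1 and z == 1) / 4
assert p_z1 == 1 / 2 and p_x1_z1 == 1 / 4  # P(X=1, Z=1) = P(X=1) P(Z=1)
print("equality case verified")
```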

Venny Venny AMy GMy | ISI MStat 2016 PSB Problem 3

This problem is a very basic and cute application of set theory, Venn diagrams, and the AM-GM inequality to solve ISI MStat 2016 PSB Problem 3.

Problem - Venn diagram and AM GM inequality

For any two events \(A\) and \(B\), show that
$$
(\mathrm{P}(A \cap B))^{2}+\left(\mathrm{P}\left(A \cap B^{c}\right)\right)^{2}+\left(\mathrm{P}\left(A^{c} \cap B\right)\right)^{2}+\left(\mathrm{P}\left(A^{c} \cap B^{c}\right)\right)^{2} \geq \frac{1}{4}
$$

Prerequisites

Solution

Draw the Venn Diagram

venn diagram and am gm inequality problem

P(region Red) = \(Y\)

P(region Blue) = \(Z\)

P(region Grey) = \(W\)

P(region Brown) = \(X\)

Observe that \( W + X + Y + Z = 1\). \( W, X, Y, Z \geq 0\).

Now, Calculate Given Probability of Sets in terms of \( W, X, Y, Z \).

\({P}(A \cap B) = Z\).

\({P}\left(A \cap B^{c}\right) = Y\).

\({P}\left(A^{c} \cap B\right) = W\).

\( {P}\left(A^{c} \cap B^{c}\right) = X\).

The Final Inequality

\( W, X, Y, Z \geq 0\).

\( W + X + Y + Z = 1\).

Observe that \( 3(W^2 + X^2 + Y^2 + Z^2) = (W^2+X^2) + (W^2+Y^2) + (W^2+Z^2) + (X^2+Y^2) + (X^2+Z^2) + (Y^2+Z^2)\).

\( 3(W^2 + X^2 + Y^2 + Z^2) \geq 2WX + 2WY + 2WZ + 2XY + 2XZ + 2YZ \) by AM - GM Inequality.

\( \Rightarrow 4(W^2 + X^2 + Y^2 + Z^2) \geq (W + X + Y + Z)^2 = 1\).

\( \Rightarrow (W^2 + X^2 + Y^2 + Z^2) \geq \frac{1}{4} \).

Hence,

$$
(\mathrm{P}(A \cap B))^{2}+\left(\mathrm{P}\left(A \cap B^{c}\right)\right)^{2}+\left(\mathrm{P}\left(A^{c} \cap B\right)\right)^{2}+\left(\mathrm{P}\left(A^{c} \cap B^{c}\right)\right)^{2} \geq \frac{1}{4}
$$
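A quick numerical spot check of the inequality on random probability vectors (a sketch only; it does not replace the proof above):

```python
import random

# Sample random (w, x, y, z) with w + x + y + z = 1 and check the bound
# w^2 + x^2 + y^2 + z^2 >= 1/4; equality holds at (1/4, 1/4, 1/4, 1/4).
random.seed(0)
for _ in range(10_000):
    v = [random.random() for _ in range(4)]
    total = sum(v)
    v = [t / total for t in v]
    assert sum(t * t for t in v) >= 1 / 4 - 1e-12
print("sum of squares >= 1/4 on all sampled probability vectors")
```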

Correlation of two ab(Normals) | ISI MStat 2016 PSB Problem 6

This problem is an interesting application of the moment generating function of the normal random variable, showing how correlation behaves under a monotone function. This is Problem 6 from the ISI MStat 2016 PSB.

Problem

Suppose that random variables \(X\) and \(Y\) jointly have a bivariate normal distribution with \(\mathrm{E}(X)=\mathrm{E}(Y)=0, {Var}(X)={Var}(Y)=1,\) and
correlation \(\rho\). Compute the correlation between \(e^{X}\) and \(e^{Y}\).

Prerequisites

Solution

\(M_X(t) = E(e^{tX})\) is called the moment generating function.

Now, let's try to calculate \( Cov(e^X, e^Y) = E(e^{X+Y}) - E(e^X)E(e^Y)\)

For that, we need the following in our arsenal.

\( \sigma^2 = Var(X+Y) = Var(X) + 2Cov(X,Y) + Var(Y) = 1 + 2\rho + 1 = 2(1+\rho) \), so \(X + Y \sim N(0, 2(1+\rho))\) and \(E(e^{X+Y}) = M_{X+Y}(1) = e^{\frac{\sigma^2}{2}} = e^{1+\rho}\). Hence, \(Cov(e^X, e^Y) = e^{1+\rho} - e^{\frac{1}{2}}e^{\frac{1}{2}} = e(e^{\rho}-1)\).

Now observe the following:

Important Observation

\( Cor(e^X, e^Y)\) and \(Cor(X,Y) = \rho \) always have the same sign. Can you guess why? There is, in fact, a general result, which we will mention soon.

Now, we are left to calculate \(Var(e^X) = Var(e^Y)\).

\(Var(e^X) = E(e^{2X}) - (E(e^{X}))^2 = M_X(2) - M_X(1)^2 = e^{\frac{4}{2}} - (e^{\frac{1}{2}})^2 = e^2 - e^1 = Var(e^Y)\).

Therefore, \( Cor(e^X, e^Y) = \frac{e(e^{\rho} - 1)}{e^2 - e^1} =\frac{e^{\rho} - 1}{e - 1} \).

Observe that the minimum correlation of \(e^X\) and \(e^Y\) is \(\frac{-1}{e}\), attained at \(\rho = -1\).
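The formula can be checked by Monte Carlo simulation (a sketch; the value \(\rho = 0.6\), the seed, and the sample size are arbitrary choices):

```python
import numpy as np

# Simulate bivariate normal (mean 0, unit variances, correlation rho)
# and compare Cor(e^X, e^Y) with the closed form (e^rho - 1)/(e - 1).
rng = np.random.default_rng(0)
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T
simulated = np.corrcoef(np.exp(X), np.exp(Y))[0, 1]
exact = (np.exp(rho) - 1) / (np.e - 1)
print(simulated, exact)  # the two should be close
```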

Back to the important observation

\( Cor(e^X, e^Y)\) and \(Cor(X,Y) = \rho \) always have the same sign. Why is this true?

Because, \( f(x) = e^x\) is an increasing function. So, if \(X\) and \(Y\) are positively correlated then, as \(X\) increases, \(Y\) also increases in general, hence, \(e^X\) also increases along with \(e^Y\) hence, the result, which is quite intuitive.

Observe that in place of \( f(x) = e^x \), if we had taken any increasing function \(f(x)\), this would still be the case. Can you prove it?

Research Problem of the day ( Is the following true? )

Let \(f(x)\) be an increasing function of \(x\); then \(Cor(f(X), f(Y))\) and \(Cor(X, Y)\) always have the same sign.

Discover the Covariance | ISI MStat 2016 Problem 6

This problem from ISI MStat 2016 is an application of indicator variables, independence, and the covariance of two sums of random variables.

Problem- Covariance Problem

Let \(X_{1}, \ldots, X_{n}\) ~ \(X\) be i.i.d. random variables from a continuous distribution whose density is symmetric around 0. Suppose \(E\left(\left|X\right|\right)=2\) . Define \( Y=\sum_{i=1}^{n} X_{i} \quad \text { and } \quad Z=\sum_{i=1}^{n} 1\left(X_{i}>0\right)\).
Calculate the covariance between \(Y\) and \(Z\).

This problem is from ISI MStat 2016 (Problem #6)

Prerequisites

  1. X has Symmetric Distribution around 0 \( \Rightarrow E(X) = 0\).
  2. \( |X| = X.1( X > 0 ) - X.1( X \leq 0 ) = 2X.1( X > 0 ) - X\), where \(X\) is a random variable.
  3. \( X_i\) and \(X_j\) are independent \(\Rightarrow\) \(g( X_i)\) and \(f(X_j)\) are independent.
  4. \(A\) and \(B\) are independent \(\Rightarrow Cov(A,B) = 0\).

Solution

\( 2 = E(|X|) = E(X.1(X >0)) - E(X.1(X \leq 0)) = E(2X.1( X > 0 )) - E(X) = 2E(X.1( X > 0 ))\)

\( \Rightarrow E(X.1( X > 0 )) = 1 \overset{E(X) = 0}{\Rightarrow} Cov(X, 1( X > 0 )) = 1\).

Let's calculate the covariance of \(Y\) and \(Z\).

\( Cov(Y, Z) = \sum_{i,j = 1}^{n} Cov( X_i, 1(X_{j}>0))\)

\( = \sum_{i = 1}^{n} Cov( X_i, 1(X_{i}>0)) + \sum_{i,j = 1, i \neq j}^{n} Cov( X_i, 1(X_{j}>0)) \)

\( \overset{X_i \text{ and } X_j \text{ are independent}}{=} \sum_{i = 1}^{n} Cov( X_i, 1(X_{i}>0)) = \sum_{i = 1}^{n} 1 = n \).
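The answer \(Cov(Y, Z) = n\) can be checked by simulation. The sketch below uses Uniform\((-4, 4)\) as a convenient symmetric distribution with \(E|X| = 2\) (this choice, and \(n = 5\), are illustrative assumptions, not from the problem):

```python
import numpy as np

# X_i ~ Uniform(-4, 4): symmetric about 0 with E|X| = 2, as required.
rng = np.random.default_rng(1)
n, reps = 5, 400_000
X = rng.uniform(-4, 4, size=(reps, n))
Y = X.sum(axis=1)          # Y = sum of the X_i
Z = (X > 0).sum(axis=1)    # Z = number of positive X_i
cov_YZ = np.cov(Y, Z)[0, 1]
print(cov_YZ)  # should be close to n = 5
```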

Inverse Uniform Distribution | ISI MStat 2007 PSB Problem 4

This problem is an interesting application of the inverse uniform distribution family, which has infinite mean. This problem is from the ISI MStat 2007 PSB. The answer is verified by simulation.

Problem

The unit interval \((0,1)\) is divided into two sub-intervals by picking a point at random from inside the interval. Denoting by \(Y\) and \(Z\) the lengths of the longer and the shorter sub-intervals respectively, show that \(\frac{Y}{Z}\) does not have a finite expectation.

This is Problem 4 of the ISI MStat 2007 PSB. Enjoy it.

Prerequisites

Solution

\( \frac{Y}{Z} + 1 = \frac{Y+Z}{Z} = \frac{1}{Z} \), where \(Z\) is the shorter length of the broken stick.

So, \( E( \frac{Y}{Z}) = E(\frac{1}{Z}) - 1 \).

Let's try to find the distribution of \(\frac{1}{Z}\).

Let \( U \) ~ Unif \((0,1)\) denote the random uniform cut.

Number line - Inverse Uniform Distribution

The shorter piece has length at most \(x\), for \(0 \leq x \leq \frac{1}{2}\), if the stick is cut either before \(x\) or after \(1-x\).

Observe that \( P( Z \leq x) = P ( U \leq x ) + P ( U \geq 1 - x) = x + 1 - (1-x) = 2x \) for \(0 \leq x \leq \frac{1}{2}\). This answer is natural, since the total valid length is \(2x\).

\( P( \frac{1}{Z} \leq z) = P ( Z \geq \frac{1}{z} ) = 1 - \frac{2}{z} \Rightarrow F_{\frac{1}{Z}}(z) = 1 - \frac{2}{z}\) if \( 2 \leq z < \infty \).

Therefore, \(f_{\frac{1}{Z}}(z) = \frac{2}{z^2}\) if \( 2 \leq z < \infty \).

Hence, \( E( \frac{Y}{Z}) = E(\frac{1}{Z}) - 1 = (\int_{2}^{\infty} \frac{2}{z} dz) - 1 = \infty \)

Simulation and Verification

Exercise: Prove that \(F_{\frac{Y}{Z}}(x) = \frac{(x-1)}{(x+1)}\) if \(1 \leq x < \infty \).

 u = runif(1000,0,1)
  w = 1 - u
  Z = pmin(u,w)
  Y = pmax(u,w)
  YbyZ = Y/Z
plot(ecdf(YbyZ), xlim = c(0,50))
x = seq(0, 50, 0.01)
curve((x - 1)/(x+1), from = 0, col = "red", add = TRUE)
Graph - Inverse Uniform Distribution
Red = Actual Distribution Function, Black = Simulated ECDF

The mean moves really slowly to infinity, like \(\log x\), so it is really hard to show by simulation that it goes to \(\infty\). Also, the probability of occurrence of a very high value is almost 0, so it is hard to show by simulation that the mean is \(\infty\). But we can show that the mean of the maximum values is really large.

v = rep(0,200)
m = NULL
for ( i in 1:200)
{
  #v[i] = 100*i
  u = runif(10000,0,1)
  w = 1 - u
  Z = pmin(u,w)
  Y = pmax(u,w)
  YbyZ = Y/Z
  m = c(m, max(YbyZ))
}
mean(m) # 79079.43 in one run

Beware of the simulation, it can be totally counterintuitive. This is really enjoyable though.

Stay Tuned!

Collect All The Toys | ISI MStat 2013 PSB Problem 9

Remember how we used to collect all the toy species from our chips packets? We were all confused about how many more chips to buy. Here is how probability guides us through, in ISI MStat 2013 PSB Problem 9.

Collect all the toys offer

Problem

Chips are on sale for Rs. 30 each. Each chip contains exactly one toy, which can be one of four types with equal probability. Suppose you keep on buying chips and stop when you collect all the four types of toys. What will be your expected expenditure?

Prerequisites

Solution

I am really excited to present the solution. Remember, in your childhood, you were excited to buy packets of chips to collect all the toys and show them to your friends. There was a problem: you didn't have enough money to buy the chips. A natural question was how much money you would have to spend.

See, the problem is right here! Let's dive into the solution. Remember we have to be a little bit mathematical here.

Let's see how we get the four new toys.

1st new toy \( \rightarrow \) 2nd new toy \( \rightarrow \) 3rd new toy \( \rightarrow \) 4th new toy.

\( N_1\) = The number of chips to be bought to get the first new toy.

\( N_2\) = The number of chips to be bought to get the second new toy after you got the first new toy.

\( N_3\) = The number of chips to be bought to get the third new toy after you got the second new toy.

\( N_4\) = The number of chips to be bought to get the fourth new toy after you got the third new toy.

Observe that the expected number of chips to be bought is \( E ( N_1 + N_2 + N_3 + N_4 ) = E (N_1) + E(N_2) + E(N_3) + E(N_4) \).

Now, can you guess what are the random variables \( N_1, N_2, N_3, N_4 \)?

They are all geometric random variables. ( Why? )

[ Hint: Success is when you get a toy that you have not already observed. ]

Observe that \( N_i\) ~ Geom(\(\frac{4-(i-1)}{4}\)), since \(i-1\) distinct toys have already been collected. ( Why? )

If \(N\) ~ Geom(\(p\)), then \(E(N) = \frac{1}{p}\).

Therefore, the expected number of chips to be bought is \( \sum_{i=1}^{4} \frac{4}{4-(i-1)} = 4 \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \right) = \frac{25}{3} = 8 + \frac{1}{3}\). Hence, the expected expenditure is \( 30 \times \frac{25}{3} = \) Rs. \(250\).

So, can you guess what the answer will be if there are \(n\) toys? ( \(n H_n\) chips, where \(H_n\) is the \(n\)th harmonic number. )
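The harmonic-number formula can also be evaluated exactly (a small sketch using Python's exact fractions):

```python
from fractions import Fraction

# Expected number of chips for n equally likely toy types: n * H_n.
def expected_chips(n):
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

chips = expected_chips(4)
print(chips, 30 * chips)  # 25/3 chips, i.e. Rs. 250 expected expenditure
```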

Simulation and Verification

v = NULL
N = 0
n = 4  #number of toys
init = rep(0,n)
toy = rep(0,n)
True = rep(TRUE,n)
for (j in 1:10000)
{
  for (i in 1:100) 
  {
    s = sample(4,1)
    toy[s] = toy[s] + 1
    N = N + 1
    if (identical (toy > init, True))
    {
      break
    }
  }
v = c(v,N)
N = 0
toy = rep(0,n)
}
mean(v) # 8.3214

Therefore, it is verified by simulation.

This is the histogram of the simulated chip counts, whose mean is expected to be around 8.33.

Stay Tuned! Stay Shambho!