ISI MStat PSB 2004 Problem 1 | Games and Probability

This is a very beautiful sample problem from ISI MStat PSB 2004 Problem 1. Games are among the best ways to understand the role of chance in life, and solving this kind of problem always pushes me to think more about the uncertainties built into a system. Think it over!

Problem- ISI MStat PSB 2004 Problem 1

Suppose two teams play a series of games, each producing a winner and a loser, until one team has won two more games than the other. Let G be the number of games played. Assume each team has a chance of 0.5 to win each game, independent of the results of the previous games.

(a) Find the probability distribution of G.

(b) Find the expected value of G.


Naive Probability

Counting principles

Geometric distribution.

Conditional expectation.

Solution :

While solving this kind of problem, the first thing to do is to observe the quantities that remain invariant.

Here, observe that the game always terminates with two consecutive wins by one team. Imagine the two teams are \(T_1\) and \(T_2\). If the first two matches are won by the same team, we are done; but if, say, \(T_1\) wins the first match and \(T_2\) wins the second, the score is level, the two matches played are effectively wasted, and we must start counting afresh.

So, can I claim that G (as defined in the question) will always be even? Verify this claim yourself!

So, consider the event G=g, where g is even. If the series terminates at the g-th game, then by the logic established above, both the (g-1)-th and the g-th games were won by the winning team, and among the first (g-2) games the two teams won an equal number, standing at a draw. Now, the first (g-2) games split into \( \frac{g-2}{2} \) consecutive pairs, each of which must be shared one win apiece in one of 2 orders, so the teams can reach this draw in \( 2^{\frac{g-2}{2}} \) ways, and the last two matches can be won by either of the two teams, in 2 ways. Altogether, the g matches can result in \(2^g\) different sequences of wins and losses (from the perspective of either team).

(a) So, \( P(G=g) = \frac{2 \cdot 2^{\frac{g-2}{2}}}{2^g} = \frac{1}{2^{\frac{g}{2}}} \) ; \( g = 2, 4, 6, \ldots \)

Hence the distribution of G. Hold on! Doesn't it look somehow geometric? Find out!
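The formula above can be sanity-checked with a short Monte Carlo sketch (the function name, seed, and simulation size below are my own illustrative choices, not part of the problem): simulate many series, each played until one team leads by two wins, and compare the empirical frequencies of G with \( 2^{-g/2} \).

```python
import random

def play_series(p=0.5, rng=random):
    """Simulate one series: play until one team leads by two wins,
    and return the number of games G played."""
    lead, games = 0, 0  # lead = (T1 wins) - (T2 wins)
    while abs(lead) < 2:
        lead += 1 if rng.random() < p else -1
        games += 1
    return games

random.seed(0)
n = 200_000
counts = {}
for _ in range(n):
    g = play_series()
    counts[g] = counts.get(g, 0) + 1

# Compare empirical frequencies with the claimed 2^(-g/2)
for g in (2, 4, 6):
    print(g, round(counts[g] / n, 4), 2 ** (-g / 2))
```

Note that every simulated value of G comes out even, in line with the invariance argument above.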

(b) To find the expectation of G, we could use the conventional definition of expectation; and since, as I said, the distribution of G is (somewhat) geometric, more precisely \( \frac{G}{2} \sim Geo(0.5) \), the expectation is clearly 4. But I will take this chance to show another beautiful and elegant method: conditional expectation. We will condition on the result of the first match and develop a recursion (a technique I'm obsessed with). One may not find this method necessary in this problem, since the distribution of G is already known to us, but life is not about this one problem, is it? What if the distribution is unknown but the pattern is visible, if only you are ready to see it? Let's proceed.

Without loss of generality, suppose \(T_1\) wins the first game: with probability 0.5, one game is gone and an expected E(G | \(T_1\) is leading by 1 game) further games remain to be played. Similarly, if \(T_2\) wins the first game, then with probability 0.5, one game is gone and an expected E(G | \(T_2\) is leading by 1 game) further games remain.

So, if we write the above in mathematical form, it looks like,

E(G) = P(\(T_1\) wins the first game)(1 + E(G | \(T_1\) is leading by 1 game)) + P(\(T_2\) wins the first game)(1 + E(G | \(T_2\) is leading by 1 game)) ......................(*)

So, now we need to find E(G | \(T_1\) is leading by 1 game); the other term is the same by symmetry!

So, given that \(T_1\) is leading by 1 game, the next game is either a win for \(T_1\), with probability 0.5, which ends the series; or a loss for \(T_1\), with probability 0.5, which brings the score back to a draw. From the draw, all games played so far are wasted and we start counting afresh (loss of memory, remember!), so one game has been used and an expected E(G) further games are still to follow before the series terminates. Mathematically,

E(G | \(T_1\) is leading by 1 game) = P(\(T_1\) wins again) × 1 + P(\(T_1\) loses, giving a draw) × (1 + E(G)) = 0.5 + 0.5 + (0.5)E(G) = 1 + (0.5)E(G).

Plugging this into (*), one obtains a recursion in E(G) and calculates it as 4. So, on average, the game terminates after 4 matches.
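As a quick check of the algebra (a minimal sketch, assuming the 0.5 win probability per game from the problem), the recursion obtained from (*) can be iterated numerically to its fixed point:

```python
# Substituting E(G | a team leads by 1) = 1 + 0.5*E(G) into (*)
# gives E(G) = 2 + 0.5*E(G); iterating this map converges to the
# fixed point E(G) = 4.
e = 0.0
for _ in range(60):
    e = 0.5 * (1 + (1 + 0.5 * e)) + 0.5 * (1 + (1 + 0.5 * e))
print(e)  # prints 4.0
```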

Food For Thought

Can you generalize the above problem when the chance of winning each match, for one of the two teams, is some p (0 < p < 1)? Try it!

Wait! Before leaving, let's toss some coins.

You toss a fair coin repeatedly. What is the expected number of tosses needed to see the pattern HH (heads, heads) for the first time? What about the expected number of tosses when your pattern of interest is TH (or HT)? For which pattern do you think you need to wait longer? Does your intuition corroborate the mathematical conclusions? If not, why do you think your intuition misleads you?
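If you want to test your intuition empirically before working out the math, here is a minimal simulation sketch (the function name, seed, and trial count are my own choices):

```python
import random

def expected_wait(pattern, trials=100_000, rng=random):
    """Estimate the expected number of fair-coin tosses until
    `pattern` (a string such as "HH" or "HT") first appears."""
    total = 0
    for _ in range(trials):
        history = ""
        while not history.endswith(pattern):
            history += rng.choice("HT")
        total += len(history)
    return total / trials

random.seed(1)
print("HH:", expected_wait("HH"))
print("HT:", expected_wait("HT"))
```

Run it and compare the two estimates; then try to explain the gap.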

Think it over and over! You are dealing with one of the most beautiful perspectives on uncertainty!

ISI MStat PSB 2008 Problem 10
Outstanding Statistics Program with Applications

Subscribe to Cheenta at Youtube

ISI MStat PSB 2014 Problem 9 | Hypothesis Testing

This is another beautiful sample problem from ISI MStat PSB 2014 Problem 9. It is based on testing a simple hypothesis, but it reveals and uses a very cute property of the Geometric distribution, which I prefer to call a sister of the loss-of-memory property. Give it a try!

Problem- ISI MStat PSB 2014 Problem 9

Let \( X_1 \sim Geo(p_1)\) and \( X_2 \sim Geo(p_2)\) be independent random variables, where Geo(p) refers to the Geometric distribution whose p.m.f. f is given by,

\(f(k)=p(1-p)^k, \quad k=0,1,\ldots\)

We are interested in testing the null hypothesis \(H_o : p_1=p_2\) against the alternative \( H_1: p_1<p_2\). Intuitively it is clear that we should reject when \(X_1\) is large, but unfortunately we cannot compute the cut-off, because under \(H_o\) the distribution of \(X_1\) depends on the unknown common value of \(p_1\) and \(p_2\).

(a) Let \(Y= X_1 +X_2\). Find the conditional distribution of \( X_1|Y=y\) when \(p_1=p_2\).

(b) Based on the result obtained in (a), derive a level 0.05 test for \(H_o\) against \(H_1\) that rejects when \(X_1\) is large.


Geometric Distribution.

Negative binomial distribution.

Discrete Uniform distribution.

Conditional Distribution.

Simple Hypothesis Testing.

Solution :

Well, part (a) is quite easy, but interesting and elegant, so I'm leaving it as an exercise for you to enjoy. Hint: verify whether the required distribution is Discrete Uniform or not! Once you are done, proceed.

Now, part (b) is even more interesting, because here we will not analyze the distributions of \(X_1\) and \(X_2\) in the conventional way; instead, we will concentrate on the conditional distribution of \( X_1 | Y=y\)! But why?

One reason for this strategy is already given in the question itself, but the other reason is more interesting to observe: if you are done with (a), then by now you have found that the conditional distribution of \(X_1|Y=y\) is free of any parameter. That is, the distribution of \(X_1\) loses all information about the parameter \(p_1\) when conditioned on Y=y (here \(p_1=p_2\) is a necessary condition), and this parameter-free conditional distribution is nothing but the Discrete Uniform distribution on {0,1,....,y}, where y is the observed value of \(X_1 + X_2\).
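You can watch the parameter cancel numerically (a small sketch; `geo_pmf` and the particular values of p and y are my own illustrative choices):

```python
def geo_pmf(k, p):
    """p.m.f. of Geo(p) as defined above: p * (1 - p)**k, k = 0, 1, ..."""
    return p * (1 - p) ** k

p, y = 0.3, 7  # illustrative values; any 0 < p < 1 gives the same answer
# Joint weight of (X1 = k, X2 = y - k): every term equals p^2 * (1-p)^y,
# so the parameter cancels completely on normalisation.
joint = [geo_pmf(k, p) * geo_pmf(y - k, p) for k in range(y + 1)]
cond = [round(w / sum(joint), 12) for w in joint]
print(cond)  # y + 1 equal entries, each 1/(y+1) = 0.125
```

Changing p leaves `cond` unchanged, which is exactly the parameter-freeness being used.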

So, under \(H_o: p_1=p_2\), the distribution of \(X_1|Y=y\) is free of the common parameter \(p_1 = p_2\). And, as stated in the problem itself, it is intuitively clear that a large value of \(X_1\) is evidence against \(H_o\): a large realized value of \(X_1\) means the first success comes late, i.e. \(p_1\) is smaller.

So, there will be strong evidence against \(H_o\) if \(X_1 > c\) for some constant \(c\) with \(0 \le c \le y\), where y is the given value of \(X_1+X_2\).

So, for a level 0.05 test, we reject \(H_o\) for large values of \(X_1\), with the cut-off c chosen such that,

\( P_{H_o}( X_1 > c| Y=y)=0.05 \Rightarrow \frac{y-c}{y+1} = 0.05 \Rightarrow c= 0.95 y - 0.05 .\)

So, we reject \(H_o\) at level 0.05 when we observe \( X_1 > 0.95y - 0.05 \), where \(X_1+X_2 = y\) is given. That's it!
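The rule can be wrapped in a tiny helper (a sketch; the function name is mine, and since \(X_1\) is integer-valued, the achieved size is only approximately 0.05 due to discreteness):

```python
def reject_h0(x1, y, alpha=0.05):
    """Reject H0: p1 = p2 when X1 > c, with c = (1 - alpha)*y - alpha,
    so that P(X1 > c | Y = y) = (y - c)/(y + 1) = alpha under H0.
    (X1 is integer-valued, so the achieved size is only approximate.)"""
    c = (1 - alpha) * y - alpha
    return x1 > c

# With y = 20, c = 0.95*20 - 0.05 = 18.95, so X1 = 19 rejects but 18 does not.
print(reject_h0(19, 20), reject_h0(18, 20))  # prints: True False
```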

Food For Thought

Can you show that for this same \(X_1 \) and \( X_2\) ,

\(P(X_1 \le n)- P( X_1+X_2 \le n)= \frac{1-p}{p}P(X_1+X_2= n) \)

considering \(p_1=p_2=p\) , where n=0,1,.... What about the converse? Does it hold? Find out!
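Before attempting a proof, you might convince yourself numerically (a sketch; `geo_pmf`, the convolution helper, and the choice p = 0.4 are my own):

```python
def geo_pmf(k, p):
    """p.m.f. of Geo(p): p * (1 - p)**k, k = 0, 1, ..."""
    return p * (1 - p) ** k

def sum_pmf(m, p):
    """P(X1 + X2 = m), by direct convolution of two Geo(p) p.m.f.s."""
    return sum(geo_pmf(k, p) * geo_pmf(m - k, p) for k in range(m + 1))

p = 0.4  # any 0 < p < 1 works
for n in range(8):
    lhs = (sum(geo_pmf(k, p) for k in range(n + 1))        # P(X1 <= n)
           - sum(sum_pmf(m, p) for m in range(n + 1)))     # - P(X1 + X2 <= n)
    rhs = (1 - p) / p * sum_pmf(n, p)
    assert abs(lhs - rhs) < 1e-12
print("identity verified for n = 0, ..., 7")
```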

But avoid losing memory; its beauty is exclusive to the Geometric (and Exponential) distributions!
