## ISI MStat Entrance 2020 Problems and Solutions

This post contains Indian Statistical Institute, ISI MStat Entrance 2020 Problems and Solutions. Try to solve them.

## Subjective Paper – ISI MStat Entrance 2020 Problems and Solutions

• Let $f(x)=x^{2}-2 x+2$. Let $L_{1}$ and $L_{2}$ be the tangents to its graph at $x=0$ and $x=2$ respectively. Find the area of the region enclosed by the graph of $f$ and the two lines $L_{1}$ and $L_{2}$.

Solution
• Find the number of $3 \times 3$ matrices $A$ such that the entries of $A$ belong to the set $\mathbb{Z}$ of all integers, and such that the trace of $A^{t} A$ is 6. ($A^{t}$ denotes the transpose of the matrix $A$.)

Solution
• Consider $n$ independent and identically distributed positive random variables $X_{1}, X_{2}, \ldots, X_{n}$. Suppose $S$ is a fixed subset of $\{1,2, \ldots, n\}$ consisting of $k$ distinct elements, where $1 \leq k<n$.
(a) Compute $\mathbb{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right]$.

(b) Assume that the $X_{i}$'s have mean $\mu$ and variance $\sigma^{2}$, $0<\sigma^{2}<\infty$. If $j \notin S$, show that the correlation between $\left(\sum_{i \in S} X_{i}\right) X_{j}$ and $\sum_{i \in S} X_{i}$ lies between $-\frac{1}{\sqrt{k+1}}$ and $\frac{1}{\sqrt{k+1}}$.

Solution
• Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent and identically distributed random variables. Let $S_{n}=X_{1}+\cdots+X_{n}$. For each of the following statements, determine whether they are true or false. Give reasons in each case.

(a) If $S_{n} \sim \mathrm{Exp}$ with mean $n$, then each $X_{i} \sim \mathrm{Exp}$ with mean 1.

(b) If $S_{n} \sim \mathrm{Bin}(nk, p)$, then each $X_{i} \sim \mathrm{Bin}(k, p)$.

Solution
• Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent and identically distributed random variables, each having a uniform distribution on $(0,1)$. Let $X=\min \{U_{1}, U_{2}, \ldots, U_{n}\}$ and $Y=\max \{U_{1}, U_{2}, \ldots, U_{n}\}$.

Evaluate $\mathbb{E}[X \mid Y=y]$ and $\mathbb{E}[Y \mid X=x]$.

Solution
• Suppose individuals are classified into three categories $C_{1}, C_{2}$ and $C_{3}$. Let $p^{2}$, $(1-p)^{2}$ and $2 p(1-p)$ be the respective population proportions, where $p \in(0,1)$. A random sample of $N$ individuals is selected from the population, and the category of each selected individual is recorded.

For $i=1,2,3$, let $X_{i}$ denote the number of individuals in the sample belonging to category $C_{i}$. Define $U=X_{1}+\frac{X_{3}}{2}$.

(a) Is $U$ sufficient for $p ?$ Justify your answer.

(b) Show that the mean squared error of $\frac{U}{N}$ is $\frac{p(1-p)}{2 N}$.

Solution
• Consider the following model: $y_{i}=\beta x_{i}+\varepsilon_{i} x_{i}, \quad i=1,2, \ldots, n$, where $y_{i}, i=1,2, \ldots, n$ are observed; $x_{i}, i=1,2, \ldots, n$ are known positive constants and $\beta$ is an unknown parameter. The errors $\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}$ are independent and identically distributed random variables having the probability density function $f(u)=\frac{1}{2 \lambda} \exp \left(-\frac{|u|}{\lambda}\right), \quad-\infty<u<\infty$ and $\lambda$ is an unknown parameter.

(a) Find the least squares estimator of $\beta$.

(b) Find the maximum likelihood estimator of $\beta$.

Solution
• Assume that $X_{1}, \ldots, X_{n}$ is a random sample from $N(\mu, 1)$, with $\mu \in \mathbb{R}$. We want to test $H_{0}: \mu=0$ against $H_{1}: \mu=1$. For a fixed integer $m \in\{1, \ldots, n\}$, the following statistics are defined:

\begin{aligned}
T_{1} &= \frac{X_{1}+\cdots+X_{m}}{m} \\
T_{2} &= \frac{X_{2}+\cdots+X_{m+1}}{m} \\
&\ \ \vdots \\
T_{n-m+1} &= \frac{X_{n-m+1}+\cdots+X_{n}}{m}
\end{aligned}

Fix $\alpha \in(0,1)$. Consider the test:

Reject $H_{0}$ if $\max \{T_{i}: 1 \leq i \leq n-m+1\}>c_{m, \alpha}$.

Find a choice of $c_{m, \alpha} \in \mathbb{R}$ in terms of the standard normal distribution function $\Phi$ that ensures that the size of the test is at most $\alpha$.

Solution
• A finite population has $N$ units, with $x_{i}$ being the value associated with the $i$th unit, $i=1,2, \ldots, N$. Let $\bar{x}_{N}$ be the population mean. A statistician carries out the following experiment.

Step 1: Draw an SRSWOR of size $n$, call this sample $S_{1}$, and denote the sample mean by $\bar{X}_{n}$.

Step 2: Draw an SRSWR of size $m$ from $S_{1}$. The $x$-values of the sampled units are denoted by $\{Y_{1}, \ldots, Y_{m}\}$.

An estimator of the population mean is defined as,

$\widehat{T}_{m}=\frac{1}{m} \sum_{i=1}^{m} Y_{i}$

(a) Show that $\widehat{T}_{m}$ is an unbiased estimator of the population mean.

(b) Which of the following has lower variance: $\widehat{T}_{m}$ or $\bar{X}_{n}$?

Solution

## Objective Paper

 1. C 2. D 3. A 4. B 5. A 6. B 7. C 8. A 9. C 10. A 11. C 12. D 13. C 14. B 15. B 16. C 17. D 18. B 19. B 20. C 21. C 22. D 23. A 24. B 25. D 26. B 27. D 28. D 29. B 30. C

Watch videos related to the ISI MStat Problems here.

## How to roll a Dice by tossing a Coin? Cheenta Statistics Department

How can you roll a dice by tossing a coin? Can you use your probability knowledge? Use your conditioning skills.

Suppose, you have gone to a picnic with your friends. You have planned to play the physical version of the Snake and Ladder game. You found out that you have lost your dice.

The shit just became real!

Now, you have an unbiased coin in your wallet / purse. You know Probability.

### Apna Time Aayega

starts playing in the background. :p

## Can you simulate the dice from the coin?

Of course, you know chances better than others. :3

Take a coin.

Toss it 3 times. Record the outcomes.

HHH = Number 1

HHT = Number 2

HTH = Number 3

HTT = Number 4

THH = Number 5

THT = Number 6

TTH = Reject it, don’t count the toss and toss again

TTT = Reject it, don’t count the toss and toss again

Voila done!

What is the probability of HHH in this experiment?

Let X be the outcome in the restricted experiment as shown.

How is this experiment different from the actual experiment?

This experiment is conditioning on the event A = {HHH, HHT, HTH, HTT, THH, THT}.

$P(\text{Number } 1) = P (X = HHH \mid X \in A ) = \frac{P (X = HHH)}{P (X \in A)} = \frac{1/8}{6/8} = \frac{1}{6}$

Beautiful right?

Can you generalize this idea?
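The three-toss scheme above is rejection sampling, and it can be sketched in a few lines of Python (the function name `roll_die_via_coin` is our own, just for illustration):

```python
import random

# Map three coin tosses (H = heads, T = tails) to die faces;
# TTH and TTT are rejected and the tosses are repeated.
OUTCOME_TO_FACE = {
    "HHH": 1, "HHT": 2, "HTH": 3,
    "HTT": 4, "THH": 5, "THT": 6,
}

def roll_die_via_coin():
    """Simulate a fair die with a fair coin: toss it 3 times (8 equally
    likely outcomes), accept six of them as faces 1-6, and toss again
    on the two rejected outcomes."""
    while True:
        tosses = "".join(random.choice("HT") for _ in range(3))
        if tosses in OUTCOME_TO_FACE:
            return OUTCOME_TO_FACE[tosses]
```

Each accepted triple has conditional probability $\frac{1/8}{6/8} = \frac{1}{6}$, exactly as computed above, so the six faces come out uniform.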

## Food for thought

• Give an algorithm to simulate any conditional probability.
• Give an algorithm to simulate any event with probability $\frac{m}{2^k}$, where $m \leq 2^k$.
• Give an algorithm to simulate any event with probability $\frac{m}{n}$, where $m \leq n \leq 2^k$ using conditional probability.

## Watch the Video here:

Books for ISI MStat Entrance Exam

How to Prepare for ISI MStat Entrance Exam

ISI MStat and IIT JAM Stat Problems and Solutions

Cheenta Statistics Program for ISI MStat and IIT JAM Stat

Simple Linear Regression – Playlist on YouTube

## ISI MStat PSB 2013 Problem 5 | Simple Random Sampling

This is a sample problem from ISI MStat PSB 2013 Problem 5. It is based on the simple random sampling model and finding unbiased estimates of the population size. Also think over the “Food for Thought”; any kind of discussion will be appreciated. Give it a try!

## Problem – ISI MStat PSB 2013 Problem 5

A box has an unknown number of tickets serially numbered $1,2,\ldots,N$. Two tickets are drawn using simple random sampling without replacement (SRSWOR) from the box. If $X$ and $Y$ are the numbers on the tickets and $Z=\max(X,Y)$, show that

(a) $Z$ is not unbiased for $N$.

(b) $aX+ bY+ c$ is unbiased for $N$ if and only if $a+b=2$ and $c=-1$.

### Prerequisites

Naive probability

Counting principles

Unbiased estimators

Simple random sampling

## Solution:

For this problem, let us first find the pmf of $Z$, for which we will need some counting techniques.

Since we are drawing tickets at random and not replacing a drawn ticket after each draw (SRSWOR), this is clearly about choosing two numbers from the set of $N$ elements $\{1,\ldots,N\}$. So the number of possible samples of size 2 that can be drawn from the population of $N$ units is ${N \choose 2}$.

Now, $Z$ is defined here as the maximum of the two chosen numbers, so the possible values of $Z$ are $2,3,\ldots,N$.

Now let us assume that $Z=k$; we need to find the pairs in which $k$ is the maximum. In other words, if $k$ is the maximum of the drawn numbers, what values can the other number take? Simple: the other ticket can carry any number less than $k$, and there are $k-1$ such numbers. So there are $k-1$ pairs in which the maximum numbered ticket is $k$ (we are not concerned with the ordering of the two observations).

So, the pmf of $Z=\max(X,Y)$ is $P(Z=k) = \begin{cases} \frac{k-1}{{N \choose 2}} & k=2,3,\ldots,N \\ 0 & \text{otherwise} \end{cases}$

(a) Now, to check whether $Z$ is unbiased for $N$, we compute $E(Z)$:

$E(Z)= \sum_{k=2}^N k \, \frac{k-1}{{N \choose 2}} =\frac{1}{{N \choose 2}}\sum_{k=2}^N k(k-1) = \frac{2}{N(N-1)} \cdot \frac{N(N+1)(N-1)}{3} = \frac{2}{3} (N+1)$.

So $E(Z)=\frac{2}{3} (N+1) \neq N$. Hence $Z$ is not unbiased for the population size $N$.
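As a quick sanity check on part (a), the pmf lets us compute $E(Z)$ exactly; here is a minimal sketch (the helper name `expected_max` is our own), using exact rational arithmetic so there is no floating-point doubt:

```python
from fractions import Fraction
from math import comb

def expected_max(N):
    """E(Z) for Z = max of two tickets drawn SRSWOR from {1, ..., N},
    computed from the pmf P(Z = k) = (k - 1) / C(N, 2)."""
    total = comb(N, 2)
    return sum(k * Fraction(k - 1, total) for k in range(2, N + 1))
```

For every $N$ this returns exactly $\frac{2}{3}(N+1)$, confirming the bias.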

(b) Similarly, we find the expectation of $T=aX+bY+c$:

$E(T)=aE(X)+bE(Y)+c= a \sum_{i=1}^N i P(X=i) + b \sum_{j=1}^N j P(Y=j) + c,$

now here $P(X=i)=P(Y=i)= \frac{1}{N}$ (marginally, each draw is equally likely to be any of the $N$ tickets), so $E(T) = a \frac{N+1}{2}+ b\frac{N+1}{2}+c = (a+b) \frac{N+1}{2} +c.$

Clearly, $E(T) = N$ for every $N$, i.e. $T$ is unbiased for $N$, if and only if $a+b=2$ and $c=-1$.

Hence we are done!
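Part (b) can also be verified by brute force: average $aX+bY+c$ over all ordered pairs of distinct tickets. A small sketch under that setup (the function name `expectation_T` is our own):

```python
from fractions import Fraction
from itertools import permutations

def expectation_T(N, a, b, c):
    """E(aX + bY + c) when (X, Y) are two tickets drawn by SRSWOR
    from {1, ..., N}: average over all ordered pairs of distinct tickets."""
    pairs = list(permutations(range(1, N + 1), 2))
    total = sum(a * x + b * y + c for x, y in pairs)
    return Fraction(total, len(pairs))
```

With $a+b=2$ and $c=-1$ the result is exactly $N$; any other choice of $(a,b,c)$ misses $N$ for some population size.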

## Food For Thought

Now suppose that the numbers on the tickets are arbitrary positive integers (like, say, 220 or 284), but thankfully the total number of tickets, $N$, is known. You are collecting tickets for yourself and $k-1$ of your friends; the number $c$ is lucky for you and you wish to keep it in your collection, so you select the remaining $k-1$ tickets out of the other $N-1$ tickets and calculate the sample mean $\bar{y'}$ of the collected numbers. Can I claim that $c+ (N-1)\bar{y'}$ is an unbiased estimator of the population total? Do you know this estimator has smaller variance than the conventional unbiased estimator of the population total? Can you show that too? Why do you think the variance decreases?

By the way, do you know that in mathematics 220 and 284 are quite special? They are the first pair of “amicable numbers”: each is the sum of the proper divisors of the other! So, to become amicable one needs to increase the size of their mind and heart! Keep increasing both! Till then… bye.

## Unbiased, Pascal and MLE | ISI MStat 2019 PSB Problem 7

This is a problem from the ISI MStat Entrance Examination, 2019, involving the MLE of the population size and investigating its unbiasedness.

## The Problem:

Suppose an SRSWOR of size $n$ has been drawn from a population labelled $1,2,3,\ldots,N$, where the population size $N$ is unknown.

(a) Find the maximum likelihood estimator $\hat{N}$ of $N$.

(b) Find the probability mass function of $\hat{N}$.

(c) Show that $\frac{n+1}{n}\hat{N} -1$ is an unbiased estimator of $N$.

## Prerequisites:

(a) Simple random sampling (SRSWR/SRSWOR)

(b) Maximum likelihood estimator and how to find it

(c) Unbiasedness of an estimator

(d) Identities involving binomial coefficients (for these, you may refer to any standard text on combinatorics, like those of R. A. Brualdi or Miklós Bóna)

## Solution:

(a) Let $X_1,X_2,\ldots,X_n$ be the sample to be selected. In the SRSWOR scheme,

the selection probability of a sample of size $n$ is given by $P(s)=\frac{1}{{N \choose n}}$.

As $X_1,\ldots,X_n \in \{1,2,\ldots,N \}$, the maximum among them, that is, the $n$th order statistic $X_{(n)}$, is always at most $N$.

Now, ${N \choose n}$ is an increasing function of $N$, so the likelihood $\frac{1}{{N \choose n}}$ is decreasing in $N$. Since $N \geq X_{(n)}$, we have ${X_{(n)} \choose n} \le {N \choose n }$, and thus, on reciprocating, $P(s) \le \frac{1}{ {X_{(n)} \choose n}}$: the likelihood is maximized at the smallest value of $N$ consistent with the data. Hence the maximum likelihood estimator of $N$, i.e. $\hat{N}$, is $X_{(n)}$.

(b) We need to find the pmf of $\hat{N}$.

See that $P(\hat{N}=m) = \frac{ {m \choose n} - {m-1 \choose n } }{ {N \choose n }}$, where $m=n,n+1,\ldots,N$.

Can you convince yourself why? (Out of the ${N \choose n}$ equally likely samples, ${m \choose n}$ have maximum at most $m$ and ${m-1 \choose n}$ have maximum at most $m-1$; the difference counts the samples with maximum exactly $m$.)

(c) We use a well-known identity, Pascal's Identity, to rewrite the distribution of $\hat{N}=X_{(n)}$ a bit more compactly:

We write $P(\hat{N}=m) = \frac{ {m-1 \choose n-1}}{ {N \choose n} }, \quad m=n,n+1,\ldots,N$

Thus, we have :

\begin{align} E(\hat{N})&=\sum_{m=n}^N m P(\hat{N}=m) =\frac{n}{\binom{N}{n}}\sum_{m=n}^N \frac{m}{n}\binom{m-1}{n-1} =\frac{n}{\binom{N}{n}}\sum_{m=n}^N \binom{m}{n} \end{align}

Also, use the Hockey Stick Identity to see that $\sum_{m=n}^{N} {m \choose n} = {N+1 \choose n+1}$

So, we have $E(\hat{N})=\frac{n}{ {N \choose n}} {N+1 \choose n+1}=\frac{n(N+1)}{n+1}$.

Thus, we get $E\left( \frac{n+1}{n}\hat{N} -1\right) = N$, so $\frac{n+1}{n}\hat{N}-1$ is unbiased for $N$.
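The whole chain can be checked numerically: build the pmf of $\hat{N}$, confirm it sums to 1 (the Hockey Stick Identity in disguise), and confirm the unbiased correction. A minimal sketch with exact arithmetic (helper names `pmf_max` and the chosen $N$, $n$ are our own):

```python
from fractions import Fraction
from math import comb

def pmf_max(N, n):
    """pmf of the sample maximum under SRSWOR of size n from {1, ..., N}:
    P(max = m) = C(m-1, n-1) / C(N, n) for m = n, ..., N."""
    total = comb(N, n)
    return {m: Fraction(comb(m - 1, n - 1), total) for m in range(n, N + 1)}

N, n = 20, 5
pmf = pmf_max(N, n)
mean_nhat = sum(m * p for m, p in pmf.items())  # n(N+1)/(n+1)
unbiased = Fraction(n + 1, n) * mean_nhat - 1   # recovers N exactly
```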

## Useful Exercise:

Look up the many proofs of the Hockey Stick Identity. But make sure you at least learn the proof by a combinatorial argument and an alternative proof involving visualizing the identity via the Pascal’s Triangle.

## Collect All The Toys | ISI MStat 2013 PSB Problem 9

Remember how we used to collect all the toy species from our chips packets? We were all confused about how many more chips to buy. Here is how probability guides us through, in this ISI MStat 2013 Problem 9.

## Problem

Chips are on sale for Rs. 30 each. Each chip contains exactly one toy, which can be one of four types with equal probability. Suppose you keep on buying chips and stop when you collect all the four types of toys. What will be your expected expenditure?

## Solution

I am really excited to find the solution. Remember, in your childhood, you were excited to buy packets of chips to collect all the toys and show them to your friends. There was a problem: you didn't have enough money to buy the chips. A natural question was: how much money do you have to spend?

See, the problem is right here! Let’s dive into the solution. Remember we have to be a little bit mathematical here.

Let’s see how we get the four new toys.

1st new toy $\rightarrow$ 2nd new toy $\rightarrow$ 3rd new toy $\rightarrow$ 4th new toy.

$N_1$ = The number of chips to be bought to get the first new toy.

$N_2$ = The number of chips to be bought to get the second new toy after you got the first new toy.

$N_3$ = The number of chips to be bought to get the third new toy after you got the second new toy.

$N_4$ = The number of chips to be bought to get the fourth new toy after you got the third new toy.

Observe that, by linearity of expectation, the expected number of chips to be bought $= E ( N_1 + N_2 + N_3 + N_4) = E (N_1) + E(N_2) + E(N_3) + E(N_4)$.

Now, can you guess what are the random variables $N_1, N_2, N_3, N_4$?

They are all geometric random variables. ( Why? )

[ Hint : Success is when you don’t get the already observed toys ]

Observe that $N_i$ ~ Geom($\frac{4-(i-1)}{4}$) ( Why? ) Once $i-1$ distinct toys have been collected, each new chip shows a new toy with probability $\frac{4-(i-1)}{4}$.

If $N$ ~ Geom($p$), then $E(N) = \frac{1}{p}$.

Therefore, the expected number of chips to be bought is $\sum_{i=1}^{4} \frac{4}{4-(i-1)} = \frac{4}{4} + \frac{4}{3} + \frac{4}{2} + \frac{4}{1} = \frac{25}{3} = 8 + \frac{1}{3}$, and so the expected expenditure is Rs. $30 \times \frac{25}{3} =$ Rs. $250$.

Therefore, you can guess what the answer will be if there are $n$ toys: $n H_n$ chips, where $H_n$ is the $n$th harmonic number.
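The exact value is also easy to compute directly; a small sketch with exact rational arithmetic (the helper name `expected_chips` is our own):

```python
from fractions import Fraction

def expected_chips(n):
    """Coupon-collector expectation: with i-1 distinct toys in hand, the
    next new toy takes Geom((n-i+1)/n) chips, with mean n/(n-i+1), so the
    total is the sum of n/(n-i+1) over i = 1, ..., n, i.e. n * H_n."""
    return sum(Fraction(n, n - i + 1) for i in range(1, n + 1))

chips = expected_chips(4)   # 25/3, i.e. 8 + 1/3 chips on average
expenditure = 30 * chips    # Rs. 250 expected expenditure
```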

## Simulation and Verification

v = NULL
n = 4  # number of toys
for (j in 1:10000)
{
  N = 0
  toy = rep(0, n)
  repeat
  {
    s = sample(n, 1)     # buy one chip: each toy type equally likely
    toy[s] = toy[s] + 1
    N = N + 1
    if (all(toy > 0))    # stop once all n toy types have appeared
    {
      break
    }
  }
  v = c(v, N)
}
mean(v)  # 8.3214 in one run; the exact value is 25/3


Therefore, it is verified by simulation.

Stay Tuned! Stay Shambho!