## Pigeonhole Principle

“The Pigeonhole Principle” — students who hear the name for the first time may think it is a joke, but the Pigeonhole Principle is one of the simplest and most useful ideas in mathematics. Let’s learn the Pigeonhole Principle through some applications.

## Pigeonhole Principle Definition:

In mathematics, the pigeonhole principle states that if N + 1 or more pigeons are placed into N pigeonholes, then some pigeonhole must contain two or more pigeons.

### Pigeonhole Principle Example:

If $kn + 1$ pigeons (where $k$ is a positive integer) are distributed among $n$ holes, then some hole contains at least $k + 1$ pigeons.

### Applications of Pigeonhole Principle:

This principle is applicable in many fields like Number Theory, Probability, Algorithms, Geometry, etc.

## Problems:

### Problem 1

A bag contains beads of two colours: black and white. What is the smallest number of beads which must be drawn from the bag, without looking, so that among these beads two are of the same colour?

Solution: Draw three beads from the bag. If there were no more than one bead of each colour among these, then there would be no more than two beads altogether, contradicting the fact that we have drawn three beads. On the other hand, it is clear that choosing two beads is not enough. Here the beads play the role of pigeons, and the colours (black and white) play the role of pigeonholes.

### Problem 2

Find the minimum number of students in a class such that three of them must be born in the same month.

Solution: Number of months $n = 12$.

According to the given condition,

$k + 1 = 3$

$k = 2$

$M = kn + 1 = 2 \times 12 + 1 = 25$.
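The arithmetic above is just the generalized pigeonhole bound $M = kn + 1$; here is a one-line Python sketch of it (the function name is ours):

```python
def min_objects(k: int, n: int) -> int:
    """Smallest number of objects guaranteeing that some one of n boxes
    holds at least k + 1 objects (generalized pigeonhole principle)."""
    return k * n + 1

# Three students sharing a birth month: k + 1 = 3, n = 12
print(min_objects(2, 12))  # 25
```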

### Problem 3

Show that from any three integers, one can always choose two, say $a$ and $b$, so that $a^3b - ab^3$ is divisible by 10.

Solution: We can factorise the term: $a^3b - ab^3 = ab(a + b)(a - b)$, which is always even, irrespective of the pair of integers we choose.

If one of the three integers is of the form $5k$, i.e. a multiple of 5, then pairing it with either of the others proves the result.

If none of the integers is a multiple of 5, then each of them is of the form $5k \pm 1$ or $5k \pm 2$.

By the pigeonhole principle, two of these three numbers must be of the same form. For that pair, either the sum $a + b$ or the difference $a - b$ is divisible by 5, and since the product $ab(a+b)(a-b)$ is also even, it is divisible by 10. Hence, our result is proved.
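Since $a^3b - ab^3 \pmod{10}$ depends only on the residues of $a$ and $b$ mod 10, the claim can be verified by brute force over all residue triples; a small Python sketch (our naming):

```python
from itertools import product

def claim_holds(r1, r2, r3):
    """Given the residues (mod 10) of three integers, check that some
    pair (a, b) makes a^3*b - a*b^3 divisible by 10."""
    triple = (r1, r2, r3)
    return any((a**3 * b - a * b**3) % 10 == 0
               for i, a in enumerate(triple)
               for b in triple[i + 1:])

# Checking all residue triples verifies the claim for all integers.
print(all(claim_holds(*t) for t in product(range(10), repeat=3)))  # True
```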

### Problem 4

If $n$ is a positive integer not divisible by 2 or 5, then $n$ has a multiple made up entirely of 1’s.

Watch the solution:
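This is itself a pigeonhole statement: among the repunits $1, 11, 111, \ldots$ (the first $n+1$ of them), two must leave the same remainder mod $n$; their difference is a repunit times a power of 10, and since $\gcd(n, 10) = 1$, $n$ divides some repunit. A small Python sketch that finds it (our naming):

```python
def repunit_multiple(n: int) -> int:
    """Smallest repunit 11...1 divisible by n, for n coprime to 10.
    Existence is guaranteed by the pigeonhole argument above."""
    assert n % 2 != 0 and n % 5 != 0
    r = 1
    while r % n != 0:
        r = 10 * r + 1  # append another digit 1
    return r

print(repunit_multiple(7))  # 111111
```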

## ISI MStat Entrance 2020 Problems and Solutions

This post contains Indian Statistical Institute, ISI MStat Entrance 2020 Problems and Solutions. Try to solve them out.

## Subjective Paper – ISI MStat Entrance 2020 Problems and Solutions

• Let $f(x)=x^{2}-2 x+2$. Let $L_{1}$ and $L_{2}$ be the tangents to its graph at $x=0$ and $x=2$ respectively. Find the area of the region enclosed by the graph of $f$ and the two lines $L_{1}$ and $L_{2}$.

Solution
• Find the number of $3 \times 3$ matrices $A$ such that the entries of $A$ belong to the set $\mathbb{Z}$ of all integers, and such that the trace of $A^{t} A$ is 6. ($A^{t}$ denotes the transpose of the matrix $A$.)

Solution
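Since the trace of $A^tA$ is the sum of the squares of the nine entries, the count can be cross-checked by reading off the coefficient of $x^6$ in $(1 + 2x + 2x^4)^9$, one factor per entry (entries with $a^2 > 6$ cannot occur). A Python sketch of this check (our naming):

```python
def count_matrices() -> int:
    """Count 3x3 integer matrices with trace(A^t A) = 6: the coefficient
    of x^6 in (1 + 2x + 2x^4)^9, one factor per entry, since a^2 can be
    0 (one way), 1 (a = ±1, two ways) or 4 (a = ±2, two ways)."""
    poly = [1]                 # polynomial as a coefficient list
    base = [1, 2, 0, 0, 2]     # 1 + 2x + 2x^4
    for _ in range(9):
        new = [0] * (len(poly) + len(base) - 1)
        for i, c in enumerate(poly):
            for j, d in enumerate(base):
                new[i + j] += c * d
        poly = new
    return poly[6]

print(count_matrices())  # 7392
```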
• Consider $n$ independent and identically distributed positive random variables $X_{1}, X_{2}, \ldots, X_{n}$. Suppose $S$ is a fixed subset of $\{1,2, \ldots, n\}$ consisting of $k$ distinct elements, where $1 \leq k<n$.
(a) Compute $\mathbb{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right]$

(b) Assume that the $X_{i}$’s have mean $\mu$ and variance $\sigma^{2}$, $0<\sigma^{2}<\infty$. If $j \notin S$, show that the correlation between $\left(\sum_{i \in S} X_{i}\right) X_{j}$ and $\sum_{i \in S} X_{i}$ lies between $-\frac{1}{\sqrt{k+1}}$ and $\frac{1}{\sqrt{k+1}}$.

Solution
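For part (a), symmetry gives $\mathbb{E}\left[\frac{\sum_{i \in S} X_{i}}{\sum_{i=1}^{n} X_{i}}\right] = \frac{k}{n}$, since the $n$ ratios $X_i / \sum_j X_j$ are identically distributed and sum to 1. A quick Monte Carlo check of this (a sketch; the Exp(1) choice and the names are ours):

```python
import random

def mc_ratio(n: int, k: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(X_1+...+X_k)/(X_1+...+X_n)] for iid
    positive X_i (here Exp(1)); by symmetry the exact value is k/n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        total += sum(xs[:k]) / sum(xs)
    return total / trials

print(mc_ratio(n=5, k=2))  # close to k/n = 0.4
```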
• Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent and identically distributed random variables. Let $S_{n}=X_{1}+\cdots+X_{n}$. For each of the following statements, determine whether they are true or false. Give reasons in each case.

(a) If $S_{n} \sim \operatorname{Exp}$ with mean $n$, then each $X_{i} \sim \operatorname{Exp}$ with mean 1.

(b) If $S_{n} \sim \operatorname{Bin}(nk, p)$, then each $X_{i} \sim \operatorname{Bin}(k, p)$.

Solution
• Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent and identically distributed random variables, each having a uniform distribution on $(0,1)$. Let $X=\min \{U_{1}, U_{2}, \ldots, U_{n}\}$, $Y=\max \{U_{1}, U_{2}, \ldots, U_{n}\}$.

Evaluate $\mathbb{E}[X \mid Y=y]$ and $\mathbb{E}[Y \mid X=x]$.

Solution
• Suppose individuals are classified into three categories $C_{1}, C_{2}$ and $C_{3}$. Let $p^{2}$, $(1-p)^{2}$ and $2p(1-p)$ be the respective population proportions, where $p \in(0,1)$. A random sample of $N$ individuals is selected from the population and the category of each selected individual recorded.

For $i=1,2,3$, let $X_{i}$ denote the number of individuals in the sample belonging to category $C_{i}$. Define $U=X_{1}+\frac{X_{3}}{2}$.

(a) Is $U$ sufficient for $p ?$ Justify your answer.

(b) Show that the mean squared error of $\frac{U}{N}$ is $\frac{p(1-p)}{2N}$.

Solution
• Consider the following model: $y_{i}=\beta x_{i}+\varepsilon_{i} x_{i}, \quad i=1,2, \ldots, n$, where $y_{i}, i=1,2, \ldots, n$ are observed; $x_{i}, i=1,2, \ldots, n$ are known positive constants and $\beta$ is an unknown parameter. The errors $\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}$ are independent and identically distributed random variables having the probability density function $f(u)=\frac{1}{2 \lambda} \exp \left(-\frac{|u|}{\lambda}\right), \quad-\infty<u<\infty$ and $\lambda$ is an unknown parameter.

(a) Find the least squares estimator of $\beta$.

(b) Find the maximum likelihood estimator of $\beta$.

Solution
• Assume that $X_{1}, \ldots, X_{n}$ is a random sample from $N(\mu, 1)$, with $\mu \in \mathbb{R}$. We want to test $H_{0}: \mu=0$ against $H_{1}: \mu=1$. For a fixed integer $m \in\{1, \ldots, n\}$, the following statistics are defined:

\begin{aligned}
T_{1} &= \frac{\left(X_{1}+\ldots+X_{m}\right)}{m} \\
T_{2} &= \frac{\left(X_{2}+\ldots+X_{m+1}\right)}{m} \\
\vdots &=\vdots \\
T_{n-m+1} &= \frac{\left(X_{n-m+1}+\ldots+X_{n}\right)}{m}
\end{aligned}

Fix $\alpha \in(0,1)$. Consider the test

Reject $H_{0}$ if $\max \{T_{i}: 1 \leq i \leq n-m+1\}>c_{m, \alpha}$

Find a choice of $c_{m, \alpha} \in \mathbb{R}$ in terms of the standard normal distribution function $\Phi$ that ensures that the size of the test is at most $\alpha$.

Solution
• A finite population has $N$ units, with $x_{i}$ being the value associated with the $i$th unit, $i=1,2, \ldots, N$. Let $\bar{x}_{N}$ be the population mean. A statistician carries out the following experiment.

Step 1: Draw an SRSWOR of size $n$, call it $S_{1}$, and denote the sample mean by $\bar{X}_{n}$.

Step 2: Draw a SRSWR of size $m$ from $S_{1}$. The $x$ -values of the sampled units are denoted by $\{Y_{1}, \ldots, Y_{m}\}$

An estimator of the population mean is defined as,

$\widehat{T}_{m}=\frac{1}{m} \sum_{i=1}^{m} Y_{i}$

(a) Show that $\widehat{T}_{m}$ is an unbiased estimator of the population mean.

(b) Which of the following has lower variance: $\widehat{T}_{m}$ or $\bar{X}_{n}$?

Solution

## Objective Paper

 1. C 2. D 3. A 4. B 5. A 6. B 7. C 8. A 9. C 10. A 11. C 12. D 13. C 14. B 15. B 16. C 17. D 18. B 19. B 20. C 21. C 22. D 23. A 24. B 25. D 26. B 27. D 28. D 29. B 30. C

Watch videos related to the ISI MStat Problems here.


## Testing of Hypothesis | ISI MStat 2016 PSB Problem 9

This is a problem from the ISI MStat Entrance Examination, 2016 involving the basic idea of Type 1 error of Testing of Hypothesis but focussing on the fundamental relationship of Exponential Distribution and the Geometric Distribution.

## The Problem:

Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from an exponential distribution with mean $\lambda$.

Assume that the observed data is available on $\left[X_{1}\right], \ldots,\left[X_{n}\right]$, instead of $X_{1}, \ldots, X_{n},$ where $[x]$ denotes the largest integer less than or equal to $x$.

Consider a test for $H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$ which rejects $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Given $\alpha \in(0,1),$ obtain values of $c_{n}$ such that the size of the test converges to $\alpha$ as $n \rightarrow \infty$.

## Prerequisites:

(a) Testing of Hypothesis

(b) Type 1 Error

(c) Exponential Distribution

(d) Relationship of Exponential Distribution and Geometric Distribution

(e) Central Limit Theorem

## Solution:

• If $X \sim$ Exponential($\lambda$), then $Y = \left[\frac{X}{a}\right] \sim$ Geom($p$), where $p = 1-e^{-\lambda a} \in(0,1)$.

Proof:

$Y$ is clearly discrete taking values in the set of non-negative integers, due to the flooring. Then, for any integer $n \geq 0$ we have
$P(Y=n)=P(X \in[an, a(n+1)))=\int_{an}^{a(n+1)} \lambda e^{-\lambda x}\, dx=(1-p)^{n} p$
where $p=1-e^{-\lambda a} \in(0,1),$ as $\lambda>0$ and $a>0$.
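A quick empirical check of this fact (a simulation sketch; the names are ours):

```python
import math
import random

def floor_exp_pmf_check(lam: float = 1.0, trials: int = 200_000, seed: int = 1):
    """Compare the empirical pmf of floor(X), X ~ Exponential(lam),
    against the Geometric pmf (1-p)^n * p with p = 1 - exp(-lam)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        y = int(rng.expovariate(lam))  # floor of an Exp(lam) draw
        counts[y] = counts.get(y, 0) + 1
    p = 1 - math.exp(-lam)
    return [(n, counts.get(n, 0) / trials, (1 - p) ** n * p) for n in range(4)]

for n, emp, theo in floor_exp_pmf_check():
    print(n, round(emp, 4), round(theo, 4))
```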

• If $X_i \sim$ Geom($p$) are independent, then $\sum_{i = 1}^{n} X_i \sim$ NBinom($n,p$)
• If $X_i \sim$ Exponential($\lambda$), then $S_n = \sum_{i=1}^{n}\left[X_{i}\right] \sim$ NBinom($n,p$), where $p = 1-e^{-\lambda} \in(0,1)$

#### Testing of Hypothesis

$H_{0}: \lambda=1$ vs $H_{1}: \lambda>1$

We reject $H_{0}$ when $\sum_{i=1}^{n}\left[X_{i}\right]>c_{n} .$

Here, the size of the test, i.e. the Type 1 error (for a simple hypothesis), is $\alpha_n = P(S_n > c_{n} \mid \lambda=1)$.

We want to select $c_n$ such that $\alpha_n \to \alpha$.

$S_n$ ~ NBinom($n,p$), where $p = 1-e^{-1}$ under $H_0$.

Now, $\frac{\sqrt{n}(\frac{S_n}{n} - \frac{1}{p})}{\sqrt{\frac{1-p}{p^2}}} \rightarrow Z \sim N(0,1)$ by the Central Limit Theorem.

Thus, $\alpha_n = P(S_n > c_{n} \mid \lambda=1) \rightarrow P(Z > \frac{\sqrt{n}(\frac{c_n}{n} - \frac{1}{p})}{\sqrt{\frac{1-p}{p^2}}}) = \alpha$.

Thus, $\frac{\sqrt{n}(\frac{c_n}{n} - \frac{1}{p})}{\sqrt{\frac{1-p}{p^2}}} = Z_{\alpha}$, the upper $\alpha$ point of $N(0,1)$.

We can solve this to find $c_n$, where $p = 1-e^{-1}$.
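Solving for $c_n$ gives $c_n = \frac{n}{p} + \frac{\sqrt{n(1-p)}}{p} Z_{\alpha}$. A minimal Python sketch of this computation (the function name is ours), using only the standard library:

```python
import math
from statistics import NormalDist

def critical_value(n: int, alpha: float) -> float:
    """c_n solving sqrt(n) * (c_n/n - 1/p) / sqrt((1-p)/p**2) = z_alpha,
    with p = 1 - exp(-1) under H0 and z_alpha the upper-alpha point
    of the standard normal."""
    p = 1 - math.exp(-1)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return n / p + math.sqrt(n * (1 - p)) / p * z_alpha

print(round(critical_value(100, 0.05), 2))  # about 174
```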

## Food for Thought

If $X \sim$ Exponential($\lambda$), then what is the distribution of $\{X\}$, the fractional part of $X$? This question is crucial in getting back the Exponential Distribution from the Geometric Distribution.

Rather, the food for thought asks you how we get the Exponential Distribution from the Geometric Distribution.

Stay Tuned. Stay Blessed! See you in the next post.

## How to Roll a Dice by Tossing a Coin? | Cheenta Statistics Department

How can you roll a dice by tossing a coin? Can you use your probability knowledge? Use your conditioning skills.

Suppose, you have gone to a picnic with your friends. You have planned to play the physical version of the Snake and Ladder game. You found out that you have lost your dice.

The shit just became real!

Now, you have an unbiased coin in your wallet / purse. You know Probability.

### Aapna Time Aayega

starts playing in the background. :p

## Can you simulate the dice from the coin?

Of course, you know chances better than others. :3

Take a coin.

Toss it 3 times. Record the outcomes.

HHH = Number 1

HHT = Number 2

HTH = Number 3

HTT = Number 4

THH = Number 5

THT = Number 6

TTH = Reject it, don’t count the toss and toss again

TTT = Reject it, don’t count the toss and toss again

Voila done!

What is the probability of HHH in this experiment?

Let X be the outcome in the restricted experiment as shown.

How is this experiment different from the actual experiment?

This experiment is conditioning on the event A = {HHH, HHT, HTH, HTT, THH, THT}.

$P(X = HHH \mid X \in A) = \frac{P(X = HHH)}{P(X \in A)} = \frac{1/8}{6/8} = \frac{1}{6}$

Beautiful right?

Can you generalize this idea?
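The scheme above is rejection sampling; here is a minimal Python simulation of it (the names are ours):

```python
import random

OUTCOMES = ["HHH", "HHT", "HTH", "HTT", "THH", "THT"]  # map to faces 1..6

def roll_dice_with_coin(rng) -> int:
    """Simulate a fair dice with a fair coin: toss three times; the first
    six of the eight equally likely outcomes map to faces 1..6, while
    TTH and TTT are rejected and the tosses repeated."""
    while True:
        tosses = "".join(rng.choice("HT") for _ in range(3))
        if tosses in OUTCOMES:
            return OUTCOMES.index(tosses) + 1

rng = random.Random(7)
rolls = [roll_dice_with_coin(rng) for _ in range(6000)]
print({face: rolls.count(face) for face in range(1, 7)})  # each count near 1000
```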

## Food for thought

• Give an algorithm to simulate any conditional probability.
• Give an algorithm to simulate any event with probability $\frac{m}{2^k}$, where $m \leq 2^k$.
• Give an algorithm to simulate any event with probability $\frac{m}{n}$, where $m \leq n \leq 2^k$, using conditional probability.

## Watch the Video here:

Books for ISI MStat Entrance Exam

How to Prepare for ISI MStat Entrance Exam

ISI MStat and IIT JAM Stat Problems and Solutions

Cheenta Statistics Program for ISI MStat and IIT JAM Stat

Simple Linear Regression – Playlist on YouTube


## Probability in Marbles | AMC 10A, 2010| Problem No 23

Try this beautiful Problem on Probability in Marbles based on smallest value AMC 10 A, 2010. You may use sequential hints to solve the problem.

## Probability in Marbles – AMC-10A, 2010- Problem 23

Each of 2010 boxes in a line contains a single red marble, and for $1 \leq k \leq 2010$, the box in the $k$ th position also contains $k$ white marbles. Isabella begins at the first box and successively draws a single marble at random from each box, in order. She stops when she first draws a red marble. Let $P(n)$ be the probability that Isabella stops after drawing exactly $n$ marbles. What is the smallest value of $n$ for which $P(n)<\frac{1}{2010}$ ?


• $20$
• $22$
• $44$
• $45$
• $46$


## Suggested Book | Source | Answer

Pre College Mathematics

#### Source of the problem

AMC-10A, 2010 Problem-23

#### Check the answer here, but try the problem first

$45$

## Try with Hints

#### First Hint

Given that each of 2010 boxes in a line contains a single red marble, and for $1 \leq k \leq 2010$, the box in the $k$th position also contains $k$ white marbles.

Therefore the probability of drawing a white marble from box $k$ is $\frac{k}{k+1}$, and the probability of drawing a red marble from box $k$ is $\frac{1}{k+1}$.

Now can you finish the problem?

#### Second Hint

Also given that she stops when she first draws a red marble, $P(n)$ denotes the probability that Isabella stops after drawing exactly $n$ marbles.

Therefore we can say $P(n)=\left(\frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdots \frac{n-1}{n}\right) \cdot \frac{1}{n+1}=\frac{1}{n(n+1)}$

Now Can you finish the Problem?

#### Third Hint

Therefore we need the probability $\frac{1}{n(n+1)}<\frac{1}{2010}$, i.e. $n(n+1)>2010$.

Now $n^2+n-2010>0$

To locate the threshold, note that $44 \times 45 = 1980 < 2010$ and $45 \times 46 = 2070 > 2010$.

As $n$ is the smallest such value, $n=45$.
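The inequality $n(n+1) > 2010$ can also be checked directly; a tiny Python sketch (our naming):

```python
def smallest_n(limit: int = 2010) -> int:
    """Smallest n with P(n) = 1/(n(n+1)) < 1/limit, i.e. n(n+1) > limit."""
    n = 1
    while n * (n + 1) <= limit:
        n += 1
    return n

print(smallest_n())  # 45
```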


## Bayes’ in-sanity || Cheenta Probability Series

One of the most controversial approaches to statistics, this post mainly deals with the fundamental objections to Bayesian methods and Bayesian school of thinking. Turning to the Bayesian crank, Fisher put forward a vehement objection towards Bayesian Inference, describing it as “fallacious rubbish”.

However, ironically enough, it’s interesting to note that Fisher’s greatest statistical failure, fiducialism, was essentially an attempt to “enjoy the Bayesian omelette without breaking any Bayesian eggs” !

### Inductive Logic

An inductive logic is a logic of evidential support. In a deductive logic, the premises of a valid deductive argument logically entail the conclusion, where logical entailment means that every logically possible state of affairs that makes the premises true must make the conclusion true as well. Thus, the premises of a valid deductive argument provide total support for the conclusion. An inductive logic extends this idea to weaker arguments. In a good inductive argument, the truth of the premises provides some degree of support for the truth of the conclusion, where this degree of support might be measured via some numerical scale.

If a logic of good inductive arguments is to be of any real value, the measure of support it articulates should be up to the task. Presumably, the logic should at least satisfy the following condition:

The logic should make it likely (as a matter of logic) that as evidence accumulates, the total body of true evidence claims will eventually come to indicate, via the logic’s measure of support, that false hypotheses are probably false and that true hypotheses are probably true.

One practical example of an easy inductive inference is the following:

” Every bird in a random sample of 3200 birds is black. This strongly supports the following conclusion: All birds are black. “

This kind of argument is often called an induction by enumeration. It is closely related to the technique of statistical estimation.

### Critique of Inductive Logic

Non-trivial calculi of inductive inference are shown to be incomplete. That is, it is impossible for a calculus of inductive inference to capture all inductive truths in some domain, no matter how large, without resorting to inductive content drawn from outside that domain. Hence inductive inference cannot be characterized merely as inference that conforms with some specified calculus.
A probabilistic logic of induction is unable to separate cleanly neutral support from disfavoring evidence (or ignorance from disbelief). Thus, the use of probabilistic representations may introduce spurious results stemming from its expressive inadequacy. That such spurious results arise in the Bayesian “doomsday argument” is shown by a re-analysis that employs fragments of inductive logic able to represent evidential neutrality. Further, the improper introduction of inductive probabilities is illustrated with the “self-sampling assumption.”

### Objections to Bayesian Statistics

While Bayesian analysis has enjoyed notable success with many particular problems of inductive inference, it is not the one true and universal logic of induction. Some of the reasons arise at the global level through the existence of competing systems of inductive logic. Others emerge through an examination of the individual assumptions that, when combined, form the Bayesian system: that there is a real valued magnitude that expresses evidential support, that it is additive and that its treatment of logical conjunction is such that Bayes’ theorem ensues.

The fundamental objections to Bayesian methods are twofold: on one hand, Bayesian methods are presented as an automatic inference engine, and this raises suspicion in anyone with applied experience. The second objection to Bayes’ comes from the opposite direction and addresses the subjective strand of Bayesian inference.

Andrew Gelman, a staunch Bayesian, pens down an interesting criticism of the Bayesian ideology in the voice of a hypothetical anti-Bayesian statistician.

Here is the list of objections from a hypothetical or paradigmatic non-Bayesian, and I quote:

“Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications. Subjective prior distributions don’t transfer well from person to person, and there’s no good objective principle for choosing a non-informative prior (even if that concept were mathematically defined, which it’s not). Where do prior distributions come from, anyway? I don’t trust them and I see no reason to recommend that other people do, just so that I can have the warm feeling of philosophical coherence. To put it another way, why should I believe your subjective prior? If I really believed it, then I could just feed you some data and ask you for your subjective posterior. That would save me a lot of effort!”

In 1986, a statistician as prominent as Brad Efron restated these concerns mathematically:

“I like unbiased estimates and I like confidence intervals that really have their advertised confidence coverage. I know that these aren’t always going to be possible, but I think the right way forward is to get as close to these goals as possible and to develop robust methods that work with minimal assumptions. The Bayesian approach, to give up even trying to approximate unbiasedness and to instead rely on stronger and stronger assumptions, seems like the wrong way to go. The priors I see in practice are typically just convenient conjugate forms. What a coincidence that, of all the infinite variety of priors that could be chosen, it always seems to be the normal, gamma, beta, etc., that turn out to be the right choices?”

Well, that really sums up every frequentist’s rant about Bayes! 😀

### And the torrent of complaints never ceases….

Some frequentists believe that in the old days, Bayesian methods at least had the virtue of being mathematically clean. Nowadays, they all seem to be computed using Markov chain Monte Carlo, which means that, not only can you not realistically evaluate the statistical properties of the method, you can’t even be sure it’s converged, just adding one more item to the list of unverifiable (and unverified) assumptions in Bayesian belief.

As the applied statistician Andrew Ehrenberg wrote:

“Bayesianism assumes:

(a) Either a weak or uniform prior, in which case why bother?,

(b) Or a strong prior, in which case why collect new data?,

(c) Or, more realistically, something in between, in which case Bayesianism always seems to duck the issue.”

Many are skeptical about the new found empirical approach of Bayesians which always seems to rely on the assumption of “exchangeability”, which is almost impossible to obtain in practical scenarios.

### Finally Peace!!!

No doubt, some of these are strong arguments worthy enough to be taken seriously.

There is an extensive literature, which sometimes seems to overwhelm that of Bayesian inference itself, on the advantages and disadvantages of Bayesian approaches. Bayesians’ contributions to this discussion have included defense (explaining how our methods reduce to classical methods as special cases, so that we can be as inoffensive as anybody if needed).

Obviously, Bayesian methods have filled many loopholes in classical statistical theory.

And always remember that you are subjected to mass-criticism only when you have done something truly remarkable walking against the tide of popular opinion.

Hence: “All Hail the iconoclasts of Statistical Theory: the Bayesians!”

N.B. The above quote is mine XD

Wait for our next dose of Bayesian glorification!

Till then ,

Stay safe and cheers!

## References

1. “Critique of Bayesianism” by John D. Norton

2. “Bayesian Informal Logic and Fallacy” by Kevin Korb

3. “Bayesian Analysis” by Andrew Gelman

4. “Statistical Rethinking” by Richard McElreath


## Laplace in the World of Chances| Cheenta Probability Series

In this post, we will discuss mainly the naive Bayes theorem, how Laplace developed the same idea as Bayes independently, and his law of succession.

I cannot conceal the fact here that in the specific application of these rules, I foresee many things happening which can cause one to be badly mistaken if he does not proceed cautiously.

James Bernoulli

While watching a cricket match, we often try to predict what may happen on the next ball, and quite often we guess correctly. I don’t know much about others, but my predictions very often turn out to be true, even to the extent that if I say, “maybe the next ball will be an outside edge caught behind by the keeper”, such a thing really happens within the next 2 or 3 balls, if not on the immediate next ball. In college, I had a friend who could also give such precise predictions while watching a cricket match, even though he was not a student of probability. So, you see, at home or among friends, people think that we are just getting lucky with our predictions.

Well, truly speaking, there’s nothing wrong in that assumption; we are indeed guessing and getting lucky. But what matters is that our chance of getting lucky with our predictions is relatively higher than others’. While talking about chances, remember that while making our judgements we have no mathematical chances in hand on which to base our predictions. What we do know is that the proposition we are predicting has a reasonably higher probability than any other outcome we can think of. But how much higher? Really, no idea! Actually, to take a decision about what may happen on the next ball, we don’t need to know the mathematical probabilities; rather, the need for developing probability is quite the other way around, i.e. for a judgement or proposition you think is going to happen or is true, we need to develop a probabilistic calculation to judge how significant the prediction is.

Say you are the manager of a cricket team (not an ordinary one), and you need to pick a team for a future tournament. You need to observe performance in the current season, as you want to give significant weightage to the current form of the players. Here, working on instinctive judgements alone can even cost you your job, so you need to be sure about the relative significance of your judgements before taking a final decision. We will come to this sort of problem later, while discussing how decision making can be aided by Bayesian thinking; that is where the real need for this theory lies. But to apply it, we first need a clear idea of the nature of this thinking. So, for now, we will deal with some hypothetical but interesting problems.

#### Am I Really Guessing?

Well, it depends on what definition of guessing you are using. Of course I was guessing, but the question is: if my guesses are often correct, what is the possible explanation? The answer is quite simple: I’m not making judgements emotionally! Often people realise that their favourite batsman may miss a ton, but still stay emotional in predicting otherwise. The parameters I always look into are the ones a sane probability believer will put his or her eyes on: how often the batsman scores runs in consecutive matches; which bowler is bowling, and his or her ability to swing the ball away from the batsman in order to get an outside kiss from the bat; how often the batsman, facing the ball, leaves or plays balls outside off; and so on. Any serious cricket lover will keep these things in account while making judgements. So, you see, we are not actually guessing randomly. We are using information from every single ball. Hence, I’m always updating the chance of the propositions which I think may happen with the information I’m extracting after each ball is played. In essence, our decision making is itself a Bayesian robot, if and only if we are ready to give up our biases!

### Naive Bayes

We have already discussed how the seed of inverse thinking to establish possible causal explanations was planted by Thomas Bayes (if you haven’t read our previous post, here it is: Bayes and The Billiard Table | Cheenta Probability Series). The astonishing thing is that, even though Bayes’ idea of evaluating inverse probability using available information was intuitive and mathematical enough, it remained unknown, or criticized where known, in most of Europe. There were mainly two reasons for that: first, such advanced thinking was not a cup of tea that the 18th-century mathematicians and probability people were ready to drink; they eventually needed the evolution of the computer to drink that cup completely. Second, even though Bayes’ idea was intuitive and radical, it needed serious mathematical support, or it would have collapsed.

So, Bayes’ idea was quite simple and elegant. Suppose you have a suspicion, say $S$: say, that the batsman will not score a ton. Then you have a piece of information, say $I$: say, that he or she scored a ton in the last match. The chance (or expectation) of your suspicion $S$ coming true, given that you have observed $I$, is the chance that you would have observed this kind of information $I$ when your suspicion was actually correct, weighted by the chance of the suspicion itself, divided by the chance of observing what you have observed, i.e. the chance of observing $I$. So, mathematically,

$P(S|I)=\frac{P(I|S)P(S)}{P(I)}$

If we break down $P(I)$ using the law of Total Probability (remember!), then we will get the form of Bayes’ theorem we are accustomed to seeing in our textbooks,

$P(S|I)=\frac{P(I|S)P(S)}{P(I|S)P(S)+P(I|S^c)P(S^c)}$ .

Hence, our prior probability $P(S)$, i.e. the chance of your suspicion being true, gets updated to the posterior probability $P(S|I)$, i.e. the chance of your suspicion being true once you have observed some information supporting or doubting it. The point is that your state of belief about the truth of your prediction changes towards reality!
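A tiny numerical sketch of this update (the numbers and names are invented for illustration):

```python
def bayes_update(prior: float, like_s: float, like_not_s: float) -> float:
    """Posterior P(S | I) via Bayes' theorem, with the law of total
    probability in the denominator."""
    evidence = like_s * prior + like_not_s * (1 - prior)
    return like_s * prior / evidence

# Invented numbers: prior P(S) = 0.7, P(I|S) = 0.4, P(I|S^c) = 0.8
post = bayes_update(0.7, 0.4, 0.8)
print(round(post, 3))  # 0.538
```

Observing information that is more likely under $S^c$ than under $S$ pulls the probability of the suspicion down, exactly as the formula says it should.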

Now in the above expression, the place where controversies arise is: what is the nature of $P(S)$? That is, how often does your (our) suspicion about a particular thing turn out to be true? Here comes our hypothetical problem of extrasensory perception, which we will ultimately converge into the Law of Succession, developed by none other than the great Laplace.

## Laplace Places his Thoughts

Now, suppose we are interested in knowing the chance that my guess about the next ball will be correct, when it is already known that some of the guesses I made earlier turned out to be correct.

Say I have made $n$ guesses earlier, $G_1, G_2, \ldots, G_n$, among which $k$ turned out to be correct. Now if I make another guess, say $G_{n+1}$, what is the chance that this guess will turn out to be true?

We will present the solution to this problem, but first we will develop the story and the intuition built by one of the pioneers of this field. The solution turned out to become a law.

Thoughts are often like noises that pop up here and there. When, in England, Bayes’s hidden work got published and didn’t get due attention, in another part of Europe similar thoughts popped up in the mind of the young but brilliant Pierre-Simon Laplace. Obviously, I don’t need to say more about who he is.

That was the era when astronomy was the most quantified and respected branch of science. Science was looking forward to testing Newton’s theories by explaining how precisely gravitation affects the movements of tides, interacting planets and comets, our moon, and the shape of the Earth and other planets. Years of empirical data had been collected. The scientists and astronomers went to sleep every day with the fear that a single exception in their expected data could bring the entire edifice tumbling down. The question which mattered above all was whether the Universe is stable!

Astronomers knew the planets are moving. There came a time when some of them feared that a slowly accelerating Jupiter would smash into the Sun someday! The problem of predicting the motions of many interacting bodies over long periods of time is complex even today, and Newton concluded that God’s miraculous intervention kept the heavens in equilibrium.

Laplace, an astronomer turned mathematician, took it as a challenge to explain the stability of the Universe and decided to dedicate his thoughts to it. He said that while doing this, mathematics would be his telescope in hand. For a time, he considered ways to modify Newton’s theory of gravitation by making gravity vary with a body’s velocity as well as with its mass and distance. He also wondered fleetingly whether comets might be disturbing the orbits of Jupiter and Saturn. But he changed his mind almost immediately. He realised the problem was not Newton’s theory, but the data collected by the astronomers.

Newton’s system of gravitation could have been verified only if the measurements had come out precise and as expected. But observational astronomy was awash with information, some of it uncertain and inadequate. That is where Laplace felt the need to introduce probability into his scientific research. This was also a very important moment for probability theory: it came off the gambling table and found a place in the papers of a scientist. But Laplace was still far from the Bayesian ideas he was to develop in the future.

In the next five years, Laplace wrote 13 papers solving problems in astronomy and the mathematics of celestial mechanics, but was still rejected for membership in the French Royal Academy of Sciences. There came a time when he actually considered emigrating to Prussia to work in their academies. During this frustrated period, he used to spend his afternoons digging through the mathematical literature in libraries. Remember, he was still worried about the problem of errors in the measured astronomical data, and was beginning to think that it would require a fundamentally new way of thinking, maybe probability theory, to deal with the uncertainties pervading many events and their causes. That is when he began to see the light. And in that light he found the same book which had stimulated the grey cells of Thomas Bayes just a decade earlier: “The Doctrine of Chances” by Abraham de Moivre. Maybe Laplace studied a newer edition of the book than Bayes did.

Laplace’s growing interest in probability theory created a diplomatic problem: stalwarts like d’Alembert believed probability was too subjective for developing scientific arguments. But Laplace was young and daring enough to bring a revolution in thinking. He was quite sure that only probability could help him get precise solutions to the complex problems of the movements of celestial bodies. And in the process he immortalized probability theory by finding its application in such a high form of scientific investigation. He began thinking about how he could find a causal explanation behind the divergence in the error-filled observations. He independently developed the idea of a “probability of causes” derived from events that had already happened.

In his first paper on this topic, in 1773, the atheist Laplace compared ignorant mankind, not with God, but with an imaginary intelligence capable of knowing it all. Because humans can never know everything with certainty, probability is the mathematical expression of our ignorance: “We owe to the frailty of the human mind one of the most delicate and ingenious of mathematical theories, namely the science of chance or probabilities.”

He often said he did not believe in God, yet not even his biographers could decipher whether he was an atheist or a deist. But his probability of causes was a mathematical expression of the universe, and for the rest of his days he updated his theories about God and the probability of causes as new evidence became available.

#### Laplace’s Principle of Succession

Laplace at first dealt with the same problem as Bayes: judging the bias of a coin by flipping it a number of times. But he modified it into a version nearly identical to the philosophical problem proposed by Hume, which asks for the probability that the sun will rise tomorrow, given that the sun has risen every day for the past $5000$ years. Observe that this also coincides with the guessing problem I presented at the beginning of this section.

He developed his principle, which mathematically equates to the formula we came across in Naive Bayes; in fact, that form of Bayes’ rule is due more to Laplace than to Bayes himself! So, using his principle, and accepting the restrictive assumption that all his possible causes or hypotheses were equally likely, he started using the uniform prior. Laplace calculates the probability of success in the next trial (the sun rising tomorrow), given that there were $n$ successes in all $n$ earlier trials.

He defined a variable (which we now call a random variable) $X_i$, which takes the value $1$ if success comes at the $i$ th trial and $0$ on failure. The probability with which a success comes is unknown to us; that is the unknown bias. Hence he took that chance, say $p$, to be distributed uniformly over the interval $(0,1)$. Let the probability density of $p$ be $f$. Now, let $S_n$ be the number of successes in $n$ trials. Then $S_n= X_1+X_2+\cdots+X_n$. Here, $S_n=n$. So we need $P(X_{n+1}=1 |X_1=1,X_2=1,\ldots,X_n=1)$, which is precisely $P(X_{n+1}=1|S_n=n)$.

Laplace’s principle was: the probability of a cause (success in the next trial) given an event (the past $n$ trials all resulted in success) is proportional to the probability of the event given the cause. Mathematically,

$P(X_{n+1}=1 | S_n=n) \propto P(S_n=n|X_{n+1}=1)P(X_{n+1}=1)$

Now, see that the event of success in the next trial occurs with the probability $p$ that we don’t yet know and wish to know. So, with $X_{n+1}=1$ we are actually claiming that the chance of success is $p$, which is uniformly distributed within $(0,1)$. So now the question is: what should the constant of proportionality be? Laplace was witty enough to answer that the constant of proportionality is nothing but the normalizing constant of the posterior probability $P(X_{n+1}=1 |S_n=n)$! Since conditional probabilities are also probabilities, they too follow conglomerability and add up to 1. Hence, in this case, the required constant is $\frac{1}{P(S_n=n)}$.

Now our statement of proportionality becomes,

$P(X_{n+1}=1|S_n=n)=\frac{P(S_n=n|X_{n+1}=1)P(X_{n+1}=1)}{P(S_n=n)}$. Doesn’t it look like the Bayes rule we all know!

Now there are two ways the probability can be computed; I will present the more elegant (and more complicated) way, and leave the other for you to find yourself!

As I was discussing, the event $X_{n+1}=1$ is equivalent to the event that the chance of success is some $p$. So,

$P(S_n=n|X_{n+1}=1)P(X_{n+1}=1)= \int^1_0 p^n \cdot p \,dp= \frac{1}{n+2}$,

where we integrate over $(0,1)$ since the prior density is $f(p)=1$ for $0<p<1$: given the chance of success $p$, the probability of $n$ successes in $n$ trials is $p^n$, and the probability of success in the next trial is $p$. Now our required posterior is,

$P(X_{n+1}=1|S_n=n) \propto \frac{1}{n+2}$.

Now, one can verify that our normalizing constant $P(S_n=n)$ is $\frac{1}{n+1}$: use the law of total probability over $0<p<1$, with the prior density of $p$, to get $P(S_n=n)=\int^1_0 p^n\,dp=\frac{1}{n+1}$. Hence, finally, Laplace got,

$P(X_{n+1}=1|S_n=n)=\frac{1/(n+2)}{1/(n+1)}=\frac{n+1}{n+2}$. Hence the chance of the sun rising tomorrow, when it has risen for the past $n$ days, is $n+1$ out of $n+2$. The solution to the guessing problem is a matter of the same arguments, which I leave in the hands of the reader. Another thing to note here: Laplace was the first to call this conditional probability the likelihood, which became quite an important part of the literature of Bayesian inference.
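Nowadays this calculation is trivial to check by simulation. Here is a minimal Monte Carlo sketch (the function name, seed, and trial counts are my own choices): draw the unknown bias $p$ uniformly on $(0,1)$, keep only the runs in which all $n$ trials succeed, and record how often the $(n+1)$-th trial succeeds too.

```python
import random

def succession_probability(n, trials=200_000, seed=0):
    """Estimate P(X_{n+1} = 1 | S_n = n) by rejection sampling.

    The unknown bias p gets a uniform prior; runs whose first n
    trials are not all successes are discarded.
    """
    rng = random.Random(seed)
    all_success = 0
    next_success = 0
    for _ in range(trials):
        p = rng.random()                              # p ~ Uniform(0, 1)
        if all(rng.random() < p for _ in range(n)):   # all n trials succeed
            all_success += 1
            next_success += rng.random() < p          # the (n+1)-th trial
    return next_success / all_success

# Laplace's exact answer for n = 5 is (5 + 1) / (5 + 2) = 6/7 ≈ 0.857
print(succession_probability(5))
```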

This principle then went on to be known as the “Laplace Law of Succession“. The rationale behind the name is that, with the information about the outcome of every trial, one can update the information about the chance of success in successive order, just as Thomas Bayes updated his information about the position of his red ball relative to the position of each black ball rolled on the billiard table.

Notice that for a large number of trials, an application of Laplace’s rule is very close to simply taking the relative frequency of heads as one’s probability for heads the next time. In this setting, with a lot of data, naive frequentism does not go far wrong. But who, on initially getting two heads, would give probability one to heads the next time?

## Laplace Generalizes

Now, the controversy, or in some cases the fallacy, of this rule (more rightly called the Bayes-Laplace rule) lay in the uniform approximation of the priors. Suppose a flat prior is not appropriate. That is, in most cases the coin may be biased, but it is unlikely to be very biased. Perhaps one might want a prior like a symmetric bell-shaped distribution,

or, if the coin is more likely to be biased in one direction, a skewed bell-shaped prior.

Then the question arises: can the simplicity and tractability of the Bayes-Laplace analysis be retained? It can. We choose an appropriate prior density of the same form as the likelihood.

As I discussed in the solution above, Laplace wittily used the normalizer of the posterior probability distribution as the constant of proportionality, which further makes the posterior density integrate to $1$.

The distribution we considered in the above solution can be generalized by the Beta distribution, whose shape is governed by its parameters, often named $n$ and $m$. The beta density looks like,

$\frac{p^{n-1}(1-p)^{m-1}}{\text{normalizer}}$. Here the Bayes-Laplace flat prior has both $n$ and $m$ equal to $1$. In the symmetric bell-shaped prior, which is peaked at $\frac{1}{2}$, both $n$ and $m$ equal $10$, whereas in the second case of the skewed prior, $n$ is taken to be $5$ and $m$ is kept at $10$.

Now, since the principle of Laplace states that the posterior density is proportional to the prior times the likelihood, piling up frequency data keeps the updated density in the beta family. Suppose that, starting with parameters $n$ and $m$, we incurred $s$ successes in a sequence of $t$ trials. Then our new beta density will have parameters $n+s$ and $m+(t-s)$. The resulting rule of succession gives the probability of success on the next trial, on the evidence of $s$ successes in $t$ trials, as $\frac{s+n}{t+n+m}$.
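The conjugate update and the generalized rule of succession fit in a couple of lines of code. The following sketch (function names are my own, purely illustrative) just restates the formulas above:

```python
def beta_update(n, m, s, t):
    """Conjugate update: Beta(n, m) prior, s successes in t trials."""
    return n + s, m + (t - s)

def next_success_prob(n, m, s, t):
    """Generalized rule of succession: P(success on the next trial)."""
    return (s + n) / (t + n + m)

# Bayes-Laplace flat prior (n = m = 1) after 62 heads in 100 tosses
print(beta_update(1, 1, 62, 100))        # -> (63, 39)
print(next_success_prob(1, 1, 62, 100))  # -> 63/102 ≈ 0.6176
```

With $n=m=1$ and $s=t$ this reduces to the $\frac{t+1}{t+2}$ of the sunrise problem.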

Clearly, as claimed at the end of the last section, this ratio approaches the relative frequency $\frac{s}{t}$ for a large number of trials, which again swamps the prior. How fast the data swamp the prior depends on the magnitude of $n$ and $m$.

This is where we can actually look not only into the predictive power of this rule, but also into how it updates its densities about the unknown.

### Priors Modified for Coin Tossing

Suppose we have $62$ heads in $100$ tosses. The updated densities from our uniform, symmetric, and skewed priors don’t show much difference. Bernoulli’s inference from frequency to chance doesn’t look too bad here, but now we know what assumptions we had to make to get that result.

There are a limited number of shapes that can be made with beta priors. If one is aware of the technicalities of coin tossing, one might want a different shape to quantify one’s state of prior ignorance. Persi Diaconis, a dedicated Bayesian and highly experienced with coin tossing, points out that coins spun on edge tend to be biased one way or the other, but more often towards tails. So, if an unknown coin is to be spun, Persi would prefer to put his beliefs on a bimodal prior density with a somewhat higher peak on the tails’ side, which can’t be represented by a beta distribution. However, we can represent such distributions by mixtures of two beta densities, one peaked towards heads and one peaked towards tails, where the second peak is higher. Updating on frequency evidence is still relatively simple, treating the two betas as metahypotheses and their weights as prior probabilities.
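Updating such a two-component beta mixture is simple enough to sketch in code. In the snippet below (function names and the particular component parameters are my own illustrative choices), each component updates conjugately, while the component weights are reweighted by how well each beta predicted the observed frequencies:

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def update_mixture(components, s, t):
    """Update a finite mixture of Beta priors on frequency evidence.

    components: list of (weight, a, b); s successes observed in t trials.
    Each component becomes Beta(a+s, b+t-s); the weights are rescaled by
    each component's marginal likelihood of the data (the binomial
    coefficient is common to all components and cancels).
    """
    unnorm = []
    for w, a, b in components:
        log_ml = log_beta(a + s, b + t - s) - log_beta(a, b)
        unnorm.append((w * exp(log_ml), a + s, b + t - s))
    total = sum(w for w, _, _ in unnorm)
    return [(w / total, a, b) for w, a, b in unnorm]

def predictive(components):
    # probability of success on the next trial under the mixture
    return sum(w * a / (a + b) for w, a, b in components)

# a bimodal prior: one beta peaked towards heads, one towards tails,
# with more weight on the tails-side peak (numbers are illustrative)
prior = [(0.7, 2, 8), (0.3, 8, 2)]
posterior = update_mixture(prior, s=62, t=100)
print(predictive(posterior))
```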

More generally, one has a very rich palette of shapes available for quantifying prior states of belief using finite mixtures of betas. Arguably one can get anything one might find rational to represent one’s prior mixture of knowledge and ignorance. As before, with a lot of evidence such niceties will not matter much. But if we are going to risk a lot on the next few trials, it would be prudent to devote some thought to putting whatever we know into our prior.

## Laplace continues…

Having structured his principle, he first applied his new “probability of causes” to solve two gambling problems, whereupon he realized that his principle needed more modification. In each case he understood intuitively what should happen but got bogged down trying to prove it mathematically. In the first problem, he worked with an urn filled with black and white tickets in an unknown proportion (his cause). He first drew some number of tickets from the urn and, based on that experience, asked for the probability that the ticket in the next draw would be white. To prove the answer, he fought a frustrating battle and had to write $45$ equations, covering every corner of four quarto-sized pages. Today those $45$ equations are redundant, or better to say, reduced and compressed into a few lines of simulation code.
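As a small illustration of that last remark, here is a simulation sketch of the urn problem (the function name, parameter values, and seed are my own): give the unknown proportion a uniform prior, keep only the simulated urns that reproduce the observed draws, and count how often the next ticket is white.

```python
import random

def prob_next_white(s, t, sims=200_000, seed=1):
    """Laplace's urn by brute force: uniform prior on the unknown
    proportion of white tickets, condition on seeing s white in t
    draws, then estimate the chance the next ticket is white.
    The closed-form (rule-of-succession) answer is (s + 1) / (t + 2).
    """
    rng = random.Random(seed)
    matched = next_white = 0
    for _ in range(sims):
        p = rng.random()                                 # unknown proportion (the cause)
        whites = sum(rng.random() < p for _ in range(t))
        if whites == s:                                  # urn consistent with the data
            matched += 1
            next_white += rng.random() < p               # next draw
    return next_white / matched

print(prob_next_white(s=7, t=10))   # close to 8/12 ≈ 0.667
```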

His second problem involved piquet, a game requiring both luck and skill. Two people start playing but stop midway through the game and have to figure out how to divide the kitty by estimating their relative skill levels (the cause). This problem surely reminds us of the problem on which Pascal and Fermat worked, but there they both assumed that the players had equal skills. Laplace’s version is more realistic.

With these two gambling problems, Laplace dealt with two very important perspectives on uncertainty. The first is the unknown parameter: the first problem quite remarkably portrays the basic motive of statistical inference. In the second problem, he dealt with an even finer perspective on uncertainty, that of chance and causes, which would in the future make this Bayes-Laplace model an important and comprehensive tool for drawing conclusions in the new science of cause and effect.

Laplace then moved towards solving his actual problems in astronomy. How should one deal with different observations of the same phenomenon? He was all set to address three of that era’s biggest problems, involving the gravitational attraction on the motions of our moon, the motions of the planets Jupiter and Saturn, and the shape of the Earth. We shall keep the application of Bayesian probabilities in these astronomical problems for some other day.

### Laplace eventually credits Bayes

Even though, after the surfacing and development of the Bayesian perspective, the statistical fraternity got divided into the two groups of frequentists and Bayesians, ironically both Bayes and Laplace were neutral themselves. Bayes, even in his published essay, referred to his dependence on frequencies when forming his prior assumption, and neither ignited the debate nor foresaw such debates in the future.

Similarly, Laplace, in his book on probabilities, acknowledges the relative resemblance between his principle of the probability of causes and frequency methods, which I tried to put light on in the previous sections. Besides resurrecting Bayes’ rule, he also invented the Central Limit Theorem, which is more of a frequentist’s tool than a Bayesian’s.

When Laplace started grappling with his probability of causes and attacking problems in celestial mechanics in 1781, Richard Price arrived in Paris and informed the academicians of Bayes’ discovery. Laplace immediately latched onto Bayes’ ingenious invention, the starting guess, and incorporated it into his own, earlier version of the probability of causes. He was now confident that he was on the right track in assuming the prior causes equally likely, and assured himself of the validity of his principle. Every time he got new information he could use the answer from his last solution as the starting point for another calculation; that is, he went on successively. And by assuming all the prior causes equally likely, he could now formulate his principle into a law or a theorem. Though soon he was to realise the shortcomings of his assumption of equally likely causes, and hence the need for generalizing, which we already talked about a bit under the section Laplace Generalizes.

Laplace later credited Bayes with being first when he wrote, “The theory whose principles I explained some years after,…. he accomplished in an acute and very ingenious, though slightly awkward, manner.”

Although Bayes originated the probability of causes, Laplace discovered the same on his own. When Bayes’ essay was published by his friend Price, Laplace was only 15. Mathematically speaking, the approaches and principles Bayes and Laplace developed are independent. We will be discussing the mathematical perspectives of both Laplace and Bayes in more detail in our coming articles.

Till then, stay safe, and keep finding solutions to the gambling problems Laplace worked on; they no longer need 45 equations to be solved nowadays!

References

1. Probability Theory: The Logic of Science – E. T. Jaynes
2. A Philosophical Essay on Probabilities – Pierre-Simon Laplace
3. The Theory That Would Not Die – Sharon Bertsch McGrayne
4. Ten Great Ideas About Chance – Skyrms, Diaconis


## ISI MStat PSB 2009 Problem 4 | Polarized to Normal

This is a very beautiful sample problem from ISI MStat PSB 2009 Problem 4. It is based on the idea of polar transformations, but it needs a good deal of observation to realize that. Give it a try!

## Problem– ISI MStat PSB 2009 Problem 4

Let $R$ and $\theta$ be independent and non-negative random variables such that $R^2 \sim {\chi_2}^2$ and $\theta \sim U(0,2\pi)$. Fix $\theta_o \in (0,2\pi)$. Find the distribution of $R\sin(\theta+\theta_o)$.

### Prerequisites

Convolution

Polar Transformation

Normal Distribution

## Solution :

This problem may get nasty if one tries to find the required distribution by the so-called CDF method. It is better to observe a bit before moving forward! Recall how we derive the probability distribution of the sample variance of a sample from a normal population.

Yes, you are thinking right, we need to use Polar Transformation !!

But before transforming, let’s make some modifications to reduce future complications.

Given, $\theta \sim U(0,2\pi)$ and $\theta_o$ is some fixed number in $(0,2\pi)$, so, let $Z=\theta+\theta_o \sim U(\theta_o,2\pi +\theta_o)$.

Hence, we need to find the distribution of $R\sin Z$. Now, from the given and modified information, the joint pdf of $R$ and $Z$ is,

$f_{R,Z}(r,z)=\frac{r}{2\pi}\exp(-\frac{r^2}{2}), \ \ r>0, \ \theta_o \le z \le 2\pi +\theta_o$

Now, let the transformation be $(R,Z) \to (X,Y)$,

$X=R\cos Z, \quad Y=R\sin Z$, where $X,Y \in \mathbb{R}$.

Hence, $R=\sqrt{X^2+Y^2}, \quad Z= \tan^{-1} (\frac{Y}{X})$

Hence, verify that the Jacobian of the transformation is $J(\frac{r,z}{x,y})=\frac{1}{r}$.

Hence, the joint pdf of $X$ and $Y$ is,

$f_{X,Y}(x,y)=f_{R,Z}(\sqrt{x^2+y^2}, \tan^{-1}(\frac{y}{x}))\, J(\frac{r,z}{x,y}) =\frac{1}{2\pi}\exp(-\frac{x^2+y^2}{2})$, $x,y \in \mathbb{R}$.

Yeah, Now it is looking familiar right !!

Since we need the distribution of $Y=R\sin Z=R\sin(\theta+\theta_o)$, we integrate $f_{X,Y}$ with respect to $X$ over the real line, and we end up with the conclusion that,

$R\sin(\theta+\theta_o) \sim N(0,1)$. Hence, we are done!
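A quick simulation corroborates the conclusion. The sketch below (sample size, seed, and the particular $\theta_o$ are arbitrary choices) uses the fact that a ${\chi_2}^2$ variable is exponential with mean $2$:

```python
import math
import random

rng = random.Random(0)
n = 200_000
theta_0 = 1.0                                # any fixed angle in (0, 2*pi)

samples = []
for _ in range(n):
    r = math.sqrt(rng.expovariate(0.5))      # R^2 ~ chi-squared(2) = Exp(mean 2)
    theta = rng.uniform(0.0, 2.0 * math.pi)  # theta ~ U(0, 2*pi), independent of R
    samples.append(r * math.sin(theta + theta_0))

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean, var)    # should be close to 0 and 1 respectively
```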

## Food For Thought

From the above solution, the distribution of $R\cos(\theta+\theta_o)$ is also determinable, right! Can you go further and investigate the distribution of $\tan(\theta+\theta_o)$? Here $R$ and $\theta$ are the same variables as defined in the question.

Give it a try !!


## ISI MStat PSB 2008 Problem 7 | Finding the Distribution of a Random Variable

This is a very beautiful sample problem from ISI MStat PSB 2008 Problem 7, based on finding the distribution of a random variable. Let’s give it a try!

## Problem– ISI MStat PSB 2008 Problem 7

Let $X$ and $Y$ be exponential random variables with parameters 1 and 2 respectively. Another random variable $Z$ is defined as follows.

A coin, with probability p of Heads (and probability 1-p of Tails) is
tossed. Define $Z$ by $Z=\begin{cases} X & , \text { if the coin turns Heads } \\ Y & , \text { if the coin turns Tails } \end{cases}$
Find $P(1 \leq Z \leq 2)$

### Prerequisites

Cumulative Distribution Function

Exponential Distribution

## Solution :

Let $F_{i}$ be the CDF of $i$, for $i=X, Y, Z$. Then we have,

$F_{Z}(z) = P(Z \le z) = P( Z \le z \mid \text{coin turns Head})P(\text{coin turns Head}) + P( Z \le z \mid \text{coin turns Tail}) P(\text{coin turns Tail})$

$= P( X \le z)\,p + P(Y \le z ) (1-p) = F_{X}(z)\,p+F_{Y}(z) (1-p)$

Therefore the pdf of $Z$ is given by $f_{Z}(z)= pf_{X}(z)+(1-p)f_{Y}(z)$, where $f_{X}$ and $f_{Y}$ are the pdfs of $X$ and $Y$ respectively.

So , $P(1 \leq Z \leq 2) = \int_{1}^{2} \{pe^{-z} + (1-p) 2e^{-2z}\} dz = p \frac{e-1}{e^2} +(1-p) \frac{e^2-1}{e^4}$
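The closed-form answer is easy to verify by simulation. The sketch below (function names and the value of $p$ are illustrative) samples $Z$ from the coin-flip mixture directly:

```python
import math
import random

def exact(p):
    """The closed form above: p(e-1)/e^2 + (1-p)(e^2-1)/e^4."""
    e = math.e
    return p * (e - 1) / e**2 + (1 - p) * (e**2 - 1) / e**4

def simulate(p, sims=200_000, seed=2):
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        if rng.random() < p:
            z = rng.expovariate(1.0)   # heads: Z = X, exponential with rate 1
        else:
            z = rng.expovariate(2.0)   # tails: Z = Y, exponential with rate 2
        hits += 1 <= z <= 2
    return hits / sims

p = 0.3                                # an arbitrary coin bias
print(exact(p), simulate(p))           # the two numbers should nearly agree
```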

## Food For Thought

Find the distribution function of $K=\frac{X}{Y}$ and then find $\lim_{K \to \infty} P(K >1 )$


## ISI MStat PSB 2009 Problem 6 | abNormal MLE of Normal

This is a very beautiful sample problem from ISI MStat PSB 2009 Problem 6. It is based on the idea of restricted maximum likelihood estimators and mean squared errors. Give it a try!

## Problem-ISI MStat PSB 2009 Problem 6

Suppose $X_1,\ldots,X_n$ are i.i.d. $N(\theta,1)$, $\theta_o \le \theta \le \theta_1$, where $\theta_o < \theta_1$ are two specified numbers. Find the MLE of $\theta$ and show that it is better than the sample mean $\bar{X}$ in the sense of having smaller mean squared error.

### Prerequisites

Maximum Likelihood Estimators

Normal Distribution

Mean Squared Error

## Solution :

This is a very interesting problem! We all know that if the condition “$\theta_o \le \theta \le \theta_1$, for some specified numbers $\theta_o < \theta_1$” had not been given, then the MLE would simply have been $\bar{X}=\frac{1}{n}\sum_{k=1}^n X_k$, the sample mean of the given sample. But due to the restriction on $\theta$, things get interestingly complicated.

So, to simplify a bit, let’s write the likelihood function of $\theta$ given this sample, $\vec{X}=(X_1,\ldots,X_n)’$,

$L(\theta |\vec{X})=(\frac{1}{\sqrt{2\pi}})^n\exp(-\frac{1}{2}\sum_{k=1}^n(X_k-\theta)^2)$, when $\theta_o \le \theta \le \theta_1$. Now, taking the natural log of both sides and differentiating, we find that,

$\frac{d\ln L(\theta|\vec{X})}{d\theta}= \sum_{k=1}^n (X_k-\theta)$.

Now, verify that if $\bar{X} < \theta_o$, then $L(\theta |\vec{X})$ is a decreasing function of $\theta$ over $\theta_o \le \theta \le \theta_1$; hence the maximum likelihood is attained at $\theta_o$ itself. Similarly, when $\theta_o \le \bar{X} \le \theta_1$, the maximum likelihood is attained at $\bar{X}$. Lastly, when $\bar{X} > \theta_1$, the likelihood function is increasing, and the maximum likelihood is found at $\theta_1$.

Hence, the Restricted Maximum Likelihood Estimator of $\theta$, say $\hat{\theta_{RML}}$, is

$\hat{\theta_{RML}} = \begin{cases} \theta_o & \bar{X} < \theta_o \\ \bar{X} & \theta_o\le \bar{X} \le \theta_1 \\ \theta_1 & \bar{X} > \theta_1 \end{cases}$

Now, let us check that $\hat{\theta_{RML}}$ is a better estimator than $\bar{X}$ in terms of mean squared error (MSE).

Now, writing $f_{\bar{X}}$ for the density of $\bar{X}$,

$MSE_{\theta}(\bar{X})=E_{\theta}(\bar{X}-\theta)^2=\int_{-\infty}^{\infty} (x-\theta)^2f_{\bar{X}}(x)\,dx$

$=\int_{-\infty}^{\theta_o} (x-\theta)^2f_{\bar{X}}(x)\,dx+\int_{\theta_o}^{\theta_1} (x-\theta)^2f_{\bar{X}}(x)\,dx+\int_{\theta_1}^{\infty} (x-\theta)^2f_{\bar{X}}(x)\,dx$

$\ge \int_{-\infty}^{\theta_o} (\theta_o-\theta)^2f_{\bar{X}}(x)\,dx+\int_{\theta_o}^{\theta_1} (x-\theta)^2f_{\bar{X}}(x)\,dx+\int_{\theta_1}^{\infty} (\theta_1-\theta)^2f_{\bar{X}}(x)\,dx$

(in the first integral $x < \theta_o \le \theta$, so $(x-\theta)^2 \ge (\theta_o-\theta)^2$, and similarly in the third)

$=E_{\theta}(\hat{\theta_{RML}}-\theta)^2=MSE_{\theta}(\hat{\theta_{RML}})$.

Hence proved !!
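One can also see the MSE dominance empirically. The sketch below (names, parameter values, and simulation sizes are my own choices) compares the simulated MSEs of $\bar{X}$ and the clipped estimator:

```python
import random

def mse_comparison(theta, theta_0, theta_1, n=20, sims=50_000, seed=3):
    """Simulated MSEs of the sample mean and of the restricted MLE,
    i.e. the sample mean clipped to [theta_0, theta_1], for N(theta, 1) data."""
    rng = random.Random(seed)
    se_mean = se_rml = 0.0
    for _ in range(sims):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        clipped = min(max(xbar, theta_0), theta_1)   # the restricted MLE
        se_mean += (xbar - theta) ** 2
        se_rml += (clipped - theta) ** 2
    return se_mean / sims, se_rml / sims

mse_mean, mse_rml = mse_comparison(theta=0.1, theta_0=0.0, theta_1=0.5)
print(mse_mean, mse_rml)   # the clipped estimator should show the smaller MSE
```

Note that the simulated MSE of $\bar{X}$ should sit near the theoretical $\frac{1}{n}$.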

## Food For Thought

Now, can you find an unbiased estimator for $\theta^2$? Okay, now it’s quite easy, right! But is the estimator you are thinking of the best unbiased estimator? Calculate the variance and also check whether the variance attains the Cramér-Rao Lower Bound.

Give it a try !! You may need the help of Stein’s Identity.