Laplace in the World of Chances | Cheenta Probability Series

In this post, we will mainly discuss the naive Bayes theorem, how Laplace independently developed the same idea as Bayes, and how his law of succession works.

I cannot conceal the fact here that in the specific application of these rules, I foresee many things happening which can cause one to be badly mistaken if he does not proceed cautiously.

James Bernoulli

While watching a cricket match, we often try to predict what may happen on the next ball, and several times we guess correctly. I don't know much about others, but my predictions very often turn out to be true, even to the extent that if I say, "Maybe the next ball will be an outside edge caught behind by the keeper", such a thing really happens within the next 2 or 3 balls, if not on the very next ball. In college, I had a friend who could also give such precise predictions while watching a cricket match, even though he was not a student of probability. So, you see, at home or among friends, people think that we are just getting lucky with our predictions.

Well, truly speaking, there's nothing wrong with that assumption: we are indeed guessing and getting lucky. But what matters is that our chance of getting lucky with our predictions is relatively higher than that of others!! While talking about chances, remember that when making our judgements we have no mathematical chances in hand on which to base our predictions. All we know is that the proposition we are predicting has a reasonably higher probability than any other outcome we can think of. But how reasonable?? Really, no idea!! Actually, to decide what may happen on the next ball, we don't need to know the mathematical probabilities; rather, the need for developing probability is quite the other way around. That is, for a judgement or proposition that you think is going to happen or is true, we need to develop a probabilistic calculation to judge how significant the prediction is.

Say you are the manager of a cricket team (not an ordinary one), and you need to pick a team for a future tournament. You need to observe performances in the current season, as you want to give significant weight to the current form of the players. Here, working with your instinctive judgements alone can even cost you your job. So you need to be sure about the relative significance of your judgements before taking a final decision. We will come to these sorts of problems later, while discussing how decision making can be aided by Bayesian thinking. And that's where the real need for this theory lies. But as it happens, before applying it we first need to make our ideas about the nature of this thinking quite clear. So, for now, we will deal with some hypothetical but interesting problems.

Am I really Guessing ?

Well, it depends on what definition of guessing you are using. Of course I was guessing, but the question is: if my guesses are often correct, what is the possible explanation?? The answer is quite simple: I'm not making judgements emotionally!! Often people realise that their favorite batsman may miss a ton, but still stay emotional in their predictions!! The parameters I always look into are the ones on which a sane probability believer will keep his or her eyes, i.e. how often the batsman scores runs in consecutive matches, which bowler is bowling and his or her ability to swing the ball away from the batsman in order to draw an outside kiss from the bat, how often the batsman facing the ball leaves or plays balls outside off, and so on. Any serious cricket lover will take these things into account while making judgements. So, you see, we are not actually guessing randomly. We are using information from every single ball. Hence, I'm always updating the chance of the propositions which I think may happen, with the information I'm extracting after each ball is played. To be precise, our decision making is itself a Bayesian robot, if and only if we are ready to give up our biases!!

Naive Bayes

We have already discussed how the seed of inverse thinking, used to establish possible causal explanations, was planted by Thomas Bayes. (If you haven't read our previous post, here it is: Bayes and The Billiard Table | Cheenta Probability Series.) The astonishing thing is that, even though Bayes' idea of evaluating inverse probability using available information was intuitive and mathematical enough, it remained unknown, or criticized where it was known, in most of Europe. There were mainly two reasons for that. First, such advanced thinking was perhaps not the cup of tea which the 18th century mathematicians and probabilists were ready to drink; they eventually needed the evolution of the computer to drink that cup completely. Second, even though Bayes' idea was intuitive and radical, it needed serious mathematical support, or it would have collapsed.

So, Bayes' idea was quite simple and elegant. Suppose you have a suspicion, say \(S\), for instance that the batsman will not score a ton. Then you have a set of information, say \(I\), for instance that he or she scored a ton in the last match. The chance (or expectation) of your suspicion \(S\) coming true, once you have observed \(I\), is the chance (or expectation) of observing this kind of information \(I\) when your suspicion is actually correct, weighted by the chance of the suspicion itself, and divided by the overall chance of observing \(I\). So, mathematically,

\(P(S|I)=\frac{P(I|S)P(S)}{P(I)}\)

If we break down \(P(I)\) using the law of total probability (or expectation) (remember!!), we get the form of Bayes' theorem we are accustomed to seeing in our textbooks,

\(P(S|I)=\frac{P(I|S)P(S)}{P(I|S)P(S)+P(I|S^c)P(S^c)}\) .

Hence, here our prior probability \(P(S)\), i.e. the chance of your suspicion being true, gets updated to the posterior probability \(P(S|I)\), i.e. the chance of your suspicion being true when you have observed some information supporting or doubting it. The point is that your state of belief about the truth of your prediction keeps moving towards reality!

Now, in the above expression, the place where controversy arises is the nature of \(P(S)\): how often does your (our) suspicion about a particular thing turn out to be true? Here comes our hypothetical problem of extrasensory perception, which will ultimately lead us to the Law of Succession, developed by none other than the great Laplace.
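To make the update concrete, here is a minimal sketch of the textbook form of Bayes' theorem above. The function name `posterior` and the three input numbers are purely illustrative assumptions; they are not estimates for any real batsman.

```python
def posterior(prior, lik_given_s, lik_given_not_s):
    """Return P(S|I) from P(S), P(I|S) and P(I|S^c) using Bayes' theorem."""
    # Denominator: law of total probability, P(I) = P(I|S)P(S) + P(I|S^c)P(S^c)
    evidence = lik_given_s * prior + lik_given_not_s * (1.0 - prior)
    return lik_given_s * prior / evidence


# Hypothetical inputs (assumptions for illustration only):
p_s = 0.7              # P(S): prior chance that the batsman will not score a ton
p_i_given_s = 0.2      # P(I|S): chance of a ton in the last match if S is true
p_i_given_not_s = 0.5  # P(I|S^c): chance of a ton in the last match if S is false

print(round(posterior(p_s, p_i_given_s, p_i_given_not_s), 4))
```

With these made-up inputs the information \(I\) is less likely under \(S\) than under \(S^c\), so the posterior comes out lower than the prior: the suspicion is weakened but not discarded.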

Laplace Places his Thoughts

Now, suppose we are interested in knowing the chance that my guess about the next ball will be correct, when it is already known that some of the guesses I made earlier turned out to be correct.

Let me have made \(n\) guesses earlier, \(G_1,G_2,\ldots,G_n\), among which \(k\) guesses turned out to be correct. Now if I make another guess, say \(G_{n+1}\), what is the chance that my current guess will turn out to be true?

We will present the solution to this problem, but first we will develop the story and the intuition built by one of the pioneers of this field. The solution went on to become a law.

Thoughts are often like noises that pop up here and there. When, in England, Bayes's hidden work got published and didn't get its due attention, similar thoughts popped up in another part of Europe, in the mind of the young but brilliant Pierre-Simon Laplace. Obviously I don't need to say more about who he is.

Figure: Pierre-Simon Laplace

That was the era when astronomy was the most quantified and respected branch of science. Science was looking forward to testing Newton's theories by explaining how precisely gravitation affects the movements of tides, interacting planets and comets, our moon, and the shape of the Earth and other planets. Years of empirical data had been collected. Scientists and astronomers went to sleep every day with the fear that a single exception in their expected data could bring the entire edifice tumbling down. The question that mattered above all was whether the Universe is stable!!

Astronomers knew the planets were moving. There came a time when some of them feared that a slowly accelerating Jupiter would smash into the Sun someday!! The problem of predicting the motions of many interacting bodies over long periods of time is complex even today, and Newton concluded that God's miraculous intervention kept the heavens in equilibrium.

Laplace, an astronomer turned mathematician, took it as a challenge to explain the stability of the Universe and decided to dedicate his thoughts to it. He said that in doing so, mathematics would be the telescope in his hand. For a time, he considered ways to modify Newton's theory of gravitation by making gravity vary with a body's velocity as well as with its mass and distance. He also wondered fleetingly whether comets might be disturbing the orbits of Jupiter and Saturn. But he changed his mind almost immediately. He realised the problem was not Newton's theory, but the data collected by the astronomers.

Newton's system of gravitation could have been verified only if the measurements had come out precise and as expected. But observational astronomy was awash with information, some of it uncertain and inadequate. That's where Laplace felt the need to introduce probability into his scientific research. This was also a very important moment for probability theory: it left the gambling table and found a place in the papers of a scientist. But Laplace was still far from the Bayesian ideas which he was to develop later.

In the next five years Laplace wrote 13 papers solving problems in astronomy and the mathematics of celestial mechanics, but was still refused membership in the French Royal Academy of Sciences. A time came when he actually started considering emigrating to Prussia to work in its academies. During this frustrating period, he used to spend his afternoons digging through the mathematical literature in libraries. And remember, he was still worried about the errors in the measured astronomical data, and was beginning to think that it would require a fundamentally new way of thinking, maybe probability theory, to deal with the uncertainties pervading many events and their causes. That is when he began to see the light. And in that light he found the same book which had stimulated the grey cells of Thomas Bayes about a decade earlier: "The Doctrine of Chances" by Abraham de Moivre. Maybe Laplace studied a newer edition of the book than Bayes did.

Laplace's growing interest in probability theory created a diplomatic problem: stalwarts like d'Alembert believed probability was too subjective for developing scientific arguments. But Laplace was young and daring enough to bring a revolution in thinking. He was quite sure that only probability could help him get precise solutions while dealing with the complex problems of the movements of celestial bodies. And in the process he immortalized probability theory by finding its application in such a high form of scientific investigation. He began thinking about how he could find a causal explanation behind the divergences in the error-filled observations. He independently developed the idea of a "Probability of Causes", derived from events that had already happened.

In his first paper on this topic, in 1773, the atheist Laplace compared ignorant mankind not with God but with an imaginary intelligence capable of knowing it all. Because humans can never know everything with certainty, probability is the mathematical expression of our ignorance: "We owe to the frailty of the human mind one of the most delicate and ingenious of mathematical theories, namely the science of chance or probabilities."

He often said he did not believe in God, but not even his biographers could decipher whether he was an atheist or a deist. But his probability of causes was a mathematical expression of the universe, and for the rest of his days he updated his theories about God and the probability of causes as new evidence became available.

Laplace's Principle of Succession

Laplace at first dealt with the same problem as Bayes: judging the bias of a coin by flipping it a number of times. But he then moved to a version quite identical to the philosophical problem proposed by Hume, which asks for the probability that the sun is going to rise tomorrow when you know that the sun has been rising every day for the past \(5000\) years. Observe that it also very much coincides with the guessing problem I presented at the beginning of the previous section.

He developed his principle, which mathematically matches the formula we came across in the Naive Bayes section; in fact, that form of Bayes' rule is due more to Laplace than to Bayes himself!! Using his principle, and accepting the restrictive assumption that all his possible causes or hypotheses were equally likely, he started using the uniform prior. Laplace calculated the probability of success in the next trial (the sun rising tomorrow), given that all \(n\) earlier trials were successes.

He defined a variable (which we now call a random variable) \(X_i\), which takes the value \(1\) if a success comes at the \(i\)th trial and \(0\) if a failure. The probability with which a success comes is unknown to us; that is precisely the unknown bias. Hence he took that chance, say \(p\), to be distributed uniformly on the interval \((0,1)\). Let the probability density of \(p\) be \(f\). Now, let \(S_n\) be the number of successes in \(n\) trials. Then \(S_n= X_1+X_2+\ldots+X_n\). Here, \(S_n=n\). So we need \(P(X_{n+1}=1 |X_1=1,X_2=1,\ldots,X_n=1)\), which is precisely \(P(X_{n+1}=1|S_n=n)\).

Laplace's principle was: the probability of a cause (success in the next trial) given an event (the past \(n\) trials all resulted in success) is proportional to the probability of the event given the cause. Mathematically,

\(P(X_{n+1}=1 | S_n=n) \propto P(S_n=n|X_{n+1}=1)P(X_{n+1}=1)\)

Now, see that the event of success in the next trial occurs with probability \(p\), which we don't yet know and wish to know. So, with \(X_{n+1}=1\) we are actually claiming that the chance of success is \(p\), which is uniformly distributed on \((0,1)\). Now the question is, what should the constant of proportionality be?? Laplace was witty enough to answer that the constant of proportionality is nothing but the normalizing constant of the posterior probability \(P(X_{n+1}=1 |S_n=n)\)!! We know that conditional probabilities are also probabilities; they too satisfy conglomerability and add up to 1. Hence, in this case, the required constant is \(\frac{1}{P(S_n=n)}\).

Now, our statement of proportionality becomes,

\(P(X_{n+1}=1|S_n=n)=\frac{P(S_n=n|X_{n+1}=1)P(X_{n+1}=1)}{P(S_n=n)}\). Doesn't it look like the Bayes rule we all know!!

Now, there are two ways the probability can be computed. I will present the more elegant, though more complicated, way; the other you can look up yourself!!

As I was discussing, the event \(X_{n+1}=1\) corresponds to the event that the success chance is some \(p\). So,

\(P(S_n=n \mid X_{n+1}=1)\,P(X_{n+1}=1)=\int^1_0 P(S_n=n \mid \text{success probability is } p)\,P(X_{n+1}=1 \mid \text{success probability is } p)\,f(p)\,dp = \int^1_0 p^n \, p \,dp= \frac{1}{n+2}\), where we integrate because all values of \(p\) in the interval \((0,1)\) have the same density, i.e. \(f(p)=1\) when \(0<p<1\). Now our required posterior is,

\(P(X_{n+1}=1|S_n=n) \propto \frac{1}{n+2}\),

Now, one can verify that our normalizing constant \(P(S_n=n)\) is \(\frac{1}{n+1}\): use the law of total probability over \(0<p<1\) with the prior density of \(p\), which gives \(\int^1_0 p^n \,dp=\frac{1}{n+1}\). Hence, finally, Laplace got,

\(P(X_{n+1}=1|S_n=n)=\frac{n+1}{n+2}\). Hence the chance of the sun rising tomorrow, when it has risen for the past \(n\) days, is \(n+1\) out of \(n+2\). The solution to the guessing problem follows from the same arguments, which I leave in the hands of the reader to work out. Another thing to note here is that Laplace was the first to call this conditional probability the likelihood, which became quite an important part of the literature on Bayesian inference.

This principle then went on to be known as "Laplace's Law of Succession". The rationale behind the name is that, with the information about the outcome of every trial, one can update the information about the chance of success in successive order, just as Thomas Bayes updated his information about the position of his red ball relative to the position of each black ball rolled on the billiard table.

Notice that for a large number of trials, an application of Laplace's rule is very close to simply taking the relative frequency of heads as one's probability for heads the next time. In this setting, with a lot of data, naive frequentism does not go far wrong. But who, on initially getting two heads, would give probability one to heads the next time?
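For readers who want to check the algebra (or their own answer to the guessing problem), the two integrals can be evaluated exactly. The sketch below assumes the uniform prior used above and relies on the standard identity \(\int_0^1 p^a(1-p)^b\,dp=\frac{a!\,b!}{(a+b+1)!}\) for non-negative integers \(a, b\); the function names are my own.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over (0, 1), for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def succession(k, n):
    """P(success on trial n+1 | k successes in n trials), uniform prior on p."""
    joint = beta_integral(k + 1, n - k)  # the observed trials AND a success next
    evidence = beta_integral(k, n - k)   # the observed trials alone
    return joint / evidence              # binomial coefficients cancel in the ratio


print(beta_integral(10, 0))  # normalizing constant P(S_n = n) for n = 10: 1/11
print(succession(10, 10))    # sun-rise case with n = 10: 11/12
print(succession(3, 10))     # 3 correct guesses out of 10
```

The first two lines reproduce \(\frac{1}{n+1}\) and \(\frac{n+1}{n+2}\) for \(n=10\); the last line applies the same argument to \(k\) correct guesses out of \(n\).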

Laplace Generalizes

Now, the controversy, or maybe in some cases the fallacy, of what is more rightfully called the Bayes-Laplace rule lay in the uniform assumption on the priors. Suppose a flat prior is not appropriate. That is, in most cases the coin may be biased, but it is unlikely to be very biased. Perhaps one might want a prior like a symmetric bell-shaped distribution,

Figure: The symmetric prior; Beta with n=m=10

or, if the coin is more likely to be biased in one direction, a skewed bell-shaped prior.

Figure: The skewed prior; Beta with n=5, m=10

Then the question that arises is: can the simplicity and tractability of the Bayes-Laplace analysis be retained? It can. We choose an appropriate prior density of the same form as the likelihood, that is, proportional to a product of powers of \(p\) and \(1-p\).

As I discussed in the solution above, Laplace wittily used the normalizer of the posterior distribution as the constant of proportionality, which makes the resulting density integrate to \(1\).

The distribution we considered in the above solution can be generalized to the Beta distribution, whose shape is governed by two parameters, often named \(n\) and \(m\). The density of the beta looks like,

\(\frac{p^{n-1}(1-p)^{m-1}}{\text{normalizer}} \). Here, the Bayes-Laplace flat prior has both \(n\) and \(m\) equal to \(1\). The symmetric bell-shaped prior, which is peaked at \(\frac{1}{2}\), has both \(n\) and \(m\) equal to \(10\), whereas in the second case, the skewed prior, \(n\) is taken to be \(5\) and \(m\) is kept at \(10\).

Now, since the prior density has the same form as the likelihood, piling up frequency data keeps the updated density in the beta family. Suppose that, starting with parameters \(n\) and \(m\), we observe \(s\) successes in a sequence of \(t\) trials. Then our new beta density will have parameters \(n+s\) and \(m+(t-s)\). The resulting rule of succession gives us the probability of success on the next trial, on the evidence of \(s\) successes in \(t\) trials, as \(\frac{s+n}{t+n+m}\).

Clearly, as claimed at the end of the last section, this ratio becomes almost the relative frequency \(\frac{s}{t}\) for a large number of trials; the data swamp the prior. How fast the prior gets swamped depends on the magnitudes of \(n\) and \(m\).

It is here that we can look not only into the predictive power of this rule, but also into how it updates its densities about the unknown.
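The swamping of the prior can be seen numerically. The sketch below uses the three priors from the figures (flat, symmetric, skewed) and a relative frequency of \(\frac{3}{5}\) that I picked only for illustration; the function names are likewise my own.

```python
from fractions import Fraction


def update(n, m, s, t):
    """A Beta(n, m) prior plus s successes in t trials gives Beta(n + s, m + t - s)."""
    return n + s, m + (t - s)


def rule_of_succession(n, m, s, t):
    """Probability of success on the next trial: (s + n) / (t + n + m)."""
    a, b = update(n, m, s, t)
    return Fraction(a, a + b)


priors = [(1, 1), (10, 10), (5, 10)]  # flat, symmetric, skewed
for t in (5, 50, 5000):
    s = 3 * t // 5                     # keep the relative frequency s/t at 3/5
    row = [round(float(rule_of_succession(n, m, s, t)), 3) for n, m in priors]
    print(t, row)
```

For small \(t\) the three priors give visibly different predictions; as \(t\) grows, all three move towards \(0.6\), and the priors with larger \(n+m\) take longer to get there.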

Priors Modified for Coin Tossing

Suppose we have \(62\) heads in \(100\) tosses. The updated densities from our uniform, symmetric, and skewed priors don't show much difference. Bernoulli's inference from frequency to chance doesn't look too bad here, but now we know what assumptions we had to make to get that result.

Figure: The posteriors after 100 tosses, for the corresponding priors.
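As a rough numerical companion to the picture, the sketch below summarizes each posterior by its mean and standard deviation, using the standard formulas for a Beta\((a,b)\) distribution: mean \(\frac{a}{a+b}\) and variance \(\frac{ab}{(a+b)^2(a+b+1)}\). The helper name `beta_mean_sd` is an assumption of mine.

```python
from math import sqrt


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


priors = {"uniform": (1, 1), "symmetric": (10, 10), "skewed": (5, 10)}
heads, tosses = 62, 100

for name, (n, m) in priors.items():
    a, b = n + heads, m + (tosses - heads)  # conjugate beta update
    mean, sd = beta_mean_sd(a, b)
    print(f"{name:9s} posterior Beta({a}, {b}): mean={mean:.3f}, sd={sd:.3f}")
```

The three posterior means are \(\frac{63}{102}\), \(\frac{72}{120}\) and \(\frac{67}{115}\), which differ by less than the posterior standard deviations, in line with the observation that the curves look much alike.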

There is a limited number of shapes that can be made with beta priors. If one is aware of the technicalities of coin tossing, one might want a different shape to quantify one's state of prior ignorance. Persi Diaconis, a dedicated Bayesian with great experience of coin tossing, points out that coins spun on edge tend to be biased one way or another, but more often towards tails. So, if an unknown coin is to be spun, Persi would prefer to put his beliefs on a bimodal prior density with a somewhat higher peak on the tails side, which can't be represented by a beta distribution. However, we can represent such distributions by mixtures of two beta densities, one peaked towards heads and one peaked towards tails, where the second peak is higher. Updating on frequency evidence is still relatively simple, treating the two betas as metahypotheses and their weights as prior probabilities.

More generally, one has a very rich palette of shapes available for quantifying prior states of belief using finite mixtures of betas. Arguably one can get anything one might find rational to represent one's prior mixture of knowledge and ignorance. As before, with a lot of evidence such niceties will not matter much. But if we are going to risk a lot on the next few trials, it would be prudent for us to devote some thought to putting whatever we know into our prior.
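Here is a minimal sketch of that mixture update. Each beta component is updated as usual, and its weight is multiplied by how well that component predicted the data, namely \(\frac{B(a+s,\,b+t-s)}{B(a,\,b)}\), where \(B\) is the beta function; the weights are then renormalized. The two components and their weights below are hypothetical numbers of mine, not Diaconis's.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def update_mixture(components, s, t):
    """Update a mixture given s heads in t spins.

    components is a list of (weight, a, b) triples, one per beta 'metahypothesis'.
    """
    log_w, params = [], []
    for w, a, b in components:
        a_new, b_new = a + s, b + (t - s)
        # prior weight times the marginal likelihood of the data under this component
        log_w.append(log(w) + log_beta(a_new, b_new) - log_beta(a, b))
        params.append((a_new, b_new))
    top = max(log_w)                        # subtract the max for numerical stability
    unnorm = [exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return [(u / total, a, b) for u, (a, b) in zip(unnorm, params)]


def predictive(components):
    """Probability of heads on the next spin under the mixture."""
    return sum(w * a / (a + b) for w, a, b in components)


# Hypothetical bimodal prior on p = P(heads): the tails-leaning peak gets more weight.
prior = [(0.6, 4, 10), (0.4, 10, 4)]
posterior = update_mixture(prior, s=8, t=10)
print([round(w, 3) for w, _, _ in posterior], round(predictive(posterior), 3))
```

After 8 heads in 10 spins, most of the weight moves to the heads-leaning component, even though it started with the smaller prior weight.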

Laplace continues...

Having structured his principle, he first applied his new "probability of causes" to two gambling problems, and realized that his principle needed more modification. In each case he understood intuitively what should happen but got bogged down trying to prove it mathematically. In the first problem, he worked with an urn filled with black and white tickets in an unknown proportion (his cause). He first drew some number of tickets from the urn and, based on that experience, asked for the probability that his next ticket would be white. To prove the answer, he fought a frustrating battle and had to write \(45\) equations, covering every corner of four quarto-sized pages. Today those \(45\) equations have become redundant, or better to say, reduced and compressed into a few lines of simulation code.

His second problem involved piquet, a game requiring both luck and skill. Two people start playing but stop midway through the game and have to figure out how to divide the kitty by estimating their relative skill levels (the cause). This problem surely reminds us of the problem on which Pascal and Fermat worked, but there they both assumed that the players had equal skills. Laplace's version is more realistic.

With these two gambling problems, Laplace dealt with two very important perspectives on uncertainty. The first problem, about an unknown parameter, quite remarkably portrays the basic motive of statistical inference. In the second problem, he dealt with an even finer perspective on uncertainty, that of chance and causes, which would in future make this Bayes-Laplace model an important and comprehensive tool for drawing conclusions in the new science of cause and effect.

Laplace was then to move towards solving his actual problems in astronomy. How should one deal with different observations of the same phenomenon? He was all set to address three of that era's biggest problems, which involved gravitational attraction on the motion of our moon, the motions of the planets Jupiter and Saturn, and the shape of the Earth. We shall keep the application of Bayesian probabilities to these astronomical problems for some other day.
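One way such a simulation might look is sketched below. It is not Laplace's derivation: it assumes a uniform prior on the proportion of white tickets and draws with replacement (or, equivalently, a very large urn), and the function name, run count and seed are arbitrary choices of mine.

```python
import random


def simulate_next_white(n, k, runs=200_000, seed=0):
    """Monte Carlo estimate of P(next ticket is white | k white tickets in n draws)."""
    rng = random.Random(seed)
    matched = 0      # simulated urns whose first n draws show exactly k whites
    next_white = 0   # among those, how often the following draw is white
    for _ in range(runs):
        p = rng.random()                                  # unknown proportion of white tickets
        whites = sum(rng.random() < p for _ in range(n))  # n draws from this urn
        if whites == k:                                   # keep only runs matching the evidence
            matched += 1
            next_white += rng.random() < p
    return next_white / matched


estimate = simulate_next_white(n=5, k=3)
print(round(estimate, 3), "vs exact", round(4 / 7, 3))
```

Under these assumptions the estimate should land close to the rule-of-succession value \(\frac{k+1}{n+2}=\frac{4}{7}\), up to Monte Carlo noise.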

Laplace eventually credits Bayes

Even though, after the surfacing and development of the Bayesian perspective, the statistical fraternity got divided into the two groups of Frequentists and Bayesians, ironically both Bayes and Laplace were themselves neutral. Bayes, even in his published essay, referred to his dependence on frequencies for getting an idea about his prior assumption, and neither ignited the debate nor foresaw such debates in the future.

Similarly Laplace, in his book on probabilities, acknowledges the resemblances between his principle of the probability of causes and frequency methods, which I tried to shed light on in the previous sections. Besides resurrecting Bayes' rule, he also invented the Central Limit Theorem, which is more of a Frequentist's tool than a Bayesian's.

In 1781, while Laplace was grappling with his probability of causes and attacking problems in celestial mechanics, Richard Price arrived in Paris and brought news of Bayes' discovery. Laplace immediately latched onto Bayes' ingenious invention, the starting guess, and incorporated it into his own, earlier version of the probability of causes. Hence, he was now confident that he was on the right track in assuming the prior causes equally likely, and assured himself of the validity of his principle. Every time he got new information, he could use the answer from his last solution as the starting point for another calculation; that is, he went on successively. And by assuming all the prior causes equally likely, he could now formulate his principle as a law or a theorem. Soon, though, he was to realise the shortcomings of his equally-likely assumption, and hence the need for generalizing, which we already talked about a bit under the section Laplace Generalizes.

Laplace later credited Bayes with being first when he wrote, "The theory whose principles I explained some years after,.... he accomplished in an acute and very ingenious, though slightly awkward, manner. "

Although Bayes originated the probability of causes, Laplace discovered the same on his own. When Bayes' essay was published by his friend Price, Laplace was only 15. The approaches and principles that Bayes and Laplace developed are, mathematically speaking, independent. We will discuss the mathematical perspectives of both Laplace and Bayes in more detail in our coming articles.

Till then, stay safe, and keep finding solutions to the gambling problems Laplace worked on; they no longer need 45 equations to be solved nowadays!!

References

  1. Probability Theory: The Logic of Science - E. T. Jaynes
  2. A Philosophical Essay on Probabilities - Pierre-Simon Laplace
  3. The Theory That Would Not Die - Sharon Bertsch McGrayne
  4. Ten Great Ideas About Chance - Skyrms, Diaconis

Some Previous Posts

Bayes and The Billiard Table | Cheenta Probability Series

This is the first of the many posts that I will be writing on the evolution of Bayesian thinking and inverse inference in probability theory, which actually changed statistics from a tool of data interpretation into a causal science.

When the facts change, I change my opinion. What do you do, sir ?

-John Maynard Keynes

At the climax of our last discussion, I left the Jelly-bean example incomplete, to begin here afresh. (If you haven't read that, you can read it before we start; here it is: Judgements in a Fitful Realm | Cheenta Probability Series.) There we were talking about how evidence can exhibit chanciness in this uncertain world. Today we will discuss how we can update our beliefs or judgements (judgemental probabilities) based on such uncertain evidence, provided we have observed a pattern in the occurrence of this so-called circumstantial evidence.

In more formal literature, this is referred to as inverse inference, as we first observe some outcomes and then go deeper, investigating the plausible explanations in terms of chances, so as to have some presumed idea about future outcomes. There arise two immediate questions,

Before discussing these questions, let us discuss the structure and some ideas behind this way of probability analysis. I hope that with some examples the reader will be able to answer the above questions themselves, and eventually appreciate this particular school of thought which, in spite of a lot of controversy, inspired independent fields of statistics and made statistics one of the most important bodies of knowledge of this century. Statistics is no longer a mere tool for interpreting data, but is now capable of giving causal explanations to anything and everything, from questions like whether "Smoking causes cancer" to "What is the chance of having a nuclear accident?".

A century earlier, asking this sort of question to a statistician was outrageous, as most statisticians (very likely to be egoistic) would not admit their inability to answer such questions, and would more likely say "it's not answerable, due to lack of evidence", or in other words imply, "in order to find the chance of a nuclear accident, you first need to organize a planned nuclear accident!!"

Bayes makes his Glorious Entry

In 1763, in an article titled "An Essay towards solving a Problem in the Doctrine of Chances", authored by Thomas Bayes, he put his ideas as,

"Given the number of times in which an unknown event happened or failed.

Required the chance that probability of its happening in a single trial lies somewhere between any two degrees of Probability that can be named. "

It's strange that what Bayes stated coincides so well with the idea of conglomerability stated by de Finetti nearly 200 years later. This is where I feel the evolution of probability theory is so perplexing, since often quite advanced ideas emerged earlier, and their basic explanations were put into words afterwards. And then there are people who put these pieces of the jigsaw puzzle in place; we will come back to these works some other day.

As Bayes' gravestone suggests, he died in 1761 at the age of 59. Two years after his death, his friend Richard Price published his essay. Price communicated the essay, together with an introduction and an appendix of his own, to the Royal Society, and got it published in its Philosophical Transactions in 1763. Price, while referring to Bayes' idea, writes,

Figure: Rev Thomas Bayes. Like his rule, this portrait is also surrounded by controversy!! But unfortunately it is the only available portrait in which the person is believed to be Bayes.

".....he says that his design at first in thinking on the subject of it was, to find out a method by which we might judge concerning the probability that an event has to happen, in given circumstances, upon the supposition that we know nothing concerning it but that, under the same circumstances, it has happened a certain number of times and failed a certain other number of times."

Basically, Bayes was talking about a machinery which would find the predictive probability that something will happen next time, from past information. Bayes' predecessors, including even Bernoulli and de Moivre, had reasoned from chances to frequencies. Bayes gave a mathematical foundation for inference from frequencies to chances.

Even though, with the advancement of his theory, Bayes' rule found many useful applications, from breaking Enigma to answering whether smoking causes cancer, Bayes himself was not motivated to put his ideas on paper in order to solve a practical problem. On the contrary, what motivated Bayes was a philosophical debate which demanded a mathematical argument. To me, what Bayes' idea propagates is the sole uniformity and subjectivity of nature. In one way it convinces us that we are by nature dependent on chances, but on the other hand it suggests that with every new piece of information we always have scope for improving our ideas about an uncertainty which seemed more uncertain before that extra bit of information. It simply tells us that it all depends on some God damn Information.

Bayes sees the Light

An incendiary mix of religion and mathematics exploded over England in 1748, when the Scottish philosopher David Hume published an essay attacking some of fundamental narratives of organized religions. Hume believed that we can't be absolutely certain about anything that is based only on traditional beliefs, testimony, habitual relationships, or cause and effect.

Because God was regarded as the First Cause of everything, Hume's skepticism about cause-and-effect relationships was especially unsettling. Hume claimed that there is always an association between certain objects or events and how they occur. As in the earlier discussion, you are likely to carry an umbrella on a rainy day, so there is a strong association between the weather and your carrying an umbrella; but that in no way implies that your umbrella is the cause of the clouds out there. Rather, it is the other way around. This was a pretty straightforward illustration; Hume puts it more philosophically:

"....Being determined by custom to transfer the past to the future, in all our inferences; where the past has been entirely regular and uniform, we expect the event with the greatest assurance, and leave no room for any contrary supposition. But where different effects have been found to follow from causes, which are to appearance exactly similar, all these various effects must occur to the mind in transferring the past to the future, and enter into our consideration, when we determine the probability of the event. Though we give the preference to that which has been found most usual, and believe that this effect will exist, we must not overlook the other effects, but must assign to each of them a particular weight and authority, in proportion as we have found it to be more or less frequent. "

What Hume actually tried to claim is that your taking an umbrella does not even imply that it is rainy or cloudy; you may be using the umbrella to protect yourself from the heat. That may be less likely (for a given person), but it still cannot be neglected completely. Most importantly, the "design of the world" does not prove the existence of a creator, an ultimate cause. Because we can seldom be certain that a particular cause will have a particular effect, we must be content with finding only probable causes and probable effects.

Even though Hume's essay was not mathematically sound, it gave Bayes profound scientific food for thought, and a reason to develop a mathematics to quantify such probabilities. Many mathematicians and scientists used to believe that the inexplicability of the laws of Nature proves the existence of God, their First Cause. As de Moivre put it in his "Doctrine of Chances", calculations about natural events would eventually reveal the underlying order of the universe and its exquisite "Wisdom and Design".

These arguments motivated Bayes, and he became keen to find ways to treat such thoughts mathematically. In that century, directly developing a probabilistic mathematics was quite difficult, as the idea of probability itself was not very clear to the thinkers and mathematicians of the time. It was an era when people would think only of gambling if you uttered the word chance. By then de Moivre, who had spent time in a French prison because he was a Protestant, had already solved gambling problems by working from cause to effect (like finding the chance of getting four aces in one poker hand). But still no one had thought of working a problem the other way around, i.e. predicting the causes of an observed effect. Bayes got interested in questions such as: what if a poker player deals himself four aces in each of three consecutive hands? What is the underlying chance (or cause) that his deck is loaded?

As Bayes himself kept his idea hidden until his friend Price rediscovered it, it is very difficult to guess what exactly piqued Bayes' interest in the problem of inverse probability. He was aware of de Moivre's works and was getting interested in probability as it applied to gambling. Alternatively, he may have been worried about the cause of gravity: Newton had proposed gravity but neither gave any causal validation of it nor talked about the truthfulness of his theory. This too may be a reason why Bayes got interested in developing mathematical arguments to predict the cause from observed effects. Finally, Bayes' interest may have been stimulated by Hume's philosophical essay.

Crystallizing the essence of the inverse probability problem in his mind, Bayes decided that his aim was to obtain the approximate chance of a future event about which he knew nothing except the pattern of its past occurrences. It is guessed that he developed an ingenious solution sometime between 1746 and 1749. To reach the solution Bayes devised a thought experiment, which can be metaphorically described as a 1700s version of a computer simulation. We will get to the problem after discussing a bit about how Bayes modified the frequency interpretation of probability.

Bayes Modifies

At the very beginning of the Essay, Bayes takes the liberty of modifying the general frequency interpretation and ends up defining conditional probability. As it happens, his definitions of probability were remarkable anticipations of the judgemental coherence views developed years later by the likes of de Finetti and Ramsey. After defining what we call a set of mutually exclusive and exhaustive events, Bayes goes on to explain probability as,

"The Probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening. "

Like a true probabilist, Bayes defined probability from a gambling point of view, talking about payoff as an outcome of each event. But we also can treat the result itself as the payoff or expected value as a result of certain events.

As we already discussed, and as I have tried to point out several times, the probability of any event can be interpreted as the weighted average of the judgemental probabilities (conditional probabilities) obtained on observing the available evidence, where the weights of the so-defined mean are the probabilities of observing those pieces of evidence.

\(P(A)=P(A|E_1)P(E_1)+P(A|E_2)P(E_2)+\cdots+P(A|E_n)P(E_n)\) ; here \(A\) is any event that depends on some set of evidences, say \(E=\{E_1, E_2, \ldots, E_n\}\).

A very important restriction imposed by Bayes here is that the possible evidences \(E_1, E_2, \ldots, E_n\) must be mutually exclusive and form an exhaustive set.
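To make the weighted-average reading concrete, here is a minimal Python sketch. The three conditional probabilities and their weights are hypothetical numbers chosen only for illustration, and the helper name `total_probability` is my own, not something from Bayes' Essay.

```python
from fractions import Fraction

def total_probability(conditionals, weights):
    """Return P(A) = sum_i P(A|E_i) * P(E_i) for a partition E_1, ..., E_n."""
    # The evidences must be mutually exclusive and exhaustive,
    # so their probabilities have to add up to exactly 1.
    if sum(weights) != 1:
        raise ValueError("the P(E_i) must sum to 1")
    return sum(c * w for c, w in zip(conditionals, weights))

# Hypothetical values, for illustration only.
p_a_given_e = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]  # P(A|E_i)
p_e = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]          # P(E_i)

p_a = total_probability(p_a_given_e, p_e)
print(p_a)
```

Exact fractions are used so that the check on the weights is not disturbed by floating-point rounding.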

This visualization of probability is important once you enter the Bayesian regime. Even though frequency probability is our basic and primary understanding of probability, I find this interpretation in terms of judgemental probabilities, sometimes also called likelihoods (we will see later), a more general model of probability. There is a bit of abstraction associated with it, but that is the true nature of an art, right ! And probability is an Art !

So, getting back to Bayes' definition of probability: mathematically speaking, if the amount riding on an experiment (or gamble) is \(N\) (that is, the contract pays \(N\) units if the event happens) and there is an event \(e\), then the payoff you may expect from the occurrence of the event \(e\) is \(N \cdot P(e)\), or

\(P(e)=\frac{\text{expected value of a payoff of } N \text{ if } e}{N}\)

where \(P(e)\) is the chance of the event \(e\). He completes his definition by stating, "By chance I mean the same as probability".

On the basis of this definition, Bayes argues for the basic properties of probability, such as additivity of disjoint probabilities, in terms of additivity of expectations. I choose not to elaborate here, as we already discussed this in our last post and also in the post about conglomerability (for a more elaborate discussion, read Nonconglomerability and the Law of Total Probability || Cheenta Probability Series).

Bayes Essay
Opening page of Bayes' Essay

Bayes goes on to establish the definition of conditional probability. He gives separate treatments for the case where the conditioning event precedes the conditioned one and the case where the conditioning event is subsequent to the conditioned one. The latter case is a bit perplexing: something has already happened, and now we need to travel back in time and find what might have happened behind the scenes to explain our observation. But that is what Bayes set out to find, right !! Here Bayes gives a very interesting argument in his fourth proposition, where he invites us to consider an infinite number of trials determining the occurrence of the conditioning and conditioned events,

"If there be two subsequent events to be determined every day, and each day the probability of the 2nd is \(\frac{b}{N}\) and the probability of both \(\frac{P}{N}\), and I am to receive \(N\) if both events happen on the 1st day on which the 2nd does ; I say, according to these conditions, the probability of my obtaining \(N\) is \(\frac{P}{b}\)....."

So, what Bayes says is that on the first day either the second event happens or it doesn't; if it doesn't, he faces the same wager as before :

"Likewise, if this coincident should not happen I have an expectation of being reinstated in my former circumstances."

This is to say, the probability that an event occurs, given that you have already observed that another event has occurred, is just the ratio of the expectation of the coincidence (that both the desired event and the observed event happen) to the expectation of the event that has occurred. This ratio is sometimes loosely referred to as the likelihood of the desired event when it is used in the Bayesian probability structure.

Staying in the gambling realm with Bayes, the probability of a win on the supposition that \(E_2\) (the second event) did not happen on the first day is just the original probability of a win. To simplify the exposition, let us assume unit stakes, so that expectation equals probability.

Then letting \(E_1\) be the first event and \(E_2\) the second , he argues as follows:

\(P(win)=P(win \ on \ day \ 1)+P(win \ later)\)

\(= P(E_1 \ and \ E_2)+P( not \ E_2)P(win)\)

\(=P(E_1 \ and \ E_2)+ (1-P(E_2))P(win)\)

Solving for \(P(win)\) gives \(P(win)=\frac{P(E_1 \ and \ E_2)}{P(E_2)}\).
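The same answer can be reached by adding up, day by day, the chance that the wager is finally won on that day. The Python sketch below does this with exact fractions; the values \(\frac{1}{6}\) and \(\frac{1}{2}\) are hypothetical, and the function name `win_probability` is mine, used only for illustration.

```python
from fractions import Fraction

def win_probability(p_both, p_second, days=200):
    """Partial sum of P(E2 fails for k days) * P(E1 and E2 on day k+1)."""
    total = Fraction(0)
    no_second_so_far = Fraction(1)  # chance that E2 has not happened yet
    for _ in range(days):
        total += no_second_so_far * p_both   # win on this day
        no_second_so_far *= 1 - p_second     # wager is reinstated for tomorrow
    return total

# Hypothetical values: P(E1 and E2) = 1/6 and P(E2) = 1/2.
p_both, p_second = Fraction(1, 6), Fraction(1, 2)

approx = win_probability(p_both, p_second)
exact = p_both / p_second  # Bayes' ratio P(E1 and E2) / P(E2)
print(float(approx), float(exact))
```

The partial sums approach the ratio from below, which is the geometric-series form of Bayes' "reinstated in my former circumstances" argument.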

This is what Bayes took to be the probability of \(E_1\) on the supposition \(E_2\) (that is, given that \(E_2\) has occurred or is true). He states it as a corollary, but the exposition of the corollary contains an interesting twist. It goes like,

"Suppose after the expectation given me in the foregoing proposition, and before it is at all known whether the first event has happened or not, I should find that the second event has happened; from hence I can only infer that the event is determined on which my expectation depended, and have no reason to esteem the value of my expectation either greater or less than before. "

Here by expectation he always means the expected payoff tied to that particular event. I have explained several times how probability can be interpreted as expectation, so I hope readers face no difficulty (though some unfamiliarity may remain) in following this kind of literature.

Now, Bayes gives a money-pump argument :

"For if I have reason to think it less, it would be reasonable to give something to be reinstated in my former circumstances, and this over and over again as I should be informed that the second event had happened, which is evidently absurd. "

He concludes by explaining the opposite scenario,

"And the like absurdity plainly follows if you say I ought to set a greater value on my expectation than before, for then it would be reasonable for me to refuse something if offered on the condition that I relinquish it, and be reinstated in my former circumstances...."

These arguments give two basic implications: even though Bayes did not develop a rigorous mathematics for the probabilities he proposed, he had the idea of coherence and, by extension, conglomerability, which were yet to be put into the mathematical literature.

Bayes in front of the Billiard Table, Finally !!

With conditional probability in hand, Bayes proceeds to the problem with which he begins the Essay. Suppose a coin, about whose bias we know nothing at all, has been flipped \(n\) times and has come up heads \(m\) times. If \(x\) is the chance that the coin comes up heads on a single toss, Bayes requires

\( P( x \ in \ [a,b] | m \ heads \ in \ n \ tosses) \)

\(=\frac{P(x \ in \ [a,b] \ and \ m \ heads \ in \ n \ tosses)}{P(m \ heads \ in \ n \ tosses)}\).

To evaluate this, Bayes must assume something about the prior probability density over the chances. The prior probability density is basically the prior (or initial) information about the desired unknown (here \(x\)). He first assumes it, and then goes on to find the required probability, called the posterior probability, based on the prior he assumed and the observations he made. So he keeps updating his knowledge about the unknown \(x\), starting from mere initial information about it. The controversy arises over how he assumes the prior probability, that is, the assumption he makes about the overall pattern of \(x\). We will come to this later; first let us follow Bayes' final touches in completing the solution.

Now Bayes assumes a uniform prior density as the correct quantification of knowing nothing concerning it. Anticipating that this might prove controversial, as of course it has, he later offers a different justification in a scholium. On this basis, he applies Newton's calculus to get,

\(\frac{\int^b_a{n \choose m} x^m(1-x)^{n-m}\,dx}{\int^1_0{n \choose m}x^m(1-x)^{n-m}\,dx}\).

How are these to be evaluated? Bayes evaluates the integral in the denominator by a geometrical trick. This is Bayes' "billiard table" argument.

Bayes' Billiard Table - Evolution of Bayesian Thinking
Bayes' "Billiard Table" illustration as done by him in his Essay.

Suppose we throw a red ball at random on a table and mark its distance from the leftmost side. Then we toss \(n\) black balls one by one on the table, as shown in the figure. Let us call a ball that falls to the right of the red ball a head and one that falls to the left a tail. This corresponds to choosing a bias at random and flipping a coin of that bias \(n\) times. Now nothing hangs on the first ball being the red one. We could just throw \(n+1\) balls on the table and choose at random which one is to be the red ball, the one that sets the bias. But if we choose the leftmost ball to be the red one, all \(n\) black balls count as heads; if we choose the rightmost, no black balls count as heads; and so forth. Each of the \(n+1\) balls is equally likely to be chosen, so the probability of \(m\) heads in \(n\) tosses is the same for \(m=0,1,\ldots,n\), and hence must be \(\frac{1}{n+1}\). This is the value of the integral in the denominator. The integral in the numerator is harder, and no such simple closed-form solution exists. Bayes, however, gives a way of approximating it too.
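For whole numbers \(n\) and \(m\) both integrals can be checked on a computer by expanding \((1-x)^{n-m}\) and integrating term by term. The following Python sketch does exactly that with exact fractions. It is a modern check rather than Bayes' own method, and the function names and the example values (7 heads in 10 tosses, interval \([\frac{1}{2}, 1]\)) are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_integral(n, m, a, b):
    """Exact integral of C(n,m) x^m (1-x)^(n-m) over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    # Expand (1-x)^(n-m) with the binomial theorem and integrate each power of x.
    for k in range(n - m + 1):
        power = m + k + 1
        coeff = comb(n - m, k) * (-1) ** k
        total += Fraction(coeff, power) * (b ** power - a ** power)
    return comb(n, m) * total

def posterior_interval(n, m, a, b):
    """P(x in [a,b] | m heads in n tosses) under a uniform prior."""
    return binomial_integral(n, m, a, b) / binomial_integral(n, m, 0, 1)

n = 10
denominators = [binomial_integral(n, m, 0, 1) for m in range(n + 1)]
print(denominators)  # each entry should equal 1/(n+1)

# Hypothetical data: 7 heads in 10 tosses; how likely is x to exceed 1/2?
post = posterior_interval(10, 7, Fraction(1, 2), 1)
print(post, float(post))
```

The list `denominators` gives the billiard-table result for every \(m\) at once, while `post` is the kind of posterior probability Bayes was after.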

In the scholium, Bayes uses his evaluation of the denominator to argue for his quantification of ignorance. He argues that, since he knows nothing about the event except that there are \(n\) trials, he has no reason to think that it would succeed in some number of trials rather than another. Hence, he suggests that there is nothing wrong in taking

\(P(m \ heads \ in \ n \ tosses)=\frac{1}{n+1}\), as our quantification of ignorance about outcomes. The uniform prior in fact follows from this, although Bayes did not have the proof !!

Priors to Posteriors- Journey Continues !

Bayes thus suggested a way of solving the inverse problem of finding the bias of a coin given that you observed a number of heads in a number of tosses.

Or, extending the "billiard table" argument: suppose you are facing the wall and I throw the red ball, which stops somewhere on the table. You need to pin-point the position of the red ball, so I keep tossing black balls (\(n\) times), each time noting whether the black ball lands to the left or to the right of the red ball. Using this information about the black balls relative to the randomly placed red ball, you can get an idea of the portion of the table where the red ball stopped, right ! Bayes already answered that !!

Now say you want to be more precise about the position of the red ball, so you request me to throw another set of \(n\) balls and repeat what I was doing. But now you have an extra bit of information: you at least know the likely position of the red ball from the posterior that Bayes calculated for you. So you no longer need the uniform assumption; you can use your newly acquired information as your new prior and again update it to an improved posterior probability about where on the damn table your red ball rests.
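This prior-to-posterior recursion can be sketched in a few lines of Python. The sketch leans on a standard fact that is not derived in this post: starting from a uniform prior, after \(h\) heads and \(t\) tails the density of \(x\) is proportional to \(x^h(1-x)^t\), with mean \(\frac{h+1}{h+t+2}\). The batch counts below are hypothetical.

```python
from fractions import Fraction

def update(state, heads, tails):
    """state = (h, t) stands for a density proportional to x^h (1-x)^t."""
    h, t = state
    # Multiplying by the new likelihood x^heads (1-x)^tails just adds exponents.
    return (h + heads, t + tails)

def posterior_mean(state):
    """Mean of the normalised density x^h (1-x)^t on [0, 1]."""
    h, t = state
    return Fraction(h + 1, h + t + 2)

uniform_prior = (0, 0)                      # Bayes' uniform starting point
after_first = update(uniform_prior, 7, 3)   # first batch: 7 heads, 3 tails
after_second = update(after_first, 6, 4)    # second batch reuses the posterior as prior
all_at_once = update(uniform_prior, 13, 7)  # same data in a single step

print(posterior_mean(after_first), posterior_mean(after_second))
```

Updating in two steps lands on the same state as processing all the data at once, which is why yesterday's posterior can safely serve as today's prior.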

So this is where the genius of Bayes takes probability to another level, using two of the most beautiful aspects of mathematics: inverse thinking and recursion.

We will get back in the next discussion, where we will look at more examples, the aftermath of the Bayesian introduction to the world of uncertainty, the man who did everything to give Bayesian probability its firm footing, and obviously, "How to calculate the probability that the sun will rise tomorrow, given that it has risen every day for 5000 years !!" .

Till then, stay safe, and keep finding the red ball on the billiard table, but don't turn around !!

References

  1. An Essay towards Solving a Problem in the Doctrine of Chances - Thomas Bayes
  2. The Theory That Would Not Die - Sharon Bertsch McGrayne
  3. Ten Great Ideas About Chance - Skyrms, Diaconis

Judgements in a Fitful Realm | Cheenta Probability Series

This post discusses how judgments can be quantified to probabilities, and how the degree of beliefs can be structured with respect to the available evidence in decoding uncertainty leading towards Bayesian Thinking.

The object of reasoning is to find out, from the consideration of what we already know, something else, which we do not know. Consequently, reasoning is good if it be such as to give a true conclusion from true premises, and not otherwise.

-C.S. Peirce

In our quest for the actual form of uncertainty, and in developing laws of chances, one of the most important things is being judgemental.

In life you do not always have the luxury of observing a particular event many times over and structuring it in a known equiprobable frequency set-up, as we have always tried to do when measuring chances. For example, you just can't go out without an umbrella on many cloudy days to observe on how many of those days it actually rains and you get wet. Of course you can conduct this experiment, but in doing so you could end up catching a cold or even pneumonia.

But then again, suppose one fine day you wake up and see that it is cloudy out there. You feel that it may rain today, so you carry your umbrella, but fortunately it does not rain and the umbrella is not of much use. So on the next similar day you do not care to carry the umbrella, and it rains !! :p

So, here you relied on your judgement, which made you believe on the first day that it would rain. Since it didn't, your judgement then made you believe that it wouldn't rain the next day either, and you were misled. So now you may conclude that you should not rely on circumstantial judgement ! But again you are wrong ! Why ?? Well, that's what we are going to discuss here.

Can Beliefs be measured ?

Relying on personal beliefs and preferences may not be much encouraged in conventional science, and rightly so. But mathematicians like Bruno de Finetti, Frank Plumpton Ramsey and some others suggested that if beliefs or judgements are made measurable (like we measure chances), then we can definitely put our faith in them. So now the question is: are beliefs measurable ?? Won't those measures be subjective ??

Ramsey answers that too. In his essay "Truth and Probability" Ramsey writes,

"It is a common view that belief and other psychological variables are not measurable, and if this is true our inquiry will be vain ; and so will the whole theory of probability conceived as a logic of partial belief; for if the phrase 'a belief two-thirds of certainty ' is meaningless, a calculus whose sole object is to enjoin such beliefs will be meaningless also. Therefore unless we are prepared to give up the whole thing as a bad job we are bound to hold that beliefs can to some extent be measured."

He continues,

"But I think beliefs do differ in measurability in the following two ways. First, some beliefs can be measured more accurately than others; and, secondly, the measurement of beliefs is almost certainly an ambiguous process leading to a variable answer depending on how exactly the measurement is conducted. The degree of a belief is in this respect like the time interval between two events; before Einstein it was supposed that all the ordinary ways of measuring a time interval would lead to the same result if properly performed. Einstein showed that this was not the case; and time interval can no longer be regarded as an exact notion, but must be discarded in all precise investigations.

I shall try to argue that the degree of a belief is just like a time interval; it has no precise meaning unless we specify more exactly how it is to be measured. But for many purposes we can assume that the alternative ways of measuring it lead to the same result, although this is only approximately true. The resulting discrepancies are more glaring in connection with some beliefs than with others, and these therefore appear less measurable. Both these types of deficiency in measurability, due respectively to the difficulty in getting an exact enough measurement and to an important ambiguity in the definition of the measurement process, occur also in physics and so are not difficulties peculiar to our problem; what is peculiar is that it is difficult to form any idea of how the measurement is to be conducted, how a unit is to be obtained, and so on. "

As Ramsey suggests, the idea of how beliefs are to be measured was crucial here, but they had the answer for that too. What they did was make their judgements heavily dependent on the evidence, which they assumed to be quite certain. Later we will see that evidence is not always certain, but that doesn't falsify their measure. We just need additional machinery to handle the discrepancy in the evidence.

What de Finetti and Ramsey did was express judgements and partial beliefs as mathematical probabilities, and believe me, it is here that the mathematics of probability gets immensely powerful and beautiful. It is here that you feel you can express every thought of yours in terms of mathematics.

In our last posts we have always been talking about equiprobable cases. Even in the equiprobable scenario your judgements are disguised as symmetry, and hence you don't even realize that you are judgemental by nature. One obvious instance that exposes this is that we don't even consider the possibility of a coin landing on its edge, which (as I discussed at length earlier) is after all not impossible (maybe improbable). So here we are putting our judgement on the nature of the coin, i.e. "it's too thin to land on its edge".

Wise man always carries an Umbrella

"Any fool carries an umbrella on a wet day, but the wise man carries it every day. " - Irish Proverb

Coming back to our example of the cloudy day, here you have no nice set of equiprobable cases that can determine the chance of rain.

Well, obviously you can say vaguely, "it rains or it doesn't, so it's like tossing a coin", or you may further suggest, "I have been observing that it rains more or less 4 days a week, so the chance of rain today is about 57%." But what about the overcast condition: is it not making you inclined towards the conclusion "it's going to rain" ? And then again the question is how much we should be inclined towards the proposition "it's going to rain".

De Finetti would have said that since you are certain that it is cloudy, looking for the chance of rain as such is like ignoring the evidence you are privileged to have. That is, we must not look for the chance of rain as it is; rather, we must find the chance of rain given that you have already seen that it is overcast out there. So our judgement about rain is basically quantified as the probability of rain conditioned on the evidence that it is cloudy.

So, say \(R\) is the event (or proposition) "it's going to rain" and \(C\) is another event (or proposition), "there is a cloud cover". Then the probability that it's going to rain, \(P(R)\), is basically transformed to \(P(R|C)\), which we read as "the probability that it's going to rain, given that there is a cloud cover". Hence the forecast you are going to make is a coherent judgement of the situation, expressed as a conditional probability. So here the concept of probability reaches a spiritual level (if I may say so), where it is actually a quantification of our belief based on evidence that is apparently certain. But still the question of the transformation of belief to probability stays alive, waiting to be killed. :p

Believing in Probabilities

Beliefs as such are definitely a vague thing to put our trust in, but once there is mathematical support behind a particular belief it is no longer inferior, as mathematics herself stands with all her might to defend it.

Now, extending our example of the cloudy day, suppose you have quite a few detailed observations. Say you observe that during the monsoon it stays overcast (from the morning) more or less 4 days a week, that 3 of those cloudy days end up being rainy, and, as you already observed, that it rains 4 days a week (more or less).

So, when it is known that it is cloudy out there, the chance of a downpour becomes 3 out of 4 (or \(\frac{3}{4}\)), by your judgement. Chances generated in this way from your coherent judgements are indeed probabilities.

Bruno de Finetti
often referred to as a "radical probabilist"

De Finetti showed that coherence is equivalent to one's judgements having the mathematical structure of probability. He argued that judgements in a mathematical structure behave like proportions (in the above example we quantified our judgement by "3 out of 4", which is indeed a proportion). He explained that the chance of rain when you have already observed the overcast conditions (a judgemental probability, as de Finetti called it) is basically a ratio of the classical probabilities of two propositions, "it's going to rain" and "it's cloudy out there". That is, \(P(R|C)\) is the ratio of \(P( R \ and \ C)\) (read as the probability that it is going to rain and it is cloudy out there) to \(P(C)\) (the probability that it is cloudy out there). So, \(P(R|C)=\frac{P(R \ and \ C)}{P(C)}\). Hence,

  1. A proportion is never less than 0.
  2. It is never more than 1.
  3. Proportions of mutually exclusive parts add up: out of 4 cloudy days 3 end up being rainy, and out of 3 clear-sky days (from the morning) it rains on 1 (in the evening, maybe due to a sudden accumulation of clouds), so the total proportion of rainy days in a week is 4 out of 7.
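The weekly counts used in this example can be turned into a small Python check of both the conditional probability and the way the proportions add. The variable names are my own; the numbers are the ones from the cloudy-day example.

```python
from fractions import Fraction

days_in_week = 7
cloudy_days, clear_days = 4, 3   # mutually exclusive and exhaustive
rainy_and_cloudy = 3             # cloudy days that end up rainy
rainy_and_clear = 1              # clear days that still get rain

p_c = Fraction(cloudy_days, days_in_week)             # P(C)
p_r_and_c = Fraction(rainy_and_cloudy, days_in_week)  # P(R and C)
p_r_given_c = p_r_and_c / p_c                         # P(R|C)

p_r_given_clear = Fraction(rainy_and_clear, clear_days)
# The proportions over the two mutually exclusive parts add up to P(R).
p_r = p_r_given_c * p_c + p_r_given_clear * (1 - p_c)

print(p_r_given_c, p_r)
```

Knowing that it is cloudy moves the judgement from the overall weekly proportion `p_r` to the sharper conditional value `p_r_given_c`.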

Hence, de Finetti showed that coherent judgements can be mapped to mathematical probabilities. He also established the converse.

Suppose \(D_i\) represents the number of rainy days in the \(i^{th}\) week among the days when it was cloudy. Then the total expected number of rainy days adds up as,

\(E(D_1+D_2+\cdots)=E(D_1)+E(D_2)+\cdots \)

Now clearly, as the individual expectations are non-negative, the average number of rainy days is definitely non-negative as well. We conclude that judgements that are mathematical probabilities are coherent.

Hence, de Finetti concluded,

"Judgemental probabilities are coherent if and only if they have the mathematical structure of classical probabilities."

De Finetti used gambling problems to illustrate this concept, but I tried a more basic scenario of a cloudy, rainy day, since gambling problems require a different kind of literature that is too elaborate for a brief discussion. Interested readers can go further and look up de Finetti's illustrations.

Ramsey Completes the Circle

So, we have already extended our visualisation of judgements as probabilities, but the circle remains incomplete, as we haven't yet come back to the classical set-up from judgemental probabilities. We have used the classical probability structure to quantify judgements, re-structured judgements as probabilities, calculated the chance of rain and taken decisions on things like whether you should carry an umbrella or not.

We are, however, helping ourselves to the classical equally probable cases and stipulating that the agent in question takes them to be equally probable (the agent in the cloudy-rainy-day example is you, who are deciding whether to carry an umbrella). A thoroughgoing judgemental account would get all probabilities out of personal preferences. Now you might be tempted to say that's impossible, but that is exactly what Ramsey accomplished in his essay "Truth and Probability".

We have already got the rationale for building our beliefs or judgements. What is still missing is symmetry, or in Ramsey's words the idea of an "ethically neutral" proposition. This is a proposition \(p\) whose truth or falsity, in and of itself, makes no difference to an agent's preferences. That is to say, for any collection of outcomes \(B\), the agent is indifferent between \(B\) with \(p\) true and \(B\) with \(p\) false.

Extending the cloudy-rainy-day example: if I say that you have an exam and you have to go out irrespective of whether it is cloudy or not, then "it's cloudy out there" is an "ethically neutral" proposition with respect to your choice of going out.

Now we can identify an ethically neutral proposition \(h\) with probability \(\frac{1}{2}\) as follows. Consider two outcomes, \(A\) and \(B\), such that you prefer the first to the second. Then the ethically neutral proposition \(h\) has probability \(\frac{1}{2}\) for you if you are indifferent between [\(A\) if \(h\); \(B\) otherwise] and [\(B\) if \(h\); \(A\) otherwise]. This is the key idea. We can use it over and over to re-construct our judgements, but that is something for some other day.
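One way to see why this indifference pins the probability at \(\frac{1}{2}\) is the following sketch. It rests on an assumption that Ramsey himself does not start from: that you value a gamble by its expectation, with hypothetical values \(u(A) > u(B)\) for the two outcomes and \(p\) for your probability of \(h\). Indifference between the two gambles then reads

\(p \, u(A)+(1-p) \, u(B) = p \, u(B)+(1-p) \, u(A)\)

\(\Rightarrow (2p-1)(u(A)-u(B)) = 0\)

\(\Rightarrow p=\frac{1}{2}\), since \(u(A) \neq u(B)\).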

Frank P Ramsey
who not only revolutionized probability theory but also combinatorics in his short lifetime.

Ramsey goes to the Race Course

What de Finetti called judgemental probabilities, Ramsey called degrees of belief in outcomes, and we call conditional probabilities today. I will use Ramsey's terminology, hoping that readers will do the necessary mapping (to conditional probability). The reason I chose Ramsey's terminology is that I want our readers to understand the real intuition and motive behind formalizing conditional probabilities, which actually validates the conclusion,

"Conditioning is the soul of Judgements".

So we will end our discussion with a problem that explores the idea of "ethically neutral" propositions, as readers may find that clarity about this idea is most helpful in understanding the Law of Total Probability and what actually defines stochastic independence.

The Horse race

Consider four propositions HH, HT, TH, TT, which are mutually exclusive and jointly exhaustive (as outcomes of tossing two coins). Farmer Smith doesn't really care which of these is true. More specifically, for whatever way the things he does care about could come out, he is indifferent to their coming out that way with HH, with HT, with TH, or with TT. Then in Ramsey's terminology these four propositions are ethically neutral.

Suppose, in addition, that for any things he does care about, \(A\) prefered to \(B\) preferred to \(C\) preferred to \(D\), he is indifferent between the gamble

and any other gamble that can be gotten by rearranging \(A\), \(B\) , \(C\), \(D\) , for instance,

Then for him HH, HT, TH, TT all have the same probability, equal to \(\frac{1}{4}\). (Perhaps this is because these events represent what he takes to be two independent flips of a fair coin, and he is making judgements like Pascal and Fermat, as discussed earlier.)

Now, suppose Farmer Smith goes on to bet on a horse race. There is to be a race in which the horses Stewball and Molly compete. Farmer Smith owns Stewball, and the propositions Stewball wins and Molly wins are not ethically neutral for him. He can wager on the race, with the prospect of winning a pig if the horse he wagers on wins.

His most preferred outcome is get pig and Stewball wins, so he assigns 1 as his degree of belief for this outcome. Symmetrically, his least preferred outcome is no pig and Stewball loses, so he assigns 0 as the degree of belief to that outcome. These are just arbitrary choices of how to scale his degrees of belief:

Farmer Smith is indifferent between get pig and Molly wins and a hypothetical gamble that would ensure that he gets the pig and Stewball wins if HH or HT or TH, and that he gets no pig and Stewball loses if TT. That gamble has expected belief \(\frac{3}{4}\cdot 1+\frac{1}{4}\cdot 0=\frac{3}{4}\), so get pig and Molly wins sits at \(\frac{3}{4}\) on his scale. (Basically, what he does is toss two coins in place of the condition of Stewball winning.) So our new scale of beliefs is:

He is indifferent between no pig and Molly loses and the hypothetical gamble that would ensure that he gets the pig and Stewball wins if HH, and that he gets no pig and Stewball loses if HT, TH or TT. That gamble has expected belief \(\frac{1}{4}\), so no pig and Molly loses sits at \(\frac{1}{4}\). Now we have

He is indifferent between the gamble get pig if Molly wins and no pig if she loses and the gamble get pig and Stewball wins if HH or HT, but no pig and Stewball loses if TH or TT. The first gamble is not conditioned on ethically neutral propositions, but it is equated to one that is: the gamble get pig and Stewball wins if HH or HT, but no pig and Stewball loses if TH or TT has expected belief \(\frac{1}{2}\cdot 1+ \frac{1}{2}\cdot 0 = \frac{1}{2}\). So the first gamble, pig if Molly wins and no pig if she loses, must also satisfy

\(P(\text{Molly wins})\, P(\text{Pig} \mid \text{Molly wins})+ (1-P(\text{Molly wins}))\, P(\text{No Pig} \mid \text{Molly loses})= \frac{1}{2}\).

That is, the chance of getting the pig, when conditioned on the proposition of Molly's win, is ethically neutral.

Here the conditional probabilities are actually the degrees of belief that we scaled using the hypothetical gambles; those beliefs are \(\frac{3}{4}\) and \(\frac{1}{4}\) respectively. Substituting them gives \(\frac{3}{4}P(\text{Molly wins})+\frac{1}{4}(1-P(\text{Molly wins}))=\frac{1}{2}\), so the Farmer's judgemental probability, the one that makes Molly's win behave like an ethically neutral proposition, is \(P(\text{Molly wins})= \frac{1}{2}\).
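The last step can be checked with a few lines of Python. This is only a sketch: the function name `implied_probability` is made up for illustration, and the three inputs are the scaled beliefs \(\frac{3}{4}\), \(\frac{1}{4}\) and \(\frac{1}{2}\) read off from the gambles above.

```python
from fractions import Fraction

def implied_probability(belief_if_true, belief_if_false, belief_of_gamble):
    """Solve p*belief_if_true + (1 - p)*belief_if_false = belief_of_gamble for p."""
    return (belief_of_gamble - belief_if_false) / (belief_if_true - belief_if_false)

# Scaled beliefs from the Farmer Smith example
p_molly_wins = implied_probability(Fraction(3, 4), Fraction(1, 4), Fraction(1, 2))
print(p_molly_wins)  # 1/2
```

Exact fractions are used so that the answer comes out as \(\frac{1}{2}\) rather than a rounded decimal.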

Ramsey started with a coherent preference ordering and showed how to extract exact probabilities and degrees of belief such that the preferences are in accord with the expected belief, i.e. the unconditional (classical) probability. This is a representation theorem for probability and degree of belief. Coherent preferences can be represented as coming from judgemental probabilities and personal beliefs by the rule of conditioning on (seemingly) certain evidence. You can also call this kind of judgement "Wishful Thinking".

Can Evidences be Uncertain ?

Before finishing, let me create some excuse to carry on the discussion some other day. Observe that throughout our structuring of degrees of belief into judgemental probabilities, we assumed that the evidence on which we condition our preferences is more or less certain, like a cloudy sky (you can see that) or Molly's win (not certain as such, but once Molly wins or loses it gains certainty). But there are instances where the evidence available to us is itself uncertain.

Richard C Jeffrey
who revolutionized Decision Theory

In his Probability Kinematics, Richard Jeffrey argued that there is no evidential proposition that we learn with certainty. Rather, the evidential experience causes a shift in the probabilities of the members of an evidential partition, while the probabilities of the other propositions conditional on members of that partition stay unchanged. This leads to a rich, general conception of updating, with connections to minimal change of probabilities.

As in our Cloudy-Rainy-Day example, the evidence that it is overcast (from the morning) can change under the influence of more uncertain evidential alterations, like the direction and speed of the wind, which is fortunately or unfortunately taking the clouds away with her. But whatever it is, it is quite certain that these alterations in the circumstantial evidence will affect your judgements, and you need to update your judgements to improve your assessment of the chance of rain.
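To make this kind of updating concrete, here is a small sketch. All numbers are invented purely for illustration (they are not from Jeffrey), and `jeffrey_update` is a hypothetical helper name. It keeps the conditional probabilities of rain given each state of the sky fixed and only shifts the probabilities of the states of the sky, the evidential partition.

```python
from fractions import Fraction

def jeffrey_update(partition_probs, conditional_probs):
    """Return sum over E of P(A | E) * P(E) for a partition of evidence states E."""
    return sum(conditional_probs[e] * prob for e, prob in partition_probs.items())

# Invented conditional beliefs P(rain | sky); these stay unchanged by the experience
rain_given_sky = {"overcast": Fraction(7, 10), "clear": Fraction(1, 10)}

# Invented beliefs about the sky before and after an uncertain glance outside
sky_before = {"overcast": Fraction(1, 2), "clear": Fraction(1, 2)}
sky_after = {"overcast": Fraction(4, 5), "clear": Fraction(1, 5)}

rain_before = jeffrey_update(sky_before, rain_given_sky)
rain_after = jeffrey_update(sky_after, rain_given_sky)
print(rain_before, rain_after)  # 2/5 29/50
```

If the glance made you certain that it is overcast, `sky_after` would put probability 1 on "overcast" and the rule would reduce to ordinary conditioning.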

Jeffrey illustrates this with a nice example. Suppose you get up at night and observe, by the dim light coming in through the window, a jellybean sitting on a table. The jellybeans that might possibly be there are red, pink, brown or black. The light is not good enough for you to be certain of the color, but what you see does shift your probabilities. This is a case of uncertain observation: there is no proposition available to sum up the content of your observation. We might try saying that whatever we observe is a proposition in itself, but that is not a proposition on which you can define a reasonable probability space.

So, we need to find some other supporting evidence that can serve as the propositions for formalizing the judgemental probabilities; the flavors of the jellybeans, perhaps!! A red one might be cherry or cinnamon. A brown one may be chocolate or coffee. There are all sorts. Now can you think of using these conditions wisely?! Think it over, until we meet again.

To bring coherence to bear, assume that we have a coherent rule for updating on certain evidence. By the argument given, it must be a rule of conditioning on the evidence. But what is the chance of observing a particular piece of evidence in a world of uncertain happenings? It's paradoxical, isn't it!!

Wait for Bayes, he will come to rescue you from here !!

Till then, stay safe and keep thinking.

References

  1. Truth and Probability - Frank P Ramsey
  2. Ten Great Ideas About Chance - Persi Diaconis & Brian Skyrms
  3. Probability Kinematics - Richard Jeffrey

Physics of Coin Tossing and Uncertainty | Cheenta Probability Series

This is our 4th post in the Cheenta Probability Series. It deals with the physics involved in coin tossing and reveals the true nature of uncertainty.

"It is a very tedious task !! First you have to calculate where he is and where he is not, then you must calculate where he could possibly be, then you must seek where he is at this moment, then finally you have to calculate the probability of finding him when you reach where you are suspecting him to be right now."

- Sukumar Ray in his Novel, " Utter Nonsense"( Ho-Jo-Bo-Ro-Lo)

Uncertainty is something that has drawn the attention of mathematicians and philosophers right from the beginning of modern civilization. There have been many schools of thought about the true nature of uncertainty, which have changed and keep changing as mathematics gradually becomes more sophisticated. The perspective on uncertainty that was (and is) most appealing to me originated with the great probabilist Bruno de Finetti: "Uncertainty is actually the quantification of our ignorance, of the lack of information". This straightaway makes the idea of probability subjective, which should be a topic of discussion for another day. But Persi Diaconis, a living legend and an ambassador of this school of thought, is a man obsessed with the most fundamental places where uncertainty exists, and he goes further, seeking a physical solution to the problem of quantifying uncertainty, which is what our present discussion is all about.

In statistical experiments, lack of knowledge motivates statisticians to develop more sophisticated laws of prediction, and statisticians gain only as much knowledge as they can afford at a given cost. So lack of knowledge is what induces the desire for prediction and, by extension, for handling uncertainty. They basically map knowledge gained against cost. The cost-gain relationship of knowledge is a different topic altogether and not really core probability, but from it we end up with two important philosophical questions:

  • What is uncertainty ?
  • Can we actually gather all the information if we have a luxury of infinite investment ?

You can see clearly that the two questions are related, and assuming an answer to one gives you a justification for the other. I choose to discuss the first question, as it is also the more general one.

Is Coin Flipping Random?

Quite a number of fundamental models engage probabilists and mathematicians in their quest for the actual nature of uncertainty. Coin tossing, the most basic and fundamental of them, made the great mathematician and probabilist Diaconis obsessed, and he dived into the search for the true nature of uncertainty. In spite of the simplicity of the experiment, the unknowns are many, and they accumulate into an uncertain system subject to physical changes that one can't possibly have the luxury of being aware of. But Diaconis, being an obsessed man, follows the thoughts of de Finetti and asks, "Is coin tossing truly random, or is it just our ignorance of the physical parameters?"

What if we knew the magnitude of the force that our thumb imparted on the coin? How many spins per second the coin made while going up or coming down? At what velocity the coin went up? How high the coin went before it fell to the ground? What about the collisions with the air?

If we can answer these questions, then Physics allows us to completely determine the outcome of a coin flip. So coin tossing is not at all random, it's Physics!!

Persi Diaconis

Diaconis is so obsessed with the enigma of uncertainty that he claimed coin tossing obeys the laws of mechanics (ignoring the effect of air molecules). To demonstrate this, he had the physics department build him a coin-tossing machine. The coin starts out on a spring; the spring is released, and the coin spins upward and lands in a cup (as shown in the figure). Because the forces imparted are controlled, he claims, the coin always lands with the same side up. Magicians and crooked gamblers (including Persi Diaconis himself) also possess such abilities. I suggest that interested readers read the articles and watch the videos of Diaconis on coin tossing and card games; you will be amazed. These are the reasons I feel he is among the very few who have worked extensively and mathematically on the philosophical questions about uncertainty.

Coin-Tossing Machine
Keller's approach towards explaining the Dynamics of the coin tossing.

The careful study of the flipped coin was started by Keller (in 1986). He assumed that a coin flips about an axis in its plane, spinning about this axis at a rate of \(w\) revolutions per second. If the initial velocity in the upward direction \(\vec{K}\) is \(v_t\), then \(t\) seconds after being flipped from a height \(z_0\) the coin will be at height \(z_0+tv_t-\frac{g}{2}t^2\). If the coin lands at the height from which it was flipped, the elapsed time \(t^*\) satisfies \(t^*v_t-\frac{g}{2}(t^*)^2=0\), or \(t^*=\frac{2v_t}{g}\) (simple equations of motion). The coin will have revolved \(wt^*=\frac{2wv_t}{g}\) times. If this is between \(2j\) and \(2j+1\), the initial side will turn up. If it is between \(2j+1\) and \(2j+2\), the opposite side turns up.
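Keller's rule as stated above is easy to turn into a tiny function. This is only a sketch of the rule exactly as described in this post; the function name `keller_outcome` and the sample inputs are made up for illustration, and \(g\) is set to 10 in the examples only to keep the arithmetic round.

```python
import math

def keller_outcome(w, v_t, g=9.81):
    """Classify a flip by the rule above.

    w   : spin rate in revolutions per second
    v_t : initial upward velocity
    g   : acceleration due to gravity
    """
    flight_time = 2 * v_t / g          # t* = 2 v_t / g
    revolutions = w * flight_time      # 2 w v_t / g
    # Between 2j and 2j+1 the initial side turns up, otherwise the opposite side.
    return "same" if math.floor(revolutions) % 2 == 0 else "opposite"

print(keller_outcome(2.25, 5, g=10))  # 2.25 revolutions -> same
print(keller_outcome(3, 5, g=10))     # 3 revolutions -> opposite
```

Notice that nothing here is random: once \(w\) and \(v_t\) are fixed, the outcome is fixed.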

Hyperbolas as defined by the various initial values of \(w\).

The figure shows the space \((w,t)\) divided into the regions where the coin comes up on the same side or on the opposite side. The edges of the regions are the hyperbolas \(\frac{2wv_t}{g}=j\). Visually, the regions get closer together, implying that small changes in initial conditions make the difference between heads and tails.

How then is the probabilistic treatment of coin flips so widespread and so successful? The basic answer was given by Poincare. If a coin is flipped vigorously, with sufficient vertical and angular velocity, there is a sensitive dependence on initial conditions. Then a little uncertainty in the initial conditions is amplified into a large uncertainty about the outcome, and equiprobability of the outcomes is not at all a bad assumption.

In 1992, Engel carried on the work of Keller and fitted a probabilistic model to the velocity and the spins per second. He too ended up concluding that the chance of getting a head is nearly one half: in spite of accounting for the velocity and the spin, it is difficult to argue against the fact that one obtains heads or tails with almost equal chances.
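Poincare's point can be seen in a quick simulation. The sketch below reuses the rule "between \(2j\) and \(2j+1\) revolutions, the same side turns up" and adds a small random error to the spin rate and the upward velocity. The Gaussian error model, the 5% relative error and the mean values are all assumptions chosen for illustration, not measured data.

```python
import math
import random

def fraction_same_side(mean_w, mean_v, rel_sd, trials=20000, g=9.81, seed=0):
    """Estimate how often the starting side turns up when the spin rate w and
    the upward velocity v are only known up to a small Gaussian error."""
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        w = rng.gauss(mean_w, rel_sd * mean_w)
        v = rng.gauss(mean_v, rel_sd * mean_v)
        revolutions = 2 * w * v / g
        if math.floor(revolutions) % 2 == 0:
            same += 1
    return same / trials

# A gentle toss: less than one revolution, so the small error never matters.
gentle = fraction_same_side(mean_w=1.0, mean_v=1.0, rel_sd=0.05)
# A vigorous toss: about 20 revolutions, so the same 5% error spans several bands.
vigorous = fraction_same_side(mean_w=40.0, mean_v=2.5, rel_sd=0.05)
print(gentle, vigorous)
```

With these assumed numbers, the gentle toss is almost perfectly predictable, while the vigorous toss comes out close to one half, even though the physics is the same deterministic rule in both cases.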

Engel proposed a theorem that considered two probability distribution on velocity \(v\) and spin per second \(w\) and gave a bound on the chances of getting a head and found that the chance revolves around \(\frac{1}{2}\).

So, even in a classical mechanical set up, the entire uncertainty can't be wiped out completely.


Do Fair Tosses Depend Only on Fair Coins?

According to Persi Diaconis's observations, coin-tossing outcomes depend heavily on the way the coin is tossed. The exact determination of the bias also depends in a delicate way on the shape of the coin's edges, which magicians use to trick the naked human eye. (We will see this as a consequence in a problem which we will solve later.) He goes further and suggests that a coin gives a nearly unbiased result when tossed, say, 100 times, but the same penny, when spun on its edge, shows a noticeable bias towards tails. We provide the results in a histogram comparing the outcomes of coin tossing and coin spinning. Diaconis extends his search for uncertainty in coin flipping further, explaining more about physical chances; interested readers can pursue those works.

Histograms: coins spun on their edges vs. tossed coins

Having explained and discussed the approaches and ideas associated with understanding the physics behind coin flipping, let's shift to some beautiful problems which involve and explain the physical geometry and the classical mechanical fundamentals that influence the outcome of the toss.


Wishful Coin Tossing

Suppose you are tossing a coin and you know nothing about its fairness. We assume that the thickness of the coin is negligible. The coin is tossed; what is the chance of getting a head?

Here we really know next to nothing about the initial conditions, which, as we saw, affect the outcome heavily. So we will deal with this problem geometrically, but don't forget that classical mechanics basically originated from this kind of mathematics. Let us assume that after the coin strikes the surface, the normal vector \(\vec{N}\) to the heads side of the coin sweeps out a cone (follow the figure). The axis of the cone makes an angle \(\theta\) \((-\frac{\pi}{2} \le \theta \le \frac{\pi}{2})\) with the horizontal plane, and \(\alpha\) is the angle between the generatrix of the cone and its axis \((0 \le \alpha \le \frac{\pi}{2})\).

This figure demonstrates the dynamics of the coin in the stated problem. The shaded sector is where the coin will flip over, and the darker region is the part that lies below the horizontal plane when \(|\theta| \le \alpha\).

When the coin hits the ground and starts spinning before finally settling, the normal vector spins randomly over the circumference of the base of the (imaginary) cone. Now imagine that the coin is in an arbitrary position (as given in the figure).

Now observe that it is quite obvious from the figure that if \(\theta > \alpha\), then the coin will surely not flip over, i.e. heads will come up.

Similarly, if \(\theta < -\alpha\), then the coin will surely flip and heads will never come up.

But when \(|\theta| \le \alpha\), we have to observe the rotation of the normal vector \(\vec{N}\) more closely, and locate the regions on the circumference where the coin flips over if the normal vector stops there. Further observe that the coin will flip over whenever the normal vector intersects (or penetrates) the horizontal plane. It is hard to picture; the figure and your own imagination are what I can offer for clarification. Observe that the part of the base of the cone that dips below the horizontal plane is the part of the circumference on which the coin flips over.

So, observing the figure, let \(O'P=r'\), let the radius of the base of the cone be \(r\), and let the height of the cone be \(h\). Using simple properties of the circle, we need to find the arc length of the shaded sector, since in that part of the circle the coin will flip over, following the argument established above. Then,

\(\tan\theta = \frac{r'}{h}\) and \(\tan\alpha = \frac{r}{h}\)

Let the angle of the sector be \(\phi\) (angle at \(O'\)).

So, \(\cos\frac{\phi}{2}=\frac{r'}{r}=\frac{h \tan\theta}{h \tan\alpha} \Rightarrow \phi=2\cos^{-1}\left(\frac{\tan\theta}{\tan\alpha}\right)\)

So, \(P(\text{coin flips over})=\frac{r \phi}{2 \pi r}=\frac{1}{\pi}\cos^{-1}\left(\frac{\tan\theta}{\tan\alpha}\right)\)

Hence, \(P(\text{Heads})= 1-\frac{1}{\pi}\cos^{-1}\left(\frac{\tan\theta}{\tan\alpha}\right)\).

So, we can also say that the toss will be fair when \(\theta =0\), i.e. \(P(\text{Heads})=\frac{1}{2}\). We can conclude the same from the figure: if the axis of the cone becomes parallel to the horizontal plane, there is an equal chance of the coin falling on either side and exposing either face.
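The three cases can be collected into one small function for playing with different angles. This is a sketch under the same geometric assumptions as the derivation; the name `prob_heads` and the sample angles are only illustrative.

```python
import math

def prob_heads(theta, alpha):
    """P(Heads) for a cone axis at angle theta to the horizontal and
    half-angle alpha, both in radians."""
    if theta >= alpha:       # the cone lies entirely above the horizontal plane
        return 1.0
    if theta <= -alpha:      # the cone lies entirely below the horizontal plane
        return 0.0
    ratio = math.tan(theta) / math.tan(alpha)
    ratio = max(-1.0, min(1.0, ratio))   # guard against rounding just outside [-1, 1]
    return 1.0 - math.acos(ratio) / math.pi

print(prob_heads(0.0, math.pi / 4))   # axis parallel to the ground: a fair toss
print(prob_heads(0.6, 0.5))           # theta > alpha: heads for sure
print(prob_heads(-0.6, 0.5))          # theta < -alpha: tails for sure
```

The formula is symmetric, in the sense that the chances of heads at \(\theta\) and at \(-\theta\) add up to 1.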

If you can toss a coin such that it first lands on its edges when it hits the ground (surface), then even with a biased coin you can perform a fair toss !!

So, just as Diaconis claimed that edges play an important role in determining the outcome of a toss, this problem shows the same. When you see a magician tossing a coin and claiming that heads will land (as Diaconis does himself), and it truly ends up heads, you can be sure that he/she is always using the "\(\theta > \alpha \)" case. Though in this problem I used fewer parameters (compared to standard mechanical models), I still claim that I ended up with a quite satisfactory chance of getting heads, since the geometry was specific and subtle enough to cover all the possibilities after the coin hits the surface.

Can you control the uncertainty on the edges of the coin like the magicians ? Try it !!

Edgy Tossing

Suppose I ask you about the chance of a coin landing on one of its faces. You may think how foolish of me to ask such a question, and smartly reply, "landing on one of its faces is a sure event, man!!" Then I argue further, with strong mathematical logic, that we can always assign some chance to the event that the coin may sometimes land on its edge!! Edgy, right!

Imagine a thick coin (for psychological convenience). Now we can actually treat the coin as a cylinder, right! A flatter one, perhaps. Say the coin has radius \(r\) and thickness \(h\); even though the mass matters, we ignore it for simplicity. Now you roll the coin (cylinder). What do you think is the probability that the cylinder (coin) lands on its lateral surface (edge)?

Try this problem yourself!! Use the assumption that Keller used, that is, the normal vector of the coin rotates in a spherical space, and use the idea of the solid angle of the geometric shapes you wish to use in analyzing the phenomenon. If you don't know what a solid angle is, then give the topic a brief read and come back to the problem!! Good luck!

I tried the problem with a standard coin of diameter (\(2r\)) 19.05 mm and thickness (\(h\)) 1.52 mm. I ended up with about a 3% chance of the coin landing on its edge!! Hence landing on the edge is not at all impossible; on the contrary, it is edgy enough!!

The Coin finally becomes Schrödinger's Cat !

So, till now we have been talking about how Classical Physics tries to explain what is disguised as uncertainty, but at the end of every instance we failed to free ourselves from chance. Though we must admit that the Laws of Mechanics explain some part of the uncertainty. This leaves us to propose a "model of uncertainty" for the better understanding of our readers. The model is as follows,

\( \text{Uncertainty} = \text{"Lack of Classical Mechanical information"} + \text{"Error"} \).

Now, before concluding, we will discuss where this error comes from and what it actually represents.

According to recent research by two physicists, Andreas Albrecht and Daniel Phillips of the University of California, the probabilities we use in our daily life and in science do not "quantify our ignorance" but instead reflect the inherently random nature of the physical world as described by Quantum Mechanics.

They claim that the reason we fail to determine the outcome of a coin toss, even after accounting for most of the classical parameters, is that we can't anticipate the collisions of air particles in Brownian motion with the surface of the coin, which in turn leave their invisible impact on the instantaneous velocity and spin of the coin. Remember Heisenberg's Uncertainty Principle!!

They in fact claim that anyone who tosses a coin is actually performing the Schrödinger's Cat experiment. But rather than a cat that is both alive and dead, the quantum object in this case is a coin whose final state is Heads or Tails (or Edge!!). Hence the outcome of the flip remains genuinely open until the upward face of the coin is looked at, at which point the system takes a definite value of either Heads or Tails.

So, basically, what uncertainty comes down to is that "Uncertainty is the manifestation of Quantum Chanciness." Hence our intuitive model of uncertainty is modified as follows,

\( \text{Uncertainty} = \text{"Lack of Classical Mechanical information"} + \text{"Quantum phenomenon"} \).

This is also the point where classical probability branches off in its respective directions. I hope our interested readers will find themselves hanging on one of these branches. Will you toss a coin to choose a branch?? What do you think!!


Food For Thought

If you have solved the problem left as a thought exercise in "Edgy Tossing", then take some rest and think again!!

Suppose the chance of the coin landing on its edge is \(\frac{1}{3}\). How thick do you think the coin should be!! Do you have any idea!!

You can share them with us perhaps !!

Share your thoughts with us, and stay tuned, as next week is the "Geometric Probability week" and we hope to make the most of it!! Be ready to be perplexed!

References

  1. Dynamical Bias in the Coin Toss - Persi Diaconis, Susan Holmes & Richard Montgomery
  2. Ten Great Ideas About Chance - Persi Diaconis & Brian Skyrms
  3. The Quantum Coin Toss - Edwin Cartlidge, PhysicsWorld
  4. Problems in The Theory of Probability - Sevastyanov, Chistyakov & Zubkov
  5. Fifty Challenging Problems in Probability - Frederick Mosteller
  6. Special thanks to my friends Avishek Dutta and Soham Ghosh (also the co-writer of this blog) for some discussions which turned out to be productive while writing this article.


An Unexpected Correspondence and some Unfinished Games | Cheenta Probability Series

Humans revolutionized and extended the limits of their perception of natural phenomena when they started thinking about chances. We already know what crucial roles chance plays when we cross a busy road, or when, while playing a game of 29 (a card game), you show a card expecting that your opponent doesn't show the joker (the card with the highest points) of the same colour. Hold on! Did I say "expecting"?

So how about discussing how we actually quantify what we are expecting when uncertainty is playing her mighty tricks? Let's present two very fundamental problems on mathematical expectation, which are among the earliest documented problems on mathematical expectation, or probabilistic means, and which were encountered by two of the most significant mathematicians of that century. After a little bit of background, we will converge on the problems and present the intuitions and ideas the two Greats developed to solve them.

Letters conveying Thoughts

It was in 1654, 78 years after the death of Cardano (the man who first developed the mathematics of probability) and 9 years before Cardano's work on The Game of Chances was published posthumously, that a correspondence of letters started between two mathematicians exchanging ideas on how to solve some gambling problems, and as it happens, they both ended up setting essential rules of the subject.

Hence these letters exhibit the first substantial work in the mathematics of Probability. These two greats showed how seemingly complex problems can be reduced to straightforward calculations when the required subtlety does not escape the eyes of the solver.

Blaise Pascal

The main aspects they focused on were fairness and expectation, or probabilistic means.

These two were Blaise Pascal and Pierre de Fermat, who formally defined the present form of mathematical expectation while building the idea of fairness in an uncertain game. They mainly exchanged their thoughts on two problems, which Pascal encountered when he was asked about them by one of his Parisian gambler friends, Gombaud, the Chevalier de Mere. Fermat, on the other hand, mainly known for his interest and work in Number Theory, was growing inclined towards the idea of tackling uncertainty, and hence replied immediately to the letters he was receiving from Pascal, explaining the problems and possible approaches.

Before stating the problems, a justification of the title I gave this article: this correspondence, which started seemingly voluntarily, was very unexpected indeed and also a matter of chance, since in the rest of his life (as a problem solver) Fermat never collaborated and never published any of his work. Once Pascal pressed him to publish some of his work and Fermat replied: "Whatever of my work is judged worthy of publication, I do not want my name to appear there". Such a secretive man was Fermat. Though ironically, and very rightfully, there are many theorems under the name "Fermat" in present-day mathematics. So we can claim that this collaboration, no matter how successful it turned out to be, remains one of the most unexpected and unlikely collaborations in the history of the evolution of mathematics, and the man who induced this unlikely event was Father Mersenne, after whom the Mersenne Primes are named.

Pierre de Fermat

So, What are the Problems ?

There are basically two problems, as follows.

Problem of dice - A player will roll a die 8 times and needs to throw a 6 within these 8 throws. Now 3 throws have been made, none of them resulted in a 6, and the stake is being settled. So, what proportion of the stake would be fair to give to the player to forego his fourth throw (only the fourth)?

The second problem is a bit more complex, and funny in its own way,

Problem of points - Gombaud and another player were involved in a game of dice rolling, and the rule of the game was that the first one to reach a certain number of points wins and collects the whole stake. They started playing, and after a certain number of rounds, as it happens, the gamblers got into an argument and ended up in front of Pascal, demanding a fair division of the stake, as they were no longer interested in completing the game and wanted to put a stop to their argument.

So, basically, Pascal needed to calculate each player's chance of winning the game had they continued, and divide the stake according to those probabilities, assuming that both players have the same probability of winning each point.


Pascal reaches out to Fermat

Observe that both questions deal with a common requirement of fair judgement. But probabilities in those days were mainly based on the intuition and experience of gamblers (remember, Cardano himself was a gambler, and the idea of probability was just a survival strategy to him). Pascal, who had no experience of gambling, reached out to Fermat to share his ideas, and they ended up employing the concept of expectation to answer the question of fairness.

The expected value of a gamble that pays off \(S(x)\) (say) in outcome \(x\) is the probability-weighted average, or probabilistic mean, defined as

\(\text{expectation}(S)=S(x_1)p(x_1)+S(x_2)p(x_2)+\cdots\), where \(p(x_1), p(x_2), \ldots\) are the individual probabilities of the outcomes \(x_1, x_2, \ldots\) respectively.

They both argued that "A transaction that leaves the player's expected value unchanged is assumed to be fair." For example, consider flipping a fair coin. If it comes up heads you win 1, and if tails turns up you lose 1. Then the expected value is \( (+1)\frac{1}{2} +(-1)\frac{1}{2} =0\).
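The definition translates directly into a couple of lines of Python. This is just a sketch: the function name `expectation` and the list-of-pairs representation of a gamble are choices made here for illustration.

```python
from fractions import Fraction

def expectation(gamble):
    """gamble is a list of (payoff, probability) pairs covering every outcome."""
    return sum(payoff * prob for payoff, prob in gamble)

# Win 1 on heads, lose 1 on tails, with a fair coin
fair_coin_bet = [(1, Fraction(1, 2)), (-1, Fraction(1, 2))]
print(expectation(fair_coin_bet))  # 0
```

An expected value of 0 means that entering this bet leaves the player's expectation unchanged, which is exactly the sense of "fair" used above.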

Fermat rectifies Pascal

In the letters exchanged between the two mathematicians, it is found that Pascal first made an error: he claimed that each throw the player gives up earns him \(\frac{1}{6}\) of the stake then remaining, so that by his theory the fourth throw is worth \(\frac{1}{6}\left(\frac{5}{6}\right)^3=\frac{125}{1296}\) of the total stake if the player agrees to forego it!! But Fermat explains that he is mistaken and provides the solution himself.

Fermat writes,

" .....you proposed in the last example in your letter (I quote your very terms) that if I undertake to find the six in eight throws and I have thrown three times without getting it and if my opponent proposes that I should not play the fourth time, and if he wishes me to be justly treated, it is proper that I have \(\frac{125}{1296}\) of the entire sum of our wagers.

This is however not true by my theory. For in this case the three first throws having gained nothing for the player who holds the die, the total sum thus remaining at stake, he who holds the die and who agrees not to play his fourth throw should take \(\frac{1}{6}\) as his reward. And if he has played the fourth throw without finding the desired point and if they agree that he shall not play the fifth time, he will nevertheless have \(\frac{1}{6}\) of the total for his share. Since the whole sum stays in play, it follows not only from the theory but indeed from common sense that each throw should be of equal value. "

So, basically, Fermat claimed that in a fair game the expected value must not depend on whether a player foregoes a round in exchange for a proportion of the stake or continues to play.

Let us explain Fermat's argument more elaborately and mathematically, considering two cases.

Case-1- Suppose the stake \(s\) is being settled after 3 throws (a 6 came in none of them), our friend is left with 5 throws, and he plays the 4th round. Then he wins the stake \(s\) with probability \(\frac{1}{6}\) (the probability of getting a six), or loses the 4th throw with probability \(\frac{5}{6}\) (the probability of getting anything but a 6) and still wins the stake \(s\) in one of the 4 remaining rounds with probability \((1-(\frac{5}{6})^4)\) (the probability of getting at least one six in those 4 throws). So, mathematically, the expectation becomes,

\(\frac{1}{6}s+ \frac{5}{6}(1-(\frac{5}{6})^4)s \) .

Case-2- Suppose the player foregoes his fourth throw for \(\frac{1}{6}\) of the stake. Then, according to Fermat's suggestion in the letters, his expectation is as follows.

He receives \(\frac{1}{6}\) of the stake \(s\) for foregoing the 4th round, and wins the remaining \(\frac{5}{6}\) of \(s\) with probability \((1-(\frac{5}{6})^4)\) (the probability of winning in the remaining 4 rounds). Hence, when foregoing his fourth throw, the player expects,

\( \frac{1}{6}s +(1-(\frac{5}{6})^4)\frac{5}{6}s \).

So, clearly, the expectation of the player is the same in both cases, as stated by Fermat, and \(\frac{1}{6}\) of the stake is the fair price for foregoing the fourth throw. Using Fermat's argument one can easily generalize this solution, so I leave that task to the readers.
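The equality of the two cases can also be checked with exact fractions. The following is only a minimal sketch: the function names are my own, and the stake is normalised to \(s=1\) as an assumption for illustration.

```python
from fractions import Fraction


def win_probability(throws_left: int) -> Fraction:
    """Chance of at least one six in `throws_left` throws of a fair die."""
    return 1 - Fraction(5, 6) ** throws_left


def expectation_if_playing(throws_left: int, stake: Fraction) -> Fraction:
    """Case-1: play the next throw, and keep playing if it misses."""
    p_six = Fraction(1, 6)
    return p_six * stake + (1 - p_six) * win_probability(throws_left - 1) * stake


def expectation_if_foregoing(throws_left: int, stake: Fraction) -> Fraction:
    """Case-2: take 1/6 of the stake for the skipped throw, then play on for the rest."""
    share = Fraction(1, 6) * stake
    return share + win_probability(throws_left - 1) * (stake - share)


stake = Fraction(1)  # assumed normalisation: s = 1
played = expectation_if_playing(5, stake)      # 5 throws left after 3 misses
foregone = expectation_if_foregoing(5, stake)
print(played, foregone, played == foregone)
```

Both values also agree with \(1-(\frac{5}{6})^5\), the chance of at least one six in the 5 throws that were left, which is what one would expect if every throw is of equal value.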

Fermat gets the key once again

The second problem, the "Problem of points", is also an expected-value problem. It dates back to 1494, when Fra Luca Pacioli considered a game that is complete at 6 points, in which one player has won 5 points and the other 3. Pacioli argued that the fair division would be in proportion to the rounds already won, 5 to 3. About 50 years later, Tartaglia objected that according to this rule, if the game were stopped after 1 round, then the person winning that only round would take the whole stake !! When he tried the problem himself after objecting, he ended up puzzled by the same perplexity which later also baffled the likes of Cardano and the Chevalier de Méré.

But Fermat immediately got the key once again and stated more generally that if Player-1 needs \(p\) more points and Player-2 needs \(q\) more points to win the game, then the result of the game can be determined by playing at most \( p+q-1\) games. Then he took a very important step, which remains a very efficient technique for problem solvers to this day. He mapped the \(p+q-1\) rounds still to be played to \(p+q-1\) tosses of a fair coin, each toss representing one round of the game. This is called a bijective mapping in mathematics: you look at a phenomenon whose outcomes correspond exactly to the outcomes of the event you are interested in. Here he basically mapped win and loss to the two faces of the fair coin, and since we assumed that each player has an equal probability of winning each round, it is just like a win (or heads) with probability \(\frac{1}{2}\) and a loss (or tails) with probability \(\frac{1}{2}\). Hence the problem is reduced to a set of events which have equiprobable outcomes, and this is the subtlety I was talking about while explaining the nature of, and the trick to, the problem.

So, if we consider Pacioli's problem, where Player-1 has 5 points and Player-2 has 3, since 6 concludes the game, 3 (1+3-1) more games will suffice. Now if we think of tossing 3 fair coins, then we will have 8 (\(2^3\)) equiprobable outcomes, but Player-2, who needs 3 points, must win all three of the games. That is, he/she must have the outcome (win, win, win) (or (heads, heads, heads)), and the chance of getting all 3 wins (heads) is \(\frac{1}{8}\), which makes the chance of Player-1's win \(\frac{7}{8}\) (or \((1-\frac{1}{8})\)). And hence, Player-1's expectation becomes \(\frac{7}{8}\) of the stakes.

Here, Fermat just magnified his view over the outcomes of each round, and since the outcomes are symmetric, he could extend his observation over a set of games played. In modern probability theory, we use this method, which we sometimes call the fundamental bridge: it breaks a random variable (here, the points scored by each player after a set of games has been played) into a finite number of indicator random variables (here, 1 point scored if one wins and 0 if one loses in each game). This technique helps tremendously in finding complex expectations, even when you don't know the distribution of the random variable. Hence this work of Fermat was the first step towards erecting the pillars of the fundamental bridge in probability theory.
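Fermat's coin-tossing picture for Pacioli's position can be checked by brute-force enumeration. This is a small sketch under the same assumption as the text (every round is a fair coin); the function name and the "H"/"T" encoding are mine, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def player1_share_by_enumeration(p: int, q: int) -> Fraction:
    """Player-1 needs p more points, Player-2 needs q more; each round is a fair coin."""
    rounds = p + q - 1
    favourable = 0
    # "H" means Player-1 wins that round, "T" means Player-2 wins it.
    for outcome in product("HT", repeat=rounds):
        # Player-1 takes the game exactly when at least p of the rounds are "H".
        if outcome.count("H") >= p:
            favourable += 1
    return Fraction(favourable, 2 ** rounds)


print(player1_share_by_enumeration(1, 3))  # Pacioli's position: 7/8
```

Listing all outcomes is fine for 3 rounds, but the number of cases doubles with every extra round, so this approach quickly becomes impractical.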

Pascal Generalizes with Pattern

Imagine Fra Pacioli's problem in a modified setup, where the win again comes at 6 points, but Player-1 has no points and the other has only 1 point. So, by Fermat's argument, the players need to play at most 10 (6+5-1) matches. Now here, using the fundamental bridge and tossing a fair coin 10 times leaves us with 1024 (\(2^{10}\)) equally likely outcomes, which are no longer possible to count by hand like the previous ones. Obviously, we don't have all day!! But now Pascal had an idea.

To win, Player-1 must win at least 6 of the 10 rounds. She can win exactly 6 out of 10 trials in \(\binom{10}{6}\) ways, exactly 7 out of 10 trials in \(\binom{10}{7}\) ways, and so on, up to 10 wins out of 10 trials in \(\binom{10}{10}\) ways. These numbers lie on the 10th row of Pascal's triangle. That row tells us the number of ways we can choose the "wins" from a group of 10 "outcomes" (of "wins" and "losses" only).

Pascal's Triangle- The temple of Patterns
(observe the row-10 for the given problem)

In this particular problem, Pascal needed the number of ways of getting 6 wins in 10 trials + 7 wins in 10 trials + ..... + 10 wins in 10 trials, which from Pascal's triangle is found to be 210+120+45+10+1=386.

So, Player-1 wins with probability \(\frac{386}{1024}\) (which is about 38%), since there are \(2^{10}=1024\) equally likely outcomes. From here, calculating the fair division proceeds as already explained earlier.
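Pascal's counting can be reproduced with binomial coefficients instead of listing all 1024 outcomes. The sketch below uses Python's standard `math.comb`; the function name `player1_share` is a hypothetical label of mine.

```python
from fractions import Fraction
from math import comb


def player1_share(p: int, q: int) -> Fraction:
    """Chance that Player-1, needing p points against q, wins at least p of p+q-1 fair rounds."""
    rounds = p + q - 1
    favourable = sum(comb(rounds, k) for k in range(p, rounds + 1))
    return Fraction(favourable, 2 ** rounds)


row_10 = [comb(10, k) for k in range(11)]  # row 10 of Pascal's triangle
share = player1_share(6, 5)

print(row_10[6:], sum(row_10[6:]))  # [210, 120, 45, 10, 1] 386
print(share, float(share))          # 193/512, i.e. 386/1024 in lowest terms
```

The same function gives \(\frac{7}{8}\) for Pacioli's original position, with `player1_share(1, 3)`.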

These solutions and generalizations by Pascal and Fermat opened the gate to handling equiprobable cases using combinatorial principles, and provided insights into the fairness of a game and the measurement of mathematical expectation, which later turned out to be one of the most important and fundamental measures, besides probability itself, in the theory of Statistics and Probability.

Pascal gives the finishing touch

Before concluding, I just want to shed light on another approach, which Pascal provided to solve a modified version of the "Problem of points". Without discussing this beautiful approach, I think this article on "Probabilistic means" would remain incomplete.

Pascal proposed a problem and its possible approach to Fermat in one of his letters:

" Let us suppose the first of them has two (points) and the other one. They now play one throw of which the chances are such that if the first wins, he will win the entire wager that is at stake, that is to say 64 pistoles. If the other wins they will be two to two and in consequence, if they wish to separate, it follows that each will take back his wager, that is to say 32 pistoles.

Consider then, Monsieur, that if the first wins, 64 will belong to him. If he loses, 32 belong to him. Then if they do not wish to play this point and separate without doing it, the first would say, "I am sure of 32 pistoles, for even a loss gives them to me. As for the 32 others, perhaps I will have them and perhaps you will have them, the risk is equal. Therefore let us divide the 32 pistoles in half (16 each) and give me 32 of which I'm certain besides." He will then have 48 (32+16) pistoles and the other will have 16."

Pascal basically used the situation of a draw to create a definition of fairness which is very difficult to argue against. He finely separated the part of the stakes that surely belongs to the leading player from the part that still has to be divided over the uncertainties.

Pascal further extends his idea to solve a more general problem: say one player is leading by 2 points (or p points, more generally) and they start playing for a point where, if the one leading wins, he takes the lot. Can you find the fair share if they don't wish to play the rounds further ? What do you think the fair share would be if the one leading loses the round and is now leading by just a point ?? Are you thinking of using the argument Pascal provided for the case of "2 to 1 points" recursively ? Well, that's what Pascal himself did, try completing the rest yourself !!

Pascal's method of recursion was the final touch of artistry to the solutions of these problems, and to the definition of fairness and its relation to probabilistic means. This method of calculation makes the solution more general, and it lives on today in solving problems on conditional expectation. And since recursion demands insight into the patterns inherent in a problem with seemingly random outcomes, Pascal's procedure for calculating the probabilistic mean still attracts problem solvers and probability fanatics who are inclined to the mathematical aspects of quantifying chances. This method helps to solve problems involving games and patterns in the outcomes of a sequence of coin tosses, which have extensive applications in many fields of Physics and Genetics. Coin tossing is itself a topic which demands discussion in its own right, let's keep it for another day.
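For readers who want to check their answers, Pascal's recursive argument can be sketched in a few lines. This is one possible way to write it, not Pascal's own notation: the function `fair_share`, its arguments (points still needed by each player), and the default stake of 64 pistoles are assumptions made for illustration, with each round taken to be a fair 50-50 chance as in the letter.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def fair_share(a: int, b: int, stake: Fraction = Fraction(64)) -> Fraction:
    """Fair share of the first player, who needs `a` more points while the other needs `b`."""
    if a == 0:
        return stake          # the first player has already won everything
    if b == 0:
        return Fraction(0)    # the other player has already won everything
    # One more fair round: average the shares of the two positions it can lead to.
    return (fair_share(a - 1, b, stake) + fair_share(a, b - 1, stake)) / 2


print(fair_share(1, 2))  # the "2 to 1 points" case from the letter: 48
print(fair_share(1, 1))  # two to two: 32 each
```

With a stake of 64, `fair_share(1, 3)` gives 56, that is \(\frac{7}{8}\) of the stake, in agreement with Fermat's count for Pacioli's position.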


Food For Thought

I find that thinking over mistakes and disputes in arguments often helps us gain more insight into subtle truths that are puzzling enough. So I suggest,

Trace back to the section "Fermat rectifies Pascal": can you explain where Pascal went wrong in his argument, which I left unexplained for you to figure out !!

Share your thoughts with us, and stay tuned to go out for some "Random Walks" with Soham Ghosh in the next post. Be ready to be perplexed !

References

  1. Ten Great Ideas About Chance - Persi Diaconis & Brian Skyrms
  2. Pascal and Fermat on Probability - Vera Sanford.
  3. Do dice play God - Ian Stewart.
  4. Fermat's Last Theorem - Simon Singh.
