Bayes' in-sanity || Cheenta Probability Series

This post deals with one of the most controversial approaches to statistics: the fundamental objections to Bayesian methods and the Bayesian school of thinking. Fisher put forward a vehement objection to Bayesian inference, describing it as "fallacious rubbish".

However, ironically enough, it’s interesting to note that Fisher’s greatest statistical failure, fiducialism, was essentially an attempt to “enjoy the Bayesian omelette without breaking any Bayesian eggs”!

Ronald Fisher

Inductive Logic

An inductive logic is a logic of evidential support. In a deductive logic, the premises of a valid deductive argument logically entail the conclusion, where logical entailment means that every logically possible state of affairs that makes the premises true must make the conclusion true as well. Thus, the premises of a valid deductive argument provide total support for the conclusion. An inductive logic extends this idea to weaker arguments. In a good inductive argument, the truth of the premises provides some degree of support for the truth of the conclusion, where this degree of support might be measured via some numerical scale.

If a logic of good inductive arguments is to be of any real value, the measure of support it articulates should be up to the task. Presumably, the logic should at least satisfy the following condition:

Criterion of Adequacy (CoA):
The logic should make it likely (as a matter of logic) that as evidence accumulates, the total body of true evidence claims will eventually come to indicate, via the logic’s measure of support, that false hypotheses are probably false and that true hypotheses are probably true.

One practical example of an easy inductive inference is the following:

" Every bird in a random sample of 3200 birds is black. This strongly supports the following conclusion: All birds are black. "

This kind of argument is often called an induction by enumeration. It is closely related to the technique of statistical estimation.

Critique of Inductive Logic

Non-trivial calculi of inductive inference are shown to be incomplete. That is, it is impossible for a calculus of inductive inference to capture all inductive truths in some domain, no matter how large, without resorting to inductive content drawn from outside that domain. Hence inductive inference cannot be characterized merely as inference that conforms with some specified calculus.
A probabilistic logic of induction is unable to separate cleanly neutral support from disfavoring evidence (or ignorance from disbelief). Thus, the use of probabilistic representations may introduce spurious results stemming from its expressive inadequacy. That such spurious results arise in the Bayesian "doomsday argument" is shown by a re-analysis that employs fragments of inductive logic able to represent evidential neutrality. Further, the improper introduction of inductive probabilities is illustrated with the "self-sampling assumption."

Objections to Bayesian Statistics

While Bayesian analysis has enjoyed notable success with many particular problems of inductive inference, it is not the one true and universal logic of induction. Some of the reasons arise at the global level through the existence of competing systems of inductive logic. Others emerge through an examination of the individual assumptions that, when combined, form the Bayesian system: that there is a real valued magnitude that expresses evidential support, that it is additive and that its treatment of logical conjunction is such that Bayes' theorem ensues.

The fundamental objections to Bayesian methods are twofold: on the one hand, Bayesian methods are presented as an automatic inference engine, and this raises suspicion in anyone with applied experience. The second objection comes from the opposite direction and addresses the subjective strand of Bayesian inference.

Andrew Gelman, a staunch Bayesian, pens an interesting criticism of the Bayesian ideology in the voice of a hypothetical anti-Bayesian statistician.

Here is the list of objections from a hypothetical or paradigmatic non-Bayesian, and I quote:

"Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications. Subjective prior distributions don’t transfer well from person to person, and there’s no good objective principle for choosing a non-informative prior (even if that concept were mathematically defined, which it’s not). Where do prior distributions
come from, anyway? I don’t trust them and I see no reason to recommend that other people do, just so that I can have the warm feeling of philosophical coherence. To put it another way, why should I believe your subjective prior? If I really believed it, then I could just feed you some data and ask you for your subjective posterior. That would save me a lot of effort!"

Andrew Gelman

In 1986, a statistician as prominent as Brad Efron restated these concerns mathematically:

"I like unbiased estimates and I like confidence intervals that really have their advertised confidence coverage. I know that these aren’t always going to be possible, but I think the right way forward is to get as close to these goals as possible and to develop robust methods that work with minimal assumptions. The Bayesian approach—to give up even trying to approximate unbiasedness and to instead rely on stronger and stronger assumptions—that seems like the wrong way to go. When the priors I see in practice are typically just convenient conjugate forms. What a coincidence that, of all the infinite variety of priors that could be chosen, it always seems to be the normal, gamma, beta, etc., that turn out to be the right choices?"

Well, that really sums up every frequentist's rant about Bayes 😀!

And the torrent of complaints never ceases....

Some frequentists believe that in the old days, Bayesian methods at least had the virtue of being mathematically clean. Nowadays, they all seem to be computed using Markov chain Monte Carlo, which means that not only can you not realistically evaluate the statistical properties of the method, you can't even be sure it has converged, adding one more item to the list of unverifiable (and unverified) assumptions in Bayesian belief.

As the applied statistician Andrew Ehrenberg wrote:

" Bayesianism assumes:

(a) Either a weak or uniform prior, in which case why bother?,

(b) Or a strong prior, in which case why collect new data?,

(c) Or, more realistically, something in between, in which case Bayesianism always seems to duck the issue."

Many are skeptical about the newfound empirical approach of Bayesians, which always seems to rely on the assumption of "exchangeability", something almost impossible to obtain in practical scenarios.

Finally Peace!!!

No doubt, some of these are strong arguments worthy enough to be taken seriously.

There is an extensive literature, which sometimes seems to overwhelm that of Bayesian inference itself, on
the advantages and disadvantages of Bayesian approaches. Bayesians’ contributions to this discussion have included defense (explaining how our methods reduce to classical methods as special cases, so that we can be as inoffensive as anybody if needed).

Obviously, Bayesian methods have filled many loopholes in classical statistical theory.

And always remember that you are subjected to mass-criticism only when you have done something truly remarkable walking against the tide of popular opinion.

Hence : "All Hail the iconoclasts of Statistical Theory:the Bayesians"

N.B. The above quote is mine XD

Wait for our next dose of Bayesian glorification!

Till then ,

Stay safe and cheers!

References

1."Critique of Bayesianism"- John D Norton

2."Bayesian Informal Logic and Fallacy" - Kevin Korb

3."Bayesian Analysis"- Gelman

4."Statistical Re-thinking"- Richard McElreath

Some Important Links:

Nonconglomerability and the Law of Total Probability || Cheenta Probability Series

This explores an unsung sector of probability, "nonconglomerability", and its effects on conditional probability. It also emphasizes how important the idea of countable additivity, i.e. extending finite additivity to infinite sets, really is.


“I believe that we do not know anything for certain, but everything probably.”~ Christiaan Huygens

One week into conditional probability, it's time to get our hands dirty with the Law of Total Probability and the paradoxes which have emerged out of it. Let's be formal enough to state the law first.

The Law of Total Probability

Adorably called LOTP, it is one of the cardinal results in conditional probability.

Suppose the events \(A_1,A_2,...,A_k \) form a partition (mutually exclusive and exhaustive) of the event space, and let \(H\) be any arbitrary event. Then the law states that \(P(H)=P(H|A_1)P(A_1)+P(H|A_2)P(A_2)+...+P(H|A_k)P(A_k) \)
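To make the law concrete, here is a minimal numeric sketch (the die-and-coin setup is my own illustrative choice, not from the post): a fair die roll fixes which of six partition events \(A_k\) occurs, and \(H\) is the event that \(k\) flips of a fair coin show at least one head.

```python
# A fair die picks the partition event A_k (k = 1..6); H is the event that
# k flips of a fair coin show at least one head.
p_A = [1 / 6] * 6                                   # P(A_k): uniform partition
p_H_given_A = [1 - 0.5 ** k for k in range(1, 7)]   # P(H | A_k) = 1 - 2^(-k)

# LOTP: P(H) = sum over k of P(H | A_k) P(A_k)
p_H = sum(ph * pa for ph, pa in zip(p_H_given_A, p_A))
print(round(p_H, 4))  # ≈ 0.8359, i.e. 321/384
```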

Figure: a partition of the event space, with \(k=6\)

The day-to-day picture I always relate to while recalling this law is the following. Suppose a thin glass block is placed on a table, and some water has accidentally been spilled on it; part of this water is trapped between the surface of the table and the glass. If you look at this from above, you will see an almost circular puddle of water trapped within the rectangular block. This puddle is actually our arbitrary event \(H\), and the block is our event space. How can you get the partitions? Any wild guesses? Well, drop a hard stone on the glass and it cracks (or, if you have strong arms and like fantasizing about hurting your knuckles, you can do it too :P). The cracks partition the sample space into various segments, and there is water trapped in each of them. There you go!

As we have stressed already, from a false proposition, or from a fallacious argument that leads to a false proposition, all propositions, true and false, may be deduced. But this is just the danger; if fallacious reasoning always led to absurd conclusions, it would be found out at once and corrected. But once an easy, shortcut mode of reasoning has led to a few correct results, almost everybody accepts it; those who try to warn against it are generally not listened to.

When a fallacy reaches this stage, it takes on a life of its own and develops very effective defenses for self-preservation in the face of all criticism. Here is one such instance.

Nonconglomerability

If \( (C_1,C_2,...,C_n )\) denotes a finite set of mutually exclusive, exhaustive propositions on prior information \(I\), then for any proposition \(A\) we have:

\( P(A|I) = \sum_{i=1}^{n} P(AC_i|I) = \sum_{i=1}^{n} P(A|C_i I)P(C_i|I) \)

As you have all seen in the previous blog post, the prior probability \(P(A|I)\) is written as a weighted average of the conditional probabilities \( P(A|C_i I) \).

Now, it is an elementary result that the weighted mean of a set of real numbers cannot lie outside the range spanned by those numbers, i.e. if \( L \le P(A|C_i I) \le U \) ; then necessarily \( L \le P(A|I) \le U \).

De Finetti (1972) called this property "conglomerability" of the partition \( \{C_i\}\).

Obviously, non-conglomerability cannot arise from a correct application of the rules of probability theory on finite sets. It cannot, therefore, occur in an infinite set which is approached as a well-defined limit of a sequence of finite sets.

Yet nonconglomerability has become a minor industry, with a large and growing literature. There are writers who believe that it is a real phenomenon, and that they are proving theorems about the circumstances in which it occurs, which are important for the foundations of probability theory. Nonconglomerability has become, quite literally, institutionalized in our literature and taught as truth.

Let us examine a case where "nonconglomerability" has been claimed to be true.

Rectangular Array

This particular example is due to the famous trio Kadane, Schervish and Seidenfeld (1986).

We start from a two-dimensional \( (M \times N) \) set of probabilities: \( p(i,j) , 1 \le i \le M ; 1\le j \le N \). The sample space is a rectangular array of \( MN \) points in the first quadrant. It will suffice to take some prior information \( I \) for which these probabilities are uniform: \(p(i,j) = \frac{1}{MN} \). Let us define the event \(A : i<j \).

Therefore, \(P(A|I)\) can be found by direct counting and in fact it is given by :

\( P(A|I) =
\begin{cases}
\frac{(2N-M-1)}{2N}, & M \le N \\
\frac{(N-1)}{2M}, & N \le M \\
\end{cases}
\)

Now let us resolve this conditionally, using the partition \(\{C_1,C_2,...,C_M\}\), where \(C_i\) is the statement that we are in the \(i\)th column of the array. We have \(P(C_i |I)=\frac{1}{M} \).

So, we get \( P(A|C_i I) =
\begin{cases}
\frac{N-i}{N}, & 1 \le i \le M \le N \\
\frac{N-i}{N}, & 1 \le i \le N \le M \\
0, & N \le i \le M
\end{cases}
\)

These conditional probabilities reach the upper bound \( U= \frac{N-1}{N} \) for all \( M,N \), and the lower bound \( L=1-R \) if \(M \le N\) and \(0\) otherwise, where \( R=\frac{M}{N} \).

Now, if we check the conglomerability criteria using these \(L,U\) , then it seems to work fine with no ambiguity. So, where can one possibly create a non-conglomerability out of this?

Just take \( M \rightarrow \infty, N \rightarrow \infty \) and look at the probabilities \( P(A|C_i I) \). We try to evaluate these probabilities directly on the infinite set.

Then it is argued that, for any given \(i\), there are an infinite number of points where \( A \) is true and only a finite number where it is false. Thus, the conditional probability \(P(A|C_i I) =1 \) for all \(i\); yet \(P(A|I) <1 \).

Now, consider the set of propositions \( \{D_1,D_2,...,D_N \} \), where \( D_j\) is the statement that we are on the \(j\)th row of the array, counting from the bottom. By the same argument, for any given \( j \), there are an infinite number of points where \(A\) is false, and only a finite number where \(A\) is true.

Then, the conditional probability \(P(A|D_j I)=0 \; \forall j\), yet \(P(A|I) >0 \). By this reasoning, we have produced two nonconglomerabilities, in opposite directions, from the same model!
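For any finite \(M\) and \(N\), the weighted-average identity cannot fail, and the closed form for \(P(A|I)\) can be checked by direct counting. A quick exact-arithmetic sketch (the sizes \(M=7, N=10\) are my own arbitrary choice):

```python
from fractions import Fraction

M, N = 7, 10  # any finite sizes work; here M <= N

# Direct count of A : i < j on the uniform M x N array
p_A = Fraction(sum(1 for i in range(1, M + 1)
                     for j in range(1, N + 1) if i < j), M * N)

# Weighted average over the column partition {C_i}, P(C_i|I) = 1/M
p_A_via_partition = sum(
    Fraction(sum(1 for j in range(1, N + 1) if i < j), N) * Fraction(1, M)
    for i in range(1, M + 1)
)

assert p_A == p_A_via_partition   # conglomerability holds exactly
print(p_A)  # 3/5, matching (2N - M - 1)/(2N) for M <= N
```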

But wait wait wait... aren't we missing something? I don't think this is a fallacy at all. Let's think of this elementary problem in analysis:

There is a well-known fallacious argument that any given infinite series \(S=\sum_{i} a_i \) converges to any real number \(x\) of your choice.

Suppose you define the partial sums \(s_n =a_1 + a_2 + ... +a_n \). Define \(s_0=0\).

Write \(a_n = (s_n - x) - (s_{n-1} -x )\), so our series becomes:

\(S= (s_1 - x)+(s_2 -x )+ (s_3 -x)+...-(s_0 - x)-(s_1 -x )-(s_2-x)-... \) .

The terms \( (s_1-x), (s_2-x), \ldots \) cancel out and BOOM!! Your sum is \( S=-(s_0 -x)=x \).

Pause for a moment and taste the BLUNDER 😛

As a great man once said:

Apply the ordinary processes of arithmetic and analysis only to expressions with a finite
number n of terms. Then after the calculation is done, observe how the resulting finite
expressions behave as the parameter n increases indefinitely.

Yes, exactly! Even stalwarts like Gauss, Weierstrass, Abel and many accompanying them did not follow this advice meticulously and in many cases reached wrong conclusions. If you can understand the fallacy of this proof, you can pretty well admire the forgery in the rectangular array problem.

Once one has understood the fallacy in the analysis problem, then whenever someone claims to have proved some result by carrying out arithmetic or analytical operations directly on an infinite set, it is hard to shake off a feeling that he could have proved the opposite just as easily and by an equally sound argument, had he wished to. Thus there is no reason to be surprised by what we have just found.

Nonconglomerability on a rectangular array, far from being a phenomenon of probability theory, is only an artifact of failure to obey the rules of probability theory.

Bourbaki were the first to point out this fallacy.

Borel-Kolmogorov Paradox

Another abuse of conglomerability, this paradox has its name written down in the history books.

Suppose a random variable \(X\) has a uniform distribution on the unit sphere, i.e. we choose a point randomly on the sphere. We will do this in two ways:

1. Choose longitude \( \lambda \) uniformly from \([-\pi,\pi] \).

2. Choose latitude \( \phi \) from \( [-\frac{\pi}{2},\frac{\pi}{2}] \) with density \( \frac{1}{2} \cos \phi \).

The problem asks us to find the conditional distribution of \(X\) on a great circle.

Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results. First, note that choosing a point uniformly on the sphere is equivalent to choosing the longitude \( \lambda \)  uniformly from \( [-\pi, \pi] \) and choosing the latitude \( \phi \) from \([-\frac{\pi}{2}, \frac{\pi}{2}] \).

Figure: a sphere with lines of latitude

For a line of longitude with \( \lambda =0 \), \(f(\phi| \lambda=0) = \frac{1}{2} \cos \phi \).

Whereas, for the line of latitude \( \phi=0 \), \( f(\lambda| \phi=0) = \frac{1}{2 \pi} \).

One is uniform on the circle, while the other is not! Yet both refer to the same great circle :O
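The clash is easy to see numerically. A rough simulation sketch (my own, with an arbitrary band half-width \( \epsilon \)): sample points uniformly on the sphere, keep those within a thin band around each great circle, and compare the average \(|\phi|\) along the meridian (cosine density predicts \( \frac{\pi}{2}-1 \approx 0.57 \)) with the average \(|\lambda|\) along the equator (uniform density predicts \( \frac{\pi}{2} \approx 1.57 \)).

```python
import math, random

random.seed(0)

# Uniform points on the sphere: lam ~ U(-pi, pi); sin(phi) ~ U(-1, 1)
# gives latitude phi the density (1/2) cos(phi).
eps = 0.01
meridian_phis, equator_lams = [], []
for _ in range(1_000_000):
    lam = random.uniform(-math.pi, math.pi)
    phi = math.asin(random.uniform(-1, 1))
    if abs(lam) < eps:                  # near the great circle lam = 0
        meridian_phis.append(abs(phi))
    if abs(phi) < eps:                  # near the great circle phi = 0
        equator_lams.append(abs(lam))

mean_phi = sum(meridian_phis) / len(meridian_phis)
mean_lam = sum(equator_lams) / len(equator_lams)
print(round(mean_phi, 2), round(mean_lam, 2))  # ≈ 0.57 and ≈ 1.57
```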

Now, I am not going to geek out over the resolution of this paradox, as it requires deeper knowledge of probability theory, which we will surely cover in future posts. But interested readers can go through E.T. Jaynes' explanation of the same.

We will be back with a few more posts in conditional probability and will try our best to enamor you with the various angles and spheres which are less trodden in probability.

Till then stay safe.

Have a great weekend!

References:

1. "Conglomerability and Finite Partitions" - Alan Zame

2. "The Extent of Non-Conglomerability of Finitely Additive Probabilities" - Kadane, Schervish, Seidenfeld

3. "Probability Theory: The Logic of Science" - E.T. Jaynes


Some Classical Problems And Paradoxes In Geometric Probability || Cheenta Probability Series

This is the 6th post in our ongoing probability series. In this post, we deliberate on the famous Bertrand's Paradox, Buffon's Needle Problem, and geometric probability through barycentres.


"Geometry is not true, it is advantageous." ~ Henri Poincare

Yes, exactly... it's time to merge the two stalwarts: "Geometry" and "Probability".

The Probability Measure of Geometrical Elements

In probability theory one is usually concerned with random variables which are quantities, or sets of quantities, taking values in some set of possibilities on which there is defined a non-negative measure, satisfying certain required conditions which enable us to interpret it as a probability. In the theory of geometrical probabilities the random elements are not quantities but geometrical objects such as points, lines and rotations. Since the ascription of a measure to such elements is not quite an obvious procedure, a number of "paradoxes" can be produced by failure to distinguish the reference set. These are all based on a simple confusion of ideas but may be useful in illustrating the way in which geometric probabilities should be defined.

Bertrand's Paradox

We consider one paradox, due to J. Bertrand (1907).

The problem of interest is precisely: determine the probability that a random chord of a circle of unit radius has length greater than \( \sqrt{3} \), the side of an inscribed equilateral triangle.

Context of the problem

The development of the Theory of Probability has not been smooth at all. The first attempts to formalize the calculus of probability were due to the Marquis de Laplace (1749-1827), who proposed to define the probability \( P(A) \) of an outcome A as the ratio of the number of events that result in the outcome A to the total number of possible events. This is of course only meaningful if the number of all possible events is finite and, in addition, all the events are equiprobable, a notion which Laplace also defined. However, in our first blog post, we addressed the fact that the definition is, in a sense, circular: a notion of "equiprobable" is defined prior to the introduction of "probable".

Thus, at the time, the field did not seem to have a sound foundation. Attempts to extend the definition to the case of an infinite number of events led to even greater difficulties. Bertrand's Paradox is one such discovery that made mathematicians wary of the whole notion of probability.

Apparently, this problem has more than one solution, meaning as the perspective of the reader changes, the solution also changes! Worthy of a paradox right?!

Some of the most discussed solutions

What about a probability of \( \frac{1}{3} \)?

Yeah, this is correct! Provided your thought process follows the same lines as this proof:

Any chord of the circle intersects it in two points, and we may suppose these to be independently distributed in probability distributions which are uniform over the circumference of the circle. Without loss of generality, we can suppose one of the two points to be at a vertex of an inscribed equilateral triangle. There is then just \(\frac{1}{3} \) of the circumference in which the other point can lie in order that the resulting chord has length greater than \( \sqrt{3} \) so that the probability is \( \frac{1}{3} \).

Do you see what the favourable areas are?

What about \( \frac{1}{4} \) ?

Umm... sure! Why not? Any chord is uniquely defined by the foot of the perpendicular on it from the centre. If this point is distributed uniformly over the circle, the probability of it lying in any region of area \( A \) is \( A \pi^{-1} \), since the total area of the circle is \(\pi\). For the chord to have length greater than \( \sqrt{3} \), the foot of the perpendicular must lie inside a circle of radius \( \frac{1}{2} \), and hence the probability is \( \frac{1}{4} \).

But but.. it is also \( \frac{1}{2} \) !?

Try to think of a proof why this probability is also \( \frac{1}{2} \) .

Based on constructing a random chord in a circle, the paradox involves a single mathematical problem with three reasonable but different solutions. It’s less a paradox and more a cautionary tale. It boils down to the same old question: "What do you mean by random?"
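A Monte Carlo sketch of the three chord-generating schemes (random endpoints, random midpoint, random distance from the centre); each scheme is a legitimate reading of "random chord", yet each converges to a different answer:

```python
import math, random

random.seed(1)
T = math.sqrt(3)       # side of the inscribed equilateral triangle
n = 200_000

def len_endpoints():
    # Scheme 1: two independent uniform points on the circumference
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))

def len_midpoint():
    # Scheme 2: foot of the perpendicular uniform in the disk
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        r2 = x * x + y * y
        if r2 <= 1:
            return 2 * math.sqrt(1 - r2)

def len_radius():
    # Scheme 3: distance of the chord from the centre uniform on [0, 1]
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

estimates = {f.__name__: sum(f() > T for _ in range(n)) / n
             for f in (len_endpoints, len_midpoint, len_radius)}
print(estimates)  # ≈ 1/3, 1/4, 1/2 respectively
```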

Let's hand it over to Buffon

The "Buffon needle problem" which many of us encountered in our college days has now been with us for 200 years. One major aspect of its appeal is that its solution has been tied to the value of \( \pi \) which can then be estimated by physical or computer simulation today. It is interesting that in the original development, Buffon (1777) extols geometry as a companion tool to the calculus in establishing a science of probability and suggests that chance is amenable to the methods of geometry as well as those of the calculus. Buffon indicates that the human mind, because of prior mathematics, preferred numbers to measures of area but that the invention of games revolving around area and ratios of areas could rectify this. To highlight this point he investigated a game already in practice in the 18th century
known as "clean tile".

Clean Tile Problem

In a room tiled or paved with equal tiles, of any shape, a coin is thrown upwards; one of the players bets that after its fall the coin will rest cleanly, i.e., on one tile only; the second bets that the coin will rest on two tiles, i.e., that it
will cover one of the cracks which separate them; a third player bets the coin will rest over 3, 4, or 6 cracks: it is required to find the chances for each of these players.

This problem is regarded as the precursor of the famous "needle problem" .

Buffon in his own words states, "I assume that in a room, the floor of which is merely divided by parallel lines, a stick is thrown upwards and one of the players bets the stick will not intersect any of the parallels on the floor, whereas on the contrary the other one bets the stick will intersect some one of these lines; it is required to find the chances of the two players. It is possible to play this game with a sewing needle or a headless pin."

Buffon's Needle

Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?

Uspensky's Proof:

Let the parallel lines be separated by \( d \) units. The length of the needle is \( l \), with the assumption \( l \le d \). Uspensky (1937) provides a proof that the probability of an intersection is \( p = \frac{2l}{\pi d} \). He develops this by considering a finite number of possible positions for the needle as equally likely outcomes, and then treats the limiting case as a representation of the problem. This includes a definition of randomness for the distance \(x\) of the needle's midpoint to the nearest line and the acute angle \( \phi \) formed by the needle and a perpendicular from the midpoint to the line. The solution is obtained by computing the ratio of favorable outcomes to the total set of outcomes and passing to the limit.

A measure of the set of total outcomes is given by:

\( \int_{0}^{\frac{\pi}{2}} \int_{0}^{\frac{d}{2}} dx d \phi = \frac{\pi d}{4} \)

From the figure above, it's evident that the measure of the set of intersections is:

\( \int_{0}^{\frac{\pi}{2}} \int_{0}^{\frac{l}{2} \cos \phi } dx d \phi = \frac{l}{2} \)

Therefore \( p = \frac{l/2}{(\pi d)/4} = \frac{2l}{\pi d} \).
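A simulation sketch of Uspensky's setup, sampling \(x\) uniformly on \([0, \frac{d}{2}]\) and \(\phi\) uniformly on \([0, \frac{\pi}{2}]\) (the particular values \(l=1, d=2\) are my own choice):

```python
import math, random

random.seed(2)
l, d = 1.0, 2.0        # needle length l <= line spacing d
n = 200_000

hits = 0
for _ in range(n):
    x = random.uniform(0, d / 2)           # midpoint to nearest line
    phi = random.uniform(0, math.pi / 2)   # acute angle with the perpendicular
    if x <= (l / 2) * math.cos(phi):       # crossing condition
        hits += 1

p_hat = hits / n
print(round(p_hat, 3), "theory:", round(2 * l / (math.pi * d), 3))
# pi can in turn be estimated as 2l / (p_hat * d):
print(round(2 * l / (p_hat * d), 2))
```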

To learn about further generalizations of this problem, you should go through Laplace's extension and the long-needle case.

Now, let's explore something new!

Barycentric Coordinates take guard!

What are barycentric coordinates?

For a precise definition and illustration of barycentres, you can go through Particle Geometry and Triangles.

The barycentric coordinates are generally defined in the context of triangles, but they can be set up in a more general space \(\mathbb{R}^n \). For \(n=1\), one takes two distinct points \(A \) and \(B \) and two coordinates \(u\) and \(v\). Every point \(K\) on the real line is uniquely represented as \( K = uA+vB \), where \( u+v=1 \). More generally, to define barycentric coordinates in \( \mathbb{R}^n \), one needs \( n+1 \) points that do not lie in a space of lesser dimension.

Now, let's look at a problem:

Choose \(n\) points at random on a segment of length 1. What is the probability that an \((n+1)\)-gon (a polygon with \(n+1\) sides) can be constructed from the \(n+1\) segments thus obtained?

This is a generalization of the famous problem: "Two points are selected at random on a straight line segment of length 1. What is the probability that a triangle can be constructed out of the three segments thus obtained?"

The above problem has a simple geometric proof that does not require heavy machinery, which will probably be addressed in our future posts. But the generalization makes the problem tempting enough to use barycentres.

First of all, if the \(n+1\) segment lengths are \(x_0, x_1, ..., x_n\) (so that \(x_0+x_1+\cdots+x_n=1\)), a valid \((n+1)\)-gon requires each side to be shorter than the sum of the others:

\(x_i < x_0 + x_1 +\cdots+x_n - x_i = 1- x_i \quad \forall i \in \{ 0,1,...,n\} \)

Thus, \( x_i < \frac{1}{2} \) for every \(i\).

The object defined by this set of inequalities is obtained from the basic simplex by removing smaller simplexes, one at every vertex. Each of these \( n+1 \) smaller simplexes has hypervolume equal to \( (\frac{1}{2})^n \) of the hypervolume of the basic simplex.

Thus, the probability that the barycentric coordinates satisfy this set of inequalities is \(p= 1- \frac{n+1}{2^n} \).

Note that this probability goes to 1 as \(n\) grows larger and larger, reflecting the fact that it is easier to find segments that form a many-sided polygon than ones that form a triangle, which is rather natural.
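The formula \(p = 1 - \frac{n+1}{2^n}\) can be checked by simulation; a minimal sketch (trial counts are arbitrary):

```python
import random

random.seed(3)

def polygon_possible(n):
    """Cut [0,1] at n uniform points; can the n+1 pieces form an (n+1)-gon?"""
    cuts = sorted(random.random() for _ in range(n))
    pieces = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    return max(pieces) < 0.5   # polygon inequality: every side < sum of the rest

trials = 100_000
for n in (2, 3, 4):
    p_hat = sum(polygon_possible(n) for _ in range(trials)) / trials
    print(n, round(p_hat, 2), "theory:", 1 - (n + 1) / 2 ** n)
```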

Food For Thought

Today's section does not contain a problem; rather, I would like to share a popular research problem, the extension of Buffon's Needle Problem to three dimensions. There have been many attempts to incorporate another dimension into the traditional needle problem: for instance, if the needle were released in a weightless environment, it wouldn't drop down to the plane; it would float. This introduces another dimension into the problem. I have listed some research articles in the bibliography which discuss this problem in detail. You can go through them if this problem excites you enough to dig deep into it.

Till then , stay safe!

Ciao!

References:

1. " The Buffon Needle Problem Extended " - JAMES MCCARRY & FIROOZ KHOSRAVIYANI

2. "The Buffon–Laplace needle problem in three dimensions" - Zachary E Dell and Scott V Franklin

3. "Fifty Challenging Problems in Probability with Solutions" - Mosteller

4. "Geometric Probability" - Kendall, Morran

Understanding Statistical Regularity Through Random Walks | Cheenta Probability Series

This is another post in the Cheenta Probability Series. Let's give a formal definition of statistical regularity to bring some seriousness into account.


“The Law of Statistical Regularity formulated in the mathematical theory of probability lays down that a moderately large number of items chosen at random from a very large group are almost sure to have the characteristics of the large group.”  ~ W.I.King

Starting off with statistical regularity

So, let's give a formal definition of this to bring some seriousness into account. If a sequence of independent experiments is held under the same specified conditions, the proportion of occurrences of a given event stabilizes as the number of experiments becomes larger. This is ideally what is known as statistical regularity.

It is an umbrella term that covers the law of large numbers, all central limit theorems and ergodic theorems.

But keeping in mind that we mainly cover material for undergraduates, we will not get into the aforementioned topics here.

Richard von Mises first mathematically demonstrated the idea of statistical regularity by pointing out that no method of forming a subsequence of a random sequence (an infinite sequence of 0's and 1's) improves the odds for a specific event.

For instance, a sequence of fair coin tosses produces equal and independent 50/50 chances for heads and tails. A simple system of betting on heads every 3rd, 7th, or 21st toss, etc., does not change the odds of winning in the long run.

This is famously known as the "Impossibility of a gambling system".
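A simulation sketch of the impossibility of a gambling system (the particular betting rule, waiting for two tails in a row, is my own illustrative choice): the proportion of heads among the selected tosses is the same as among all tosses.

```python
import random

random.seed(4)
N = 500_000
tosses = [random.random() < 0.5 for _ in range(N)]  # True = heads

# Strategy 1: bet on heads at every toss.
every = sum(tosses) / N

# Strategy 2, a "system": bet on heads only after two tails in a row.
picked = [tosses[i] for i in range(2, N)
          if not tosses[i - 1] and not tosses[i - 2]]
system = sum(picked) / len(picked)

print(round(every, 3), round(system, 3))  # both ≈ 0.5: the system gains nothing
```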

The Random Walk

This is in itself a topic in advanced probability theory and stochastic processes, but I will try to keep it simple here. Let's consider a game in which a player starts at the point \( x=0 \) and at each move is required to take a step either forward (toward \(+x\)) or backward (toward \(−x\)). The choice is to be made randomly, say by tossing a coin. How shall we describe the resulting motion? In general, this problem is closely related to the coin-tossing problem.

First let us look at a few examples of a random walk. We may characterize the walker’s progress by the net distance \(D_N \) traveled in \(N\) steps. We illustrate the graph of the random walker in three instances below:

Figure: the progress made in a random walk

What can we say about such a motion? We might first ask: “How far does he get on the average?” We must expect that his average progress will be zero, since he is equally likely to go either forward or backward. But we have the feeling that as \( N \) increases, he is more likely to have strayed farther from the starting point. We might therefore ask for his average distance traveled in absolute value, that is, the average of \(|D|\). It is, however, more convenient to deal with another measure of “progress,” the square of the distance: \( D^2 \) is positive for either positive or negative motion, and is therefore a reasonable measure of such random wandering.

Now, we have not yet defined the expected value of a quantity or a variable, which we surely will in upcoming blog posts. For now, by “expected value” we mean the probable value (our best guess), which we can think of as the expected average behavior over many repeated sequences.

We represent such an expected value by \( ⟨D_N ^2⟩ \), and may refer to it also as the “mean square distance.” After one step, \( D^2 \)  is always \( +1 \), so we have certainly \( ⟨D_1 ^2⟩=1 \).

Now, we have an obvious recursion between \(D_N\) and \(D_{N-1} \). More specifically, \(D_N = D_{N-1} +1 \) or \(D_N=D_{N-1}-1 \).

Thus, squaring, either \(D_N ^2 = D_{N-1} ^2 + 2 D_{N-1}+1 \) or \(D_N^2 = D_{N-1} ^2 - 2 D_{N-1}+1 \).

In a number of independent sequences, we expect to obtain each value one-half of the time, so our average expectation is just the average of the two possible values; the \( \pm 2D_{N-1} \) terms cancel, leaving \(D_{N−1} ^2+1 \) as the expected value of \(D_N ^2 \). In general, we should replace \(D_{N−1} ^2 \) by its “expected value” \( ⟨D_{N−1} ^2 ⟩ \) (by definition!). So \( ⟨D_N ^2⟩=⟨D_{N−1} ^2 ⟩+1 \).

We have already seen that \(⟨D_1 ^2⟩=1\); it follows then that \( ⟨D_N ^2 ⟩=N \).

Damn, that was easy. Pretty simple right?
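Before moving on, we can sanity-check this result with a quick R simulation (a sketch; the parameters are arbitrary): averaging \(D_N^2\) over many independent walks should give roughly \(N\).

```r
# Estimate <D_N^2> by averaging over many independent random walks.
set.seed(1)
N     <- 100        # number of steps per walk
walks <- 10^4       # number of independent walks

# Each walk: sum of N steps, each +1 or -1 with probability 1/2.
D_N <- replicate(walks, sum(sample(c(-1, 1), N, replace = TRUE)))

mean(D_N^2)   # close to N = 100
```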

Now let's draw an analogy of this game with a simple coin tossing experiment (which many authors use as the prototype for demonstrating regularity!). Yeah, we were thinking slightly in an unorthodox manner ;).

To appropriately represent drifting away from the origin in a random walk, we can use the root mean square distance:

\( D_R = \sqrt{ ⟨D ^2⟩ } =\sqrt{N} \).

If we imagine the direction of each step to be in correspondence with the appearance of heads or tails in a coin toss, then \( D = N_H−N_T \) , the difference in the number of heads and tails. Since \( N_H+N_T=N \), the total number of steps (and tosses), we have \( D=2N_H−N \).

Now, it's time to merge our intuition into reality. If the coin is honest or fair, what do you expect?

In \( N \) tosses, you should get about \(\frac{N}{2} \) heads, right?

So, let's observe the difference \( N_H - \frac{N}{2} =\frac{D}{2} \).

The RMS deviation is given by \( (N_H- \frac{N}{2})_{\text{RMS}}=\frac{\sqrt{N}}{2} \).

Thus, an actual \( N_H \) deviates from \( \frac{N}{2} \) by about \( \frac{\sqrt{N}}{2} \); the fraction \( \frac{N_H}{N} \) therefore deviates from \( \frac{1}{2} \) by about \( \frac{1}{N} \cdot \frac{\sqrt{N}}{2} = \frac{1}{2\sqrt{N}} \).

So, the larger \(N\) is, the closer we expect the fraction \( \frac{N_H}{N} \) to be to \( \frac{1}{2} \).
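To check these back-of-the-envelope numbers, here is a small R sketch (names are my own) estimating the RMS deviation of the head count over many runs of \(N\) fair tosses:

```r
# RMS deviation of the head count from N/2, estimated by simulation.
set.seed(7)
N    <- 400
runs <- 10^4

N_H <- rbinom(runs, N, 0.5)        # heads in N fair tosses, repeated
rms <- sqrt(mean((N_H - N/2)^2))   # should be close to sqrt(N)/2 = 10

c(rms, sqrt(N)/2)
```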

Sounds familiar, right? We have circled back to statistical regularity again!

The fraction of tosses that gave heads in a particular sequence of tosses

Unfortunately, for any given run or combination of runs there is no guarantee that the observed deviation will be even near the expected deviation. There is always a finite chance that a large fluctuation (a long string of heads or tails) will give an arbitrarily large deviation. All we can say is that if the deviation is near the expected \( \frac{1}{2 \sqrt{N}} \), we have no reason to suspect the honesty of the coin. If it is much larger, we may suspect, but cannot prove, that the coin is loaded (or that the tosser is clever!).

If you still suspect the fairness of the coin, you should probably learn the Physics of Coin Tossing which Uttaran Chatterjee would address in the next blog.

Another Simple Game of Chance

This is mainly for our reader friends who like to visualize through coding.

Let's consider a simple game of chance using a spinner, but one that is somewhat kind to the gambler. In our game, the payoff in each of several repeated plays is determined by spinning the spinner. We pay an entry fee for each play of the game and then receive the payoff indicated by the spinner. Let the payoff on the spinner be distributed uniformly around the circle; i.e., if the angle after the spin is \( \theta \), we receive \( \frac{\theta}{2 \pi} \) rupees. Thus, our payoff on one play is \(U\) rupees, where \(U\) is a random number taking values in \([0,1]\). Clearly, this is gambling for the poor :P.

Let us simulate the game to see what cumulative payoffs the gambler might receive, not counting the entry fees obviously, if he plays the game repeatedly.

Construct the partial sums \( S_k = U_1+U_2+\cdots+U_k, \; 1 \le k \le n \).

The successive partial sums form a random walk, with \(U_n\) being the \(n^{th} \) step and \(S_n\) being the position after \(n\) steps.

R Codes and plots

walk <- function(j)
{
  # Simulate 10^j plays of the spinner game.
  uniforms <- runif(10^j)

  # Partial sums S_1, ..., S_n, with S_0 = 0 prepended.
  sums <- c(0, cumsum(uniforms))

  # Number of trials: 0, 1, ..., 10^j.
  index <- seq_along(sums) - 1

  plot(index, sums, main = "Random walk of the partial sums",
       xlab = "Number of trials", ylab = "winnings", col = j)
}
par(mfrow=c(2,2))

walk(1)

walk(2)

walk(3)

walk(4)

We now present the four plots for the four values of \(n\): \(10, 100, 1000, 10000 \).

Possible realizations of the first \(10^j \) steps

For small \(n\), that is for \(n=10\), we see irregularly spaced points increasing to the right, but as \(n\) increases, the spacing between the points becomes blurred and regularity emerges: the plots approach a straight line with slope equal to \( \frac{1}{2} \), the mean of a single step \(U_k\). If we look from a macroscopic point of view, ignoring the units on the axes, we see that the plots become independent of \(n\) as \(n\) increases. This is what regularity signifies.

Finally, here comes Number Theory!!

Trust me, I just can't get enough of the Prime Number Theorem. Here is a short problem to think about for future researchers in this field.

Suppose a random walker starts at the point \( S_0 = 2 \), and walks according to the following rules:

1. If the walker is on the \( n^{th} \) prime number \( p_n \), he moves to either
\( p_n + 1 \) or \( p_{n+1} \) with equal probability.

2. If the walker is on a composite number \(x \), he moves to one of the prime factors of \(x \), each with probability \( \frac{1}{\omega(x) }\), where \(\omega(n) \) denotes the number of distinct prime factors of \(n\).

The random walk is given by the sequence of moves \(S_n\).

What can you say about the quantity \( \mathbb{P}(\sup_{n \ge 0} S_n = \infty) \) ?

Give it a try.
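If you want to experiment before proving anything, here is a rough R sketch of this walk. The helper functions are my own, use plain trial division, and are only meant for small numbers; a simulation of course settles nothing about \( \mathbb{P}(\sup_{n \ge 0} S_n = \infty) \).

```r
# Hypothetical simulation of the prime/composite random walk described above.
is_prime <- function(x) {
  if (x < 2) return(FALSE)
  if (x < 4) return(TRUE)              # 2 and 3 are prime
  all(x %% 2:floor(sqrt(x)) != 0)      # trial division
}

next_prime <- function(p) {            # smallest prime greater than p
  q <- p + 1L
  while (!is_prime(q)) q <- q + 1L
  q
}

prime_factors <- function(x) {         # distinct prime factors of x
  Filter(function(d) is_prime(d) && x %% d == 0, 2:x)
}

prime_walk <- function(steps, start = 2L) {
  path <- integer(steps + 1)
  path[1] <- start
  for (i in seq_len(steps)) {
    s <- path[i]
    path[i + 1] <- if (is_prime(s)) {
      # On a prime p_n: go to p_n + 1 or to the next prime, each w.p. 1/2.
      if (runif(1) < 0.5) s + 1L else next_prime(s)
    } else {
      # On a composite: jump to a uniformly chosen distinct prime factor.
      f <- prime_factors(s)
      f[sample.int(length(f), 1)]
    }
  }
  path
}

set.seed(3)
prime_walk(20)   # one possible trajectory starting at 2
```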

Food For Thought

For our gambling game described above, since the expected payoff is \( \frac{1}{2} \) rupees on each play, the game is fair if the fee to play is \( \frac{1}{2} \) rupees.

Make a minor modification of this simulation by repeating the experiment after subtracting the mean \( \frac{1}{2} \) from each step of the random walk.

Now, plot the centered random walk (i.e. the centered partial sums \(S_k - \frac{k}{2} \)) for the same values of \(n\) as before.

Do you observe the same plots?
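One way to set this experiment up, reusing the structure of the `walk` function above (a sketch, not the only way; what you observe is left to you):

```r
# Centered random walk: subtract the mean 1/2 from each uniform step,
# i.e. plot the partial sums S_k - k/2.
centered_walk <- function(j) {
  steps <- runif(10^j) - 0.5               # each step now has mean 0
  sums  <- c(0, cumsum(steps))             # S_k - k/2, with S_0 = 0
  index <- seq_along(sums) - 1
  plot(index, sums, main = "Centered random walk",
       xlab = "Number of trials", ylab = "centered winnings", col = j)
}

par(mfrow = c(2, 2))
for (j in 1:4) centered_walk(j)
```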

References

1. Experiencing Statistical Regularity - Ward Whitt

2. The Problem of the Random Walk - K. Pearson

3. An Introduction to Probability Theory and Its Applications - W. Feller

Stay tuned and keep following this series for getting more interesting perspectives on standard probability results and terms. I bet you too will see probability theory in a different light!

Keep learning, keep thinking!

Cheers..