# Data, Determinant and Simplex

This is a beautiful problem connecting linear algebra, geometry and data. Go ahead and delve into the glorious connection.

## Problem

Given a matrix \( \begin{bmatrix}a & b \\c & d \end{bmatrix} \) with the constraint \( 1 \geq a, b, c, d \geq 0; a + b + c + d = 1\), find the matrix with the largest determinant.

Is there any statistical significance behind this result?

### Prerequisites

- Linear Algebra
- Euclidean Geometry
- Determinant as an Area
- AM - GM Inequality

## Solution ( Geometrical )

#### Step 1

Take two vectors \( v = (a,c) \) and \( w = (b,d) \) such that their sum \( v + w \) lies on the line \( x + y = 1 \). Now, we need to find a pair of vectors \( \{v, w\} \) such that the area of the parallelogram formed by these two vectors is maximum.
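As a small sketch of the setup in this step, the signed area of the parallelogram spanned by \( v \) and \( w \) is exactly the determinant \( ad - bc \). The function name `signed_area` and the sample numbers are illustrative assumptions, not part of the original problem.

```python
def signed_area(v, w):
    """Signed area of the parallelogram spanned by v = (a, c) and w = (b, d).

    This equals the determinant ad - bc of the matrix whose columns are v and w.
    """
    (a, c), (b, d) = v, w
    return a * d - b * c


# Illustrative values only: the four entries are non-negative and sum to 1,
# so v + w = (0.4, 0.6) lies on the line x + y = 1.
v = (0.3, 0.2)  # (a, c)
w = (0.1, 0.4)  # (b, d)

print(signed_area(v, w))  # positive orientation
print(signed_area(w, v))  # swapping the vectors flips the sign
```

The geometric area is the absolute value of this number, so the sign only records the order in which the two vectors are taken.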

#### Step 2

Rotate the parallelogram such that CF lies on the X - axis.

Now, observe that this new parallelogram has the same area as the initial one. Can you give a new parallelogram with a larger area?

#### Step 3

Just extend the vertices to the boundary of the simplex OAB. Observe that the new parallelogram has a larger area than the initial parallelogram. Is there anything larger?

#### Step 4

Now, extend it to a rectangle. Voila! It has a larger area. Therefore, given any non-rectangular parallelogram, we can find a rectangle with a larger area. So, let's search among rectangles. What do you guess is the answer?

#### Step 5

A Square!

Let the rectangle have side lengths \(x\) and \(y\), so that its area is \(xy\). Now, observe that \(xy\) is maximized subject to \(x+y = 1\) when \(x = y = \frac{1}{2}\). [Use AM - GM Inequality].
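For readers who want the AM - GM step spelled out, here is a short worked version, assuming only \( x, y \geq 0 \) and \( x + y = 1 \):

\[ xy \leq \left( \frac{x+y}{2} \right)^2 = \frac{1}{4}, \qquad \text{since} \qquad \left( \frac{x+y}{2} \right)^2 - xy = \frac{(x-y)^2}{4} \geq 0. \]

Equality holds exactly when \( x = y \), which together with \( x + y = 1 \) gives \( x = y = \frac{1}{2} \).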

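So, \(v = (\frac{1}{2}, 0) \) and \( w = (0, \frac{1}{2}) \) maximize the determinant, which then equals \( \frac{1}{4} \). (Swapping the two vectors gives the same square, but the determinant becomes \( -\frac{1}{4} \).)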
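A quick brute-force check can make the answer more convincing. The sketch below is only a numerical sanity check on a grid, not a proof; the function name `best_on_grid` and the grid size `n = 20` are arbitrary choices made for illustration.

```python
from itertools import product


def best_on_grid(n=20):
    """Search all (a, b, c, d) = (i, j, k, l) / n with i + j + k + l = n
    and return the largest determinant ad - bc together with its maximizer."""
    best_det, best = float("-inf"), None
    for i, j, k in product(range(n + 1), repeat=3):
        l = n - i - j - k
        if l < 0:
            continue  # entries must stay non-negative
        det = (i * l - j * k) / n**2
        if det > best_det:
            best_det, best = det, (i / n, j / n, k / n, l / n)
    return best_det, best


print(best_on_grid())  # (0.25, (0.5, 0.0, 0.0, 0.5))
```

On this grid the maximum is attained at \( a = d = \frac{1}{2}, b = c = 0 \), that is, with the columns \( (a, c) = (\frac{1}{2}, 0) \) and \( (b, d) = (0, \frac{1}{2}) \).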

## Challenge 1

Prove it using algebraic methods borrowed from this geometrical thinking. Your solution will be posted here.

## Challenge 2

Can you generalize this result to \( n \times n \) matrices? If yes, prove it. Just algebrify the steps.

## Statistical Significance

Lung Cancer and Smoker Data

Observe that if we divide everything by 1000, we get a matrix whose entries are non-negative and sum to 1.

So, the question is about the association between smoking and lung cancer. Given these 1000 individuals, let's see how the distribution of the counts determines the odds ratio.

For the categorical table data \( \begin{bmatrix}a & b \\c & d \end{bmatrix} \), the odds ratio is defined as \(\frac{ad}{bc} = \frac{\det(\begin{bmatrix}a & b \\c & d \end{bmatrix})}{bc} + 1\).

The log odds ratio is defined as \( \log(ad) - \log(bc)\).
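The two definitions above can be tried out in a few lines. This is a minimal sketch: the counts are hypothetical numbers invented for illustration (they are not the lung cancer data from this post), and the function names are my own.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc of a 2x2 table (requires b, c > 0)."""
    return (a * d) / (b * c)


def log_odds_ratio(a, b, c, d):
    """Log odds ratio log(ad) - log(bc) (requires all entries > 0)."""
    return math.log(a * d) - math.log(b * c)


def determinant(a, b, c, d):
    """Determinant ad - bc of the 2x2 table."""
    return a * d - b * c


# Hypothetical counts for 1000 individuals, NOT the data from this post.
counts = (400, 100, 200, 300)
a, b, c, d = (x / sum(counts) for x in counts)  # proportions summing to 1

print(odds_ratio(a, b, c, d))                 # ad / bc
print(determinant(a, b, c, d) / (b * c) + 1)  # same value via the determinant
print(log_odds_ratio(a, b, c, d))
```

Note that the odds ratio is the same whether it is computed from the counts or from the proportions, while the determinant shrinks by a factor of \( 1000^2 \) after dividing every entry by 1000.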

Looking at the above data, observe that the log odds ratio behaves almost like the determinant. When the distribution of Y is the same for \( X = 1\) and \(X = 0\), that is, when X and Y are independent, the table carries no information about dependence. Hence \( ad = bc \), so the log odds ratio is 0 and so is the determinant.
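To see why both quantities vanish in the independent case, here is a short worked computation. It assumes, as a labeling convention not fixed in the post, that the rows of the table index X and the columns index Y, with \( p = P(X = 1) \) and \( q = P(Y = 1) \), both strictly between 0 and 1. Under independence the entries factor as

\[ a = pq, \quad b = p(1-q), \quad c = (1-p)q, \quad d = (1-p)(1-q), \]

so that

\[ ad = bc = pq(1-p)(1-q), \qquad ad - bc = 0, \qquad \log(ad) - \log(bc) = 0. \]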

Try to understand why the log odds ratio behaves the same way as the determinant.

\( \log(x)\) is strictly increasing in \(x\); hence \( \log(ad) - \log(bc)\) must have the same sign as \(ad - bc\).
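The sign claim can be checked numerically on random tables. The following is a rough sketch rather than a proof; the helper name `signs_agree`, the seed, and the tolerance are assumptions made for this illustration.

```python
import math
import random


def signs_agree(trials=1000, seed=0):
    """Check on random 2x2 tables (entries > 0, summing to 1) that
    log(ad) - log(bc) and ad - bc are positive or negative together."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.uniform(0.01, 1.0) for _ in range(4)]
        total = sum(raw)
        a, b, c, d = (x / total for x in raw)
        det = a * d - b * c
        if abs(det) < 1e-9:
            continue  # skip near-ties, where rounding could blur the sign
        log_or = math.log(a * d) - math.log(b * c)
        if (det > 0) != (log_or > 0):
            return False
    return True


print(signs_agree())  # True
```

The two quantities share a sign, but they are not the same function: the determinant depends on the scale of the entries, while the log odds ratio does not.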

Share your ideas here. I will write in more detail about this phenomenon.

Stay Tuned! Stay Blessed!