The following are notes taken while watching MIT 6.041.

## Lecture 1: Probability Models and Axioms

### Notation

| Notation | Description |
| --- | --- |
| $P(H)$ | The marginal probability of $H$ |
| $P(D)$ | The marginal probability of $D$ |
| $P(H^c)$ | The probability of the complement of $H$ |
| $P(D^c)$ | The probability of the complement of $D$ |
| $P(H \cap D)$ | The joint probability of $H$ and $D$ |
| $P(H \cup D)$ | The probability of the union of $H$ and $D$ |
| $P(H \mid D)$ | The conditional probability of $H$ given $D$ |

### Axioms

1. $P(A) \ge 0$
2. $P(\Omega) = 1$
3. If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$
4. If $A_1, A_2, A_3, \dots$ are a sequence of disjoint events, then $P(A_1 \cup A_2 \cup A_3 \cup \dots) = P(A_1) + P(A_2) + P(A_3) + \dots$
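The axioms can be checked concretely on a small finite model. The sketch below (a fair six-sided die with a uniform law; the model and names are illustrative, not from the lecture) verifies non-negativity, normalization, and additivity for disjoint events:

```python
from fractions import Fraction

# Sample space: a fair six-sided die, with the uniform probability law.
omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """P(A) = |A| / |Omega| for the uniform law on a finite space."""
    return Fraction(len(event & omega), len(omega))

even = {2, 4, 6}
low = {1, 3}  # disjoint from `even`

assert prob(even) >= 0                            # Axiom 1: non-negativity
assert prob(omega) == 1                           # Axiom 2: normalization
assert prob(even | low) == prob(even) + prob(low) # Axiom 3: additivity
```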

## Lecture 2: Conditioning and Bayes’ Rule

• The final axiom in Lecture 1 only applies to countable sequences.

Take the unit square, $\square$, the set of all points $(x, y)$ with $x, y \in [0, 1]$. The probability of any single point is zero. By the final axiom, it at first appears that the probability of the unit square should be the sum of the probabilities of all its points, which would again be zero, contradicting $P(\Omega) = 1$. This apparent contradiction is resolved by recognizing that the points of the unit square form an uncountable set, so the final axiom (countable additivity) does not apply.

• “Zero probability” is not synonymous with “impossible.”

• Conditional Probability: The probability that $H$ will occur given that $D$ has occurred is the ratio of the joint probability of $H$ and $D$ to the total probability of $D$. (Conditional probability is undefined if $P(D) = 0$.)

$$P(H | D) = \frac{P(H \cap D)}{P(D)}$$
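As a concrete check of this definition, the sketch below enumerates the outcomes of two fair dice (an illustrative model, not from the lecture) and computes a conditional probability by counting:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two fair dice

def prob(event):
    """Uniform law: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

D = {(a, b) for a, b in omega if a + b >= 9}  # conditioning event: sum is at least 9
H = {(a, b) for a, b in omega if a == 6}      # event: first die shows 6

p_h_given_d = prob(H & D) / prob(D)
# Conditioning on D shrinks the sample space: P(H) = 1/6, but P(H|D) = 2/5.
assert p_h_given_d == Fraction(2, 5)
```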
• Multiplication Rule: As a consequence of the definition of conditional probability,
$$P(H \cap D) = P(H) \cdot P(D|H) = P(D) \cdot P(H|D)$$
• Total Probability Theorem:
$$P(H) = P(D) \cdot P(H|D) + P(D^c) \cdot P(H|D^c)$$
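A minimal worked instance of the theorem, with hypothetical numbers chosen purely for illustration: a fair coin is picked with probability 3/4, otherwise a coin that lands heads 9/10 of the time, and we want the overall probability of heads.

```python
from fractions import Fraction

# Hypothetical two-coin setup (illustrative numbers, not from the lecture):
# D = "the fair coin was chosen", H = "the toss lands heads".
p_d = Fraction(3, 4)            # P(D)
p_h_given_d = Fraction(1, 2)    # P(H | D): fair coin
p_h_given_dc = Fraction(9, 10)  # P(H | D^c): biased coin

# Total probability: weight each conditional probability by its branch.
p_h = p_d * p_h_given_d + (1 - p_d) * p_h_given_dc
assert p_h == Fraction(3, 5)
```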
• Bayes’ Theorem:
$$\overbrace{P(H|D)}^{\mathrm{Posterior}} = \frac{\overbrace{P(D|H)}^{\mathrm{Likelihood}}\overbrace{P(H)}^{\mathrm{Prior}}}{P(D)}$$
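The classic illustration is a screening test; the numbers below are illustrative assumptions, not from the lecture. With $H$ = "has the condition" and $D$ = "tests positive", a rare prior overwhelms an accurate test:

```python
from fractions import Fraction

p_h = Fraction(1, 1000)          # prior P(H): the condition is rare
p_d_given_h = Fraction(99, 100)  # likelihood P(D | H): true-positive rate
p_d_given_hc = Fraction(5, 100)  # P(D | H^c): false-positive rate

# Evidence P(D) via the total probability theorem:
p_d = p_d_given_h * p_h + p_d_given_hc * (1 - p_h)

posterior = p_d_given_h * p_h / p_d
# Despite a 99% true-positive rate, the posterior is under 2%,
# because false positives from the large healthy population dominate.
assert posterior == Fraction(11, 566)
```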
• Union of Probabilities:
$$P(H \cup D) = P(H) + P(D) - P(H \cap D)$$
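Inclusion–exclusion can be checked by enumeration on two fair dice (an illustrative model, not from the lecture):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two fair dice

def prob(event):
    """Uniform law: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

H = {(a, b) for a, b in omega if a == 6}      # first die shows 6
D = {(a, b) for a, b in omega if a + b == 7}  # the dice sum to 7

# Subtracting P(H ∩ D) corrects for double-counting the overlap {(6, 1)}.
assert prob(H | D) == prob(H) + prob(D) - prob(H & D) == Fraction(11, 36)
```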

## Lecture 3: Independence

• Definition 1: The occurrence of $B$ provides no information about the occurrence of $A$. This definition has the advantage of being intuitive, but is undefined when $P(B) = 0$.
$$P(A|B) = P(A)$$
• Definition 2: While perhaps less intuitive, there is no longer any restriction on $P(B)$.
$$P(A \cap B) = P(A) \cdot P(B)$$
• Note the distinction between events being disjoint and being independent. Being disjoint in some sense implies maximal dependence, in that knowing one event occurred tells you the other did not (its conditional probability becomes zero).
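Both points can be checked on two fair dice (an illustrative model, not from the lecture): events about different dice are independent, while two disjoint events about the same die are strongly dependent.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two fair dice

def prob(event):
    """Uniform law: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

# Independent: events about different dice.
A = {(a, b) for a, b in omega if a % 2 == 0}  # first die is even
B = {(a, b) for a, b in omega if b % 2 == 0}  # second die is even
assert prob(A & B) == prob(A) * prob(B)       # Definition 2 holds

# Disjoint but dependent: events about the same die.
C = {(a, b) for a, b in omega if a == 6}      # first die is 6
E = {(a, b) for a, b in omega if a == 1}      # first die is 1
assert prob(C & E) == 0                       # disjoint
assert prob(C & E) != prob(C) * prob(E)       # hence not independent
```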