The following are notes taken while watching MIT 6.041.

Lecture 1: Probability Models and Axioms

Notation

Notation          Description
$P(A)$            The marginal probability of $A$
$P(B)$            The marginal probability of $B$
$P(A^c)$          The probability of the complement of $A$
$P(B^c)$          The probability of the complement of $B$
$P(A \cap B)$     The joint probability of $A$ and $B$
$P(A \cup B)$     The union probability of $A$ and/or $B$
$P(A | B)$        The conditional probability of $A$ given $B$

Axioms

  1. If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
  2. If $A_1, A_2, \ldots$ are a sequence of disjoint events, then $P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$.
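
As a quick check of Axiom 1, the sketch below verifies additivity for two disjoint events on a discrete sample space; the fair six-sided die and the particular events are illustrative assumptions, not from the lecture.

```python
from fractions import Fraction

# Uniform probability law on a fair six-sided die (assumed example).
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event), len(omega))

A, B = {1, 2}, {5, 6}                    # disjoint: A ∩ B = ∅
assert A & B == set()
assert P(A.union(B)) == P(A) + P(B)      # Axiom 1: additivity
```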

Lecture 2: Conditioning and Bayes’ Rule

  • The final axiom in Lecture 1 only applies to countable sequences.

Take the unit square, $[0, 1] \times [0, 1]$, which is the union of all points $(x, y)$ with $0 \le x, y \le 1$. The probability of any one point is zero. By the final axiom, it at first appears that the probability of the unit square is the sum of the probabilities of all points, which would again be zero. This apparent contradiction is resolved by recognizing that the points of the unit square form an uncountable set rather than a countable sequence, and therefore the final axiom does not apply.

  • “Zero probability” is not synonymous with “impossible.”

  • Conditional Probability: The probability that $H$ will occur given that $D$ has occurred is the ratio between the joint probability of $H$ and $D$ and the total probability of $D$. (Conditional probability is undefined if $P(D) = 0$.)

$$ P(H | D) = \frac{P(H \cap D)}{P(D)} $$
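A small worked instance of the definition, assuming a fair-die sample space with $D$ = "the roll is even" and $H$ = "the roll exceeds 3":

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}               # fair die (assumed example)

def P(event):
    return Fraction(len(event), len(omega))

D = {2, 4, 6}                            # the roll is even
H = {4, 5, 6}                            # the roll exceeds 3

print(P(H & D) / P(D))                   # P(H | D) = (2/6) / (3/6) = 2/3
```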
  • Multiplication Rule: As a consequence of the definition of conditional probability,
$$ P(H \cap D) = P(H) \cdot P(D|H) = P(D) \cdot P(H|D) $$
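For instance, the multiplication rule chains conditional probabilities when sampling without replacement; the 52-card deck below is an assumed example:

```python
from fractions import Fraction

# P(first two cards are aces) = P(A1) * P(A2 | A1)
p_first_ace = Fraction(4, 52)            # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)

print(p_first_ace * p_second_ace_given_first)   # 1/221
```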
  • Total Probability Theorem:
$$ P(H) = P(D) \cdot P(H|D) + P(D^c) \cdot P(H|D^c) $$
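A sketch of the theorem on an assumed experiment: pick one of two coins at random ($D$ = "picked the biased coin") and flip it ($H$ = "the flip lands heads").

```python
from fractions import Fraction

P_D = Fraction(1, 2)                     # prior: either coin equally likely
P_H_given_D = Fraction(3, 4)             # biased coin
P_H_given_Dc = Fraction(1, 2)            # fair coin

P_H = P_D * P_H_given_D + (1 - P_D) * P_H_given_Dc
print(P_H)                               # 5/8
```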
  • Bayes’ Theorem
$$ \overbrace{P(H|D)}^{\mathrm{Posterior}} = \frac{\overbrace{P(D|H)}^{\mathrm{Likelihood}}\overbrace{P(H)}^{\mathrm{Prior}}}{P(D)} $$
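The classic application is updating a prior with a test result; the screening-test numbers below are illustrative assumptions:

```python
from fractions import Fraction

# H = "patient has the condition", D = "test is positive" (assumed numbers).
prior = Fraction(1, 100)                 # P(H)
sensitivity = Fraction(95, 100)          # P(D | H)
false_positive = Fraction(5, 100)        # P(D | H^c)

# The denominator P(D) comes from the Total Probability Theorem above.
P_D = prior * sensitivity + (1 - prior) * false_positive
print(sensitivity * prior / P_D)         # posterior P(H | D) = 19/118 ≈ 0.16
```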
  • Union of Probabilities
$$ P(H \cup D) = P(H) + P(D) - P(H \cap D) $$
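A quick check of this inclusion–exclusion identity on the same assumed die sample space:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event), len(omega))

H, D = {2, 4, 6}, {4, 5, 6}              # events overlap on {4, 6}
assert P(H.union(D)) == P(H) + P(D) - P(H & D)   # 4/6 on both sides
```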

Lecture 3: Independence

  • Definition 1: The occurrence of $B$ provides no information about the occurrence of $A$. This definition has the advantage of being intuitive, but does not apply when $P(B) = 0$.
$$ P(A|B) = P(A) $$
  • Definition 2: While perhaps less intuitive, this definition places no restriction on $P(B)$.
$$ P(A \cap B) = P(A) \cdot P(B) $$
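Both definitions can be checked on an assumed sample space of two fair coin flips, where $A$ and $B$ are the events that the first and second flip land heads:

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))     # {('H','H'), ('H','T'), ...}

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first flip is heads
B = {w for w in omega if w[1] == "H"}    # second flip is heads

assert P(A & B) == P(A) * P(B)           # Definition 2
assert P(A & B) / P(B) == P(A)           # Definition 1 (valid since P(B) > 0)
```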
  • Note the distinction between events being disjoint and being independent. Being disjoint in some sense implies maximal dependence, in that knowing one event occurred gives perfect knowledge of the probability of the other, as the sketch below shows.
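
A sketch of that distinction, assuming two disjoint die events with positive probability:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event), len(omega))

A, B = {1, 2}, {3, 4}                    # disjoint, both with probability 1/3
assert P(A & B) == 0                     # joint probability is zero ...
assert P(A & B) != P(A) * P(B)           # ... so A and B are not independent
# Knowing B occurred forces P(A | B) = 0, which differs from P(A) = 1/3.
```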