MIT 6.041: Probabilistic Systems Analysis and Applied Probability
The following are notes taken while watching MIT 6.041.
Lecture 1: Probability Models and Axioms
Notation
| Notation | Description |
| --- | --- |
| $P(A)$ | The marginal probability of $A$ |
| $P(B)$ | The marginal probability of $B$ |
| $P(A^c)$ | The complement to the probability of $A$ |
| $P(B^c)$ | The complement to the probability of $B$ |
| $P(A \cap B)$ | The joint probability of $A$ and $B$ |
| $P(A \cup B)$ | The union probability of $A$ and/or $B$ |
| $P(A \mid B)$ | The conditional probability of $A$ given $B$ |
Axioms
 If $A \cap B = \varnothing$, then $P(A \cup B) = P(A) + P(B)$.
 If $A_1, A_2, \ldots$ are a sequence of disjoint events, then $P(A_1 \cup A_2 \cup \cdots) = \sum_i P(A_i)$.
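The axioms above can be checked concretely on a small discrete example. The following is a minimal sketch using a fair six-sided die as a hypothetical sample space (the die, the events, and the `prob` helper are illustrative, not from the lecture):

```python
# Verify finite additivity and normalization on a fair six-sided die.
from fractions import Fraction

# Uniform probability law on the sample space {1, ..., 6}.
P = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def prob(event):
    """Probability of an event (a set of outcomes) under the law P."""
    return sum(P[outcome] for outcome in event)

A = {1, 2}   # "roll is 1 or 2"
B = {5, 6}   # "roll is 5 or 6" -- disjoint from A

# Additivity: P(A U B) = P(A) + P(B) when A and B are disjoint.
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)

# Normalization: the entire sample space has probability 1.
assert prob(set(P)) == 1
```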
Lecture 2: Conditioning and Bayes’ Rule
 The final axiom in Lecture 1 only applies to countable sequences.
Take the unit square, $[0, 1] \times [0, 1]$, which is the union of all points $(x, y)$ with $x, y \in [0, 1]$. The probability of any one point is zero. By the final axiom, it at first appears that the probability of the unit square is the sum of the probabilities of all its points, which would again be zero. This apparent contradiction is resolved by recognizing that the real numbers are uncountable, so the points of the square cannot be arranged in a sequence and the final axiom does not apply.

“Zero probability” is not synonymous with “impossible.”

Conditional Probability: The probability that $H$ will occur given that $D$ has occurred is the ratio between the joint probability of $H$ and $D$ and the total probability of $D$. (Conditional probability is undefined if $P(D) = 0$.)
$$
P(H \mid D) = \frac{P(H \cap D)}{P(D)}
$$
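A quick numeric check of the definition, with made-up values for $P(D)$ and $P(H \cap D)$ (the numbers are illustrative, not from the lecture):

```python
# Conditional probability as a ratio, using exact rational arithmetic.
from fractions import Fraction

P_D = Fraction(2, 5)          # assumed P(D); must be nonzero
P_H_and_D = Fraction(1, 10)   # assumed P(H ∩ D)

# P(H | D) = P(H ∩ D) / P(D)
P_H_given_D = P_H_and_D / P_D
print(P_H_given_D)  # 1/4
```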
 Multiplication Rule: As a consequence of the definition of conditional probability,
$$
P(H \cap D) = P(H) \cdot P(D \mid H) = P(D) \cdot P(H \mid D)
$$
 Total Probability Theorem:
$$
P(H) = P(D) \cdot P(H \mid D) + P(D^c) \cdot P(H \mid D^c)
$$
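The theorem weights each conditional probability by the probability of the branch it sits on. A minimal sketch with assumed numbers (none are from the lecture):

```python
# Total probability: partition the sample space into D and D^c.
from fractions import Fraction

P_D = Fraction(3, 10)           # assumed prior P(D)
P_H_given_D = Fraction(9, 10)   # assumed P(H | D)
P_H_given_Dc = Fraction(2, 10)  # assumed P(H | D^c)

# P(H) = P(D) P(H|D) + P(D^c) P(H|D^c)
P_H = P_D * P_H_given_D + (1 - P_D) * P_H_given_Dc
print(P_H)  # 41/100
```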
 Bayes’ Theorem
$$
\overbrace{P(H \mid D)}^{\mathrm{Posterior}} = \frac{\overbrace{P(D \mid H)}^{\mathrm{Likelihood}} \cdot \overbrace{P(H)}^{\mathrm{Prior}}}{P(D)}
$$
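The classic application is inference from a noisy test: $H$ is a hypothesis (e.g. having a disease), $D$ is observed data (a positive test). The sketch below uses assumed prevalence and error rates (all numbers hypothetical), computing $P(D)$ with the total probability theorem:

```python
# Bayes' theorem on a hypothetical diagnostic test.
from fractions import Fraction

P_H = Fraction(1, 100)            # prior P(H): assumed prevalence
P_D_given_H = Fraction(95, 100)   # likelihood P(D | H): sensitivity
P_D_given_Hc = Fraction(5, 100)   # assumed false-positive rate P(D | H^c)

# P(D) via the total probability theorem.
P_D = P_H * P_D_given_H + (1 - P_H) * P_D_given_Hc

# Posterior via Bayes' theorem.
P_H_given_D = P_D_given_H * P_H / P_D
print(P_H_given_D)  # 19/118, about 0.16
```

Despite the accurate test, the posterior is small because the prior is small; this is the point the posterior/likelihood/prior decomposition makes vivid.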
 Union of Probabilities
$$
P(H \cup D) = P(H) + P(D) - P(H \cap D)
$$
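The subtracted term removes the double count of outcomes lying in both events. This can be verified by brute-force enumeration over two fair dice (an illustrative setup, not from the lecture):

```python
# Check inclusion-exclusion by enumerating all 36 outcomes of two dice.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

H = lambda w: w[0] + w[1] == 7        # "sum is 7"
D = lambda w: w[0] == 1               # "first die shows 1"
H_or_D = lambda w: H(w) or D(w)
H_and_D = lambda w: H(w) and D(w)     # only (1, 6)

assert prob(H_or_D) == prob(H) + prob(D) - prob(H_and_D)
```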
Lecture 3: Independence
 Definition 1: The occurrence of $B$ provides no information about the occurrence of $A$. This definition has the advantage of being intuitive, but does not apply when $P(B) = 0$.
$$
P(A \mid B) = P(A)
$$
 Definition 2: While perhaps less intuitive, this definition no longer places any restriction on $P(B)$.
$$
P(A \cap B) = P(A) \cdot P(B)
$$
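Definition 2 can be verified on two fair dice, where events concerning different dice are independent (the dice setup is illustrative, not from the lecture):

```python
# Check independence (Definition 2) by enumeration over two fair dice.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 6           # "first die is 6"
B = lambda w: w[1] == 6           # "second die is 6"
both = lambda w: A(w) and B(w)

# Independent: P(A ∩ B) = P(A) * P(B).
assert prob(both) == prob(A) * prob(B)   # 1/36 on each side

# Disjoint but NOT independent: A and C never co-occur,
# yet P(A) * P(C) is nonzero.
C = lambda w: w[0] == 1           # "first die is 1"
assert prob(lambda w: A(w) and C(w)) == 0
assert prob(A) * prob(C) != 0
```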
 Note the distinction between events being disjoint and being independent. Disjointness in some sense implies maximal dependence: knowing that one event occurred gives perfect knowledge of the probability of the other, since it can no longer occur. Indeed, two disjoint events with nonzero probabilities are never independent, because $P(A \cap B) = 0$ while $P(A) \cdot P(B) > 0$.