6.2 Discrete and Continuous Probabilities
This section discusses the two main types of probability distributions, discrete and continuous, and how each describes the likelihood of events depending on whether the target space is countable or continuous.
6.2.1 Discrete Probabilities
Definition 6.7 For a discrete random variable \(X\), the probability that \(X\) takes a particular value \(x\) is given by the probability mass function (pmf): \[ p_X(x) = P(X = x) \]
Example 6.6 Suppose a small factory inspects 4 items at random from a large production line. Each item is independently defective with probability \(0.2\).
Let
\[
X = \text{number of defective items among the 4 inspected}.
\]
The possible values of \(X\) are:
\[
\{0,1,2,3,4\}.
\]
Each inspection is a Bernoulli trial, so \(X\) follows a binomial distribution: \[ X \sim \text{Binomial}(n=4, p=0.2). \]
The PMF of a binomial random variable is: \[ p_X(x) = P(X = x) = \binom{4}{x}(0.2)^x(0.8)^{4-x}, \quad x = 0,1,2,3,4. \] Evaluating: \[ \begin{aligned} P(X=0) &= (0.8)^4 = 0.4096 \\ P(X=1) &= 4(0.2)(0.8)^3 = 0.4096 \\ P(X=2) &= 6(0.2)^2(0.8)^2 = 0.1536 \\ P(X=3) &= 4(0.2)^3(0.8) = 0.0256 \\ P(X=4) &= (0.2)^4 = 0.0016 \end{aligned} \] Therefore, we have the following PMF as a table:
| \(x\) | \(P(X = x)\) |
|---|---|
| 0 | 0.4096 |
| 1 | 0.4096 |
| 2 | 0.1536 |
| 3 | 0.0256 |
| 4 | 0.0016 |
Notice that the PMF is not uniform: some outcomes are much more likely than others. Most of the probability mass is concentrated at \(x=0\) and \(x=1\). Exact probabilities for events can be computed by summing the PMF over the relevant values. For example: \[ P(X \le 1) = P(X=0) + P(X=1) = 0.8192. \]
We note that this PMF satisfies all of the requirements:
- \(p_X(x) \ge 0\) for all \(x\)
- \(\sum_{x=0}^4 p_X(x) = 1\)
- Each value of \(X\) has a clearly defined probability.
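The calculation in Example 6.6 can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the function name `binomial_pmf` and the variable names are illustrative choices, not part of the original text.

```python
from math import comb

# Parameters taken from Example 6.6: 4 inspected items, defect probability 0.2.
n, p = 4, 0.2


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Build the PMF table for x = 0, 1, ..., n.
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in pmf.items():
    print(f"P(X = {x}) = {prob:.4f}")

# Check the PMF requirements: non-negativity and normalization.
all_nonnegative = all(prob >= 0 for prob in pmf.values())
total = sum(pmf.values())

# Probability of an event, obtained by summing the PMF over its outcomes.
p_at_most_one = pmf[0] + pmf[1]

print(f"all non-negative: {all_nonnegative}")
print(f"sum of PMF       = {total:.4f}")
print(f"P(X <= 1)        = {p_at_most_one:.4f}")
```

Storing the PMF as a dictionary keyed by outcome mirrors the table above, so any event probability is a sum over the relevant keys.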
Definition 6.8 For two random variables \(X\) and \(Y\):
- The joint probability is \(P(X = x_i, Y = y_j)\), denoted \(p(x, y)\).
- The marginal probability of \(X\) is obtained by summing over all possible \(y\): \[ P(X = x_i) = \sum_j P(X = x_i, Y = y_j) \]
- The conditional probability of \(Y\) given \(X\) is: \[ P(Y = y_j \mid X = x_i) = \frac{P(X = x_i, Y = y_j)}{P(X = x_i)} \]
Example 6.7 A school surveys students about whether they study regularly and whether they pass a math exam. The events are:
- \(S\): student studies regularly
- \(P\): student passes the exam
The results are summarized in the following joint probability table:
| | Pass (\(P\)) | Fail (\(P^c\)) | Total |
|---|---|---|---|
| Study (\(S\)) | 0.42 | 0.08 | 0.50 |
| No Study (\(S^c\)) | 0.18 | 0.32 | 0.50 |
| Total | 0.60 | 0.40 | 1.00 |
Joint probabilities describe the probability that two events occur together. For example, \[ P(S,P) = P(S \cap P) = 0.42, \quad P(S^c, P^c) = P(S^c \cap P^c) = 0.32. \]
Marginal probabilities are obtained by summing over rows or columns of the joint table. \[ \begin{aligned} P(S) &= 0.42 + 0.08 = 0.50 \\ P(S^c) &= 0.18 + 0.32 = 0.50 \\ P(P) &= 0.42 + 0.18 = 0.60 \\ P(P^c) &= 0.08 + 0.32 = 0.40 \end{aligned} \]
Conditional probability measures the likelihood of one event given that another event has occurred. For example, the probability of passing given the student studies: \[ P(P \mid S) = \frac{P(S \cap P)}{P(S)} = \frac{0.42}{0.50} = 0.84. \] Another example might be the probability of passing given the student does not study: \[ P(P \mid S^c) = \frac{0.18}{0.50} = 0.36. \]
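The marginal and conditional probabilities in Example 6.7 can be reproduced from the joint table alone. The sketch below is one possible way to organize the computation; the string labels (`"S"`, `"Sc"`, `"P"`, `"Pc"`) and the helper names are assumptions made for illustration.

```python
# Joint probabilities from the table in Example 6.7.
# Keys are (study status, exam result); "Sc" and "Pc" denote the complements.
joint = {
    ("S", "P"): 0.42,
    ("S", "Pc"): 0.08,
    ("Sc", "P"): 0.18,
    ("Sc", "Pc"): 0.32,
}


def marginal_study(s: str) -> float:
    """Sum the joint probabilities over all exam results (a row total)."""
    return sum(prob for (si, _), prob in joint.items() if si == s)


def marginal_result(r: str) -> float:
    """Sum the joint probabilities over all study statuses (a column total)."""
    return sum(prob for (_, ri), prob in joint.items() if ri == r)


def result_given_study(r: str, s: str) -> float:
    """Conditional probability P(result = r | study = s) = joint / marginal."""
    return joint[(s, r)] / marginal_study(s)


print(f"P(S)      = {marginal_study('S'):.2f}")
print(f"P(P)      = {marginal_result('P'):.2f}")
print(f"P(P | S)  = {result_given_study('P', 'S'):.2f}")
print(f"P(P | Sc) = {result_given_study('P', 'Sc'):.2f}")
```

Each conditional probability is a joint entry divided by the total of its row, which is why the two conditionals given \(S\) sum to one.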
The probabilities of all possible states must sum to one: \[ \sum_i P(X = x_i) = 1 \] Discrete distributions are commonly used to model categorical variables, such as labels or class features.
6.2.2 Continuous Probabilities
Definition 6.9 A continuous random variable takes values from an interval on the real line \(\mathbb{R}\).
The probability that \(X\) lies in an interval \([a, b]\) is: \[ P(a \le X \le b) = \int_a^b f(x) \, dx \]
Definition 6.10 The function \(f(x)\) is the probability density function (pdf), which satisfies:
- \(f(x) \ge 0\) for all \(x\)
- \(\int_{-\infty}^{\infty} f(x) \, dx = 1\)
The cumulative distribution function (cdf) is defined as: \[ F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt \]
Example 6.8 Let \(X\) be a continuous random variable representing the amount of time (in hours) a student spends studying for an exam. Assume \(X\) has the following probability density function (PDF): \[ f_X(x) = \begin{cases} \frac{1}{4}, & 0 \le x \le 4, \\ 0, & \text{otherwise}. \end{cases} \] This is a uniform distribution on the interval \([0,4]\). The height of the density is constant: \(f_X(x) = \frac{1}{4}\). Probabilities are found by computing areas, not by evaluating the PDF at a point. For example, the probability that a student studies between 1 and 3 hours is: \[ P(1 \le X \le 3) = \int_1^3 \frac{1}{4} \, dx = \frac{1}{4}(3 - 1) = \frac{1}{2}. \]
The cumulative distribution function (CDF) is defined by: \[ F_X(x) = P(X \le x). \] Compute \(F_X(x)\) by integrating the PDF: \[ F_X(x) = \begin{cases} 0, & x < 0, \\ \displaystyle \int_0^x \frac{1}{4} \, dt = \frac{x}{4}, & 0 \le x \le 4, \\ 1, & x > 4. \end{cases} \] \(F_X(x)\) gives the probability that the study time is at most \(x\) hours. For example: \[ P(X \le 2) = F_X(2) = \frac{2}{4} = 0.5. \]
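The uniform density of Example 6.8 is simple enough to check by hand, but a numerical sketch shows how the PDF, the CDF, and interval probabilities relate. The helper `integrate` below is a basic midpoint Riemann sum written for illustration, not a library routine, and the step count is an arbitrary choice.

```python
def pdf(x: float) -> float:
    """Density of the uniform distribution on [0, 4] from Example 6.8."""
    return 0.25 if 0.0 <= x <= 4.0 else 0.0


def cdf(x: float) -> float:
    """Closed-form CDF: 0 below the interval, x/4 inside, 1 above."""
    if x < 0.0:
        return 0.0
    if x > 4.0:
        return 1.0
    return x / 4.0


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# P(1 <= X <= 3) as an area under the density ...
area = integrate(pdf, 1.0, 3.0)
# ... and as a difference of CDF values.
via_cdf = cdf(3.0) - cdf(1.0)

print(f"area under pdf on [1, 3] = {area:.4f}")
print(f"F(3) - F(1)              = {via_cdf:.4f}")
print(f"F(2) = P(X <= 2)         = {cdf(2.0):.4f}")
```

Both routes give the same interval probability, reflecting the identity \(P(a \le X \le b) = F_X(b) - F_X(a)\) for a continuous random variable.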
Note that \(P(X = x) = 0\) for continuous random variables.
Example 6.9 Let \(X\) be a continuous random variable (CRV) with probability density function (PDF) \(f_X(x)\). By definition, probabilities for a CRV are computed using integrals: \[ P(a \le X \le b) = \int_a^b f_X(x)\,dx. \] Consider the probability that \(X\) takes exactly one value \(x_0\): \[ P(X = x_0). \] This probability corresponds to the integral over an interval of zero width: \[ P(X = x_0) = \int_{x_0}^{x_0} f_X(x)\,dx. \] Since the limits of integration are the same, \[ \int_{x_0}^{x_0} f_X(x)\,dx = 0. \] Therefore, \[ P(X = x_0) = 0 \] for any real number \(x_0\).
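The argument in Example 6.9 can also be seen as a limit: the probability of a shrinking interval around \(x_0\) tends to zero. The sketch below illustrates this for one assumed case, the uniform distribution on \([0,4]\) from Example 6.8, with an arbitrarily chosen point \(x_0 = 2\) and interval widths picked for illustration.

```python
def cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 4] (assumed example)."""
    return min(max(x / 4.0, 0.0), 1.0)


x0 = 2.0  # arbitrary point inside the support
widths = [1.0, 0.1, 0.01, 0.001]

# Probability of the interval [x0 - h/2, x0 + h/2] for each width h.
probs = [cdf(x0 + h / 2) - cdf(x0 - h / 2) for h in widths]

for h, prob in zip(widths, probs):
    print(f"width {h:<6} -> P = {prob:.6f}")

# The zero-width interval [x0, x0] has probability exactly zero.
point_prob = cdf(x0) - cdf(x0)
print(f"width 0      -> P = {point_prob:.6f}")
```

For this density the interval probability is \(h/4\), so it shrinks in proportion to the width \(h\) and vanishes when \(h = 0\).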
6.2.3 Contrasting Discrete and Continuous Distributions
| Property | Discrete | Continuous |
|---|---|---|
| Representation | Probability Mass Function \(p(x)\) | Probability Density Function \(f(x)\) |
| Domain | Finite or countable set | Interval in \(\mathbb{R}\) |
| Probability of a single value | \(P(X = x) = p(x)\), which can be positive | \(P(X = x) = 0\) |
| Normalization | \(\sum_x p(x) = 1\) | \(\int_{-\infty}^{\infty} f(x) \, dx = 1\) |
| Example | Categorical variable, coin toss | Gaussian, uniform distribution on an interval |
Example 6.10 Discrete case:
A variable \(Z\) with three equally likely outcomes \(\{-1.1, 0.3, 1.5\}\):
\[
P(Z = z_i) = \frac{1}{3}
\]
Example 6.11 Continuous case:
A variable \(X\) uniformly distributed over \([0.9, 1.6]\) has:
\[
\int_{0.9}^{1.6} p(x) \, dx = 1
\]
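Here the density is constant on the interval, with \(p(x) = \frac{1}{1.6 - 0.9} = \frac{1}{0.7} \approx 1.43\). The height of \(p(x)\) can exceed 1 as long as the total area equals 1.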
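A short numerical check of Example 6.11 makes the distinction between density and probability concrete. This is a minimal sketch; the subinterval \([1.0, 1.2]\) is an arbitrary choice for illustration.

```python
# Uniform distribution on [a, b] from Example 6.11.
a, b = 0.9, 1.6

# A uniform density is constant: height = 1 / (interval length).
height = 1.0 / (b - a)

# Total area under the density (a rectangle) must equal 1.
total_area = height * (b - a)

# Probability of a subinterval is its area, which never exceeds 1
# even though the density value itself is greater than 1.
lo, hi = 1.0, 1.2  # arbitrary subinterval chosen for illustration
sub_prob = height * (hi - lo)

print(f"density height      = {height:.4f}")
print(f"total area          = {total_area:.4f}")
print(f"P({lo} <= X <= {hi}) = {sub_prob:.4f}")
```

The density value is not itself a probability; only areas under the density are.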
Exercises
Exercise 6.3 Consider the table below:
| | \(x_1\) | \(x_2\) | \(x_3\) | \(x_4\) |
|---|---|---|---|---|
| \(y_1\) | 10 | 40 | 65 | 35 |
| \(y_2\) | 15 | 55 | 25 | 60 |
| \(y_3\) | 20 | 30 | 50 | 45 |
- \(p(x_i, y_j) = p(x, y)\). What is this called and what is \(p(x_3, y_2)\)?
- \(p(X = x) = p(x)\). What is this called and what is the formula?
- Find \(p(x_4)\). Write it out using the formula.
- Find \(p(y_2)\). Write it out using the formula.
- \(p(X = x_i \mid Y = y_j) = p(x \mid y)\). What is this called and what is the formula?
- Find \(p(x_2 \mid y_3)\) and \(p(y_1 \mid x_4)\).
- \(p(x) = \sum_{y \in Y} p(x, y)\). What is this called? Use the formula to find \(p(x_2)\).
Exercise 6.4 Let \(X\) be a random variable with PDF given by \[f_X(x)= \begin{cases} cx^2 & |x| \leq 1\\0 & \text{otherwise} \end{cases}.\]
- Find the constant \(c\).
- Find \(P(X \geq 1/2)\)
Exercise 6.5 Let \(X\) be a continuous random variable with PDF \[f_X(x) = \dfrac{1}{2}e^{-|x|}, \;\;\;\; x \in \mathbb{R}.\] If \(Y = X^2\), find the CDF of \(Y\).
Exercise 6.6 Let \(X\) be a continuous random variable with PDF \[f_X(x) = \begin{cases} 4x^3 & 0 < x \leq 1\\0 & \text{otherwise} \end{cases}.\] Find \(P(X\leq 2/3 \mid X > 1/3)\).
Exercise 6.7 Let \(f(x) = k(3x^2 + 1)\).
- Find the value of \(k\) that makes the given function a PDF on the interval \(0 \leq x \leq 2\).
- Let \(X\) be a continuous random variable whose PDF is \(f(x)\). Compute the probability that \(X\) is between 1 and 2.
- Find the distribution function of \(X\).
- Find the probability that \(X\) is exactly equal to 1.
Exercise 6.8 Let \[f(t) = \begin{cases}t & 0 < t \leq 1\\ 2-t & 1 < t \leq 2 \\ 0 & \text{otherwise} \end{cases}.\] In the parts below, \(X\) denotes a continuous random variable with this density.
- Prove that \(f\) is a PDF.
- Find \(P(X \leq 1.5)\).
- Find \(P(X > 1.2)\).
- Find \(P(1.2 < X \leq 1.5)\).
- Find \(P(X = 1)\).
Exercise 6.9 Show that the density of the standard normal distribution is a valid PDF. The standard normal density is given by \[f(z) = \dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}.\]