6.2 Discrete and Continuous Probabilities

This section discusses the two main types of probability distributions, discrete and continuous, and how each describes the likelihood of events depending on whether the target space is countable or continuous.


6.2.1 Discrete Probabilities

Definition 6.7 For a discrete random variable \(X\), the probability that \(X\) takes a particular value \(x\) is given by the probability mass function (pmf): \[ p_X(x) = P(X = x) \]

Example 6.6 Suppose a small factory inspects 4 items at random from a large production line. Each item is independently defective with probability \(0.2\).

Let
\[ X = \text{number of defective items among the 4 inspected}. \] The possible values of \(X\) are: \[ \{0,1,2,3,4\}. \]

Each inspection is a Bernoulli trial, so \(X\) follows a binomial distribution: \[ X \sim \text{Binomial}(n=4, p=0.2). \]

The PMF of a binomial random variable is: \[ p_X(x) = P(X = x) = \binom{4}{x}(0.2)^x(0.8)^{4-x}, \quad x = 0,1,2,3,4. \] Evaluating: \[ \begin{aligned} P(X=0) &= (0.8)^4 = 0.4096 \\ P(X=1) &= 4(0.2)(0.8)^3 = 0.4096 \\ P(X=2) &= 6(0.2)^2(0.8)^2 = 0.1536 \\ P(X=3) &= 4(0.2)^3(0.8) = 0.0256 \\ P(X=4) &= (0.2)^4 = 0.0016 \end{aligned} \] Therefore, we have the following PMF as a table:

\(x\) \(P(X = x)\)
0 0.4096
1 0.4096
2 0.1536
3 0.0256
4 0.0016

Notice that the PMF is not uniform: some outcomes are much more likely than others. Most of the probability mass is concentrated at \(x=0\) and \(x=1\). The exact probability of an event is computed by summing the PMF over the outcomes in that event. For example: \[ P(X \le 1) = P(X=0) + P(X=1) = 0.4096 + 0.4096 = 0.8192. \]

This PMF satisfies all the requirements of a valid PMF:

  • \(p_X(x) \ge 0\) for all \(x\)
  • \(\sum_{x=0}^4 p_X(x) = 1\)
  • Each value of \(X\) has a clearly defined probability.
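The calculations in Example 6.6 can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper name `binomial_pmf` and the variable names are illustrative choices, not notation from the text.

```python
from math import comb, isclose


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 4, 0.2  # values taken from Example 6.6

# Tabulate the PMF over the possible values 0, 1, ..., n
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in pmf.items():
    print(f"P(X = {x}) = {prob:.4f}")

# PMF requirements: every value is non-negative and the values sum to 1
assert all(prob >= 0 for prob in pmf.values())
assert isclose(sum(pmf.values()), 1.0)

# Probability of an event: sum the PMF over the outcomes in the event
p_at_most_one = pmf[0] + pmf[1]
print(f"P(X <= 1) = {p_at_most_one:.4f}")
```

Rounded to four decimal places, the printed values agree with the table in Example 6.6, and the two assertions confirm the non-negativity and normalization requirements up to floating-point tolerance.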

Definition 6.8 For two random variables \(X\) and \(Y\):

  • The joint probability is \(P(X = x_i, Y = y_j)\), denoted \(p(x_i, y_j)\).
  • The marginal probability of \(X\) is obtained by summing over all possible values of \(Y\): \[ P(X = x_i) = \sum_j P(X = x_i, Y = y_j) \]
  • The conditional probability of \(Y\) given \(X\), defined whenever \(P(X = x_i) > 0\), is: \[ P(Y = y_j \mid X = x_i) = \frac{P(X = x_i, Y = y_j)}{P(X = x_i)} \]

Example 6.7 A school surveys students about whether they study regularly and whether they pass a math exam. The events are:

  • \(S\): student studies regularly
  • \(P\): student passes the exam

The results are summarized in the following joint probability table:

| | Pass (\(P\)) | Fail (\(P^c\)) | Total |
|---|---|---|---|
| Study (\(S\)) | 0.42 | 0.08 | 0.50 |
| No Study (\(S^c\)) | 0.18 | 0.32 | 0.50 |
| Total | 0.60 | 0.40 | 1.00 |

Joint probabilities describe the probability that two events occur together. For example, \[ P(S,P) = P(S \cap P) = 0.42, \quad P(S^c, P^c) = P(S^c \cap P^c) = 0.32. \]

Marginal probabilities are obtained by summing over rows or columns of the joint table. \[ \begin{aligned} P(S) &= 0.42 + 0.08 = 0.50 \\ P(S^c) &= 0.18 + 0.32 = 0.50 \\ P(P) &= 0.42 + 0.18 = 0.60 \\ P(P^c) &= 0.08 + 0.32 = 0.40 \end{aligned} \]

Conditional probability measures the likelihood of one event given that another event has occurred. For example, the probability of passing given the student studies: \[ P(P \mid S) = \frac{P(S \cap P)}{P(S)} = \frac{0.42}{0.50} = 0.84. \] Another example might be the probability of passing given the student does not study: \[ P(P \mid S^c) = \frac{0.18}{0.50} = 0.36. \]
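The marginal and conditional calculations in Example 6.7 follow mechanically from the joint table. Here is a minimal Python sketch of that bookkeeping; the dictionary layout, the string labels (`"S"`, `"Sc"`, `"P"`, `"Pc"`), and the function names are assumptions made for illustration.

```python
from math import isclose

# Joint probabilities from Example 6.7, keyed by (study status, exam result)
joint = {
    ("S", "P"): 0.42,
    ("S", "Pc"): 0.08,
    ("Sc", "P"): 0.18,
    ("Sc", "Pc"): 0.32,
}


def marginal_study(s: str) -> float:
    """Marginal of the study status: sum the joint over exam results."""
    return sum(prob for (si, _), prob in joint.items() if si == s)


def marginal_exam(e: str) -> float:
    """Marginal of the exam result: sum the joint over study statuses."""
    return sum(prob for (_, ei), prob in joint.items() if ei == e)


def exam_given_study(e: str, s: str) -> float:
    """Conditional P(exam = e | study = s) = joint / marginal."""
    return joint[(s, e)] / marginal_study(s)


# The joint table must itself sum to 1
assert isclose(sum(joint.values()), 1.0)

print("P(S)      =", round(marginal_study("S"), 2))
print("P(P)      =", round(marginal_exam("P"), 2))
print("P(P | S)  =", round(exam_given_study("P", "S"), 2))
print("P(P | Sc) =", round(exam_given_study("P", "Sc"), 2))
```

Storing the joint distribution as a mapping from outcome pairs to probabilities keeps the marginal a plain sum over one coordinate, mirroring the formula in Definition 6.8.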

For any discrete random variable, the probabilities of all possible states must sum to one: \[ \sum_i P(X = x_i) = 1 \] Discrete distributions are commonly used to model categorical variables, such as class labels or categorical features.


6.2.2 Continuous Probabilities

Definition 6.9 A continuous random variable takes values from an interval on the real line \(\mathbb{R}\).

The probability that \(X\) lies in an interval \([a, b]\) is: \[ P(a \le X \le b) = \int_a^b f(x) \, dx \]

Definition 6.10 The function \(f(x)\) is the probability density function (pdf), which satisfies:

  1. \(f(x) \ge 0\) for all \(x\)
  2. \(\int_{-\infty}^{\infty} f(x) \, dx = 1\)

The cumulative distribution function (cdf) is defined as: \[ F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt \]

Example 6.8 Let \(X\) be a continuous random variable representing the amount of time (in hours) a student spends studying for an exam. Assume \(X\) has the following probability density function (PDF): \[ f_X(x) = \begin{cases} \frac{1}{4}, & 0 \le x \le 4, \\ 0, & \text{otherwise}. \end{cases} \] This is a uniform distribution on the interval \([0,4]\). The height of the density is constant: \(f_X(x) = \frac{1}{4}\). Probabilities are found by computing areas, not by evaluating the PDF at a point. For example, the probability that a student studies between 1 and 3 hours is: \[ P(1 \le X \le 3) = \int_1^3 \frac{1}{4} \, dx = \frac{1}{4}(3 - 1) = \frac{1}{2}. \]

The cumulative distribution function (CDF) is defined by: \[ F_X(x) = P(X \le x). \] Compute \(F_X(x)\) by integrating the PDF: \[ F_X(x) = \begin{cases} 0, & x < 0, \\ \displaystyle \int_0^x \frac{1}{4} \, dt = \frac{x}{4}, & 0 \le x \le 4, \\ 1, & x > 4. \end{cases} \] \(F_X(x)\) gives the probability that the study time is at most \(x\) hours. For example: \[ P(X \le 2) = F_X(2) = \frac{2}{4} = 0.5. \]
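The link between the PDF, areas, and the CDF in Example 6.8 can be checked numerically. The sketch below is one possible way to do so in plain Python; the midpoint-rule helper `integrate` and its step count are assumptions chosen for illustration, not a method prescribed by the text.

```python
def pdf(x: float) -> float:
    """Uniform density on [0, 4] from Example 6.8."""
    return 0.25 if 0.0 <= x <= 4.0 else 0.0


def cdf(x: float) -> float:
    """Closed-form CDF obtained by integrating the density."""
    if x < 0.0:
        return 0.0
    if x > 4.0:
        return 1.0
    return x / 4.0


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# P(1 <= X <= 3) as an area under the density ...
area = integrate(pdf, 1.0, 3.0)
# ... and as a difference of CDF values
via_cdf = cdf(3.0) - cdf(1.0)

print(f"area under pdf on [1, 3]: {area:.4f}")
print(f"F(3) - F(1):              {via_cdf:.4f}")
print(f"P(X <= 2) = F(2):         {cdf(2.0):.4f}")
```

Both routes give the same probability for the interval \([1, 3]\), which reflects the identity \(P(a \le X \le b) = F_X(b) - F_X(a)\) for a continuous random variable.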

Note that \(P(X = x) = 0\) for continuous random variables.

Example 6.9 Let \(X\) be a continuous random variable (CRV) with probability density function (PDF) \(f_X(x)\). By definition, probabilities for a CRV are computed using integrals: \[ P(a \le X \le b) = \int_a^b f_X(x)\,dx. \] Consider the probability that \(X\) takes exactly one value \(x_0\): \[ P(X = x_0). \] This probability corresponds to the integral over an interval of zero width: \[ P(X = x_0) = \int_{x_0}^{x_0} f_X(x)\,dx. \] Since the limits of integration are the same, \[ \int_{x_0}^{x_0} f_X(x)\,dx = 0. \] Therefore, \[ P(X = x_0) = 0 \] for any real number \(x_0\).
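One way to build intuition for this result is to watch the probability of a shrinking interval around \(x_0\). The Python sketch below assumes, purely for illustration, the uniform distribution on \([0, 4]\) from Example 6.8 and the point \(x_0 = 2\); neither choice is part of Example 6.9 itself.

```python
def cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 4] (assumed for illustration)."""
    return min(max(x / 4.0, 0.0), 1.0)


x0 = 2.0  # hypothetical point of interest
half_widths = (1.0, 0.1, 0.01, 0.001)

# P(x0 - eps <= X <= x0 + eps) for smaller and smaller eps
probs = [cdf(x0 + eps) - cdf(x0 - eps) for eps in half_widths]

for eps, prob in zip(half_widths, probs):
    print(f"eps = {eps:<6} P(|X - x0| <= eps) = {prob:.6f}")
```

For this density the interval probability equals \(\varepsilon / 2\), so it shrinks in proportion to the interval width and tends to zero as \(\varepsilon \to 0\).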


6.2.3 Contrasting Discrete and Continuous Distributions

| Property | Discrete | Continuous |
|---|---|---|
| Representation | Probability Mass Function \(p(x)\) | Probability Density Function \(f(x)\) |
| Domain | Finite or countable set | Interval in \(\mathbb{R}\) |
| Probability of a single value | \(P(X = x) \geq 0\) | \(P(X = x) = 0\) |
| Normalization | \(\sum_x p(x) = 1\) | \(\int f(x) \, dx = 1\) |
| Example | Categorical variable, coin toss | Gaussian, uniform distribution on an interval |

Example 6.10 Discrete case:
A variable \(Z\) with three equally likely outcomes \(\{-1.1, 0.3, 1.5\}\) has: \[ P(Z = z_i) = \frac{1}{3}, \quad i = 1, 2, 3. \]

Example 6.11 Continuous case:
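A variable \(X\) uniformly distributed over \([0.9, 1.6]\) has constant density \(f(x) = \frac{1}{1.6 - 0.9} = \frac{1}{0.7} \approx 1.43\) on that interval, so that: \[ \int_{0.9}^{1.6} f(x) \, dx = 1 \] The height of a density can exceed 1, as it does here, as long as the total area equals 1.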
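The claim about the density's height can be verified with a few lines of Python. This is a minimal sketch; the midpoint-rule integration and the variable names are assumptions made for illustration.

```python
a, b = 0.9, 1.6  # interval from Example 6.11

# A uniform density is constant on [a, b] with height 1 / (b - a)
height = 1.0 / (b - a)


def pdf(x: float) -> float:
    """Uniform density on [a, b]."""
    return height if a <= x <= b else 0.0


# Midpoint-rule approximation of the total area under the density
steps = 1_000
width = (b - a) / steps
area = sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(f"height of the density: {height:.4f}")
print(f"total area:            {area:.4f}")
```

The density value is larger than 1 everywhere on the interval, yet the area is 1: a density value is not itself a probability, only its integral over a set is.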


Exercises

Exercise 6.3 Consider the table below:

| | \(x_1\) | \(x_2\) | \(x_3\) | \(x_4\) |
|---|---|---|---|---|
| \(y_1\) | 10 | 40 | 65 | 35 |
| \(y_2\) | 15 | 55 | 25 | 60 |
| \(y_3\) | 20 | 30 | 50 | 45 |

  1. \(p(x_i, y_j) = p(x, y)\). What is this called and what is \(p(x_3, y_2)\)?
  2. \(p(X = x) = p(x)\). What is this called and what is the formula?
  3. Find \(p(x_4)\). Write it out using the formula.
  4. Find \(p(y_2)\). Write it out using the formula.
  5. \(p(X = x_i \mid Y = y_j) = p(x \mid y)\). What is this called and what is the formula?
  6. Find \(p(x_2 \mid y_3)\) and \(p(y_1 \mid x_4)\).
  7. \(p(x) = \sum_{y \in Y} p(x, y)\). What is this called? Use the formula to find \(p(x_2)\).

Exercise 6.4 Let \(X\) be a random variable with PDF given by \[f_X(x)= \begin{cases} cx^2 & |x| \leq 1\\0 & \text{otherwise} \end{cases}.\]

  1. Find the constant \(c\).
  2. Find \(P(X \geq 1/2)\).

Exercise 6.5 Let \(X\) be a continuous random variable with PDF \[f_X(x) = \dfrac{1}{2}e^{-|x|}, \;\;\;\; x \in \mathbb{R}.\] If \(Y = X^2\), find the CDF of \(Y\).

Exercise 6.6 Let \(X\) be a continuous random variable with PDF \[f_X(x) = \begin{cases} 4x^3 & 0 < x \leq 1\\0 & \text{otherwise} \end{cases}.\] Find \(P(X\leq 2/3 \mid X > 1/3)\).

Exercise 6.7 Let \(f(x) = k(3x^2 + 1)\).

  1. Find the value of \(k\) that makes the given function a PDF on the interval \(0 \leq x \leq 2\).
  2. Let \(X\) be a continuous random variable whose PDF is \(f(x)\). Compute the probability that \(X\) is between 1 and 2.
  3. Find the distribution function of \(X\).
  4. Find the probability that \(X\) is exactly equal to 1.

Exercise 6.8 Let \[f(t) = \begin{cases}t & 0 < t \leq 1\\ 2-t & 1 < t \leq 2 \\ 0 & \text{otherwise} \end{cases}.\]

  1. Prove that \(f\) is a PDF.
  2. Let \(X\) be a continuous random variable with PDF \(f\). Find \(P(X \leq 1.5)\).
  3. Find \(P(X > 1.2)\).
  4. Find \(P(1.2 < X \leq 1.5)\).
  5. Find \(P(X = 1)\).

Exercise 6.9 Show that the density of the standard normal distribution is a valid PDF. The density is given by \[f(z) = \dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}.\]