### Conditional distribution

In probability theory and statistics, given two jointly distributed random variables *X* and *Y*, the **conditional probability distribution** of *Y* given *X* is the probability distribution of *Y* when *X* is known to be a particular value; in some cases the conditional probabilities may be expressed as functions containing the unspecified value *x* of *X* as a parameter. The conditional distribution contrasts with the marginal distribution of a random variable, which is its distribution without reference to the value of the other variable.

If the conditional distribution of *Y* given *X* is a continuous distribution, then its probability density function is known as the **conditional density function**. The properties of a conditional distribution, such as the moments, are often referred to by corresponding names such as the conditional mean and conditional variance.

More generally, one can refer to the conditional distribution of a subset of a set of more than two variables; this conditional distribution is contingent on the values of all the remaining variables, and if more than one variable is included in the subset then this conditional distribution is the conditional joint distribution of the included variables.

## Discrete distributions

For discrete random variables, the conditional probability mass function of *Y* given the occurrence of the value *x* of *X* can be written according to its definition as:

- $p_Y(y \mid X = x) = P(Y = y \mid X = x) = \frac{P(\{X = x\} \cap \{Y = y\})}{P(X = x)}.$

Because $P(X=x)$ appears in the denominator, this is defined only when $P(X=x)$ is non-zero (hence strictly positive).

The relation with the probability distribution of *X* given *Y* is:

- $P(Y = y \mid X = x) \, P(X = x) = P(\{X = x\} \cap \{Y = y\}) = P(X = x \mid Y = y) \, P(Y = y).$
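The definition and the product relation above can be checked on a small table. The following is a minimal sketch, not a reference implementation: the joint PMF `joint` and the helper names are hypothetical, chosen only for illustration, and exact fractions are used so the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint PMF P(X = x, Y = y) for x in {0, 1} and y in {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}


def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint PMF over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(joint, y):
    """P(Y = y), obtained by summing the joint PMF over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def conditional_y_given_x(joint, x):
    """Return {y: P(Y = y | X = x)}; undefined when P(X = x) = 0."""
    px = marginal_x(joint, x)
    if px == 0:
        raise ValueError("P(X = x) is zero, so the conditional PMF is undefined")
    return {yi: p / px for (xi, yi), p in joint.items() if xi == x}


def conditional_x_given_y(joint, y):
    """Return {x: P(X = x | Y = y)}; undefined when P(Y = y) = 0."""
    py = marginal_y(joint, y)
    if py == 0:
        raise ValueError("P(Y = y) is zero, so the conditional PMF is undefined")
    return {xi: p / py for (xi, yi), p in joint.items() if yi == y}


# Both factorizations recover the same joint probability P(X = 0, Y = 1).
lhs = conditional_y_given_x(joint, 0)[1] * marginal_x(joint, 0)
rhs = conditional_x_given_y(joint, 1)[0] * marginal_y(joint, 1)
print(conditional_y_given_x(joint, 0))
print(lhs == joint[(0, 1)] == rhs)
```

Conditioning on a value of *X* that never occurs raises an error here, mirroring the requirement that $P(X=x)$ be strictly positive.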

## Continuous distributions

Similarly for continuous random variables, the conditional probability density function of *Y* given the occurrence of the value *x* of *X* can be written as

- $f_Y(y \mid X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$

where $f_{X,Y}(x, y)$ gives the joint density of *X* and *Y*, while $f_X(x)$ gives the marginal density for *X*. Also in this case it is necessary that $f_X(x) > 0$.

The relation with the probability distribution of *X* given *Y* is given by:

- $f_Y(y \mid X = x) \, f_X(x) = f_{X,Y}(x, y) = f_X(x \mid Y = y) \, f_Y(y).$
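A numerical sketch can make the ratio of densities concrete. The joint density $f_{X,Y}(x, y) = x + y$ on the unit square is an assumption chosen for illustration (it integrates to 1), and the `integrate` helper is a plain midpoint rule, not a library routine.

```python
def joint_density(x, y):
    """Hypothetical joint density f_{X,Y}(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(func, lower, upper, steps=1000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def marginal_x(x):
    """f_X(x): integrate the joint density over y (here x + 1/2 on [0, 1])."""
    return integrate(lambda y: joint_density(x, y), 0.0, 1.0)


def conditional_density(y, x):
    """f_Y(y | X = x) = f_{X,Y}(x, y) / f_X(x), defined only where f_X(x) > 0."""
    fx = marginal_x(x)
    if fx <= 0.0:
        raise ValueError("f_X(x) must be strictly positive")
    return joint_density(x, y) / fx


x0 = 0.25
# For fixed x0 the conditional density should integrate to (approximately) 1 in y.
total = integrate(lambda y: conditional_density(y, x0), 0.0, 1.0, steps=200)
print(marginal_x(x0), conditional_density(0.5, x0), total)
```

For this density the closed form is $f_Y(y \mid X = x) = (x + y)/(x + \tfrac{1}{2})$ for $0 \le y \le 1$, which the numerical values approximate.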

The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.

## Relation to independence

Random variables *X*, *Y* are independent if and only if the conditional distribution of *Y* given *X* is, for all possible realizations of *X*, equal to the unconditional distribution of *Y*. For discrete random variables this means $P(Y = y \mid X = x) = P(Y = y)$ for all relevant *x* and *y*. For continuous random variables *X* and *Y* that have a joint density function, it means $f_Y(y \mid X = x) = f_Y(y)$ for all relevant *x* and *y*.
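For discrete variables the criterion can be tested directly. The sketch below assumes a finite joint PMF stored as a dictionary; the function name `is_independent` and both example tables are hypothetical. Values of *x* with $P(X = x) = 0$ are skipped, since the conditional distribution is undefined there.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint, xs, ys):
    """Check P(Y = y | X = x) == P(Y = y) for every x with P(X = x) > 0."""
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    for x, y in product(xs, ys):
        if px[x] == 0:
            continue  # conditional distribution undefined for this x
        if joint.get((x, y), 0) / px[x] != py[y]:
            return False
    return True


# Independent case: the joint PMF is the product of two marginals.
p_x = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p_y = {0: Fraction(1, 3), 1: Fraction(2, 3)}
independent_joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# Dependent case: Y always equals X.
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(independent_joint, [0, 1], [0, 1]))
print(is_independent(dependent_joint, [0, 1], [0, 1]))
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding; with floats a tolerance would be needed.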

## Properties

Seen as a function of *y* for given *x*, $P(Y = y \mid X = x)$ is a probability mass function, so its sum over all *y* (or its integral, if it is a conditional probability density) is 1. Seen as a function of *x* for given *y*, it is a likelihood function, so the sum over all *x* need not be 1.
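The asymmetry between the two readings can be seen on a small example. This is an illustrative sketch with a hypothetical joint PMF; `cond` is a helper name introduced here.

```python
from fractions import Fraction

# Hypothetical joint PMF P(X = x, Y = y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})


def cond(y, x):
    """P(Y = y | X = x) computed from the joint table."""
    px = sum(joint[(x, yi)] for yi in ys)
    return joint[(x, y)] / px


# Fixed x, varying y: a probability mass function, so each sum is 1.
sum_over_y = {x: sum(cond(y, x) for y in ys) for x in xs}

# Fixed y, varying x: a likelihood function, whose sum is generally not 1.
sum_over_x = {y: sum(cond(y, x) for x in xs) for y in ys}

print(sum_over_y)
print(sum_over_x)
```

Here the sums over *y* are both 1, while the sums over *x* are 3/4, 2/3 and 7/12 for *y* = 0, 1, 2 respectively.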

## Measure-theoretic formulation

Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field in $\mathcal{F}$, and $X : \Omega \to \mathbb{R}$ a real-valued random variable (measurable with respect to the Borel $\sigma$-field $\mathcal{R}^1$ on $\mathbb{R}$). It can be shown that there exists[1] a function $\mu : \mathcal{R}^1 \times \Omega \to \mathbb{R}$ such that $\mu(\cdot, \omega)$ is a probability measure on $\mathcal{R}^1$ for each $\omega \in \Omega$ (i.e., it is **regular**) and $\mu(H, \cdot) = P(X \in H \mid \mathcal{G})$ (almost surely) for every $H \in \mathcal{R}^1$. For any $\omega \in \Omega$, the function $\mu(\cdot, \omega) : \mathcal{R}^1 \to \mathbb{R}$ is called a **conditional probability distribution** of $X$ given $\mathcal{G}$. In this case,

- $E[X \mid \mathcal{G}] = \int_{-\infty}^{\infty} x \, \mu(dx, \cdot)$

almost surely.
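As a hedged illustration of how this definition relates to the discrete case, suppose (an assumption not made in the text above) that $\mathcal{G} = \sigma(Y)$ for a discrete random variable $Y$ taking values $y_1, y_2, \dots$ with $P(Y = y_k) > 0$ for each $k$, and that $X$ is integrable. One version of $\mu$ and of the conditional expectation is then:

```latex
\mu(H, \omega) = \frac{P(X \in H,\ Y = y_k)}{P(Y = y_k)}
\quad \text{for } \omega \in \{Y = y_k\},\ H \in \mathcal{R}^1,
\qquad
E[X \mid \mathcal{G}](\omega) = \int_{-\infty}^{\infty} x \, \mu(dx, \omega)
= \frac{E\left[ X \, \mathbf{1}_{\{Y = y_k\}} \right]}{P(Y = y_k)}.
```

On the null set where $Y$ takes none of the values $y_k$, $\mu(\cdot, \omega)$ may be set to any fixed probability measure without affecting the almost-sure statements.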