# Bayes' theorem

Bayes' theorem (also known as Bayes' rule or Bayes' law) is a result in probability theory, which relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes' theorem tells how to update or revise beliefs in light of new evidence a posteriori.

The probability of an event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes' theorem is the statement of that relationship.

As a formal theorem, Bayes' theorem is valid in all interpretations of probability. However, frequentist and Bayesian interpretations disagree about the kinds of things to which probabilities should be assigned in applications: frequentists assign probabilities to random events according to their frequencies of occurrence or to subsets of populations as proportions of the whole; Bayesians assign probabilities to propositions that are uncertain. A consequence is that Bayesians have more frequent occasion to use Bayes' theorem. The articles on Bayesian probability and frequentist probability discuss these debates at greater length.

## Statement of Bayes' theorem

Bayes' theorem relates the conditional and marginal probabilities of stochastic events A and B: \begin{align} P(A|B) & = \frac{P(B | A)\, P(A)}{P(B)} \\ & \propto L(A | B)\, P(A) \end{align}

where L(A|B) denotes the likelihood of A given fixed B. Here the likelihood coincides with the probability, L(A|B) = P(B|A); in general, however, the likelihood is defined only up to a constant factor, so that it is proportional to, but need not equal, the probability P(B|A).

Each term in Bayes' theorem has a conventional name:

- P(A) is the prior probability of A: the probability assigned to A before B is taken into account.
- P(A|B) is the posterior probability of A given B: the probability assigned to A after B is taken into account.
- P(B|A) is, viewed as a function of A, the likelihood.
- P(B) is the marginal probability of B, which acts as a normalizing constant.

With this terminology, the theorem may be paraphrased as $\mbox{posterior} = \frac{\mbox{likelihood} \times \mbox{prior}} {\mbox{normalizing constant}}$

In words: the posterior probability is proportional to the product of the prior probability and the likelihood.

In addition, the ratio P(B|A)/P(B) is sometimes called the standardised likelihood, so the theorem may also be paraphrased as $\mbox{posterior} = {\mbox{standardised likelihood} \times \mbox{prior} }.\,$
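The "posterior = standardised likelihood × prior" reading can be checked with a few lines of arithmetic; the probabilities below are made-up values chosen for illustration.

```python
# Illustration of posterior = standardised likelihood × prior.
# All probabilities here are hypothetical example values.
p_a = 0.25           # prior P(A)
p_b_given_a = 0.60   # P(B|A)
p_b = 0.50           # P(B), the normalizing constant

standardised_likelihood = p_b_given_a / p_b   # P(B|A) / P(B)
posterior = standardised_likelihood * p_a     # P(A|B)

# Same result as applying Bayes' theorem directly.
assert abs(posterior - p_b_given_a * p_a / p_b) < 1e-12
```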

## Derivation from conditional probabilities

To derive the theorem, we start from the definition of conditional probability. The probability of event A given event B is $P(A|B)=\frac{P(A \cap B)}{P(B)}.$

Likewise, the probability of event B given event A is $P(B|A) = \frac{P(A \cap B)}{P(A)}. \!$

Rearranging and combining these two equations, we find $P(A|B)\, P(B) = P(A \cap B) = P(B|A)\, P(A). \!$

This lemma is sometimes called the product rule for probabilities. Dividing both sides by P(B), provided that it is non-zero, we obtain Bayes' theorem: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}. \!$
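The derivation above can be verified numerically on a toy pair of events; the joint and marginal probabilities below are assumptions chosen for the example.

```python
# Numeric check of the derivation: both conditional probabilities come from
# the same joint probability, and Bayes' theorem links them.
p_a_and_b = 0.12   # P(A ∩ B), a made-up value
p_a = 0.30         # P(A)
p_b = 0.40         # P(B)

p_a_given_b = p_a_and_b / p_b   # definition of conditional probability
p_b_given_a = p_a_and_b / p_a   # likewise, conditioning on A

# Bayes' theorem recovers P(A|B) from P(B|A), P(A) and P(B).
bayes = p_b_given_a * p_a / p_b
assert abs(bayes - p_a_given_b) < 1e-12
```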

## Alternative forms of Bayes' theorem

Bayes' theorem is often embellished by noting that $P(B) = P(A\cap B) + P(A^C\cap B) = P(B|A) P(A) + P(B|A^C) P(A^C)\,$

where $A^C$ is the complementary event of A (often called "not A"). So the theorem can be restated as $P(A|B) = \frac{P(B | A)\, P(A)}{P(B|A) P(A) + P(B|A^C) P(A^C)}. \!$

More generally, where {Ai} forms a partition of the event space, $P(A_i|B) = \frac{P(B | A_i)\, P(A_i)}{\sum_j P(B|A_j)\,P(A_j)} , \!$

for any Ai in the partition.
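The partition form is what powers the classic test-for-a-rare-condition calculation; the sensitivity, false-positive rate, and prevalence below are hypothetical numbers for illustration only.

```python
# Bayes' theorem over the partition {D, not D}: how likely is the condition
# given a positive test?  All rates below are made-up example values.
p_d = 0.01                # P(D): prior probability (prevalence)
p_pos_given_d = 0.99      # P(+|D): sensitivity
p_pos_given_not_d = 0.05  # P(+|not D): false positive rate

# Law of total probability gives the denominator:
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: despite the accurate test, the posterior is modest
# because the condition is rare.
p_d_given_pos = p_pos_given_d * p_d / p_pos
```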

### Bayes' theorem in terms of odds and likelihood ratio

Bayes' theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as $O(A|B)=O(A) \cdot \Lambda (A|B)$

where $O(A|B)=\frac{P(A|B)}{P(A^C|B)} \!$ are the odds of A given B,

and $O(A)=\frac{P(A)}{P(A^C)} \!$ are the odds of A by itself,

while $\Lambda (A|B) = \frac{L(A|B)}{L(A^C|B)} = \frac{P(B|A)}{P(B|A^C)} \!$ is the likelihood ratio.
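The odds form is convenient computationally: updating multiplies the prior odds by the likelihood ratio. A minimal sketch, with made-up probabilities:

```python
# Posterior odds = prior odds × likelihood ratio, with illustrative values.
p_a = 0.2             # P(A)
p_b_given_a = 0.9     # P(B|A)
p_b_given_not_a = 0.3 # P(B|A^C)

prior_odds = p_a / (1 - p_a)                       # O(A)
likelihood_ratio = p_b_given_a / p_b_given_not_a   # Λ(A|B)
posterior_odds = prior_odds * likelihood_ratio     # O(A|B)

# Convert odds back to a probability and compare with Bayes' theorem
# applied directly.
p_a_given_b = posterior_odds / (1 + posterior_odds)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
```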

### Bayes' theorem for probability densities

There is also a version of Bayes' theorem for continuous distributions. It is somewhat harder to derive, since probability densities, strictly speaking, are not probabilities, so Bayes' theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3 for an elementary derivation. Bayes' theorem for probability densities is formally similar to the theorem for probabilities: $f(x|y) = \frac{f(x,y)}{f(y)} = \frac{f(y|x)\,f(x)}{f(y)} \!$

and there is an analogous statement of the law of total probability: $f(x|y) = \frac{f(y|x)\,f(x)}{\int_{-\infty}^{\infty} f(y|x)\,f(x)\,dx}. \!$

As in the discrete case, the terms have standard names. f(x, y) is the joint distribution of X and Y, f(x|y) is the posterior distribution of X given Y=y, f(y|x) = L(x|y) is (as a function of x) the likelihood function of X given Y=y, and f(x) and f(y) are the marginal distributions of X and Y respectively, with f(x) being the prior distribution of X.

Here we have indulged in a conventional abuse of notation, using f for each one of these terms, although each one is really a different function; the functions are distinguished by the names of their arguments.
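In practice the integral in the denominator is often evaluated numerically. The following sketch approximates the continuous form on a grid, assuming (for illustration) a uniform prior on [0, 1] for X and a binomial likelihood for the observed data; both choices and the data are hypothetical.

```python
# Grid approximation of f(x|y) = f(y|x) f(x) / ∫ f(y|x) f(x) dx.
# Assumed model for illustration: X ~ Uniform(0, 1); given X = x,
# Y is the number of successes in n Bernoulli(x) trials.
from math import comb

n, y = 10, 7                              # made-up observed data
dx = 0.001
xs = [i * dx for i in range(1001)]        # grid over the support of X
prior = [1.0 for _ in xs]                 # f(x): uniform density
lik = [comb(n, y) * x**y * (1 - x)**(n - y) for x in xs]  # f(y|x)

# Denominator: Riemann-sum approximation of the marginal f(y).
evidence = sum(l * p for l, p in zip(lik, prior)) * dx

posterior = [l * p / evidence for l, p in zip(lik, prior)]  # f(x|y)
# Sanity check: the posterior density integrates to 1 (up to grid error).
assert abs(sum(posterior) * dx - 1.0) < 1e-6
```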

### Abstract Bayes' theorem

Given two mutually absolutely continuous (equivalent) probability measures $P \sim Q$ on the probability space $(\Omega, \mathcal{F})$ and a sigma-algebra $\mathcal{G} \subset \mathcal{F}$, the abstract Bayes theorem for an $\mathcal{F}$-measurable random variable X becomes $E_P[X|\mathcal{G}] = \frac{E_Q[\frac{dP}{dQ} X |\mathcal{G}]}{E_Q[\frac{dP}{dQ}|\mathcal{G}]}$.

This formulation is used in Kalman filtering to find Zakai equations. It is also used in financial mathematics for change of numeraire techniques.
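On a finite sample space the abstract formula can be checked directly; the measures, random variable, and partition below are all assumptions chosen for the example.

```python
# Finite sanity check of E_P[X|G] = E_Q[(dP/dQ) X | G] / E_Q[dP/dQ | G].
# Ω = {0,1,2,3}; Q is uniform; P is an equivalent measure; G is generated
# by the partition {{0,1}, {2,3}}.  All numbers are illustrative.
q = [0.25, 0.25, 0.25, 0.25]               # measure Q
p = [0.10, 0.20, 0.30, 0.40]               # measure P, equivalent to Q
dPdQ = [pi / qi for pi, qi in zip(p, q)]   # Radon–Nikodym derivative dP/dQ
x = [1.0, 2.0, 3.0, 4.0]                   # an F-measurable random variable

atom = [0, 1]                              # one atom of G

def cond_exp(weights, values, atom):
    """Conditional expectation under `weights`, evaluated on `atom`."""
    w = sum(weights[i] for i in atom)
    return sum(weights[i] * values[i] for i in atom) / w

lhs = cond_exp(p, x, atom)                               # E_P[X | G]
num = cond_exp(q, [d * xi for d, xi in zip(dPdQ, x)], atom)
den = cond_exp(q, dPdQ, atom)
assert abs(lhs - num / den) < 1e-12
```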

### Extensions of Bayes' theorem

Theorems analogous to Bayes' theorem hold in problems with more than two variables. For example: $P(A|B,C) = \frac{P(A) \, P(B|A) \, P(C|A,B)}{P(B) \, P(C|B)}$

This can be derived in several steps from Bayes' theorem and the definition of conditional probability: $P(A|B,C) = \frac{P(A,B,C)}{P(B,C)} = \frac{P(A,B,C)}{P(B) \, P(C|B)} = \frac{P(C|A,B) \, P(A,B)}{P(B) \, P(C|B)} = \frac{P(A) \, P(B|A) \, P(C|A,B)}{P(B) \, P(C|B)} .$
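The three-variable identity can be checked numerically on a small joint distribution; the joint probabilities below are random illustrative weights, not data from any real problem.

```python
# Numeric check of P(A|B,C) = P(A) P(B|A) P(C|A,B) / (P(B) P(C|B))
# on an arbitrary joint distribution over three binary variables.
import itertools
import random

random.seed(0)
weights = {abc: random.random() for abc in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {abc: w / total for abc, w in weights.items()}  # P(A=a, B=b, C=c)

def marginal(keep):
    """Probability of the event fixing the variables in `keep`, e.g. {'a': 1}."""
    idx = {'a': 0, 'b': 1, 'c': 2}
    return sum(pr for abc, pr in joint.items()
               if all(abc[idx[k]] == v for k, v in keep.items()))

a, b, c = 1, 0, 1
lhs = marginal({'a': a, 'b': b, 'c': c}) / marginal({'b': b, 'c': c})
rhs = (marginal({'a': a})
       * (marginal({'a': a, 'b': b}) / marginal({'a': a}))
       * (marginal({'a': a, 'b': b, 'c': c}) / marginal({'a': a, 'b': b}))
       ) / (marginal({'b': b})
            * (marginal({'b': b, 'c': c}) / marginal({'b': b})))
assert abs(lhs - rhs) < 1e-12
```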

A general strategy is to work with a decomposition of the joint probability, and to marginalize (integrate) over the variables that are not of interest. Depending on the form of the decomposition, it may be possible to prove that some integrals must be 1, and thus they fall out of the decomposition; exploiting this property can reduce the computations very substantially. A Bayesian network, for example, specifies a factorization of a joint distribution of several variables in which the conditional probability of any one variable given the remaining ones takes a particularly simple form (see Markov blanket).
