Interactive · ~18 min · Beginner

Bayesian Thinking — Updating beliefs with evidence

Bayesian reasoning is how rational agents update beliefs in the face of new data. From medical diagnosis to machine learning, it underpins many of the methods used in modern AI. This lesson teaches you both the math and the intuition through interactive visualizations.

Start here: 6-step guided lesson

Before the math, let's build intuition with a concrete example: finding your friend at a café. You'll see how beliefs evolve as you gather evidence.

6-Step Guided Lesson: Finding Your Friend


Step 1: Your Prior Belief

You believe there's a 20% chance your friend is at the café right now. This is your prior belief, before any evidence.

P(Friend at café) = 20%

Key insight:

The prior is your belief before seeing any new information. It comes from past experience or general knowledge.

Learning checkpoints:

  • Prior belief
  • Evidence (likelihoods)
  • Bayes theorem
  • Likelihood ratio
  • Prior strength
  • Iterative updates

Quick check

In the guided lesson, when you saw your friend's car, why didn't your belief go straight to 100%?

Bayes Theorem: The Formula

Bayes theorem is deceptively simple: P(H|D) = P(D|H) × P(H) / P(D). Let's break it down and build intuition with an interactive calculator.

The equation

P(H|D) = P(D|H) × P(H) / P(D)

  • P(H|D), the posterior: your updated belief, the probability that the hypothesis is true given the data you observed
  • P(D|H), the likelihood: how probable the data is if your hypothesis is true
  • P(H), the prior: your belief before seeing the data
  • P(D), the evidence: the total probability of seeing this data under all possibilities

Intuition:

You take what you believed before (prior), multiply by how much the data supports it (likelihood), and normalize. The stronger the evidence, the more your belief shifts.

Interactive Bayes Theorem Calculator

Adjust the prior belief and the likelihoods, and watch the posterior probability update in real time. Example setting:

  • Prior P(H) = 30.0% (your initial belief: uncertain)
  • Likelihood P(D|H) = 80.0% (test accuracy when the hypothesis is true)
  • False positive rate P(D|¬H) = 10.0%

Results:

  • Evidence P(D) = 31.00% (total probability of observing the data)
  • Likelihood ratio = 8.00x (evidence strength)
  • Prior → Posterior: 30.0% → 77.4% (belief shift: +47.4 percentage points)
  • Complement P(¬H|D) = 22.6% (belief in the alternative)

What this means:

Evidence leans toward your hypothesis, but uncertainty remains. More data would help.
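The same update can be written in odds form, which makes the role of the likelihood ratio explicit: posterior odds = prior odds × P(D|H) / P(D|¬H). Below is a minimal Python sketch using the calculator's example values. The function names are illustrative, and the second update assumes a new observation with the same likelihood ratio that is conditionally independent of the first given H; that is an assumption made here, not something the calculator states.

```python
def to_odds(p: float) -> float:
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)


def to_prob(odds: float) -> float:
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)


def update_with_lr(prior: float, likelihood_ratio: float) -> float:
    """Posterior odds = prior odds x likelihood ratio."""
    return to_prob(to_odds(prior) * likelihood_ratio)


prior = 0.30
lr = 0.80 / 0.10  # P(D|H) / P(D|not H), the 8x from the calculator

posterior = update_with_lr(prior, lr)
print(f"after one observation:  {posterior:.3f}")   # 0.774

# Assumption: a second, conditionally independent observation with the same ratio.
posterior2 = update_with_lr(posterior, lr)
print(f"after two observations: {posterior2:.3f}")  # 0.965
```

Each extra piece of independent evidence multiplies the odds again, which is why "more data would help": it can move a 77% belief much closer to certainty.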

P(D) is the total probability of observing the data under all possible scenarios:

P(D) = P(D|H) × P(H) + P(D|¬H) × P(¬H)

It acts as a normalizer: dividing by P(D) makes P(H|D) and P(¬H|D) sum to 1. Without it, P(D|H) × P(H) is only the joint probability of the data and the hypothesis, not your updated belief. Think of it as: "What's the probability I'd see this data no matter what?"
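A minimal sketch of the full update for a single yes/no hypothesis, computing P(D) with the formula above and reproducing the calculator's numbers. The function name `bayes_update` is illustrative, not from any library.

```python
def bayes_update(prior: float, p_d_given_h: float, p_d_given_not_h: float) -> tuple[float, float]:
    """Return (evidence P(D), posterior P(H|D)) for a binary hypothesis."""
    # Law of total probability: P(D) = P(D|H)P(H) + P(D|not H)P(not H)
    evidence = p_d_given_h * prior + p_d_given_not_h * (1.0 - prior)
    # Bayes' theorem: P(H|D) = P(D|H)P(H) / P(D)
    posterior = p_d_given_h * prior / evidence
    return evidence, posterior


evidence, posterior = bayes_update(prior=0.30, p_d_given_h=0.80, p_d_given_not_h=0.10)
print(f"P(D)    = {evidence:.4f}")       # 0.3100
print(f"P(H|D)  = {posterior:.4f}")      # 0.7742
print(f"P(~H|D) = {1 - posterior:.4f}")  # 0.2258
```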

This confuses many people: likelihood is not probability.

  • Probability: Given a hypothesis, what's the chance of the data? P(D|H)
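  • Likelihood: Given the data, how well does each hypothesis explain it? Still P(D|H), but read as a function of the hypothesis with the data held fixed

In Bayes theorem, we use the likelihood P(D|H) to evaluate the hypothesis. Holding the data fixed, the hypothesis under which the observed data is more probable has the higher likelihood, and with equal priors it gets the higher posterior. Unlike probabilities, likelihoods across hypotheses need not sum to 1: P(D|H₁) and P(D|H₂) can both equal 0.8.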
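To see why likelihoods are not a probability distribution over hypotheses, fix the data and vary the hypothesis. This sketch assumes a coin-flip model with hypothetical data (7 heads in 10 flips) and evaluates P(D|θ) for a few candidate biases θ.

```python
from math import comb

heads, flips = 7, 10  # hypothetical observed data


def likelihood(theta: float) -> float:
    """P(D | theta): binomial probability of the observed heads given bias theta."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


candidates = [0.3, 0.5, 0.7, 0.9]
values = {theta: likelihood(theta) for theta in candidates}
for theta, lik in values.items():
    print(f"theta={theta:.1f}  P(D|theta)={lik:.4f}")

print("sum over candidates:", round(sum(values.values()), 4))  # not 1
print("most likely candidate:", max(values, key=values.get))    # 0.7
```

The data are fixed throughout; only the hypothesis changes. θ = 0.7 explains 7 heads in 10 flips best, yet the four likelihood values do not add up to 1.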


Quick check

If you increase the likelihood P(D|H) while holding the prior P(H) and P(D|¬H) fixed, the posterior P(H|D) will:

Priors and Posteriors: The Beta Distribution

For binary data (coin flips, positive/negative test results), the Beta distribution is the conjugate prior for the unknown success probability. Adjust the prior shape, add observed data, and watch the posterior shift.

Beta Distribution: Before & After Data

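Adjust the prior shape and add observed data points. The posterior (red curve) shifts toward your observations.

[Figure: prior and posterior Beta densities (x: success probability from 0 to 1, y: density)]

Example setting:

  • Prior belief: mean 50.0% (α = 2.0, β = 2.0)
  • Data observed: 3 successes, 1 failure (75% success rate)
  • Posterior belief: mean 62.5% (α = 5.0, β = 3.0)
  • Belief shift: +12.5 percentage points (the data pulls the estimate up)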

Key insight:

The Beta distribution is the conjugate prior for Bernoulli data. Each success adds 1 to α, each failure adds 1 to β. The wider your initial prior (smaller α+β), the more the data can shift you. A concentrated prior (large α+β) needs more data to budge.

A conjugate prior is special: when you update it with new data, the posterior has the same mathematical form as the prior.

For Bernoulli data (success/failure), the Beta distribution is conjugate. Each observed success adds 1 to α, each failure adds 1 to β. The posterior is still a Beta distribution—you can keep updating forever without changing the form.

This makes Beta(α, β) perfect for modeling unknowns like coin bias, disease prevalence, or email spam probability.
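A minimal sketch of the conjugate update in plain Python, using the widget's example: a Beta(2, 2) prior, then 3 successes and 1 failure. The `Beta` class is a hypothetical helper for illustration, not a library type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    alpha: float
    beta: float

    def mean(self) -> float:
        """Mean of Beta(alpha, beta) = alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "Beta":
        """Conjugate update: each success adds 1 to alpha, each failure adds 1 to beta."""
        return Beta(self.alpha + successes, self.beta + failures)


prior = Beta(2.0, 2.0)
posterior = prior.update(successes=3, failures=1)
print(posterior)                                   # Beta(alpha=5.0, beta=3.0)
print(f"prior mean     = {prior.mean():.3f}")      # 0.500
print(f"posterior mean = {posterior.mean():.3f}")  # 0.625
```

Because the result is again a `Beta`, you can call `update` on it as new data arrives, which is exactly what conjugacy buys you.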

A prior's strength is determined by α + β, which behaves roughly like a count of observations you have already seen. A prior of Beta(10, 10) (α + β = 20) is much stronger than Beta(1, 1) (α + β = 2).

To overcome a strong prior, you need proportionally more data. This is why priors matter: they encode your initial confidence. A domain expert's strong prior (from years of experience) should dominate a small dataset, but weak priors bow to evidence quickly.
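A small sketch comparing a weak and a strong prior on the same data, assuming a hypothetical dataset of 3 successes and 1 failure, and then 100 times as much data at the same rate. The helper name is illustrative.

```python
def beta_posterior_mean(alpha: float, beta: float, successes: int, failures: int) -> float:
    """Posterior mean of a Beta(alpha, beta) prior after Bernoulli data."""
    return (alpha + successes) / (alpha + beta + successes + failures)


successes, failures = 3, 1  # hypothetical small dataset, 75% success rate
for alpha, beta in [(1, 1), (10, 10)]:
    mean = beta_posterior_mean(alpha, beta, successes, failures)
    print(f"Beta({alpha}, {beta}) prior, 4 points   -> posterior mean {mean:.3f}")
# Beta(1, 1) gives 0.667; Beta(10, 10) gives 0.542

# With 100x more data at the same rate, both posteriors move close to 0.75.
for alpha, beta in [(1, 1), (10, 10)]:
    mean = beta_posterior_mean(alpha, beta, 300, 100)
    print(f"Beta({alpha}, {beta}) prior, 400 points -> posterior mean {mean:.3f}")
```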

In machine learning, the same idea appears as regularization: a strong prior on model weights prevents overfitting to noisy data.
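One standard way to make this precise, sketched under the assumption of a linear model with Gaussian noise (variance σ²) and an independent zero-mean Gaussian prior on each weight (variance τ²): the maximum a posteriori (MAP) estimate is ridge regression, a squared-error loss plus an L2 penalty. A smaller τ² (a stronger prior) means a larger penalty λ.

```latex
\hat{w}_{\text{MAP}}
  = \arg\max_{w} \; p(y \mid X, w)\, p(w)
  = \arg\min_{w} \; \frac{1}{2\sigma^2}\lVert y - Xw \rVert^2 + \frac{1}{2\tau^2}\lVert w \rVert^2
  = \arg\min_{w} \; \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
```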


Quick check

If you observe 5 successes and 1 failure starting from Beta(1, 1), the posterior is Beta(6, 2). What's the posterior mean?

MCMC: Sampling from the Posterior

For complex models where you can't compute the posterior analytically, Markov Chain Monte Carlo (MCMC) generates samples from it. Watch a random walk converge to the target distribution.

MCMC Random Walk

Metropolis-Hastings samples from a target distribution. The blue trace shows the random walk; the red histogram shows the resulting distribution.

[Figure: trace plot of the random walk (x: iterations, y: value) and histogram of the posterior samples after burn-in (x: value from 0 to 1, y: count)]

Example run:

  • Acceptance rate: 82.6% (too high: the proposal is too narrow, so increase the proposal width)
  • Posterior mean: 61.5% (std: 17.5%)
  • 95% credible interval: 25.7%–90.8% (width: 65.1 percentage points)
  • Burn-in: 50 of 500 samples discarded, 450 samples used

How MCMC works:

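The sampler walks randomly through parameter space. It always accepts a move to a more probable point and accepts a move to a less probable point only some of the time, so over many steps it visits high-probability regions more often. The burn-in period (shaded red) is discarded because the sampler hasn't converged yet. The acceptance rate is a tuning signal: a very low rate means the proposals are too wide (most jumps land in low-probability regions and are rejected), while a very high rate means they are too narrow (the walk takes tiny steps and explores slowly). A common rule of thumb for random-walk samplers targets roughly 44% in one dimension and about 23% in high dimensions.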

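MH is elegant: make a random proposal, compute its log-probability, and accept or reject it based on an acceptance ratio. With a symmetric proposal (for example, a Gaussian random walk), the acceptance probability is:

α = min(1, exp(log p(proposal) - log p(current)))

General Metropolis-Hastings multiplies in a proposal-density correction, which cancels when the proposal is symmetric.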

If the proposal is more probable, accept it. If less probable, accept it with probability α. This biased random walk spends more time in high-probability regions, effectively sampling from the posterior.

The burn-in period lets the sampler forget its starting point and reach the high-probability region of the distribution. After convergence, the trace plot should look like stationary noise with no trend, and the histogram of the retained samples approximates the posterior shape.
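A minimal random-walk Metropolis sampler in plain Python (standard library only) illustrating the accept/reject rule, burn-in, and acceptance-rate tuning described above. Assumptions: the target is an unnormalized Beta(5, 3) density on (0, 1), chosen only for illustration since the widget's target is not specified; the Gaussian proposal is symmetric, so no proposal correction is needed; the step sizes, burn-in length, and seed are arbitrary.

```python
import math
import random
import statistics


def log_target(theta: float) -> float:
    """Unnormalized log density of Beta(5, 3); -inf outside (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return 4 * math.log(theta) + 2 * math.log(1 - theta)


def metropolis(n_samples: int, step: float, start: float = 0.5, seed: int = 0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(current)), from log densities.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)  # a rejected move repeats the current value
    return samples, accepted / n_samples


# Tiny steps are almost always accepted; huge steps are almost always rejected.
for step in (0.02, 0.3, 3.0):
    _, rate = metropolis(5000, step)
    print(f"step={step:<4}  acceptance rate={rate:.2f}")

samples, rate = metropolis(20000, step=0.3)
kept = samples[2000:]  # discard burn-in
kept_sorted = sorted(kept)
lo = kept_sorted[int(0.025 * len(kept))]
hi = kept_sorted[int(0.975 * len(kept))]
print(f"posterior mean ~ {statistics.mean(kept):.3f} (exact Beta(5, 3) mean: 0.625)")
print(f"95% credible interval ~ [{lo:.3f}, {hi:.3f}]")
```

Libraries such as Stan and PyMC use more sophisticated samplers than this, but the accept/reject core is the same idea.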

Modern deep learning uses optimization (gradient descent) to find point estimates. But Bayesian methods want the full posterior distribution—the range of plausible parameters and their probabilities.

MCMC enables Bayesian inference on complex models. In probabilistic programming (like Stan or PyMC), you describe the model, and MCMC automatically samples the posterior. This gives you uncertainty quantification, which is crucial for scientific discovery and safe decision-making.

Unlike a point estimate, a Bayesian model gives you full distributions, credible intervals, and the ability to propagate uncertainty through downstream tasks.


Quick check

If your acceptance rate is 10% in MCMC, what should you do?

Real-world application: Medical testing

A positive test result doesn't mean you have the disease. Bayesian reasoning reveals why: you must account for the disease's rarity (prior) and the test's false positive rate.

Medical Test Scenario

How does a positive test result change your belief about having a disease? Adjust prevalence, sensitivity, and false positive rate to see how posterior probability shifts.

  • Prevalence P(Disease) = 1.0% (how common the disease is: 1 in 100)
  • Sensitivity P(+|Disease) = 95.0% (how often the test detects the disease when it is present)
  • False positive rate P(+|No Disease) = 5.0% (false alarms: the test is positive but there is no disease)

Scenario setup:

You take a test. It comes back positive.

Before the test, you had a 1.0% chance of having the disease (generic population risk).

The test is 95.0% accurate at detecting the disease when you have it.

But it also gives false positives 5.0% of the time (tests positive when you don't have it).

Question: Given a positive result, what's the probability you actually have the disease?

P(Disease | Positive Test) = 16.1%

This is the actual probability you have the disease given your positive test result: your updated belief after seeing the data.

  • Before test: 1.0% (your prior belief, the generic population risk)
  • After positive test: 16.1% (your posterior belief, updated by the test)
  • Negative predictive value: 99.9% (if the test is negative, the probability you're disease-free)
  • Belief amplification: 16x (how much the positive test increases your belief)

Mixed signal

Even with a positive test, the disease is still more likely absent than present. A second test could clarify.
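The widget's numbers follow directly from Bayes' theorem. A minimal sketch, where `diagnostic_summary` is a hypothetical helper and specificity is taken as 1 minus the false positive rate:

```python
def diagnostic_summary(prevalence: float, sensitivity: float, false_positive_rate: float) -> dict:
    """Posterior quantities for a binary diagnostic test via Bayes' theorem."""
    specificity = 1.0 - false_positive_rate
    # Total probability of a positive result, over diseased and healthy people.
    p_positive = sensitivity * prevalence + false_positive_rate * (1.0 - prevalence)
    p_negative = 1.0 - p_positive
    ppv = sensitivity * prevalence / p_positive          # P(Disease | +)
    npv = specificity * (1.0 - prevalence) / p_negative  # P(No disease | -)
    return {"ppv": ppv, "npv": npv, "amplification": ppv / prevalence}


summary = diagnostic_summary(prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(Disease | +)    = {summary['ppv']:.1%}")             # 16.1%
print(f"P(No disease | -) = {summary['npv']:.1%}")             # 99.9%
print(f"amplification     = {summary['amplification']:.1f}x")  # 16.1x
```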

The base rate fallacy:

Many people ignore the prior probability (base rate). A test that's 95% accurate sounds amazing, but if the disease is rare (1 in 1000), a positive test is still more likely wrong than right!

Bayes' theorem accounts for this. The posterior probability depends on both the test accuracy and how common the disease is. Always consider both when interpreting test results.
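To see the base rate at work, hold the test fixed (95% sensitivity, 5% false positive rate, as in the widget) and vary only the prevalence, including the 1-in-1000 case. A minimal sketch; the function name is illustrative.

```python
def positive_predictive_value(prevalence: float, sensitivity: float = 0.95,
                              false_positive_rate: float = 0.05) -> float:
    """P(Disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                  # P(+ and Disease)
    false_pos = false_positive_rate * (1.0 - prevalence)  # P(+ and No disease)
    return true_pos / (true_pos + false_pos)


for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(prevalence)
    print(f"prevalence {prevalence:>5.1%} -> P(Disease | +) = {ppv:.1%}")
# roughly 1.9%, 16.1%, 67.9%, 95.0%
```

At 1-in-1000 prevalence, fewer than 2% of positive results come from people who actually have the disease, even though the test itself is unchanged.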


A doctor's intuition often fails here. Bayes theorem forces you to do the math. The base rate (prior) is as important as the test accuracy (likelihood). You can't ignore either.

Real example: in low-prevalence settings, such as COVID-19 testing before infection was widespread, even a small false positive rate can mean that a substantial share of positive results are false positives. This is the base rate fallacy in action.

Experienced doctors apply Bayes theorem implicitly. When a test comes back positive, they ask:

  1. How common is this disease in this patient's demographic? (prior)
  2. How accurate is this test? (likelihood)
  3. Should I retest, or order confirmatory tests? (iterative updates)

Clinical tools make this reasoning explicit. The positive likelihood ratio, LR+ = sensitivity / (1 − specificity), tells clinicians how much a positive result should shift their belief: posterior odds = prior odds × LR+.
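A minimal sketch of LR+ and a repeat test, using the scenario's values (95% sensitivity; specificity 95%, i.e. a 5% false positive rate; 1% prevalence). The second update rests on a simplifying assumption: the two results are conditionally independent given disease status, which real repeat tests on the same patient may not satisfy. Function names are illustrative.

```python
def lr_positive(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def update_odds(prob: float, likelihood_ratio: float) -> float:
    """Apply one likelihood ratio in odds form and return the new probability."""
    odds = prob / (1.0 - prob) * likelihood_ratio
    return odds / (1.0 + odds)


lr = lr_positive(sensitivity=0.95, specificity=0.95)  # about 19
belief = 0.01  # prior: 1% prevalence
for test_number in (1, 2):
    # Assumption: repeat results are conditionally independent given disease status.
    belief = update_odds(belief, lr)
    print(f"after positive test {test_number}: {belief:.1%}")
# after positive test 1: 16.1%
# after positive test 2: 78.5%
```

This is the "should I retest?" step from the list above: under the independence assumption, a second positive moves the belief from about 16% to about 78%.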


Quick check

If a disease is extremely rare (0.1% prevalence), how good must a test be to give a 50/50 posterior after a positive result?

Knowledge Checks & Deeper Concepts

Test your understanding of Bayesian thinking across scenarios and concepts.


Quick check

A meteorologist predicts rain tomorrow with 70% confidence. It rains. Does this mean the 70% prior was wrong?


Quick check

In machine learning, a strong prior on model weights corresponds to:


Quick check

Which statement best describes Bayesian inference?


Quick check

How does a strong prior affect the posterior if you have 1000 data points?


Quick check

What does a 95% credible interval from Bayesian inference mean?

Summary: Bayesian Thinking in 4 Ideas

1. Priors encode what you already know

Start with beliefs from past experience, domain knowledge, or assumptions. Weak priors let data dominate; strong priors require overwhelming evidence.

2. Likelihoods measure evidence strength

How well does your data support each hypothesis? P(data|hypothesis). The larger the likelihood ratio between competing hypotheses, the bigger the belief shift.

3. Bayes theorem combines them rationally

P(H|D) = P(D|H) × P(H) / P(D). This is the recipe for belief updating. It respects both prior and evidence.

4. Iteration refines estimates

Your posterior becomes the next prior. Each update typically narrows uncertainty. This is how science works: evidence accumulates.
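A small sketch of iterative updating with the Beta model, assuming hypothetical simulated coin flips (70% success rate, fixed seed). Updating one observation at a time, with each posterior serving as the next prior, ends at the same posterior as a single batch update, and the posterior spread shrinks as data accumulates. Helper names are illustrative.

```python
import math
import random


def update(alpha: float, beta: float, success: bool) -> tuple[float, float]:
    """One conjugate update of a Beta(alpha, beta) belief with a single outcome."""
    return (alpha + 1, beta) if success else (alpha, beta + 1)


def beta_sd(alpha: float, beta: float) -> float:
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total**2 * (total + 1)))


rng = random.Random(42)
data = [rng.random() < 0.7 for _ in range(50)]  # hypothetical coin flips, 70% success rate

alpha, beta = 1.0, 1.0  # uniform Beta(1, 1) prior
for success in data:
    # Yesterday's posterior becomes today's prior.
    alpha, beta = update(alpha, beta, success)

# One batch update with all the data gives the same posterior.
batch = (1.0 + sum(data), 1.0 + len(data) - sum(data))
print((alpha, beta) == batch)  # True
print(f"sd: {beta_sd(1.0, 1.0):.3f} before, {beta_sd(alpha, beta):.3f} after 50 observations")
```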

Next steps:

  • Explore probabilistic programming: Try PyMC, Stan, or NumPyro (built on JAX) to build Bayesian models on real data
  • Study variational inference: A faster alternative to MCMC for large-scale Bayesian models
  • Apply to A/B testing: Use Bayesian methods to design experiments that stop early when winners are clear
  • Build Bayesian neural networks: Deep learning with uncertainty quantification
