Interactive lesson, ~15 min, Intermediate

Knowledge Distillation

Distillation transfers behavior from a larger teacher model to a smaller student model through soft targets, traces, or generated data.

Teacher-student, Soft labels

Mental model

Teach a compact model using the teacher’s shape of uncertainty.
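One way to make "the teacher's shape of uncertainty" concrete is a loss that compares the student's softened probabilities with the teacher's. The sketch below is a minimal, dependency-free illustration, not a reference implementation: the function names, the example logits, and the `temperature` and `alpha` values are all assumptions chosen for the example. It blends a soft-target term (KL divergence at a raised temperature) with ordinary cross-entropy on the true label.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, true_class,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term with cross-entropy on the true label.

    temperature and alpha are illustrative defaults, not recommended values.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term's gradient magnitude comparable
    # when the temperature changes.
    soft_term = temperature ** 2 * kl_divergence(soft_teacher, soft_student)
    hard_term = -math.log(softmax(student_logits)[true_class])
    return alpha * soft_term + (1 - alpha) * hard_term

# Hypothetical logits over three classes; class 0 is the true label.
teacher = [4.0, 2.5, 0.5]
close_student = [3.5, 2.0, 0.2]  # ranks the wrong classes like the teacher
far_student = [3.5, 0.2, 2.0]    # same top answer, wrong classes swapped

loss_close = distillation_loss(teacher, close_student, true_class=0)
loss_far = distillation_loss(teacher, far_student, true_class=0)
print(f"close student: {loss_close:.4f}")
print(f"far student:   {loss_far:.4f}")
```

Both students give the true class the same probability, so the hard-label term cannot tell them apart. Only the soft-target term penalizes the student that ranks the wrong classes differently from the teacher.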

Distillation makes models cheaper and faster to run, small enough to run privately on edge devices, and easier to deploy at scale.

Accuracy retention: balanced (68% modeled signal)

Latency win: balanced (54% modeled signal)

Failure risk: balanced (60% modeled signal)

Concept pipeline

Build the idea in four moves

Interactive lab

Distill a large assistant into a smaller specialist.

Teacher

Pick a stronger model or ensemble.
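If the teacher is an ensemble, one common way to turn it into a single training signal is to average the members' class probabilities. The snippet below is a small sketch under that assumption; the helper name `average_ensemble` and the member probabilities are made up for illustration.

```python
def average_ensemble(member_probs):
    """Average per-class probabilities across ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]

# Hypothetical outputs of three teacher models for one input, three classes.
members = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.80, 0.10, 0.10],
]

ensemble_target = average_ensemble(members)
print([round(p, 3) for p in ensemble_target])
```

The averaged distribution is what the student would be trained to match, so the student only has to reproduce the ensemble's consensus rather than run every member at inference time.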

Teacher quality: 82 (weak to expert)

Student size: 42 (tiny to large)

Domain coverage: 64 (narrow to broad)

Focus lens

The part that usually clicks late

Soft labels

Probability distributions contain more information than hard answers.
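A quick way to see the difference is to collapse two different teacher outputs into hard labels and check what survives. The example below is an illustrative sketch: the class names and probabilities are invented, and entropy is used only as a rough measure of how much uncertainty each label carries.

```python
import math

def to_hard_label(probs):
    """Collapse a distribution to a one-hot vector at its most likely class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == best else 0.0 for i in range(len(probs))]

def entropy_bits(probs):
    """Shannon entropy in bits; zero means no uncertainty is expressed."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical teacher outputs over the classes (cat, dog, truck).
confident = [0.97, 0.02, 0.01]  # clearly a cat
hesitant = [0.55, 0.43, 0.02]   # cat, but it looks a lot like a dog

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    hard = to_hard_label(probs)
    print(name, "soft:", probs, "hard:", hard,
          "entropy:", round(entropy_bits(probs), 3), "bits")
```

Both inputs collapse to the same hard label, so a student trained on hard answers never learns that the second example was nearly a dog. The soft labels keep that relationship between classes.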


Knowledge check

Why use soft labels in distillation?

Next horizon

Where this topic is headed

Logit distillation

Reasoning trace distillation

Specialist small models


The four moves: Teacher, Signals, Student, Validate
