Knowledge Distillation
Distillation transfers behavior from a larger teacher model to a smaller student model through soft targets, traces, or generated data.
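One common way to train on soft targets combines two terms. The first is a KL-divergence term that pulls the student's temperature-softened distribution toward the teacher's. The second is ordinary cross-entropy on the true label. The sketch below is a minimal pure-Python illustration under that assumption, not a prescribed recipe. The names `distillation_loss`, `temperature`, and `alpha` and their default values are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; classes where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_label: int,
    temperature: float = 4.0,  # hypothetical default; tune per task
    alpha: float = 0.5,        # hypothetical weight on the soft term
) -> float:
    # Soft term: pull the student's softened distribution toward the teacher's.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term's gradients comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard term: ordinary cross-entropy against the ground-truth label at T = 1.
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard


# Hypothetical logits for a 3-class problem where class 0 is correct.
print(distillation_loss([2.0, 0.5, -1.0], [3.0, 1.5, -2.0], hard_label=0))
```

With `alpha=1.0` the student learns only from the teacher. With `alpha=0.0` it reduces to plain supervised training.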
Mental model
Teach a compact model using the teacher’s shape of uncertainty.
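The teacher's "shape of uncertainty" is usually exposed by dividing its logits by a temperature T before the softmax. The sketch below uses hypothetical logits for three classes (cat, dog, car) and compares the resulting distributions and their entropies at two temperatures. The values T=1 and T=4 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    # Dividing logits by T > 1 flattens the distribution; T < 1 sharpens it.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats: higher means probability is more spread out.
    return -sum(p * math.log(p) for p in probs if p > 0)


teacher_logits = [6.0, 3.0, 0.5]  # hypothetical logits for cat, dog, car
for t in (1.0, 4.0):
    probs = softmax(teacher_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}  entropy={entropy(probs):.3f}")
```

At T=1 the teacher looks almost certain the image is a cat. At T=4 the same logits make it visible that the teacher ranks dog well above car, and this ranking is what the student can learn from.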
Distillation makes models cheaper to serve, faster, small enough to run privately on edge devices, and easier to deploy at scale.
Concept pipeline
Build the idea in four moves
Interactive lab
Distill a large assistant into a smaller specialist.
Teacher
Pick a stronger model or ensemble.
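When the teacher is an ensemble, one simple way to get its soft targets is to average the members' softened probability distributions. Averaging logits is another option. The sketch assumes probability averaging, and the function name `ensemble_soft_targets` and the teacher logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, temperature=1.0):
    """Average the softened class probabilities of several teacher models."""
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    num_classes = len(dists[0])
    return [sum(d[c] for d in dists) / len(dists) for c in range(num_classes)]


# Three hypothetical teachers that agree on class 0 but differ on the runner-up.
teachers = [[4.0, 2.0, 0.0], [4.0, 0.0, 2.0], [3.5, 1.0, 1.0]]
print(ensemble_soft_targets(teachers, temperature=2.0))
```

Because each member's distribution sums to 1, their average is also a valid distribution. The student then sees where the teachers agree (class 0) and how their disagreement about the runner-up classes spreads the remaining probability.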
Focus lens
The part that usually clicks late
Soft labels
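A teacher's full probability distribution carries more information than a hard answer: besides the top class, it shows which wrong answers the teacher considers plausible and by how much.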
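A small worked comparison makes the point concrete. Two hypothetical students give the correct class the same probability but rank the wrong classes differently. Cross-entropy against a one-hot label cannot tell them apart. Cross-entropy against a hypothetical teacher distribution can. All numbers are illustrative assumptions.

```python
import math


def cross_entropy(target, predicted):
    # H(target, predicted) = -sum_i target_i * log(predicted_i)
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


hard_label = [1.0, 0.0, 0.0]    # one-hot: "cat"
soft_label = [0.70, 0.25, 0.05]  # hypothetical teacher output: cat, dog, car
student_a = [0.70, 0.25, 0.05]   # ranks dog above car, like the teacher
student_b = [0.70, 0.05, 0.25]   # ranks car above dog

for name, student in (("A", student_a), ("B", student_b)):
    print(name,
          f"hard CE={cross_entropy(hard_label, student):.3f}",
          f"soft CE={cross_entropy(soft_label, student):.3f}")
```

The hard label only looks at the probability on "cat", so both students get the same loss. The soft label also penalizes student B for ranking car above dog.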
Knowledge check
Why use soft labels in distillation?