Interactive lesson · ~15 min · Intermediate

Knowledge Distillation

Distillation transfers behavior from a larger teacher model to a smaller student model through soft targets, reasoning traces, or teacher-generated data.
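A minimal sketch of the generated-data route, assuming a toy setting. A hypothetical `teacher_predict` rule stands in for the large model, and a tiny nearest-centroid classifier (also hypothetical) plays the student. The student never sees human labels. It learns only from inputs the teacher has labeled. With real models both sides would be networks, but the loop has a similar shape: generate inputs, let the teacher label them, then train the student on the result.

```python
import random

def teacher_predict(point):
    # Hypothetical stand-in for a large, expensive teacher: labels a 2-D point.
    x, y = point
    return 1 if x + 0.5 * y > 0 else 0

def train_student(points, labels):
    # Tiny student: a nearest-centroid classifier fit only on teacher labels.
    centroids = {}
    for cls in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cls]
        centroids[cls] = (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )

    def student_predict(point):
        # Predict the class whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda c: (point[0] - centroids[c][0]) ** 2 + (point[1] - centroids[c][1]) ** 2,
        )

    return student_predict

rng = random.Random(0)  # fixed seed for reproducibility
# Step 1: generate unlabeled inputs.
unlabeled = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
# Step 2: let the teacher label them (no human labels involved).
teacher_labels = [teacher_predict(p) for p in unlabeled]
# Step 3: train the student on the teacher-generated dataset.
student_predict = train_student(unlabeled, teacher_labels)

# Check how often the student agrees with the teacher on fresh inputs.
fresh = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
agreement = sum(student_predict(p) == teacher_predict(p) for p in fresh) / len(fresh)
print(f"student/teacher agreement on fresh inputs: {agreement:.1%}")
```

This sketch uses hard teacher labels. The soft-target route keeps the teacher's full probabilities instead.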

Teacher-student · Soft labels

Mental model

Teach a compact model to reproduce the teacher’s shape of uncertainty: how much probability it assigns to every answer, not just which answer it picks.
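A minimal sketch of the classic soft-target loss (Hinton et al., 2015), assuming a single classification example. The student is pushed toward the teacher's temperature-softened distribution through KL divergence, and this is blended with ordinary cross-entropy on the true label. The names `distillation_loss`, `temperature`, and `alpha` are illustrative choices, not fixed by this lesson. The `temperature ** 2` factor keeps the soft term on a comparable gradient scale as the temperature changes.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max avoids overflow.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    # Soft term: match the teacher's softened distribution ("shape of uncertainty").
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard term: standard cross-entropy against the ground-truth class index.
    p_student = softmax(student_logits, 1.0)
    hard = -math.log(p_student[true_label])
    return alpha * soft + (1 - alpha) * hard

teacher = [2.0, 1.0, 0.1]
print(distillation_loss([0.0, 0.0, 0.0], teacher, true_label=0))
print(distillation_loss(teacher, teacher, true_label=0, alpha=1.0))
```

With `alpha=1.0` and identical logits, the loss is zero. A student that already reproduces the teacher's distribution has nothing left to learn from the soft term.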

Distillation makes models cheaper and faster to run, small enough for edge devices (where keeping data on-device helps privacy), and easier to deploy at scale.

Concept pipeline

Build the idea in four moves

Interactive lab

Distill a large assistant into a smaller specialist.

Teacher

Pick a stronger model, or an ensemble of models, to serve as the teacher.
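If the teacher is an ensemble, one simple option is to average the members' probability distributions and use that average as the soft target. This is an assumption, not the only choice. A sketch with hypothetical logits from three teachers:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max avoids overflow.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits_list, temperature=1.0):
    # Convert each teacher's logits to probabilities, then average class by class.
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    num_classes = len(dists[0])
    return [sum(d[k] for d in dists) / len(dists) for k in range(num_classes)]

# Hypothetical logits from three teachers over the same three classes.
teachers = [[4.0, 1.0, 0.0], [3.0, 2.5, 0.0], [5.0, 0.5, 1.0]]
targets = ensemble_soft_targets(teachers, temperature=2.0)
print([round(p, 3) for p in targets])
```

Averaging probabilities rather than raw logits puts every member on the same scale before combining. Averaging logits is another option.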

Focus lens

The part that usually clicks late

Soft labels

Probability distributions contain more information than hard answers: besides the top choice, they show how plausible the teacher finds each alternative.
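To see the extra information, compare a hard label with the teacher's softmax at two temperatures. The logits below are made up for illustration: an image of a cat, with classes cat, dog, and car. The hard label says only "cat". The soft distribution also says "dog" is far more plausible than "car", and a higher temperature makes that ranking more visible.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max avoids overflow.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    # Shannon entropy in nats; 0 for a one-hot distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

classes = ["cat", "dog", "car"]
teacher_logits = [5.0, 3.0, -2.0]  # hypothetical teacher output for a cat image

# Hard label: one-hot on the teacher's top class.
top = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
hard_label = [1.0 if i == top else 0.0 for i in range(len(classes))]
soft_t1 = softmax(teacher_logits, temperature=1.0)
soft_t4 = softmax(teacher_logits, temperature=4.0)

for name, dist in [("hard", hard_label), ("T=1", soft_t1), ("T=4", soft_t4)]:
    print(name, [round(x, 3) for x in dist], "entropy:", round(entropy(dist), 3))
```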

Knowledge check

Why use soft labels in distillation?

Next horizon

Where this topic is headed

Logit distillation
Reasoning trace distillation
Specialist small models