Interactive~18 minIntermediate

Graph Neural Networks — learning on non-Euclidean data

Graphs are everywhere: social networks, molecules, knowledge bases, recommendation systems. Unlike images or text, graphs have irregular structure—no fixed grid. GNNs break this barrier by treating the graph itself as the model. Nodes send messages to neighbors, aggregate information, and learn representations that respect the underlying network topology. Master message passing, aggregation, and graph convolutions to unlock predictions on any relational data.

What are graphs?

A graph is a mathematical structure consisting of nodes (vertices) and edges (links). Graphs naturally represent relational data where entities are nodes and relationships are edges. Unlike sequences (text) or grids (images), graphs have irregular, non-Euclidean structure.

A graph G = (V, E) where V is the set of nodes and E is the set of edges. Edges can be:

Undirected

Edges have no direction. A–B is the same as B–A. Examples: friendships, molecular bonds, protein interactions.

Directed

Edges have direction. A→B is different from B→A. Examples: citations, followers, transaction flows.

Why graphs matter for ML

Graphs encode relational structure that sequential or grid-based models can't exploit. A social graph tells you who influences whom. A molecular graph encodes atomic bonds and chemical structure. GNNs leverage this structure to make better predictions.

?

Quick check

What is the key difference between undirected and directed graphs?

Graph structure and representation

The most efficient way to represent a graph for computation is the adjacency matrix. For a graph with N nodes, the adjacency matrix A is N × N where A[i,j] represents an edge between nodes i and j.

Graph Types & Applications

Graphs appear everywhere in the real world. Click on each type to see how nodes and edges represent different entities and relationships.

AliceBobCharlieDianaEve

Social Network

People as nodes, friendships as edges. Models relationships, influence propagation, community detection.

Structure

Nodes: 5

Edges: 6

Avg degree: 2.4

Real-world examples

Facebook, Twitter, LinkedIn

GNN tasks
  • Link prediction
  • Community detection
  • Node classification

Nodes often have features (attributes). For a social network, a node might have features like profile embeddings, follower count, or interests. These are stored in an N × F matrix where N is the number of nodes and F is the feature dimension.

The adjacency matrix A tells us the graph structure. The feature matrix H tells us what each node "looks like." Together, A and H fully specify a graph instance, and GNNs learn to combine them.

Node features can come from various sources: embeddings, one-hot encodings, raw attributes, or even learned representations from a previous layer.

?

Quick check

In the adjacency matrix A, what does A[2,5] represent?

Message passing and aggregation

The core idea of GNNs is message passing: nodes send information to neighbors, who aggregate the messages. This is how information propagates through the graph without requiring a global computation.

Message Passing

Nodes exchange information with neighbors. Features aggregate and update based on the graph structure.

A[1.00, 0.00]B[0.00, 1.00]C[1.00, 1.00]D[0.50, 0.50]E[1.00, 0.50]

Node Features

A: [1.000, 0.000]
B: [0.000, 1.000]
C: [1.000, 1.000]
D: [0.500, 0.500]
E: [1.000, 0.500]

Aggregated Messages

A: [1.500, 1.500]
B: [2.000, 2.000]
C: [1.500, 2.500]
D: [3.500, 2.000]
E: [1.500, 1.000]

Normalized Features

A: [0.500, 0.500]
B: [0.667, 0.667]
C: [0.500, 0.833]
D: [0.875, 0.500]
E: [0.750, 0.500]

How it works: Each node sends its features to neighbors. Neighbors aggregate these messages, then normalize by degree. This spreads information through the graph while preserving local structure.

When you sum neighbor features, nodes with many neighbors get a much larger aggregated value than nodes with few neighbors. This causes training instability and prevents networks from learning effectively.

Degree normalization solves this by dividing by the number of neighbors (or technically, by the square root of the degree). This keeps feature magnitudes consistent regardless of how connected a node is.

Formally: normalized = aggregated / degree. This ensures that a node with 100 neighbors doesn't drown out a node with 5 neighbors just by having larger accumulated values.

?

Quick check

What happens if you aggregate neighbor features without degree normalization?

Graph convolution

A Graph Convolutional Network (GCN) layer combines message passing with learned transformations. Instead of just averaging neighbors, GCN passes aggregated features through a learned weight matrix, then applies a non-linearity.

Graph Convolution Layers

Watch how node features transform as they pass through GCN layers. Each layer combines graph structure with learned transformations.

ABCD

Feature Vectors

A:1.000.000.00
B:0.001.000.00
C:0.000.001.00
D:0.500.500.00

Dimensions: 3

How it works

1. Input: Each node starts with initial features (one-hot or continuous).

2. Propagation: Features spread along edges. Neighbors aggregate information.

3. Transformation: Aggregated features pass through learned weight matrices.

4. Activation: ReLU introduces non-linearity, learning complex patterns.

The GCN formula

h' = σ(Ã @ h @ W + b)

where à is the normalized adjacency matrix (D-1/2 A D-1/2), h is the input feature matrix, W is the weight matrix, b is the bias, and σ is an activation function like ReLU.

Each layer propagates information one hop further and learns new features. Stack multiple GCN layers to capture multi-hop neighborhoods and learn increasingly abstract representations.

The raw adjacency matrix A doesn't account for node degree. Multiplying by A directly causes high-degree nodes to send large signals to neighbors.

The normalized adjacency matrix à = D-1/2 A D-1/2 corrects this. D is the degree matrix (diagonal matrix with degrees on the diagonal). The D-1/2 factors normalize both the sending and receiving sides, making information flow symmetric and stable.

This normalization is inspired by spectral graph theory and ensures that GCNs have nice mathematical properties like bounded eigenvalues, which helps with training stability.

?

Quick check

What is the purpose of the weight matrix W in a GCN layer?

Applications and tasks

GNNs solve different types of problems depending on what you want to predict: node-level, edge-level, or graph-level tasks.

Node Classification

Predict labels for nodes. Example: classify users in a social network.

Output: one label per node

Link Prediction

Predict missing edges or future connections. Example: recommend friends.

Output: score for each possible edge

Graph Classification

Classify entire graphs. Example: predict if a molecule is toxic.

Output: one label per graph

Graph Regression

Predict continuous values for graphs. Example: molecular energy.

Output: continuous value

Real-world domains

  • Social networks: Link prediction, community detection, influence prediction
  • Chemistry: Molecular property prediction, drug discovery, reaction prediction
  • Knowledge graphs: Entity classification, relation prediction, reasoning
  • Recommendations: Item recommendations, ranking, click prediction
  • Biology: Protein interaction, gene networks, disease propagation

After GCN layers, you have a new feature vector for each node. For graph-level tasks like classification, you need to combine all node features into one graph representation. This is called graph pooling.

Common pooling strategies: mean (average all node features), sum (add all node features), or max (take max of each dimension across nodes). You can also learn which nodes are important (attention-based pooling).

Once you have the graph representation, pass it through dense layers to produce the final prediction (classification or regression). This is similar to how CNNs use global average pooling to go from convolutional feature maps to a class score.

?

Quick check

Which task would you use GNNs for if you wanted to predict whether a drug molecule is toxic?

Guided Walkthrough

A 6-step introduction from basic graph concepts to real-world GNN applications.

What are Graphs?

Graphs are networks: nodes (vertices) connected by edges (relationships).

  • A graph G = (V, E) where V is a set of nodes and E is a set of edges
  • Directed or undirected edges represent different types of relationships
  • Weighted edges encode relationship strength or distance
  • Graphs naturally represent relational data in many domains
A B C
Step 1 of 6

Interactive Playground

Build a graph, initialize features, and run GCN layers. Watch as features converge and nodes with similar neighborhoods become more similar.

Graph Structure

ABCD

Rounds completed: 0

Avg. feature similarity: 0.236

Node Features
A:[1.000, 0.000, 0.000]
B:[0.000, 1.000, 0.000]
C:[0.000, 0.000, 1.000]
D:[0.500, 0.500, 0.000]
Edges (4)
AB
BC
CD
DA

Add edge

How to use
  • Click "Run GCN Round" to propagate and transform features
  • Add/remove edges to change graph topology
  • Similarity increases as features converge
  • Try different weight initializations

Final checkpoint

?

Quick check

Why is message passing more efficient than computing global attention over all nodes?

?

Quick check

What is the main difference between a GCN layer and simple neighborhood averaging?

?

Quick check

When stacking multiple GCN layers, what does increasing depth accomplish?

Key takeaways

  • Graphs are everywhere: Social networks, molecules, knowledge bases, and more. They encode relational structure that standard ML models can't exploit.
  • Message passing propagates information: Nodes send features to neighbors, who aggregate and update their own representations. This is local, efficient, and respects graph structure.
  • GCN layers learn transformations: Aggregated neighbor features pass through learned weights and activation functions, enabling the network to discover task-specific patterns.
  • Normalization matters: Degree normalization ensures stable training by keeping feature scales consistent across nodes with different connectivity.
  • Multiple tasks are possible: Node classification, link prediction, graph classification, and regression all follow the same message-passing principle.

Finished this lesson?

Mark it as complete to track your progress and get a certificate.