Graph Neural Networks — learning on non-Euclidean data
Graphs are everywhere: social networks, molecules, knowledge bases, recommendation systems. Unlike images or text, graphs have irregular structure—no fixed grid. GNNs break this barrier by treating the graph itself as the model. Nodes send messages to neighbors, aggregate information, and learn representations that respect the underlying network topology. Master message passing, aggregation, and graph convolutions to unlock predictions on any relational data.
What are graphs?
A graph is a mathematical structure consisting of nodes (vertices) and edges (links). Graphs naturally represent relational data where entities are nodes and relationships are edges. Unlike sequences (text) or grids (images), graphs have irregular, non-Euclidean structure.
A graph G = (V, E) where V is the set of nodes and E is the set of edges. Edges can be:
Undirected
Edges have no direction. A–B is the same as B–A. Examples: friendships, molecular bonds, protein interactions.
Directed
Edges have direction. A→B is different from B→A. Examples: citations, followers, transaction flows.
Why graphs matter for ML
Graphs encode relational structure that sequential or grid-based models can't exploit. A social graph tells you who influences whom. A molecular graph encodes atomic bonds and chemical structure. GNNs leverage this structure to make better predictions.
Quick check
What is the key difference between undirected and directed graphs?
Graph structure and representation
The most efficient way to represent a graph for computation is the adjacency matrix. For a graph with N nodes, the adjacency matrix A is N × N where A[i,j] represents an edge between nodes i and j.
Graph Types & Applications
Graphs appear everywhere in the real world. Click on each type to see how nodes and edges represent different entities and relationships.
Social Network
People as nodes, friendships as edges. Models relationships, influence propagation, community detection.
Structure
Nodes: 5
Edges: 6
Avg degree: 2.4
Real-world examples
Facebook, Twitter, LinkedIn
GNN tasks
- Link prediction
- Community detection
- Node classification
Nodes often have features (attributes). For a social network, a node might have features like profile embeddings, follower count, or interests. These are stored in an N × F matrix where N is the number of nodes and F is the feature dimension.
The adjacency matrix A tells us the graph structure. The feature matrix H tells us what each node "looks like." Together, A and H fully specify a graph instance, and GNNs learn to combine them.
Node features can come from various sources: embeddings, one-hot encodings, raw attributes, or even learned representations from a previous layer.
Quick check
In the adjacency matrix A, what does A[2,5] represent?
Message passing and aggregation
The core idea of GNNs is message passing: nodes send information to neighbors, who aggregate the messages. This is how information propagates through the graph without requiring a global computation.
Message Passing
Nodes exchange information with neighbors. Features aggregate and update based on the graph structure.
Node Features
Aggregated Messages
Normalized Features
How it works: Each node sends its features to neighbors. Neighbors aggregate these messages, then normalize by degree. This spreads information through the graph while preserving local structure.
When you sum neighbor features, nodes with many neighbors get a much larger aggregated value than nodes with few neighbors. This causes training instability and prevents networks from learning effectively.
Degree normalization solves this by dividing by the number of neighbors (or technically, by the square root of the degree). This keeps feature magnitudes consistent regardless of how connected a node is.
Formally: normalized = aggregated / degree. This ensures that a node with 100 neighbors doesn't drown out a node with 5 neighbors just by having larger accumulated values.
Quick check
What happens if you aggregate neighbor features without degree normalization?
Graph convolution
A Graph Convolutional Network (GCN) layer combines message passing with learned transformations. Instead of just averaging neighbors, GCN passes aggregated features through a learned weight matrix, then applies a non-linearity.
Graph Convolution Layers
Watch how node features transform as they pass through GCN layers. Each layer combines graph structure with learned transformations.
Feature Vectors
| A: | 1.00 | 0.00 | 0.00 |
| B: | 0.00 | 1.00 | 0.00 |
| C: | 0.00 | 0.00 | 1.00 |
| D: | 0.50 | 0.50 | 0.00 |
Dimensions: 3
How it works
1. Input: Each node starts with initial features (one-hot or continuous).
2. Propagation: Features spread along edges. Neighbors aggregate information.
3. Transformation: Aggregated features pass through learned weight matrices.
4. Activation: ReLU introduces non-linearity, learning complex patterns.
The GCN formula
h' = σ(Ã @ h @ W + b)
where à is the normalized adjacency matrix (D-1/2 A D-1/2), h is the input feature matrix, W is the weight matrix, b is the bias, and σ is an activation function like ReLU.
Each layer propagates information one hop further and learns new features. Stack multiple GCN layers to capture multi-hop neighborhoods and learn increasingly abstract representations.
The raw adjacency matrix A doesn't account for node degree. Multiplying by A directly causes high-degree nodes to send large signals to neighbors.
The normalized adjacency matrix à = D-1/2 A D-1/2 corrects this. D is the degree matrix (diagonal matrix with degrees on the diagonal). The D-1/2 factors normalize both the sending and receiving sides, making information flow symmetric and stable.
This normalization is inspired by spectral graph theory and ensures that GCNs have nice mathematical properties like bounded eigenvalues, which helps with training stability.
Quick check
What is the purpose of the weight matrix W in a GCN layer?
Applications and tasks
GNNs solve different types of problems depending on what you want to predict: node-level, edge-level, or graph-level tasks.
Node Classification
Predict labels for nodes. Example: classify users in a social network.
Output: one label per node
Link Prediction
Predict missing edges or future connections. Example: recommend friends.
Output: score for each possible edge
Graph Classification
Classify entire graphs. Example: predict if a molecule is toxic.
Output: one label per graph
Graph Regression
Predict continuous values for graphs. Example: molecular energy.
Output: continuous value
Real-world domains
- Social networks: Link prediction, community detection, influence prediction
- Chemistry: Molecular property prediction, drug discovery, reaction prediction
- Knowledge graphs: Entity classification, relation prediction, reasoning
- Recommendations: Item recommendations, ranking, click prediction
- Biology: Protein interaction, gene networks, disease propagation
After GCN layers, you have a new feature vector for each node. For graph-level tasks like classification, you need to combine all node features into one graph representation. This is called graph pooling.
Common pooling strategies: mean (average all node features), sum (add all node features), or max (take max of each dimension across nodes). You can also learn which nodes are important (attention-based pooling).
Once you have the graph representation, pass it through dense layers to produce the final prediction (classification or regression). This is similar to how CNNs use global average pooling to go from convolutional feature maps to a class score.
Quick check
Which task would you use GNNs for if you wanted to predict whether a drug molecule is toxic?
Guided Walkthrough
A 6-step introduction from basic graph concepts to real-world GNN applications.
What are Graphs?
Graphs are networks: nodes (vertices) connected by edges (relationships).
- •A graph G = (V, E) where V is a set of nodes and E is a set of edges
- •Directed or undirected edges represent different types of relationships
- •Weighted edges encode relationship strength or distance
- •Graphs naturally represent relational data in many domains
Interactive Playground
Build a graph, initialize features, and run GCN layers. Watch as features converge and nodes with similar neighborhoods become more similar.
Graph Structure
Rounds completed: 0
Avg. feature similarity: 0.236
Node Features
Edges (4)
Add edge
How to use
- Click "Run GCN Round" to propagate and transform features
- Add/remove edges to change graph topology
- Similarity increases as features converge
- Try different weight initializations
Final checkpoint
Quick check
Why is message passing more efficient than computing global attention over all nodes?
Quick check
What is the main difference between a GCN layer and simple neighborhood averaging?
Quick check
When stacking multiple GCN layers, what does increasing depth accomplish?
Key takeaways
- Graphs are everywhere: Social networks, molecules, knowledge bases, and more. They encode relational structure that standard ML models can't exploit.
- Message passing propagates information: Nodes send features to neighbors, who aggregate and update their own representations. This is local, efficient, and respects graph structure.
- GCN layers learn transformations: Aggregated neighbor features pass through learned weights and activation functions, enabling the network to discover task-specific patterns.
- Normalization matters: Degree normalization ensures stable training by keeping feature scales consistent across nodes with different connectivity.
- Multiple tasks are possible: Node classification, link prediction, graph classification, and regression all follow the same message-passing principle.