Autoencoders: Learning Efficient Representations
Autoencoders are unsupervised neural networks that learn to compress input data into a latent space and reconstruct it. From dimensionality reduction and anomaly detection to generative modeling (VAEs), this is a complete mathematical and practical reference.
Pipeline: Encoder (input → latent) → Bottleneck (compressed representation) → Decoder (latent → output) → Reconstruction, with loss = ||x - x̂||²
What is an Autoencoder?
An autoencoder is a neural network trained to copy its input to its output. Internally, it compresses the input into a latent-space representation (bottleneck), then reconstructs the output from this representation. This forces the model to learn the most salient features of the data distribution.
Loss = ℒ(x, x̂) = MSE or Binary Cross-Entropy
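Either choice is one line in PyTorch; a minimal sketch, where the tensors `x` and `x_hat` are random stand-ins for a real batch and its reconstruction:

```python
import torch
import torch.nn.functional as F

x = torch.rand(16, 784)      # a batch of inputs scaled to [0, 1]
x_hat = torch.rand(16, 784)  # stand-in reconstructions (e.g. after a Sigmoid)

mse = F.mse_loss(x_hat, x)              # mean squared error ||x - x̂||²
bce = F.binary_cross_entropy(x_hat, x)  # for inputs/outputs in [0, 1]
```

MSE suits real-valued data; binary cross-entropy is common for pixel intensities normalized to [0, 1].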
Undercomplete Autoencoders
Bottleneck Dimension & Compression
dim(latent) < dim(input)
The network is forced to learn a compressed representation. Trained with reconstruction loss only (no regularization).
PCA analogy: a linear undercomplete autoencoder with squared-error loss learns the same subspace as PCA; non-linear activations extend this to non-linear manifold learning.
Risk of Overcomplete
If dim(latent) ≥ dim(input), the network can simply learn the identity function (copying the input), so regularization is required.
Undercomplete → meaningful compression; Overcomplete → needs constraints
```python
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()  # for pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat, z
```
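A minimal training loop for such a model might look like the following sketch; the tiny `nn.Sequential` model and the fake `dataloader` list here are self-contained stand-ins for the class above and a real `DataLoader`:

```python
import torch
import torch.nn as nn

# Stand-ins so the loop runs end to end; in practice, use the
# UndercompleteAE class above and a torch.utils.data.DataLoader.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())
dataloader = [torch.rand(32, 784) for _ in range(4)]  # fake batches in [0, 1]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    for x_batch in dataloader:
        x_hat = model(x_batch)
        loss = nn.functional.mse_loss(x_hat, x_batch)  # input is the target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The defining detail is that the input serves as its own target; there are no labels.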
Regularized Autoencoders
Sparse Autoencoder
ℒ = ||x - x̂||² + λ · Ω(z)
Penalizes the activations of hidden units (via an L1 penalty or a KL-divergence term toward a target sparsity). Encourages specialized, interpretable features.
Ω(z) = Σ|z| or KL(ρ||ρ̂)
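As a sketch, the L1 form of the penalty is a one-liner added to the reconstruction loss; `lam` here is a hypothetical name for the weight λ:

```python
import torch

def l1_sparsity_penalty(z, lam=1e-3):
    """Ω(z) = Σ|z|, scaled by λ and averaged over the batch."""
    return lam * z.abs().sum(dim=1).mean()

z = torch.randn(8, 32)  # a batch of latent activations
penalty = l1_sparsity_penalty(z)
# total loss would be: reconstruction_loss + penalty
```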
Denoising Autoencoder (DAE)
Corrupt input with noise (e.g., Gaussian, dropout), reconstruct clean original.
Learns robust features, removes noise. x̃ = x + ε, minimize ||x - d(e(x̃))||²
Contractive Autoencoder (CAE)
ℒ = ||x - x̂||² + λ ||J_f(x)||²_F
Penalizes Frobenius norm of encoder's Jacobian. Encourages invariance to small input changes.
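The Jacobian penalty can be computed with autograd, one latent dimension at a time. This is a sketch under the assumption that `encoder` maps a batch of inputs to a batch of latent codes (it illustrates why the CAE is expensive: one backward pass per latent dimension):

```python
import torch

def contractive_penalty(encoder, x):
    """Sum over the batch of ||J_f(x)||²_F, the squared Frobenius
    norm of the encoder's Jacobian, computed dim-by-dim."""
    x = x.clone().requires_grad_(True)
    z = encoder(x)
    penalty = x.new_zeros(())
    for i in range(z.shape[1]):
        # per-sample gradient of the i-th latent unit w.r.t. the input
        g, = torch.autograd.grad(z[:, i].sum(), x,
                                 create_graph=True, retain_graph=True)
        penalty = penalty + (g ** 2).sum()
    return penalty
```

For a linear encoder z = Wx this reduces to batch_size · ||W||²_F, which gives a quick sanity check.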
Practical Insight
Denoising + Sparse often combined. Contractive is computationally expensive; denoising is simpler and effective.
```python
import torch
import torch.nn as nn

def add_noise(x, noise_factor=0.3):
    noise = torch.randn_like(x) * noise_factor
    return torch.clamp(x + noise, 0., 1.)

# Training loop: reconstruct the clean input from the corrupted one
# (model, optimizer, and dataloader defined as usual)
for x_batch in dataloader:
    x_noisy = add_noise(x_batch)
    x_hat, _ = model(x_noisy)
    loss = nn.functional.binary_cross_entropy(x_hat, x_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Variational Autoencoders (VAE)
VAEs are generative models that learn a probabilistic latent space. Instead of encoding a point, encoder outputs parameters of a Gaussian distribution (μ, σ). Decoder samples from this distribution to generate data.
VAE Loss: Reconstruction + KL Divergence
ℒ = −𝔼[log p(x|z)] + β · D_KL(q(z|x) || p(z))   (the negative ELBO, minimized during training; β = 1 for a standard VAE, β > 1 for β-VAE)
Reconstruction: make x̂ similar to x.
KL divergence: regularize latent distribution towards prior (usually N(0,1)).
Reparameterization Trick
z = μ + σ ⊙ ε, ε ~ N(0, I). Enables backpropagation through sampling.
μ (mean), log-var (σ²) from encoder; sample, then decode.
```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        # Encoder -> μ and logvar
        self.fc1 = nn.Linear(input_dim, 400)
        self.fc21 = nn.Linear(400, latent_dim)  # μ
        self.fc22 = nn.Linear(400, latent_dim)  # logvar
        # Decoder
        self.fc3 = nn.Linear(latent_dim, 400)
        self.fc4 = nn.Linear(400, input_dim)

    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc21(h), self.fc22(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# VAE loss function: reconstruction (BCE) + KL divergence to N(0, I)
def vae_loss(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x.view(-1, 784),
                                             reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
```
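Because the KL term pulls q(z|x) toward N(0, I), a trained VAE can generate new data by decoding draws from the prior. A minimal sketch; the small `nn.Sequential` decoder here is a stand-in for a trained VAE's `decode`:

```python
import torch
import torch.nn as nn

# Stand-in decoder (in practice: the trained VAE's decode method);
# the key point is that samples from the N(0, I) prior decode to data.
decoder = nn.Sequential(nn.Linear(20, 400), nn.ReLU(),
                        nn.Linear(400, 784), nn.Sigmoid())

z = torch.randn(64, 20)  # z ~ N(0, I), the VAE prior
samples = decoder(z)     # 64 generated "images" with pixels in (0, 1)
```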
Modern & Advanced Autoencoders
Convolutional Autoencoders
Use Conv2d and ConvTranspose2d layers (Conv2DTranspose in Keras) for images. Essential for vision tasks.
VQ-VAE (Vector Quantized)
Discrete latent space via a learned codebook. Used in high-fidelity generation (VQ-VAE-2; the original paper paired it with PixelCNN and WaveNet priors).
Adversarial Autoencoders
GAN-based regularization to match prior distribution.
β-VAE
Disentanglement: weights the KL term with β > 1 to encourage factorized latent representations.
Ladder VAE
Hierarchical latent variables.
Diffusion Autoencoders
Hybrid of diffusion and autoencoding.
Real-World Applications
Dimensionality Reduction
Non-linear PCA. Visualize high-dim data (t-SNE alternative). Pretraining for supervised tasks.
Anomaly Detection
Train on normal data; anomalies have high reconstruction error. Used in fraud, industrial inspection.
Image Denoising
DAEs remove noise from photographs, medical scans.
Inpainting
Fill missing regions in images.
Molecule Generation
VAEs generate novel molecular structures.
Feature Disentanglement
Separate content/style in images (β-VAE).
Anomaly detection pipeline: Train AE on normal samples → reconstruction error threshold → flag outliers.
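That pipeline can be sketched in a few lines. The untrained stand-in model here takes the place of an autoencoder fitted on normal data only, and the 99th-percentile threshold is an illustrative choice:

```python
import torch
import torch.nn as nn

def reconstruction_errors(model, x):
    """Per-sample mean squared reconstruction error."""
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

# Stand-in "trained" model; in practice, an AE trained on normal data only.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())

normal = torch.rand(100, 784)
errors = reconstruction_errors(model, normal)

# Threshold chosen from the normal data, e.g. its 99th percentile
threshold = torch.quantile(errors, 0.99)

new_data = torch.rand(10, 784)
flags = reconstruction_errors(model, new_data) > threshold  # True = anomaly
```

The threshold percentile trades false positives against missed anomalies and is typically tuned on a validation set.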
Framework Implementations
TensorFlow / Keras
```python
import tensorflow as tf

# Convolutional autoencoder for 28x28 grayscale images (e.g. MNIST)
class ConvAE(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(16)  # latent
        ])
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Dense(7*7*64, activation='relu'),
            tf.keras.layers.Reshape((7, 7, 64)),
            tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(1, 3, padding='same', activation='sigmoid')
        ])

    def call(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = ConvAE()
model.compile(optimizer='adam', loss='mse')
```
PyTorch (Convolutional VAE)
```python
class ConvVAE(nn.Module):
    """Convolutional VAE; assumes 32x32 single-channel inputs
    (e.g. MNIST padded from 28x28 to 32x32)."""
    def __init__(self, latent_dim=20):
        super().__init__()
        # Encoder: Conv2d stack -> μ, logvar
        self.enc_conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16 -> 8x8
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8 -> 4x4
            nn.Flatten()
        )
        self.fc_mu = nn.Linear(128*4*4, latent_dim)
        self.fc_logvar = nn.Linear(128*4*4, latent_dim)
        # Decoder
        self.dec_fc = nn.Linear(latent_dim, 128*4*4)
        self.dec_conv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid()
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + torch.randn_like(std) * std

    def forward(self, x):
        h = self.enc_conv(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        h = self.dec_fc(z).view(-1, 128, 4, 4)
        return self.dec_conv(h), mu, logvar  # train with the same vae_loss as above
```
Latent Space Arithmetic & Interpolation
Autoencoders learn meaningful latent spaces. VAEs produce smooth, continuous manifolds.
Example: z = z_smiling − z_neutral + z_male decodes to a smiling male face (word2vec-style vector arithmetic).
```python
def interpolate(model, x1, x2, steps=10):
    """Linearly interpolate between two inputs in latent space.
    x1, x2: input images; model: a trained VAE."""
    model.eval()
    with torch.no_grad():
        z1 = model.encode(x1)[0]  # use μ as the latent code
        z2 = model.encode(x2)[0]
        alphas = torch.linspace(0, 1, steps)
        interpolates = []
        for alpha in alphas:
            z = (1 - alpha) * z1 + alpha * z2
            interpolates.append(model.decode(z))
    return interpolates
```
Autoencoder Variants – Cheatsheet
Autoencoder Variants Comparison
| Variant | Latent Space | Loss Terms | Primary Use |
|---|---|---|---|
| Undercomplete | Continuous, deterministic | Reconstruction | Dimensionality reduction |
| Sparse AE | Continuous + sparsity penalty | Reconstruction + L1/KL | Interpretable features |
| Denoising AE | Continuous, robust | Reconstruction (corrupted input) | Noise removal, pretraining |
| Contractive AE | Continuous + Jacobian norm | Reconstruction + ||J_f||² | Invariant representations |
| VAE | Probabilistic (Gaussian) | Reconstruction + KL | Generation, interpolation |
| VQ-VAE | Discrete codebook | Reconstruction + commitment | High-fidelity generation |