
Autoencoders: Learning Efficient Representations

Autoencoders are unsupervised neural networks that learn to compress input data into a latent space and reconstruct it. From dimensionality reduction and anomaly detection to generative modeling (VAEs) — complete mathematical and practical reference.

[Overview] Encoder: input → latent · Decoder: latent → output · Bottleneck: compressed representation · Reconstruction loss = ||x - x̂||²

What is an Autoencoder?

An autoencoder is a neural network trained to copy its input to its output. Internally, it compresses the input into a latent-space representation (bottleneck), then reconstructs the output from this representation. This forces the model to learn the most salient features of the data distribution.

Pipeline: x → Encoder → z (bottleneck) → Decoder → x̂

Loss: ℒ(x, x̂) = MSE or binary cross-entropy

Undercomplete Autoencoders

Bottleneck Dimension & Compression

dim(latent) < dim(input)

The network is forced to learn a compressed representation. Trained with reconstruction loss only (no regularization).

PCA analogy: with linear activations and MSE loss, an undercomplete autoencoder learns the same subspace as PCA; with non-linear activations it performs non-linear manifold learning.

Overcomplete Risk

If dim(latent) ≥ dim(input), the network can simply learn the identity function (copying the input). Regularization is required.

Undercomplete → meaningful compression; overcomplete → needs constraints

Undercomplete Autoencoder (PyTorch)
import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()  # for pixel values [0,1]
        )
    
    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat, z

Regularized Autoencoders

Sparse Autoencoder

ℒ = ||x - x̂||² + λ · Ω(z)

Penalize activations of hidden units (L1 regularization, KL divergence). Encourages specialized features.

Ω(z) = Σ|z| or KL(ρ||ρ̂)
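As a sketch, the L1 form of this penalty can be added directly to the reconstruction loss. The helper name is ours; it assumes the model exposes its latent activations z:

```python
import torch
import torch.nn as nn

def sparse_ae_loss(x, x_hat, z, lam=1e-3):
    """Reconstruction loss plus L1 sparsity penalty on latent activations."""
    recon = nn.functional.mse_loss(x_hat, x)
    sparsity = z.abs().mean()  # Omega(z) = sum |z|, averaged over the batch
    return recon + lam * sparsity
```

For the KL-based variant, replace the L1 term with KL(ρ || ρ̂) between a target sparsity ρ and the mean activation ρ̂ of each hidden unit.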

Denoising Autoencoder (DAE)

Corrupt input with noise (e.g., Gaussian, dropout), reconstruct clean original.

Learns robust features, removes noise. x̃ = x + ε, minimize ||x - d(e(x̃))||²

Contractive Autoencoder (CAE)

ℒ = ||x - x̂||² + λ ||J_f(x)||²_F

Penalizes Frobenius norm of encoder's Jacobian. Encourages invariance to small input changes.
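Computing ||J_f(x)||²_F exactly is expensive; one straightforward (if slow) sketch accumulates the squared encoder Jacobian with autograd, one latent unit at a time. The helper name is ours:

```python
import torch
import torch.nn as nn

def contractive_penalty(encoder, x):
    """||J_f(x)||_F^2 of the encoder, averaged over the batch (exact but slow)."""
    x = x.clone().requires_grad_(True)
    z = encoder(x)
    penalty = 0.0
    # Accumulate squared partial derivatives dz_j/dx, one latent unit at a time
    for j in range(z.shape[1]):
        grads = torch.autograd.grad(z[:, j].sum(), x, create_graph=True)[0]
        penalty = penalty + (grads ** 2).sum()
    return penalty / x.shape[0]
```

create_graph=True keeps the penalty differentiable so it can be added to the training loss; this cost per step is one reason denoising is often preferred in practice.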

Practical Insight

Denoising + Sparse often combined. Contractive is computationally expensive; denoising is simpler and effective.

Denoising Autoencoder – adding noise
import torch
import torch.nn as nn

def add_noise(x, noise_factor=0.3):
    noise = torch.randn_like(x) * noise_factor
    x_noisy = x + noise
    return torch.clamp(x_noisy, 0., 1.)

# Training loop: reconstruct the clean input from the corrupted one
for x_batch in dataloader:
    x_noisy = add_noise(x_batch)
    x_hat, _ = model(x_noisy)
    loss = nn.functional.binary_cross_entropy(x_hat, x_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Variational Autoencoders (VAE)

VAEs are generative models that learn a probabilistic latent space. Instead of mapping each input to a single point, the encoder outputs the parameters of a Gaussian distribution (μ, σ); the decoder reconstructs data from samples drawn from this distribution.

VAE Loss: Reconstruction + KL Divergence

ℒ = -𝔼_q(z|x)[log p(x|z)] + β · D_KL(q(z|x) || p(z))   (β = 1 for the standard VAE)

Reconstruction: make x̂ similar to x.
KL divergence: regularize latent distribution towards prior (usually N(0,1)).

Reparameterization Trick

z = μ + σ ⊙ ε, ε ~ N(0, I). Enables backpropagation through sampling.

Encoder outputs μ (mean) and log σ² (log-variance); sample z, then decode.

VAE Encoder with Reparameterization (PyTorch)
class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        # Encoder -> μ and logvar
        self.fc1 = nn.Linear(input_dim, 400)
        self.fc21 = nn.Linear(400, latent_dim)  # μ
        self.fc22 = nn.Linear(400, latent_dim)  # logvar
        # Decoder
        self.fc3 = nn.Linear(latent_dim, 400)
        self.fc4 = nn.Linear(400, input_dim)
    
    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc21(h), self.fc22(h)
    
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def decode(self, z):
        h = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h))
    
    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# VAE loss function
def vae_loss(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
β-VAE: increasing β (> 1) strengthens latent regularization, leading to more disentangled representations.
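A minimal training step tying these pieces together might look like this. The helper is generic over any model that, like the VAE above, returns (reconstruction, μ, log σ²):

```python
import torch

def vae_train_step(model, optimizer, loss_fn, x_batch):
    """One optimization step: forward pass, negative-ELBO loss, backprop, update."""
    model.train()
    optimizer.zero_grad()
    recon, mu, logvar = model(x_batch)   # model returns (x_hat, mu, logvar)
    loss = loss_fn(recon, x_batch, mu, logvar)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Called per batch, e.g. vae_train_step(model, optimizer, vae_loss, x_batch) with the vae_loss defined above.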

Modern & Advanced Autoencoders

Convolutional Autoencoders

Use Conv2D, TransposeConv2D for images. Essential for vision tasks.

VQ-VAE (Vector Quantized)

Discrete latent space via a learned codebook. Used in high-fidelity generation (VQ-VAE-2, WaveNet-based audio decoders).

Adversarial Autoencoders

GAN-based regularization to match prior distribution.

β-VAE

Disentanglement: factorized latent representations.

Ladder VAE

Hierarchical latent variables.

Diffusion Autoencoders

Hybrid of diffusion and autoencoding.

Real-World Applications

Dimensionality Reduction

Non-linear PCA. Visualize high-dim data (t-SNE alternative). Pretraining for supervised tasks.

Anomaly Detection

Train on normal data; anomalies have high reconstruction error. Used in fraud, industrial inspection.

Image Denoising

DAEs remove noise from photographs, medical scans.

Inpainting

Fill missing regions in images.

Molecule Generation

VAEs generate novel molecular structures.

Feature Disentanglement

Separate content/style in images (β-VAE).

Anomaly detection pipeline: Train AE on normal samples → reconstruction error threshold → flag outliers.
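This pipeline can be sketched as follows. The helper names are illustrative; the sketch assumes a model that returns (x̂, z) like the UndercompleteAE above:

```python
import torch

def reconstruction_errors(model, x):
    """Per-sample MSE between inputs and their reconstructions."""
    model.eval()
    with torch.no_grad():
        x_hat, _ = model(x)  # assumes (reconstruction, latent) output
        return ((x - x_hat) ** 2).mean(dim=1)

def fit_threshold(train_errors, quantile=0.95):
    """Threshold = a high quantile of reconstruction error on normal data."""
    return torch.quantile(train_errors, quantile)

def flag_anomalies(model, x, threshold):
    """Samples reconstructed worse than the threshold are flagged as anomalous."""
    return reconstruction_errors(model, x) > threshold
```

The quantile is a tunable trade-off between false positives and missed anomalies; validate it on held-out labeled anomalies when available.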

Framework Implementations

TensorFlow / Keras
import tensorflow as tf

# Convolutional autoencoder
class ConvAE(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28,28,1)),
            tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(16)  # latent
        ])
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Dense(7*7*64, activation='relu'),
            tf.keras.layers.Reshape((7,7,64)),
            tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(1, 3, padding='same', activation='sigmoid')
        ])
    
    def call(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = ConvAE()
model.compile(optimizer='adam', loss='mse')
PyTorch (Convolutional VAE)
class ConvVAE(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        # Encoder: Conv2d -> μ, logvar (assumes 32×32 inputs: 32→16→8→4 spatial)
        self.enc_conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten()
        )
        self.fc_mu = nn.Linear(128*4*4, latent_dim)
        self.fc_logvar = nn.Linear(128*4*4, latent_dim)
        
        # Decoder
        self.dec_fc = nn.Linear(latent_dim, 128*4*4)
        self.dec_conv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid()
        )
    # ... reparameterization, forward, loss

Latent Space Arithmetic & Interpolation

Autoencoders learn meaningful latent spaces. VAEs produce smooth, continuous manifolds.

🧮 Latent vector arithmetic: z = z_smiling - z_neutral + z_male → generates smiling male (word2vec style).
✨ Interpolation: z = (1-α)·z₁ + α·z₂. Decode to see smooth morphing between images.
Latent space interpolation (PyTorch)
def interpolate(model, x1, x2, steps=10):
    """x1, x2: input images, model: VAE"""
    model.eval()
    with torch.no_grad():
        z1 = model.encode(x1)[0]  # mu
        z2 = model.encode(x2)[0]
        alphas = torch.linspace(0, 1, steps)
        interpolates = []
        for alpha in alphas:
            z = (1 - alpha) * z1 + alpha * z2
            x_hat = model.decode(z)
            interpolates.append(x_hat)
    return interpolates

Autoencoder Variants – Cheatsheet

Undercomplete: compression
Sparse: feature selection
Denoising: robustness
Contractive: local invariance
VAE: generative modeling
β-VAE: disentanglement
VQ-VAE: discrete latents
AAE: adversarial regularization

Autoencoder Variants Comparison

Variant        | Latent Space               | Loss Terms                       | Primary Use
Undercomplete  | Continuous, deterministic  | Reconstruction                   | Dimensionality reduction
Sparse AE      | Continuous + sparsity      | Reconstruction + L1/KL           | Interpretable features
Denoising AE   | Continuous, robust         | Reconstruction (corrupted input) | Noise removal, pretraining
Contractive AE | Continuous + Jacobian norm | Reconstruction + ||J_f||²_F      | Invariant representations
VAE            | Probabilistic (Gaussian)   | Reconstruction + KL              | Generation, interpolation
VQ-VAE         | Discrete codebook          | Reconstruction + commitment      | High-fidelity generation

Autoencoder Pitfalls & Debugging

⚠️ Overcomplete AE copies input: use regularization (sparse, denoising, dropout) or a smaller bottleneck.
⚠️ VAE posterior collapse: the decoder ignores the latent code. Use KL annealing, free bits, or β < 1.
⚠️ Blurry VAE outputs: use perceptual loss, adversarial training (VAE-GAN), or discrete/hierarchical latents (VQ-VAE, hierarchical VAEs).
✅ Monitor reconstruction error: it should decrease during training; compare train/test error when setting an anomaly-detection threshold.