Tensors, autograd, modules, data loaders, training loops, and GPU moves are here for research and product models. Follow the deep learning roadmap for theory and architecture context while PyTorch syntax stays on this cheatsheet.

Deep learning learning roadmap — PyTorch sits in the deep learning track alongside architecture study.

Installation & Setup

Installation

# Install PyTorch with CUDA (GPU support)
# Visit https://pytorch.org for latest commands
pip install torch torchvision torchaudio

# CPU-only version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# With specific CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install with conda
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Verification
import torch
import torchvision

# Check PyTorch version
print(torch.__version__)

# Check CUDA availability
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))

Basic Setup

# Common imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as transforms

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Set random seed for reproducibility
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
Best Practice: Always set random seeds for reproducibility and explicitly specify device for tensors and models.

Tensors

Tensor Creation

# From Python lists
t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([[1, 2], [3, 4]])

# Special tensors
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
eye = torch.eye(3)
rand = torch.rand(2, 3)
randn = torch.randn(2, 3)

# With specific data type
t_float32 = torch.tensor([1, 2, 3], dtype=torch.float32)
t_int64 = torch.tensor([1, 2, 3], dtype=torch.int64)

# On specific device
t_gpu = torch.tensor([1, 2, 3], device='cuda')
Common Data Types

torch.float32 - 32-bit floating point

torch.float64 - 64-bit floating point

torch.int32 - 32-bit integer

torch.int64 - 64-bit integer

torch.bool - Boolean

Tensor Operations

# Basic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Element-wise operations
add = a + b
sub = a - b
mul = a * b
div = a / b

# Matrix multiplication
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 4)
matmul = torch.matmul(mat1, mat2)

# Reduction operations
x = torch.tensor([[1, 2], [3, 4]])
sum_all = x.sum()
sum_dim0 = x.sum(dim=0)
mean_all = x.mean()
max_val = x.max()
# Reshaping and manipulation
x = torch.randn(2, 3, 4)

# Reshape
reshaped = x.reshape(6, 4)

# Transpose
transposed = x.transpose(0, 1)

# Squeeze and unsqueeze
squeezed = x.squeeze()
unsqueezed = x.unsqueeze(0)

# Concatenation
cat = torch.cat([x, x], dim=0)

Neural Networks

Model Definition

# Simple Neural Network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Create model instance
model = SimpleNN(784, 128, 10)
model = model.to(device)
# Using Sequential
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
)

Layers & Activations

Common Layers

nn.Linear - Fully connected layer

nn.Conv2d - 2D convolutional layer

nn.LSTM - Long Short-Term Memory

nn.BatchNorm2d - Batch normalization

nn.Dropout - Dropout layer

Activation Functions

nn.ReLU() - Rectified Linear Unit

nn.Sigmoid() - Sigmoid function

nn.Tanh() - Hyperbolic tangent

nn.Softmax() - Softmax function

nn.LeakyReLU() - Leaky ReLU

# CNN Example
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

Training

Training Loop

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        
        # Forward pass
        output = model(data)
        loss = criterion(output, target)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

Loss & Optimizers

Common Loss Functions

nn.MSELoss() - Mean Squared Error (Regression)

nn.CrossEntropyLoss() - Cross Entropy (Classification)

nn.BCELoss() - Binary Cross Entropy

nn.L1Loss() - Mean Absolute Error

nn.NLLLoss() - Negative Log Likelihood

Optimizers

optim.SGD - Stochastic Gradient Descent

optim.Adam - Adaptive Moment Estimation

optim.RMSprop - Root Mean Square Propagation

optim.Adagrad - Adaptive Gradient Algorithm

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# In training loop:
# scheduler.step() after optimizer.step()

Data Loading

# Using built-in datasets
from torchvision import datasets, transforms

# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load datasets
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

Custom Dataset

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data
        self.labels = labels
        self.transform = transform
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        sample = self.data[idx]
        label = self.labels[idx]
        
        if self.transform:
            sample = self.transform(sample)
        
        return sample, label
Note: Always implement __len__ and __getitem__ methods for custom datasets.

Advanced Features

GPU Acceleration

# Check CUDA availability
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f'Using GPU: {torch.cuda.get_device_name(0)}')
else:
    device = torch.device('cpu')
    print('Using CPU')

# Move model to device
model = model.to(device)

# Move tensors to device
x = x.to(device)
y = y.to(device)

# Multiple GPUs
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
# Gradient accumulation
accumulation_steps = 4

for i, (inputs, labels) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss = loss / accumulation_steps
    loss.backward()
    
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Model Saving & Loading

# Save model
torch.save(model.state_dict(), 'model.pth')

# Load model
model = SimpleNN(784, 128, 10)
model.load_state_dict(torch.load('model.pth'))
model.eval()

# Save entire model
torch.save(model, 'model_complete.pth')
model = torch.load('model_complete.pth')

# Save checkpoint
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}
torch.save(checkpoint, 'checkpoint.pth')
# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# or model.train()

Additional Resources

Learning Resources

  • Official Tutorials: PyTorch Official Tutorials
  • Courses: Deep Learning with PyTorch, Fast.ai
  • Books: "Deep Learning with PyTorch", "PyTorch Pocket Reference"
  • Documentation: PyTorch Docs, Torchvision Docs
  • Communities: PyTorch Forums, Stack Overflow, GitHub

Useful Tools

  • Visualization: TensorBoard, Matplotlib
  • Debugging: PyTorch Debugger (pdb), TorchScript
  • Deployment: TorchServe, ONNX, LibTorch
  • Libraries: Torchvision, Torchaudio, Torchtext
  • Extensions: PyTorch Lightning, Fast.ai, Hugging Face
Quick reference guide

Comprehensive PyTorch Deep Learning Cheatsheet Reference

This PyTorch Deep Learning cheatsheet on Nikhil Learn Hub collects syntax, commands, and practical snippets for quick revision. Discover PyTorch tensors, neural networks, training workflows, and model-building techniques with practical code examples.

Use the reference cards and examples above during coding sessions; return here instead of scattered searches when you need dependable reminders. Follow the Deep learning learning roadmap when you want structured lessons beyond one-page lookups.

Quick lookup coverage

  • Syntax, commands, and API signatures
  • Copy-ready examples and common patterns
  • Terminology for coursework and interviews
  • Cross-links to the matching learning roadmap

How to study with this sheet

  • Production debugging and tuning reminders
  • Security, performance, or scale cautions
  • Integration with adjacent stacks on this site
  • Deeper study through tutorials and roadmaps

Who Should Use This Cheatsheet

Students, self-taught developers, and professionals who need fast PyTorch Deep Learning lookups during labs, debugging, or interview revision should keep this page bookmarked.