Word Embeddings

Transition from sparse matrices to dense, continuous, low-dimensional vector spaces capable of capturing complex meaning.

Introduction to Word Embeddings

We've looked at One-Hot, Bag-of-Words (BoW), and TF-IDF encoding. All of these generate Sparse Vectors (mostly zeros) whose length equals the size of the vocabulary, often 50,000+ dimensions. Word Embeddings marked a paradigm shift starting in 2013: migrating from Sparse Vectors to Dense Vectors.

Sparse Vector (One-Hot)

"King" = [0, 0, 1, 0, 0, 0, 0, 0, 0....]

"Man" = [0, 0, 0, 0, 0, 1, 0, 0, 0....]

Length: 50,000+

Similarity: 0.0 (No overlap)

Dense Vector (Embedding)

"King" = [0.98, 0.45, -0.6, 0.12, 0.8]

"Man" = [0.93, 0.41, -0.9, 0.15, 0.3]

Length: Fixed size (e.g., 300)

Similarity: High (vectors point the same way)
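The contrast above can be checked numerically. Here is a minimal pure-Python sketch using the example vectors from this section (the truncated one-hot vectors are padded out with zeros for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Sparse one-hot vectors: a single 1 at each word's vocabulary index.
one_hot_king = [0, 0, 1, 0, 0, 0, 0, 0, 0]
one_hot_man  = [0, 0, 0, 0, 0, 1, 0, 0, 0]

# Dense embedding vectors (illustrative values from this section).
dense_king = [0.98, 0.45, -0.6, 0.12, 0.8]
dense_man  = [0.93, 0.41, -0.9, 0.15, 0.3]

print(cosine_similarity(one_hot_king, one_hot_man))           # 0.0 — no overlap
print(round(cosine_similarity(dense_king, dense_man), 2))     # 0.92 — very similar
```

The one-hot vectors share no nonzero position, so their similarity is exactly zero no matter how related the words are; the dense vectors point in nearly the same direction.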

How Dense Embeddings Work

Rather than counting words, an embedding model uses Neural Networks to map words into a continuous geometric space. Each dimension (number) in the fixed-length vector captures some latent semantic feature (e.g., gender, royalty, color, sentiment), though individual dimensions are rarely directly interpretable on their own.

  • Because the values are dense (small floating-point numbers rather than mostly zeros), they compress vast vocabulary context into just a few hundred dimensions (e.g., 300).
  • Cosine Similarity — the cosine of the angle between two word vectors — measures how conceptually similar the words are.
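The "latent feature" idea can be illustrated with hand-built toy vectors. Below, hypothetical 3-dimensional embeddings are constructed so the dimensions stand for gender, royalty, and plurality (real embeddings learn such features implicitly across many dimensions, not one feature per slot):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors: [gender (male=-1, female=+1), royalty, plurality].
vectors = {
    "king":  [-1.0, 0.9, 0.0],
    "queen": [ 1.0, 0.9, 0.0],
    "man":   [-1.0, 0.0, 0.0],
    "woman": [ 1.0, 0.0, 0.0],
}

# The famous analogy: king - man + woman ≈ queen
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((w for w in vectors if w != "king"), key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```

Subtracting "man" removes the male direction, adding "woman" adds the female direction, and the royalty dimension is preserved, so the nearest remaining vector is "queen".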

The Classic "Big 3" Static Embeddings

1. Word2Vec (2013)

Developed by Google

A predictive model that uses a shallow Neural Network to predict a word from its neighbors (the CBOW architecture) or a word's neighbors from the word itself (Skip-gram).
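The "predict from neighbors" setup starts by generating (target, context) training pairs with a sliding window. A minimal pure-Python sketch of Skip-gram pair generation (window size and whitespace tokenization are simplifying assumptions; the neural network that trains on these pairs is omitted):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs: each word is paired with every
    neighbor up to `window` positions away. A Skip-gram model is then
    trained to predict the context word given the target word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the king wears a crown".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs)
```

With `window=1`, "king" yields the pairs ("king", "the") and ("king", "wears") — the model learns that these words belong together.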

2. GloVe (2014)

Developed by Stanford

A count-based model that performs matrix factorization on a gigantic global word Co-occurrence Matrix to derive vectors.
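The global matrix GloVe factorizes can be built by counting how often word pairs appear within a context window of each other. A toy sketch (real GloVe uses distance-weighted counts over a huge corpus, and the factorization step is omitted here):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered (word, context) pair appears
    within `window` positions of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

corpus = "the king rules the kingdom".split()
counts = cooccurrence_counts(corpus, window=1)
print(counts[("the", "king")])  # 1 — "the" appears next to "king" once
```

GloVe then finds vectors whose dot products approximate (the logarithms of) these global counts, so words with similar co-occurrence patterns end up with similar vectors.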

3. FastText (2016)

Developed by Facebook AI

An extension of Word2Vec that trains on sub-word character N-grams (e.g., "apple" = "app", "ppl", "ple"). Because unseen words still share N-grams with known words, it can handle out-of-vocabulary words and spelling errors!
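The sub-word idea can be sketched as plain character n-gram extraction. (Real FastText wraps each word in "<" and ">" boundary markers and uses a range of n-gram lengths; this sketch uses unadorned 3-grams to match the example above.)

```python
def char_ngrams(word, n=3):
    """Slide an n-character window over the word to get its sub-word units.
    A word's vector is built from the vectors of its n-grams, so unknown
    or misspelled words still get meaningful vectors via shared n-grams."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(char_ngrams("apple"))  # ['app', 'ppl', 'ple']

# A typo still overlaps with the correct spelling's n-grams:
print(set(char_ngrams("aple")) & set(char_ngrams("apple")))  # {'ple'}
```

That overlap is why FastText can assign a sensible vector to "aple" even though the exact string never appeared in training.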

*Note: Since 2018, static embeddings have largely been superseded by Contextual Embeddings (e.g., from BERT and modern LLMs), where a word's vector changes depending on its surrounding sentence, though static embeddings remain vital for lightweight tasks.