
Co-occurrence Matrix

Understand how words appearing together within a specific context window create rich statistical models of meaning.


Bag-of-Words and TF-IDF create Document-Term matrices (Documents as rows, Words as columns). In contrast, a Co-occurrence Matrix creates a Word-Word matrix (Words as rows, Words as columns). It captures how often two different words appear together within a specific "window" distance in a sentence.

This follows the distributional hypothesis, famously summarized by linguist J.R. Firth: "You shall know a word by the company it keeps." Words that appear in similar contexts tend to share semantic meaning.

How it works: The Context Window

Assume a corpus with one sentence: "deep learning is incredibly exciting"

If we set our Window Size = 1 (look 1 word left, 1 word right), we scan the text:

  • Focus on "learning": Left is "deep", Right is "is".
  • Add 1 to the cells (learning, deep) and (learning, is) in the matrix.
             deep  learning  is  incredibly  exciting
deep            0         1   0           0         0
learning        1         0   1           0         0
is              0         1   0           1         0
incredibly      0         0   1           0         1
exciting        0         0   0           1         0
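The windowed scan above can be sketched in Python. This is a minimal illustration, not code from the tutorial; the function name and the pair-dictionary representation are my own choices.

```python
from collections import defaultdict

def cooccurrence_matrix(tokens, window=1):
    """Count how often each word pair appears within `window` positions."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the focus word itself
                counts[(word, tokens[j])] += 1
    return counts

tokens = "deep learning is incredibly exciting".split()
counts = cooccurrence_matrix(tokens, window=1)
print(counts[("learning", "deep")])  # 1 (one position apart)
print(counts[("deep", "is")])        # 0 (two positions apart, outside window)
```

Storing pairs in a dictionary rather than a dense V x V array is also the usual first step toward the sparse representation mentioned under Disadvantages below.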
Advantages
  • Preserves semantic relationships between words (unlike BoW).
  • Row vectors from this matrix have geometric meaning: synonyms cluster together in vector space.
  • Forms the mathematical basis for GloVe embeddings and Latent Semantic Analysis (LSA).
Disadvantages
  • Memory Intensive: Matrix size is Vocab x Vocab. If V = 100,000, you need an array with 10 billion elements! (Usually requires sparse matrices.)
  • Requires dimensionality reduction, typically Singular Value Decomposition (SVD), to be practically useful in modeling.
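The SVD step can be sketched with NumPy on the toy matrix from this section. The choice of k = 2 dimensions is arbitrary here, purely for illustration.

```python
import numpy as np

# Toy co-occurrence matrix from the tutorial (window = 1).
# Rows/columns: deep, learning, is, incredibly, exciting
M = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Truncated SVD: keep the top-k singular directions as dense word vectors.
k = 2
U, S, Vt = np.linalg.svd(M)
word_vectors = U[:, :k] * S[:k]  # each row is a k-dimensional embedding
print(word_vectors.shape)        # (5, 2)
```

Each word is now a short dense vector instead of a sparse V-dimensional row, which is essentially what LSA does on larger matrices.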
Note on Context Window: A small window size (1-2) identifies words that are grammatically interchangeable (e.g., "dog" and "cat" are both followed by "barks" or "meows"). A large window size (5-10) identifies topically related words (e.g., "doctor" and "hospital" appear in the same paragraph but not necessarily adjacent).
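The effect of window size can be demonstrated with a hypothetical sentence (the sentence and counting function below are illustrative, not from the tutorial):

```python
from collections import defaultdict

def cooccurrence(tokens, window):
    """Count word pairs appearing within `window` positions of each other."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts

tokens = "the doctor treated patients at the hospital".split()
small = cooccurrence(tokens, window=1)
large = cooccurrence(tokens, window=5)

# "doctor" and "hospital" are 5 positions apart: invisible to a small
# window, but captured as topically related by a large one.
print(small[("doctor", "hospital")])  # 0
print(large[("doctor", "hospital")])  # 1
```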