Context-Free Grammar Tutorial Section

Context-Free Grammar

Learn the mathematical rules of Context-Free Grammars used to define and evaluate Constituency structure in language.

Context-Free Grammars (CFG)

A Context-Free Grammar (CFG) is a mathematical system of strict rewriting rules used to generate syntactically valid sentences in a language. It forms the backbone of Constituency Parsing.

It is called "context-free" because the left-hand side of every rule consists of a single non-terminal symbol, meaning the rule can be applied regardless of the surrounding context.

Anatomy of a CFG

A simple CFG for English contains rules written as A → B C (meaning "A consists of B and C").

Lexicon Rules (Emits Words) Det → "the" | "a"
Noun → "dog" | "cat"
Verb → "chased" | "saw"
Prep → "with" | "in"

Structural Rules (Combines phrases) S → NP VP
NP → Det Noun
VP → Verb NP

Generating a Sentence

Starting from the ROOT state S (Sentence), we apply the substitution rules from top to bottom:

1. S

2. NP VP

3. Det Noun VP

4. "the" "dog" VP

5. "the" "dog" Verb NP

6. "the" "dog" "chased" Det Noun

7. "The dog chased the cat."

The Problem with Hand-Coded CFGs: Linguistic ambiguity is massive. You can write 1,000 rules, and an English sentence will still find a way to be ambiguous (e.g., prepositional attachment: "I shot an elephant in my pajamas"). Therefore, modern NLP uses Probabilistic Context-Free Grammars (PCFGs), adding mathematical probabilities learned from data to every rule to pick the "most likely" valid tree.

Previous: Syntax & Parsing