Constituency Parsing
Group words into phrases (noun phrases, verb phrases) and visualize the result as a constituency tree.
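Constituency parsing (also called phrase-structure parsing) organizes a sentence into nested phrases according to the rules of a grammar. Classically this is a context-free grammar (CFG) or, when several analyses compete, a Probabilistic Context-Free Grammar (PCFG), which attaches a probability to each rule so the parser can pick the most likely analysis. The output is a tree whose nesting mirrors the phrase hierarchy of the sentence.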
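As a rough illustration of how a PCFG drives parsing, the sketch below defines a toy grammar and asks NLTK's ViterbiParser for the most probable tree. The grammar rules and their probabilities are invented for this example; a real PCFG would be estimated from a treebank.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Toy grammar: the probabilities for each left-hand side must sum to 1.
# All rules and numbers here are illustrative assumptions.
grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det Adj Noun [0.5] | Det Noun [0.5]
VP -> Verb NP [1.0]
Det -> 'The' [0.5] | 'the' [0.5]
Adj -> 'clever' [1.0]
Noun -> 'dog' [0.5] | 'ball' [0.5]
Verb -> 'chased' [1.0]
""")

parser = ViterbiParser(grammar)
tokens = "The clever dog chased the ball".split()

# parse() yields the most probable tree for the token list.
best = next(parser.parse(tokens))
print(best)
print("probability:", best.prob())
```

The probability of a tree is the product of the probabilities of the rules it uses, which is what lets the parser rank competing analyses of an ambiguous sentence.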
Visualizing the Tree Structure
In a constituency tree, the root and interior nodes are non-terminals (phrase labels such as NP and VP, and part-of-speech tags such as Det and Noun), and the leaf nodes are the actual words of the sentence.
"The clever dog chased the ball"
S
├─ NP
│  ├─ Det (The)
│  ├─ Adj (clever)
│  └─ Noun (dog)
└─ VP
   ├─ Verb (chased)
   └─ NP
      ├─ Det (the)
      └─ Noun (ball)
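To make the node roles concrete, here is a minimal sketch that stores the same tree as nested tuples and walks it recursively. The tuple layout and the helper names leaves and render are assumptions made for this illustration, not part of any library.

```python
# Each interior node is a tuple: (label, child, child, ...).
# A plain string is a leaf, i.e. a word of the sentence.
tree = ("S",
        ("NP", ("Det", "The"), ("Adj", "clever"), ("Noun", "dog")),
        ("VP", ("Verb", "chased"),
               ("NP", ("Det", "the"), ("Noun", "ball"))))

def leaves(node):
    """Return the words at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def render(node, depth=0):
    """Return one indented line per non-terminal, with words shown inline."""
    label, *children = node
    indent = "  " * depth
    # A part-of-speech node has exactly one child, and that child is a word.
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{indent}{label} ({children[0]})"]
    lines = [indent + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print(" ".join(leaves(tree)))
print("\n".join(render(tree)))
```

Reading the leaves from left to right always recovers the original sentence, which is a handy sanity check on any constituency tree.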
Core Phrase Types
- NP (Noun Phrase): A noun together with its modifiers. It typically serves as the subject or object of a sentence. e.g., "The big red ball"
- VP (Verb Phrase): The predicate. It contains the verb and its dependents (such as direct objects and modifiers). e.g., "chased the ball across the yard"
- PP (Prepositional Phrase): A preposition followed by its object, usually an NP. It generally modifies a verb or a noun, much as an adverb or adjective would. e.g., "into the box"
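Once a sentence is parsed, pulling out every phrase of a given type is a short tree traversal. The sketch below uses NLTK's Tree.subtrees with a label filter. The bracketed parse (including the Prep tag) and the helper name phrases are assumptions made for this example.

```python
from nltk import Tree

# A hand-written parse that contains all three phrase types.
tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball)) "
    "(PP (Prep across) (NP (Det the) (Noun yard)))))"
)

def phrases(tree, label):
    """Return the text of every subtree whose root carries the given label."""
    matches = tree.subtrees(lambda t: t.label() == label)
    return [" ".join(subtree.leaves()) for subtree in matches]

for label in ("NP", "VP", "PP"):
    print(label, phrases(tree, label))
```

Note that phrases nest: the NP "the yard" sits inside the PP, which in turn sits inside the VP.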
Common NLP Use Cases
- Coreference Resolution: Finding noun phrases (NPs) that refer to the same entity.
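- Chomskyan Linguistics: Phrase-structure trees originate in Chomsky's generative grammar, and the context-free grammars behind them form one level of the Chomsky hierarchy of formal languages.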
- Information Extraction: Extracting distinct semantic chunks from dense legal documents.
Visualizing Trees using NLTK
import nltk
# Assume we already have a parse in bracketed (Penn Treebank-style) notation
tree_string = "(S (NP (Det The) (Adj clever) (Noun dog)) (VP (Verb chased) (NP (Det the) (Noun ball))))"
# Parse the string into an NLTK Tree object
tree = nltk.Tree.fromstring(tree_string)
# Print the tree in bracketed form
print(tree)
# Print a readable ASCII-art hierarchy
tree.pretty_print()
# You can also open a graphical window (requires Tkinter and a display):
# tree.draw()
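Beyond printing, an NLTK Tree object can be inspected programmatically. The following sketch, which reuses the same bracketed string, reads off the root label, the words, the (word, tag) pairs, and the grammar rules the tree instantiates. The variable name rules is just a convenience for this example.

```python
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)

print(tree.label())   # label of the root node
print(tree.leaves())  # the words, left to right
print(tree.pos())     # (word, part-of-speech tag) pairs

# Each interior node corresponds to one grammar rule (production).
rules = [str(production) for production in tree.productions()]
for rule in rules:
    print(rule)
```

Collecting productions over many parsed sentences and counting them is one common way to estimate the rule probabilities of a PCFG.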