Constituency Parsing Tutorial Section

Constituency Parsing

Group words into phrase structures (noun phrases, verb phrases) and visualize the result as a constituency tree.

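Constituency parsing (also called phrase-structure parsing) organizes a sentence into nested groups of words called constituents. The groupings follow the rules of a grammar, typically a Context-Free Grammar (CFG) or, when competing parses need to be ranked, a Probabilistic Context-Free Grammar (PCFG). The output is a tree in which each phrase contains the smaller phrases and words it is built from.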
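
To make the idea of "grouping by grammar rules" concrete, here is a minimal sketch of a top-down parser in plain Python. The grammar and lexicon are hypothetical and hand-written for a single sentence, the rules carry no probabilities (so this is a plain CFG, not a PCFG), and the approach does not handle left-recursive rules. Production parsers typically use chart-based or neural methods instead.

```python
# Toy grammar and lexicon: assumptions made up for this illustration only.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj", "Noun"], ["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
}
LEXICON = {
    "Det": {"the"},
    "Adj": {"clever"},
    "Noun": {"dog", "ball"},
    "Verb": {"chased"},
}


def parse(symbol, tokens, start):
    """Yield (tree, end) for every way `symbol` can cover tokens[start:end]."""
    if symbol in LEXICON:
        # Part-of-speech tag: it must match exactly one word.
        if start < len(tokens) and tokens[start].lower() in LEXICON[symbol]:
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start)


def expand(symbol, rhs, tokens, start):
    """Match the right-hand side of one rule, symbol by symbol."""
    # Each partial analysis is (children found so far, position reached).
    partials = [([], start)]
    for child_symbol in rhs:
        partials = [
            (children + [subtree], end)
            for children, pos in partials
            for subtree, end in parse(child_symbol, tokens, pos)
        ]
    for children, end in partials:
        yield (symbol, *children), end


def parse_sentence(sentence):
    """Return every parse of the whole sentence as nested tuples."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


trees = parse_sentence("The clever dog chased the ball")
print(trees[0])
```

Each tree is a nested tuple whose first element is the label and whose remaining elements are the children, which mirrors the nesting of phrases described above.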

Visualizing the Tree Structure

In a constituency tree, the root and interior nodes are non-terminals: phrase labels such as NP and VP, and part-of-speech tags such as Det and Noun. The leaf nodes are the actual words of the sentence.

"The clever dog chased the ball"

S
├─ NP
│  ├─ Det (The)
│  ├─ Adj (clever)
│  └─ Noun (dog)
└─ VP
   ├─ Verb (chased)
   └─ NP
      ├─ Det (the)
      └─ Noun (ball)
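
As a rough illustration of how such a diagram relates to the underlying data, the sketch below stores the tree as nested tuples and renders it with box-drawing characters. The tuple layout and the helper names (`is_preterminal`, `label_of`, `render`, `leaves`) are assumptions chosen for this example, not part of any library.

```python
# The example sentence as nested tuples: (label, child, child, ...).
# A part-of-speech node holds a single word: (tag, word).
TREE = (
    "S",
    ("NP", ("Det", "The"), ("Adj", "clever"), ("Noun", "dog")),
    ("VP", ("Verb", "chased"), ("NP", ("Det", "the"), ("Noun", "ball"))),
)


def is_preterminal(node):
    """True for a part-of-speech node, which wraps exactly one word."""
    return len(node) == 2 and isinstance(node[1], str)


def label_of(node):
    """Text shown for a node: 'Tag (word)' for words, the label otherwise."""
    return f"{node[0]} ({node[1]})" if is_preterminal(node) else node[0]


def render(node, prefix=""):
    """Return the lines drawn beneath `node`, one per descendant."""
    lines = []
    children = [] if is_preterminal(node) else node[1:]
    for i, child in enumerate(children):
        last = i == len(children) - 1
        lines.append(prefix + ("└─ " if last else "├─ ") + label_of(child))
        lines.extend(render(child, prefix + ("   " if last else "│  ")))
    return lines


def leaves(node):
    """Collect the words at the leaves, left to right."""
    if is_preterminal(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


print("\n".join([label_of(TREE)] + render(TREE)))
print(leaves(TREE))
```

Reading the leaves from left to right always reproduces the original sentence, which is a handy sanity check on any constituency tree.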

Core Phrase Types

NP (Noun Phrase): A noun together with its determiners and modifiers. It typically serves as the subject or object of a sentence. e.g., "The big red ball"
VP (Verb Phrase): The predicate. It contains the verb and its dependents (such as direct objects and modifying phrases). e.g., "chased the ball across the yard"
PP (Prepositional Phrase): A preposition followed by its object, usually an NP. It typically modifies a verb or a noun, much as an adverb or adjective would. e.g., "into the box"

Common NLP Use Cases

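Coreference Resolution: Identifying candidate noun phrases (NPs) so that mentions of the same entity can be linked.
Chomskyan Linguistics: Testing phrase-structure analyses against real sentences. Context-free grammars correspond to Type 2 of the Chomsky hierarchy.
Information Extraction: Pulling self-contained phrases and clauses out of dense text such as legal documents.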
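
Several of these use cases start by pulling every phrase of a given type out of a parse. The following is a minimal sketch of that step using NLTK's `Tree.subtrees`, assuming the parse is already available as a bracketed string; the helper name `phrases` is hypothetical.

```python
from nltk import Tree

# A ready-made parse of the example sentence (assumed, not produced here).
tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)


def phrases(tree, label):
    """Return the text of every subtree whose label equals `label`."""
    matching = tree.subtrees(lambda t: t.label() == label)
    return [" ".join(sub.leaves()) for sub in matching]


noun_phrases = phrases(tree, "NP")
verb_phrases = phrases(tree, "VP")
print(noun_phrases)  # ['The clever dog', 'the ball']
print(verb_phrases)  # ['chased the ball']
```

The extracted noun phrases could then serve as candidate mentions for coreference resolution or as chunks for information extraction.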

Visualizing Trees using NLTK

from nltk import Tree

# Assume we already have a parse as a bracketed (Penn Treebank-style) string
tree_string = "(S (NP (Det The) (Adj clever) (Noun dog)) (VP (Verb chased) (NP (Det the) (Noun ball))))"

# Parse the string into an NLTK Tree object
tree = Tree.fromstring(tree_string)

# Print the bracketed form of the tree
print(tree)

# Print a readable ASCII-art hierarchy
tree.pretty_print()

# You can also open a graphical window (requires Tkinter and a display):
# tree.draw()
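
Beyond printing, an NLTK `Tree` can be inspected programmatically. The sketch below, which rebuilds the same example tree so that it runs on its own, shows a few accessors: the root label, the words, the (word, tag) pairs, and the grammar rules the tree implies.

```python
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)

# Label of the root node
root_label = tree.label()

# The words of the sentence, left to right
words = tree.leaves()

# (word, tag) pairs taken from the nodes directly above the leaves
tagged = tree.pos()

# The context-free rules used to build this tree, in top-down order
rules = [str(production) for production in tree.productions()]

print(root_label)
print(words)
print(tagged)
for rule in rules:
    print(rule)
```

The list of productions ties the tree back to the grammar: each interior node corresponds to one rule such as `S -> NP VP`.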
