Constituency Parsing Tutorial Section

Constituency Parsing

Group words into phrase structures (Noun Phrases, Verb Phrases) and visualize them as constituency trees.

Constituency parsing (also called phrase-structure parsing) organizes a sentence into nested phrases according to a phrase-structure grammar, classically a context-free grammar (CFG) or its probabilistic variant (PCFG). The output is a tree whose nesting shows which words group together into phrases.
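
As a minimal sketch of how a PCFG drives parsing, the following uses NLTK's Viterbi parser on a toy grammar. The grammar, its probabilities, and the names toy_grammar and best_tree are made up for illustration; a practical parser would use a grammar estimated from a treebank.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Hypothetical toy grammar: the rules and probabilities are illustrative only.
# The probabilities of all rules that share a left-hand side must sum to 1.
toy_grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Adj Noun [0.4] | Det Noun [0.6]
    VP -> Verb NP [1.0]
    Det -> 'the' [1.0]
    Adj -> 'clever' [1.0]
    Noun -> 'dog' [0.5] | 'ball' [0.5]
    Verb -> 'chased' [1.0]
""")

parser = ViterbiParser(toy_grammar)

# Tokens are lower-cased because the toy lexicon only contains 'the'
tokens = "the clever dog chased the ball".split()

# parse() yields the most probable tree for the token sequence
best_tree = next(parser.parse(tokens))

print(best_tree)         # the tree in bracketed notation
print(best_tree.prob())  # product of the probabilities of the rules used
```

The probability of a tree is the product of the probabilities of the rules it uses, here 0.4 × 0.5 × 0.6 × 0.5 = 0.06.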

Visualizing the Tree Structure

In a constituency tree, the root and interior nodes are non-terminals (phrase labels such as NP and VP, and word-category labels such as Det and Noun), and the leaf nodes are the actual words of the sentence.

"The clever dog chased the ball"
S
├─ NP
│  ├─ Det (The)
│  ├─ Adj (clever)
│  └─ Noun (dog)
└─ VP
   ├─ Verb (chased)
   └─ NP
      ├─ Det (the)
      └─ Noun (ball)
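
The split between non-terminal nodes and leaf words can be inspected programmatically. The sketch below assumes NLTK is installed and writes the same sentence in bracketed notation; the variable names are illustrative.

```python
from nltk import Tree

# The example sentence above, written in bracketed notation
tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)

root_label = tree.label()   # label of the root node
words = tree.leaves()       # leaf nodes (the words), left to right
tagged = tree.pos()         # (word, label of the node directly above it) pairs

# Labels of the root and every interior node, in top-down, left-to-right order
node_labels = [subtree.label() for subtree in tree.subtrees()]

print(root_label)
print(words)
print(tagged)
print(node_labels)
```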

Core Phrase Types
  • NP (Noun Phrase): A noun together with its determiners and modifiers. It typically serves as the subject or object of a sentence, e.g., "The big red ball"
  • VP (Verb Phrase): The predicate. It contains the verb and its dependents (such as direct objects), e.g., "chased the ball across the yard"
  • PP (Prepositional Phrase): A preposition followed by its object, usually an NP. It typically modifies a verb (adverbial use) or a noun (adjectival use), e.g., "into the box"
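
To make the three phrase types concrete, the sketch below collects every phrase with a given label from a hand-written tree. The sentence, the Prep label, and the helper phrases_with_label are assumptions made for this example, not part of NLTK.

```python
from nltk import Tree

# Hand-written parse of a sentence that contains all three phrase types
tree = Tree.fromstring(
    "(S (NP (Det The) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball)) "
    "(PP (Prep into) (NP (Det the) (Noun box)))))"
)

def phrases_with_label(tree, label):
    """Return the text of every subtree whose node label equals `label`."""
    matches = tree.subtrees(lambda t: t.label() == label)
    return [" ".join(subtree.leaves()) for subtree in matches]

noun_phrases = phrases_with_label(tree, "NP")
verb_phrases = phrases_with_label(tree, "VP")
prep_phrases = phrases_with_label(tree, "PP")

print(noun_phrases)
print(verb_phrases)
print(prep_phrases)
```

Note that phrases nest: the NP "the box" sits inside the PP, which in turn sits inside the VP.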
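
Common NLP Use Cases
  • Coreference Resolution: Identifying candidate noun phrases (NPs) and deciding which of them refer to the same entity.
  • Chomskyan Linguistics: Phrase-structure trees come from the generative grammar tradition, and context-free grammars form one level (Type 2) of the Chomsky hierarchy of formal grammars.
  • Information Extraction: Pulling phrase-level chunks (for example, NPs and PPs) out of dense text such as legal documents.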
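
Tasks such as coreference resolution and information extraction usually need to know where each phrase sits in the sentence, not only its text. The sketch below records token offsets for every NP; the helper np_spans and the (start, end, text) span format are assumptions for illustration, and deciding which NPs actually corefer is a separate step not shown here.

```python
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)

def np_spans(tree):
    """Return (start, end, text) for every NP, with end-exclusive token offsets."""
    spans = []

    def walk(node, start):
        # A leaf is a plain string and covers exactly one token
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        if node.label() == "NP":
            spans.append((start, end, " ".join(node.leaves())))
        return end

    walk(tree, 0)
    return sorted(spans)

candidates = np_spans(tree)
for start, end, text in candidates:
    print(start, end, text)
```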

Visualizing Trees using NLTK
import nltk

# Assume we already have a parse in bracketed notation,
# for example the output of a parser such as Stanford CoreNLP
tree_string = "(S (NP (Det The) (Adj clever) (Noun dog)) (VP (Verb chased) (NP (Det the) (Noun ball))))"

# Parse the string into an NLTK Tree object
tree = nltk.Tree.fromstring(tree_string)

# print() shows the tree in indented bracketed notation
print(tree)

# pretty_print() draws the hierarchy as ASCII art
tree.pretty_print()

# You can also open a graphical window (requires Tkinter):
# tree.draw()
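
A tree can also be read back as the list of grammar rules it uses, which connects the visualization to the grammar-based definition given earlier. This is a small sketch assuming NLTK; the variable names are illustrative.

```python
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP (Det The) (Adj clever) (Noun dog)) "
    "(VP (Verb chased) (NP (Det the) (Noun ball))))"
)

# One production per non-terminal node, in top-down, left-to-right order
rules = tree.productions()

for rule in rules:
    print(rule)
```

Each printed line has the form "left-hand side -> right-hand side", with words shown in quotes.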