NLP Tutorial

Linguistics & Language Fundamentals

Core linguistics, syntax structure, and semantic analysis foundations for NLP.

Linguistics Basics

Linguistics Basics for NLP

To effectively process natural language with computers, we need to understand how human language is structured. Linguistics provides the foundation for NLP algorithms.

The Levels of Linguistic Analysis

Human language is highly structured and can be analyzed at multiple levels of abstraction, from individual sounds to overall meaning in context. The five levels below are the ones most relevant to NLP.

1. Phonetics and Phonology (Sounds)

The study of linguistic sounds and how they are organized.

Example: A phoneme is the smallest unit of sound that distinguishes one word from another. Swapping one phoneme for another can change the word.
Replacing the /b/ in "bat" with /k/ yields "cat" (the letter 'c' here spells the sound /k/).
NLP App: Speech-to-Text, Voice Assistants (Siri).
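
The "bat" / "cat" contrast is what linguists call a minimal pair: two words whose phoneme sequences differ in exactly one position. A minimal sketch of that check follows. The transcriptions are hand-written, simplified assumptions for illustration, not the output of a pronunciation dictionary.

```python
def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Hand-written, simplified phonemic transcriptions (assumed for illustration).
BAT = ["b", "æ", "t"]
CAT = ["k", "æ", "t"]
BAD = ["b", "æ", "d"]

print(is_minimal_pair(BAT, CAT))  # True: only the first phoneme differs
print(is_minimal_pair(CAT, BAD))  # False: two phonemes differ
```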

2. Morphology (Word Parts)

The study of the internal structure of words.

Example: A Morpheme is the smallest meaningful unit. Consider the word "unbelievable":
un- (prefix meaning not) + believe (root) + -able (suffix meaning capable of)
NLP App: Stemming, Lemmatization, and Subword Tokenization.
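
To make the "unbelievable" breakdown concrete, here is a toy affix stripper. The prefix and suffix lists are small hypothetical samples, and the approach is deliberately naive; real stemmers and lemmatizers use far richer rules or dictionaries.

```python
# Tiny, hypothetical affix lists chosen only for this illustration.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["able", "ness", "ing", "ed"]


def segment(word):
    """Split a word into (prefix, stem, suffix) by stripping one known affix from each end."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Require a non-empty stem so that a bare suffix is not stripped.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


print(segment("unbelievable"))  # ('un', 'believ', 'able')
```

Note that the stem comes out as "believ", not "believe": like a stemmer, this sketch only cuts characters, whereas a lemmatizer would restore the dictionary form. It also over-segments words such as "under", which is one reason practical systems rely on dictionaries or learned subword vocabularies.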

3. Syntax (Sentence Structure)

The rules that govern how words combine to form phrases and sentences (grammar and word order).

Example: Parse trees break down sentence structure.
"The cat sat on the mat." → [Noun Phrase (The cat)] + [Verb Phrase (sat on the mat)].
NLP App: Part-of-Speech Tagging, Dependency Parsing.
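
A parse tree can be stored as a nested structure. The sketch below encodes the tree for "The cat sat on the mat" as nested tuples and prints it in bracketed form. The tree is written by hand, and the tag names (DT, NN, VBD, IN) follow the common Penn Treebank convention; both are assumptions made for illustration rather than parser output.

```python
# Phrases are (label, child, child, ...); leaves are (tag, word).
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NN", "cat")),
    ("VP",
        ("VBD", "sat"),
        ("PP", ("IN", "on"), ("NP", ("DT", "the"), ("NN", "mat")))),
)


def is_leaf(node):
    """A leaf pairs a tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Collect the words of the tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render the tree in bracketed (Lisp-style) notation."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"


print(leaves(TREE))
print(bracketed(TREE))
```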

4. Semantics (Meaning)

The study of meaning in words and sentences.

Example: Polysemy (words with multiple meanings) makes NLP hard.
"I went to the bank to deposit money" vs "I sat down by the river bank."
NLP App: Word Embeddings (Word2Vec), Named Entity Recognition.
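
One classic way to pick between the two senses of "bank" is to compare the words around it with a short definition of each sense, in the spirit of the simplified Lesk algorithm. The sketch below is a toy version: the sense names, glosses, and stopword list are hand-written assumptions, not entries from a real lexical database.

```python
# Hand-written glosses (assumed for illustration only).
SENSES = {
    "financial_institution": "an institution that accepts deposits of money and makes loans",
    "river_side": "the sloping land beside a river or lake",
}
STOPWORDS = {"the", "a", "an", "to", "of", "and", "i", "by", "or", "that"}


def tokens(text):
    """Lowercase the text, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence)
    return max(senses, key=lambda s: len(context & tokens(senses[s])))


print(disambiguate("I went to the bank to deposit money"))  # financial_institution
print(disambiguate("I sat down by the river bank."))        # river_side
```

Exact word overlap is brittle ("deposit" does not match "deposits" here), which is part of the motivation for the embedding-based methods mentioned above.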

5. Pragmatics and Discourse (Context)

The study of how context influences the interpretation of meaning, such as sarcasm, pronouns, and intent.

Example: "Can you pass the salt?"
Literal meaning: Are you physically capable of lifting the salt?
Pragmatic meaning: Please give me the salt.
NLP App: Sentiment Analysis, Conversational Chatbots.
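
The gap between literal and intended meaning can be illustrated with a deliberately crude rule: treat questions that open with "can you", "could you", and similar phrases as requests. The pattern and the three labels below are hypothetical choices for this sketch, not a standard taxonomy.

```python
import re

# Assumed surface pattern for indirect requests; real systems use context, not just wording.
REQUEST_PATTERN = re.compile(r"^(can|could|would|will) you\b", re.IGNORECASE)


def speech_act(utterance):
    """Guess whether an utterance is a request, a question, or a statement."""
    text = utterance.strip()
    if REQUEST_PATTERN.match(text):
        return "request"
    if text.endswith("?"):
        return "question"
    return "statement"


print(speech_act("Can you pass the salt?"))     # request
print(speech_act("Is the salt on the table?"))  # question
print(speech_act("Can you swim?"))              # request (wrong: this is a genuine question)
```

The last call shows why pragmatics is hard: the same surface form can be a request or a real question, and only context tells them apart.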

Syntax & Parsing

Syntax and Sentence Parsing

Sentences in a natural language are not random lists of words: they follow hierarchical grouping rules called syntax. A sentence's syntax determines how words group into phrases, which in turn act as units of meaning.

Parsing is the algorithmic process of automatically extracting this underlying syntactic structure from a stream of text data.

Syntactically Valid

"Colorless green ideas sleep furiously"

Noam Chomsky introduced this sentence in Syntactic Structures (1957) to illustrate that a sentence can be grammatical without being meaningful, which suggests that syntax can be studied independently of semantics. The sentence makes no literal sense, but it follows the rules of English grammar.

Syntactically Invalid

"Furiously sleep ideas green colorless"

This sentence contains exactly the same words in reverse order, but it violates English syntax rules. A parser built on an English grammar cannot assign it a well-formed structure.
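
The contrast between the two word orders can be checked mechanically. The sketch below tags each word using a tiny hand-written lexicon and tests the tag sequence against one assumed sentence pattern (any number of adjectives, a noun, a verb, an optional adverb). It is a regular-expression approximation covering only this example, not a full grammar of English.

```python
import re

# Hypothetical five-word lexicon; unknown words raise KeyError.
LEXICON = {
    "colorless": "ADJ",
    "green": "ADJ",
    "ideas": "NOUN",
    "sleep": "VERB",
    "furiously": "ADV",
}

# Assumed pattern: ADJ* NOUN VERB (ADV)
SENTENCE_PATTERN = re.compile(r"(ADJ )*NOUN VERB( ADV)?")


def is_grammatical(sentence):
    """Return True if the sentence's tag sequence matches the toy pattern."""
    tags = [LEXICON[word.lower()] for word in sentence.split()]
    return SENTENCE_PATTERN.fullmatch(" ".join(tags)) is not None


print(is_grammatical("Colorless green ideas sleep furiously"))  # True
print(is_grammatical("Furiously sleep ideas green colorless"))  # False
```

The check never looks at what the words mean, only at their categories and order, which is the sense in which syntax operates separately from semantics.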

Why do we parse?

Parsing exposes the structural relationships that many downstream tasks rely on:

  • Question Answering: Parsing identifies who did what to whom. It distinguishes "John hit Bob" from "Bob hit John", even though both sentences contain the same words.
  • Machine Translation: Languages differ in their basic word order. English is typically Subject-Verb-Object (SVO), while Japanese is typically Subject-Object-Verb (SOV). Syntax-based translation systems parse the English sentence so that its structure can be mapped onto the Japanese order; neural systems learn such reorderings without an explicit parse.
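
Both uses can be sketched with dependency triples of the form (head, relation, dependent). The triples below are written by hand using the Universal Dependencies labels nsubj (subject) and obj (object); a real pipeline would obtain them from a parser. The reordering function only rearranges the English words into SOV order to show the structural mapping; it does not translate anything.

```python
# Hand-written dependency triples: (head, relation, dependent).
DEPS_A = [("hit", "nsubj", "John"), ("hit", "obj", "Bob")]
DEPS_B = [("hit", "nsubj", "Bob"), ("hit", "obj", "John")]


def roles(deps):
    """Extract the verb, its subject, and its object from the triples."""
    found = {}
    for head, relation, dependent in deps:
        if relation in ("nsubj", "obj"):
            found["verb"] = head
            found[relation] = dependent
    return found


def to_sov(deps):
    """Reorder a simple SVO clause into Subject-Object-Verb order."""
    r = roles(deps)
    return [r["nsubj"], r["obj"], r["verb"]]


print(roles(DEPS_A)["nsubj"])  # John did the hitting
print(roles(DEPS_B)["nsubj"])  # Bob did the hitting
print(to_sov(DEPS_A))          # ['John', 'Bob', 'hit']
```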

The Two Main Types of Parsing

Computational Semantics

Semantics is the subfield of linguistics concerned with meaning. While syntax asks "Is this sentence grammatically well-formed?", semantics asks the harder question: "What does this sentence actually mean?"

Syntactically Valid, Semantically Odd

"Colorless green ideas sleep furiously."

The grammar is correct, but the sentence has no coherent literal meaning. It violates basic semantic constraints: something green cannot be colorless, and ideas cannot sleep.

Semantically Rich

"The quick fox jumped over the lazy dog."

Valid grammar AND a clearly grounded, compositional meaning. We can visualize it, react to it, and reason about its truth.

Core Areas of Computational Semantics

1. Lexical Semantics

Studies the meaning of individual words and how they relate to each other.

  • Synonymy: "big" ≈ "large"
  • Antonymy: "hot" ↔ "cold"
  • Hyponymy (Is-A): "poodle" IS-A "dog"
  • Polysemy: "bank" (river bank vs. savings bank)
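
Relations such as hyponymy are often stored as a graph that can be walked. Here is a minimal sketch using a hand-written IS-A table; the entries are assumptions for illustration, whereas resources such as WordNet provide these links at scale.

```python
# Each word maps to its direct hypernym (assumed, acyclic toy hierarchy).
HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def is_a(word, category):
    """Return True if `category` is reachable from `word` by following IS-A links."""
    current = word
    while current in HYPERNYM:
        current = HYPERNYM[current]
        if current == category:
            return True
    return False


print(is_a("poodle", "dog"))     # True
print(is_a("poodle", "animal"))  # True: IS-A is transitive
print(is_a("dog", "poodle"))     # False: the relation is not symmetric
```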

2. Compositional Semantics

The meaning of a phrase is built from the meanings of its parts and the way those parts are combined.

Meaning("big dog") = Combine(Meaning("big"), Meaning("dog"))

The Principle of Compositionality (often called Frege's Principle) states that the meaning of a complex expression is determined by the meanings of its constituents and the rules used to combine them.
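
One common way to make this concrete is to treat a noun as the set of things it describes and an adjective as a function that narrows such a set. The toy world below, with its entity names and properties, is entirely hypothetical. Treating "big" as a simple filter is also a simplification, since size adjectives are really relative to the noun they modify.

```python
# Hypothetical toy world: entity -> properties that hold of it.
WORLD = {
    "rex": {"dog", "big"},
    "fifi": {"dog"},
    "tom": {"cat", "big"},
}


def noun(prop):
    """A noun denotes the set of entities that have the property."""
    return {entity for entity, props in WORLD.items() if prop in props}


def adjective(prop):
    """An adjective denotes a function that filters a set of entities."""
    return lambda entities: {e for e in entities if prop in WORLD[e]}


dog = noun("dog")       # {'rex', 'fifi'}
big = adjective("big")  # function from sets to sets
big_dog = big(dog)      # meaning of the phrase, built from its parts

print(sorted(big_dog))  # ['rex']
```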

Semantic Role Labeling (SRL)

SRL is the task of assigning semantic roles (such as agent, patient, and recipient) to the arguments of each predicate in a sentence. It answers the question: who did what to whom, where, when, and how?

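SRL with AllenNLP (Example)
# Assumes the allennlp and allennlp-models packages are installed.
from allennlp.predictors.predictor import Predictor

# Download and load the pretrained BERT-based SRL model.
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
)

result = predictor.predict(
    sentence="John carefully gave the book to Mary."
)

# Each entry in result["verbs"] describes one predicate and its labeled arguments.
for verb in result["verbs"]:
    print(verb["verb"], "->", verb["description"])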
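
SRL taggers commonly label each token with a BIO tag (B- begins a role, I- continues it, O is outside any role). The sketch below groups such tags into role spans. The tag sequence is written by hand in PropBank style for the example sentence; it is an assumption about what a reasonable labeling looks like, not recorded model output.

```python
def bio_to_spans(words, tags):
    """Group BIO-tagged tokens into (role, text) spans."""
    spans = []
    label, chunk = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new role starts; close any open one first.
            if label:
                spans.append((label, " ".join(chunk)))
            label, chunk = tag[2:], [word]
        elif tag.startswith("I-") and label == tag[2:]:
            chunk.append(word)
        else:
            # "O" or an inconsistent I- tag: close any open role.
            if label:
                spans.append((label, " ".join(chunk)))
            label, chunk = None, []
    if label:
        spans.append((label, " ".join(chunk)))
    return spans


WORDS = ["John", "carefully", "gave", "the", "book", "to", "Mary", "."]
# Hand-written, hypothetical tags (not actual model output).
TAGS = ["B-ARG0", "B-ARGM-MNR", "B-V", "B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2", "O"]

for role, text in bio_to_spans(WORDS, TAGS):
    print(f"{role}: {text}")
```

Under this labeling, ARG0 is the giver, ARG1 the thing given, ARG2 the recipient, and ARGM-MNR the manner, which lines up with the who, what, to whom, and how questions.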