Discourse Analysis Tutorial Section

Discourse Analysis

Go beyond individual sentences: understand how paragraphs and multi-sentence texts create structured meaning through discourse relations.

Discourse Analysis Overview

All the NLP techniques we've studied so far operate primarily at the sentence level. Discourse Analysis zooms out to study how sentences connect and relate to form coherent paragraphs, conversations, documents, and entire texts.

A sequence of grammatically perfect individual sentences does not automatically make a coherent text. Discourse analysis identifies the hidden logical glue that holds connected text together.

An Example of Discourse Breakdown:
"Roses are red. Quantum mechanics describes particle physics. My cat is named Whiskers."
Three perfectly valid sentences. Zero discourse coherence. A good discourse model will assign this a very low coherence score.
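To make the idea of a coherence score concrete, here is a minimal sketch that uses word overlap between adjacent sentences as a crude stand-in for coherence. The function names and the Jaccard-overlap heuristic are assumptions chosen for illustration; real discourse models rely on far richer signals than shared vocabulary.

```python
import re

def sentence_tokens(sentence):
    """Return the set of lowercase words in one sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def overlap_coherence(text):
    """Mean Jaccard word overlap between adjacent sentences (0.0 to 1.0).

    This is a toy proxy, not an established coherence metric.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < 2:
        return 0.0
    scores = []
    for left, right in zip(sentences, sentences[1:]):
        a, b = sentence_tokens(left), sentence_tokens(right)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

incoherent = ("Roses are red. Quantum mechanics describes particle physics. "
              "My cat is named Whiskers.")
related = "My cat is named Whiskers. Whiskers is a very playful cat."

print(overlap_coherence(incoherent))  # no shared words between neighbours
print(overlap_coherence(related))     # neighbours share "cat", "is", "whiskers"
```

The three unrelated sentences share no vocabulary, so the toy score bottoms out, while the related pair scores higher. A heuristic like this is easy to fool, because coherent text often links sentences without repeating any words.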

Rhetorical Structure Theory (RST): The Backbone of Discourse

RST is one of the most influential theories in computational discourse analysis. It proposes that a coherent text can be represented as a hierarchical tree in which spans of text, acting as nuclei and satellites, are linked by specific rhetorical relations.

Nucleus

The core element: the most essential piece of information. If it is removed, the text loses its main point. In the sentence "The system crashed [N] because of a memory overflow [S]", the Nucleus is the crash event.

Satellite

The supporting element: it elaborates on the nucleus or fills in context around it, but is not itself the main point. In the crash example, the cause ("memory overflow") is the Satellite.
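The nucleus/satellite distinction can be sketched as a small tree structure. The `Node` class and the `nuclei_text` helper below are hypothetical names invented for this illustration, not part of any RST toolkit. The helper keeps only the text reachable through nuclei, which mirrors the idea that satellites can be dropped without losing the main point.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    role: str                        # "N" (nucleus) or "S" (satellite)
    text: Optional[str] = None       # set on leaves
    relation: Optional[str] = None   # set on internal nodes
    children: List["Node"] = field(default_factory=list)

def nuclei_text(node):
    """Collect leaf text reachable by following nucleus children only."""
    if node.text is not None:
        return [node.text]
    parts = []
    for child in node.children:
        if child.role == "N":
            parts.extend(nuclei_text(child))
    return parts

# The crash example: one CAUSE relation over a nucleus and a satellite.
tree = Node(
    role="N",
    relation="CAUSE",
    children=[
        Node(role="N", text="The system crashed"),
        Node(role="S", text="because of a memory overflow"),
    ],
)

print(nuclei_text(tree))  # the satellite is dropped, the crash event remains
```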

Common Rhetorical (Discourse) Relations

Relation | Meaning | Example Connecting Words
CAUSE | Satellite is the reason for the Nucleus event. | "because", "due to"
CONTRAST | Two nuclei are presented as opposing ideas. | "however", "but", "whereas"
ELABORATION | Satellite gives more detail about the Nucleus. | "specifically", "for example"
EVIDENCE | Satellite provides factual support for the Nucleus claim. | "as shown by", "data indicates"
CONCESSION | Satellite acknowledges something that seems to conflict with the Nucleus. | "although", "even though"
CONDITION | Nucleus event is conditional upon the Satellite. | "if", "provided that", "unless"
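The connecting words in the table suggest a simple lookup-based heuristic. The sketch below is only a rough illustration: the `CUE_PHRASES` dictionary and `guess_relations` function are names invented here. Cue words are often ambiguous, and many relations are signalled without any connective at all.

```python
import re

# Cue phrases copied from the table above.
CUE_PHRASES = {
    "CAUSE": ["because", "due to"],
    "CONTRAST": ["however", "but", "whereas"],
    "ELABORATION": ["specifically", "for example"],
    "EVIDENCE": ["as shown by", "data indicates"],
    "CONCESSION": ["although", "even though"],
    "CONDITION": ["if", "provided that", "unless"],
}

def guess_relations(text):
    """Return every relation whose cue phrase appears as a whole word/phrase."""
    lowered = text.lower()
    found = []
    for relation, cues in CUE_PHRASES.items():
        if any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in cues):
            found.append(relation)
    return found

print(guess_relations("The system crashed because of a memory overflow."))
print(guess_relations("Although it rained, we went out."))
print(guess_relations("Roses are red."))
```

Word-boundary matching keeps short cues such as "if" and "but" from firing inside longer words. It does not resolve genuine ambiguity, such as "but" used in a non-contrastive sense.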

Discourse Segmentation: EDUs

The first step in computational discourse analysis is breaking the text into minimal discourse units called Elementary Discourse Units (EDUs), which are typically individual clauses.

Segmenting into EDUs

"The company's profits fell sharply last year, largely because they failed to innovate, and subsequently they had to lay off 500 employees."

Segmented into 3 EDUs:

  EDU 1: "The company's profits fell sharply last year,"
  EDU 2: "largely because they failed to innovate,"
  EDU 3: "and subsequently they had to lay off 500 employees."

Relations: EDU 2 is the CAUSE of EDU 1; EDU 3 is the RESULT of EDU 1.
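The segmentation above can be approximated with a very naive rule: split after a comma when the next clause opens with a connective, optionally preceded by one word such as "largely". The cue list, the `BOUNDARY` pattern, and `segment_edus` are assumptions made for this sketch. Real EDU segmenters use syntactic or learned features, and this rule will also split ordinary comma-separated lists that happen to contain "and".

```python
import re

# Hypothetical, deliberately small cue list for clause-opening connectives.
CUES = r"(?:because|and|but|although|whereas|if|unless)"

# Split on whitespace that follows a comma and precedes a cue word,
# allowing one extra word (e.g. an adverb) in front of the cue.
BOUNDARY = re.compile(r"(?<=,)\s+(?=(?:\w+\s+)?" + CUES + r"\b)")

def segment_edus(sentence):
    """Split a sentence into rough EDUs at comma + connective boundaries."""
    return [part.strip() for part in BOUNDARY.split(sentence) if part.strip()]

sentence = ("The company's profits fell sharply last year, largely because "
            "they failed to innovate, and subsequently they had to lay off "
            "500 employees.")

edus = segment_edus(sentence)
for number, edu in enumerate(edus, start=1):
    print(number, edu)

# Relations copied from the analysis above, as (from EDU, relation, to EDU),
# using the same 1-based EDU numbering.
relations = [(2, "CAUSE", 1), (3, "RESULT", 1)]
```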