Text Coherence
Explore what makes a text smooth and readable: from local cohesion (sentence-to-sentence) to global coherence (overall document structure).
Understanding Text Coherence
While cohesion deals with the grammatical and lexical ties between individual sentences (e.g. pronouns), coherence refers to the logical and semantic flow of the entire document. A coherent text feels like a unified whole where every sentence contributes meaningfully to the overall theme.
Local Cohesion
How adjacent sentences connect using reference (pronouns), substitution, ellipsis, and conjunctions. It's the "surface level" connectivity.
Global Coherence
The high-level organization. Does the text follow a logical progression (e.g. Chronological, Problem-Solution, General-to-Specific)?
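Local cohesion can be approximated crudely with lexical overlap between adjacent sentences. The sketch below is a minimal illustration of that idea only; the stopword list and tokenization are simplified assumptions, and real systems would also resolve pronouns and detect connectives.

```python
# Minimal sketch of surface-level cohesion: lexical overlap between
# adjacent sentences (assumed simplification; no pronoun resolution).
STOPWORDS = {"the", "a", "an", "it", "is", "was", "and", "of", "on", "in"}

def content_words(sentence):
    """Lowercase tokens minus stopwords and basic punctuation."""
    tokens = sentence.lower().replace(".", "").replace(",", "").split()
    return {t for t in tokens if t not in STOPWORDS}

def cohesion_score(sentences):
    """Mean Jaccard word overlap between each adjacent sentence pair."""
    scores = []
    for prev, curr in zip(sentences, sentences[1:]):
        a, b = content_words(prev), content_words(curr)
        scores.append(len(a & b) / len(a | b) if a | b else 0.0)
    return sum(scores) / len(scores)

doc = ["The dog chased the ball.",
       "The dog caught the ball.",
       "Quantum physics is hard."]
print(cohesion_score(doc))  # high overlap for the first pair, none for the second
```

The third sentence shares no content words with its neighbor, so it drags the score down, mirroring the intuition that an off-topic sentence breaks surface-level connectivity.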
Computational Models of Coherence
Measuring coherence automatically is vital for tasks like automated essay grading and summarization evaluation. Key computational models include:
The Entity Grid Model
Introduced by Barzilay and Lapata (2008), this model represents a document as a grid where rows are sentences and columns are entities. Each cell tracks the grammatical role (Subject, Object, None) of an entity in a sentence.
| Sentence | Elon Musk | Tesla | SpaceX |
|---|---|---|---|
| S1 | Subject | Object | - |
| S2 | Subject | - | Object |
| S3 | Subject | - | - |
A coherent text will show patterns of entity transitions (e.g., Subject → Subject) that are statistically likely in well-written documents.
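The grid above can be turned into a transition distribution with a few lines of code. In this sketch the grammatical roles are hard-coded from the table (a real entity-grid system would obtain them from a syntactic parser and a coreference resolver):

```python
# Sketch of the entity-grid "signature": count role transitions between
# adjacent sentences. Roles are hand-annotated here; a real pipeline
# would derive them with a parser and coreference resolution.
from collections import Counter

# S = subject, O = object, "-" = entity absent (one column per entity)
grid = {
    "Elon Musk": ["S", "S", "S"],
    "Tesla":     ["O", "-", "-"],
    "SpaceX":    ["-", "O", "-"],
}

def transition_distribution(grid):
    """Probability of each role transition (e.g. S->S) across
    adjacent sentences, pooled over all entity columns."""
    counts = Counter()
    for roles in grid.values():
        for prev, curr in zip(roles, roles[1:]):
            counts[(prev, curr)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

print(transition_distribution(grid))
# ("S", "S") dominates: the subject persists, a hallmark of coherent text
```

A coherence model then compares this distribution against transition statistics learned from well-written documents.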
Centering Theory
Centering Theory (Grosz, Joshi & Weinstein, 1995) tracks the most salient entity that is "in focus" as the reader moves from sentence to sentence. It predicts that coherence is highest when:
- The "center" (the most salient entity) of one sentence is realized as the subject of the next.
- The center shifts as rarely as possible between adjacent sentences.
Sentence 1: "The dog barked loudly." → Center = Dog
Sentence 2: "It ran across the park." → Center = Dog ✅ (High Coherence!)
Sentence 2': "The cat slept on the sofa." → Center shifts to Cat ❌ (Low Coherence)
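A toy Centering-style transition classifier can make this concrete. In this sketch, the entities of each sentence are listed by salience (subject first) and pronouns are assumed to be already resolved (e.g. "It" → "dog"); the transition names loosely follow Grosz, Joshi & Weinstein (1995):

```python
# Toy Centering-style check. Entities per sentence are hand-listed by
# salience (subject first); pronoun resolution is assumed done upstream.
def transition(prev_entities, curr_entities):
    """Classify the move between two sentences: CONTINUE (most coherent),
    RETAIN (center mentioned but demoted), or NO-CB (no shared entity)."""
    # Backward-looking center: highest-ranked current entity also in the
    # previous sentence.
    cb = next((e for e in curr_entities if e in prev_entities), None)
    if cb is None:
        return "NO-CB"       # nothing carries over: least coherent
    if cb == curr_entities[0]:
        return "CONTINUE"    # the center is still the subject
    return "RETAIN"          # the center survives but loses prominence

print(transition(["dog"], ["dog", "park"]))    # CONTINUE (high coherence)
print(transition(["dog"], ["cat", "sofa"]))    # NO-CB (low coherence)
```

This matches the example above: "It ran across the park." continues the dog as subject, while "The cat slept on the sofa." abandons the center entirely.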
Measuring Coherence with Perplexity
A simple neural proxy for coherence: a language model assigns lower perplexity to text whose sentence order it finds predictable, so comparing the perplexity of a paragraph against a shuffled version gives a rough coherence signal.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load GPT-2 and its tokenizer
model_id = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
model.eval()

def perplexity(text):
    """Exponentiated language-model loss: lower = more predictable text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Coherent paragraph
coherent = "The moon is Earth's only natural satellite. It formed 4.5 billion years ago. Scientists believe it resulted from a giant impact."

# The same sentences in shuffled (incoherent) order
incoherent = "Scientists believe it resulted from a giant impact. The moon is Earth's only natural satellite. It formed 4.5 billion years ago."

print(f"Coherent Perplexity: {perplexity(coherent):.2f} (LOWER = MORE COHERENT)")
print(f"Incoherent Perplexity: {perplexity(incoherent):.2f} (HIGHER = LESS COHERENT)")

# Example output (exact values vary with model and library versions):
# Coherent Perplexity: 43.21 (LOWER = MORE COHERENT)
# Incoherent Perplexity: 89.56 (HIGHER = LESS COHERENT)
```