Coreference Resolution

Learn how NLP systems identify all the expressions in a text that refer to the same real-world entity, a key step in machine reading comprehension.

Coreference Resolution Overview

Human writing is full of pronouns and abbreviated references. We constantly use words like "he", "she", "it", "they", "the company", and "the researcher" to refer back to entities already introduced in the text. Coreference resolution is the task of finding all the expressions that point to the same real-world entity and grouping them into clusters.

Why It Matters

Without coreference resolution, a machine reading the paragraph "Elon Musk founded Tesla. He later started SpaceX. The entrepreneur is now the richest person in the world." has no way to know that "Elon Musk", "He", and "The entrepreneur" refer to one person, so it would treat them as three different entities. Coreference resolution merges them into a single entity cluster.

Key Terminology

Mention

Any noun phrase or pronoun in the text that could refer to an entity. Every "he", "she", "Amazon", and "the company" is a candidate mention that needs to be resolved.

Antecedent

An earlier mention that a later mention (often a pronoun) points back to. In "John ate his lunch", "John" is the antecedent of "his".

Coreference Chain

A complete cluster of all mentions that refer to the same entity.
Example, from the paragraph above: Chain #1 = {Elon Musk, He, The entrepreneur}.
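To make these terms concrete, here is a minimal Python sketch that stores the Elon Musk example as token spans. The hand-made token list, the `(start, end)` span convention, and the helper names `mention_text` and `antecedents` are illustrative assumptions, not part of any library. The sketch picks the closest preceding mention in the chain as each mention's antecedent; any earlier mention in the same chain would be an equally valid antecedent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tokenization of the example paragraph (punctuation split off by hand).
tokens = ["Elon", "Musk", "founded", "Tesla", ".",
          "He", "later", "started", "SpaceX", ".",
          "The", "entrepreneur", "is", "now", "the", "richest",
          "person", "in", "the", "world", "."]

Span = Tuple[int, int]  # (start, end) token offsets, end-exclusive

# Chain #1: "Elon Musk" (tokens 0-2), "He" (5-6), "The entrepreneur" (10-12)
chain: List[Span] = [(0, 2), (5, 6), (10, 12)]

def mention_text(span: Span) -> str:
    """Return the surface text of a mention span."""
    start, end = span
    return " ".join(tokens[start:end])

def antecedents(chain: List[Span]) -> Dict[Span, Optional[Span]]:
    """Link each mention to the closest preceding mention in the same chain.
    The first mention in the chain has no antecedent (None)."""
    ordered = sorted(chain)
    return {span: (ordered[i - 1] if i > 0 else None) for i, span in enumerate(ordered)}

print([mention_text(m) for m in chain])
for mention, ante in antecedents(chain).items():
    print(f"{mention_text(mention)} -> {mention_text(ante) if ante else None}")
# ['Elon Musk', 'He', 'The entrepreneur']
# Elon Musk -> None
# He -> Elon Musk
# The entrepreneur -> He
```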

A Worked Example

Input Text Analysis

"Amazon announced a new service today. The e-commerce giant said it will create 10,000 jobs. The company will begin hiring next quarter."

Resolved Coreference Chain
{Amazon, The e-commerce giant, it, The company}

All four mentions resolve to the same entity, Amazon. "A new service" is a different entity and is not part of this chain.
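One practical use of a resolved chain is to rewrite the text so that every mention is replaced by a single representative mention, similar in spirit to the `coref_resolved` attribute that neuralcoref adds to a spaCy `Doc`. The sketch below is a minimal, hypothetical implementation: the helpers `locate`, `main_mention`, and `resolve`, the small pronoun list, and the rule that the representative is the first non-pronoun mention are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical, deliberately small pronoun list used by the heuristic below.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}

def locate(text: str, phrases: List[str]) -> List[Tuple[int, int]]:
    """Find each phrase in order, starting each search after the previous match.
    Returns (start, end) character offsets, end-exclusive."""
    spans, cursor = [], 0
    for phrase in phrases:
        start = text.index(phrase, cursor)  # raises ValueError if not found
        spans.append((start, start + len(phrase)))
        cursor = start + len(phrase)
    return spans

def main_mention(text: str, chain: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Hypothetical heuristic: the first mention that is not a bare pronoun."""
    for start, end in chain:
        if text[start:end].lower() not in PRONOUNS:
            return (start, end)
    return chain[0]

def resolve(text: str, chain: List[Tuple[int, int]]) -> str:
    """Replace every other mention in the chain with the main mention's text."""
    m_start, m_end = main_mention(text, chain)
    main = text[m_start:m_end]
    # Edit right-to-left so offsets to the left stay valid after each replacement.
    for start, end in sorted(chain, reverse=True):
        if (start, end) != (m_start, m_end):
            text = text[:start] + main + text[end:]
    return text

text = ("Amazon announced a new service today. The e-commerce giant said it "
        "will create 10,000 jobs. The company will begin hiring next quarter.")
chain = locate(text, ["Amazon", "The e-commerce giant", "it", "The company"])
print(resolve(text, chain))
# Amazon announced a new service today. Amazon said Amazon will create 10,000 jobs. Amazon will begin hiring next quarter.
```

The right-to-left order matters: replacing a later mention changes only the characters after it, so the offsets of all earlier mentions remain correct.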

Modern Approach: Neural Mention-Ranking

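Strong modern systems use neural models, for example end-to-end span-ranking models built on a pretrained encoder such as SpanBERT. They treat text spans as candidate mentions, prune them to the most promising ones, and then, for each remaining mention, score every earlier candidate antecedent together with a dummy "no antecedent" option. Each mention is linked to its best-scoring antecedent (or to none), and following these links groups the mentions into coreference chains.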
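The ranking step can be sketched on the Amazon example. This is a toy under stated assumptions: the mentions are given rather than detected, the pairwise scores are made-up numbers standing in for a neural scorer, and the dummy "no antecedent" option gets a fixed score of 0, a convention used by end-to-end span-ranking models. Because each mention links only to an earlier one, a single left-to-right pass turns the links into clusters. The names `scores`, `best_antecedent`, and `cluster` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

mentions = ["Amazon", "a new service", "The e-commerce giant", "it", "The company"]

# Hypothetical pairwise scores s(i, j) for mention i and an earlier candidate j.
# In a real system these come from a neural scorer; here they are made up.
scores: Dict[Tuple[int, int], float] = {
    (1, 0): -2.0,
    (2, 0): 3.1, (2, 1): -1.5,
    (3, 0): 1.2, (3, 1): 0.8, (3, 2): 2.5,
    (4, 0): 2.7, (4, 1): -0.9, (4, 2): 1.9, (4, 3): 1.1,
}
DUMMY_SCORE = 0.0  # score of the "no antecedent" option

def best_antecedent(i: int) -> Optional[int]:
    """Return the highest-scoring earlier mention, or None if the dummy wins."""
    best, best_score = None, DUMMY_SCORE
    for j in range(i):
        s = scores.get((i, j), float("-inf"))
        if s > best_score:  # ties go to the dummy option
            best, best_score = j, s
    return best

def cluster(n: int) -> List[List[int]]:
    """Follow antecedent links left to right to build clusters of mention indices."""
    cluster_of: Dict[int, int] = {}
    clusters: List[List[int]] = []
    for i in range(n):
        j = best_antecedent(i)
        if j is None:  # no antecedent: start a new cluster
            cluster_of[i] = len(clusters)
            clusters.append([i])
        else:  # join the antecedent's cluster
            cluster_of[i] = cluster_of[j]
            clusters[cluster_of[j]].append(i)
    return [c for c in clusters if len(c) > 1]  # drop singletons

for c in cluster(len(mentions)):
    print([mentions[i] for i in c])
# ['Amazon', 'The e-commerce giant', 'it', 'The company']
```

Here "a new service" prefers the dummy antecedent, forms a singleton, and is dropped. "it" links to "The e-commerce giant" rather than directly to "Amazon", yet still lands in Amazon's cluster through the chain of links.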

Coreference Resolution with spaCy + neuralcoref
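# Setup: pip install neuralcoref
#        python -m spacy download en_core_web_sm
# Note: neuralcoref requires spaCy 2.x; it does not work with spaCy 3.
import spacy
import neuralcoref

nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)  # adds the coref component and the doc._.coref_* attributes

text = "Amazon announced a new service. The company said it will create 10,000 jobs."
doc = nlp(text)

if doc._.has_coref:
    print("Coreference Clusters Found:")
    for cluster in doc._.coref_clusters:
        # cluster.mentions: every span in the chain; cluster.main: its representative mention
        print(f"  Cluster: {[str(m) for m in cluster.mentions]}")
        print(f"  Main: '{cluster.main}'")
else:
    print("No coreference clusters found.")

# Example output (exact clusters may vary with model versions):
# Coreference Clusters Found:
#   Cluster: ['Amazon', 'The company', 'it']
#   Main: 'Amazon'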