Coreference resolution – short Q&A

20 questions and answers on coreference resolution, including mentions, anaphora, entity clusters and modern neural coreference models.

1

What is coreference resolution?

Answer: Coreference resolution identifies which expressions in a text (mentions) refer to the same real-world entity, grouping them into clusters that represent discourse entities.

2

What is an anaphor and what is an antecedent?

Answer: An anaphor is a referring expression such as a pronoun or definite NP whose interpretation depends on another expression, and its antecedent is the earlier mention that provides that interpretation.

3

What are mentions in coreference resolution?

Answer: Mentions are spans of text that can participate in coreference, typically noun phrases, pronouns, named entities or nominal expressions like “the company” or “its CEO.”

4

What is a coreference cluster?

Answer: A coreference cluster is a set of mentions that all refer to the same entity, such as {“Barack Obama”, “the president”, “he”} across several sentences in a document.

5

How is coreference resolution evaluated?

Answer: Evaluation compares predicted clusters against gold clusters using link- and mention-based metrics such as MUC, B³ and CEAF; the standard CoNLL score is the average of the MUC, B³ and CEAF_e F1 scores.
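
The B³ component, for instance, averages per-mention precision and recall over cluster overlaps. A minimal sketch, assuming clusters are given as sets of mention identifiers (the official CoNLL scorer additionally handles mention alignment and the other metrics):

```python
# B^3 (B-cubed): for each mention, precision is the fraction of its
# predicted cluster that lies in its gold cluster; recall swaps the
# denominator. Both are averaged over mentions.

def b_cubed(predicted, gold):
    """Return (precision, recall, F1) over mentions present in both."""
    pred_of = {m: c for c in predicted for m in c}   # mention -> its predicted cluster
    gold_of = {m: c for c in gold for m in c}        # mention -> its gold cluster
    mentions = set(pred_of) & set(gold_of)
    precision = sum(len(pred_of[m] & gold_of[m]) / len(pred_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(pred_of[m] & gold_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [{"Obama", "he", "the president"}, {"Michelle"}]
pred = [{"Obama", "he"}, {"the president", "Michelle"}]
p, r, f = b_cubed(pred, gold)
# p -> 0.75, r -> 0.666..., f -> 0.705...
```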

6

What is the difference between coreference and anaphora resolution?

Answer: Anaphora resolution focuses mainly on resolving pronouns or anaphoric expressions to antecedents, while coreference resolution handles all mention types and constructs full entity clusters across a discourse.

7

What features did early coreference systems use?

Answer: Traditional systems used lexical, syntactic, semantic and discourse features: string matching, head nouns, gender, number, animacy, grammatical roles, distance and compatibility of semantic types or named entity labels.

8

How do mention-pair models approach coreference?

Answer: Mention-pair models classify each pair of mentions as coreferent or not using feature-based or neural classifiers, then use clustering or transitivity to assemble entity clusters from pairwise decisions.
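
A toy illustration of the idea, with a hand-written compatibility check standing in for a trained classifier and union-find supplying the transitive closure:

```python
# Toy mention-pair pipeline: classify every pair, then merge pairwise
# "coreferent" decisions into clusters via union-find (transitivity).

def compatible(m1, m2):
    # placeholder "classifier": case-insensitive string match, plus one
    # hard-coded pronoun link for illustration
    return m1.lower() == m2.lower() or (m2.lower() == "she" and m1 == "Marie Curie")

def cluster(mentions):
    parent = list(range(len(mentions)))          # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i
    for j in range(len(mentions)):               # every pair (i, j), i < j
        for i in range(j):
            if compatible(mentions[i], mentions[j]):
                parent[find(j)] = find(i)        # merge the two clusters
    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    return list(groups.values())

clusters = cluster(["Marie Curie", "the lab", "she", "The Lab"])
# clusters -> [["Marie Curie", "she"], ["the lab", "The Lab"]]
```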

9

What is an entity-centric or mention-ranking approach?

Answer: Mention-ranking models rank candidate antecedents for each mention and select the best one (or decide the mention starts a new cluster), optimizing over antecedent choices rather than independent pair predictions; entity-centric models go further, building clusters incrementally and scoring each mention against partial entities instead of single antecedents.
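
A minimal sketch of the ranking step, with a made-up scoring function in place of a learned one (the dummy `None` candidate represents "start a new cluster"):

```python
# Mention-ranking: each mention i considers all preceding mentions plus
# a dummy antecedent and links to the argmax of the score.

def resolve(mentions, score):
    """score(i, j) -> float for antecedent j < i; j=None means new entity."""
    links = []
    for i in range(len(mentions)):
        candidates = [None] + list(range(i))     # None = start a new cluster
        best = max(candidates, key=lambda j: score(i, j))
        links.append(best)
    return links

# toy scorer (all numbers invented): the pronoun prefers the nearest
# antecedent; the dummy candidate always scores 0
def toy_score(i, j, mentions=("Ada", "the engineer", "she")):
    if j is None:
        return 0.0
    if mentions[i] == "she":
        return 1.0 / (i - j)                     # closer antecedents score higher
    if mentions[i] == "the engineer" and mentions[j] == "Ada":
        return 0.5
    return -1.0

links = resolve(["Ada", "the engineer", "she"], toy_score)
# links -> [None, 0, 1]: "the engineer" -> "Ada", "she" -> "the engineer"
```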

10

How do neural coreference models work?

Answer: Neural models encode contextual representations for tokens and mention spans (often using BiLSTMs or transformers), then compute scores for antecedent links and clusters, sometimes training end-to-end on cluster-level objectives.
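
The link-scoring idea can be sketched with toy span embeddings: the pair score combines per-span mention scores with a similarity term, and each span links to its best-scoring antecedent or to a dummy. This is a simplification of end-to-end span-ranking scorers; all vectors and numbers here are invented:

```python
# Simplified neural-style link scoring:
#   score(i, j) = s_m(i) + s_m(j) + emb_i . emb_j
# where s_m is a per-span "is this a mention?" score and the dot
# product stands in for a learned pairwise compatibility term.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def link_spans(embs, mention_scores):
    """Link each span to its best-scoring antecedent (None = dummy, score 0)."""
    links = []
    for i in range(len(embs)):
        best_j, best_s = None, 0.0               # dummy antecedent scores 0
        for j in range(i):
            s = mention_scores[i] + mention_scores[j] + dot(embs[i], embs[j])
            if s > best_s:
                best_j, best_s = j, s
        links.append(best_j)
    return links

embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]      # span 2 resembles span 0
ment = [0.1, -0.5, 0.2]                          # span 1 looks non-referential
links = link_spans(embs, ment)
# links -> [None, None, 0]: only span 2 links back, to span 0
```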

11

What challenges make coreference resolution difficult?

Answer: Coreference requires understanding world knowledge, discourse structure, subtle lexical and syntactic cues, and dealing with long-distance dependencies, ambiguous pronouns and non-referential expressions like pleonastic “it.”

12

What is non-referential “it” and why is it important?

Answer: Non-referential (pleonastic) “it” appears in constructions like “It is raining” or “It seems that...”; such pronouns do not refer to entities, so systems must detect and exclude them from coreference chains.
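
A rule-based sketch of pleonastic-"it" detection, using a few illustrative regular-expression patterns (real systems typically rely on learned classifiers or parse features, and these patterns will both over- and under-generate):

```python
# Flag likely pleonastic "it" with surface patterns: weather verbs,
# extraposition ("it seems that ..."), and "it is ADJ to ..." frames.
import re

PLEONASTIC_PATTERNS = [
    r"\bit\s+(is|was|seems|appears|looks)\s+(that|like|as if)\b",
    r"\bit\s+(is|was)\s+(raining|snowing|sunny|cold|late|clear)\b",
    r"\bit\s+(is|was)\s+\w+\s+to\b",             # "it is hard to say"
]

def is_pleonastic(sentence):
    s = sentence.lower()
    return any(re.search(p, s) for p in PLEONASTIC_PATTERNS)

a = is_pleonastic("It is raining in Paris.")                 # True
b = is_pleonastic("It seems that he left.")                  # True
c = is_pleonastic("The cat slept because it was tired.")     # False: referential "it"
```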

13

How do large language models help coreference resolution?

Answer: Large language models provide rich contextual embeddings that capture syntactic and semantic relations, enabling simpler architectures to achieve strong coreference performance when fine-tuned on annotated corpora.

14

What corpora are commonly used for coreference research?

Answer: Widely used corpora include OntoNotes (the basis of the CoNLL-2011/2012 shared task datasets) and ACE, which provide multilayer annotations including coreference chains across multiple genres; OntoNotes covers English, Chinese and Arabic.

15

How does coreference resolution support downstream NLP tasks?

Answer: Coreference helps ensure that information about the same entity is aggregated across mentions, improving tasks like information extraction, summarization, question answering and discourse analysis.

16

What is bridging reference and how is it related to coreference?

Answer: Bridging references are related but non-coreferent expressions (e.g. “a car” and “the wheels”); they complement coreference by capturing associative links rather than strict identity between mentions.

17

How do coreference systems ensure agreement in gender and number?

Answer: Systems use morphological cues, lexicons or learned predictors to estimate gender and number for mentions, treating mismatches as negative evidence against coreference links, especially for pronouns.
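
A sketch of such an agreement filter, assuming a tiny hand-built attribute lexicon; `None` marks an unknown or underspecified value, which never vetoes a link:

```python
# Gender/number agreement check: only a definite clash (both values
# known and different) counts as negative evidence against a link.

ATTRS = {
    # mention (lowercased) -> (gender, number); None = unknown/underspecified
    "he": ("masc", "sg"), "she": ("fem", "sg"),
    "it": ("neut", "sg"), "they": (None, "pl"),
    "mary": ("fem", "sg"), "the engineers": (None, "pl"),
}

def agree(anaphor, antecedent):
    g1, n1 = ATTRS.get(anaphor.lower(), (None, None))
    g2, n2 = ATTRS.get(antecedent.lower(), (None, None))
    gender_ok = g1 is None or g2 is None or g1 == g2
    number_ok = n1 is None or n2 is None or n1 == n2
    return gender_ok and number_ok

agree("she", "Mary")              # True: fem/sg matches fem/sg
agree("he", "Mary")               # False: gender clash
agree("they", "the engineers")    # True: plural, genders unknown
agree("it", "the treaty")         # True: unlisted mention, no clash detectable
```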

18

What is cross-document coreference?

Answer: Cross-document coreference extends clustering to mentions across multiple documents, linking references to the same entity appearing in different texts, which is important for large-scale knowledge integration.

19

How does coreference interact with named entity recognition?

Answer: NER often provides mention types that help constrain coreference (e.g. person vs. organization), while coreference can propagate entity types and canonical names to pronouns and nominal mentions throughout a document.

20

Why is annotation consistency critical in coreference corpora?

Answer: Coreference decisions can be subjective; inconsistent guidelines or annotator disagreement lead to noisy clusters, making it harder to train and fairly evaluate coreference models.

🔍 Coreference concepts covered

This page covers coreference resolution: mentions and antecedents, entity clustering, classic feature-based and neural coreference models, evaluation metrics and how coreference supports higher-level NLP applications.