QA Q&A

Question answering – short Q&A

20 questions and answers on question answering, including extractive reading comprehension, open-domain QA, multi-hop reasoning, and common evaluation metrics such as Exact Match (EM) and F1.

1

What is question answering (QA) in NLP?

Answer: QA takes a natural language question and returns a relevant answer, using either a given context passage (reading comprehension) or large corpora and knowledge sources (open-domain QA).

2

What is the difference between extractive and abstractive QA?

Answer: Extractive QA selects an answer span directly from the context, while abstractive QA may generate a novel answer sentence that paraphrases or synthesizes information from multiple parts of the context.

3

How do span-based QA models work?

Answer: Span-based models, often built on transformers, encode question and context together and predict start and end indices in the context that mark the most probable answer span.
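The decoding step can be sketched without any model: given per-token start and end scores (stand-ins for transformer logits here), pick the pair that maximizes their sum subject to start ≤ end and a length cap. All scores and tokens below are illustrative.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (start, end) indices maximizing start+end score, start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions within max_len tokens of the start.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

tokens = ["the", "tower", "is", "in", "paris", "france"]
start = [0.1, 0.2, 0.0, 0.1, 2.5, 0.3]
end   = [0.0, 0.1, 0.0, 0.2, 1.0, 2.0]
s, e = best_span(start, end)
answer = " ".join(tokens[s:e + 1])  # "paris france"
```

Real systems additionally mask positions in the question segment and may keep the top-k spans for reranking.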

4

What is open-domain QA?

Answer: Open-domain QA answers questions using large text collections or the web, typically via a retrieve-then-read pipeline where a retriever finds passages and a reader extracts or generates answers from them.
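A toy version of the retrieval half of that pipeline: rank passages by word overlap with the question and hand the top one to a reader. Production retrievers use sparse (BM25) or dense embeddings; the passages and scoring below are illustrative.

```python
def retrieve(question, passages, k=1):
    """Rank passages by unigram overlap with the question (toy retriever)."""
    q_words = set(question.lower().strip("?").split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

passages = [
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the highest mountain in Japan.",
]
top = retrieve("Where is the Eiffel Tower?", passages)[0]
# A reader model would then extract or generate the answer from `top`.
```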

5

What is multi-hop QA?

Answer: Multi-hop QA requires combining evidence from multiple sentences or documents—for example, linking facts about two entities—to answer complex questions that cannot be solved with a single local clue.
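The "linking facts" step amounts to chaining lookups: a question like "In which country was the director of Inception born?" needs one hop to find the director and a second hop to find the birthplace. The tiny fact store below is illustrative.

```python
# A minimal two-hop chain over (entity, relation) -> value facts.
facts = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "United Kingdom",
}

def two_hop(entity, rel1, rel2):
    intermediate = facts[(entity, rel1)]   # hop 1: resolve the bridge entity
    return facts[(intermediate, rel2)]     # hop 2: answer about that entity

two_hop("Inception", "directed_by", "born_in")  # "United Kingdom"
```

Neural multi-hop readers do this implicitly over text rather than explicit triples, but the bridge-entity structure is the same.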

6

Which metrics are commonly used to evaluate extractive QA?

Answer: Exact Match (EM) and token-level F1 are standard: EM checks whether the predicted span matches a reference answer exactly, while F1 measures the token overlap between predicted and gold answers.
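Both metrics are simple to compute; the sketch below follows the logic of the standard SQuAD-style scoring (simplified: single reference, answers assumed already normalized).

```python
from collections import Counter

def exact_match(pred, gold):
    """1.0 if the strings match exactly, else 0.0."""
    return float(pred == gold)

def token_f1(pred, gold):
    """Harmonic mean of token-level precision and recall."""
    p, g = pred.split(), gold.split()
    common = Counter(p) & Counter(g)      # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

token_f1("eiffel tower", "the eiffel tower")  # 0.8
```

With multiple gold answers, the convention is to take the maximum score over references.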

7

What is a reader–retriever architecture?

Answer: A retriever selects candidate passages from a large corpus given the question, and a reader model performs fine-grained comprehension over those passages to extract or generate the answer.

8

Why is answer normalization important in QA evaluation?

Answer: Normalization (lowercasing, stripping punctuation and articles) helps treat semantically equivalent answers like “the Eiffel Tower” and “Eiffel Tower” as the same when computing EM and F1 scores.
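A normalization routine in the spirit of the SQuAD evaluation script (a simplified sketch, not the official implementation): lowercase, drop punctuation and English articles, and collapse whitespace.

```python
import re
import string

def normalize_answer(s):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

normalize_answer("the Eiffel Tower") == normalize_answer("Eiffel Tower")  # True
```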

9

What is a no-answer case in QA datasets?

Answer: Some questions deliberately have no answer in the provided context; models must learn to abstain by predicting a special no-answer label or span instead of forcing an incorrect answer.
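In SQuAD 2.0-style models this abstention is typically a score comparison: answer only when the best span score beats the no-answer score by a tuned margin. The scores and threshold below are illustrative.

```python
def predict_or_abstain(best_span_score, no_answer_score, threshold=0.0):
    """Answer only if the span score beats the no-answer score by a margin."""
    if best_span_score - no_answer_score > threshold:
        return "answer"
    return "no answer"

predict_or_abstain(3.2, 1.0)  # "answer"
predict_or_abstain(0.5, 2.0)  # "no answer"
```

The threshold is usually tuned on a development set to balance false answers against unnecessary abstentions.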

10

How do large language models support QA?

Answer: Large language models can answer many questions directly via prompting, or act as generators in retrieval-augmented QA systems where retrieved documents ground the produced answers in external evidence.

11

Why is careful question design important in QA benchmarks?

Answer: Poorly designed questions may allow shallow heuristics or annotation artifacts to succeed, so benchmarks strive for diverse, reasoning-intensive questions that truly test comprehension and knowledge use.

12

What is conversational QA?

Answer: Conversational QA handles multi-turn dialogues where each question depends on previous turns and the system must perform coreference resolution and context tracking across the conversation history.

13

How does domain adaptation affect QA performance?

Answer: QA models trained on general-domain data can struggle in specialized areas like biomedicine or law; fine-tuning on domain-specific corpora and terminology greatly improves accuracy and robustness.

14

What are common sources of error in QA systems?

Answer: Systems may misinterpret the question, focus on irrelevant context, pick partially correct spans, hallucinate unsupported answers or fail when reasoning across multiple sentences is required.

15

How does QA relate to information retrieval (IR)?

Answer: IR retrieves relevant documents or passages for a query, while QA goes further by extracting or generating precise answers; modern open-domain QA tightly integrates IR and reading components.

16

What is knowledge-grounded QA?

Answer: Knowledge-grounded QA answers questions using structured knowledge bases or knowledge graphs, mapping questions to entities and relations and executing reasoning paths to derive answers.
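The mapping step can be illustrated with a pattern-to-relation lookup over a tiny triple store; the graph, patterns, and names below are all made up for illustration, standing in for real semantic parsing.

```python
import re

# Toy knowledge graph: (subject, relation) -> object.
triples = {("Paris", "capital_of"): "France"}

# Question templates mapped to relations (a stand-in for semantic parsing).
patterns = [
    (re.compile(r"which country is (\w+) the capital of", re.I), "capital_of"),
]

def kb_answer(question):
    for pattern, relation in patterns:
        m = pattern.search(question)
        if m:
            return triples.get((m.group(1).title(), relation))
    return None

kb_answer("Which country is Paris the capital of?")  # "France"
```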

17

What role does tokenization play in span-based QA?

Answer: Since models predict start and end positions over tokens, consistent tokenization (e.g. WordPiece) is essential so that predicted spans can be accurately mapped back to the original text segments.
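The mapping back to text relies on character offsets recorded at tokenization time, the same idea as the offset mappings subword tokenizers expose. A whitespace tokenizer stands in for WordPiece in this sketch.

```python
def tokenize_with_offsets(text):
    """Split on whitespace, recording each token's (start, end) char offsets."""
    tokens, offsets, i = [], [], 0
    for tok in text.split():
        start = text.index(tok, i)
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        i = start + len(tok)
    return tokens, offsets

text = "The tower is in Paris France"
tokens, offsets = tokenize_with_offsets(text)
start_tok, end_tok = 4, 5                      # predicted token span
s, e = offsets[start_tok][0], offsets[end_tok][1]
text[s:e]  # "Paris France"
```

Slicing the original string by offsets, rather than joining tokens, preserves the source text's casing and spacing exactly.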

18

Why is calibration important in QA systems?

Answer: Well-calibrated confidence scores let systems abstain when unsure, defer to humans in high-stakes scenarios and support better decision-making in deployed QA applications.
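One common post-hoc fix is temperature scaling: dividing scores by a temperature T > 1 before the softmax flattens overconfident distributions. The scores and temperature below are illustrative; in practice T is tuned on held-out data.

```python
import math

def softmax(scores, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature = flatter."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

sharp = softmax([4.0, 1.0, 0.0])                     # top prob ~0.94
cooled = softmax([4.0, 1.0, 0.0], temperature=2.0)   # top prob ~0.74
```

An abstention rule can then compare the top probability against a threshold calibrated for the deployment's risk tolerance.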

19

What ethical issues arise with QA over personal data?

Answer: QA over emails, chats or medical notes raises privacy and consent concerns; systems must protect sensitive information, control access and avoid exposing confidential content in generated answers.

20

Where are QA systems used in practice?

Answer: QA systems power search assistants, customer support bots, enterprise knowledge search, educational tutors and tools that let users query large document collections in natural language.

🔍 Question answering concepts covered

This page covers question answering: extractive and open-domain QA, multi-hop reasoning, reader–retriever architectures, EM/F1 evaluation and practical considerations for deploying QA systems safely.

Extractive vs abstractive QA
Open-domain & multi-hop
Reader–retriever setups
EM/F1 evaluation
Domain adaptation
Ethics & privacy