NLP Q&A
20 Core Questions
Interview Prep
Natural Language Processing: Interview Q&A
Short questions and answers on NLP fundamentals: text preprocessing, embeddings, language models and transformers.
Tokens
Embeddings
Sequence Tasks
Transformers
1
What is NLP in one sentence?
⚡ Beginner
Answer: NLP is the field that teaches computers to understand, generate and work with human language.
2
What is tokenization and why is it needed?
⚡ Beginner
Answer: Tokenization splits text into basic units (words, subwords, characters) so models can operate on discrete symbols.
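A minimal word-level tokenizer can be sketched with a regular expression (real systems use trained subword tokenizers, but the idea is the same):

```python
import re

def tokenize(text):
    # Lowercase, then emit runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Don't panic, NLP is fun!"))
# → ['don', "'", 't', 'panic', ',', 'nlp', 'is', 'fun', '!']
```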
3
What are word embeddings?
⚡ Beginner
Answer: Embeddings map tokens to dense vectors that capture semantic similarity (similar words have similar vectors).
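"Similar vectors" is usually measured with cosine similarity. A toy sketch (the 3-d vectors are hypothetical values, not from a real model):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy embeddings: "king" and "queen" point in similar directions.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]))  # close to 1.0
print(cosine(emb["king"], emb["apple"]))  # much smaller
```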
4
Name some classical NLP tasks.
⚡ Beginner
Answer: Examples: text classification, sentiment analysis, NER, machine translation, summarization, question answering.
5
What problem do subword tokenization methods like BPE or WordPiece solve?
🔥 Advanced
Answer: They handle out-of-vocabulary words and rich morphology by representing words as smaller, reusable subword units.
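The core BPE loop is simple: repeatedly merge the most frequent adjacent symbol pair. A minimal sketch on a three-word toy corpus (real BPE also tracks word frequencies and an end-of-word marker):

```python
from collections import Counter

def most_frequent_pair(words):
    # words: list of symbol tuples, e.g. ('l', 'o', 'w')
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])  # fuse the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(tuple(out))
    return merged

corpus = [tuple("lower"), tuple("lowest"), tuple("low")]
for _ in range(3):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # shared stem "low" emerges as a reusable subword
```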
6
What is the difference between bag-of-words and embeddings?
📊 Intermediate
Answer: Bag-of-words represents text as sparse word counts that ignore order; embeddings are dense vectors that capture semantics and feed naturally into order-aware sequence models.
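A bag-of-words vector is just a count per vocabulary word, as this small sketch shows:

```python
from collections import Counter

docs = ["the cat sat", "the cat sat on the mat"]
vocab = sorted({w for d in docs for w in d.split()})

def bow(doc):
    # One count per vocabulary entry; word order is lost entirely.
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

print(vocab)         # ['cat', 'mat', 'on', 'sat', 'the']
print(bow(docs[1]))  # [1, 1, 1, 1, 2]
```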
7
What is a language model?
📊 Intermediate
Answer: A language model assigns probabilities to sequences of tokens and can generate likely next tokens.
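The simplest instance is a bigram model: estimate P(next word | current word) from counts. A minimal sketch with no smoothing:

```python
from collections import Counter

corpus = "the cat sat . the cat ran .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def prob(w2, w1):
    # Maximum-likelihood estimate of P(w2 | w1); real models smooth these counts.
    return bigrams[(w1, w2)] / unigrams[w1]

print(prob("cat", "the"))  # 1.0: "the" is always followed by "cat" here
print(prob("sat", "cat"))  # 0.5: "cat" is followed by "sat" half the time
```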
8
What is the main idea behind the transformer architecture?
🔥 Advanced
Answer: Transformers use self-attention to let each token attend to all others, capturing long-range dependencies without recurrence.
9
What is self-attention in simple terms?
📊 Intermediate
Answer: Self-attention computes a weighted combination of all token representations for each token, where weights indicate relevance.
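A stripped-down sketch of scaled dot-product self-attention, with queries = keys = values = the input (no learned projection matrices, which real transformers add):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    # For each token, score it against every token, softmax the scores,
    # and output the weighted average of all token vectors.
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(X):
    print([round(v, 3) for v in row])
```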
10
What is BERT and how is it trained?
🔥 Advanced
Answer: BERT is a bidirectional transformer encoder trained with masked language modeling and next sentence prediction.
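The masking step of masked language modeling can be sketched as follows (simplified: real BERT masks ~15% of tokens but also sometimes keeps or randomly replaces the selected token):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Hide a fraction of tokens; the model is trained to predict the originals.
    rng = random.Random(seed)
    masked, targets = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(t)      # prediction target
        else:
            masked.append(t)
            targets.append(None)   # not predicted
    return masked, targets

toks = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(toks))
```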
11
What does fine-tuning mean in the context of pre-trained NLP models?
📊 Intermediate
Answer: Fine-tuning takes a pre-trained language model and continues training it on a specific downstream task, usually with a small task-specific head on top.
12
Why is handling OOV (out-of-vocabulary) words important in NLP?
📊 Intermediate
Answer: Real text constantly introduces new words, names and typos; without OOV strategies the model would treat them all as unknown with no nuance.
13
What is the difference between sequence-to-sequence and sequence labeling tasks?
🔥 Advanced
Answer: Seq2seq maps an input sequence to a different-length output sequence (e.g., translation); sequence labeling assigns a label per input token (e.g., NER).
14
What are some common evaluation metrics in NLP?
⚡ Beginner
Answer: Metrics: accuracy, precision/recall/F1 for classification/NER, BLEU/ROUGE for translation/summarization, perplexity for language models.
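Precision, recall and F1 follow directly from true/false positive counts; a minimal sketch for binary labels:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))  # (0.666..., 0.666..., 0.666...)
```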
15
Why is context important for understanding word meaning?
⚡ Beginner
Answer: Many words are polysemous (multiple meanings); their correct sense depends on neighboring words and sentence structure.
16
What are contextual embeddings (e.g., from BERT) vs static embeddings (e.g., word2vec)?
🔥 Advanced
Answer: Static embeddings give one vector per word type; contextual embeddings give different vectors per occurrence depending on context.
17
Name some common sources of bias in NLP models.
🔥 Advanced
Answer: Bias can stem from imbalanced training data, historical stereotypes, annotation artifacts and spurious correlations.
18
Give a real-world use case where NLP is central.
⚡ Beginner
Answer: Examples: chatbots, search engines, document classification, sentiment analysis in social media.
19
When would classical NLP methods be preferable to large transformers?
📊 Intermediate
Answer: For small datasets, strict latency/compute limits, or narrow tasks where simpler models are easier to deploy and interpret.
20
What is the key message to remember about NLP today?
⚡ Beginner
Answer: Modern NLP combines good text preprocessing, robust embeddings and powerful sequence models; understanding each layer helps you design effective solutions.
Quick Recap: NLP
If you can explain tokens → embeddings → sequence models → task heads, you have a clear mental map for most practical NLP systems.