Natural Language Processing 20 Essential Q/A

Master Natural Language Processing: text preprocessing, word embeddings (Word2Vec, GloVe), transformers (BERT, GPT), NER, sentiment analysis, seq2seq, attention, and LLMs. Concise, interview-ready answers.

Topics: Tokenization · Word2Vec · BERT · GPT · NER · Seq2Seq · Sentiment
1 What are the typical stages in an NLP pipeline? ⚡ Easy
Answer:
  1. Data acquisition: collect text corpus.
  2. Text preprocessing: cleaning, normalization, tokenization.
  3. Feature extraction: TF-IDF, embeddings, bag-of-words.
  4. Modeling: traditional ML or deep learning (RNN, Transformer).
  5. Evaluation: accuracy, F1, BLEU, perplexity.
  6. Deployment & monitoring.
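The stages above can be sketched end-to-end in a few lines; a minimal pure-Python illustration covering preprocessing and bag-of-words feature extraction (the two-sentence corpus is a toy assumption for the example):

```python
import re
from collections import Counter

# Stages 1-2: acquisition + preprocessing (lowercase, strip punctuation, tokenize)
def preprocess(text):
    return re.findall(r"[a-z]+", text.lower())

corpus = ["The cat sat on the mat.", "The dog sat on the log."]
tokenized = [preprocess(doc) for doc in corpus]

# Stage 3: feature extraction -- bag-of-words counts over a shared vocabulary
vocab = sorted({tok for doc in tokenized for tok in doc})

def bow_vector(tokens):
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

features = [bow_vector(doc) for doc in tokenized]
print(vocab)
print(features)
```

Stages 4-6 would then feed `features` into a classifier, evaluate it, and deploy it.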
2 Differentiate: tokenization, stemming, lemmatization. ⚡ Easy
Answer:
  • Tokenization: splitting text into tokens (words, subwords, characters).
  • Stemming: crude heuristic that chops suffixes to reach a root form (e.g., "running" → "run"); the Porter stemmer is the classic algorithm.
  • Lemmatization: dictionary-based, returns the actual lemma (e.g., "better" → "good"). Slower but more accurate.
from nltk.stem import PorterStemmer, WordNetLemmatizer  # WordNetLemmatizer needs the WordNet corpus: nltk.download('wordnet')
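To make the contrast concrete without any NLTK data downloads, here is a toy suffix-stripping stemmer (the rules below are illustrative and far cruder than Porter's) showing why stemming is a heuristic while lemmatization needs a dictionary:

```python
def toy_stem(word):
    # Crude heuristic: strip a common suffix, no dictionary lookup at all
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(toy_stem("running"))  # "runn" -- over-stemming that a real stemmer must repair
print(toy_stem("better"))   # "better" -- stemming cannot map this to "good"; lemmatization can
```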
3 Compare Bag-of-Words (BoW) and TF-IDF. 📊 Medium
Answer: BoW counts word occurrences; it ignores semantics, produces sparse vectors, and lets stopwords dominate. TF-IDF weighs term frequency by inverse document frequency, downweighting common words and highlighting discriminative terms. Both lose word order and context.
TF-IDF(t, d) = TF(t, d) × log(N / DF(t)), where N = number of documents and DF(t) = number of documents containing t
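Plugging the formula in by hand (toy two-document corpus assumed; TF here is normalized as count / document length, one common variant):

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
N = len(docs)
# Document frequency: in how many documents does each term appear?
df = Counter(t for doc in docs for t in set(doc))

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    return tf * math.log(N / df[term])

print(tf_idf("the", docs[0]))  # 0.0: "the" appears in every document
print(tf_idf("cat", docs[0]))  # positive: "cat" is discriminative
```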
4 Explain Word2Vec. Compare CBOW and Skip-gram. 🔥 Hard
Answer: Word2Vec learns dense vector representations from context.
  • CBOW: predict target word from surrounding context. Faster, better for frequent words.
  • Skip-gram: predict context words from target. Works better for small data, rare words.
Both use negative sampling or hierarchical softmax.
  • Strength: supports semantic vector arithmetic (king − man + woman ≈ queen).
  • Limitation: fixed vocabulary, so out-of-vocabulary (OOV) words get no vector.
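The famous analogy can be reproduced with hand-made toy vectors (the 2-D "embeddings" below encode only royalty and gender, purely for illustration; real Word2Vec vectors are learned, high-dimensional, and approximate):

```python
# Toy 2-D embeddings: dimensions are (royalty, gender)
vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def nearest(target, exclude):
    # Nearest neighbor by squared Euclidean distance, skipping the query words
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, target))
    return min((w for w in vectors if w not in exclude), key=lambda w: dist(vectors[w]))

# king - man + woman, component-wise
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words follows the standard analogy-evaluation protocol.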
5 How does GloVe differ from Word2Vec? 📊 Medium
Answer: GloVe (Global Vectors) is count-based: it factorizes a global word co-occurrence matrix, while Word2Vec is prediction-based (local context windows). GloVe directly captures global co-occurrence ratios and trains efficiently on smaller corpora. Both produce embeddings of similar quality.
6 Why subword tokenization? Explain BPE and WordPiece. 🔥 Hard
Answer: Subword tokenization solves the OOV problem and handles morphologically rich languages.
  • BPE (Byte-Pair Encoding): iteratively merges the most frequent symbol pairs. Used in GPT.
  • WordPiece: merges the pair that maximizes training-data likelihood. Used in BERT.
  • Unigram LM: treats segmentation probabilistically, pruning a large seed vocabulary (used in SentencePiece).
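A minimal sketch of BPE's merge loop (the tiny vocabulary and `</w>` end-of-word marker echo the original BPE paper's toy example; tie-breaking between equally frequent pairs is arbitrary here):

```python
from collections import Counter

def pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge(pair, vocab):
    # Replace every occurrence of the pair with its concatenation
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(3):
    pairs = pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge(best, vocab)
print(list(vocab))  # "est" has been merged into a single subword
```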
7 Describe BERT's architecture and pre-training objectives. 🔥 Hard
Answer: BERT = Bidirectional Encoder Representations from Transformers. Encoder-only Transformer, deep bidirectional. Pre-trained on:
  • MLM (Masked Language Model): 15% of tokens are masked; the model predicts the originals.
  • NSP (Next Sentence Prediction): predict whether sentence B follows sentence A.
Fine-tuned for downstream tasks.
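MLM's corruption step can be sketched as follows (the 80/10/10 mask/random/keep split is from the BERT paper; the token list, toy vocabulary, and higher-than-usual mask rate are assumptions made so the tiny example does something visible):

```python
import random

random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]

def mask_for_mlm(tokens, mask_rate=0.3):
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok  # the model must predict the original token
            r = random.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"              # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)  # 10%: random token
            # else 10%: keep the token unchanged
    return corrupted, targets

corrupted, targets = mask_for_mlm(tokens)
print(corrupted, targets)
```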
8 Compare GPT and BERT architectures. 🔥 Hard
Answer:
  • BERT: encoder-only, bidirectional (attends to all tokens), pre-trained with MLM + NSP. Better for understanding tasks (classification, NER).
  • GPT: decoder-only, causally masked, unidirectional, autoregressive LM. Better for generation (stories, code, chat).
9 What is Named Entity Recognition? How is it typically modeled? 📊 Medium
Answer: NER locates and classifies entities (person, org, location, date). Classic: CRF with handcrafted features. Modern: BiLSTM-CRF or Transformer (BERT) + linear layer + CRF. IOB tagging (Inside, Outside, Beginning).
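IOB tagging in practice, with a decoder that collapses tags back into entity spans (the sentence and tags are chosen for illustration):

```python
# Each token gets a B- (begin), I- (inside), or O (outside) label per entity type
tokens = ["Barack", "Obama", "visited", "New", "York", "yesterday"]
tags   = ["B-PER",  "I-PER", "O",       "B-LOC", "I-LOC", "O"]

def decode_entities(tokens, tags):
    # Collapse IOB tags into (entity_text, entity_type) spans
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_entities(tokens, tags))  # [('Barack Obama', 'PER'), ('New York', 'LOC')]
```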
10 List approaches for sentiment analysis. 📊 Medium
Answer:
  1. Lexicon-based: SentiWordNet, VADER – use sentiment dictionaries.
  2. Machine learning: Naive Bayes, SVM with TF-IDF.
  3. Deep learning: LSTM, CNN, Transformer fine-tuning.
  4. Aspect-based: fine-grained (product aspects).
11 How does seq2seq with attention work for NLP tasks? 🔥 Hard
Answer: The encoder RNN reads the source sentence and produces a sequence of hidden states. At each decoding step, attention computes a context vector as a weighted sum of those encoder states. The decoder RNN generates target tokens conditioned on the context vector and previous outputs. This was a breakthrough for machine translation and summarization.
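The context-vector computation is just a softmax over alignment scores followed by a weighted sum. A sketch with dot-product scoring and toy 2-D encoder states (both assumptions; real models use learned scoring functions and high-dimensional states):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(decoder_state, encoder_states):
    # Score each encoder state against the decoder state (dot product),
    # normalize scores to weights, then take the weighted sum of encoder states.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(len(encoder_states[0]))]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_context([1.0, 0.0], encoder_states)
print(weights)  # encoder states aligned with the decoder state get the most weight
```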
12 What is perplexity in language modeling? 📊 Medium
Answer: Perplexity (PPL) = exp(cross-entropy loss). It measures how "surprised" the model is by test data; lower is better. Intuition: PPL is the average number of equally likely choices the model faces per token (2^entropy when entropy is in bits). PPL cannot be compared across different tokenizations.
PPL = exp( - (1/N) Σ log P(w_i | w_<i) )
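Computing PPL directly from the formula (the per-token probabilities P(w_i | w_<i) below are made up for illustration):

```python
import math

# Hypothetical model probabilities for a 4-token test sequence
probs = [0.25, 0.5, 0.1, 0.25]

cross_entropy = -sum(math.log(p) for p in probs) / len(probs)
ppl = math.exp(cross_entropy)
print(round(ppl, 3))  # ≈ 4.23: on average, as uncertain as choosing among ~4 options per token
```

Equivalently, PPL is the inverse geometric mean of the token probabilities.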
13 Differentiate BLEU and ROUGE metrics. 🔥 Hard
Answer:
  • BLEU: precision-based n-gram overlap with a brevity penalty. Standard in machine translation.
  • ROUGE: recall-based; measures n-gram overlap (ROUGE-N) and longest common subsequence (ROUGE-L). Standard in summarization.
Both correlate moderately with human judgment.
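A deliberately simplified BLEU-1 sketch (clipped unigram precision × brevity penalty; real BLEU takes a geometric mean of clipped 1- to 4-gram precisions over a whole corpus):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    # Clipped unigram precision: a candidate word can match at most as many
    # times as it appears in the reference
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    precision = overlap / len(candidate)
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * precision

ref = "the cat sat on the mat".split()
print(bleu1("the cat sat on the mat".split(), ref))   # 1.0 for an exact match
print(bleu1("the the the the the the".split(), ref))  # clipping stops pure repetition from scoring high
```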
14 Why do Transformers dominate LSTMs in NLP? 📊 Medium
Answer:
  • Parallelization (no sequential dependency).
  • Long-range dependencies via direct pairwise attention.
  • Better gradient flow (no vanishing).
  • Scales with compute/data (large LMs).
15 Fine-tuning vs feature-based transfer learning in NLP. 📊 Medium
Answer:
  • Feature-based: use pretrained embeddings (Word2Vec, GloVe) as static features.
  • Fine-tuning: load pretrained model (BERT) and update all weights on downstream task. Generally better performance.
16 What is prompting and in-context learning in LLMs? 🔥 Hard
Answer:
  • Prompting: providing task description in natural language to LLM.
  • In-context learning: include few examples in prompt; model infers pattern without gradient updates.
  • Zero-shot: no examples; few-shot: 2-5 examples.
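Few-shot prompting is just string construction; a sketch with an illustrative sentiment template (the examples, labels, and format are assumptions, not a prescribed API):

```python
examples = [("I loved this movie!", "positive"),
            ("Terrible, a waste of time.", "negative")]
query = "The plot was gripping from start to finish."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:                       # few-shot: in-context examples
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"           # the model completes the pattern
print(prompt)
```

With zero examples in the loop this degenerates to a zero-shot prompt.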
17 Extractive vs abstractive summarization. 📊 Medium
Answer:
  • Extractive: select sentences/phrases directly from source. Simpler, fact-preserving.
  • Abstractive: generate novel sentences, paraphrase. Harder, requires generation (seq2seq, BART, T5).
18 What is coreference resolution? Give example. 📊 Medium
Answer: The task of clustering mentions that refer to the same entity. Example: "John said he would come" → John = he. End-to-end neural models (span-based, BERT) are SOTA.
19 How is POS tagging typically done today? ⚡ Easy
Answer: BiLSTM-CRF or Transformer encoder (BERT) with token-level classifier. Universal Dependencies tags (NOUN, VERB, ADJ). Pre-trained models fine-tuned on annotated corpora (e.g., OntoNotes).
20 Key challenges in deploying large language models? 🔥 Hard
Answer:
  • Hallucination: generating false/unsupported facts.
  • Bias: social stereotypes from training data.
  • Safety: toxic outputs, jailbreaking.
  • Cost: inference expensive, memory footprint.
  • Evaluation: open-ended generation hard to evaluate.

NLP – Interview Cheat Sheet

Preprocessing
  • ✂️ Tokenization (word/subword)
  • 🌿 Lemmatization > Stemming
  • 📊 TF-IDF, BoW
Embeddings
  • Word2Vec: CBOW / Skip-gram
  • GloVe: count-based
  • Contextual: BERT, ELMo
Architectures
  • RNN/LSTM: sequential
  • Transformer: parallel, SOTA
  • BERT: bidirectional
Metrics
  • BLEU: precision (MT)
  • ROUGE: recall (summarization)
  • Perplexity: LM uncertainty

Verdict: "NLP evolution: rules → statistics → embeddings → transformers → LLMs. Understand trade-offs at each stage."