
NLP Interview: 20 Essential Q&A

Master Natural Language Processing fundamentals, from preprocessing to transformers. Short, crisp answers for interview success.

1 What is tokenization in NLP? ⚡ easy
Answer: Tokenization splits text into smaller units (tokens) – words, subwords, or characters. Example: "NLP is fun" → ["NLP", "is", "fun"]. Essential for preprocessing.
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')
word_tokenize("NLP rocks!")  # ['NLP', 'rocks', '!']
preprocessing splitting
2 Difference between stemming and lemmatization? ⚡ easy
Answer: Stemming chops off affixes (rule-based, e.g., "running" → "run"). Lemmatization uses vocabulary/dictionary to return base form ("better" → "good"). Lemmatization is more accurate.
3 What is a stop word? ⚡ easy
Answer: Common words (the, is, at) that are often removed because they carry little semantic meaning. Removal is task-dependent, though – negations like "not" can be critical for sentiment analysis.
4 Explain Bag-of-Words (BoW) model. 📊 medium
Answer: BoW represents text as a multiset of words, ignoring order and grammar. Creates a sparse vector of word counts. Simple but loses context.
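A minimal pure-Python sketch of BoW counting (the vocabulary and sentence are illustrative):

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Count occurrences of each vocabulary word, ignoring order and grammar."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

vocab = ["nlp", "is", "fun", "hard"]
print(bag_of_words("NLP is fun fun", vocab))  # [1, 1, 2, 0]
```

Note how "fun fun" and "fun" differ only in one count – word order is lost entirely.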
5 What is TF-IDF? 📊 medium
Answer: Term Frequency–Inverse Document Frequency. Weights words by how often they appear in a document (TF) and how rare across corpus (IDF). Highlights important words.
TF = (count of term in document) / (total terms in document)
IDF = log(N / df), where N = total documents and df = documents containing the term
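The two formulas combine as a product; a small sketch with a toy corpus of tokenized documents:

```python
import math

def tf_idf(term, doc, corpus):
    """TF = term count / doc length; IDF = log(N / docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["cats", "sleep"]]
print(round(tf_idf("fun", corpus[0], corpus), 3))   # 0.366 – rare term, high weight
print(round(tf_idf("nlp", corpus[0], corpus), 3))   # 0.135 – common term, lower weight
```

"fun" appears in only one of three documents, so it scores higher than "nlp", which appears in two.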
6 Word embeddings vs one-hot encoding? 📊 medium
Answer: One-hot creates large sparse vectors with no similarity. Embeddings (word2vec, GloVe) are dense, low-dimensional, and capture semantic relationships (king - man + woman ≈ queen).
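The difference shows up in cosine similarity; a sketch with toy embedding values (the numbers are illustrative, not trained):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# One-hot: every pair of distinct words is orthogonal (similarity 0)
cat, dog = [1, 0, 0], [0, 1, 0]
print(cosine(cat, dog))  # 0.0

# Dense embeddings (toy values) can place related words close together
cat_emb, dog_emb = [0.9, 0.1], [0.8, 0.2]
print(cosine(cat_emb, dog_emb))  # close to 1
```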
7 What is Word2Vec and its two architectures? 📊 medium
Answer: Word2Vec learns word embeddings via a shallow neural network trained on a prediction task. CBOW predicts the target word from its context; Skip-gram predicts context words from the target. Skip-gram works better for rare words.
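Skip-gram's training data is just (target, context) pairs extracted with a sliding window; a minimal sketch of that extraction step:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as in skip-gram."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

CBOW uses the same windows but flips the direction: context words jointly predict the target.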
8 Define perplexity in language models. 🔥 hard
Answer: Perplexity measures how well a probability model predicts a sample; lower perplexity means better generalization. It equals 2^(cross-entropy) when cross-entropy is measured in bits (equivalently e^(cross-entropy) in nats).
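A quick sanity check of the definition: a model that assigns uniform probability 1/k to every token has perplexity exactly k.

```python
import math

def perplexity(token_probs):
    """PPL = 2 ** cross-entropy (in bits); lower is better."""
    h = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** h

# Uniform 1/4 probability over 4 tokens -> perplexity 4
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
# A more confident model scores lower
print(perplexity([0.9, 0.8, 0.9, 0.7]) < 4.0)  # True
```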
9 What is an N-gram model? 📊 medium
Answer: An N-gram predicts next word using previous N-1 words. Unigram (1), bigram (2), trigram (3). Simple but suffers from sparsity.
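A bigram model is just normalized co-occurrence counts; a minimal sketch:

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    """Estimate P(next | prev) from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {p: {n: c / sum(cs.values()) for n, c in cs.items()}
            for p, cs in counts.items()}

model = bigram_model("the cat sat on the mat".split())
print(model["the"])  # {'cat': 0.5, 'mat': 0.5}
```

The sparsity problem is visible already: any bigram unseen in training gets probability 0, which is why smoothing (e.g., Laplace) is needed in practice.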
10 Name common POS tagging algorithms. 📊 medium
Answer: Hidden Markov Models (HMM), Conditional Random Fields (CRF), and deep learning (BiLSTM + CRF, transformer-based).
11 What is Named Entity Recognition (NER)? ⚡ easy
Answer: NER locates and classifies entities (person, org, location) in text. e.g., "Apple" as ORG.
12 Explain the attention mechanism. 🔥 hard
Answer: Attention lets the model focus on relevant parts of the input when producing each output. It computes a weighted sum of values, with weights derived from query–key similarity. Self-attention applies attention within the same sequence.
Attention(Q, K, V) = softmax(QK^T / √d_k) V, where d_k is the key dimension
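The formula above can be sketched in plain Python for a single query attending over two key/value vectors (the numbers are illustrative):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(keys[0])
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Query matches the first key more strongly, so the output leans toward values[0]
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print([round(x, 3) for x in out])
```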
13 Transformer architecture in one sentence? 🔥 hard
Answer: The Transformer uses multi-head self-attention and feedforward layers, no recurrence, enabling parallelization and long-range dependencies.
14 How does BERT differ from GPT? 🔥 hard
Answer: BERT is encoder-only, bidirectional (masked LM), excels at understanding tasks (classification, QA). GPT is decoder-only, autoregressive (left-to-right), designed for generation.
15 What is the purpose of masked language modeling? 📊 medium
Answer: MLM (used in BERT) masks random tokens and trains model to predict them, forcing bidirectional context. Great for learning deep representations.
16 What is sentiment analysis? ⚡ easy
Answer: Classifying text polarity (positive, negative, neutral). Often uses LSTMs, transformers, or lexicon-based methods.
17 Explain beam search in text generation. 🔥 hard
Answer: Beam search keeps the top-k hypotheses at each step, reducing the risk of missing high-probability sequences. k is the beam width. It trades search quality against computational cost; greedy decoding is the special case k = 1.
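A minimal sketch over toy per-step token distributions (the vocabulary and probabilities are illustrative):

```python
import math

def beam_search(step_probs, k=2):
    """Keep the k best partial sequences (by log-probability) at each step."""
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for probs in step_probs:  # probs: dict of token -> probability
        candidates = [(seq + (tok,), lp + math.log(p))
                      for seq, lp in beams for tok, p in probs.items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

steps = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]
best_seq, best_lp = beam_search(steps, k=2)[0]
print(best_seq)  # ('a', 'b')
```

Note that greedy decoding would also pick "a" first here, but beam search additionally keeps "b" alive in case a later step makes it the better prefix.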
18 What is BLEU score? 📊 medium
Answer: BLEU (bilingual evaluation understudy) measures n-gram overlap between generated and reference text. Standard for machine translation (summarization more often uses ROUGE). Ranges 0–1, often reported scaled to 100.
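The core of BLEU is clipped n-gram precision; a sketch of that building block (full BLEU also combines several n-gram orders and a brevity penalty):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: overlap counts are capped at reference counts."""
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(1, sum(cand.values()))

cand = "the cat sat".split()
ref = "the cat sat down".split()
print(ngram_precision(cand, ref, n=2))  # 1.0 – both candidate bigrams appear in the reference
```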
19 Coreference resolution – define. 🔥 hard
Answer: Identifying when two expressions refer to the same entity. e.g., "John said he would come" → "John" and "he" corefer.
20 What are some challenges in NLP? 📊 medium
Answer: Ambiguity, context, sarcasm, low-resource languages, bias in models, and commonsense reasoning. Still active research.

NLP interview cheat sheet

  • Tokenization / stemming / lemmatization
  • BoW, TF-IDF, embeddings
  • RNN / LSTM / Attention
  • Transformers (BERT, GPT)
  • NER, POS, sentiment
  • Evaluation: BLEU, perplexity

Pro tip: Understand trade-offs between classical and deep learning approaches in NLP.
