
NLP Interview: 20 Essential Q&A

Master Natural Language Processing fundamentals, from preprocessing to transformers. Short, crisp answers for interview success.

1 What is tokenization in NLP? ⚡ easy
Answer: Tokenization splits text into smaller units (tokens) – words, subwords, or characters. Example: "NLP is fun" → ["NLP", "is", "fun"]. Essential for preprocessing.
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')
word_tokenize("NLP rocks!")  # ['NLP', 'rocks', '!']
preprocessing splitting
2 Difference between stemming and lemmatization? ⚡ easy
Answer: Stemming chops off affixes (rule-based, e.g., "running" → "run"). Lemmatization uses vocabulary/dictionary to return base form ("better" → "good"). Lemmatization is more accurate.
3 What is a stop word? ⚡ easy
Answer: Common words (the, is, at) that are often removed because they carry little semantic meaning. Removal is task-dependent, though – negations like "not" can be critical for sentiment analysis.
4 Explain Bag-of-Words (BoW) model. 📊 medium
Answer: BoW represents text as a multiset of words, ignoring order and grammar. Creates a sparse vector of word counts. Simple but loses context.
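A minimal pure-Python sketch of BoW counting (the vocabulary and sentence are illustrative):

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Count occurrences of each vocabulary word, ignoring order and grammar."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

vocab = ["nlp", "is", "fun", "hard"]
print(bag_of_words("NLP is fun fun", vocab))  # [1, 1, 2, 0]
```

Note how "fun fun" and "fun" differ only in one count – word order is lost entirely.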
5 What is TF-IDF? 📊 medium
Answer: Term Frequency–Inverse Document Frequency. Weights words by how often they appear in a document (TF) and how rare across corpus (IDF). Highlights important words.
TF = (count of term in document) / (total terms in document)
IDF = log(N / df), where N = total documents and df = documents containing the term
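The two formulas combine as a product; a small sketch with a toy corpus of tokenized documents:

```python
import math

def tf_idf(term, doc, corpus):
    """TF = term count / doc length; IDF = log(N / docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["cats", "sleep"]]
print(round(tf_idf("fun", corpus[0], corpus), 3))   # 0.366 – rare term, high weight
print(round(tf_idf("nlp", corpus[0], corpus), 3))   # 0.135 – common term, lower weight
```

"fun" appears in only one of three documents, so it scores higher than "nlp", which appears in two.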
6 Word embeddings vs one-hot encoding? 📊 medium
Answer: One-hot creates large sparse vectors with no similarity. Embeddings (word2vec, GloVe) are dense, low-dimensional, and capture semantic relationships (king - man + woman ≈ queen).
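The difference shows up in cosine similarity; a sketch with toy embedding values (the numbers are illustrative, not trained):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# One-hot: every pair of distinct words is orthogonal (similarity 0)
cat, dog = [1, 0, 0], [0, 1, 0]
print(cosine(cat, dog))  # 0.0

# Dense embeddings (toy values) can place related words close together
cat_emb, dog_emb = [0.9, 0.1], [0.8, 0.2]
print(cosine(cat_emb, dog_emb))  # close to 1
```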
7 What is Word2Vec and its two architectures? 📊 medium
Answer: Word2Vec learns word embeddings via a shallow neural network trained on a prediction task. CBOW predicts the target word from its context; Skip-gram predicts context words from the target. Skip-gram works better for rare words.
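Skip-gram's training data is just (target, context) pairs extracted with a sliding window; a minimal sketch of that extraction step:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as in skip-gram."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

CBOW uses the same windows but flips the direction: context words jointly predict the target.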
8 Define perplexity in language models. 🔥 hard
Answer: Perplexity measures how well a probability model predicts a sample; lower perplexity means better generalization. It equals 2^(cross-entropy) when cross-entropy is measured in bits (equivalently e^(cross-entropy) in nats).
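A quick sanity check of the definition: a model that assigns uniform probability 1/k to every token has perplexity exactly k.

```python
import math

def perplexity(token_probs):
    """PPL = 2 ** cross-entropy (in bits); lower is better."""
    h = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** h

# Uniform 1/4 probability over 4 tokens -> perplexity 4
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
# A more confident model scores lower
print(perplexity([0.9, 0.8, 0.9, 0.7]) < 4.0)  # True
```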
9 What is an N-gram model? 📊 medium
Answer: An N-gram predicts next word using previous N-1 words. Unigram (1), bigram (2), trigram (3). Simple but suffers from sparsity.
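A bigram model is just normalized co-occurrence counts; a minimal sketch:

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    """Estimate P(next | prev) from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {p: {n: c / sum(cs.values()) for n, c in cs.items()}
            for p, cs in counts.items()}

model = bigram_model("the cat sat on the mat".split())
print(model["the"])  # {'cat': 0.5, 'mat': 0.5}
```

The sparsity problem is visible already: any bigram unseen in training gets probability 0, which is why smoothing (e.g., Laplace) is needed in practice.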
10 Name common POS tagging algorithms. 📊 medium
Answer: Hidden Markov Models (HMM), Conditional Random Fields (CRF), and deep learning (BiLSTM + CRF, transformer-based).
11 What is Named Entity Recognition (NER)? ⚡ easy
Answer: NER locates and classifies entities (person, org, location) in text. e.g., "Apple" as ORG.
12 Explain the attention mechanism. 🔥 hard
Answer: Attention lets the model focus on relevant parts of the input when producing each output. It computes a weighted sum of values, with weights derived from query–key similarity. Self-attention applies attention within the same sequence.
Attention(Q, K, V) = softmax(QK^T / √d_k) V, where d_k is the key dimension
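The formula above can be sketched in plain Python for a single query attending over two key/value vectors (the numbers are illustrative):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(keys[0])
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Query matches the first key more strongly, so the output leans toward values[0]
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print([round(x, 3) for x in out])
```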
13 Transformer architecture in one sentence? 🔥 hard
Answer: The Transformer uses multi-head self-attention and feedforward layers, no recurrence, enabling parallelization and long-range dependencies.
14 How does BERT differ from GPT? 🔥 hard
Answer: BERT is encoder-only, bidirectional (masked LM), excels at understanding tasks (classification, QA). GPT is decoder-only, autoregressive (left-to-right), designed for generation.
15 What is the purpose of masked language modeling? 📊 medium
Answer: MLM (used in BERT) masks random tokens and trains model to predict them, forcing bidirectional context. Great for learning deep representations.
16 What is sentiment analysis? ⚡ easy
Answer: Classifying text polarity (positive, negative, neutral). Often uses LSTMs, transformers, or lexicon-based methods.
17 Explain beam search in text generation. 🔥 hard
Answer: Beam search keeps the top-k hypotheses at each step, reducing the risk of missing high-probability sequences. k is the beam width. It trades search quality against computational cost; greedy decoding is the special case k = 1.
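A minimal sketch over toy per-step token distributions (the vocabulary and probabilities are illustrative):

```python
import math

def beam_search(step_probs, k=2):
    """Keep the k best partial sequences (by log-probability) at each step."""
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for probs in step_probs:  # probs: dict of token -> probability
        candidates = [(seq + (tok,), lp + math.log(p))
                      for seq, lp in beams for tok, p in probs.items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

steps = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]
best_seq, best_lp = beam_search(steps, k=2)[0]
print(best_seq)  # ('a', 'b')
```

Note that greedy decoding would also pick "a" first here, but beam search additionally keeps "b" alive in case a later step makes it the better prefix.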
18 What is BLEU score? 📊 medium
Answer: BLEU (bilingual evaluation understudy) measures n-gram overlap between generated and reference text. Standard for machine translation (summarization more often uses ROUGE). Ranges 0–1, often reported scaled to 100.
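The core of BLEU is clipped n-gram precision; a sketch of that building block (full BLEU also combines several n-gram orders and a brevity penalty):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: overlap counts are capped at reference counts."""
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(1, sum(cand.values()))

cand = "the cat sat".split()
ref = "the cat sat down".split()
print(ngram_precision(cand, ref, n=2))  # 1.0 – both candidate bigrams appear in the reference
```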
19 Coreference resolution – define. 🔥 hard
Answer: Identifying when two expressions refer to the same entity. e.g., "John said he would come" → "John" and "he" corefer.
20 What are some challenges in NLP? 📊 medium
Answer: Ambiguity, context, sarcasm, low-resource languages, bias in models, and commonsense reasoning. Still active research.

NLP interview cheat sheet

  • Tokenization / stemming / lemmatization
  • BoW, TF-IDF, embeddings
  • RNN / LSTM / Attention
  • Transformers (BERT, GPT)
  • NER, POS, sentiment
  • Evaluation: BLEU, perplexity

Pro tip: Understand trade-offs between classical and deep learning approaches in NLP.
