NLP Interview: 20 Essential Q&A
Master Natural Language Processing fundamentals, from preprocessing to transformers. Short, crisp answers for interview success.
Tags: tokenization, word2vec, BERT, transformers, sentiment
1
What is tokenization in NLP?
⚡ easy
Answer: Tokenization splits text into smaller units (tokens) – words, subwords, or characters. Example: "NLP is fun" → ["NLP", "is", "fun"]. Essential for preprocessing.
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt') once
word_tokenize("NLP rocks!")  # ['NLP', 'rocks', '!']
Tags: preprocessing, splitting
2
Difference between stemming and lemmatization?
⚡ easy
Answer: Stemming chops off affixes (rule-based, e.g., "running" → "run"). Lemmatization uses vocabulary/dictionary to return base form ("better" → "good"). Lemmatization is more accurate.
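The contrast is easy to see in code. A toy pure-Python sketch (not NLTK; the `stem`, `lemmatize`, and `LEMMAS` names are illustrative only):

```python
# Toy illustration: stemming strips suffixes by rule; lemmatization
# looks words up in a dictionary of known base forms.
def stem(word: str) -> str:
    """Crude rule-based stemmer: chop the first matching suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

LEMMAS = {"better": "good", "ran": "run", "mice": "mouse"}

def lemmatize(word: str) -> str:
    """Dictionary lookup; falls back to the word itself."""
    return LEMMAS.get(word, word)

print(stem("running"))     # "runn" -- naive rules can over-chop
print(lemmatize("better")) # "good" -- dictionary yields the true base form
```

Real implementations (e.g., NLTK's PorterStemmer and WordNetLemmatizer) use far richer rules and vocabularies, but the accuracy trade-off is the same.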
3
What is a stop word?
⚡ easy
Answer: Common words (the, is, at) that are often removed because they carry little semantic meaning. Removal is task-dependent, though – e.g., negations like "not" matter for sentiment.
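A minimal filtering sketch with a hand-picked stop-word set (NLTK ships a fuller list via nltk.corpus.stopwords; the set here is just illustrative):

```python
# Tiny illustrative stop-word set; real lists contain ~100+ words per language.
STOP_WORDS = {"the", "is", "at", "a", "an"}

def remove_stop_words(tokens):
    # Lowercase for the membership test, keep original casing in the output.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "is", "at", "home"]))  # ['cat', 'home']
```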
4
Explain Bag-of-Words (BoW) model.
📊 medium
Answer: BoW represents text as a multiset of words, ignoring order and grammar. Creates a sparse vector of word counts. Simple but loses context.
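A minimal BoW vectorizer sketch over a two-document corpus (names and corpus are hypothetical):

```python
from collections import Counter

# Build a fixed vocabulary, then map each document to a count vector.
corpus = ["nlp is fun", "nlp is hard"]
vocab = sorted({w for doc in corpus for w in doc.split()})  # ['fun','hard','is','nlp']

def bow_vector(doc):
    counts = Counter(doc.split())
    # Word order in the document is discarded -- only counts survive.
    return [counts[w] for w in vocab]

print(bow_vector("nlp is fun"))   # [1, 0, 1, 1]
print(bow_vector("fun fun nlp"))  # [2, 0, 0, 1]
```

Note how "nlp is fun" and "fun is nlp" would map to the same vector – that is exactly the lost-context limitation.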
5
What is TF-IDF?
📊 medium
Answer: Term Frequency–Inverse Document Frequency. Weights words by how often they appear in a document (TF) and how rare across corpus (IDF). Highlights important words.
TF(t, d) = count(t, d) / (total terms in d)
IDF(t) = log(N / df(t)), where N = total documents, df(t) = documents containing t
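The formulas can be computed directly. A sketch on a toy three-document corpus (unsmoothed; libraries like sklearn's TfidfVectorizer use a smoothed variant):

```python
import math

docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["it", "is", "fun"]]
N = len(docs)

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term):
    df = sum(term in doc for doc in docs)  # document frequency
    return math.log(N / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf_idf("is", docs[0]))   # 0.0 -- appears in every doc, so IDF = 0
print(tf_idf("fun", docs[0]))  # positive -- rarer across the corpus
```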
6
Word embeddings vs one-hot encoding?
📊 medium
Answer: One-hot creates large sparse vectors with no similarity. Embeddings (word2vec, GloVe) are dense, low-dimensional, and capture semantic relationships (king - man + woman ≈ queen).
7
What is Word2Vec and its two architectures?
📊 medium
Answer: Word2Vec learns dense word embeddings with a shallow neural network. CBOW predicts the target word from its context; Skip-gram predicts context words from the target. Skip-gram works better for rare words.
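A sketch of how Skip-gram extracts its (target, context) training pairs from a sentence; the actual embedding training on these pairs is omitted, and CBOW would flip the direction (context words → target):

```python
# Illustrative only: enumerate (target, context) pairs within a +/- window.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["nlp", "is", "fun"], window=1))
# [('nlp', 'is'), ('is', 'nlp'), ('is', 'fun'), ('fun', 'is')]
```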
8
Define perplexity in language models.
🔥 hard
Answer: Perplexity measures how well a probability model predicts a sample; it is the exponentiated cross-entropy (2^H when using log base 2, e^H with natural log). Lower perplexity means better prediction.
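A direct computation from per-token probabilities (a hedged sketch; real evaluations average log-probs over a held-out corpus):

```python
import math

def perplexity(token_probs):
    # Cross-entropy H = average negative log2 probability per token.
    cross_entropy = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** cross_entropy

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0 -- model is "4-way confused"
print(perplexity([1.0, 1.0]))                # 1.0 -- perfect prediction
```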
9
What is an N-gram model?
📊 medium
Answer: An N-gram model predicts the next word using the previous N-1 words. Unigram (1), bigram (2), trigram (3). Simple but suffers from data sparsity.
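A minimal bigram model sketch, estimating P(next | previous) from raw counts (no smoothing, which real N-gram models need precisely because of sparsity):

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    # counts[prev][next] = how often `next` followed `prev`.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

model = train_bigrams("the cat sat on the mat".split())
print(model["the"].most_common())  # candidates seen after "the", with counts
```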
10
Name common POS tagging algorithms.
📊 medium
Answer: Hidden Markov Models (HMM), Conditional Random Fields (CRF), and deep learning (BiLSTM + CRF, transformer-based).
11
What is Named Entity Recognition (NER)?
⚡ easy
Answer: NER locates and classifies entities (person, org, location) in text. e.g., "Apple" as ORG.
12
Explain the attention mechanism.
🔥 hard
Answer: Attention lets the model focus on the most relevant parts of the input when producing each output. It computes a weighted sum of values, with weights derived from query–key similarity. Self-attention applies this within a single sequence.
Attention(Q, K, V) = softmax(QK^T / √d_k) V
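The formula can be implemented in a few lines. A pure-Python sketch for tiny matrices (lists of lists standing in for real tensors; production code would use NumPy/PyTorch):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(K[0])
    out = []
    for q in Q:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output = weighted sum of value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; output blends the V rows,
# weighted toward the key most similar to the query.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]]))
```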
13
Transformer architecture in one sentence?
🔥 hard
Answer: The Transformer stacks multi-head self-attention and feed-forward layers with no recurrence, enabling parallel training and modeling of long-range dependencies.
14
How does BERT differ from GPT?
🔥 hard
Answer: BERT is encoder-only, bidirectional (masked LM), excels at understanding tasks (classification, QA). GPT is decoder-only, autoregressive (left-to-right), designed for generation.
15
What is the purpose of masked language modeling?
📊 medium
Answer: MLM (used in BERT) masks random tokens and trains model to predict them, forcing bidirectional context. Great for learning deep representations.
16
What is sentiment analysis?
⚡ easy
Answer: Classifying text polarity (positive, negative, neutral). Often uses LSTMs, transformers, or lexicon-based methods.
17
Explain beam search in text generation.
🔥 hard
Answer: Beam search keeps the top-k partial hypotheses at each decoding step (k = beam width), reducing the risk of discarding high-probability sequences. It interpolates between greedy decoding (k = 1) and exhaustive search.
18
What is BLEU score?
📊 medium
Answer: BLEU (BiLingual Evaluation Understudy) measures n-gram overlap between generated and reference text, with a brevity penalty for short outputs. Common for translation and summarization. Ranges 0–1 (often reported ×100).
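The core ingredient is modified (clipped) n-gram precision. A unigram-only sketch (full BLEU combines 1- to 4-gram precisions with a brevity penalty; NLTK provides nltk.translate.bleu_score for the real thing):

```python
from collections import Counter

def modified_precision(candidate, reference):
    cand, ref = Counter(candidate), Counter(reference)
    # Clip each candidate word's count at its count in the reference,
    # so repeating a reference word cannot inflate the score.
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(len(candidate), 1)

cand = "the the the cat".split()
ref = "the cat sat".split()
print(modified_precision(cand, ref))  # 0.5 -- 'the' is clipped to 1 match
```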
19
Coreference resolution – define.
🔥 hard
Answer: Identifying when two expressions refer to the same entity. e.g., "John said he would come" → "John" and "he" corefer.
20
What are some challenges in NLP?
📊 medium
Answer: Ambiguity, context, sarcasm, low-resource languages, bias in models, and commonsense reasoning. Still active research.
Tags: 🔄 ambiguity, 🧠 commonsense, 🌍 low-resource
NLP interview cheat sheet
- Tokenization / stemming / lemmatization
- BoW, TF-IDF, embeddings
- RNN / LSTM / Attention
- Transformers (BERT, GPT)
- NER, POS, sentiment
- Evaluation: BLEU, perplexity
Pro tip: Understand trade-offs between classical and deep learning approaches in NLP.