NLP Q&A 20 Core Questions
Interview Prep

Natural Language Processing: Interview Q&A

Short questions and answers on NLP fundamentals: text preprocessing, embeddings, language models and transformers.

Tags: Tokens · Embeddings · Sequence Tasks · Transformers
1 What is NLP in one sentence? ⚡ Beginner
Answer: NLP is the field that teaches computers to understand, generate and work with human language.
2 What is tokenization and why is it needed? ⚡ Beginner
Answer: Tokenization splits text into basic units (words, subwords, characters) so models can operate on discrete symbols.
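A minimal sketch of word-level tokenization using a regular expression (a toy stand-in for real tokenizers such as spaCy or SentencePiece):

```python
import re

def tokenize(text: str) -> list[str]:
    # Split into runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Tokenization isn't hard!"))
# ['tokenization', 'isn', "'", 't', 'hard', '!']
```

Note how even the apostrophe produces separate tokens; deciding where units begin and end is exactly the problem tokenizers solve.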
3 What are word embeddings? ⚡ Beginner
Answer: Embeddings map tokens to dense vectors that capture semantic similarity (similar words have similar vectors).
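A toy illustration with hypothetical 3-dimensional vectors (real embeddings have hundreds of dimensions and come from training, not by hand): semantically related words end up with a higher cosine similarity.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made vectors for illustration only.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```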
4 Name some classical NLP tasks. ⚡ Beginner
Answer: Examples: text classification, sentiment analysis, NER, machine translation, summarization, question answering.
5 What problem do subword tokenization methods like BPE or WordPiece solve? 🔥 Advanced
Answer: They handle out-of-vocabulary words and rich morphology by representing words as smaller, reusable subword units.
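One BPE training step can be sketched as: count adjacent symbol pairs across the corpus, then merge the most frequent pair everywhere. This is a simplified illustration with a tiny made-up vocabulary, not a production implementation:

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(words, pair):
    # Replace every occurrence of `pair` with a single merged symbol.
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words as character tuples with their frequencies.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("l", "o", "g"): 1, ("n", "e", "w"): 3}
pair = most_frequent_pair(vocab)   # ('l', 'o'): 8 weighted occurrences
vocab = merge(vocab, pair)
print(pair, vocab)
```

Repeating this loop builds a subword vocabulary in which rare words decompose into frequent, reusable pieces.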
6 What is the difference between bag-of-words and embeddings? 📊 Intermediate
Answer: Bag-of-words represents text as sparse counts that ignore word order; embeddings represent tokens as dense vectors that capture semantics and can be fed to models as ordered sequences.
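A quick demonstration of what bag-of-words throws away: two sentences with different word order become indistinguishable.

```python
from collections import Counter

docs = ["the cat sat", "the dog sat", "sat the cat"]

# Bag-of-words: sparse counts, word order discarded.
bows = [Counter(d.split()) for d in docs]
print(bows[0] == bows[2])  # True: "the cat sat" and "sat the cat" look identical
```

Embedding-based sequence models keep the token order and so can tell these apart.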
7 What is a language model? 📊 Intermediate
Answer: A language model assigns probabilities to sequences of tokens and can generate likely next tokens.
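The simplest concrete instance is a bigram model: estimate P(next | current) from counts in a toy corpus. Modern neural LMs do the same job with far richer context.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(word, nxt):
    # Maximum-likelihood bigram probability P(nxt | word).
    return bigrams[(word, nxt)] / unigrams[word]

print(p_next("the", "cat"))  # 2/3: "the" is followed by "cat" twice, "mat" once
```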
8 What is the main idea behind the transformer architecture? 🔥 Advanced
Answer: Transformers use self-attention to let each token attend to all others, capturing long-range dependencies without recurrence.
9 What is self-attention in simple terms? 📊 Intermediate
Answer: Self-attention computes a weighted combination of all token representations for each token, where weights indicate relevance.
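The mechanism can be written in a few lines of NumPy: project tokens to queries, keys, and values, score every token against every other, softmax the scores, and mix the values. Random matrices stand in for learned weights here.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (n_tokens, d). Scaled dot-product self-attention for one head.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ V  # each row: relevance-weighted mix of all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Every output row depends on all four input tokens at once, which is how long-range dependencies are captured without recurrence.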
10 What is BERT and how is it trained? 🔥 Advanced
Answer: BERT is a bidirectional transformer encoder trained with masked language modeling and next sentence prediction.
11 What does fine-tuning mean in the context of pre-trained NLP models? 📊 Intermediate
Answer: Fine-tuning continues training a pre-trained language model on a specific downstream task, typically with a small task-specific head added on top.
12 Why is handling OOV (out-of-vocabulary) words important in NLP? 📊 Intermediate
Answer: Real text constantly introduces new words, names and typos; without OOV strategies the model would treat them all as unknown with no nuance.
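A sketch of the problem with a closed vocabulary: every unseen word collapses to the same `<unk>` id, losing all of its meaning.

```python
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

def encode(tokens):
    # Unknown words all map to a single <unk> id; subword tokenizers
    # avoid this by splitting rare words into known pieces instead.
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]

print(encode(["the", "platypus", "sat"]))  # [0, 3, 2]
```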
13 What is the difference between sequence-to-sequence and sequence labeling tasks? 🔥 Advanced
Answer: Seq2seq maps an input sequence to a different-length output sequence (e.g., translation); sequence labeling assigns a label per input token (e.g., NER).
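The structural difference is easy to see in the data shapes (hand-made toy examples):

```python
# Sequence labeling: exactly one label per input token, lengths match.
tokens = ["Barack", "Obama", "visited", "Paris"]
ner    = ["B-PER",  "I-PER", "O",       "B-LOC"]
assert len(tokens) == len(ner)

# Seq2seq: output length is independent of input length.
src = ["es", "regnet"]           # 2 German tokens in
tgt = ["it", "is", "raining"]    # 3 English tokens out
print(len(src), len(tgt))        # 2 3
```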
14 What are some common evaluation metrics in NLP? ⚡ Beginner
Answer: Metrics: accuracy, precision/recall/F1 for classification/NER, BLEU/ROUGE for translation/summarization, perplexity for language models.
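Precision, recall, and F1 for a binary classifier can be computed directly from true/false positive and false negative counts:

```python
def precision_recall_f1(gold, pred, positive="pos"):
    # Compare gold and predicted labels pairwise.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = ["pos", "pos", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "pos", "pos"]
print(precision_recall_f1(gold, pred))  # precision = recall = f1 = 2/3
```

In practice a library such as scikit-learn handles edge cases (zero division, multi-class averaging) for you.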
15 Why is context important for understanding word meaning? ⚡ Beginner
Answer: Many words are polysemous (multiple meanings); their correct sense depends on neighboring words and sentence structure.
16 What are contextual embeddings (e.g., from BERT) vs static embeddings (e.g., word2vec)? 🔥 Advanced
Answer: Static embeddings give one vector per word type; contextual embeddings give different vectors per occurrence depending on context.
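A toy stand-in for the contrast (hand-made 2-d vectors; real contextual models like BERT mix context through many attention layers, not by averaging): a static lookup returns the same vector for "bank" everywhere, while a context-dependent function does not.

```python
# Static lookup: one vector per word type, regardless of context.
base = {"bank": [0.5, 0.5], "river": [0.0, 1.0], "loan": [1.0, 0.0]}

def contextual(word, context):
    # Toy contextual embedding: average the word's base vector with
    # the base vectors of its context words.
    vecs = [base[w] for w in context if w in base] + [base[word]]
    return [sum(d) / len(vecs) for d in zip(*vecs)]

print(base["bank"])                   # same vector in every sentence
print(contextual("bank", ["river"]))  # [0.25, 0.75]
print(contextual("bank", ["loan"]))   # [0.75, 0.25]
```

Same word, two different vectors depending on its neighbors — that is the essence of contextual embeddings.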
17 Name some common sources of bias in NLP models. 🔥 Advanced
Answer: Bias can stem from imbalanced training data, historical stereotypes, annotation artifacts and spurious correlations.
18 Give a real-world use case where NLP is central. ⚡ Beginner
Answer: Examples: chatbots, search engines, document classification, sentiment analysis in social media.
19 When would classical NLP methods be preferable to large transformers? 📊 Intermediate
Answer: For small datasets, strict latency/compute limits, or narrow tasks where simpler models are easier to deploy and interpret.
20 What is the key message to remember about NLP today? ⚡ Beginner
Answer: Modern NLP combines good text preprocessing, robust embeddings and powerful sequence models; understanding each layer helps you design effective solutions.

Quick Recap: NLP

If you can explain tokens → embeddings → sequence models → task heads, you have a clear mental map for most practical NLP systems.