NLP Q&A
20 Core Questions
Interview Prep
Natural Language Processing: Interview Q&A
Short questions and answers on NLP fundamentals: text preprocessing, embeddings, language models and transformers.
Tokens
Embeddings
Sequence Tasks
Transformers
1
What is NLP in one sentence?
⚡ Beginner
Answer: NLP is the field that teaches computers to understand, generate and work with human language.
2
What is tokenization and why is it needed?
⚡ Beginner
Answer: Tokenization splits text into basic units (words, subwords, characters) so models can operate on discrete symbols.
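A minimal word-level tokenizer can be sketched with a regular expression (real systems use trained subword tokenizers, but the idea is the same):

```python
import re

def tokenize(text):
    # Lowercase, then emit runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Don't panic, NLP is fun!"))
# → ['don', "'", 't', 'panic', ',', 'nlp', 'is', 'fun', '!']
```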
3
What are word embeddings?
⚡ Beginner
Answer: Embeddings map tokens to dense vectors that capture semantic similarity (similar words have similar vectors).
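"Similar vectors" is usually measured with cosine similarity. A toy sketch (the 3-d vectors are hypothetical values, not from a real model):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy embeddings: "king" and "queen" point in similar directions.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]))  # close to 1.0
print(cosine(emb["king"], emb["apple"]))  # much smaller
```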
4
Name some classical NLP tasks.
⚡ Beginner
Answer: Examples: text classification, sentiment analysis, NER, machine translation, summarization, question answering.
5
What problem do subword tokenization methods like BPE or WordPiece solve?
🔥 Advanced
Answer: They handle out-of-vocabulary words and rich morphology by representing words as smaller, reusable subword units.
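The core BPE loop is simple: repeatedly merge the most frequent adjacent symbol pair. A minimal sketch on a three-word toy corpus (real BPE also tracks word frequencies and an end-of-word marker):

```python
from collections import Counter

def most_frequent_pair(words):
    # words: list of symbol tuples, e.g. ('l', 'o', 'w')
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])  # fuse the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(tuple(out))
    return merged

corpus = [tuple("lower"), tuple("lowest"), tuple("low")]
for _ in range(3):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # shared stem "low" emerges as a reusable subword
```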
6
What is the difference between bag-of-words and embeddings?
📊 Intermediate
Answer: Bag-of-words represents text as sparse word counts that ignore order; embeddings are dense vectors that capture semantics and feed naturally into order-aware sequence models.
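A bag-of-words vector is just a count per vocabulary word, as this small sketch shows:

```python
from collections import Counter

docs = ["the cat sat", "the cat sat on the mat"]
vocab = sorted({w for d in docs for w in d.split()})

def bow(doc):
    # One count per vocabulary entry; word order is lost entirely.
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

print(vocab)         # ['cat', 'mat', 'on', 'sat', 'the']
print(bow(docs[1]))  # [1, 1, 1, 1, 2]
```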
7
What is a language model?
📊 Intermediate
Answer: A language model assigns probabilities to sequences of tokens and can generate likely next tokens.
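The simplest instance is a bigram model: estimate P(next word | current word) from counts. A minimal sketch with no smoothing:

```python
from collections import Counter

corpus = "the cat sat . the cat ran .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def prob(w2, w1):
    # Maximum-likelihood estimate of P(w2 | w1); real models smooth these counts.
    return bigrams[(w1, w2)] / unigrams[w1]

print(prob("cat", "the"))  # 1.0: "the" is always followed by "cat" here
print(prob("sat", "cat"))  # 0.5: "cat" is followed by "sat" half the time
```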
8
What is the main idea behind the transformer architecture?
🔥 Advanced
Answer: Transformers use self-attention to let each token attend to all others, capturing long-range dependencies without recurrence.
9
What is self-attention in simple terms?
📊 Intermediate
Answer: Self-attention computes a weighted combination of all token representations for each token, where weights indicate relevance.
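A stripped-down sketch of scaled dot-product self-attention, with queries = keys = values = the input (no learned projection matrices, which real transformers add):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    # For each token, score it against every token, softmax the scores,
    # and output the weighted average of all token vectors.
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(X):
    print([round(v, 3) for v in row])
```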
10
What is BERT and how is it trained?
🔥 Advanced
Answer: BERT is a bidirectional transformer encoder trained with masked language modeling and next sentence prediction.
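The masking step of masked language modeling can be sketched as follows (simplified: real BERT masks ~15% of tokens but also sometimes keeps or randomly replaces the selected token):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Hide a fraction of tokens; the model is trained to predict the originals.
    rng = random.Random(seed)
    masked, targets = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(t)      # prediction target
        else:
            masked.append(t)
            targets.append(None)   # not predicted
    return masked, targets

toks = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(toks))
```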
11
What does fine-tuning mean in the context of pre-trained NLP models?
📊 Intermediate
Answer: Fine-tuning takes a pre-trained language model and continues training it on a specific downstream task, usually with a small task-specific head on top.
12
Why is handling OOV (out-of-vocabulary) words important in NLP?
📊 Intermediate
Answer: Real text constantly introduces new words, names and typos; without OOV strategies the model would treat them all as unknown with no nuance.
13
What is the difference between sequence-to-sequence and sequence labeling tasks?
🔥 Advanced
Answer: Seq2seq maps an input sequence to a different-length output sequence (e.g., translation); sequence labeling assigns a label per input token (e.g., NER).
14
What are some common evaluation metrics in NLP?
⚡ Beginner
Answer: Metrics: accuracy, precision/recall/F1 for classification/NER, BLEU/ROUGE for translation/summarization, perplexity for language models.
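Precision, recall and F1 follow directly from true/false positive counts; a minimal sketch for binary labels:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))  # (0.666..., 0.666..., 0.666...)
```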
15
Why is context important for understanding word meaning?
⚡ Beginner
Answer: Many words are polysemous (multiple meanings); their correct sense depends on neighboring words and sentence structure.
16
What are contextual embeddings (e.g., from BERT) vs static embeddings (e.g., word2vec)?
🔥 Advanced
Answer: Static embeddings give one vector per word type; contextual embeddings give different vectors per occurrence depending on context.
17
Name some common sources of bias in NLP models.
🔥 Advanced
Answer: Bias can stem from imbalanced training data, historical stereotypes, annotation artifacts and spurious correlations.
18
Give a real-world use case where NLP is central.
⚡ Beginner
Answer: Examples: chatbots, search engines, document classification, sentiment analysis in social media.
19
When would classical NLP methods be preferable to large transformers?
📊 Intermediate
Answer: For small datasets, strict latency/compute limits, or narrow tasks where simpler models are easier to deploy and interpret.
20
What is the key message to remember about NLP today?
⚡ Beginner
Answer: Modern NLP combines good text preprocessing, robust embeddings and powerful sequence models; understanding each layer helps you design effective solutions.
Quick Recap: NLP
If you can explain tokens → embeddings → sequence models → task heads, you have a clear mental map for most practical NLP systems.