NLP Exercises – Topic-wise Coding Practice
Strengthen your Natural Language Processing skills with short, topic-wise coding exercises on preprocessing, text representation, classification, sequence labeling, transformers, and real-world applications in Python.
1. Text Preprocessing Exercises
Write a function that takes a raw text string and returns a cleaned version with lowercase letters, removed HTML tags, normalized whitespace and optional stopword removal. Test it on at least five noisy example sentences and show before/after pairs.
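A minimal sketch of such a cleaning function, using only the standard library; the stopword set here is a tiny illustrative placeholder, not a full NLP stopword list:

```python
import html
import re

# Tiny placeholder stopword set for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to"}

def clean_text(raw, remove_stopwords=False):
    """Lowercase, strip HTML tags, unescape entities, normalize whitespace."""
    text = html.unescape(raw)                 # &amp; -> &
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    if remove_stopwords:
        text = " ".join(w for w in text.split() if w not in STOPWORDS)
    return text

print(clean_text("<p>Hello   <b>World</b>!</p>"))  # hello world !
```

Run it over your five noisy sentences and print the raw and cleaned versions side by side.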
Given a list of words, print their stemmed and lemmatized forms side-by-side to see how the algorithms differ on English text. Summarize which words are over-stemmed or incorrectly lemmatized and discuss when this might hurt downstream tasks.
Define a preprocessing pipeline for Twitter or social media data that handles mentions, hashtags, emojis, URLs and repeated characters. Specify the exact steps and justify which elements you normalize, which you keep and why.
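One possible ordering of those steps, sketched with regular expressions; the placeholder tokens `<url>` and `<user>` are an illustrative convention, not a standard:

```python
import re

def normalize_tweet(text):
    text = re.sub(r"https?://\S+", "<url>", text)  # URLs -> placeholder
    text = re.sub(r"@\w+", "<user>", text)         # mentions -> placeholder
    text = re.sub(r"#(\w+)", r"\1", text)          # keep hashtag word, drop '#'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)     # soooo -> soo
    return re.sub(r"\s+", " ", text).strip().lower()

print(normalize_tweet("@bob Sooooo cool!! #NLProc http://t.co/xyz"))
```

Note the order matters: URLs are replaced before mentions so that usernames inside links are not mangled, and repeated-character squashing runs after the placeholders are inserted.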
Collect examples of visually similar but distinct Unicode characters (different quotes, dashes, accented characters). Describe or implement a normalization strategy that maps them to a canonical form and explain its impact on tokenization.
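A sketch of one normalization strategy: Unicode NFKC plus an explicit punctuation map, since curly quotes and dashes normalize to themselves under NFKC:

```python
import unicodedata

# Explicit map for characters NFKC leaves untouched.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dash
})

def canonicalize(text):
    return unicodedata.normalize("NFKC", text).translate(PUNCT_MAP)

# NFKC also composes combining accents: "e" + U+0301 -> "é"
print(canonicalize("\u201cnai\u0308ve\u201d \u2013 re\u0301sume\u0301"))
```

Tokenizers that split on straight quotes or ASCII hyphens will now treat both variants identically.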
2. Text Representation & Embeddings Exercises
Using a small corpus of 5–10 sentences, build both bag-of-words and TF-IDF matrices and print them to compare how frequent and rare words are weighted.
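A from-scratch sketch (no sklearn) on a three-sentence toy corpus; it uses the common smooth-idf variant idf = ln((1 + N) / (1 + df)) + 1, so the exact numbers will differ from other libraries:

```python
import math

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})

# Bag-of-words: raw term counts per document.
bow = [[t.count(w) for w in vocab] for t in tokens]

# TF-IDF with smooth idf.
n = len(docs)
df = {w: sum(w in t for t in tokens) for w in vocab}
idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
tfidf = [[t.count(w) * idf[w] for w in vocab] for t in tokens]

print(vocab)
for row in bow:
    print(row)
for row in tfidf:
    print(["%.2f" % v for v in row])
```

Notice that "the" appears in every document, so its idf collapses to 1.0 while rarer words like "fast" are weighted higher.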
Load pre-trained GloVe or Word2Vec embeddings and write a function that returns the top-5 most similar words to a query token using cosine similarity. Run it on at least three different query words and interpret the neighbours you get.
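The similarity function itself needs nothing beyond cosine distance; this sketch uses made-up 3-d vectors in place of real GloVe/Word2Vec embeddings, but with a loaded model you would index into its vocabulary the same way:

```python
import math

# Toy stand-in vectors; a real model would supply 100-300 dimensions.
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "man":   [0.9, 0.2, 0.0],
    "apple": [0.1, 0.1, 0.9],
    "pear":  [0.0, 0.2, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(word, k=5):
    scores = [(other, cosine(emb[word], v))
              for other, v in emb.items() if other != word]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]

print(most_similar("king", k=3))
```

With gensim, `model.most_similar(word, topn=5)` performs the same computation over the full vocabulary.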
Given pre-trained word embeddings, build simple document embeddings by averaging word vectors. Compare cosine similarity between document pairs from similar and different topics.
Design and implement at least two strategies for handling words that are not present in your embedding vocabulary (e.g., <UNK> token, subword composition). Compare their effect on a simple downstream task.
3. Text Classification Exercises
Using any small labeled sentiment dataset, train a logistic regression classifier with TF-IDF features and print the confusion matrix and F1-score.
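The classifier would typically come from sklearn; this sketch covers the evaluation half, computing a binary confusion matrix and F1 by hand (the gold/predicted labels below are made up for illustration):

```python
# Toy gold labels and predictions standing in for real classifier output.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(g == p == 1 for g, p in zip(gold, pred))
tn = sum(g == p == 0 for g, p in zip(gold, pred))
fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])  # confusion matrix: rows = gold, cols = predicted
print(round(f1, 3))
```

Cross-check your hand-computed numbers against `sklearn.metrics.f1_score` and `confusion_matrix` to make sure you understand what the library reports.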
Train a classifier on one domain (e.g., movie reviews) and evaluate it on another (e.g., product reviews). Analyze the performance drop and write a short explanation of why it happens.
Collect at least 20 misclassified texts from your classifier and categorize error types (sarcasm, domain shift, negation, label noise). Propose at least three concrete changes to reduce these errors.
Take a classification dataset with imbalanced labels and experiment with at least two techniques such as class weights, undersampling or oversampling. Compare macro F1 before and after applying each technique.
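For the class-weight technique, a quick sketch of the "balanced" heuristic sklearn uses, weight_c = n_samples / (n_classes * count_c), computed by hand on a toy label list:

```python
from collections import Counter

labels = ["pos"] * 90 + ["neg"] * 10
counts = Counter(labels)
n, k = len(labels), len(counts)

# Minority classes receive proportionally larger weights.
weights = {c: n / (k * cnt) for c, cnt in counts.items()}
print(weights)
```

Passing these weights (or `class_weight="balanced"`) to the classifier scales the loss so mistakes on the rare class cost more.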
4. Sequence Labeling & NER Exercises
Use spaCy or a Hugging Face NER model to detect entities in a paragraph and print the text with entities wrapped in color-coded brackets such as <ORG>Apple</ORG>.
Given a list of entity spans (start, end, label) and the tokenized text, write a function that converts them into BIO tags and verify your tags on a few examples.
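A sketch of the conversion, assuming spans are token indices with an exclusive end, e.g. (0, 2, "PER") covers tokens[0:2]; adapt the indexing if your spans are character offsets:

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # continuation tokens
    return tags

tokens = ["Barack", "Obama", "visited", "Paris", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```

Verify edge cases explicitly: single-token entities, adjacent entities of the same type, and spans at the end of the sentence.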
Run the same off-the-shelf NER model on news text, scientific abstracts and social media posts. Manually inspect outputs and list at least five typical failure cases per domain.
For a chosen domain (e.g., resumes, finance, healthcare), design a custom NER label set and write guidelines describing when to assign each label. Create at least 10 manually annotated example sentences with your schema.
5. Transformers & Fine-tuning Exercises
Run a small sentence through a BERT model and extract attention weights for a chosen head. Visualize them as a matrix to see which words attend to which tokens.
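Before extracting weights from a real BERT head, it helps to know exactly what one head computes. This sketch builds a scaled dot-product attention matrix from toy 2-d queries and keys (not real BERT weights); each row sums to 1 and shows how strongly one position attends to every other:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_matrix(Q, K):
    """Row i = softmax(Q[i] . K / sqrt(d)) over all keys."""
    d = len(Q[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    return [softmax(row) for row in scores]

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
A = attention_matrix(Q, K)
for row in A:
    print(["%.2f" % v for v in row])
```

With Hugging Face models, passing `output_attentions=True` to the forward call returns matrices of exactly this shape, one per layer and head.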
Fine-tune a small transformer (e.g., DistilBERT) on a sentiment or classification dataset with 1–2 epochs and track training/validation curves. Experiment with different learning rates and batch sizes and compare results.
Design at least five different prompts to solve the same NLP task (e.g., sentiment, extraction) with a generative model. Compare outputs and discuss which prompt patterns work best and why.
Measure average inference time for a transformer model on short vs long inputs. Plot latency against sequence length and discuss trade-offs for real-time applications.
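A timing-harness sketch: `model` here is a dummy stand-in whose cost grows with sequence length, so you can swap in a real transformer forward pass without changing the measurement loop:

```python
import time

def model(tokens):
    # Fake work that scales with input length; replace with a real forward pass.
    return sum(i * i for i in range(len(tokens) * 50))

def avg_latency(seq_len, runs=20):
    start = time.perf_counter()
    for _ in range(runs):
        model(list(range(seq_len)))
    return (time.perf_counter() - start) / runs

for n in (16, 128, 512):
    print(n, "%.6fs" % avg_latency(n))
```

Averaging over several runs smooths out scheduler noise; for real models also discard the first call, which often includes one-time warm-up cost.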
6. Applications & Mini Project Exercises
Create a small command-line script that reads a sentence from the user and prints predicted sentiment using a pre-trained model.
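A runnable skeleton for the script; the word-count lexicon is a deliberately tiny stand-in for the pre-trained model so the sketch runs anywhere, and `predict()` is where you would drop in e.g. a Hugging Face sentiment pipeline:

```python
import sys

# Toy lexicon standing in for a real pre-trained model.
POS = {"good", "great", "love", "excellent", "happy"}
NEG = {"bad", "awful", "hate", "terrible", "sad"}

def predict(sentence):
    words = sentence.lower().split()
    score = sum(w in POS for w in words) - sum(w in NEG for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

if __name__ == "__main__" and len(sys.argv) > 1:
    print(predict(" ".join(sys.argv[1:])))
```

Usage: `python sentiment_cli.py I love this great movie` prints `positive`.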
Build a small question-answering tool: load one article as context and let users ask questions from the terminal. Use a QA pipeline and print both the answer and its confidence score.
Design a simple evaluation dashboard (wireframe or written description) to monitor an NLP system in production. List the key metrics, error slices and logs you would track over time.
Create a checklist of at least 10 questions to evaluate the ethical impact of an NLP application (bias, privacy, misuse). Apply it to one of your own projects and summarize the main risks you discovered.