Text Classification

Master text classification from spam detection to news categorization using Naive Bayes, Logistic Regression, and BERT.

The Foundation of Categorization

Text Classification is the task of automatically assigning a predefined category label to a piece of text. It is one of the most commercially valuable NLP applications.

Spam Detection

Email → Spam or Ham

News Categories

Article → Sports / Tech / Politics

Sentiment

Review → Positive / Negative

Language ID

Text → English / French / Hindi

The Standard Pipeline

Raw Text → Preprocess → Vectorize → Train Model → Predict Label
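The pipeline above can be sketched end to end with scikit-learn. This is a minimal sketch on made-up toy data; the preprocessing function and sentiment labels are illustrative, and a Logistic Regression classifier stands in for the "Train Model" step:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Step 2: preprocess — lowercase and strip punctuation/digits
def preprocess(text):
    return re.sub(r"[^a-z\s]", "", text.lower())

# Step 1: raw text (toy sentiment data, invented for illustration)
train_texts = ["I loved this movie", "Terrible acting, awful plot",
               "A wonderful experience", "Worst film I have seen"]
train_labels = ["positive", "negative", "positive", "negative"]

# Steps 3-4: vectorize with TF-IDF, then train a classifier
model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=preprocess)),
    ("clf", LogisticRegression()),
])
model.fit(train_texts, train_labels)

# Step 5: predict a label for unseen text
print(model.predict(["I loved this wonderful movie"]))
```

Swapping `LogisticRegression` for `MultinomialNB` (or any other scikit-learn classifier) changes only one line — that modularity is the main appeal of the `Pipeline` abstraction.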

Level 1 — Naive Bayes (Classic Baseline)

Naive Bayes calculates the probability that each label generated the words in the document. It is extremely fast and works well for spam detection.

How Naive Bayes Thinks

Email: "Free money prize win!"

Label   P("Free")   P("money")   P("prize")   Combined
SPAM    0.35        0.40         0.45         0.063
HAM     0.02        0.05         0.01         0.00001

The combined score is simply the product of the per-word probabilities (class priors assumed equal here). The SPAM score is thousands of times higher → Email classified as SPAM!
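The arithmetic in the table above is just a product of per-word likelihoods under the "naive" independence assumption. A few lines reproduce it (probabilities taken from the table; equal class priors assumed):

```python
# Per-class word likelihoods from the table above
likelihoods = {
    "SPAM": {"free": 0.35, "money": 0.40, "prize": 0.45},
    "HAM":  {"free": 0.02, "money": 0.05, "prize": 0.01},
}

words = ["free", "money", "prize"]
scores = {}
for label, probs in likelihoods.items():
    score = 1.0
    for w in words:
        score *= probs[w]   # naive independence: multiply word probabilities
    scores[label] = score

print(scores)                              # SPAM ≈ 0.063, HAM ≈ 0.00001
print(max(scores, key=scores.get))         # SPAM
```

In practice libraries sum log-probabilities instead of multiplying raw ones, since products of many small numbers underflow to zero.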

Python: Scikit-Learn Naive Bayes
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = [
    "Win a free iPhone now!", "Click here for free money",
    "Limited time offer, claim your prize",
    "Meeting rescheduled to 3pm", "Please review the attached report",
    "Can we connect for a quick call tomorrow?"
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TF-IDF features feed a multinomial Naive Bayes classifier
model = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('clf', MultinomialNB())
])
model.fit(train_texts, train_labels)

test_texts = ["Get your free reward now!", "Let's schedule a team meeting"]
for text, label in zip(test_texts, model.predict(test_texts)):
    icon = "SPAM" if label == "spam" else "HAM"
    print(f"[{icon}] {text}")
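Because MultinomialNB is a probabilistic model, the same pipeline can also report per-class probabilities via `predict_proba` — handy for setting a custom spam threshold instead of taking the hard label. A self-contained sketch, repeating the toy data above so it runs on its own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = [
    "Win a free iPhone now!", "Click here for free money",
    "Limited time offer, claim your prize",
    "Meeting rescheduled to 3pm", "Please review the attached report",
    "Can we connect for a quick call tomorrow?",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", MultinomialNB()),
])
model.fit(train_texts, train_labels)

# Probabilities for each class, in the order of model.classes_
probs = model.predict_proba(["Claim your free prize now"])[0]
for label, p in zip(model.classes_, probs):
    print(f"{label}: {p:.2%}")
```

A message is then flagged only when, say, P(spam) > 0.9, trading some spam recall for fewer false positives on legitimate mail.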

Level 2 — Zero-Shot Transformers (No Labels Needed!)

Zero-Shot Classification lets you categorize text into ANY labels you describe in plain English — without a single labelled training example! Under the hood, a natural language inference model (here BART fine-tuned on MNLI, a relative of BERT) checks whether each candidate label is entailed by the text.

Python: Hugging Face Zero-Shot
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

text = "Apple announced record quarterly earnings driven by iPhone sales."
labels = ["Technology", "Finance", "Sports", "Politics"]
result = classifier(text, labels)

print("Label Scores:")
for label, score in zip(result['labels'], result['scores']):
    bar = "█" * int(score * 30)
    print(f"  {label:<12} {score:.1%}  {bar}")