Anomaly Detection

Anomaly detection identifies rare observations that deviate significantly from the majority of the data, such as fraudulent transactions, network intrusions or faulty sensor readings.

Real-World Use Cases

Credit card fraud detection.
Network intrusion detection.
Industrial equipment fault monitoring.
Medical anomaly detection (rare diseases, unusual lab results).

Isolation Forest

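Isolation Forest isolates anomalies by recursively partitioning the feature space with random splits. Each node picks a random feature and a random split value between that feature's minimum and maximum. Anomalies are few and lie far from dense regions, so they tend to be separated after only a few splits. As a result, they have shorter average path lengths across the trees than normal points.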
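The path-length idea can be sketched in pure Python. This is a minimal illustration, not scikit-learn's implementation. It assumes the anomaly score from the original Isolation Forest paper, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length and c(n) normalises it. For simplicity, the sketch follows only the branch that contains x instead of storing full trees, and it skips the subsampling the real algorithm uses. The helper names `c`, `path_length` and `anomaly_score` are hypothetical.

```python
import math
import random


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points,
    used to normalise path lengths (n == 2 is special-cased to 1, as in the paper)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(x, data, depth=0, max_depth=50):
    """Number of random splits needed to isolate x in one random tree (hypothetical helper)."""
    if len(data) <= 1 or depth >= max_depth:
        # Unresolved leaves get the expected remaining depth c(size)
        return depth + c(len(data))
    dim = random.randrange(len(x))  # random feature
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return depth + c(len(data))
    split = random.uniform(lo, hi)  # random split value in [min, max]
    # Keep only the points that fall on the same side of the split as x
    side = [p for p in data if (p[dim] < split) == (x[dim] < split)]
    return path_length(x, side, depth + 1, max_depth)


def anomaly_score(x, data, n_trees=200):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values close to 1 suggest an anomaly."""
    avg = sum(path_length(x, data) for _ in range(n_trees)) / n_trees
    return 2.0 ** (-avg / c(len(data)))


# Usage on made-up 2-D data: 200 Gaussian points plus one far-away point
random.seed(0)
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outlier = (6.0, 6.0)
data = normal + [outlier]
center = min(normal, key=lambda p: p[0] ** 2 + p[1] ** 2)  # most central normal point

s_out = anomaly_score(outlier, data)
s_in = anomaly_score(center, data)
print(f"outlier: {s_out:.3f}  central point: {s_in:.3f}")
```

The far-away point is usually cut off within a few splits. Its average path is therefore short and its score noticeably higher than that of the central point.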

IsolationForest with scikit-learn

from sklearn.ensemble import IsolationForest

iso = IsolationForest(
    n_estimators=200,
    contamination=0.02,
    random_state=42
)
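iso.fit(X_train)  # X_train: array of shape (n_samples, n_features)

# decision_function: negative values fall below the contamination cutoff (lower = more anomalous)
scores = iso.decision_function(X_test)
labels = iso.predict(X_test)  # -1 = anomaly, 1 = normal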
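The snippet above assumes `X_train` and `X_test` already exist. The self-contained variant below uses synthetic data: 500 Gaussian training points plus two far-away test points, all made up for illustration. It shows that `predict` is simply a threshold at 0 on `decision_function`, so the raw scores can also be used to rank points by how anomalous they look.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical data: 500 "normal" 2-D training points
X_train = rng.normal(0.0, 1.0, size=(500, 2))
# Test set: 20 normal points followed by two obvious outliers (indices 20 and 21)
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 2)),
    np.array([[6.0, 6.0], [-7.0, 5.0]]),
])

iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
iso.fit(X_train)

scores = iso.decision_function(X_test)  # lower = more anomalous
labels = iso.predict(X_test)            # -1 = anomaly, 1 = normal

# predict() flags exactly the points whose decision_function is negative
print(np.array_equal(labels == -1, scores < 0))

# Rank test points from most to least anomalous
ranking = np.argsort(scores)
print(ranking[:2])
```

Ranking by score avoids committing to the `contamination` cutoff, which is useful when analysts can review only a fixed number of cases.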

One-Class SVM

One-Class SVM is typically trained on data assumed to be normal. It learns a decision boundary around this "normal" class and flags points that lie outside this region as anomalies.

from sklearn.svm import OneClassSVM

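# nu: upper bound on the fraction of training points treated as outliers
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(X_train_normal)  # training data assumed to contain only normal points

pred = ocsvm.predict(X_test)  # -1 = anomaly, 1 = normal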
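The RBF kernel depends on Euclidean distances, so a feature on a much larger scale can dominate the learned boundary. The sketch below uses synthetic data with hypothetical feature scales of 1 and 100. It wraps the model in a scikit-learn pipeline with `StandardScaler`, then checks the documented role of `nu`: an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical normal-only training data whose two features differ in scale by 100x
X_train_normal = rng.normal(0.0, 1.0, size=(400, 2)) * np.array([1.0, 100.0])
# Test set: five training points followed by one clear outlier
X_test = np.vstack([X_train_normal[:5], [[8.0, 800.0]]])

nu = 0.05
# Standardise features before the RBF kernel computes distances
ocsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=nu),
)
ocsvm.fit(X_train_normal)

pred = ocsvm.predict(X_test)  # -1 = anomaly, 1 = normal

# Fraction of training points flagged (should not exceed roughly nu)
train_outlier_rate = np.mean(ocsvm.predict(X_train_normal) == -1)
# Fraction of training points that are support vectors (at least roughly nu)
svm = ocsvm.named_steps["oneclasssvm"]
support_rate = len(svm.support_) / len(X_train_normal)
print(pred, train_outlier_rate, support_rate)
```

Because `nu` caps the share of training points left outside the boundary, it plays a role similar to `contamination` in Isolation Forest.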

Evaluating Anomaly Detectors

Evaluation is tricky because anomalies are rare and labels may be incomplete.

Use precision-recall curves instead of accuracy for highly imbalanced data, since a detector that labels every point as normal can still reach very high accuracy.
Work closely with domain experts to validate flagged anomalies.
Consider cost-sensitive metrics (false negatives, such as a missed fraud case, are often more expensive than false positives).
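These points can be sketched with `sklearn.metrics`. The labels and scores below are synthetic. The 50:1 cost ratio between a missed anomaly and a false alarm is a hypothetical assumption, and in practice the costs would come from domain experts. `total_cost` is a hypothetical helper. Scores follow the "higher = more anomalous" convention, so with scikit-learn detectors they could be taken as `-decision_function(...)`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, precision_recall_curve

rng = np.random.default_rng(0)
# Hypothetical ground truth: 1 = anomaly (20 of 1000 points), 0 = normal
y_true = np.zeros(1000, dtype=int)
y_true[:20] = 1
# Hypothetical anomaly scores (higher = more anomalous); anomalies are shifted upward
scores = rng.normal(0.0, 1.0, size=1000) + 2.5 * y_true

# A "detector" that never flags anything still looks good on accuracy
baseline_acc = accuracy_score(y_true, np.zeros_like(y_true))
print(baseline_acc)  # 0.98

# Average precision summarises the precision-recall curve
ap = average_precision_score(y_true, scores)
precision, recall, thresholds = precision_recall_curve(y_true, scores)
print(f"average precision: {ap:.3f}")

# Cost-sensitive threshold choice with hypothetical costs
COST_FN, COST_FP = 50.0, 1.0


def total_cost(threshold):
    """Total cost of flagging every point whose score is >= threshold."""
    flagged = scores >= threshold
    fn = np.sum((y_true == 1) & ~flagged)  # missed anomalies
    fp = np.sum((y_true == 0) & flagged)   # false alarms
    return COST_FN * fn + COST_FP * fp


best = min(thresholds, key=total_cost)
print(f"best threshold: {best:.3f}, cost: {total_cost(best):.0f}")
```

Unlike accuracy, average precision and the cost-based threshold both react to how well the rare class is ranked.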
