Anomaly Detection
Anomaly detection identifies rare observations that deviate markedly from the bulk of the data, for example fraudulent transactions, network intrusions, or readings from faulty sensors.
Real-World Use Cases
- Credit card fraud detection.
- Network intrusion detection.
- Industrial equipment fault monitoring.
- Medical anomaly detection (rare diseases, unusual lab results).
Isolation Forest
Isolation Forest isolates anomalies by recursively splitting the data on randomly chosen features and split values. Anomalies need fewer splits to isolate, so their average path length across the trees is shorter.
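The link between path length and anomaly score can be sketched with the normalization from the original Isolation Forest paper (Liu, Ting and Zhou, 2008): score = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary search tree lookup over n points. The function names below are illustrative, not part of any library. scikit-learn's score_samples returns the opposite of this score, so there lower means more anomalous.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used to approximate harmonic numbers


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n - 1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Paper-style score in (0, 1]: near 1 = anomaly, around 0.5 = unremarkable."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# With 256 samples per tree, a point isolated after 2 splits on average
# scores much higher than one that needs 12 splits.
short_path_score = anomaly_score(2.0, 256)
long_path_score = anomaly_score(12.0, 256)
```

A point whose mean path length equals c(n) scores exactly 0.5, which is why scores well above 0.5 are read as anomalous.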
IsolationForest with scikit-learn
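from sklearn.ensemble import IsolationForest

# contamination is the assumed share of anomalies; it sets the decision threshold
iso = IsolationForest(
    n_estimators=200,
    contamination=0.02,
    random_state=42,
)
iso.fit(X_train)
scores = iso.decision_function(X_test)  # lower = more anomalous; negative = flagged
labels = iso.predict(X_test)  # -1 = anomaly, 1 = normal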
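The snippet above assumes X_train and X_test already exist. A self-contained sketch on synthetic data (the Gaussian inliers and the far-away uniform outliers are made up for illustration) shows the same calls end to end:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data: "normal" points from a standard Gaussian,
# plus a handful of test points placed far outside that cloud.
X_train = rng.normal(0.0, 1.0, size=(1000, 2))
X_inliers = rng.normal(0.0, 1.0, size=(50, 2))
X_outliers = rng.uniform(6.0, 8.0, size=(5, 2))
X_test = np.vstack([X_inliers, X_outliers])

iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
iso.fit(X_train)

scores = iso.decision_function(X_test)  # lower = more anomalous
labels = iso.predict(X_test)            # -1 = anomaly, 1 = normal

inlier_mean_score = scores[:50].mean()
outlier_mean_score = scores[50:].mean()
print(f"mean score, inliers:  {inlier_mean_score:.3f}")
print(f"mean score, outliers: {outlier_mean_score:.3f}")
print("outlier labels:", labels[50:])
```

Real data is rarely this well separated, so the contamination value usually needs tuning against whatever labeled examples are available.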
One-Class SVM
One-Class SVM learns a decision boundary around the "normal" class, typically from training data that is entirely or mostly normal, and flags points that fall outside this region as anomalies.
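from sklearn.svm import OneClassSVM
# nu upper-bounds the fraction of training points allowed outside the boundary
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(X_train_normal)  # fit on data assumed to be (mostly) normal
pred = ocsvm.predict(X_test)  # -1 = anomaly, 1 = normal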
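RBF kernels are sensitive to feature scale, so a common pattern is to standardize features before the SVM. The following is a minimal sketch on synthetic data; the two features, their scales, and the test points are invented for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical "normal" data with two features on very different scales.
X_train_normal = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(500, 2))

# One typical point and one point far from the training cloud.
X_test = np.array([[52.0, 0.48], [150.0, 2.0]])

# Standardize first so both features contribute comparably to the RBF kernel.
model = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
model.fit(X_train_normal)

pred = model.predict(X_test)  # -1 = anomaly, 1 = normal
train_outlier_fraction = (model.predict(X_train_normal) == -1).mean()
print("predictions:", pred)
print(f"fraction of training points flagged: {train_outlier_fraction:.3f}")
```

The fraction of training points flagged should land close to nu, which is a quick sanity check that the parameter is doing what is expected.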
Evaluating Anomaly Detectors
Evaluation is tricky because anomalies are rare and labels may be incomplete.
- Use precision-recall curves instead of accuracy for highly imbalanced data: with 2% anomalies, a detector that flags nothing is still 98% accurate.
- Work closely with domain experts to validate flagged anomalies.
- Consider cost‑sensitive metrics (false negatives are often more expensive than false positives).
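The first and third points above can be sketched with scikit-learn's metrics. The labels, scores, and cost values here are hypothetical. Note that for scikit-learn detectors, decision_function is higher for normal points, so it would be negated first to get a score where higher means more anomalous.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical ground truth (1 = anomaly) and anomaly scores (higher = more anomalous).
# With a fitted detector this could be: anomaly_score = -detector.decision_function(X_test)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
anomaly_score = np.array([0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.60, 0.12, 0.90, 0.55])

# Precision-recall curve and its summary, average precision.
precision, recall, thresholds = precision_recall_curve(y_true, anomaly_score)
ap = average_precision_score(y_true, anomaly_score)

# Cost-sensitive threshold choice with assumed costs:
# a missed anomaly is taken to be 50x as expensive as a false alarm.
COST_FALSE_NEGATIVE = 50.0
COST_FALSE_POSITIVE = 1.0

candidates = np.unique(anomaly_score)
costs = []
for t in candidates:
    flagged = anomaly_score >= t
    false_positives = np.sum(flagged & (y_true == 0))
    false_negatives = np.sum(~flagged & (y_true == 1))
    costs.append(false_positives * COST_FALSE_POSITIVE + false_negatives * COST_FALSE_NEGATIVE)

best_index = int(np.argmin(costs))
best_threshold = float(candidates[best_index])
best_cost = float(costs[best_index])
print(f"average precision: {ap:.3f}")
print(f"lowest-cost threshold: {best_threshold} (total cost {best_cost})")
```

In this toy example the cheapest threshold accepts one false alarm in order to catch both anomalies, which is the trade-off a high false-negative cost tends to produce.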