Data Visualization
Visualization principles, Matplotlib, Seaborn, and Plotly for exploratory analysis.
Principles for Clear & Honest Data Visualizations
Choosing the Right Chart Type
Your choice of chart should reflect the relationship you want to show:
- Comparison: bar chart, grouped bar, line chart for time series.
- Distribution: histogram, box plot, violin plot, density plot.
- Relationship: scatter plot, bubble chart, heatmap.
- Composition: stacked bar chart, 100% stacked bar (avoid pie charts for many categories).
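The four relationship types above can be compared side by side. The following is a minimal sketch, assuming synthetic data; the function name plot_chart_types, the category labels, and all values are hypothetical and chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_chart_types(seed=0):
    """Draw one example chart per relationship type on a 2x2 grid."""
    rng = np.random.default_rng(seed)  # fixed seed so the figure is reproducible
    fig, axes = plt.subplots(2, 2, figsize=(9, 6))

    # Comparison: bar chart of made-up values per category.
    categories = ["A", "B", "C"]
    axes[0, 0].bar(categories, [5, 8, 3])
    axes[0, 0].set_title("Comparison: bar chart")

    # Distribution: histogram of synthetic normal samples.
    axes[0, 1].hist(rng.normal(size=500), bins=25)
    axes[0, 1].set_title("Distribution: histogram")

    # Relationship: scatter plot of two correlated synthetic variables.
    x = rng.normal(size=200)
    y = 0.7 * x + rng.normal(scale=0.5, size=200)
    axes[1, 0].scatter(x, y, s=10)
    axes[1, 0].set_title("Relationship: scatter plot")

    # Composition: stacked bars, with the second series drawn on top of the first.
    part_1 = np.array([3, 4, 2])
    part_2 = np.array([2, 1, 4])
    axes[1, 1].bar(categories, part_1, label="Part 1")
    axes[1, 1].bar(categories, part_2, bottom=part_1, label="Part 2")
    axes[1, 1].set_title("Composition: stacked bar")
    axes[1, 1].legend()

    fig.tight_layout()
    return fig, axes


fig, axes = plot_chart_types()
```

Call plt.show() or fig.savefig(...) afterwards to view the result.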
Color & Human Perception
Colors should encode information, not distract. Use color sparingly to highlight important elements, and choose color-blind-safe palettes so the chart stays accessible.
- Use a neutral base color and a strong accent color for highlights.
- Avoid using too many categorical colors; group or filter instead.
- Don’t rely on color alone to encode critical information; use shapes or labels as well.
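One way to apply these guidelines is sketched below: a neutral gray for context bars, a single accent color for the bar of interest, and a hatch pattern plus value labels so that color is not the only cue. The function name highlight_bar_chart, the region labels, and the numbers are hypothetical; the accent hex code is the vermillion from the Okabe-Ito color-blind-safe palette.

```python
import matplotlib.pyplot as plt


def highlight_bar_chart(labels, values, highlight):
    """Bar chart with one accented bar; names and colors are illustrative."""
    neutral = "#b0b0b0"  # muted gray for context bars
    accent = "#d55e00"   # vermillion from the Okabe-Ito palette

    fig, ax = plt.subplots(figsize=(6, 4))
    colors = [accent if label == highlight else neutral for label in labels]
    bars = ax.bar(labels, values, color=colors)

    for label, value, bar in zip(labels, values, bars):
        if label == highlight:
            bar.set_hatch("//")  # second, non-color cue for the highlighted bar
        # Print the value above each bar so it can be read without the color.
        ax.text(bar.get_x() + bar.get_width() / 2, value, str(value),
                ha="center", va="bottom")

    ax.set_ylim(0, max(values) * 1.15)  # headroom for the value labels
    fig.tight_layout()
    return fig, ax


# Hypothetical data: highlight the "South" region.
fig, ax = highlight_bar_chart(["North", "South", "East"], [10, 25, 15], "South")
```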
Avoid Misleading Visualizations
Small formatting decisions can significantly change the message. Always aim for honest, reproducible charts that match the underlying data.
- Start axes at zero when comparing magnitudes in bar charts.
- Keep aspect ratios reasonable so slopes are not exaggerated.
- Label units, time ranges and filters clearly.
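The effect of the axis baseline is easiest to see with the same data drawn twice. The sketch below assumes made-up quarterly revenue figures; the function name compare_baselines, the labels, and the unit string are hypothetical.

```python
import matplotlib.pyplot as plt


def compare_baselines(labels, values, unit="kEUR"):
    """Plot the same bars with a truncated y-axis and with a zero baseline."""
    fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(9, 4))

    # Truncated axis: small differences look dramatic.
    ax_trunc.bar(labels, values)
    ax_trunc.set_ylim(min(values) * 0.98, max(values) * 1.01)
    ax_trunc.set_title("Truncated axis (exaggerates differences)")

    # Zero baseline: bar heights are proportional to the values.
    ax_zero.bar(labels, values)
    ax_zero.set_ylim(0, max(values) * 1.1)
    ax_zero.set_title("Zero baseline")

    for ax in (ax_trunc, ax_zero):
        ax.set_ylabel(f"Revenue ({unit})")  # always state the unit

    fig.tight_layout()
    return fig, ax_trunc, ax_zero


# Hypothetical data: values that differ by only a few percent.
fig, ax_trunc, ax_zero = compare_baselines(["Q1", "Q2", "Q3"], [98, 100, 103])
```

In the left panel the Q3 bar appears several times taller than Q1, although the underlying values differ by about 5%.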
First Steps with Matplotlib Plots
Line & Bar Charts
Matplotlib offers two interfaces: a stateful one, in which functions of the pyplot module (conventionally imported as plt) act on the current figure, and an object-oriented API built on Figure and Axes objects. For quick experiments, the stateful style is fine. For dashboards and reusable plots, prefer fig, ax = plt.subplots() and call methods on ax, which gives explicit control over each chart element.
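import matplotlib.pyplot as plt
# Example data: sales per month
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [100, 120, 90, 150]
# Stateful interface: each plt call acts on the current figure
plt.figure(figsize=(8, 4))
plt.plot(months, sales, marker="o", color="#e67e22")
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.grid(alpha=0.3)  # faint grid lines that do not compete with the data
plt.tight_layout()   # adjust spacing so labels are not clipped
plt.show()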
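The example above uses the stateful plt interface. As a minimal sketch, the same data can be drawn as a bar chart with the object-oriented API, where every setting is a method call on the ax object returned by plt.subplots(). The zero baseline and the horizontal-only grid are choices made here for illustration, not part of the original example.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [100, 120, 90, 150]

# Object-oriented style: create the Figure and Axes explicitly.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(months, sales, color="#e67e22")
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_ylim(bottom=0)          # bars compare magnitudes, so start at zero
ax.grid(axis="y", alpha=0.3)   # horizontal grid lines only
fig.tight_layout()
```

Because fig and ax are ordinary objects, they can be passed to helper functions or combined with other axes in one figure. Call plt.show() or fig.savefig(...) to display or export the chart.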
Histograms
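import numpy as np
import matplotlib.pyplot as plt
# 1000 samples from a standard normal distribution (mean 0, standard deviation 1)
data = np.random.normal(loc=0, scale=1, size=1000)
plt.figure(figsize=(6, 4))
# 30 equal-width bins; the black edge color keeps adjacent bars distinguishable
plt.hist(data, bins=30, color="#3498db", edgecolor="black", alpha=0.8)
plt.title("Histogram of Normal Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()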