Data Science

Data Visualization

Visualization principles, Matplotlib, Seaborn, and Plotly for exploratory analysis.

Principles for Clear & Honest Data Visualizations

Choosing the Right Chart Type

Your choice of chart should reflect the relationship you want to show:

  • Comparison: bar chart, grouped bar, line chart for time series.
  • Distribution: histogram, box plot, violin plot, density plot.
  • Relationship: scatter plot, bubble chart, heatmap.
  • Composition: stacked bar chart, 100% stacked bar (avoid pie charts for many categories).
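
To make the mapping above concrete, here is a minimal sketch that draws one chart per relationship type on a single figure. The function name chart_type_gallery and all of the data are made up for illustration; only the Matplotlib and NumPy calls are real.

```python
import numpy as np
import matplotlib.pyplot as plt


def chart_type_gallery(seed=0):
    """Draw one example chart per relationship type on a 2x2 grid."""
    rng = np.random.default_rng(seed)  # synthetic data, for illustration only
    fig, axes = plt.subplots(2, 2, figsize=(9, 6))

    # Comparison: one bar per category
    axes[0, 0].bar(["A", "B", "C"], [5, 8, 3])
    axes[0, 0].set_title("Comparison: bar chart")

    # Distribution: histogram of a single numeric variable
    axes[0, 1].hist(rng.normal(size=500), bins=20)
    axes[0, 1].set_title("Distribution: histogram")

    # Relationship: scatter plot of two numeric variables
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + rng.normal(scale=2, size=100)
    axes[1, 0].scatter(x, y, s=12)
    axes[1, 0].set_title("Relationship: scatter plot")

    # Composition: stacked bars, where the parts add up to a whole
    quarters = ["Q1", "Q2", "Q3"]
    product_a = np.array([30, 35, 40])
    product_b = np.array([20, 25, 15])
    axes[1, 1].bar(quarters, product_a, label="Product A")
    axes[1, 1].bar(quarters, product_b, bottom=product_a, label="Product B")
    axes[1, 1].set_title("Composition: stacked bar")
    axes[1, 1].legend()

    fig.tight_layout()
    return fig


fig = chart_type_gallery()
```

Call plt.show() or fig.savefig("gallery.png") afterwards to view the result.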

Color & Human Perception

Color should encode information, not distract. Use it sparingly to highlight important elements, and choose color-blind-safe palettes so the chart stays accessible.

  • Use a neutral base color and a strong accent color for highlights.
  • Avoid using too many categorical colors; group or filter instead.
  • Don’t rely on color alone to encode critical information; use shapes or labels as well.
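
One way to apply these guidelines is sketched below: every bar is drawn in a neutral grey, a single category gets an accent color, and the highlighted bar also carries a text label so the message does not depend on color alone. The helper highlight_bar_chart, the region names and the values are hypothetical; the accent "#d55e00" is the vermillion from the Okabe-Ito color-blind-safe palette.

```python
import matplotlib.pyplot as plt


def highlight_bar_chart(categories, values, highlight):
    """Bar chart with a neutral base color and one accented, labeled bar."""
    base, accent = "#b0b0b0", "#d55e00"  # grey base, Okabe-Ito vermillion accent
    colors = [accent if cat == highlight else base for cat in categories]

    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(categories, values, color=colors)

    # Redundant encoding: also label the highlighted bar with its value
    for bar, cat, val in zip(bars, categories, values):
        if cat == highlight:
            ax.text(
                bar.get_x() + bar.get_width() / 2,  # horizontal center of the bar
                val,
                str(val),
                ha="center",
                va="bottom",
                fontweight="bold",
            )

    ax.set_title(f"Sales by region ({highlight} highlighted)")
    ax.set_ylabel("Sales")
    fig.tight_layout()
    return fig, ax


# Made-up example data
fig, ax = highlight_bar_chart(
    ["North", "South", "East", "West"], [120, 150, 90, 110], highlight="South"
)
```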

Avoid Misleading Visualizations

Small formatting decisions can significantly change the message. Always aim for honest, reproducible charts that match the underlying data.

  • Start axes at zero when comparing magnitudes in bar charts.
  • Keep aspect ratios reasonable so slopes are neither exaggerated nor flattened.
  • Label units, time ranges and filters clearly.
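
The effect of the axis baseline is easiest to see side by side. The sketch below plots the same two values twice, once with a truncated y-axis and once starting at zero. The function compare_baselines, the product names, the values and the unit label are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt


def compare_baselines(labels, values, unit="Revenue (million USD)"):
    """Plot the same bars with a truncated y-axis and with a zero baseline."""
    fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(9, 4))

    # Truncated axis: a small difference looks dramatic
    ax_trunc.bar(labels, values, color="#999999")
    ax_trunc.set_ylim(min(values) - 1, max(values) + 1)
    ax_trunc.set_title("Truncated axis (misleading)")

    # Zero baseline: bar heights are proportional to the values
    ax_zero.bar(labels, values, color="#999999")
    ax_zero.set_ylim(0, max(values) * 1.1)
    ax_zero.set_title("Axis starts at zero")

    for ax in (ax_trunc, ax_zero):
        ax.set_ylabel(unit)  # always state the unit

    fig.tight_layout()
    return fig, ax_trunc, ax_zero


# Made-up values that differ by about 4%
fig, ax_trunc, ax_zero = compare_baselines(["Product A", "Product B"], [98, 102])
```

In the left panel Product B appears several times taller than Product A, although the underlying values differ by only about 4%.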

First Steps with Matplotlib Plots

Line & Bar Charts

Matplotlib offers two interfaces: a stateful one, where plt functions act on the current figure, and an object‑oriented one, where you call methods on explicit Figure and Axes objects. The stateful style is fine for quick experiments, as in the line chart below. For dashboards and reusable plots, prefer fig, ax = plt.subplots() and call methods on ax, which gives you explicit control over each chart element.

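import matplotlib.pyplot as plt

# Example data: one sales figure per month
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [100, 120, 90, 150]

# Stateful style: each plt call acts on the current figure
plt.figure(figsize=(8, 4))
plt.plot(months, sales, marker="o", color="#e67e22")
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.grid(alpha=0.3)   # faint grid lines to help read values
plt.tight_layout()    # avoid clipped labels
plt.show()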
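
For comparison, here is a sketch of the same data drawn with the object-oriented API, with a line chart and a bar chart side by side. The variable names ax_line and ax_bar are arbitrary, and the bar chart keeps its y-axis starting at zero.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [100, 120, 90, 150]

# One figure with two axes; every call targets an explicit Axes object
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: emphasizes the trend over time
ax_line.plot(months, sales, marker="o", color="#e67e22")
ax_line.set_title("Monthly Sales (line)")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Sales")
ax_line.grid(alpha=0.3)

# Bar chart: emphasizes the magnitude of each month
ax_bar.bar(months, sales, color="#e67e22")
ax_bar.set_title("Monthly Sales (bar)")
ax_bar.set_xlabel("Month")
ax_bar.set_ylabel("Sales")
ax_bar.set_ylim(bottom=0)  # bars compare magnitudes, so start at zero

fig.tight_layout()
```

Call plt.show() to display the figure, or fig.savefig("sales.png") to write it to a file.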

Histograms

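import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the chart is reproducible from run to run
rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)  # 1000 draws from N(0, 1)

plt.figure(figsize=(6, 4))
# bins controls the resolution: too few hides shape, too many adds noise
plt.hist(data, bins=30, color="#3498db", edgecolor="black", alpha=0.8)
plt.title("Histogram of Normal Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()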

Seaborn for Data Visualization

Plotly Interactive Visualization