Python for Data Science (Beginner)
~15 min read

Python Basics Every Data Scientist Should Know

Python is one of the most widely used programming languages in Data Science. In this lesson you will review the core syntax, data structures, and patterns that appear in almost every Data Science project.

Core Python Data Structures

Data structures are the building blocks of any program. In Python you will mainly use four built-in ones: lists, tuples, dictionaries, and sets.

# List: ordered, mutable
numbers = [1, 2, 3, 4]
numbers.append(5)

# Tuple: ordered, immutable
point = (10, 20)

# Dictionary: key-value mapping
user = {"name": "Alice", "age": 25}

# Set: unordered, duplicates removed (the second "python" is not stored)
tags = {"python", "data", "python"}

print("numbers:", numbers)
print("point:", point)
print("user:", user)
print("tags:", tags)
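The comments above label lists as mutable and tuples as immutable. The short sketch below shows what that means in practice, along with dictionary lookups (including `dict.get` with a default) and set membership. The variable names `age`, `city`, and `has_python` are illustrative only.

```python
# Lists are mutable: an item can be replaced in place
numbers = [1, 2, 3, 4]
numbers[0] = 100

# Tuples are immutable: item assignment raises TypeError
point = (10, 20)
try:
    point[0] = 99
except TypeError as err:
    print("tuple error:", err)

# Dictionaries: index by key; .get() returns a default for a missing key
user = {"name": "Alice", "age": 25}
age = user["age"]
city = user.get("city", "unknown")

# Sets: duplicates collapse into one element; "in" checks membership
tags = {"python", "data", "python"}
has_python = "python" in tags

print(numbers, point, age, city, len(tags), has_python)
```

Using `user.get("city", "unknown")` instead of `user["city"]` avoids a `KeyError` when a key may be absent, which is common with messy input data.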

Control Flow & List Comprehensions

Loops and conditionals work as in most languages, but in Data Science code you will often see list comprehensions used for concise transformations.

nums = [1, 2, 3, 4, 5]

# Classic loop
squares = []
for n in nums:
    squares.append(n**2)

# List comprehension
squares2 = [n**2 for n in nums]
evens = [n for n in nums if n % 2 == 0]

print("squares:", squares)
print("squares2:", squares2)
print("evens:", evens)
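The same comprehension syntax also builds dictionaries and sets, and an `if`/`else` expression can transform each item instead of filtering it. A minimal sketch using the same `nums` list (the names `square_map`, `remainders`, and `labels` are illustrative):

```python
nums = [1, 2, 3, 4, 5]

# Dict comprehension: map each number to its square
square_map = {n: n**2 for n in nums}

# Set comprehension: the distinct remainders after division by 3
remainders = {n % 3 for n in nums}

# Conditional expression: the if/else goes BEFORE "for" and transforms every item;
# a trailing "if" (as in evens above) filters items instead
labels = ["even" if n % 2 == 0 else "odd" for n in nums]

print(square_map)
print(remainders)
print(labels)
```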

Functions & Reusable Code

Functions let you package logic you repeat often, such as cleaning a column or computing a metric.

def clean_column(values, fill_value=0):
    """Replace None with fill_value and cast to float."""
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(float(fill_value))
        else:
            cleaned.append(float(v))
    return cleaned

raw = [1, None, 2.5, "3.0"]
print(clean_column(raw, fill_value=0))
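Because `clean_column` applies one rule per item, it can also be written as a single list comprehension. The sketch below uses a hypothetical name, `clean_column_compact`, and is meant to behave the same way. It also shows a limitation of both versions: `float()` only parses numeric text, so a value such as `"n/a"` raises `ValueError` rather than being filled.

```python
def clean_column_compact(values, fill_value=0):
    """Hypothetical comprehension version of clean_column: None -> fill_value, then float."""
    return [float(fill_value) if v is None else float(v) for v in values]

raw = [1, None, 2.5, "3.0"]
cleaned = clean_column_compact(raw, fill_value=0)
print(cleaned)  # [1.0, 0.0, 2.5, 3.0]

# float() cannot parse arbitrary text, so a non-numeric string raises ValueError
try:
    clean_column_compact(["4", "n/a"])
except ValueError as err:
    print("could not clean:", err)
```

Whether to catch such errors inside the function or let them surface is a design choice; letting them surface makes unexpected values in the data easier to notice.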

NumPy & pandas: Working with Data

For Data Science you almost always rely on NumPy for numerical arrays and pandas for tabular data.

import numpy as np
import pandas as pd

# NumPy array
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print("Array shape:", arr.shape)
# axis=0 aggregates down the rows, giving one mean per column
print("Column means:", arr.mean(axis=0))

# pandas DataFrame
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})

print(df.head())
print(df[["age", "salary"]].describe())
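Two ideas come up constantly when working with these objects: the `axis` argument in NumPy reductions, and vectorized operations that act on a whole array or column at once. The sketch below reuses the same `arr` and `df`; the 28-year age threshold and the `salary_k` column are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 collapses the rows: one value per column (2.5, 3.5, 4.5)
col_means = arr.mean(axis=0)
# axis=1 collapses the columns: one value per row (2.0, 5.0)
row_means = arr.mean(axis=1)

# Vectorized arithmetic: every element is multiplied, no Python loop needed
scaled = arr * 10

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})

# Boolean mask: keep only the rows where age is above 28
older = df[df["age"] > 28]

# New column computed element-wise from an existing one
df["salary_k"] = df["salary"] / 1000

print(col_means, row_means)
print(scaled)
print(older)
print(df)
```

Vectorized expressions like `arr * 10` and `df["salary"] / 1000` are generally both shorter and faster than looping over elements in Python.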