Python Basics Every Data Scientist Should Know
Python is one of the most widely used programming languages for data science. This lesson reviews the core syntax, data structures, and patterns that appear in almost every data science project.
Core Python Data Structures
Data structures are the building blocks of any program. In Python you will mainly use lists, tuples, dictionaries, and sets.
# List: ordered, mutable
numbers = [1, 2, 3, 4]
numbers.append(5)
# Tuple: ordered, immutable
point = (10, 20)
# Dictionary: key-value mapping
user = {"name": "Alice", "age": 25}
# Set: unique elements, no order
tags = {"python", "data", "python"}
print("numbers:", numbers)
print("point:", point)
print("user:", user)
print("tags:", tags)
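As a minimal sketch of how these four structures are typically read, the snippet below rebuilds the same variables and shows indexing, slicing, tuple unpacking, a safe dictionary lookup with `dict.get`, and a set membership test. The `"email"` key and its `"unknown"` default are illustrative assumptions, not part of the lesson's data.

```python
numbers = [1, 2, 3, 4, 5]
point = (10, 20)
user = {"name": "Alice", "age": 25}
tags = {"python", "data", "python"}  # the duplicate "python" is stored once

first = numbers[0]       # lists support index access
last_two = numbers[-2:]  # slicing returns a new list
x, y = point             # tuple unpacking into two names

# Hypothetical key: get() returns the default instead of raising KeyError
email = user.get("email", "unknown")

has_python = "python" in tags  # membership test on a set

print(first, last_two)
print(x, y)
print(email)
print(has_python, len(tags))
```

Sets and dictionaries are hash-based, so membership tests and key lookups usually stay fast as the collection grows, which is one reason they are preferred over scanning a list.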
Control Flow & List Comprehensions
Loops and conditions are standard, but in Data Science you will often see list comprehensions for concise transformations.
nums = [1, 2, 3, 4, 5]
# Classic loop
squares = []
for n in nums:
    squares.append(n**2)
# List comprehension
squares2 = [n**2 for n in nums]
evens = [n for n in nums if n % 2 == 0]
print("squares:", squares)
print("squares2:", squares2)
print("evens:", evens)
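The same comprehension syntax extends to dictionaries and to conditional expressions. The following is a small sketch that redefines `nums` so it runs on its own; the names `square_by_num` and `labels` are illustrative.

```python
nums = [1, 2, 3, 4, 5]

# Dict comprehension: map each number to its square
square_by_num = {n: n**2 for n in nums}

# Conditional expression inside a comprehension: label every element
labels = ["even" if n % 2 == 0 else "odd" for n in nums]

print(square_by_num)
print(labels)
```

Note the difference in placement: a trailing `if` filters elements out, while an `if ... else` before the `for` keeps every element and only changes the value produced.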
Functions & Reusable Code
Functions help you organize repeated logic, like cleaning data or computing a metric.
def clean_column(values, fill_value=0):
    """Replace None with fill_value and cast to float."""
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(float(fill_value))
        else:
            cleaned.append(float(v))
    return cleaned
raw = [1, None, 2.5, "3.0"]
print(clean_column(raw, fill_value=0))
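The section intro also mentions computing a metric as a use for functions. As a hedged sketch of that case, the hypothetical helper `mean_absolute_error` below (not defined in the lesson) averages the absolute differences between two equal-length lists of numbers.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between two equal-length lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Illustrative values, not taken from the lesson
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
print(mean_absolute_error(actual, predicted))
```

Validating the inputs up front keeps a length mismatch from being silently truncated by `zip`, which would otherwise produce a plausible but wrong score.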
NumPy & pandas: Working with Data
For Data Science you almost always rely on NumPy for numerical arrays and pandas for tabular data.
import numpy as np
import pandas as pd
# NumPy array
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print("Array shape:", arr.shape)
print("Column means:", arr.mean(axis=0))
# pandas DataFrame
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})
print(df.head())
print(df[["age", "salary"]].describe())
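A common next step is to filter rows and derive new columns with vectorized operations rather than Python loops. The sketch below rebuilds the same `df` so it runs on its own; the `salary_k` column and the age threshold of 28 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
})

# Boolean mask: keep rows where age is above an illustrative threshold
older = df[df["age"] > 28]

# Vectorized arithmetic creates a new column without an explicit loop
df["salary_k"] = df["salary"] / 1000

# Convert a column to a NumPy array for numerical work
ages = df["age"].to_numpy()

print(older)
print(df)
print("Mean age:", np.mean(ages))
```

Because `older` is computed before `salary_k` is added, it contains only the original three columns; reorder the statements if the filtered frame should include the derived column.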