
R Basics for Data Analysis & Visualization

R is a widely used language for statistics and data visualization. The tidyverse, a collection of packages that share a common design, streamlines everyday work with data frames and plots.

Data Frames & Tidyverse

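A data frame in R is a table of equal-length columns, similar to a pandas DataFrame in Python. The tidyverse collection of packages provides a modern, consistent grammar for data manipulation and visualization. Its tibble() function, used in the example below, creates a tibble, the tidyverse's variant of a data frame.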

In the tidyverse philosophy, each column is a variable, each row is an observation, and each table is one type of observational unit. This makes it easier to reason about transformations and to chain multiple operations together using the pipe operator %>%, which passes the value on its left as the first argument of the function on its right.
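To make the tidy-data idea concrete, here is a minimal sketch of reshaping a table that is not tidy. The wide table and its values are invented for illustration and are not part of the original example. pivot_longer() comes from tidyr, which is loaded with the tidyverse.

```r
library(tidyverse)

# Hypothetical wide table: one column per year, so the variable "year"
# is hidden in the column names rather than stored in a column.
wide <- tibble(
  name   = c("Alice", "Bob"),
  `2022` = c(48000, 57000),
  `2023` = c(50000, 60000)
)

# Reshape so that each row is one (name, year) observation
# and each column is a single variable.
tidy <- wide %>%
  pivot_longer(
    cols      = c(`2022`, `2023`),
    names_to  = "year",
    values_to = "salary"
  )

print(tidy)
```

After reshaping, year and salary are ordinary columns, so verbs such as filter() and mutate() can work on them directly.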

# install.packages("tidyverse") # run once
library(tidyverse)

df <- tibble(
  name   = c("Alice", "Bob", "Charlie"),
  age    = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)

print(df)

# Keep rows with salary above 55,000, then add an age_group column
df2 <- df %>%
  filter(salary > 55000) %>%
  mutate(age_group = if_else(age < 30, "Young", "Experienced"))

print(df2)
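As a rough illustration of what the pipe does, the sketch below writes the same filter-and-mutate step twice: once as nested function calls and once as a pipeline. The variable names nested and piped are only for this comparison.

```r
library(tidyverse)

df <- tibble(
  name   = c("Alice", "Bob", "Charlie"),
  age    = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)

# Nested form: has to be read from the inside out.
nested <- mutate(
  filter(df, salary > 55000),
  age_group = if_else(age < 30, "Young", "Experienced")
)

# Piped form: reads top to bottom, one step per line.
piped <- df %>%
  filter(salary > 55000) %>%
  mutate(age_group = if_else(age < 30, "Young", "Experienced"))

# Check whether both forms produce the same result.
identical(nested, piped)
```

The two forms do the same work; the pipeline is usually easier to read once there are more than two steps. R 4.1 and later also include a native pipe, |>, which behaves similarly for simple chains like this one.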

Visualization with ggplot2

ggplot2 implements the Grammar of Graphics: a plot is described by mapping variables to aesthetics such as position and color, then adding layers (points, lines, labels, themes) with the + operator.

library(tidyverse)

# Built-in dataset of 32 car models (ships with base R)
data("mtcars")

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    title = "Fuel Efficiency vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    color = "Cylinders"
  ) +
  theme_minimal()
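Because a ggplot is built up from layers, a plot can be stored in a variable and extended later. The sketch below is one possible variation on the plot above, not part of the original example: it adds a linear trend line with geom_smooth(). The names p and p2 and the output file name are hypothetical.

```r
library(tidyverse)

# Base plot: map color only in the point layer so that the
# trend line added later is fitted to all cars together.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 3)

# Add a second layer (a straight-line fit) plus labels and a theme.
p2 <- p +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  labs(color = "Cylinders") +
  theme_minimal()

print(p2)

# Optionally save the plot to disk (hypothetical file name):
# ggsave("mpg_vs_weight.png", p2, width = 6, height = 4)
```

Keeping the base plot in p makes it easy to try different layers without repeating the data and aesthetic mappings.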