R Basics for Data Analysis & Visualization
R is a popular language for statistics and data visualization. The tidyverse ecosystem (dplyr, ggplot2, tibble, and related packages) gives data frames and plots a consistent, pipe-friendly interface.
Data Frames & Tidyverse
A data frame in R is similar to a pandas DataFrame in Python: a table of equal-length columns, each of which can hold a different type. The tidyverse collection of packages provides a modern, consistent grammar for data manipulation and visualization.
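Because a tibble is a subclass of base R's data frame, it can help to see the same kind of table built with data.frame() and indexed with $ and [rows, columns], using no packages at all. This is a minimal sketch; the name df_base is only for illustration, and it assumes R 4.0 or later, where data.frame() keeps strings as character vectors by default.

```r
# Base R only: no packages required
df_base <- data.frame(
  name   = c("Alice", "Bob", "Charlie"),
  age    = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)

str(df_base)    # column names, types, and first values
dim(df_base)    # number of rows and columns
df_base$salary  # a single column, returned as a vector

# Select rows by a logical condition and columns by name
df_base[df_base$salary > 55000, c("name", "age")]
```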
The tidyverse is built around tidy data: each variable is a column, each observation is a row, and each type of observational unit forms its own table. This consistent layout makes transformations easier to reason about and lets you chain several operations with the pipe operator %>%, which passes the result on its left as the first argument of the function on its right.
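To see what the pipe does, the sketch below computes the same result twice, once with nested function calls and once with %>%. The small scores table is made up purely for illustration.

```r
library(tidyverse)

# Hypothetical example data
scores <- tibble(student = c("A", "B", "C"), score = c(72, 88, 95))

# Nested calls read inside-out: filter first, then arrange
nested <- arrange(filter(scores, score > 80), desc(score))

# The pipe reads top to bottom: each result becomes the first argument of the next call
piped <- scores %>%
  filter(score > 80) %>%
  arrange(desc(score))

identical(nested, piped)  # TRUE: both forms run the same calls
```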
# install.packages("tidyverse")  # run once
library(tidyverse)
# A tibble is the tidyverse's version of a data frame
df <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)
print(df)
# Filter and mutate: keep rows with salary above 55,000,
# then add a derived age_group column
df2 <- df %>%
  filter(salary > 55000) %>%
  mutate(age_group = if_else(age < 30, "Young", "Experienced"))
print(df2)
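In the example above, the filter keeps only Bob (age 30) and Charlie (age 35), so every row of df2 is labeled "Experienced" and the "Young" branch of if_else() never fires. Running the same mutate() on the full tibble shows both branches. Because the test is age < 30, Bob at exactly 30 still falls into "Experienced".

```r
library(tidyverse)

df <- tibble(
  name   = c("Alice", "Bob", "Charlie"),
  age    = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)

# Same mutate() as above, but applied before any filtering
labeled <- df %>%
  mutate(age_group = if_else(age < 30, "Young", "Experienced"))

labeled$age_group  # "Young" "Experienced" "Experienced"
```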
Visualization with ggplot2
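ggplot2 implements the Grammar of Graphics: you map variables to visual properties (aesthetics) such as x, y, and color with aes(), then add layers such as geoms, labels, and themes with +. It is one of the most widely used plotting libraries for data analysis in R.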
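library(tidyverse)
# Built-in dataset (ships with base R's datasets package)
data("mtcars")
# Map weight to x, mpg to y, and cylinder count to color.
# factor(cyl) treats 4, 6 and 8 cylinders as discrete groups
# rather than a continuous color scale.
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    title = "Fuel Efficiency vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    color = "Cylinders"
  ) +
  theme_minimal()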
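Because a ggplot is an ordinary R object, the plot above can also be built one layer at a time, inspected, and written to a file. This sketch adds a single overall linear trend with geom_smooth() and saves the result with ggsave(); the output file name and dimensions are illustrative choices, and saving to PNG assumes a working PNG graphics device.

```r
library(tidyverse)

# Start from the data and the shared x/y mappings; no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Each `+` returns a new ggplot object with one more component
p <- p + geom_point(aes(color = factor(cyl)), size = 3)        # color mapped only for points
p <- p + geom_smooth(method = "lm", se = FALSE, color = "black") # one linear fit over all cars
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per Gallon", color = "Cylinders")

length(p$layers)  # 2: the point layer and the smoothing layer

# Hypothetical output location in the session's temporary directory
out <- file.path(tempdir(), "fuel_vs_weight.png")
ggsave(out, plot = p, width = 7, height = 5)
```

Mapping color inside geom_point() rather than in ggplot() keeps the smoothing layer from splitting into one line per cylinder group.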