Build a Data Analysis Project in R

A great way to practice data analysis in R is to go through the entire process yourself: generating your own dataset, cleaning it, and then performing some analysis or visualization. This gives you hands-on experience with the full data workflow.

Step 1: Create Your Own Dataset

You can start by making a simple dataset directly in R. For example, let’s simulate some student exam scores:

# Create a dataset of student names and their exam scores
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)  # Notice one missing value
)

# View the dataset
students
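
Before cleaning, it can help to look at the structure of the data frame rather than just printing it. The snippet below is a minimal sketch that rebuilds the same example data so it runs on its own, then uses base R's str(), dim(), and summary() to inspect it:

```r
# Rebuild the example data frame so this snippet runs on its own
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)
)

str(students)      # column types and a preview of the values
dim(students)      # number of rows, then number of columns
summary(students)  # per-column summary; the score column reports its NA count
```

The summary() output is a quick way to spot the missing value before you decide how to handle it.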

Step 2: Clean the Data

Real datasets often have missing values or inconsistencies. Here we’ll check for missing data and handle it:

# Check for missing values
is.na(students$score)

# Replace missing scores with the average score
students$score[is.na(students$score)] <- mean(students$score, na.rm = TRUE)

# Confirm cleaned dataset
students
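
Filling a gap with the mean keeps every row, but the filled-in value sits exactly at the center of the data, so it makes the scores look slightly less spread out than they really are. One common alternative is to drop incomplete rows instead. The sketch below shows that approach on a fresh copy of the data; the names n_missing and students_complete are illustrative and not part of the walkthrough above:

```r
# Start again from the raw data, including the missing score
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)
)

# Count how many scores are missing
n_missing <- sum(is.na(students$score))
n_missing

# Keep only the rows that have a score
students_complete <- students[!is.na(students$score), ]
students_complete
```

Which option is better depends on the analysis: dropping rows loses a student, while mean imputation keeps the row count but adds a value that was never observed.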

Step 3: Perform Analysis

Now that the data is clean, you can calculate summary statistics:

# Calculate mean, median, and standard deviation
mean(students$score)
median(students$score)
sd(students$score)
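
If you want several of these numbers at once, base R's summary() and quantile() return them in a single call. This is a small sketch that rebuilds and cleans the example data first so it can be run by itself:

```r
# Rebuild the data and apply the same mean imputation as in Step 2
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)
)
students$score[is.na(students$score)] <- mean(students$score, na.rm = TRUE)

# Minimum, quartiles, median, mean, and maximum in one call
summary(students$score)

# Specific percentiles, here the 10th and 90th
quantile(students$score, probs = c(0.1, 0.9))

# Lowest and highest score
range(students$score)
```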

You can also test a hypothesis. For example, a one-sample t-test checks whether the average score differs from 80:

t.test(students$score, mu = 80)
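
t.test() returns an object whose parts you can read individually, which is handy when you want to report just the p-value or the confidence interval. The sketch below also runs the test on the five observed scores alone, because the imputed value counts as a sixth observation and can make the result look more certain than the raw data supports. The variable names (observed, imputed, res_observed, res_imputed) are illustrative:

```r
# The five observed scores, and the same scores plus the mean-imputed value
observed <- c(85, 92, 76, 88, 95)
imputed <- c(observed, mean(observed))

# One-sample t-tests against a hypothesized mean of 80
res_imputed <- t.test(imputed, mu = 80)
res_observed <- t.test(observed, mu = 80)

# Pull individual pieces out of the result
res_imputed$statistic  # t statistic
res_imputed$p.value    # two-sided p-value
res_imputed$conf.int   # 95% confidence interval for the mean

# Compare the p-values with and without the imputed score
c(imputed = res_imputed$p.value, observed = res_observed$p.value)
```

If the two p-values lead to different conclusions, that is a sign the imputation choice matters for this dataset and is worth mentioning when you report the result.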

Step 4: Visualize the Data

Visualizations make it easier to understand patterns:

# Plot a histogram of scores
hist(students$score,
     main = "Distribution of Exam Scores",
     xlab = "Scores",
     col = "lightblue",
     border = "black")
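
With only six students, a histogram shows the overall shape but hides who scored what. As a complementary sketch, a bar plot with one bar per student makes individual scores easy to compare. It uses base R's barplot() and abline(); the reference line at 80 simply reuses the value from the t-test above, and the name bar_positions is illustrative:

```r
# Rebuild and clean the example data so this snippet runs on its own
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)
)
students$score[is.na(students$score)] <- mean(students$score, na.rm = TRUE)

# One bar per student; barplot() returns the x position of each bar
bar_positions <- barplot(students$score,
                         names.arg = students$name,
                         main = "Exam Score by Student",
                         ylab = "Score",
                         ylim = c(0, 100),
                         col = "lightblue",
                         border = "black")

# Dashed horizontal line at the hypothesized mean of 80
abline(h = 80, lty = 2)
```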