A great way to practice data analysis in R is to go through the entire process yourself: generating your own dataset, cleaning it, and then performing some analysis or visualization. This gives you hands-on experience with the full data workflow.
You can start by building a small dataset directly in R. For example, let's create a table of student exam scores:
# Create a dataset of student names and their exam scores
students <- data.frame(
  name  = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)  # Notice one missing value (Fiona)
)
# View the dataset
students
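The example above types the values in by hand. For a larger practice set, you can generate the data with R's random number functions. The sketch below is one way to do it. The class size of 30, the mean of 80, the standard deviation of 10, and the three missing entries are arbitrary assumptions chosen for illustration, and `sim_students` is a hypothetical name.

```r
# Fix the random number generator so the same "random" data appears every run
set.seed(42)
n <- 30  # hypothetical number of students (assumption)

# Draw normally distributed scores (assumed mean 80, sd 10) and round them
sim_students <- data.frame(
  name  = paste0("Student", seq_len(n)),
  score = round(rnorm(n, mean = 80, sd = 10))
)

# Clamp scores to the valid 0-100 range
sim_students$score <- pmin(pmax(sim_students$score, 0), 100)

# Blank out three randomly chosen scores to mimic missing data
missing_idx <- sample(n, 3)
sim_students$score[missing_idx] <- NA

head(sim_students)
```

The rest of the workflow applies unchanged if you substitute `sim_students` for `students`.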
Real datasets often contain missing values or inconsistencies. Here we'll check for missing data and fill the gap with the mean of the observed scores (mean imputation):
# Check for missing values
is.na(students$score)
# Replace missing scores with the average of the non-missing scores
students$score[is.na(students$score)] <- mean(students$score, na.rm = TRUE)
# Confirm cleaned dataset
students
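Printing `is.na()` gives one TRUE or FALSE per row, which becomes hard to scan on larger data. Wrapping it in `sum()` counts the missing values instead. Mean imputation is also only one option. The sketch below recreates the raw data under the hypothetical names `raw`, `dropped`, and `imputed`, and compares imputation with simply dropping incomplete rows using base R's `complete.cases()`.

```r
# Recreate the raw dataset so this example stands alone
raw <- data.frame(
  name  = c("Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona"),
  score = c(85, 92, 76, 88, 95, NA)
)

# Count missing scores instead of listing TRUE/FALSE for every row
n_missing <- sum(is.na(raw$score))
n_missing

# Option A: drop every row that contains an NA
dropped <- raw[complete.cases(raw), ]
nrow(dropped)

# Option B: mean imputation, as in the text, applied to a separate copy
imputed <- raw
imputed$score[is.na(imputed$score)] <- mean(imputed$score, na.rm = TRUE)
imputed$score
```

Dropping keeps only observed values but shrinks the sample to five rows. Imputation keeps all six rows but adds a value sitting exactly at the mean, which understates the spread of the data.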
Now that the data is clean, you can calculate summary statistics:
# Calculate mean, median, and standard deviation
mean(students$score)
median(students$score)
sd(students$score)
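Base R can also report several of these statistics at once. A minimal sketch, assuming the cleaned scores from above (Fiona's missing value replaced by the mean of the other five, 87.2) are stored in a hypothetical vector `scores`:

```r
# Cleaned scores: the NA was replaced by the mean of the five observed values
scores <- c(85, 92, 76, 88, 95, 87.2)

# summary() reports min, first quartile, median, mean, third quartile, and max
summary(scores)

# quantile() returns specific percentiles, here the 25th and 75th
quantile(scores, probs = c(0.25, 0.75))

# Interquartile range: a spread measure less sensitive to outliers than sd()
IQR(scores)
```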
You can also test a hypothesis, for example whether the mean score differs from 80, using a one-sample t-test:
t.test(students$score, mu = 80)
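`t.test()` returns an object of class `htest`, so you can store the result and pull out individual pieces instead of reading the printed report. The sketch below assumes the same cleaned `scores` vector and a conventional 5% significance threshold. Both are choices made for illustration, not part of the original analysis. Keep in mind that the imputed value sits exactly at the mean, which lowers the standard deviation and makes the test somewhat more confident than the five real scores justify.

```r
scores <- c(85, 92, 76, 88, 95, 87.2)

# Store the test result rather than only printing it
result <- t.test(scores, mu = 80)

result$statistic  # t statistic
result$parameter  # degrees of freedom (n - 1)
result$p.value    # p-value for H0: true mean = 80
result$conf.int   # 95% confidence interval for the mean

# A simple decision rule at an assumed 5% significance level
if (result$p.value < 0.05) {
  cat("The mean differs from 80 at the 5% level\n")
} else {
  cat("No evidence at the 5% level that the mean differs from 80\n")
}
```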
Visualizations make patterns easier to see. A histogram, for example, shows how the scores are distributed:
# Plot a histogram of scores
hist(students$score,
main = "Distribution of Exam Scores",
xlab = "Scores",
col = "lightblue",
border = "black")
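A histogram can be extended with reference lines, and `hist(..., plot = FALSE)` returns the bin edges and counts without drawing anything, which is handy for checking what the plot shows. The sketch below uses base graphics only and again assumes the cleaned `scores` vector. The red dashed mean line and the extra boxplot are illustrative additions, not part of the original workflow.

```r
scores <- c(85, 92, 76, 88, 95, 87.2)

# Compute the histogram bins without drawing, to inspect them
h <- hist(scores, plot = FALSE)
h$breaks  # bin edges (chosen with Sturges' rule by default)
h$counts  # number of scores falling in each bin

# Draw the histogram and mark the mean with a dashed vertical line
hist(scores,
     main = "Distribution of Exam Scores",
     xlab = "Scores",
     col = "lightblue",
     border = "black")
abline(v = mean(scores), col = "red", lty = 2, lwd = 2)

# A boxplot summarises the same data by median and quartiles
boxplot(scores, horizontal = TRUE, main = "Exam Scores", xlab = "Scores")
```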