1 Introduction

This practical brings together what we’ve covered so far:

Practical 1: Data frames, including how to access columns and rows
Practical 2: R Markdown, including chunk options for controlling output
Chapter 3: ggplot2

The goal is to create a complete, reproducible analysis document that combines data exploration with professional visualisations.

1.1 Key concepts recap

In Practical 1, you learned about data frames — the fundamental structure for storing tabular data in R. You accessed columns using $ notation (e.g., mtcars$mpg) and explored built-in datasets.
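As a brief reminder of the access patterns from Practical 1, here is a minimal sketch using the built-in mtcars data. The object names (mpg_values, first_rows, wt_and_mpg) are illustrative only.

```r
# mtcars ships with base R, so no packages are needed
mpg_values <- mtcars$mpg                 # one column, as a numeric vector
first_rows <- mtcars[1:5, ]              # rows 1 to 5, all columns
wt_and_mpg <- mtcars[, c("wt", "mpg")]   # two named columns, all rows

dim(mtcars)        # number of rows and columns
head(mpg_values)   # first few values of the column
```

Square brackets take the form [rows, columns]; leaving one side empty selects everything along that dimension.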

In Practical 2, you learned how to create R Markdown documents that weave together code, output, and narrative text. You used chunk options to control what appears in your final document:

| Goal | Chunk option |
|---|---|
| Hide code, show output | `#\| echo: false` |
| Hide everything | `#\| include: false` |
| Figure caption | `#\| fig.cap: "..."` |
| Figure size | `#\| fig.width`, `#\| fig.height` |

In lectures, you learned about ggplot2 and the grammar of graphics:

ggplot(data, aes(x = var1, y = var2)) +
  geom_*() +
  labs(x = "X Label", y = "Y Label", title = "Title")

Now we combine all three: create ggplot2 visualisations inside an R Markdown document, using chunk options to produce professional output.

2 Creating ggplots in R Markdown

When you create a ggplot in an R Markdown code chunk, the plot is automatically included in your output document. Here’s an example:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")

Figure 2.1: Scatterplot of weight vs fuel efficiency.

A typical analysis workflow in R Markdown involves:

Load packages (in a setup chunk with #| include: false)
Import data (or use built-in datasets)
Explore and clean data (packages such as dplyr are useful here)
Create visualisations (with appropriate chunk options)
Interpret results (in the narrative text)
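The workflow steps above can be sketched in a few lines of R. This is one possible illustration rather than a prescribed template: the summary chosen (median city MPG per vehicle class) and the name class_summary are assumptions made for the example, and it assumes the dplyr package is installed.

```r
# 1. Load packages (in a real document this sits in the setup chunk)
library(ggplot2)
library(dplyr)

# 2. Use a built-in dataset: mpg is provided by ggplot2

# 3. Explore: number of vehicles and median city MPG per class
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(n = n(), median_cty = median(cty)) %>%
  arrange(desc(median_cty))
class_summary

# 4. Visualise the summary
ggplot(class_summary, aes(x = class, y = median_cty)) +
  geom_col() +
  labs(x = "Vehicle Class", y = "Median City MPG")

# 5. Interpretation goes in the narrative text below the chunk
```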

Tips for professional output:

Always add axis labels: Use labs() to make plots self-explanatory.
Use figure captions: Add #| fig.cap: "..." to your chunk options.
Control figure size: Use #| fig.width and #| fig.height for the dimensions of the generated figure (in inches), and #| out.width for how much of the page width the figure occupies in the output document.
Hide code for reports: Use #| echo: false so readers see only the visualisation, not the code. Note: in these practicals, especially the solutions, the code is sometimes included for illustration. In the assignments and exam, whether you need to hide the code will be specified.
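
The tips above can be combined in a single chunk. The following is a minimal sketch: the chunk label weight-vs-mpg, the caption text, and the particular sizes are illustrative choices, not course requirements.

```{r weight-vs-mpg}
#| echo: false
#| fig.cap: "Fuel efficiency against weight for the cars in mtcars."
#| fig.width: 6
#| fig.height: 4
#| out.width: "80%"
library(ggplot2)

# Scatterplot with self-explanatory axis labels
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")
```

With these options the knitted document should show only the captioned figure, with no code.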

3 Exercises

In these exercises, you will create an R Markdown document that contains 7 different types of visualisations, mostly using the mpg dataset (fuel economy data for 234 vehicles); the line plot uses the economics dataset instead. Your final document should include:

The 7 plots (one for each question below)
A brief written interpretation after each plot
Professional formatting using chunk options

Getting started:

Create a new R Markdown document: File > New File > R Markdown…
Choose “Document” and “PDF” output
Save it as practical03_plots.Rmd
In your setup chunk, load ggplot2:

```{r setup}
#| include: false
library(ggplot2)
```

For each question:

Create the plot in its own code chunk
Add a figure caption using #| fig.cap: "..."
Hide the code using #| echo: false
Write 1-2 sentences interpreting what you observe

When you have completed all 7 plots, knit your document to PDF.

Histogram: Create a histogram of cty using geom_histogram(). Experiment with different binwidth values (try 1, 2, and 5). Add appropriate axis labels using labs(). In your interpretation, describe whether the distribution is symmetric or skewed.

Answer: The distribution is right-skewed with most vehicles getting 15-20 city MPG.
```
ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 2, fill = "steelblue", colour = "white") +
  labs(x = "City MPG", y = "Count")
```
Figure 3.1: Distribution of city fuel efficiency.
Density plot: Create a density plot of cty using geom_density(). Add a rug plot underneath using geom_rug(). In your interpretation, compare this to the histogram — what does the density plot show that the histogram doesn’t (and vice versa)?

Answer: The density plot gives a smoother view of the shape of the distribution, whereas the histogram shows actual counts but is affected by the choice of bin width.
```
ggplot(mpg, aes(x = cty)) +
  geom_density(fill = "lightblue", alpha = 0.5) +
  geom_rug() +
  labs(x = "City MPG", y = "Density")
```
Figure 3.2: Density of city fuel efficiency.
Bar chart: Create a bar chart of class (vehicle type) using geom_bar(). Add appropriate axis labels. In your interpretation, state which vehicle class is most common in the dataset.

Answer: SUV is the most common vehicle class.
```
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Vehicle Class", y = "Count")
```
Figure 3.3: Bar chart of vehicle type.
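
As a numerical cross-check of the bar chart (a sketch, not part of the required output), the counts can be tabulated directly. The names class_counts and most_common are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Count vehicles in each class, largest first
class_counts <- sort(table(mpg$class), decreasing = TRUE)
class_counts

# Name of the class with the highest count
most_common <- names(class_counts)[1]
most_common
```
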
Scatterplot: Explore the relationship between cty and hwy. Create a scatterplot of cty ($y$-axis) vs hwy ($x$-axis) using geom_point(). Add a trend line using geom_smooth(method = "lm"). In your interpretation, describe the relationship you observe.

Answer: There is a strong positive linear relationship: vehicles with higher highway MPG tend to have higher city MPG.
```
ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Highway MPG", y = "City MPG")
```
Figure 3.4: City fuel efficiency against highway fuel efficiency.
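
To back up the visual impression with numbers, one option is to compute the correlation and fit the same straight line that geom_smooth(method = "lm") draws. This is a minimal sketch; the names r and fit are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between highway and city MPG;
# values close to 1 indicate a strong positive linear relationship
r <- cor(mpg$hwy, mpg$cty)
r

# Simple linear regression of city MPG on highway MPG
fit <- lm(cty ~ hwy, data = mpg)
coef(fit)  # intercept and slope of the fitted line
```
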
Line plot: Create a line plot using the economics dataset. Plot psavert (personal savings rate) over time using geom_line(). Add appropriate axis labels. In your interpretation, describe the trend you observe.

Answer: There is a general decline from the 1970s to around 2005, followed by an increase after the 2008 financial crisis.
```
ggplot(economics, aes(x = date, y = psavert)) +
  geom_line() +
  labs(x = "Date", y = "Personal Savings Rate (%)")
```
Figure 3.5: Personal savings rate over time.
Stripchart: Create a stripchart of cty by class using geom_jitter() with width = 0.2. Add appropriate axis labels.
```
ggplot(mpg, aes(x = class, y = cty)) +
  geom_jitter(width = 0.2) +
  labs(x = "Vehicle Class", y = "City MPG")
```
Figure 3.6: Stripchart of city fuel efficiency by vehicle class.
Boxplot: Compare the distribution of cty across vehicle classes using geom_boxplot() with aes(x = class, y = cty). Add appropriate axis labels. Combine the boxplot with a jitter plot (use outlier.shape = NA in the boxplot to avoid duplicate points). In your interpretation, state which vehicle class has the best city fuel efficiency.

Answer: Subcompact has the best median city fuel efficiency.
```
ggplot(mpg, aes(x = class, y = cty)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  labs(x = "Vehicle Class", y = "City MPG")
```
Figure 3.7: Boxplot of city fuel efficiency by vehicle class, overlaid with a stripchart (jitter plot).
Create a complete analysis document

Using another dataset such as mtcars or airquality, create a complete R Markdown document that includes:
1. An appropriate title and brief introduction
2. At least three different types of plots (e.g., histogram, boxplot, scatterplot)
3. Proper figure captions for each plot
4. Consistent figure sizing throughout
5. Hidden code (using #| echo: false)
6. Brief interpretation after each plot
Answer: This is left as an exercise.

4 Summary

This practical demonstrated how to:

Combine data exploration with visualisation (ggplot2)
Create reproducible analysis documents with R Markdown
Use chunk options to control the appearance of code and figures
Structure a workflow from data to insights

In future practicals, we will build on these skills to create more sophisticated visualisations using colours, scales, faceting, and themes.

MAS2908 - Practical 03 (Solutions)

Putting It All Together

Clement Lee

Semester 2, 2025/2026
