1 Introduction

This practical brings together what we’ve covered so far: data frames (Practical 1), R Markdown (Practical 2), and ggplot2 (lectures).

The goal is to create a complete, reproducible analysis document that combines data exploration with professional visualisations.

1.1 Key concepts recap

In Practical 1, you learned about data frames — the fundamental structure for storing tabular data in R. You accessed columns using $ notation (e.g., mtcars$mpg) and explored built-in datasets.

In Practical 2, you learned how to create R Markdown documents that weave together code, output, and narrative text. You used chunk options to control what appears in your final document:

Goal                      Chunk option
Hide code, show output    #| echo: false
Hide everything           #| include: false
Figure caption            #| fig.cap: "..."
Figure size (inches)      #| fig.width: ... and #| fig.height: ...
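
As a minimal sketch, the options in the table can be combined at the top of a single chunk. The #| lines are ordinary R comments, so this code also runs as a plain R script; in an .Rmd file it would sit inside an {r} chunk, where knitr reads the #| lines as options. The caption text and the 6 by 4 inch figure size are illustrative choices.

```r
#| echo: false
#| fig.cap: "Scatterplot of weight vs fuel efficiency."
#| fig.width: 6
#| fig.height: 4

# knitr reads the #| lines above as chunk options when knitting;
# plain R treats them as comments.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")
p  # auto-printed at the top level of the chunk, so the figure appears
```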

In lectures, you learned about ggplot2 and the grammar of graphics:

ggplot(data, aes(x = var1, y = var2)) +
  geom_*() +
  labs(x = "X Label", y = "Y Label", title = "Title")

Now we combine all three: create ggplot2 visualisations inside an R Markdown document, using chunk options to produce professional output.

2 Creating ggplots in R Markdown

When you create a ggplot in an R Markdown code chunk, the plot is automatically included in your output document. Here’s an example:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")

Figure 2.1: Scatterplot of weight vs fuel efficiency.
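
A ggplot only appears in the knitted document when it is printed. At the top level of a chunk this happens automatically, as in the example above, but R does not auto-print values created inside a for loop (or in the middle of a function body), so print() must be called explicitly there. A minimal sketch, where the list of x variables is an illustrative choice:

```r
library(ggplot2)

# Illustrative choice of variables to plot against mpg
x_vars <- c("wt", "hp", "disp")
plots <- list()

for (v in x_vars) {
  # .data[[v]] refers to the column whose name is stored in v
  p <- ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
    geom_point() +
    labs(x = v, y = "Miles per Gallon")
  print(p)         # without print(), nothing is drawn from inside the loop
  plots[[v]] <- p  # keep the plot objects for later use
}
```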

A typical analysis workflow in R Markdown involves:

  1. Load packages (in a setup chunk with #| include: false)
  2. Import data (or use built-in datasets)
  3. Explore and clean data (packages such as dplyr are useful here)
  4. Create visualisations (with appropriate chunk options)
  5. Interpret results (in the narrative text)
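
The five steps can be sketched as plain R code. This is a minimal example using the built-in mpg data; base R functions are used for the exploration step, although dplyr functions such as count() and summarise() would serve the same purpose.

```r
# 1. Load packages (in an .Rmd file this belongs in a setup chunk
#    with #| include: false)
library(ggplot2)

# 2. Import data: here, the mpg dataset that ships with ggplot2
fuel <- mpg

# 3. Explore the data
str(fuel)
summary(fuel$cty)
class_counts <- table(fuel$class)
class_counts

# 4. Create a visualisation
p <- ggplot(fuel, aes(x = cty)) +
  geom_histogram(binwidth = 2) +
  labs(x = "City MPG", y = "Count")
print(p)

# 5. Interpretation goes in the narrative text of the .Rmd, not in code
```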

Tips for professional output:

3 Exercises

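In this exercise, you will create an R Markdown document that contains 7 different types of visualisations, using the mpg dataset (fuel economy data for 234 vehicles) for every plot except the line plot, which uses the economics dataset. Both datasets are included with ggplot2. Your final document should include: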

Getting started:

  1. Create a new R Markdown document: File > New File > R Markdown…
  2. Choose “Document” and “PDF” output
  3. Save it as practical03_plots.Rmd
  4. In your setup chunk, load ggplot2:
```{r setup}
#| include: false
library(ggplot2)
```

For each question:

When you have completed all 7 plots, knit your document to PDF.

  1. Histogram: Create a histogram of cty using geom_histogram(). Experiment with different binwidth values (try 1, 2, and 5). Add appropriate axis labels using labs(). In your interpretation, describe whether the distribution is symmetric or skewed.

    Answer: The distribution is right-skewed with most vehicles getting 15-20 city MPG.

    ggplot(mpg, aes(x = cty)) +
      geom_histogram(binwidth = 2, fill = "steelblue", colour = "white") +
      labs(x = "City MPG", y = "Count")

    Figure 3.1: Distribution of city fuel efficiency.
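
    The question asks you to compare several binwidths. One way to do this is to build one plot per value in a loop, calling print() so that each plot appears in the knitted document. The skewness() helper at the end is a hypothetical, hand-written function (not part of ggplot2) using the simple moment-based formula; a positive value is consistent with a right-skewed distribution.

    ```r
    library(ggplot2)

    for (bw in c(1, 2, 5)) {
      p <- ggplot(mpg, aes(x = cty)) +
        geom_histogram(binwidth = bw, fill = "steelblue", colour = "white") +
        labs(x = "City MPG", y = "Count", title = paste("binwidth =", bw))
      print(p)  # explicit print() is needed inside a loop
    }

    # Hypothetical helper: moment-based skewness, i.e. the mean cubed
    # deviation divided by the cubed (population) standard deviation
    skewness <- function(x) {
      m <- mean(x)
      mean((x - m)^3) / mean((x - m)^2)^1.5
    }
    skew_cty <- skewness(mpg$cty)
    skew_cty
    ```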

  2. Density plot: Create a density plot of cty using geom_density(). Add a rug plot underneath using geom_rug(). In your interpretation, compare this to the histogram — what does the density plot show that the histogram doesn’t (and vice versa)?

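    Answer: The density plot gives a smoothed view of the distribution’s shape, and the rug marks show where individual observations fall. The histogram shows actual counts, but its appearance depends on the chosen bin width (just as the smoothness of the density curve depends on its bandwidth).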

    ggplot(mpg, aes(x = cty)) +
      geom_density(fill = "lightblue", alpha = 0.5) +
      geom_rug() +
      labs(x = "City MPG", y = "Density")

    Figure 3.2: Density of city fuel efficiency.
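
    To make the comparison direct, the histogram can be drawn on the density scale and overlaid with the density curve. A sketch, assuming ggplot2 3.3.0 or later (where after_stat() is available):

    ```r
    library(ggplot2)

    p <- ggplot(mpg, aes(x = cty)) +
      # after_stat(density) rescales the bars so their total area is 1,
      # matching the scale of geom_density()
      geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                     fill = "grey80", colour = "white") +
      geom_density(colour = "steelblue") +
      geom_rug(alpha = 0.3) +
      labs(x = "City MPG", y = "Density")
    p
    ```

    The bars still depend on the binwidth and the curve on the default bandwidth, so both are summaries of the same data rather than the data itself.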

  3. Bar chart: Create a bar chart of class (vehicle type) using geom_bar(). Add appropriate axis labels. In your interpretation, state which vehicle class is most common in the dataset.

    Answer: SUV is the most common vehicle class (62 of the 234 vehicles).

    ggplot(mpg, aes(x = class)) +
      geom_bar(fill = "steelblue") +
      labs(x = "Vehicle Class", y = "Count")

    Figure 3.3: Bar chart of vehicle type.
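
    A quick numerical check of the answer, plus a version of the chart with the bars sorted from most to least common. Reordering the factor levels by hand, as below, is one option among several (forcats::fct_infreq() is another).

    ```r
    library(ggplot2)

    # Count vehicles per class and pick out the most common class
    class_counts <- table(mpg$class)
    most_common <- names(which.max(class_counts))
    most_common

    # Set the factor levels in decreasing order of frequency,
    # so geom_bar() draws the tallest bar first
    mpg_sorted <- mpg
    mpg_sorted$class <- factor(mpg_sorted$class,
                               levels = names(sort(class_counts, decreasing = TRUE)))

    p <- ggplot(mpg_sorted, aes(x = class)) +
      geom_bar(fill = "steelblue") +
      labs(x = "Vehicle Class", y = "Count")
    p
    ```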

  4. Scatterplot: Explore the relationship between cty and hwy. Create a scatterplot of cty (y-axis) vs hwy (x-axis) using geom_point(). Add a trend line using geom_smooth(method = "lm"). In your interpretation, describe the relationship you observe.

    Answer: There is a strong positive linear relationship: vehicles with higher highway MPG also tend to have higher city MPG.

    ggplot(mpg, aes(x = hwy, y = cty)) +
      geom_point() +
      geom_smooth(method = "lm") +
      labs(x = "Highway MPG", y = "City MPG")

    Figure 3.4: City fuel efficiency against highway fuel efficiency.
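
    To put a number on “strong positive”, you can compute the correlation and fit the same straight line that geom_smooth(method = "lm") draws, namely a linear regression of cty on hwy:

    ```r
    library(ggplot2)  # provides the mpg dataset

    # Pearson correlation between highway and city MPG
    r <- cor(mpg$hwy, mpg$cty)
    r

    # Linear model behind the trend line: cty as a function of hwy
    fit <- lm(cty ~ hwy, data = mpg)
    coef(fit)
    ```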

  5. Line plot: Create a line plot using the economics dataset. Plot psavert (personal savings rate) over time using geom_line(). Add appropriate axis labels. In your interpretation, describe the trend you observe.

    Answer: The savings rate shows a general decline from the 1970s to around 2005, then increases after the 2008 financial crisis.

    ggplot(economics, aes(x = date, y = psavert)) +
      geom_line() +
      labs(x = "Date", y = "Personal Savings Rate (%)")

    Figure 3.5: Personal savings rate over time.
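
    The trend can be checked numerically by averaging psavert within each decade. A base R sketch (grouping by decade is an illustrative choice):

    ```r
    library(ggplot2)  # provides the economics dataset

    econ <- economics
    # Convert each date to its decade, e.g. 1975 -> 1970
    econ$decade <- (as.integer(format(econ$date, "%Y")) %/% 10) * 10

    # Mean personal savings rate per decade
    decade_means <- aggregate(psavert ~ decade, data = econ, FUN = mean)
    decade_means
    ```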

  6. Stripchart: Create a stripchart of cty by class using geom_jitter() with width = 0.2. Add appropriate axis labels.

    ggplot(mpg, aes(x = class, y = cty)) +
      geom_jitter(width = 0.2) +
      labs(x = "Vehicle Class", y = "City MPG")

    Figure 3.6: Stripchart of city fuel efficiency by vehicle class.
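
    Note that geom_jitter() also jitters points vertically by default (by up to 40% of the data’s resolution, which is 0.4 MPG for the whole-number cty values), so some points are drawn away from their true values. The sketch below jitters only horizontally and fixes the random seed so the plot is identical each time the document is knitted; geom_jitter(width = 0.2, height = 0) is enough if reproducibility does not matter.

    ```r
    library(ggplot2)

    p <- ggplot(mpg, aes(x = class, y = cty)) +
      # height = 0 keeps every point at its true cty value;
      # seed makes the horizontal jitter reproducible
      geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
      labs(x = "Vehicle Class", y = "City MPG")
    p
    ```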

  7. Boxplot: Compare the distribution of cty across vehicle classes using geom_boxplot() with aes(x = class, y = cty). Add appropriate axis labels. Combine the boxplot with a jitter plot (use outlier.shape = NA in the boxplot to avoid duplicate points). In your interpretation, state which vehicle class has the best city fuel efficiency.

    Answer: Subcompact cars have the best median city fuel efficiency.

    ggplot(mpg, aes(x = class, y = cty)) +
      geom_boxplot(outlier.shape = NA) +
      geom_jitter(width = 0.2, alpha = 0.5) +
      labs(x = "Vehicle Class", y = "City MPG")

    Figure 3.7: Boxplot of city fuel efficiency by vehicle class, overlaid with a stripchart (jitter plot).
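
    To check the answer, compute the median cty for each class. The sketch below also uses reorder() from base R’s stats package to sort the boxes by median, which makes the comparison easier to read, and jitters the points horizontally only.

    ```r
    library(ggplot2)

    # Median city MPG per class, sorted from highest to lowest
    class_medians <- aggregate(cty ~ class, data = mpg, FUN = median)
    class_medians <- class_medians[order(class_medians$cty, decreasing = TRUE), ]
    class_medians

    p <- ggplot(mpg, aes(x = reorder(class, cty, FUN = median), y = cty)) +
      geom_boxplot(outlier.shape = NA) +  # hide outliers; the points show them
      geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
                 alpha = 0.5) +
      labs(x = "Vehicle Class", y = "City MPG")
    p
    ```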

  8. Create a complete analysis document

    Using another dataset such as mtcars or airquality, create a complete R Markdown document that includes:

    1. An appropriate title and brief introduction
    2. At least three different types of plots (e.g., histogram, boxplot, scatterplot)
    3. Proper figure captions for each plot
    4. Consistent figure sizing throughout
    5. Hidden code (using #| echo: false)
    6. Brief interpretation after each plot

    Answer: This is left as an exercise.
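
    One way to meet the consistent-sizing and hidden-code requirements without repeating options in every chunk is to set defaults once in the setup chunk with knitr::opts_chunk$set(). Individual chunks can still override these defaults with their own #| options. The 6 by 4 inch size below is an illustrative choice.

    ```r
    # Setup chunk (marked #| include: false in the .Rmd)
    library(ggplot2)

    # Defaults for every later chunk: hide code, use one figure size
    knitr::opts_chunk$set(
      echo = FALSE,
      fig.width = 6,   # inches
      fig.height = 4   # inches
    )
    ```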

4 Summary

This practical demonstrated how to:

In future practicals, we will build on these skills to create more sophisticated visualisations using colours, scales, faceting, and themes.