1 Introduction

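This practical focuses on visualising temporal (time series) data and understanding the difference between geom_line() and geom_path(). You will learn how to:

- choose between geom_line() and geom_path(), and recognise when they produce different plots;
- use geom_path() to show how two variables evolve together over time;
- build proper date objects with lubridate and format date axes with scale_x_date().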

2 Data: Money Demand

We will use the moneydemand dataset from Practical 4 (available on Canvas). Recall its key variables:

library(tidyverse)  # loads readr (read_csv), dplyr and ggplot2
moneydemand <- read_csv("moneydemand.csv")
str(moneydemand)
## spc_tbl_ [96 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ year  : num [1:96] 1879 1880 1881 1882 1883 ...
##  $ logM  : num [1:96] -7.42 -7.25 -7.09 -7.04 -7.01 ...
##  $ logYp : num [1:96] 5.61 5.7 5.74 5.79 5.81 ...
##  $ Rs    : num [1:96] 5.07 5.23 5.2 5.64 5.62 ...
##  $ Rl    : num [1:96] 4.88 4.58 4.26 4.31 4.33 4.28 4.08 3.81 3.87 3.8 ...
##  $ Rm    : num [1:96] 2.58 2.73 2.87 3.14 3.16 2.89 2.23 2.86 3.51 2.95 ...
##  $ logSpp: num [1:96] -3.76 -2.69 -2.7 -2.67 -2.79 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   year = col_double(),
##   ..   logM = col_double(),
##   ..   logYp = col_double(),
##   ..   Rs = col_double(),
##   ..   Rl = col_double(),
##   ..   Rm = col_double(),
##   ..   logSpp = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

3 geom_line() vs geom_path()

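The key difference between these two geoms is the order in which they connect the observations:

- geom_line() connects observations in order of the variable on the \(x\)-axis.
- geom_path() connects observations in the order in which they appear in the data.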

3.1 When they are the same

When data is sorted by the \(x\)-variable (as is typical for time series), both geoms produce identical plots:

# Using geom_line()
ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_line() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "Using geom_line()")

# Using geom_path() - identical result
ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_path() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "Using geom_path()")
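Because the two geoms only disagree when the rows are out of \(x\) order, it can be worth checking the row order before reaching for geom_path(). A minimal sketch using base R's is.unsorted() on a small hypothetical data frame (toy_rates and its values are made up for illustration, not taken from moneydemand):

```r
library(dplyr)

# Hypothetical toy series (illustrative values, not taken from moneydemand)
toy_rates <- tibble(
  year = c(1900, 1901, 1902, 1903),
  Rs   = c(4.1, 4.6, 3.9, 4.4)
)

# is.unsorted() is TRUE when a vector is NOT in non-decreasing order
sorted_ok <- !is.unsorted(toy_rates$year)
sorted_ok

# Reversing the rows makes the years run backwards
toy_reversed <- toy_rates |> arrange(desc(year))
is.unsorted(toy_reversed$year)
```

When the check fails, geom_line() and geom_path() will draw different lines from the same data; sorting with arrange() on the time column makes them agree again.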

3.2 When they differ

Let’s shuffle the rows and see what happens:

set.seed(42)
moneydemand_shuffled <- moneydemand |>
  slice_sample(n = nrow(moneydemand))  # shuffle all rows

# geom_line() still works - it sorts by x internally
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
  geom_line() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "geom_line() with shuffled data - still correct!")

# geom_path() creates chaos - it connects in row order
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
  geom_path() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "geom_path() with shuffled data - chaotic!")
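One way to see the mechanism behind these two plots is to inspect the data that ggplot2 hands to each geom. The sketch below uses ggplot2's layer_data() on a small hypothetical data frame (toy_shuffled is made up for illustration) and assumes a recent ggplot2 version, in which geom_line() reorders the rows by \(x\) while geom_path() keeps them in row order:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy series, deliberately stored out of time order
toy_shuffled <- tibble(
  year = c(1903, 1900, 1902, 1901),
  Rs   = c(4.4, 4.1, 3.9, 4.6)
)

p_line <- ggplot(toy_shuffled, aes(x = year, y = Rs)) + geom_line()
p_path <- ggplot(toy_shuffled, aes(x = year, y = Rs)) + geom_path()

# layer_data() returns a layer's data after ggplot2 has processed it
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

line_x  # sorted by x: 1900 1901 1902 1903
path_x  # original row order: 1903 1900 1902 1901
```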

3.3 Exercises

  1. Create a line plot of Rl (long-term interest rate) over year using the original (unshuffled) moneydemand data. Add appropriate axis labels.

    ggplot(moneydemand, aes(x = year, y = Rl)) +
      geom_line() +
      labs(x = "Year", y = "Long-term Interest Rate (%)")

  2. Using the shuffled data moneydemand_shuffled, try to create the same plot using geom_path(). Describe what happens and explain why.

    ggplot(moneydemand_shuffled, aes(x = year, y = Rl)) +
      geom_path() +
      labs(x = "Year", y = "Long-term Interest Rate (%)",
           title = "Chaotic plot from shuffled data")

    The plot appears chaotic because geom_path() connects points in row order, not by the \(x\)-value. Since the data is shuffled, it jumps backwards and forwards in time.

  3. Fix the chaotic plot from Q2 by sorting the data before plotting. Use arrange() within the pipe.

    moneydemand_shuffled |>
      arrange(year) |>
      ggplot(aes(x = year, y = Rl)) +
      geom_path() +
      labs(x = "Year", y = "Long-term Interest Rate (%)")

  4. Explain why geom_line() would produce a correct plot even with shuffled data, while geom_path() would not.

    geom_line() internally sorts the data by the \(x\)-variable before connecting points, so row order doesn’t matter. geom_path() connects points in the exact order they appear in the data frame, so if rows are shuffled, the connecting lines will jump around chaotically.
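The fix in Q3 works because shuffling only permutes the rows, so sorting on the time column undoes it exactly. A small sketch with hypothetical toy data (toy_rates is illustrative and not part of the practical):

```r
library(dplyr)

# Hypothetical toy series in correct time order
toy_rates <- tibble(year = 1900:1909, Rl = seq(3.0, 3.9, by = 0.1))

set.seed(42)
toy_shuffled <- toy_rates |>
  slice_sample(n = nrow(toy_rates))  # random permutation of all rows

# Sorting by the time column recovers the original row order
toy_restored <- toy_shuffled |> arrange(year)

identical(toy_restored$year, toy_rates$year)
identical(toy_restored$Rl, toy_rates$Rl)
```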

4 Using geom_path() to explore two variables over time

The real power of geom_path() is showing how two variables evolve together over time. By plotting one variable against another and connecting in temporal order, you add time as a third dimension.

ggplot(moneydemand, aes(x = Rs, y = Rl)) +
  geom_path(alpha = 0.7) +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       title = "Evolution of interest rates (1879-1974)")
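"Connecting in temporal order" means that a path through n rows is just n - 1 line segments, each joining one year's point to the next year's. The sketch below builds those segments by hand with dplyr's lead() and draws them with geom_segment(); the toy_rates values are hypothetical, and the result should trace the same shape that geom_path() draws from the sorted data:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two rates observed over five years (illustrative values)
toy_rates <- tibble(
  year = 1900:1904,
  Rs   = c(4.0, 4.5, 4.2, 3.8, 4.1),
  Rl   = c(3.5, 3.7, 3.9, 3.6, 3.4)
)

# Each segment joins the point for year t to the point for year t + 1
segments <- toy_rates |>
  arrange(year) |>
  mutate(Rs_next = lead(Rs), Rl_next = lead(Rl)) |>
  filter(!is.na(Rs_next))  # the final year has no successor

ggplot(segments, aes(x = Rs, y = Rl, xend = Rs_next, yend = Rl_next)) +
  geom_segment() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)")
```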

4.1 Enhancing with colour

Map year to colour to show the temporal progression:

ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       colour = "Year")
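Mapping year to colour keeps a single connected path because year is continuous. If a discrete variable is mapped to colour instead, ggplot2 also uses it to group the data, and geom_path() draws a separate, disconnected path for each level; setting group = 1 restores one path. A sketch with hypothetical toy data (the era column is made up and is not a moneydemand variable):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with a made-up discrete 'era' label
toy_rates <- tibble(
  year = 1900:1905,
  Rs   = c(4.0, 4.5, 4.2, 3.8, 4.1, 4.3),
  Rl   = c(3.5, 3.7, 3.9, 3.6, 3.4, 3.5),
  era  = factor(c("early", "early", "early", "late", "late", "late"))
)

# Continuous colour: no grouping, so one connected path
p_cont <- ggplot(toy_rates, aes(Rs, Rl, colour = year)) + geom_path()

# Discrete colour: each era becomes its own group, breaking the path in two
p_disc <- ggplot(toy_rates, aes(Rs, Rl, colour = era)) + geom_path()

# group = 1 forces a single path while still colouring by era
p_joined <- ggplot(toy_rates, aes(Rs, Rl, colour = era, group = 1)) + geom_path()

# Number of distinct groups ggplot2 uses for each layer
length(unique(layer_data(p_cont)$group))
length(unique(layer_data(p_disc)$group))
length(unique(layer_data(p_joined)$group))
```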

4.2 A polished example

ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
  geom_path(linewidth = 1, alpha = 0.8) +
  geom_point(size = 1.5) +
  scale_colour_viridis_c(option = "plasma") +
  labs(
    x = "Short-term Interest Rate (%)",
    y = "Long-term Interest Rate (%)",
    colour = "Year",
    title = "Evolution of US Interest Rates (1879-1974)",
    caption = "Source: lmtest::moneydemand"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Figure 4.1: A polished time series visualisation combining multiple concepts.

4.3 Exercises

  1. Create a geom_path() plot showing how Rm (interest rate on money) and logSpp (log stock prices) evolved together over time. Map year to colour.

    ggplot(moneydemand, aes(x = Rm, y = logSpp, colour = year)) +
      geom_path(linewidth = 1) +
      scale_colour_viridis_c() +
      labs(x = "Interest Rate on Money (%)",
           y = "Log Stock Prices",
           colour = "Year")

  2. Create a similar plot for Rs (short-term rate) vs logSpp. Add points on top of the path to mark individual years.

    ggplot(moneydemand, aes(x = Rs, y = logSpp, colour = year)) +
      geom_path(linewidth = 0.8, alpha = 0.7) +
      geom_point(size = 1) +
      scale_colour_viridis_c() +
      labs(x = "Short-term Interest Rate (%)",
           y = "Log Stock Prices",
           colour = "Year")

  3. Using the shuffled data, attempt to create the plot from Exercise 1 above (Rm vs logSpp). Describe what goes wrong and how you would fix it.

    # Broken version with shuffled data
    ggplot(moneydemand_shuffled, aes(x = Rm, y = logSpp, colour = year)) +
      geom_path(linewidth = 1) +
      scale_colour_viridis_c() +
      labs(x = "Interest Rate on Money (%)",
           y = "Log Stock Prices",
           colour = "Year",
           title = "Broken: shuffled data")

    # Fixed by arranging by year
    moneydemand_shuffled |>
      arrange(year) |>
      ggplot(aes(x = Rm, y = logSpp, colour = year)) +
      geom_path(linewidth = 1) +
      scale_colour_viridis_c() +
      labs(x = "Interest Rate on Money (%)",
           y = "Log Stock Prices",
           colour = "Year",
           title = "Fixed: sorted by year")

    With shuffled data, geom_path() connects points in the wrong order, creating a chaotic criss-crossing pattern. The fix is to sort by year using arrange(year) before plotting.

  4. Compare the Rs vs Rl path plot with a simple scatterplot (geom_point() only). What additional information does geom_path() reveal that the scatterplot does not?

    # Scatterplot only
    ggplot(moneydemand, aes(x = Rs, y = Rl)) +
      geom_point() +
      labs(x = "Short-term Interest Rate (%)",
           y = "Long-term Interest Rate (%)",
           title = "Scatterplot only")

    # Path plot
    ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
      geom_path(linewidth = 1) +
      scale_colour_viridis_c() +
      labs(x = "Short-term Interest Rate (%)",
           y = "Long-term Interest Rate (%)",
           colour = "Year",
           title = "Path plot shows temporal evolution")

    The scatterplot shows the relationship between the two variables but not the temporal order. The path plot reveals how the variables evolved together over time — you can see the trajectory the economy took through this 2-D space from 1879 to 1974, including periods of rapid change and more stable periods.

5 Working with dates

5.1 lubridate

Real-world time series data often comes with dates stored as separate columns (year, month, day) or as strings. The lubridate package provides functions to assemble these into proper date or time objects that R understands:

Note: R’s technical name for these objects is “datetime” (specifically POSIXct). We use “time object” here to avoid confusion with Date objects, since “datetime” contains “date” and the two can easily be mixed up.

| Function | Purpose | Example |
|---|---|---|
| make_datetime() | Create a time object from components | make_datetime(year, month, day, hour) |
| make_date() | Create a date object from components | make_date(year, month, day) |
| ymd() | Parse a date from a string | ymd("2024-01-15") |
| year(), month(), day() | Extract components from a date | year(date_column) |

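library(dplyr)
library(lubridate)

# `data` is a placeholder: any data frame with integer year, month and day
# columns (plus an hour column for the datetime example)

# Create a date column from separate year/month/day integer columns
data <- data |>
  mutate(date = make_date(year, month, day))

# Create a datetime column (if hours are also available)
data <- data |>
  mutate(datetime = make_datetime(year, month, day, hour))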

The moneydemand dataset has a single numeric year column, so lubridate is not strictly needed here. For datasets with daily or hourly observations (e.g., separate columns for year, month and day), using make_date() or make_datetime() is essential for correct axis spacing and labelling.
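A self-contained sketch of the functions in the table, using a small hypothetical data frame (readings and its values are made up) in place of the placeholder data above. make_datetime() returns a POSIXct in UTC unless a tz argument is given:

```r
library(dplyr)
library(lubridate)

# Hypothetical readings stored as separate integer columns
readings <- tibble(
  year  = c(2024L, 2024L),
  month = c(1L, 2L),
  day   = c(15L, 1L),
  hour  = c(9L, 18L)
)

readings <- readings |>
  mutate(
    date     = make_date(year, month, day),           # Date
    datetime = make_datetime(year, month, day, hour)  # POSIXct (UTC by default)
  )

class(readings$date)                             # "Date"
format(readings$datetime[1], "%Y-%m-%d %H:%M")   # "2024-01-15 09:00"

# Parse a string, then extract its components
d <- ymd("2024-01-15")
year(d)   # 2024
month(d)  # 1
day(d)    # 15
```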

5.2 Formatting date axes: scale_x_date() and strftime() format codes

Once you have a proper date column, scale_x_date() (for Date objects) or scale_x_datetime() (for POSIXct objects) lets you control how the axis labels are formatted. The date_labels argument accepts strftime() format codes:

| Code | Meaning | Example |
|---|---|---|
| %Y | 4-digit year | 2024 |
| %b | Abbreviated month name | Jan, Feb |
| %B | Full month name | January |
| %d | Day of month | 01–31 |
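These are the same codes accepted by base R's format() and strftime(), so a quick way to preview a date_labels string is to try it on a single Date. The sketch sets the LC_TIME locale to "C", an assumption made so that month names come out in English on any system:

```r
# Use the C locale so month names are in English regardless of system settings
invisible(Sys.setlocale("LC_TIME", "C"))

d <- as.Date("2024-01-15")

format(d, "%Y")     # "2024"
format(d, "%b")     # "Jan"
format(d, "%B")     # "January"
format(d, "%d")     # "15"
format(d, "%b %Y")  # "Jan 2024"
format(d, "%d %b")  # "15 Jan"
```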

Since moneydemand$year is stored as a plain number (a double holding whole years) rather than a Date, the simplest approach is scale_x_continuous() with custom breaks:

ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1880, 1970, by = 10)) +
  labs(x = "Year", y = "Short-term Interest Rate (%)")
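Hard-coding seq(1880, 1970, by = 10) ties the breaks to this particular dataset. One alternative, sketched below, is a small hypothetical helper, decade_breaks() (not part of ggplot2), that rounds the range outwards to whole decades. Because scale_x_continuous() also accepts a function for breaks, which ggplot2 calls with the scale limits, the helper can be passed in directly; breaks falling outside the limits are dropped. The toy series is synthetic:

```r
library(ggplot2)

# Hypothetical helper: multiples of `width` spanning a numeric range
decade_breaks <- function(limits, width = 10) {
  lo <- floor(min(limits, na.rm = TRUE) / width) * width
  hi <- ceiling(max(limits, na.rm = TRUE) / width) * width
  seq(lo, hi, by = width)
}

decade_breaks(c(1879, 1974))  # 1870 1880 ... 1980

# Synthetic series used only to show the helper inside a scale
toy <- data.frame(year = 1879:1974, Rs = sin((1879:1974) / 8) + 4)

ggplot(toy, aes(x = year, y = Rs)) +
  geom_line() +
  scale_x_continuous(breaks = decade_breaks) +
  labs(x = "Year", y = "Rate (%)")
```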

Alternatively, we can convert year to a proper Date object first using mutate() and make_date(). This unlocks scale_x_date(), which accepts strftime() format codes via its date_labels argument and handles date spacing automatically:

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
moneydemand |>
  mutate(date = make_date(year)) |>  # month and day default to 1, i.e. 1 January
  ggplot(aes(x = date, y = Rs)) +
  geom_line() +
  scale_x_date(date_labels = "%Y",
               date_breaks = "10 years") +
  labs(x = "Year", y = "Short-term Interest Rate (%)")

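Both approaches show the same series here (the exact break positions chosen by date_breaks may differ slightly from the hand-picked decades), but the scale_x_date() approach generalises to datasets with monthly or daily data, where you might use date_labels = "%b %Y" (e.g., “Jan 1960”) or date_labels = "%d %b" (e.g., “15 Jan”).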
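To see how the scale_x_date() approach carries over to finer data, here is a sketch with a hypothetical monthly series (monthly and its values are made up). make_date() is called with year and month only, so the day defaults to 1:

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical monthly series for one year (values are made up)
monthly <- tibble(
  year  = 1960L,
  month = 1:12,
  rate  = c(4.0, 4.1, 4.3, 4.2, 4.0, 3.9, 3.8, 3.9, 4.1, 4.2, 4.4, 4.5)
) |>
  mutate(date = make_date(year, month))  # day defaults to 1

p <- ggplot(monthly, aes(x = date, y = rate)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y",  # e.g. "Jan 1960"
               date_breaks = "3 months") +
  labs(x = "Month", y = "Rate (%)")

p
```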

6 Summary

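geom_line() vs geom_path():

- geom_line() sorts the data by the \(x\)-variable before connecting points.
- geom_path() connects points in the order the rows appear in the data frame.

When to use which:

- Use geom_line() to plot a single variable against time.
- Use geom_path() to show how two variables evolve together, with time encoded by the connection order (and optionally by colour).

Fixing shuffled data:

- Sort the rows with arrange(year) before passing them to ggplot() and geom_path().

Working with dates:

- Build Date or POSIXct columns with lubridate functions such as make_date(), make_datetime() and ymd().
- Format date axes with scale_x_date() or scale_x_datetime() and strftime() codes such as %Y, %b and %d.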