1 Introduction

This practical focuses on visualising temporal (time series) data and understanding the difference between geom_line() and geom_path(). You will learn how to:

Create line plots for time series data
Understand the key difference between geom_line() and geom_path()
Use geom_path() to visualise how two variables evolve together over time
Handle data ordering issues in temporal visualisation

2 Data: Money Demand

We will use the moneydemand dataset from Practical 4 (available on Canvas). Recall its key variables:

year: Year of observation (1879–1974)
Rs: Short-term interest rate
Rl: Long-term interest rate
Rm: Interest rate on money
logSpp: Log of stock prices

library(tidyverse)  # provides read_csv(), ggplot2 and dplyr

moneydemand <- read_csv("moneydemand.csv")
str(moneydemand)

## spc_tbl_ [96 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ year  : num [1:96] 1879 1880 1881 1882 1883 ...
##  $ logM  : num [1:96] -7.42 -7.25 -7.09 -7.04 -7.01 ...
##  $ logYp : num [1:96] 5.61 5.7 5.74 5.79 5.81 ...
##  $ Rs    : num [1:96] 5.07 5.23 5.2 5.64 5.62 ...
##  $ Rl    : num [1:96] 4.88 4.58 4.26 4.31 4.33 4.28 4.08 3.81 3.87 3.8 ...
##  $ Rm    : num [1:96] 2.58 2.73 2.87 3.14 3.16 2.89 2.23 2.86 3.51 2.95 ...
##  $ logSpp: num [1:96] -3.76 -2.69 -2.7 -2.67 -2.79 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   year = col_double(),
##   ..   logM = col_double(),
##   ..   logYp = col_double(),
##   ..   Rs = col_double(),
##   ..   Rl = col_double(),
##   ..   Rm = col_double(),
##   ..   logSpp = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

3 `geom_line()` vs `geom_path()`

The key difference between these two geoms:

geom_line(): Connects points in order of the $x$-variable
geom_path(): Connects points in row order (the order they appear in the data frame)
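
The difference can be seen without drawing anything. The sketch below uses a three-row toy data frame (hypothetical values, not the moneydemand data) and inspects the data each layer is about to draw with layer_data(). It assumes that layer_data() returns rows in drawing order, which matches ggplot2's current behaviour as far as we are aware.

```r
library(ggplot2)

# Toy data (hypothetical): rows deliberately NOT sorted by x
toy <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

p_line <- ggplot(toy, aes(x = x, y = y)) + geom_line()
p_path <- ggplot(toy, aes(x = x, y = y)) + geom_path()

# layer_data() returns the data a layer will draw
line_order <- layer_data(p_line)$x  # geom_line(): reordered by x
path_order <- layer_data(p_path)$x  # geom_path(): row order of `toy` kept

print(line_order)
print(path_order)
```

If the two vectors differ, the two geoms will draw different lines through the same points.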

3.1 When they are the same

When data is sorted by the $x$-variable (as is typical for time series), both geoms produce identical plots:

# Using geom_line()
ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_line() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "Using geom_line()")

# Using geom_path() - identical result
ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_path() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "Using geom_path()")
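
Whether the two geoms will agree can be checked before plotting by testing whether the $x$-variable is already sorted. The sketch below uses base R's is.unsorted() on two toy vectors (hypothetical values standing in for a year column); for the real data the equivalent call would be is.unsorted(moneydemand$year).

```r
# Toy stand-ins for a time column (hypothetical values)
years_sorted   <- c(1879, 1880, 1881, 1882)
years_shuffled <- c(1881, 1879, 1882, 1880)

# is.unsorted() returns FALSE when the vector is in non-decreasing order
print(is.unsorted(years_sorted))
print(is.unsorted(years_shuffled))
```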

3.2 When they differ

Let’s shuffle the rows and see what happens:

set.seed(42)
moneydemand_shuffled <- moneydemand |>
  slice_sample(n = nrow(moneydemand))  # shuffle all rows

# geom_line() still works - it sorts by x internally
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
  geom_line() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "geom_line() with shuffled data - still correct!")

# geom_path() creates chaos - it connects in row order
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
  geom_path() +
  labs(x = "Year", y = "Short-term Interest Rate (%)",
       title = "geom_path() with shuffled data - chaotic!")

3.3 Exercises

1. Create a line plot of Rl (long-term interest rate) over year using the original (unshuffled) moneydemand data. Add appropriate axis labels.
```
ggplot(moneydemand, aes(x = year, y = Rl)) +
  geom_line() +
  labs(x = "Year", y = "Long-term Interest Rate (%)")
```
2. Using the shuffled data moneydemand_shuffled, try to create the same plot using geom_path(). Describe what happens and explain why.
```
ggplot(moneydemand_shuffled, aes(x = year, y = Rl)) +
  geom_path() +
  labs(x = "Year", y = "Long-term Interest Rate (%)",
       title = "Chaotic plot from shuffled data")
```
The plot appears chaotic because geom_path() connects points in row order, not in order of the $x$-value. Since the rows are shuffled, the line jumps backwards and forwards in time.

3. Fix the chaotic plot from Q2 by sorting the data before plotting. Use arrange() within the pipe.

moneydemand_shuffled |>
  arrange(year) |>
  ggplot(aes(x = year, y = Rl)) +
  geom_path() +
  labs(x = "Year", y = "Long-term Interest Rate (%)")

4. Explain why geom_line() would produce a correct plot even with shuffled data, while geom_path() would not.

geom_line() internally sorts the data by the $x$-variable before connecting points, so row order doesn’t matter. geom_path() connects points in the exact order they appear in the data frame, so if rows are shuffled, the connecting lines will jump around chaotically.

4 Using `geom_path()` to explore two variables over time

The real power of geom_path() is showing how two variables evolve together over time. By plotting one variable against another and connecting in temporal order, you add time as a third dimension.

ggplot(moneydemand, aes(x = Rs, y = Rl)) +
  geom_path(alpha = 0.7) +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       title = "Evolution of interest rates (1879-1974)")
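
Because a path has no built-in direction, it can help to mark where it starts and ends. The following is a minimal sketch on toy data (the variables a and b and all values are hypothetical), labelling the first and last years on top of the path. It is one possible approach, not the only one.

```r
library(ggplot2)
library(dplyr)

# Toy data (hypothetical values, not the real moneydemand series)
toy <- tibble(
  year = 2000:2005,
  a    = c(1.0, 1.5, 2.5, 2.0, 3.0, 3.5),
  b    = c(2.0, 2.2, 2.1, 3.0, 3.2, 2.8)
)

# Keep only the first and last years so they can be labelled
endpoints <- toy |>
  filter(year %in% range(year))

p <- ggplot(toy, aes(x = a, y = b)) +
  geom_path(alpha = 0.7) +                                     # rows are already in year order
  geom_point(data = endpoints, size = 2) +                     # mark start and end
  geom_text(data = endpoints, aes(label = year), vjust = -1) + # label them with the year
  labs(x = "Variable a", y = "Variable b")
p
```

Mapping year to colour is an alternative that shows the whole progression rather than just the two ends.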

4.1 Enhancing with colour

Map year to colour to show the temporal progression:

ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       colour = "Year")

4.2 A polished example

ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
  geom_path(linewidth = 1, alpha = 0.8) +
  geom_point(size = 1.5) +
  scale_colour_viridis_c(option = "plasma") +
  labs(
    x = "Short-term Interest Rate (%)",
    y = "Long-term Interest Rate (%)",
    colour = "Year",
    title = "Evolution of US Interest Rates (1879-1974)",
    caption = "Source: lmtest::moneydemand"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Figure 4.1: A polished time series visualisation combining multiple concepts.

4.3 Exercises

5. Create a geom_path() plot showing how Rm (interest rate on money) and logSpp (log stock prices) evolved together over time. Map year to colour.

ggplot(moneydemand, aes(x = Rm, y = logSpp, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Interest Rate on Money (%)",
       y = "Log Stock Prices",
       colour = "Year")

6. Create a similar plot for Rs (short-term rate) vs logSpp. Add points on top of the path to mark individual years.

ggplot(moneydemand, aes(x = Rs, y = logSpp, colour = year)) +
  geom_path(linewidth = 0.8, alpha = 0.7) +
  geom_point(size = 1) +
  scale_colour_viridis_c() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Log Stock Prices",
       colour = "Year")

7. Using the shuffled data, attempt to create the plot from Q5. Describe what goes wrong and how you would fix it.

# Broken version with shuffled data
ggplot(moneydemand_shuffled, aes(x = Rm, y = logSpp, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Interest Rate on Money (%)",
       y = "Log Stock Prices",
       colour = "Year",
       title = "Broken: shuffled data")

# Fixed by arranging by year
moneydemand_shuffled |>
  arrange(year) |>
  ggplot(aes(x = Rm, y = logSpp, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Interest Rate on Money (%)",
       y = "Log Stock Prices",
       colour = "Year",
       title = "Fixed: sorted by year")

With shuffled data, geom_path() connects points in the wrong order, creating a chaotic criss-crossing pattern. The fix is to sort by year using arrange(year) before plotting.
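
To confirm that sorting really undoes the shuffle, the sorted rows can be compared with the original ones. The sketch below does this on a small toy data frame (hypothetical values; the seed is arbitrary) rather than on moneydemand itself.

```r
library(dplyr)

# Toy data (hypothetical values), already in year order
toy <- tibble(
  year = 2000:2004,
  a    = c(1, 3, 2, 4, 5),
  b    = c(5, 4, 6, 3, 7)
)

set.seed(1)  # arbitrary seed, only for reproducibility
toy_shuffled <- toy |>
  slice_sample(n = nrow(toy))  # random permutation of the rows

toy_fixed <- toy_shuffled |>
  arrange(year)                # restore temporal order

# Every column should match the original once the rows are back in year order
print(all(toy_fixed$year == toy$year))
print(all(toy_fixed$a == toy$a))
print(all(toy_fixed$b == toy$b))
```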

8. Compare the Rs vs Rl path plot with a simple scatterplot (geom_point() only). What additional information does geom_path() reveal that the scatterplot does not?

# Scatterplot only
ggplot(moneydemand, aes(x = Rs, y = Rl)) +
  geom_point() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       title = "Scatterplot only")

# Path plot
ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c() +
  labs(x = "Short-term Interest Rate (%)",
       y = "Long-term Interest Rate (%)",
       colour = "Year",
       title = "Path plot shows temporal evolution")

The scatterplot shows the relationship between the two variables but not the temporal order. The path plot reveals how the variables evolved together over time — you can see the trajectory the economy took through this 2-D space from 1879 to 1974, including periods of rapid change and more stable periods.

5 Working with dates

5.1 `lubridate`

Real-world time series data often comes with dates stored as separate columns (year, month, day) or as strings. The lubridate package provides functions to assemble these into proper date or time objects that R understands:

Note: R’s technical name for these objects is “datetime” (specifically POSIXct). We use “time object” here to avoid confusion with Date objects, since “datetime” contains “date” and the two can easily be mixed up.

| Function | Purpose | Example |
|---|---|---|
| `make_datetime()` | Create time object from components | `make_datetime(year, month, day, hour)` |
| `make_date()` | Create date object from components | `make_date(year, month, day)` |
| `ymd()` | Parse date from a string | `ymd("2024-01-15")` |
| `year()`, `month()`, `day()` | Extract components from a date | `year(date_column)` |

library(lubridate)

# `data` is a placeholder for any data frame with integer year, month, day (and hour) columns
# Create a date column from separate year/month/day integer columns
data <- data |>
  mutate(date = make_date(year, month, day))

# Create a datetime column (if hours are also available)
data <- data |>
  mutate(datetime = make_datetime(year, month, day, hour))

The moneydemand dataset stores time as a single numeric year column, so lubridate is not needed here. For datasets with daily or hourly observations (e.g., a column each for year, month, day), using make_date() or make_datetime() is essential for correct axis spacing and labelling.
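
For a version that runs on its own, here is a minimal sketch with a small made-up data frame (the name readings and all values are hypothetical). It assembles a Date and a POSIXct column from separate parts, then parses a string with ymd() and extracts components from it.

```r
library(dplyr)
library(lubridate)

# Toy data (hypothetical): date parts stored in separate integer columns
readings <- tibble(
  year  = c(2024L, 2024L, 2024L),
  month = c(1L, 1L, 2L),
  day   = c(15L, 16L, 1L),
  hour  = c(9L, 14L, 0L)
)

readings <- readings |>
  mutate(
    date     = make_date(year, month, day),           # Date object
    datetime = make_datetime(year, month, day, hour)  # POSIXct object (UTC by default)
  )

print(class(readings$date))
print(class(readings$datetime))

# Parse a date from a string, then pull components back out
parsed <- ymd("2024-01-15")
print(year(parsed))
print(month(parsed))
```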

5.2 Formatting date axes: `scale_x_date()` and `strftime()` format codes

Once you have a proper date column, scale_x_date() (for Date objects) or scale_x_datetime() (for POSIXct objects) lets you control how the axis labels are formatted. The date_labels argument accepts strftime() format codes:

| Code | Meaning | Example |
|---|---|---|
| `%Y` | 4-digit year | 2024 |
| `%b` | Abbreviated month | Jan, Feb |
| `%B` | Full month name | January |
| `%d` | Day of month | 01–31 |
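
These codes can be tried outside ggplot2, since base R's format() accepts the same ones for Date objects. A small sketch with an arbitrarily chosen date follows. Month names produced by %b and %B depend on the system locale, so no output is shown for those.

```r
# A fixed example date (chosen arbitrarily for illustration)
d <- as.Date("2024-01-15")

print(format(d, "%Y"))        # 4-digit year
print(format(d, "%d"))        # zero-padded day of month
print(format(d, "%b %Y"))     # abbreviated month and year (locale-dependent)
print(format(d, "%d %B %Y"))  # day, full month name, year (locale-dependent)
```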

Since moneydemand$year is a plain numeric column of whole-number years (not a Date), the simplest approach is scale_x_continuous() with custom breaks:

ggplot(moneydemand, aes(x = year, y = Rs)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1880, 1970, by = 10)) +
  labs(x = "Year", y = "Short-term Interest Rate (%)")

Alternatively, we can convert year to a proper Date object first using mutate() and make_date(). This unlocks scale_x_date(), which accepts strftime() format codes via its date_labels argument and handles date spacing automatically:

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

moneydemand |>
  mutate(date = make_date(year)) |>
  ggplot(aes(x = date, y = Rs)) +
  geom_line() +
  scale_x_date(date_labels = "%Y",
               date_breaks = "10 years") +
  labs(x = "Year", y = "Short-term Interest Rate (%)")

Both plots look essentially the same here, but the scale_x_date() approach generalises to datasets with monthly or daily data, where you might use date_labels = "%b %Y" (e.g., “Jan 1960”) or date_labels = "%d %b" (e.g., “15 Jan”).
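
As an illustration of the monthly case, the sketch below builds a toy monthly series (hypothetical values, 24 months from January 1960) and labels the axis with "%b %Y". The six-month break spacing is an arbitrary choice for readability.

```r
library(ggplot2)
library(dplyr)

# Toy monthly series (hypothetical values): 24 months starting January 1960
monthly <- tibble(
  date  = seq(as.Date("1960-01-01"), by = "month", length.out = 24),
  value = 5 + sin(seq_len(24) / 3)
)

p <- ggplot(monthly, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y",        # e.g. month abbreviation plus year
               date_breaks = "6 months") +   # one tick every six months
  labs(x = "Month", y = "Value (toy data)")
p
```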

6 Summary

geom_line() vs geom_path():

geom_line() connects points by $x$-value; robust to shuffled data
geom_path() connects points by row order; requires sorted data
Both produce identical results when data is sorted by $x$

When to use which:

Use geom_line() for standard time series ($y$ vs time on $x$-axis)
Use geom_path() to show how two variables evolve together over time, adding time as a third dimension

Fixing shuffled data:

Sort by the time variable using arrange() before plotting with geom_path()
Or use geom_line() which handles unsorted data automatically

Working with dates:

For data with separate year/month/day columns, use lubridate::make_date() or make_datetime() to create proper date or time objects before plotting
For data stored as strings, use ymd() (and related functions) to parse them into dates
Once you have a Date column, use scale_x_date(date_labels = ...) to format axis labels using strftime() codes (e.g., "%b %Y" for “Jan 2024”)
If the time variable is already numeric (as with moneydemand$year), scale_x_continuous(breaks = ...) gives you control over tick positions without needing lubridate or scale_x_date()

MAS2908 - Practical 07 (Solutions)

Clement Lee

Semester 2, 2025/2026

