This practical focuses on visualising temporal (time series) data and
understanding the difference between geom_line() and geom_path(). You will
learn how to:
- Distinguish between geom_line() and geom_path()
- Use geom_path() to visualise how two variables evolve together over time

We will use the moneydemand dataset from Practical 4 (available on Canvas).
Recall its key variables:

- year: Year of observation (1879–1974)
- Rs: Short-term interest rate
- Rl: Long-term interest rate
- Rm: Interest rate on money
- logSpp: Log of stock prices

library(tidyverse) # read_csv(), dplyr verbs and ggplot2
moneydemand <- read_csv("moneydemand.csv")
str(moneydemand)
## spc_tbl_ [96 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ year : num [1:96] 1879 1880 1881 1882 1883 ...
## $ logM : num [1:96] -7.42 -7.25 -7.09 -7.04 -7.01 ...
## $ logYp : num [1:96] 5.61 5.7 5.74 5.79 5.81 ...
## $ Rs : num [1:96] 5.07 5.23 5.2 5.64 5.62 ...
## $ Rl : num [1:96] 4.88 4.58 4.26 4.31 4.33 4.28 4.08 3.81 3.87 3.8 ...
## $ Rm : num [1:96] 2.58 2.73 2.87 3.14 3.16 2.89 2.23 2.86 3.51 2.95 ...
## $ logSpp: num [1:96] -3.76 -2.69 -2.7 -2.67 -2.79 ...
## - attr(*, "spec")=
## .. cols(
## .. year = col_double(),
## .. logM = col_double(),
## .. logYp = col_double(),
## .. Rs = col_double(),
## .. Rl = col_double(),
## .. Rm = col_double(),
## .. logSpp = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
geom_line() vs geom_path()

The key difference between these two geoms:

- geom_line(): Connects points in order of the \(x\)-variable
- geom_path(): Connects points in row order (the order they appear in the data frame)

When data is sorted by the \(x\)-variable (as is typical for time series), both geoms produce identical plots:
# Using geom_line()
ggplot(moneydemand, aes(x = year, y = Rs)) +
geom_line() +
labs(x = "Year", y = "Short-term Interest Rate (%)",
title = "Using geom_line()")
# Using geom_path() - identical result
ggplot(moneydemand, aes(x = year, y = Rs)) +
geom_path() +
labs(x = "Year", y = "Short-term Interest Rate (%)",
title = "Using geom_path()")
Let’s shuffle the rows and see what happens:
set.seed(42)
moneydemand_shuffled <- moneydemand |>
slice_sample(n = nrow(moneydemand)) # shuffle all rows
# geom_line() still works - it sorts by x internally
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
geom_line() +
labs(x = "Year", y = "Short-term Interest Rate (%)",
title = "geom_line() with shuffled data - still correct!")
# geom_path() creates chaos - it connects in row order
ggplot(moneydemand_shuffled, aes(x = year, y = Rs)) +
geom_path() +
labs(x = "Year", y = "Short-term Interest Rate (%)",
title = "geom_path() with shuffled data - chaotic!")
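To see the mechanism directly rather than judging it by eye, one option is ggplot2's layer_data(), which returns the data a layer will draw after ggplot2 has processed it. This minimal sketch uses a small hypothetical data frame (toy) instead of moneydemand so it runs on its own: the rows built for geom_line() come back sorted by x, while geom_path() keeps the original row order.

```r
library(ggplot2)

# Hypothetical three-row data frame, deliberately out of x order
toy <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

# layer_data() returns the built data for a layer, in the order it will be drawn
line_x <- layer_data(ggplot(toy, aes(x = x, y = y)) + geom_line())$x
path_x <- layer_data(ggplot(toy, aes(x = x, y = y)) + geom_path())$x

line_x # sorted by x: 1 2 3
path_x # original row order: 3 1 2
```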
Q1. Create a line plot of Rl (long-term interest rate) over year using the
original (unshuffled) moneydemand data. Add appropriate axis labels.
ggplot(moneydemand, aes(x = year, y = Rl)) +
geom_line() +
labs(x = "Year", y = "Long-term Interest Rate (%)")
Q2. Using the shuffled data moneydemand_shuffled, try to create the same plot
using geom_path(). Describe what happens and explain why.
ggplot(moneydemand_shuffled, aes(x = year, y = Rl)) +
geom_path() +
labs(x = "Year", y = "Long-term Interest Rate (%)",
title = "Chaotic plot from shuffled data")
The plot appears chaotic because geom_path() connects points in row order, not by the \(x\)-value. Since the data is shuffled, it jumps backwards and forwards in time.
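Before drawing a path, it can help to check whether the rows are already in time order. A minimal sketch, assuming a small hypothetical yearly table (series) standing in for moneydemand: base R's is.unsorted() flags the shuffled copy, and arrange(year) restores the original row order.

```r
library(dplyr)

# Hypothetical yearly series, standing in for moneydemand
series <- tibble(year = 2001:2010, value = c(5, 6, 4, 7, 8, 6, 9, 10, 9, 11))

set.seed(42)
shuffled <- slice_sample(series, n = nrow(series)) # same rows, random order

# is.unsorted() is TRUE when the years are not in increasing order,
# i.e. when geom_path() would jump back and forth in time
is.unsorted(shuffled$year)

# Sorting by year puts the rows back in time order
restored <- arrange(shuffled, year)
is.unsorted(restored$year) # FALSE
identical(restored, series)
```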
Q3. Fix the chaotic plot from Q2 by sorting the data before plotting. Use
arrange() within the pipe.
moneydemand_shuffled |>
arrange(year) |>
ggplot(aes(x = year, y = Rl)) +
geom_path() +
labs(x = "Year", y = "Long-term Interest Rate (%)")
Q4. Explain why geom_line() would produce a correct plot even with shuffled
data, while geom_path() would not.
geom_line() internally sorts the data by the \(x\)-variable before connecting points, so row order doesn’t matter. geom_path() connects points in the exact order they appear in the data frame, so if rows are shuffled, the connecting lines will jump around chaotically.
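Using geom_path() to explore two variables over time

The real power of geom_path() is showing how two variables evolve together
over time. By plotting one variable against another and connecting the points in
temporal order, you add time as a third dimension. Because geom_path() follows
row order, this only works when the rows are sorted by time, as they are in moneydemand.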
ggplot(moneydemand, aes(x = Rs, y = Rl)) +
geom_path(alpha = 0.7) +
labs(x = "Short-term Interest Rate (%)",
y = "Long-term Interest Rate (%)",
title = "Evolution of interest rates (1879-1974)")
Map year to colour to show the temporal progression:
ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
geom_path(linewidth = 1) +
scale_colour_viridis_c() +
labs(x = "Short-term Interest Rate (%)",
y = "Long-term Interest Rate (%)",
colour = "Year")
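When colour varies along a single path, as it does once year is mapped to colour, ggplot2 cannot draw one line in one colour. To our understanding, current versions draw a separate segment between each pair of consecutive rows, coloured by the row the segment starts from. The sketch below, on hypothetical data (rates), only checks what layer_data() reports: one built row per year, each carrying its own colour.

```r
library(ggplot2)

# Hypothetical five-year series of two interest rates (%)
rates <- data.frame(
  year = 1901:1905,
  short = c(4.0, 4.5, 5.2, 4.8, 4.1),
  long = c(3.5, 3.6, 3.9, 4.0, 3.8)
)

p <- ggplot(rates, aes(x = short, y = long, colour = year)) +
  geom_path(linewidth = 1) +
  scale_colour_viridis_c()

# One built row per year, each with its own hex colour from the viridis scale
built <- layer_data(p)
built[, c("x", "y", "colour")]
```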
ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
geom_path(linewidth = 1, alpha = 0.8) +
geom_point(size = 1.5) +
scale_colour_viridis_c(option = "plasma") +
labs(
x = "Short-term Interest Rate (%)",
y = "Long-term Interest Rate (%)",
colour = "Year",
title = "Evolution of US Interest Rates (1879-1974)",
caption = "Source: lmtest::moneydemand"
) +
theme_minimal() +
theme(legend.position = "bottom")
Figure 4.1: A polished time series visualisation combining multiple concepts.
Q5. Create a geom_path() plot showing how Rm (interest rate on money) and
logSpp (log stock prices) evolved together over time. Map year to
colour.
ggplot(moneydemand, aes(x = Rm, y = logSpp, colour = year)) +
geom_path(linewidth = 1) +
scale_colour_viridis_c() +
labs(x = "Interest Rate on Money (%)",
y = "Log Stock Prices",
colour = "Year")
Q6. Create a similar plot for Rs (short-term rate) vs logSpp. Add points
on top of the path to mark individual years.
ggplot(moneydemand, aes(x = Rs, y = logSpp, colour = year)) +
geom_path(linewidth = 0.8, alpha = 0.7) +
geom_point(size = 1) +
scale_colour_viridis_c() +
labs(x = "Short-term Interest Rate (%)",
y = "Log Stock Prices",
colour = "Year")
Q7. Using the shuffled data, attempt to create the plot from Q5. Describe what goes wrong and how you would fix it.
# Broken version with shuffled data
ggplot(moneydemand_shuffled, aes(x = Rm, y = logSpp, colour = year)) +
geom_path(linewidth = 1) +
scale_colour_viridis_c() +
labs(x = "Interest Rate on Money (%)",
y = "Log Stock Prices",
colour = "Year",
title = "Broken: shuffled data")
# Fixed by arranging by year
moneydemand_shuffled |>
arrange(year) |>
ggplot(aes(x = Rm, y = logSpp, colour = year)) +
geom_path(linewidth = 1) +
scale_colour_viridis_c() +
labs(x = "Interest Rate on Money (%)",
y = "Log Stock Prices",
colour = "Year",
title = "Fixed: sorted by year")
With shuffled data, geom_path() connects points in the wrong order, creating a chaotic criss-crossing pattern. The fix is to sort by year using arrange(year) before plotting.
Q8. Compare the Rs vs Rl path plot with a simple scatterplot (geom_point()
only). What additional information does geom_path() reveal that the
scatterplot does not?
# Scatterplot only
ggplot(moneydemand, aes(x = Rs, y = Rl)) +
geom_point() +
labs(x = "Short-term Interest Rate (%)",
y = "Long-term Interest Rate (%)",
title = "Scatterplot only")
# Path plot
ggplot(moneydemand, aes(x = Rs, y = Rl, colour = year)) +
geom_path(linewidth = 1) +
scale_colour_viridis_c() +
labs(x = "Short-term Interest Rate (%)",
y = "Long-term Interest Rate (%)",
colour = "Year",
title = "Path plot shows temporal evolution")
The scatterplot shows the relationship between the two variables but not the temporal order. The path plot reveals how the variables evolved together over time: you can trace the trajectory the economy took through this 2-D space from 1879 to 1974. Long segments between consecutive years mark periods of rapid change, while tightly bunched points mark more stable periods.
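One way to put a number on "rapid change" (our own addition, not something the practical asks for) is the length of each path segment: the straight-line distance in the (Rs, Rl) plane between consecutive years. This sketch uses hypothetical values in place of moneydemand; with the real data the same mutate() would apply after arrange(year).

```r
library(dplyr)

# Hypothetical short- and long-term rates (%) for six consecutive years
rates <- tibble(
  year = 1901:1906,
  Rs = c(4.0, 4.2, 6.0, 5.9, 3.0, 3.1),
  Rl = c(3.5, 3.5, 4.3, 4.2, 3.6, 3.6)
)

steps <- rates |>
  arrange(year) |> # same order geom_path() would use
  mutate(step = sqrt((Rs - lag(Rs))^2 + (Rl - lag(Rl))^2)) # distance from previous year

# Longest segments first: the years with the biggest joint move
ranked <- steps |>
  filter(!is.na(step)) |>
  arrange(desc(step))
ranked
```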
Working with dates using lubridate

Real-world time series data often comes with dates stored as separate columns
(year, month, day) or as strings. The lubridate package provides functions
to assemble (or parse) these into proper date or time objects that R understands:
Note: R’s technical name for these objects is “datetime” (specifically
POSIXct). We use “time object” here to avoid confusion with Date objects, since “datetime” contains “date” and the two are easily mixed up.
| Function | Purpose | Example |
|---|---|---|
| make_datetime() | Create time object from components | make_datetime(year, month, day, hour) |
| make_date() | Create date object from components | make_date(year, month, day) |
| ymd() | Parse date from a string | ymd("2024-01-15") |
| year(), month(), day() | Extract components from a date | year(date_column) |
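The functions in the table can be tried on single values before applying them to a whole column. A short sketch with made-up dates; note that make_datetime() returns a POSIXct in UTC unless you pass a tz argument.

```r
library(lubridate)

# Build a Date from numeric components (month and day default to 1 if omitted)
d1 <- make_date(2024, 1, 15)

# Parse the same date from a "year-month-day" string
d2 <- ymd("2024-01-15")

# Build a time object (POSIXct, UTC by default) that includes the hour
t1 <- make_datetime(2024, 1, 15, 9)

d1 == d2 # TRUE: both are the Date 2024-01-15
class(d1) # "Date"
class(t1) # "POSIXct" "POSIXt"

# Extract components back out
year(d1) # 2024
month(d1) # 1
day(d1) # 15
hour(t1) # 9
```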
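library(dplyr)
library(lubridate)
# data is a placeholder: substitute your own data frame with numeric
# year, month and day columns (plus hour for the datetime version)
# Create a date column from separate year/month/day columns
data <- data |>
mutate(date = make_date(year, month, day))
# Create a datetime column (if hours are also available)
data <- data |>
mutate(datetime = make_datetime(year, month, day, hour))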
The moneydemand dataset uses a single numeric year column, so lubridate
is not strictly needed here. For datasets with monthly, daily or hourly observations (e.g., a
column each for year, month, day), using make_date() or make_datetime()
is essential for correct axis spacing and labelling.
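Why does a proper date matter for spacing? A common shortcut is to glue year and month into one number such as 202412, but on a numeric axis the step from December 2024 to January 2025 is then 89 units while every other monthly step is 1. A minimal sketch with a hypothetical monthly table (sales) compares that shortcut with make_date().

```r
library(dplyr)
library(lubridate)

# Hypothetical monthly data stored as separate year and month columns
sales <- tibble(
  year = c(2024, 2024, 2024, 2025, 2025),
  month = c(10, 11, 12, 1, 2),
  value = c(12, 15, 11, 14, 16)
)

sales <- sales |>
  mutate(
    yyyymm = year * 100 + month, # shortcut: a plain number like 202412
    date = make_date(year, month, 1) # proper Date: first day of each month
  )

# Gaps a numeric axis would show between consecutive observations
diff(sales$yyyymm) # 1 1 89 1
# Gaps in days between the Date values: roughly even, as they should be
diff(as.numeric(sales$date)) # 31 30 31 31
```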
scale_x_date() and strftime() format codes

Once you have a proper date column, scale_x_date() (for Date objects) or
scale_x_datetime() (for POSIXct objects) lets you control how the axis
labels are formatted. The date_labels argument accepts strftime() format
codes:
| Code | Meaning | Example |
|---|---|---|
| %Y | 4-digit year | 2024 |
| %b | Abbreviated month | Jan, Feb |
| %B | Full month name | January |
| %d | Day of month | 01–31 |
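These are the same codes that base R's format() uses for Date objects, so a quick way to preview a date_labels string is to format a single date. A sketch; %b and %B produce month names in the current locale, so the commented outputs for those two lines assume an English one.

```r
# A single Date to format (base R; no extra packages needed)
d <- as.Date("2024-01-15")

format(d, "%Y") # "2024"
format(d, "%d") # "15"
format(d, "%b %Y") # "Jan 2024" in an English locale
format(d, "%d %B") # "15 January" in an English locale
```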
Since moneydemand$year is a plain number (whole years stored as doubles, not a Date), the simplest
approach is scale_x_continuous() with custom breaks:
ggplot(moneydemand, aes(x = year, y = Rs)) +
geom_line() +
scale_x_continuous(breaks = seq(1880, 1970, by = 10)) +
labs(x = "Year", y = "Short-term Interest Rate (%)")
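Alternatively, we can convert year to a proper Date object first using
mutate() and make_date(). Called with only a year, make_date() fills in
month = 1 and day = 1, so each observation becomes 1 January of its year.
This unlocks scale_x_date(), which accepts strftime() format codes via its
date_labels argument and handles date spacing automatically: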
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
moneydemand |>
mutate(date = make_date(year)) |>
ggplot(aes(x = date, y = Rs)) +
geom_line() +
scale_x_date(date_labels = "%Y",
date_breaks = "10 years") +
labs(x = "Year", y = "Short-term Interest Rate (%)")
Both plots look the same here, but the scale_x_date() approach generalises
to datasets with monthly or daily data, where you might use
date_labels = "%b %Y" (e.g., “Jan 1960”) or date_labels = "%d %b" (e.g.,
“15 Jan”).
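A sketch of the monthly case, assuming a hypothetical 24-month series (monthly) rather than real data: seq() with by = "month" builds one Date per month, and scale_x_date() places a break every three months, labelling each with the abbreviated month and year.

```r
library(ggplot2)
library(lubridate)

# Hypothetical monthly series covering two years (Jan 2023 to Dec 2024)
monthly <- data.frame(
  date = seq(make_date(2023, 1, 1), by = "month", length.out = 24),
  value = 50 + 10 * sin(seq_len(24) / 3)
)

p <- ggplot(monthly, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(x = "Month", y = "Value")
p
```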
geom_line() vs geom_path():

- geom_line() connects points by \(x\)-value; robust to shuffled data
- geom_path() connects points by row order; requires sorted data

When to use which:

- Use geom_line() for standard time series (\(y\) vs time on the \(x\)-axis)
- Use geom_path() to show how two variables evolve together over time, adding time as a third dimension

Fixing shuffled data:

- Sort with arrange() before plotting with geom_path()
- Or use geom_line(), which handles unsorted data automatically

Working with dates:

- Use lubridate::make_date() or make_datetime() to create proper date or time objects before plotting
- If dates are stored as strings, use ymd() (and related functions) to parse them into dates
- Once you have a Date column, use scale_x_date(date_labels = ...) to format axis labels using strftime() codes (e.g., "%b %Y" for “Jan 2024”)
- For plain numeric years (like moneydemand$year), scale_x_continuous(breaks = ...) gives you control over tick positions without needing lubridate or scale_x_date()