5 Other Aesthetics
5.1 Introduction
In previous chapters, we have seen how colour (colour / color / fill)
and opacity (alpha) can be used as aesthetics to map variables to visual
properties. The same principle applies to other visual properties: shape, line
type, size, and even the axes themselves.
Recall that aesthetics are the visual properties that encode data (e.g.,
position, colour, shape), while scales control how the mapping from data
values to aesthetic values works. Nearly all scale functions in ggplot2 follow
a consistent naming pattern:
scale_<aesthetic>_<type>()
This chapter covers the remaining common aesthetics you will encounter.
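As a rough illustration of the naming pattern, the sketch below lists the functions exported by ggplot2 whose names begin with scale_ and tabulates the <aesthetic> part of each name. It assumes ggplot2 is installed; the object names scale_fns and aesthetic are ours and not part of ggplot2.

```r
library(ggplot2)

# Exported ggplot2 functions whose names start with "scale_"
scale_fns <- grep("^scale_", getNamespaceExports("ggplot2"), value = TRUE)

# The piece between the first and second underscore is usually the aesthetic
aesthetic <- vapply(strsplit(scale_fns, "_", fixed = TRUE), `[`, character(1), 2)

# How many scale functions exist for each aesthetic
sort(table(aesthetic), decreasing = TRUE)
```

A few helpers, such as scale_radius(), do not have all three parts, so the table will contain a handful of entries that are not aesthetics.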
5.2 Shape
The shape aesthetic maps categorical variables to point shapes. This is useful when you need to distinguish groups in a scatterplot, especially in combination with colour for redundant encoding or when printing in black and white.
Note: Compared to colour, the shape aesthetic is less mature in ggplot2.
The scale functions and available values are more limited, and the visual
distinction between shapes is generally weaker than between colours. Unless you
specifically need shapes (e.g., for black-and-white printing), colour is usually
the more effective choice. It is rarely worth spending much time manually
specifying shapes.
5.2.1 Default shape mapping
When you map a variable to shape, ggplot2 automatically selects from a
set of distinct shapes:
ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
geom_point(size = 3) +
labs(x = "Engine Displacement (L)", y = "Highway MPG", shape = "Drive")
Figure 5.1: Default shape mapping for categorical variable.
5.2.2 Available shapes
R provides 26 built-in shapes (0–25):
Figure 5.2: Available point shapes in R.
Shapes 0–14 are hollow, 15–20 are solid, and 21–25 have both colour
(outline) and fill (interior).
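A chart of the 26 shapes can be drawn along the following lines. This is a minimal sketch, not necessarily the code behind Figure 5.2: the grid layout, the object names (shapes, p_shapes) and the orange fill are our own choices.

```r
library(ggplot2)

# One row per shape code, laid out on a 7-column grid
shapes <- data.frame(
  shape = 0:25,
  col   = 0:25 %% 7,
  row   = -(0:25 %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = col, y = row)) +
  # fill only affects shapes 21-25; colour sets the outline or the whole symbol
  geom_point(aes(shape = shape), size = 5, colour = "black", fill = "orange") +
  # print the shape code just below each symbol
  geom_text(aes(label = shape), nudge_y = -0.3, size = 3) +
  # use the numbers in the data directly as shape codes
  scale_shape_identity() +
  theme_void()

p_shapes
```

Because fill is set to a constant here, only shapes 21–25 appear in orange, which makes the distinction between outline and interior visible.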
5.2.3 Manual shape selection
Use scale_shape_manual() to specify exact shapes:
ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
geom_point(size = 3) +
scale_shape_manual(values = c("4" = 16, "f" = 17, "r" = 15)) +
labs(x = "Engine Displacement (L)", y = "Highway MPG", shape = "Drive")
Figure 5.3: Manually specified shapes.
5.2.4 Combining shape and colour
For maximum clarity, combine shape with colour for redundant encoding:
ggplot(mpg, aes(x = displ, y = hwy, shape = drv, colour = drv)) +
geom_point(size = 3) +
scale_colour_brewer(palette = "Set1") +
labs(x = "Engine Displacement (L)", y = "Highway MPG",
shape = "Drive", colour = "Drive")
Figure 5.4: Shape and colour combined for redundant encoding.
A note on redundant encoding: While mapping the same variable to multiple aesthetics (e.g., both shape and colour) can improve accessibility, this approach “uses up” aesthetics that could otherwise represent additional variables. In data with many variables of interest, it is generally better to reserve each aesthetic for a different variable rather than encoding the same information redundantly.
5.3 Line type
The line type aesthetic maps categorical variables to line patterns. This is essential when you need to distinguish multiple lines, particularly for black-and-white printing.
5.3.1 Available line types
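ggplot2 provides six named line types ("solid", "dashed", "dotted", "dotdash", "longdash" and "twodash"), in addition to "blank", which draws nothing: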
Figure 5.5: Available line types in ggplot2.
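The named line types can be drawn next to their names with a short sketch like the one below. It is an illustration under our own layout choices (object names lty_names, linetypes and p_lty are hypothetical), not necessarily the code used for Figure 5.5.

```r
library(ggplot2)

# The six visible named line types
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
linetypes <- data.frame(y = seq_along(lty_names), lty = lty_names)

p_lty <- ggplot(linetypes, aes(x = 0, y = y)) +
  # one horizontal segment per line type
  geom_segment(aes(xend = 5, yend = y, linetype = lty)) +
  # label each segment with its name, slightly offset from the line
  geom_text(aes(label = lty), hjust = 0, nudge_y = 0.2) +
  # use the strings in the data directly as line types
  scale_linetype_identity() +
  # hide both axes; reverse y so that "solid" is drawn at the top
  scale_x_continuous(NULL, breaks = NULL) +
  scale_y_reverse(NULL, breaks = NULL)

p_lty
```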
5.3.2 Default line type mapping
The economics_long dataset contains US economic time series data in long
format, with a variable column indicating which economic indicator is being
measured and a value01 column containing standardised values (scaled to
0–1 for comparability):
ggplot(economics_long, aes(x = date, y = value01, linetype = variable)) +
geom_line() +
labs(x = "Year", y = "Standardised Value", linetype = "Variable")
Figure 5.6: Default line type mapping.
5.3.3 Combining line type and colour
As with shape, combining line type and colour provides redundant encoding that improves accessibility:
ggplot(economics_long, aes(x = date, y = value01,
linetype = variable, colour = variable)) +
geom_line() +
scale_colour_viridis_d() +
labs(x = "Year", y = "Standardised Value",
linetype = "Variable", colour = "Variable")
Figure 5.7: Line type combined with colour for redundant encoding.
The same trade-off applies here: redundant encoding uses up aesthetics that could otherwise show additional variables. Use redundant encoding when accessibility is important; otherwise, reserve each aesthetic for a different variable.
5.4 Size
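The size aesthetic maps continuous variables to the size of points (and, in versions of ggplot2 before 3.4.0, to the thickness of lines; newer versions use the separate linewidth aesthetic for lines). Applied to the points of a scatterplot, this creates what is sometimes called a bubble chart.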
5.4.1 Default size mapping
ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
geom_point(alpha = 0.6) +
labs(x = "Engine Displacement (L)", y = "Highway MPG", size = "City MPG")
Figure 5.8: Size mapped to a continuous variable.
5.4.2 Controlling the size range
Use scale_size_continuous() to set the range of sizes:
ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
geom_point(alpha = 0.6) +
scale_size_continuous(range = c(1, 10)) +
labs(x = "Engine Displacement (L)", y = "Highway MPG", size = "City MPG")
Figure 5.9: Controlling the size range.
5.4.3 Size by area vs radius
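By default, scale_size_continuous() scales the area of each point, rather than
its radius, which is perceptually more accurate. Use scale_radius() only when
the variable represents a radius. The example below uses scale_size_area(), a
variant that additionally maps a value of 0 to a size of 0, so that area is
proportional to the value: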
ggplot(mpg, aes(x = displ, y = hwy, size = cty, colour = drv)) +
geom_point(alpha = 0.7) +
scale_size_area(max_size = 10) +
scale_colour_brewer(palette = "Set1") +
labs(x = "Engine Displacement (L)", y = "Highway MPG",
size = "City MPG", colour = "Drive")
Figure 5.10: Size scaled by area with scale_size_area().
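One way to check what scaling by area means is to inspect the sizes that ggplot2 computes. The sketch below is our own check rather than part of the original example: it rebuilds a simplified version of the plot and compares the squared point sizes with the data. The object names p_area, sizes and ratio are hypothetical, and both vectors are sorted so that the comparison does not rely on row order.

```r
library(ggplot2)

p_area <- ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 10)

# Sizes that ggplot2 computed for the points in the first layer
sizes <- layer_data(p_area)$size

# Size behaves like a diameter, so area is proportional to size^2.
# The mapping is increasing, so sorting both vectors pairs each size
# with the cty value it came from.
ratio <- sort(sizes)^2 / sort(mpg$cty)

# If area is proportional to the data, this ratio is constant
range(ratio)
```

If the two numbers returned by range() agree, the area of each point is proportional to cty, which is the behaviour scale_size_area() is designed to give.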
5.4.4 Identity scales
For most aesthetics, the default scales automatically map your data values to
appropriate visual values. For example, scale_size_continuous() maps your data
range to a sensible range of point sizes. However, if your data already contains
values that are directly interpretable as sizes (e.g., 1–10), you can use
scale_size_identity() to use those values directly:
ggplot(mpg, aes(x = cty, y = hwy, size = displ)) +
geom_point(alpha = 0.5) +
scale_size_identity() +
labs(x = "City MPG", y = "Highway MPG")
Figure 5.11: Using scale_size_identity() to use data values directly as sizes.
In this example, the displ values (engine displacement in litres, ranging from
about 1.6 to 7) are used directly as point sizes. This works because the range
happens to be sensible for point sizes.
Caution: Identity scales bypass the automatic mapping, so they only work when your data values happen to be in a sensible range for that aesthetic. For size, values between roughly 1 and 10 work well; for alpha, values must be between 0 and 1. Avoid using identity scales for aesthetics like colour or shape unless you know exactly what you are doing (e.g., your data contains valid colour hex codes or shape numbers).
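As an illustration of the case where an identity scale is appropriate for colour, the sketch below uses a small made-up data frame in which each row already carries a valid hex code. The data and the object names (swatches, p_identity) are hypothetical.

```r
library(ggplot2)

# Hypothetical data in which each row already carries its own hex colour
swatches <- data.frame(
  x   = 1:3,
  y   = 1,
  hex = c("#1B9E77", "#D95F02", "#7570B3")
)

p_identity <- ggplot(swatches, aes(x = x, y = y, colour = hex)) +
  geom_point(size = 10) +
  # use the hex strings as colours instead of treating them as categories
  scale_colour_identity()

# The colours passed to the geom are exactly the strings in the data
layer_data(p_identity)$colour
```

Without scale_colour_identity(), ggplot2 would treat hex as an ordinary categorical variable and assign its own default colours to the three levels.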
5.5 \(X\) and \(Y\) axes
The \(x\) and \(y\) axes are also controlled by scale functions, and you can transform them in various ways.
5.5.1 Default continuous scales
The default scale functions scale_x_continuous() and scale_y_continuous()
are applied automatically to numeric variables. Their main uses are:
- Setting axis limits: Control the range of values displayed
- Customising tick marks: Control where breaks appear and their labels
5.5.2 Setting axis limits
Use xlim() and ylim() for quick limits, or scale_*_continuous(limits = )
for more control:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_x_continuous(limits = c(2, 6)) +
scale_y_continuous(limits = c(15, 40)) +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
## Warning: Removed 33 rows containing missing values or values outside
## the scale range (`geom_point()`).
Figure 5.12: Setting axis limits.
Warning: Setting limits with scale_*_continuous(limits = ) removes data
outside the range: those values are replaced with NA before plotting, so they
are also excluded from any statistical summaries such as smoothers. Use
coord_cartesian(xlim = , ylim = ) to zoom without removing data (covered in
Chapter 6). See also Section 7.4 in Chapter 7 for a direct comparison of
these two approaches with dplyr::filter().
For customising tick marks, breaks, and labels (including avoiding scientific notation), see Section 6.8 in Chapter 6.
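The difference between the two ways of limiting the axes can be seen by inspecting the data that each plot passes to the geom. This is a minimal sketch using layer_data(); the object names (base_plot, clipped, zoomed and so on) are ours.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Option 1: scale limits; out-of-range values are replaced with NA
clipped <- base_plot +
  scale_x_continuous(limits = c(2, 6)) +
  scale_y_continuous(limits = c(15, 40))

# Option 2: zoom with the coordinate system; the data are left intact
zoomed <- base_plot + coord_cartesian(xlim = c(2, 6), ylim = c(15, 40))

clipped_data <- layer_data(clipped)
zoomed_data  <- layer_data(zoomed)

# Number of points with a missing coordinate under each approach
sum(is.na(clipped_data$x) | is.na(clipped_data$y))
sum(is.na(zoomed_data$x) | is.na(zoomed_data$y))
```

The first count should correspond to the rows reported in the warning message, while the second should be zero because coord_cartesian() changes only the visible region.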
5.5.3 Logarithmic scales
Logarithmic scales are useful when data spans several orders of magnitude or follows a multiplicative relationship. The key insight is that taking logs transforms multiplicative relationships into linear ones.
Consider the geometric distribution (which you would have seen in the module
Introduction to Probability & Statistics in Stage 1). (Note: the geometric
distribution in probability theory has nothing to do with geoms in ggplot2;
the naming is purely coincidental.) Its probability mass function (PMF) is
\[
f(x) = p(1-p)^x, \quad x = 0, 1, 2, \ldots
\]
Taking the logarithm of both sides gives \[ \log f(x) = \log p + x \log(1-p) \]
This is a linear function of \(x\), with intercept \(\log p\) and slope \(\log(1-p)\). In a sample of size \(n\), the expected count at \(x\) is \(n f(x)\), whose logarithm differs from \(\log f(x)\) only by the constant \(\log n\). So if we plot counts from geometric data on a log scale, we should see approximately a straight line.
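The derivation can be checked numerically with the built-in dgeom() function, which uses the same parameterisation (support starting at 0). The sketch below uses base-10 logarithms to match scale_y_log10(); the derivation holds for any base. The values p = 0.5 and x = 0, ..., 5 are chosen purely for illustration.

```r
p <- 0.5
x <- 0:5

# log10 of the geometric PMF f(x) = p * (1 - p)^x
log_pmf <- log10(dgeom(x, prob = p))

# Successive differences give the slope of log f(x) against x;
# for a straight line they are all equal
diff(log_pmf)

# Slope and intercept predicted by the derivation
log10(1 - p)
log10(p)
```

Every element of diff(log_pmf) should equal log10(1 - p), and the first element of log_pmf should equal log10(p).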
Let’s simulate geometric data and count the occurrences:
set.seed(42)
some_counts <- data.frame(x = rgeom(10000, prob = 0.5)) |>
count(x)
head(some_counts)
##   x    n
## 1 0 5038
## 2 1 2482
## 3 2 1259
## 4 3  569
## 5 4  315
## 6 5  182
On the original (linear) scale, the counts decrease rapidly:
ggplot(some_counts, aes(x = x, y = n)) +
geom_point() +
labs(x = "x", y = "Count")
Figure 5.13: Geometric distribution counts on a linear scale.
But with scale_y_log10(), the relationship becomes approximately linear:
ggplot(some_counts, aes(x = x, y = n)) +
geom_point() +
scale_y_log10() +
labs(x = "x", y = "Count (log scale)")
Figure 5.14: Geometric distribution counts on a log scale — the relationship becomes linear.
The points fall approximately on a straight line, confirming the linear relationship between \(x\) and \(\log f(x)\).
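As a rough check of this claim, one can fit a straight line to the log counts and compare the fitted slope with the theoretical value \(\log_{10}(1-p)\). The sketch below is our own addition: it re-simulates the data with base R's table() instead of dplyr::count(), and uses an unweighted least-squares fit, so the noisy small counts in the tail will pull the estimate slightly away from the theoretical slope.

```r
set.seed(42)
draws <- rgeom(10000, prob = 0.5)

# Tabulate the draws into a data frame with columns x and n
counts <- as.data.frame(table(x = draws), responseName = "n",
                        stringsAsFactors = FALSE)
counts$x <- as.integer(counts$x)

# Straight-line fit on the log10 scale, as displayed by scale_y_log10()
fit <- lm(log10(n) ~ x, data = counts)
coef(fit)

# Theoretical slope for p = 0.5
log10(1 - 0.5)
```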
Note that here we pre-counted the data using dplyr::count() before plotting
with geom_point(). In Section 7.3.4 (Chapter
7), we show that geom_bar() can produce an equivalent log-scale
bar chart directly from the raw data, without any pre-counting.
5.6 Facets
While faceting is not strictly an aesthetic (it doesn’t map a variable to a visual property like colour or size), it is a powerful way to visualise one or two additional discrete or categorical variables. Faceting creates small multiples — the same plot repeated for different subsets of the data.
Facets work best when the faceting variable has a small number of distinct values (levels or categories). With too many levels, the panels become too small to be useful.
5.6.1 facet_wrap()
facet_wrap() wraps a 1D ribbon of panels into 2D, based on a single variable:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ drv) +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
## `geom_smooth()` using formula = 'y ~ x'
Figure 5.16: facet_wrap() for a single variable.
Control the number of rows or columns:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ drv, ncol = 1) +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
Figure 5.17: facet_wrap() with specified columns.
5.6.2 facet_grid()
facet_grid() creates a 2D grid of panels based on two variables:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(drv ~ cyl) +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
Figure 5.18: facet_grid() for two variables.
Use . to indicate no faceting for that dimension:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(drv ~ .) +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
Figure 5.19: facet_grid() with rows only.
5.6.3 Free scales
By default, all panels share the same axis scales. Set the scales argument to
let them vary between panels; for example, scales = "free_y" frees only the
\(y\)-axis:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ class, scales = "free_y") +
labs(x = "Engine Displacement (L)", y = "Highway MPG")
Figure 5.20: Facets with free \(y\)-axis scales.
Options: "free", "free_x", "free_y", or "fixed" (default).
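To get a feel for what a free \(y\)-axis has to accommodate, the sketch below computes the range of hwy within each class and then builds a variant of the plot with both axes freed. The object names hwy_ranges and p_free are ours.

```r
library(ggplot2)

# Range of hwy within each class: roughly what each panel's y-axis
# has to cover when the y scale is free (before ggplot2 adds padding)
hwy_ranges <- tapply(mpg$hwy, mpg$class, range)
hwy_ranges[c("2seater", "pickup")]

# Freeing both axes instead of only y
p_free <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, scales = "free") +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

p_free
```

Free scales make patterns within each panel easier to see, at the cost of making comparisons between panels harder, since the same position no longer means the same value.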
5.7 Summary
- Aesthetics are visual properties that encode data; scales control how the mapping works
- Nearly all scale functions follow the scale_<aesthetic>_<type>() naming convention
- Shape: distinguish categorical groups; less mature than colour, so don’t over-invest in manual specification
- Line type: distinguish lines, especially for black-and-white printing
- Size: represent continuous variables with point/line size; identity scales use data values directly (use with caution)
- \(X\) and \(Y\) axes: transform axes (log, sqrt, reverse) and control limits, breaks, and labels
- Facets: not an aesthetic, but useful for visualising additional discrete variables with few levels
- Combine multiple aesthetics (e.g., shape + colour) for redundant encoding, but be aware this uses up aesthetics that could show other variables