5 Other Aesthetics

5.1 Introduction

In previous chapters, we have seen how colour (colour / color / fill) and opacity (alpha) can be used as aesthetics to map variables to visual properties. The same principle applies to other visual properties: shape, line type, size, and even the axes themselves.

Recall that aesthetics are the visual properties that encode data (e.g., position, colour, shape), while scales control how the mapping from data values to aesthetic values works. Scale functions in ggplot2 follow a consistent naming pattern, with a few exceptions such as scale_radius():

scale_<aesthetic>_<type>()

This chapter covers the remaining common aesthetics you will encounter.
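To see the naming pattern above in practice, the following sketch lists the scale functions that ggplot2 exports for two of the aesthetics covered in this chapter. It assumes only that ggplot2 is installed; the exact lists depend on the installed version.

```r
library(ggplot2)

# Every exported ggplot2 function whose name starts with "scale_size"
size_scales <- grep("^scale_size", ls("package:ggplot2"), value = TRUE)
size_scales

# The same idea works for any aesthetic, e.g. shape
shape_scales <- grep("^scale_shape", ls("package:ggplot2"), value = TRUE)
shape_scales
```

The part of each name after the aesthetic (continuous, manual, identity, and so on) is the scale type.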

5.2 Shape

The shape aesthetic maps categorical variables to point shapes. This is useful when you need to distinguish groups in a scatterplot, especially in combination with colour for redundant encoding or when printing in black and white.

Note: Compared to colour, the shape aesthetic is less mature in ggplot2. The scale functions and available values are more limited, and the visual distinction between shapes is generally weaker than between colours. Unless you specifically need shapes (e.g., for black-and-white printing), colour is usually the more effective choice. It is rarely worth spending much time manually specifying shapes.

5.2.1 Default shape mapping

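When you map a variable to shape, ggplot2 automatically selects from a default palette of six distinct shapes (with more than six levels it warns, and the extra groups are not drawn unless you specify shapes manually):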

ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point(size = 3) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG", shape = "Drive")

Figure 5.1: Default shape mapping for categorical variable.

5.2.2 Available shapes

R provides 26 built-in shapes (0–25):


Figure 5.2: Available point shapes in R.

Shapes 0–14 are hollow, 15–20 are solid, and 21–25 have both colour (outline) and fill (interior).
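The code behind Figure 5.2 is not shown, but a reference chart of this kind can be sketched as follows. This is one possible layout, not necessarily the one used for the figure; the data frame shape_grid, its column names, and the fill colour are made up for the illustration.

```r
library(ggplot2)

# One row per shape code, arranged on a grid seven columns wide
shape_grid <- data.frame(
  shape = 0:25,
  x = 0:25 %% 7,
  y = -(0:25 %/% 7)
)

p_shapes <- ggplot(shape_grid, aes(x = x, y = y)) +
  # fill only has an effect on shapes 21-25; colour applies to every shape
  geom_point(aes(shape = shape), size = 5, colour = "black", fill = "orange") +
  geom_text(aes(label = shape), nudge_y = -0.35) +
  scale_shape_identity() +  # use the numbers 0-25 directly as shape codes
  theme_void()
p_shapes
```

Setting both colour and fill makes the difference between the three groups of shapes visible: only the last five show the fill colour.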

5.2.3 Manual shape selection

Use scale_shape_manual() to specify exact shapes:

ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c("4" = 16, "f" = 17, "r" = 15)) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG", shape = "Drive")

Figure 5.3: Manually specified shapes.

5.2.4 Combining shape and colour

For maximum clarity, combine shape with colour for redundant encoding:

ggplot(mpg, aes(x = displ, y = hwy, shape = drv, colour = drv)) +
  geom_point(size = 3) +
  scale_colour_brewer(palette = "Set1") +
  labs(x = "Engine Displacement (L)", y = "Highway MPG",
       shape = "Drive", colour = "Drive")

Figure 5.4: Shape and colour combined for redundant encoding.

A note on redundant encoding: While mapping the same variable to multiple aesthetics (e.g., both shape and colour) can improve accessibility, this approach “uses up” aesthetics that could otherwise represent additional variables. In data with many variables of interest, it is generally better to reserve each aesthetic for a different variable rather than encoding the same information redundantly.
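As a contrast to the redundant encoding above, the sketch below gives each aesthetic its own variable: shape still shows drive type, while colour shows model year. The choice of year as the second variable is only an illustration.

```r
library(ggplot2)

# shape encodes drive type; colour encodes model year (1999 or 2008)
p_two_vars <- ggplot(mpg, aes(x = displ, y = hwy,
                              shape = drv, colour = factor(year))) +
  geom_point(size = 3) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG",
       shape = "Drive", colour = "Year")
p_two_vars
```

The plot now carries four variables instead of three, at the cost of being harder to read in black and white.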

5.3 Line type

The line type aesthetic maps categorical variables to line patterns. This is essential when you need to distinguish multiple lines, particularly for black-and-white printing.

5.3.1 Available line types

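ggplot2 provides six named line types ("solid", "dashed", "dotted", "dotdash", "longdash", and "twodash"), plus "blank", which draws nothing: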


Figure 5.5: Available line types in ggplot2.
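The code for Figure 5.5 is not shown. A chart of the named line types can be sketched along the following lines; the data frame line_types and the layout are assumptions made for this illustration rather than the original figure code.

```r
library(ggplot2)

# The six named line types, in the order of their integer codes 1-6
line_types <- data.frame(
  linetype = c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash"),
  y = 6:1,
  stringsAsFactors = FALSE
)

p_lines <- ggplot(line_types) +
  geom_segment(aes(x = 0, xend = 1, y = y, yend = y, linetype = linetype)) +
  scale_linetype_identity() +  # use the names in the data as line types
  scale_y_continuous(breaks = line_types$y, labels = line_types$linetype) +
  labs(x = NULL, y = NULL)
p_lines
```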

5.3.2 Default line type mapping

The economics_long dataset contains US economic time series data in long format, with a variable column indicating which economic indicator is being measured and a value01 column containing standardised values (scaled to 0–1 for comparability):

ggplot(economics_long, aes(x = date, y = value01, linetype = variable)) +
  geom_line() +
  labs(x = "Year", y = "Standardised Value", linetype = "Variable")

Figure 5.6: Default line type mapping.
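By analogy with scale_shape_manual() in Section 5.2.3, line types can also be chosen by hand with scale_linetype_manual(). The sketch below relies on the five indicator names stored in economics_long$variable (pce, pop, psavert, uempmed, unemploy); the assignment of patterns to indicators is arbitrary.

```r
library(ggplot2)

p_manual_lines <- ggplot(economics_long,
                         aes(x = date, y = value01, linetype = variable)) +
  geom_line() +
  # A named vector ties each level of `variable` to a specific line type
  scale_linetype_manual(values = c(
    pce = "solid", pop = "dashed", psavert = "dotted",
    uempmed = "dotdash", unemploy = "longdash"
  )) +
  labs(x = "Year", y = "Standardised Value", linetype = "Variable")
p_manual_lines
```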

5.3.3 Combining line type and colour

As with shape, combining line type and colour provides redundant encoding that improves accessibility:

ggplot(economics_long, aes(x = date, y = value01,
                           linetype = variable, colour = variable)) +
  geom_line() +
  scale_colour_viridis_d() +
  labs(x = "Year", y = "Standardised Value",
       linetype = "Variable", colour = "Variable")

Figure 5.7: Line type combined with colour for redundant encoding.

The same trade-off applies here: redundant encoding uses up aesthetics that could otherwise show additional variables. Use redundant encoding when accessibility is important; otherwise, reserve each aesthetic for a different variable.

5.4 Size

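The size aesthetic maps continuous variables to the size of points (and, in older versions of ggplot2, to the width of lines; since version 3.4.0, line width has its own linewidth aesthetic). A scatterplot with a variable mapped to point size is sometimes called a bubble chart.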

5.4.1 Default size mapping

ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point(alpha = 0.6) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG", size = "City MPG")

Figure 5.8: Size mapped to a continuous variable.

5.4.2 Controlling the size range

Use scale_size_continuous() to set the range of sizes:

ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(range = c(1, 10)) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG", size = "City MPG")

Figure 5.9: Controlling the size range.

5.4.3 Size by area vs radius

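By default, scale_size_continuous() scales by area: data values are mapped to the area of each point rather than to its radius, which is perceptually more accurate. Use scale_radius() only when the variable represents a radius. The example below uses scale_size_area(), which also scales by area but additionally ensures that a value of 0 is mapped to a size of 0: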

ggplot(mpg, aes(x = displ, y = hwy, size = cty, colour = drv)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 10) +
  scale_colour_brewer(palette = "Set1") +
  labs(x = "Engine Displacement (L)", y = "Highway MPG",
       size = "City MPG", colour = "Drive")

Figure 5.10: Size scaled by area (default) is perceptually accurate.
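The arithmetic behind the area-versus-radius distinction can be checked in base R. The sketch below uses three made-up data values and idealised circles, ignoring the offset that the range argument introduces in a real scale.

```r
# Hypothetical data values to encode
value <- c(1, 2, 4)

# Area scaling: area is proportional to value, so radius grows with sqrt(value)
radius_area <- sqrt(value)
area_area   <- pi * radius_area^2

# Radius scaling: radius is proportional to value, so area grows with value^2
radius_rad <- value
area_rad   <- pi * radius_rad^2

# Area of each circle relative to the smallest one
area_area / area_area[1]  # 1 2 4: areas track the data
area_rad / area_rad[1]    # 1 4 16: areas exaggerate the differences
```

Under radius scaling, a value four times larger is drawn with sixteen times the area, which is why it suits only data that really are radii.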

5.4.4 Identity scales

For most aesthetics, the default scales automatically map your data values to appropriate visual values. For example, scale_size_continuous() maps your data range to a sensible range of point sizes. However, if your data already contains values that are directly interpretable as sizes (e.g., 1–10), you can use scale_size_identity() to use those values directly:

ggplot(mpg, aes(x = cty, y = hwy, size = displ)) +
  geom_point(alpha = 0.5) +
  scale_size_identity() +
  labs(x = "City MPG", y = "Highway MPG")

Figure 5.11: Using scale_size_identity() to use data values directly as sizes.

In this example, the displ values (engine displacement in litres, ranging from about 1.6 to 7) are used directly as point sizes. This works because the range happens to be sensible for point sizes.

Caution: Identity scales bypass the automatic mapping, so they only work when your data values happen to be in a sensible range for that aesthetic. For size, values between roughly 1 and 10 work well; for alpha, values must be between 0 and 1. Avoid using identity scales for aesthetics like colour or shape unless you know exactly what you are doing (e.g., your data contains valid colour hex codes or shape numbers).
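As an illustration of the colour case mentioned in the caution above, the sketch below uses a small made-up data frame whose hex column already contains valid colour codes. The data frame swatches and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical data in which one column already holds valid hex colours
swatches <- data.frame(
  x = 1:3,
  y = 1,
  hex = c("#1B9E77", "#D95F02", "#7570B3"),
  stringsAsFactors = FALSE
)

p_identity <- ggplot(swatches, aes(x = x, y = y, colour = hex)) +
  geom_point(size = 10) +
  scale_colour_identity()  # draw each point in the colour named in `hex`
p_identity
```

Without scale_colour_identity(), ggplot2 would treat the hex strings as category labels and assign its own default colours to them.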

5.5 \(X\) and \(Y\) axes

The \(x\) and \(y\) axes are also controlled by scale functions, and you can transform them in various ways.

5.5.1 Default continuous scales

The default scale functions scale_x_continuous() and scale_y_continuous() are applied automatically to numeric variables. Their main uses are:

  1. Setting axis limits: Control the range of values displayed
  2. Customising tick marks: Control where breaks appear and their labels

5.5.2 Setting axis limits

Use xlim() and ylim() for quick limits, or scale_*_continuous(limits = ) for more control:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(limits = c(2, 6)) +
  scale_y_continuous(limits = c(15, 40)) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")
## Warning: Removed 33 rows containing missing values or values outside
## the scale range (`geom_point()`).

Figure 5.12: Setting axis limits.

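Warning: Setting limits with scale_*_continuous(limits = ) discards data outside the range. By default those values are replaced with NA before any statistics are computed, which is why the warning above reports removed rows, and which can change summaries such as smooths or boxplots. Use coord_cartesian(xlim = , ylim = ) to zoom without removing data (covered in Chapter 6). See also Section 7.4 in Chapter 7 for a direct comparison of these two approaches with dplyr::filter().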
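The difference between the two approaches can be checked on the built plot data. The following is a minimal sketch using layer_data() to count how many x values survive; the object names are chosen for this illustration only.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Scale limits: values outside [2, 6] are discarded from the plot data
p_limits <- base_plot + scale_x_continuous(limits = c(2, 6))

# Coordinate limits: all data are kept; only the visible window changes
p_zoom <- base_plot + coord_cartesian(xlim = c(2, 6))

# Count the non-missing x values in each built plot
kept_limits <- sum(!is.na(layer_data(p_limits)$x))
kept_zoom   <- sum(!is.na(layer_data(p_zoom)$x))
c(scale_limits = kept_limits, coord_zoom = kept_zoom, total = nrow(mpg))
```

Only the coord_cartesian() version retains every row of mpg.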

For customising tick marks, breaks, and labels (including avoiding scientific notation), see Section 6.8 in Chapter 6.

5.5.3 Logarithmic scales

Logarithmic scales are useful when data spans several orders of magnitude or follows a multiplicative relationship. The key insight is that taking logs transforms multiplicative relationships into linear ones.

Consider the geometric distribution (which you will have seen in the Stage 1 module Introduction to Probability & Statistics), with probability mass function (PMF) \[ f(x) = p(1-p)^x, \quad x = 0, 1, 2, \ldots \] (Note: the geometric distribution in probability theory has nothing to do with geoms in ggplot2; the naming is purely coincidental.)

Taking the logarithm of both sides: \[ \log f(x) = \log p + x \log(1-p) \]

This is a linear function of \(x\), with intercept \(\log p\) and slope \(\log(1-p)\). So if we plot counts from geometric data on a log scale, we should see approximately a straight line.
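The derivation above can be verified numerically with the exact PMF. The sketch below uses base R's dgeom(), which has the same parameterisation (support starting at 0), with p = 0.5 chosen to match the simulation that follows.

```r
p <- 0.5
x <- 0:5

# dgeom() computes f(x) = p * (1 - p)^x
log_pmf <- log(dgeom(x, prob = p))

# Successive differences of log f(x) should all equal the slope log(1 - p)
diff(log_pmf)
log(1 - p)

# The value at x = 0 should equal the intercept log(p)
log_pmf[1]
log(p)
```

The same holds for logarithms in any base, including the base 10 used by scale_y_log10(); only the numerical values of the slope and intercept change.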

Let’s simulate geometric data and count the occurrences:

set.seed(42)
some_counts <- data.frame(x = rgeom(10000, prob = 0.5)) |>
  count(x)
head(some_counts)
##   x    n
## 1 0 5038
## 2 1 2482
## 3 2 1259
## 4 3  569
## 5 4  315
## 6 5  182

On the original (linear) scale, the counts decrease rapidly:

ggplot(some_counts, aes(x = x, y = n)) +
  geom_point() +
  labs(x = "x", y = "Count")

Figure 5.13: Geometric distribution counts on a linear scale.

But with scale_y_log10(), the relationship becomes approximately linear:

ggplot(some_counts, aes(x = x, y = n)) +
  geom_point() +
  scale_y_log10() +
  labs(x = "x", y = "Count (log scale)")

Figure 5.14: Geometric distribution counts on a log scale — the relationship becomes linear.

The points fall approximately on a straight line, as the linear relationship between \(x\) and \(\log f(x)\) predicts. Any scatter at large \(x\) reflects the small, noisy counts in the tail.
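As a rough numerical check, a straight line can be fitted to the counts on the log10 scale and its slope compared with the theoretical value \(\log_{10}(1-p)\). The expected count is the sample size times \(f(x)\), so the counts share the slope of \(\log f(x)\) and differ only in the intercept. The sketch below uses just the six counts printed by head(some_counts) earlier, so it is an approximation rather than a fit to the full simulated data.

```r
# The six counts printed by head(some_counts) above
printed_counts <- data.frame(
  x = 0:5,
  n = c(5038, 2482, 1259, 569, 315, 182)
)

# Straight-line fit on the log10 scale used by scale_y_log10()
fit <- lm(log10(n) ~ x, data = printed_counts)
fitted_slope <- unname(coef(fit)["x"])

# Theoretical slope for prob = 0.5, expressed in base 10
theory_slope <- log10(1 - 0.5)

c(fitted = fitted_slope, theory = theory_slope)
```

The two slopes should be close but not identical, since the counts are a random sample.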

Note that here we pre-counted the data using dplyr::count() before plotting with geom_point(). In Section 7.3.4 (Chapter 7), we show that geom_bar() can produce an equivalent log-scale bar chart directly from the raw data, without any pre-counting.

5.5.4 Available transformations

Function               Transformation
scale_x_continuous()   Default continuous
scale_x_log10()        Log base 10
scale_x_sqrt()         Square root
scale_x_reverse()      Reversed axis
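Each of these functions has a y-axis counterpart (for example, scale_y_reverse()). As a brief sketch of how they are used, the example below applies a square-root scale to the x-axis and a reversed scale to the y-axis of the mpg scatterplot; the combination is chosen only to show both in one plot.

```r
library(ggplot2)

p_transformed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_sqrt() +     # x positions are spaced according to sqrt(displ)
  scale_y_reverse() +  # y-axis runs from high values down to low values
  labs(x = "Engine Displacement (L, square-root scale)",
       y = "Highway MPG (reversed)")
p_transformed
```

The axis labels still show the original data values; only their spacing and direction change.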

5.5.5 Discrete scales

For categorical axes:

ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  scale_x_discrete(limits = c("r", "f", "4")) +
  labs(x = "Drive Type", y = "Highway MPG")

Figure 5.15: Reordering discrete axis.
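Besides reordering, scale_x_discrete() can also relabel the categories through its labels argument. The sketch below extends the example above; the longer labels are written out from the drv codes documented for the mpg dataset (f = front-wheel, r = rear-wheel, 4 = four-wheel drive).

```r
library(ggplot2)

p_labelled <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  scale_x_discrete(
    limits = c("r", "f", "4"),                                   # axis order
    labels = c("r" = "Rear", "f" = "Front", "4" = "Four-wheel")  # axis text
  ) +
  labs(x = "Drive Type", y = "Highway MPG")
p_labelled
```

Using a named vector for labels keeps each label attached to its category even if the order in limits changes.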

5.6 Facets

While faceting is not strictly an aesthetic (it doesn’t map a variable to a visual property like colour or size), it is a powerful way to visualise one or two additional discrete or categorical variables. Faceting creates small multiples — the same plot repeated for different subsets of the data.

Facets work best when the faceting variable has a small number of distinct values (levels or categories). With too many levels, the panels become too small to be useful.

5.6.1 facet_wrap()

facet_wrap() wraps a 1D ribbon of panels into 2D, based on a single variable:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ drv) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")
## `geom_smooth()` using formula = 'y ~ x'

Figure 5.16: facet_wrap() for a single variable.

Control the number of rows or columns with the nrow or ncol arguments:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv, ncol = 1) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

Figure 5.17: facet_wrap() with specified columns.

5.6.2 facet_grid()

facet_grid() creates a 2D grid of panels based on two variables:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

Figure 5.18: facet_grid() for two variables.

Use . to indicate no faceting for that dimension:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ .) +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

Figure 5.19: facet_grid() with rows only.
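facet_grid() also accepts the faceting variables through its rows and cols arguments, wrapped in vars(), as an alternative to the formula notation used above. The sketch below shows the equivalents of the two previous examples; the object names are chosen for this illustration only.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

# Equivalent to facet_grid(drv ~ cyl)
p_grid <- base_plot + facet_grid(rows = vars(drv), cols = vars(cyl))

# Equivalent to facet_grid(drv ~ .)
p_rows <- base_plot + facet_grid(rows = vars(drv))
p_rows
```

Naming the arguments makes it explicit which variable defines the rows and which the columns, and a dimension that is not faceted is simply left out.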

5.6.3 Free scales

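By default, all panels share the same axis scales. Use the scales argument to give panels their own scales; in the example below, scales = "free_y" frees the \(y\)-axis of each panel while the \(x\)-axis stays shared: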

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, scales = "free_y") +
  labs(x = "Engine Displacement (L)", y = "Highway MPG")

Figure 5.20: Facets with free \(y\)-axis scales.

Options: "free", "free_x", "free_y", or "fixed" (default).

5.7 Summary

  • Aesthetics are visual properties that encode data; scales control how the mapping works
  • Most scale functions follow the scale_<aesthetic>_<type>() naming convention
  • Shape: distinguish categorical groups; less mature than colour, so don’t over-invest in manual specification
  • Line type: distinguish lines, especially for black-and-white printing
  • Size: represent continuous variables with point/line size; identity scales use data values directly (use with caution)
  • \(X\) and \(Y\) axes: transform axes (log, sqrt, reverse) and control limits, breaks, and labels
  • Facets: not an aesthetic, but useful for visualising additional discrete variables with few levels
  • Combine multiple aesthetics (e.g., shape + colour) for redundant encoding, but be aware this uses up aesthetics that could show other variables