Chapter 9 ggplot2

ggplot2 is a widely used R package for creating high-quality and customizable visualizations. Developed by Hadley Wickham as part of the tidyverse, ggplot2 is based on the Grammar of Graphics, a theoretical framework for visualizing data. The package is built on top of the grid graphics system, which allows for greater control over the layout and appearance of plots.

ggplot2 follows a layered approach to constructing plots, with each layer representing a different aspect of the visualization. The basic structure of a ggplot2 plot consists of a data set, aesthetics, geometries, and scales.

Data set: ggplot2 plots are built on a data set, which is usually a data frame or a tibble. The data set should contain all the variables needed for the plot, including any grouping variables for faceting or color coding.

Aesthetics: Aesthetics are the visual properties of a plot, such as the color, size, and shape of points, lines, and bars. In ggplot2, aesthetics are mapped to variables in the data set using the aes() function.

Geometries: Geometries are the visual elements used to represent the data, such as points, lines, bars, and areas. In ggplot2, geometries are added to the plot using a geom_() function, where is the name of the geometry.

Scales: Scales determine how the data is mapped to the visual properties of the plot, such as the range of the x and y axes and the color palette. In ggplot2, scales are set using scale_() functions, where is the name of the aesthetic.

ggplot2 provides a wide range of geometries, aesthetics, and scales, which can be combined to create complex and informative visualizations. Some of the most commonly used geometries and aesthetics are:

  • geom_point(): Displays data as points.
  • geom_line(): Connects data points with a line.
  • geom_bar(): Displays data as vertical bars.
  • geom_histogram(): Displays data as a histogram.
  • geom_boxplot(): Displays the distribution of data using a box-and-whisker plot.
  • geom_smooth(): Adds a smoothed line to a scatter plot.
  • geom_area(): Displays data as a stacked area chart.
  • geom_tile(): Displays data as a heatmap.
  • geom_hex(): Displays data as a hexagonal heatmap.
  • geom_density(): Displays the density of data.
  • geom_violin(): Displays the distribution of data using a violin plot.
  • geom_jitter(): Adds random noise to points to prevent overlapping.
  • geom_abline(): Adds a straight line with a specified intercept and slope.
  • geom_hline(): Adds a horizontal line at a specified y-value.
  • geom_text(): Adds text labels to the plot.

Aesthetics include: x, y, color, fill, shape, size, alpha, linetype, group, weight.

ggplot2 also provides a number of customization options, such as changing the axis labels and titles, adjusting the plot margins, and adding text annotations.