Customizing Quarto reports and presentations

Lecture 19

Dr. Benjamin Soltoff

Cornell University
INFO 2950 - Spring 2024

March 28, 2024

Announcements

  • Lab tomorrow

Application exercise

ae-17

  • Go to the course GitHub org and find your ae-17 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio Workbench, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow

Quarto

Quarto basics

---
title: "Gun deaths"
author: Your name
date: today
format: html
---

```{r}
#| label: setup
#| include: false

library(tidyverse)
library(rcis)

data("gun_deaths")
```

```{r}
#| label: youth

youth <- gun_deaths |>
  filter(age <= 65)
```

# Gun deaths by age

We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

```{r}
#| label: youth-dist
#| echo: false

ggplot(data = youth, mapping = aes(x = age)) + 
  geom_freqpoly(binwidth = 1)
```

# Gun deaths by race

```{r}
#| label: race-dist

youth |>
  mutate(race = fct_infreq(race) |> fct_rev()) |>
  ggplot(mapping = aes(y = race)) +
  geom_bar() +
  labs(y = "Victim race")
```

Major components

  1. A YAML header surrounded by ---s
  2. Chunks of code surrounded by ```
  3. Text mixed with simple text formatting using the Markdown syntax

Quarto code chunks

Rendering process

A schematic representing rendering of Quarto documents from .qmd, to knitr or Jupyter, to plain text Markdown, then converted by pandoc into any number of output types including HTML, PDF, or Word documents.

Rendering process

A schematic representing the multi-language input (e.g. Python, R, Observable, Julia) and multi-format output (e.g. PDF, HTML, Word documents, and more) versatility of Quarto.
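
The first step of that pipeline can be tried directly from R. The sketch below assumes the knitr package is installed and uses a made-up three-line document (not gun-deaths.qmd). It calls knitr::knit() on the text and prints the plain Markdown that would then be handed to pandoc.

```r
# Assumes the knitr package is installed.
library(knitr)

# Build the chunk fence programmatically so this example contains no
# nested literal fences.
fence <- strrep("`", 3)

# A tiny, made-up document: one sentence with inline code plus one R chunk.
doc_lines <- c(
  "We have `r 2 + 3` rows.",
  "",
  paste0(fence, "{r}"),
  "x <- c(1, 2, 3)",
  "mean(x)",
  fence
)

# knitr runs the inline code and the chunk, then returns plain Markdown.
md <- knit(text = doc_lines, quiet = TRUE)
cat(md)
```

The printed Markdown contains the evaluated sentence and the chunk's output as ordinary text, which is what pandoc converts into HTML, PDF, or Word.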

Your turn: Edit the Quarto document

  • Render gun-deaths.qmd as an HTML document
  • Add text describing the frequency polygon

Code chunks

```{r}
#| label: youth-dist
#| message: false
#| warning: false

# code goes here
```
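  • Naming code chunks
  • Code chunk options
    • eval: true (run the code)
    • include: true (show the code and its results in the rendered document)
    • echo: true (show the source code)
    • message: true or warning: true (show messages or warnings)
    • cache: true (store results and reuse them on later renders)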
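
The effect of a single option can be checked from R. This is a minimal sketch, assuming the knitr package is installed (version 1.35 or later, which reads #| options). The helper knit_with_echo() is hypothetical and exists only for this illustration: it knits the same one-chunk document with echo set to true and then false.

```r
# Assumes the knitr package (>= 1.35) is installed.
library(knitr)

fence <- strrep("`", 3)  # chunk fence, built without nested literal backticks

# Hypothetical helper: knit a one-chunk document whose only chunk option
# is the echo setting passed in.
knit_with_echo <- function(echo) {
  chunk <- c(
    paste0(fence, "{r}"),
    paste("#| echo:", tolower(echo)),
    "total <- sum(1:10)",
    "total",
    fence
  )
  knit(text = chunk, quiet = TRUE)
}

shown  <- knit_with_echo(TRUE)
hidden <- knit_with_echo(FALSE)

# Does the source code appear in the rendered Markdown?
code_in_shown  <- grepl("sum(1:10)", shown, fixed = TRUE)
code_in_hidden <- grepl("sum(1:10)", hidden, fixed = TRUE)

# Is the result still printed when the code is hidden?
result_in_hidden <- grepl("[1] 55", hidden, fixed = TRUE)

c(code_in_shown, code_in_hidden, result_in_hidden)
```

With echo: false the code disappears from the output but its result stays, which is the difference between echo and include.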

Caching with dependencies

```{r}
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv") |>
  filter(term >= 1945)
```
```{r}
#| cache: true
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 9,299 × 53
   caseId docketId caseIssuesId dateDecision decisionType usCite sctCite ledCite
   <chr>  <chr>    <chr>        <chr>               <dbl> <chr>  <chr>   <chr>  
 1 1945-… 1945-00… 1945-001-01… 12/10/1945              1 326 U… 66 S. … 90 L. …
 2 1945-… 1945-00… 1945-002-01… 12/3/1945               1 326 U… 66 S. … 90 L. …
 3 1945-… 1945-00… 1945-003-01… 11/13/1945              1 326 U… 66 S. … 90 L. …
 4 1945-… 1945-00… 1945-004-01… 11/13/1945              1 326 U… 66 S. … 90 L. …
 5 1945-… 1945-00… 1945-005-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 6 1945-… 1945-00… 1945-006-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 7 1945-… 1945-00… 1945-007-01… 11/5/1945               2 326 U… 66 S. … 90 L. …
 8 1945-… 1945-00… 1945-008-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 9 1945-… 1945-00… 1945-009-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
10 1945-… 1945-01… 1945-010-01… 12/10/1945              1 326 U… 66 S. … 90 L. …
# ℹ 9,289 more rows
# ℹ 45 more variables: lexisCite <chr>, term <dbl>, naturalCourt <dbl>,
#   chief <chr>, docket <chr>, caseName <chr>, dateArgument <chr>,
#   dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>,
#   adminActionState <dbl>, threeJudgeFdc <dbl>, caseOrigin <dbl>,
#   caseOriginState <dbl>, caseSource <dbl>, caseSourceState <dbl>, …
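
Removing the filter from the first chunk does not invalidate the second chunk's cache, so it still returns the 9,299 filtered rows:

```{r}
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv")
```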
```{r}
#| cache: true
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 9,299 × 53
   caseId docketId caseIssuesId dateDecision decisionType usCite sctCite ledCite
   <chr>  <chr>    <chr>        <chr>               <dbl> <chr>  <chr>   <chr>  
 1 1945-… 1945-00… 1945-001-01… 12/10/1945              1 326 U… 66 S. … 90 L. …
 2 1945-… 1945-00… 1945-002-01… 12/3/1945               1 326 U… 66 S. … 90 L. …
 3 1945-… 1945-00… 1945-003-01… 11/13/1945              1 326 U… 66 S. … 90 L. …
 4 1945-… 1945-00… 1945-004-01… 11/13/1945              1 326 U… 66 S. … 90 L. …
 5 1945-… 1945-00… 1945-005-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 6 1945-… 1945-00… 1945-006-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 7 1945-… 1945-00… 1945-007-01… 11/5/1945               2 326 U… 66 S. … 90 L. …
 8 1945-… 1945-00… 1945-008-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
 9 1945-… 1945-00… 1945-009-01… 11/5/1945               1 326 U… 66 S. … 90 L. …
10 1945-… 1945-01… 1945-010-01… 12/10/1945              1 326 U… 66 S. … 90 L. …
# ℹ 9,289 more rows
# ℹ 45 more variables: lexisCite <chr>, term <dbl>, naturalCourt <dbl>,
#   chief <chr>, docket <chr>, caseName <chr>, dateArgument <chr>,
#   dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>,
#   adminActionState <dbl>, threeJudgeFdc <dbl>, caseOrigin <dbl>,
#   caseOriginState <dbl>, caseSource <dbl>, caseSourceState <dbl>, …

Label your chunks

```{r}
#| label: raw-data-cache
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv")
```
```{r}
#| label: processed-data-cache
#| cache: true
#| dependson: raw-data-cache
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 29,021 × 53
   caseId docketId caseIssuesId dateDecision decisionType usCite sctCite ledCite
   <chr>  <chr>    <chr>        <chr>               <dbl> <chr>  <chr>   <chr>  
 1 1791-… 1791-00… 1791-001-01… 8/3/1791                6 2 U.S… <NA>    1 L. E…
 2 1791-… 1791-00… 1791-002-01… 8/3/1791                2 2 U.S… <NA>    1 L. E…
 3 1792-… 1792-00… 1792-001-01… 2/14/1792               2 2 U.S… <NA>    1 L. E…
 4 1792-… 1792-00… 1792-002-01… 8/7/1792                2 2 U.S… <NA>    1 L. E…
 5 1792-… 1792-00… 1792-003-01… 8/11/1792               8 2 U.S… <NA>    1 L. E…
 6 1792-… 1792-00… 1792-004-01… 8/11/1792               6 2 U.S… <NA>    1 L. E…
 7 1793-… 1793-00… 1793-001-01… 2/19/1793               8 2 U.S… <NA>    1 L. E…
 8 1793-… 1793-00… 1793-002-01… 2/20/1793               2 2 U.S… <NA>    1 L. E…
 9 1793-… 1793-00… 1793-003-01… 2/20/1793               8 2 U.S… <NA>    1 L. E…
10 1794-… 1794-00… 1794-001-01… 2/7/1794               NA 3 U.S… <NA>    1 L. E…
# ℹ 29,011 more rows
# ℹ 45 more variables: lexisCite <chr>, term <dbl>, naturalCourt <dbl>,
#   chief <chr>, docket <chr>, caseName <chr>, dateArgument <chr>,
#   dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>,
#   adminActionState <dbl>, threeJudgeFdc <dbl>, caseOrigin <dbl>,
#   caseOriginState <dbl>, caseSource <dbl>, caseSourceState <dbl>, …

Caching guidelines

  • Label your code chunks
  • Define dependencies
  • Never cache chunks that load packages
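
Why the unlabeled, dependency-free chunks returned stale rows can be seen in a toy model. The sketch below is not knitr's implementation; simulate() and run_chunk() are hypothetical helpers, and the four term values are made up. It only assumes that a cached chunk is re-run when its own code changes or when a chunk it declares as a dependency changes.

```r
# Toy model of chunk caching. simulate() and run_chunk() are hypothetical
# helpers written for this illustration; they are not knitr functions.
simulate <- function(dependson = character()) {
  cache <- new.env()  # label -> list(key, value)
  work  <- new.env()  # environment where chunk code is evaluated

  run_chunk <- function(label, code, dependson = character()) {
    # Cache key: the chunk's own code plus the keys of declared dependencies.
    dep_keys <- vapply(dependson, function(d) cache[[d]]$key, character(1))
    key <- paste(c(code, dep_keys), collapse = "\n")

    cached <- cache[[label]]
    if (is.null(cached) || !identical(cached$key, key)) {
      # Cache miss: run the code and store the new result.
      value <- eval(parse(text = code), envir = work)
      assign(label, list(key = key, value = value), envir = cache)
    }
    cache[[label]]$value
  }

  raw_filtered <- "term <- c(1791, 1850, 1945, 2000); term <- term[term >= 1945]"
  raw_all      <- "term <- c(1791, 1850, 1945, 2000)"
  clean        <- "length(term)"

  # First render: the raw chunk filters the data.
  run_chunk("raw-data-cache", raw_filtered)
  run_chunk("processed-data-cache", clean, dependson)

  # Second render: the filter is removed from the raw chunk only.
  run_chunk("raw-data-cache", raw_all)
  run_chunk("processed-data-cache", clean, dependson)
}

stale <- simulate()                              # no dependency declared
fresh <- simulate(dependson = "raw-data-cache")  # dependency declared
c(stale = stale, fresh = fresh)
```

Without the declared dependency the second chunk keeps reporting the filtered count; with it, the change upstream forces a re-run.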

Inline code

We have data about `r nrow(gun_deaths)` individuals killed by guns.

Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

We have data about 100798 individuals killed by guns.

Only 15687 are older than 65.
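
Inline results print exactly as R returns them, so large counts appear without separators. One option, sketched here with the two counts shown above typed in by hand (the variable names are illustrative), is to wrap the expression in base R's format():

```r
# Counts copied from the rendered sentences above.
n_total <- 100798  # nrow(gun_deaths)
n_older <- 15687   # nrow(gun_deaths) - nrow(youth)

# Add thousands separators for display.
total_label <- format(n_total, big.mark = ",")
older_label <- format(n_older, big.mark = ",")

# Share of victims older than 65, as a percentage with one decimal place.
pct_older <- round(100 * n_older / n_total, 1)

c(total_label, older_label)
pct_older
```

In the document itself this would be written as inline code, for example `r format(nrow(gun_deaths), big.mark = ",")`.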

Your turn: Modify chunk options

  • Set echo: false for each code chunk
  • Adjust the figure height and width options for the code chunks with plots
  • Enable caching for each chunk and render the document. Look at the file structure for the cache. What do you see?

YAML header

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: html
---
  • YAML Ain’t Markup Language
  • Standardized format for storing hierarchical data in a human-readable syntax
  • Defines how Quarto renders your .qmd file

HTML document

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: html
---

Table of contents

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format:
  html:
    toc: true
    toc-depth: 2
---
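
The indentation in this header is what makes the data hierarchical: toc and toc-depth belong to html, which belongs to format. A quick way to see that structure, assuming the yaml package is installed, is to parse the same header text in R and inspect the nested list.

```r
# Assumes the yaml package is installed.
library(yaml)

# The header from the slide above, without the --- delimiters.
header <- "
title: Gun deaths
author: Benjamin Soltoff
date: today
format:
  html:
    toc: true
    toc-depth: 2
"

meta <- yaml.load(header)

# Top-level keys are list elements; nested keys are nested lists.
meta$title
meta$format$html$toc
meta$format$html[["toc-depth"]]

# Print the whole hierarchy.
str(meta)
```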

Appearance and style

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format:
  html:
    theme: superhero
    highlight-style: github
---

Global options

---
title: "My Document"
format:
  html:
    fig-width: 7
  pdf:
    fig-width: 5
execute:
  echo: true
  message: false
knitr:
  opts_chunk: 
    comment: "#>" 
---
  • Default document-level options
  • Some options are set with format
  • Some options are set with execute
  • Some options are set by knitr/jupyter
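
The knitr: opts_chunk: entry in the header above plays the same role as a call to knitr::opts_chunk$set() in a setup chunk. A minimal sketch of that R-side equivalent, assuming the knitr package is installed:

```r
# Assumes the knitr package is installed.
library(knitr)

# Prefix currently used for printed output (knitr's built-in default is "##").
old_comment <- opts_chunk$get("comment")

# Same effect as `knitr: opts_chunk: comment: "#>"` in the YAML header.
opts_chunk$set(comment = "#>")
new_comment <- opts_chunk$get("comment")

new_comment
```

Setting the option in YAML keeps all document-level defaults in one place, so the header is usually the better home for it.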

Your turn: Modify YAML options

  • Add a table of contents
  • Use themes for light and dark mode
  • Set relevant code chunk options globally

Other output formats

PDF document

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: pdf
---

Presentation

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: revealjs
---

Quarto supports multiple presentation formats

  • revealjs (HTML)
  • pptx (PowerPoint)
  • beamer (\(\LaTeX\)/PDF)

Additional Quarto

  • Dashboards
  • Websites
  • Books
  • Interactive web applications

R scripts

# gun-deaths.R
# 2022-04-18
# Examine the distribution of age of victims in gun_deaths

# load packages
library(tidyverse)
library(rcis)

# keep victims aged 65 or younger
youth <- gun_deaths |>
  filter(age <= 65)

# number of individuals older than 65 killed
nrow(gun_deaths) - nrow(youth)

# graph the distribution of youth
ggplot(data = youth, mapping = aes(x = age)) +
  geom_freqpoly(binwidth = 1)

# graph the distribution of youth, by race
youth |>
  mutate(race = fct_infreq(race) |> fct_rev()) |>
  ggplot(mapping = aes(y = race)) +
  geom_bar() +
  labs(y = "Victim race")

When to use a script

  • For troubleshooting
  • Initial stages of project
  • Building a reproducible pipeline
  • It depends

Running scripts

  • Interactively
  • Programmatically using source()

Recap

  • Quarto is an open-source, reproducible document system
  • Compatible with R, Python, Julia, Observable, and more
  • Supports multiple output formats

Have a good Spring Break