AE 11: Scraping multiple pages of articles from the Cornell Review

Application exercise

Modified: February 29, 2024

Packages

We will use the following packages in this application exercise.

tidyverse: For data import, wrangling, and visualization.
rvest: For scraping HTML files.
lubridate: For formatting date variables.
robotstxt: For verifying if we can scrape a website.

library(tidyverse)
library(rvest)
library(robotstxt)
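
Before scraping, it is worth confirming that the site permits it. The following is a minimal sketch of that check with robotstxt. The URL is an assumption about where the Cornell Review is hosted, so substitute the address given in the exercise if it differs. The call needs an internet connection.

```r
library(robotstxt)

# Assumed address of the Cornell Review (not stated in this document);
# replace it with the URL used in the exercise if it differs.
review_url <- "https://www.thecornellreview.org"

# paths_allowed() fetches the site's robots.txt and returns TRUE when the
# given path may be accessed by the default bot ("*"), FALSE otherwise.
paths_allowed(review_url)
```

If the result is FALSE, the site asks automated clients not to access that path, and the scrape should not proceed.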

Write the scraping code in the iterate-cornell-review.R script, and save the resulting data frame in the data folder.
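
As a rough sketch of the iterate-then-save pattern, the example below applies one scraping function to several pages and writes the combined result to the data folder. Everything specific in it is hypothetical: the two in-memory pages stand in for documents that read_html() would download, the "h2.title" selector is not the site's real markup, and the output file name cornell-review-all.csv is only a placeholder.

```r
library(tidyverse)
library(rvest)

# Hypothetical stand-ins for two downloaded listing pages; in the exercise,
# read_html() on each page URL would supply these documents instead.
pages <- list(
  minimal_html('<article><h2 class="title">First article</h2></article>'),
  minimal_html('<article><h2 class="title">Second article</h2></article>')
)

# Extract one row per article from a parsed page.
# The "h2.title" selector is an assumption, not the site's actual structure.
scrape_page <- function(page) {
  tibble(title = page |> html_elements("h2.title") |> html_text2())
}

# Apply the function to every page and stack the results into one data frame.
review_all <- map(pages, scrape_page) |> list_rbind()

# Save the combined data frame in the data folder (placeholder file name).
dir.create("data", showWarnings = FALSE)
write_csv(review_all, file = "data/cornell-review-all.csv")
```

Keeping the per-page logic in a single function means the same code runs unchanged whether it is given one page or many. Only the list of pages has to grow.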