AE 12: Querying the OMDB API with httr2

Application exercise

Modified

February 21, 2024

Packages

We will use the following packages in this application exercise.

tidyverse: For data import, wrangling, and visualization.
httr2: For querying APIs.
jsonlite: For some formatting

library(tidyverse)
library(httr2)
library(jsonlite)

Writing an API function

If an R package has not already been written for an application programming interface (API), you can write your own function to query the API. In this application exercise, we will write a function to query the Open Movie Database.

httr vs. httr2

The httr package is the most popular package for querying APIs in R. However, the httr2 package is a newer package that is more consistent with the tidyverse and has a more intuitive syntax. The developers of httr have superceded it with httr2 and recommend all new software development moving forward use httr2. Therefore we will use httr2 in this application exercise.

Create a request

The first step in querying an API is to create a request. A request is an object that contains the information needed to query the API. The request() function creates a request object. The base_url argument specifies the base URL of the API. The req <- syntax assigns the request object to the variable req.

omdb_req <- request(base_url = "http://www.omdbapi.com/")
omdb_req

<httr2_request>

GET http://www.omdbapi.com/

Body: empty

Perform a dry run

The req_dry_run() function performs a dry run of the request. A dry run is a test run of the request that does not actually query the API. It is useful for testing the request before actually querying the API.

omdb_req |>
  req_dry_run()

GET / HTTP/1.1
Host: www.omdbapi.com
User-Agent: httr2/1.0.0 r-curl/5.2.0 libcurl/8.1.2
Accept: */*
Accept-Encoding: deflate, gzip

Determine the shape of an API request

In order to submit a request, we need to define the shape of the request, or the exact URL used to submit the request. The URL of the request is the base URL of the API plus the path of the request.

APIs typically have three major components to the request:

The base URL for the web service (here it is http://www.omdbapi.com/).
The resource path which is the complete destination of the web service endpoint (OMDB API does not have a resource path).
The query parameters which are the parameters passed to the web service endpoint.

In order to create your request you need to read the documentation to see exactly how these components need to be specified.

Your turn: Use the OMDB documentation to determine the shape of the request for information on Sharknado.

# add an example request here

Generate the query

Store your API key

In order to access the OMDB API you need an API key. If you do not have one, use the example key provided on Canvas.

Your turn: Store your API key in .Renviron so you can access it in your code. Once you have saved the file, restart your R session to ensure the new environment variable is loaded.

# from the console run:
usethis::edit_r_environ()

# in .Renviron add:
omdb_key="your-key"

# add code here

Fetch the response

The req_perform() function fetches the response from the API. The response is stored as a response object.

# add code here

What did we get?

The HTTP response contains a number of useful pieces of information.

Status code

The status code is a number that indicates whether the request was successful. A status code of 200 indicates success. A status code of 400 or 500 indicates an error. resp_status() retrieves the numeric HTTP status code, whereas resp_status_desc() retrieves a brief textual description of the status code.

# add code here

HTTP status codes

Hopefully all you receive is a 200 code indicating the query was successful. If you get something different, the error code is useful in debugging your code and determining what (if anything) you can do to fix it

Code	Status
1xx	Informational
2xx	Success
3xx	Redirection
4xx	Client error (you did something wrong)
5xx	Server error (server did something wrong)

Tip

A more intuitive guide to HTTP status codes.

Body

The body of the response contains the actual data returned by the API. The body is a string of characters.

You can extract the body in various forms using the resp_body_*() family of functions. The resp_body_string() function retrieves the body as a string.

# add code here

JSON

Here the result is actually formatted as using JavaScript Object Notation (JSON), so we can use resp_body_json() to extract the data and store it as a list object in R.

# add code here

Convert to data frame

For data analysis purposes, we prefer that the data be stored as a data frame. The as_tibble() function converts the list object to a tibble.

# add code here

Write a function to query the OMDB API

Sharknado proved so popular that four sequels were made. Let’s write a function to query the OMDB API for information on any of the Sharknado films.

Your turn: Your function should:

Take a single argument (the title of the film)
Print a message using the message() function to track progress
Use throttling to ensure we do not overload the server and exceed any rate limits. Add req_throttle() to the request pipeline to limit the rate to 15 requests per minute.
Return a tibble with the information from the API

omdb_api <- function(title){
  # add code here
}

Once you have written your function, test it out by querying the API for information on “Sharknado”. Then apply an iterative operation to query the API for information on all five Sharknado films and store it in a single data frame.

# test function
omdb_api(title = "Sharknado")

NULL

# titles of films
sharknados <- c(
  "Sharknado", "Sharknado 2", "Sharknado 3",
  "Sharknado 4", "Sharknado 5"
)

# iterate over titles and query API
sharknados_df <- map(.x = sharknados, .f = omdb_api) |>
  list_rbind()
sharknados_df

data frame with 0 columns and 0 rows

Acknowledgments

These exercises draw substantially on the httr2 vignettes and reference documentation.