```
tree_mod <- decision_tree(engine = "rpart") |>
  set_mode("classification")

tree_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(tree_mod)
```

# Tune better models to predict children in hotel bookings

# Your Turn 1

Fill in the blanks to return the accuracy and ROC AUC for this model using 10-fold cross-validation.

```
set.seed(100)
______ |>
  ______(resamples = hotels_folds) |>
  ______
```

# Your Turn 2

Create a new parsnip model called `rf_mod`, which will learn an ensemble of classification trees from our training data using the **ranger** package. Update your `tree_wf` with this new model.

Fit your workflow with 10-fold cross-validation and compare the ROC AUC of the random forest to your single decision tree model — which predicts the test set better?

*Hint: you’ll need https://www.tidymodels.org/find/parsnip/*

```
# model
rf_mod <- _____ |>
  _____("ranger") |>
  _____("classification")

# workflow
rf_wf <- tree_wf |>
  update_model(_____)

# fit with cross-validation
set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
```

# Your Turn 3

Challenge: Fit 3 more random forest models, using 5, 12, and 21 variables at each split, respectively. Update your `rf_wf` with each new model. Which value maximizes the area under the ROC curve?

```
rf5_mod <- rf_mod |>
  set_args(mtry = 5)

rf12_mod <- rf_mod |>
  set_args(mtry = 12)

rf21_mod <- rf_mod |>
  set_args(mtry = 21)
```

Do this for each model above:

```
_____ <- rf_wf |>
  update_model(_____)

set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
```

# Your Turn 4

Edit the random forest model to tune the `mtry` and `min_n` hyper-parameters; call the new model spec `rf_tuner`.

Update your workflow to use the tuned model.

Then use `tune_grid()` to find the best combination of hyper-parameters to maximize `roc_auc`; let tune set up the grid for you.

How does it compare to the average ROC AUC across folds from `fit_resamples()`?
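*Hint:* as a rough illustration of the pattern (not the solution for the hotels data), the sketch below tunes a random forest on the built-in `iris` data. The `toy_*` names, the 5 folds, `trees = 100`, and `grid = 5` are assumptions chosen only to keep the example small and quick to run. Passing an integer to `grid` lets tune build the candidate grid itself, and `save_workflow = TRUE` keeps the workflow with the results so that `fit_best()` can use it later.

```r
library(tidymodels)

# Toy stand-in for the hotels data (assumption: iris, 5 stratified folds)
set.seed(100)
toy_folds <- vfold_cv(iris, v = 5, strata = Species)

# tune() marks a hyper-parameter to be chosen by resampling
toy_tuner <- rand_forest(mtry = tune(), min_n = tune(), trees = 100) |>
  set_engine("ranger") |>
  set_mode("classification")

toy_wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(toy_tuner)

set.seed(100)
toy_results <- toy_wf |>
  tune_grid(resamples = toy_folds,
            grid = 5, # let tune pick 5 candidate combinations
            metrics = metric_set(roc_auc),
            control = control_grid(save_workflow = TRUE))

# Candidates ranked by mean ROC AUC across folds
toy_results |>
  show_best(metric = "roc_auc", n = 3)
```

Each row of the output is one `mtry`/`min_n` combination, with its ROC AUC averaged over the folds.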

```
rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")

rf_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(rf_mod)

set.seed(100) # Important!
rf_results <- rf_wf |>
  fit_resamples(resamples = hotels_folds,
                metrics = metric_set(roc_auc),
                # change me to control_grid() with tune_grid
                control = control_resamples(verbose = TRUE,
                                            save_workflow = TRUE))

rf_results |>
  collect_metrics()
```

`# your code here`

# Your Turn 5

Use `fit_best()` to take the best combination of hyper-parameters from `rf_results` and use them to predict the test set.

How does our actual test ROC AUC compare to our cross-validated estimate?
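*Hint:* the general shape of this step can be sketched on the built-in `iris` data (again, not the hotels solution). All `toy_*` names, the split, and the grid size are assumptions for illustration, and the sketch assumes a version of tune recent enough to provide `fit_best()`. It refits the best candidate on the data that was resampled, so the tuning results must have been created with `save_workflow = TRUE`.

```r
library(tidymodels)

# Toy train/test split and folds (assumption: iris stands in for hotels)
set.seed(100)
toy_split <- initial_split(iris, strata = Species)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)
toy_folds <- vfold_cv(toy_train, v = 5, strata = Species)

toy_wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = 100) |>
      set_engine("ranger") |>
      set_mode("classification")
  )

set.seed(100)
toy_results <- toy_wf |>
  tune_grid(resamples = toy_folds,
            grid = 5,
            metrics = metric_set(roc_auc),
            control = control_grid(save_workflow = TRUE))

# Refit the best candidate on the full training set
toy_fit <- fit_best(toy_results)

# Score the held-out test rows with the same metric used for tuning
toy_test_auc <- toy_fit |>
  augment(new_data = toy_test) |>
  roc_auc(truth = Species, .pred_setosa, .pred_versicolor, .pred_virginica)

toy_test_auc
```

Comparing `toy_test_auc` with the top row of `show_best(toy_results, metric = "roc_auc")` gives the same test-versus-cross-validation comparison the exercise asks for.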

`# your code here`

# Acknowledgments

- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Allison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.