Tune better models to predict children in hotel bookings

tree_mod <- decision_tree(engine = "rpart") |>
  set_mode("classification")

tree_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(tree_mod)
Your Turn 1
Fill in the blanks to return the accuracy and ROC AUC for this model using 10-fold cross-validation.
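For reference, here is a minimal sketch of the same 10-fold cross-validation pattern (fit_resamples() followed by collect_metrics()) on a stand-in dataset, so the hotels blanks are still yours to fill. It assumes the tidymodels packages and rpart are installed, and it uses a hypothetical binary version of R's built-in iris data ("is this flower versicolor?") instead of the hotel bookings. Objects whose names start with toy are illustrative only and won't overwrite the worksheet's objects. Passing metrics = metric_set(accuracy, roc_auc) explicitly is our choice, since recent versions of tune may add other metrics to the defaults.

```r
library(tidymodels)

# Hypothetical stand-in data (not the hotels data): is an iris flower versicolor?
# "yes" is the first factor level, which yardstick treats as the event by default.
toy <- iris |>
  mutate(versicolor = factor(if_else(Species == "versicolor", "yes", "no"),
                             levels = c("yes", "no"))) |>
  select(-Species)

# 10-fold cross-validation, stratified on the outcome
set.seed(100)
toy_folds <- vfold_cv(toy, v = 10, strata = versicolor)

toy_tree_wf <- workflow() |>
  add_formula(versicolor ~ .) |>
  add_model(decision_tree(engine = "rpart") |> set_mode("classification"))

# Fit once per fold, score each held-out fold, then average across folds
set.seed(100)
tree_metrics <- toy_tree_wf |>
  fit_resamples(resamples = toy_folds,
                metrics = metric_set(accuracy, roc_auc)) |>
  collect_metrics()

tree_metrics
```

In the result, the n column counts the folds that were averaged and std_err shows the fold-to-fold spread of each metric.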
set.seed(100)
______ |>
  ______(resamples = hotels_folds) |>
  ______

Your Turn 2
Create a new parsnip model called rf_mod, which will learn an ensemble of classification trees from our training data using the ranger package. Update your tree_wf with this new model.
Fit your workflow with 10-fold cross-validation and compare the ROC AUC of the random forest to that of your single decision tree model. Which model has the higher cross-validated ROC AUC, and so is likely to predict new data better?
Hint: you’ll need https://www.tidymodels.org/find/parsnip/
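Here is a sketch of swapping a decision tree for a random forest on a stand-in dataset (a hypothetical binary iris outcome, not the hotels data; objects starting with toy are illustrative). It assumes the ranger package is installed. The helper cv_roc_auc() is our own, not part of tidymodels. It resets the seed before each fit so both models are scored on the same folds under the same conditions.

```r
library(tidymodels)

# Hypothetical stand-in data (not the hotels data)
toy <- iris |>
  mutate(versicolor = factor(if_else(Species == "versicolor", "yes", "no"),
                             levels = c("yes", "no"))) |>
  select(-Species)

set.seed(100)
toy_folds <- vfold_cv(toy, v = 10, strata = versicolor)

toy_tree_wf <- workflow() |>
  add_formula(versicolor ~ .) |>
  add_model(decision_tree(engine = "rpart") |> set_mode("classification"))

# A random forest spec fit by the ranger package
toy_rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")

# update_model() keeps the formula and swaps in the new model spec
toy_rf_wf <- toy_tree_wf |>
  update_model(toy_rf_mod)

# Hypothetical helper: mean cross-validated ROC AUC for a workflow
cv_roc_auc <- function(wf) {
  set.seed(100)
  wf |>
    fit_resamples(resamples = toy_folds, metrics = metric_set(roc_auc)) |>
    collect_metrics() |>
    pull(mean)
}

toy_auc <- c(tree = cv_roc_auc(toy_tree_wf), forest = cv_roc_auc(toy_rf_wf))
toy_auc
```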
# model
rf_mod <- _____ |>
  _____("ranger") |>
  _____("classification")

# workflow
rf_wf <- tree_wf |>
  update_model(_____)

# fit with cross-validation
set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()

Your Turn 3
Challenge: Fit three more random forest models that consider 5, 12, and 21 randomly sampled variables at each split (the mtry argument). Update your rf_wf with each new model. Which value maximizes the area under the ROC curve?
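Rather than repeating the same block once per model, one option is to loop over the mtry values. The sketch below does that with lapply() and bind_rows() on a hypothetical iris stand-in (not the hotels data). Because iris has only four predictors, the values 1, 2, and 4 replace the worksheet's 5, 12, and 21. The seeds, names, and looping approach are our own choices, not the worksheet's.

```r
library(tidymodels)

# Hypothetical stand-in data (not the hotels data)
toy <- iris |>
  mutate(versicolor = factor(if_else(Species == "versicolor", "yes", "no"),
                             levels = c("yes", "no"))) |>
  select(-Species)

set.seed(100)
toy_folds <- vfold_cv(toy, v = 10, strata = versicolor)

toy_rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")

toy_rf_wf <- workflow() |>
  add_formula(versicolor ~ .) |>
  add_model(toy_rf_mod)

# iris has only 4 predictors, so try 1, 2, and all 4 of them per split
mtry_values <- c(1, 2, 4)

# For each mtry: update the model, cross-validate, and tag the result
toy_mtry_auc <- lapply(mtry_values, function(m) {
  set.seed(100)
  toy_rf_wf |>
    update_model(toy_rf_mod |> set_args(mtry = m)) |>
    fit_resamples(resamples = toy_folds, metrics = metric_set(roc_auc)) |>
    collect_metrics() |>
    mutate(mtry = m)
}) |>
  bind_rows()

# Highest mean ROC AUC first
toy_mtry_auc |>
  select(mtry, mean, std_err) |>
  arrange(desc(mean))
```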
rf5_mod <- rf_mod |>
  set_args(mtry = 5)

rf12_mod <- rf_mod |>
  set_args(mtry = 12)

rf21_mod <- rf_mod |>
  set_args(mtry = 21)

Do this for each model above:
_____ <- rf_wf |>
  update_model(_____)

set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()

Your Turn 4
Edit the random forest model to tune the mtry and min_n hyper-parameters; call the new model spec rf_tuner.
Update your workflow to use the new rf_tuner model spec.
Then use tune_grid() to find the best combination of hyper-parameters to maximize roc_auc; let tune set up the grid for you.
How does it compare to the average ROC AUC across folds from fit_resamples()?
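Here is a minimal sketch of the tune() and tune_grid() pattern on a hypothetical iris stand-in (not the hotels data; objects starting with toy are illustrative). Setting mtry = tune() and min_n = tune() marks both as placeholders. Passing an integer to grid lets tune generate that many candidate combinations itself (5 here, an arbitrary choice to keep the run short), and tune works out the upper limit for mtry from the predictors in the data. show_best() and select_best() then rank the candidates by mean cross-validated ROC AUC.

```r
library(tidymodels)

# Hypothetical stand-in data (not the hotels data)
toy <- iris |>
  mutate(versicolor = factor(if_else(Species == "versicolor", "yes", "no"),
                             levels = c("yes", "no"))) |>
  select(-Species)

set.seed(100)
toy_folds <- vfold_cv(toy, v = 10, strata = versicolor)

# tune() marks mtry and min_n as placeholders for tune_grid() to fill in
toy_rf_tuner <- rand_forest(engine = "ranger", mtry = tune(), min_n = tune()) |>
  set_mode("classification")

toy_tune_wf <- workflow() |>
  add_formula(versicolor ~ .) |>
  add_model(toy_rf_tuner)

# grid = 5: tune builds a 5-candidate grid; mtry's range is finalized from the data
set.seed(100)
toy_tune_results <- toy_tune_wf |>
  tune_grid(resamples = toy_folds,
            grid = 5,
            metrics = metric_set(roc_auc))

show_best(toy_tune_results, metric = "roc_auc", n = 3)

toy_best <- select_best(toy_tune_results, metric = "roc_auc")
toy_best
```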
rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")

rf_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(rf_mod)

set.seed(100) # Important!
rf_results <- rf_wf |>
  fit_resamples(resamples = hotels_folds,
                metrics = metric_set(roc_auc),
                # change me to control_grid() with tune_grid
                control = control_resamples(verbose = TRUE,
                                            save_workflow = TRUE))

rf_results |>
  collect_metrics()

# your code here

Your Turn 5
Use fit_best() to refit the workflow on the full training set with the best combination of hyper-parameters from rf_results, then use the fitted workflow to predict the test set.
How does our actual test ROC AUC compare to our cross-validated estimate?
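To see the whole path from tuning to a test-set estimate, here is a sketch on a hypothetical iris stand-in (not the hotels data), split into training and test sets first. As in the worksheet, save_workflow = TRUE is what lets fit_best() refit the winning candidate on the full training set; augment() then adds class-probability columns (such as .pred_yes) for the test rows, which roc_auc() scores. The default split proportion, the grid size, and the toy names are assumptions for illustration.

```r
library(tidymodels)

# Hypothetical stand-in data (not the hotels data)
toy <- iris |>
  mutate(versicolor = factor(if_else(Species == "versicolor", "yes", "no"),
                             levels = c("yes", "no"))) |>
  select(-Species)

# Hold out a test set, then cross-validate on the training portion only
set.seed(100)
toy_split <- initial_split(toy, strata = versicolor)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)
toy_folds <- vfold_cv(toy_train, v = 10, strata = versicolor)

toy_tune_wf <- workflow() |>
  add_formula(versicolor ~ .) |>
  add_model(rand_forest(engine = "ranger", mtry = tune(), min_n = tune()) |>
              set_mode("classification"))

set.seed(100)
toy_tune_results <- toy_tune_wf |>
  tune_grid(resamples = toy_folds,
            grid = 5,
            metrics = metric_set(roc_auc),
            # fit_best() needs the workflow saved alongside the results
            control = control_grid(save_workflow = TRUE))

# Refit the best candidate (by roc_auc) on all of toy_train
toy_final <- fit_best(toy_tune_results, metric = "roc_auc")

# Cross-validated estimate vs. ROC AUC on the untouched test set
toy_cv_auc <- show_best(toy_tune_results, metric = "roc_auc", n = 1)$mean
toy_test_auc <- toy_final |>
  augment(new_data = toy_test) |>
  roc_auc(truth = versicolor, .pred_yes) |>
  pull(.estimate)

c(cv = toy_cv_auc, test = toy_test_auc)
```

The cross-validated value averages many assessment folds, while the test value comes from a single held-out set, so some gap between the two is expected, especially with a test set this small.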
# your code here

Acknowledgments
- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Allison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.