ASSIGNMENT

6 Boosted Decision Trees and Random Forest (Henry)

In this assignment, you are going to fit and tune a random forest and a boosted decision tree model to predict whether a given passenger survived the sinking of the Titanic, based on the variables provided in the data set. You may use sklearn and its grid search (GridSearchCV).

6.1

First, download the Titanic.csv file from Sakai, and drop the columns that have “na” values from the data set. Convert the gender variable into a binary variable with 0 representing male, and 1 representing female.

Then split the data into a training set (80% of the data) and a test set (the remaining 20%).
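The preprocessing steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the small in-memory table stands in for Titanic.csv, and the column names (Survived, Pclass, Sex, Age, Fare) are guesses that may not match the actual file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("Titanic.csv"); the column names are assumed.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 2, 3, 1, 2, 3, 2],
    "Sex": ["male", "female", "female", "male", "female",
            "male", "male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, 27.0, None, 54.0, 14.0, 4.0, 30.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 11.13, 8.46, 51.86, 30.07, 16.70, 13.00],
})

# Drop every column that contains at least one missing value (here: Age).
df = df.dropna(axis=1)

# Encode gender as a binary variable: 0 = male, 1 = female.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Separate the target from the features, then hold out 20% as the test set.
X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Fixing random_state is an optional choice that makes the split reproducible between runs.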

6.2

Fit a random forest and a boosted decision tree model on the training set with default parameters, and measure the time required to fit each of the models. Which model required less time to train? Why do you think that is? (Hint: look at the default values of the parameters.)
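One possible way to time the two fits is sketched below. It rests on assumptions: synthetic data from make_classification replaces the Titanic training set, and GradientBoostingClassifier is taken as the boosted decision tree model (the assignment does not name a specific class, so another boosting estimator may be intended).

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the Titanic training set (an assumption for this sketch).
X_train, y_train = make_classification(n_samples=500, n_features=6, random_state=0)

# Both models use their default parameters, apart from a fixed seed.
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

fit_times = {}
for name, model in models.items():
    start = time.perf_counter()            # high-resolution wall-clock timer
    model.fit(X_train, y_train)
    fit_times[name] = time.perf_counter() - start

for name, seconds in fit_times.items():
    print(f"{name}: {seconds:.3f} s")
```

A single measurement can be noisy, so repeating each fit a few times and averaging tends to give a steadier estimate. Calling model.get_params() on each fitted model lists the default values that the hint refers to.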

6.3

Choose a range of parameter values for each of the algorithms (tell us which parameters you varied), and tune on the training set using 5-fold cross-validation. Then, draw the ROC curve and report the AUC for each of the tuned models on the whole training set (in one figure) and on the test set (in another figure). Comment on the similarities/differences between the two models (if any).
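The tuning and plotting workflow might look like the sketch below. Everything specific in it is an assumption made for illustration: the synthetic data, the choice of GradientBoostingClassifier as the boosted model, the small parameter grids, and the use of ROC AUC as the selection score.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed Titanic data (assumption).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grids only; the assignment leaves the parameter choice to you.
searches = {
    "random forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=5,
        scoring="roc_auc",
    ),
    "gradient boosting": GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        cv=5,
        scoring="roc_auc",
    ),
}
for search in searches.values():
    search.fit(X_train, y_train)  # 5-fold CV, then refit on the full training set

# One figure per data split, with both tuned models drawn in each figure.
splits = {"training set": (X_train, y_train), "test set": (X_test, y_test)}
aucs = {}
figures = {}
for split_name, (X_part, y_part) in splits.items():
    fig, ax = plt.subplots()
    for model_name, search in searches.items():
        # Probability of the positive class from the refitted best estimator.
        scores = search.predict_proba(X_part)[:, 1]
        fpr, tpr, _ = roc_curve(y_part, scores)
        aucs[(model_name, split_name)] = auc(fpr, tpr)
        ax.plot(fpr, tpr, label=f"{model_name} (AUC = {auc(fpr, tpr):.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title(f"ROC curves on the {split_name}")
    ax.legend()
    figures[split_name] = fig
```

Call plt.show() or fig.savefig(...) to display or store the figures. The tuned values for each model are available afterwards as search.best_params_.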

