The tvshows data set from AI:FCA-1 (exercise 7.3) is shown below:
Assume that errors are reported as absolute error, i.e. each misclassified instance counts as 1. Give very brief answers to the following questions:
What is the optimal tree with one node only? What is the associated error?
What is the optimal tree with a depth of 2 (i.e. a root node with leaves as children)? What is the associated error? Which instances end up at each leaf?
What is the smallest tree that correctly classifies all training instances? How will it classify a new instance described as (Comedy = true, Doctors = true, Lawyers = true, Guns = true) and another one described as (Comedy = false, Doctors = false, Lawyers = true, Guns = true)? Which of the two test instances allows us to say that the tree is able to generalize?
If you were building the tree using information gain as the splitting criterion, which feature would be at the root?
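
The quantities asked for above (the error of a single-leaf tree, the error of a root split with majority-vote leaves, and the information gain of a feature) can be checked with a short script. The following is only a sketch under stated assumptions: the helper names (majority_error, stump_error, information_gain) are hypothetical, features are assumed to be boolean, and the four toy rows with features "A" and "B" and target "Y" are made up for illustration. They are not the tvshows data, which has to be entered by hand from the table.

```python
from collections import Counter
from math import log2


def majority_error(labels):
    """Return (majority label, number of misclassified examples) for one leaf."""
    label, hits = Counter(labels).most_common(1)[0]
    return label, len(labels) - hits


def split(rows, feature):
    """Partition rows by the boolean value of `feature`."""
    parts = {True: [], False: []}
    for row in rows:
        parts[row[feature]].append(row)
    return parts


def stump_error(rows, feature, target):
    """Absolute error of a root split on `feature` with majority-vote leaves."""
    return sum(
        majority_error([r[target] for r in part])[1]
        for part in split(rows, feature).values()
        if part  # skip empty branches
    )


def entropy(labels):
    """Shannon entropy (in bits) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    before = entropy([r[target] for r in rows])
    after = sum(
        len(part) / len(rows) * entropy([r[target] for r in part])
        for part in split(rows, feature).values()
        if part
    )
    return before - after


# Hypothetical toy data (NOT the tvshows table): Y is fully determined by A.
toy_rows = [
    {"A": True, "B": True, "Y": True},
    {"A": True, "B": False, "Y": True},
    {"A": False, "B": True, "Y": False},
    {"A": False, "B": False, "Y": False},
]

if __name__ == "__main__":
    print("single leaf:", majority_error([r["Y"] for r in toy_rows]))
    for feature in ("A", "B"):
        print(
            feature,
            "stump error:", stump_error(toy_rows, feature, "Y"),
            "information gain:", information_gain(toy_rows, feature, "Y"),
        )
```

To apply this to the exercise, replace toy_rows with one dictionary per training instance (keys Comedy, Doctors, Lawyers, Guns and the target column from the table) and compare the four features. Note that the feature with the lowest absolute error at the root and the feature with the highest information gain need not coincide.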