(b) [7.5 pts] Suppose that X_1, ..., X_m are categorical input attributes and Y is a categorical output attribute. Suppose we plan to learn a decision tree without pruning, using the standard algorithm.
b.1 (True or False, 1.5 pts): If X_i and Y are independent in the distribution that generated this dataset, then X_i will not appear in the decision tree.
Answer: False, because the attribute may become relevant further down the tree, once the records are restricted to some value of another attribute (e.g., XOR: with Y = X_1 xor X_2, each X_i alone is independent of Y, yet both must appear in the tree).
b.2 (True or False, 1.5 pts): If IG(Y | X_i) = 0 according to the values of entropy and conditional entropy computed from the data, then X_i will not appear in the decision tree.
Answer: False, for the same reason: IG(Y | X_i) can be zero on the full dataset yet positive once the records are restricted by a split on another attribute.
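To make the shared XOR counterexample for b.1 and b.2 concrete, here is a minimal Python sketch (the dataset and helper names are illustrative, not part of the exam): it estimates IG(Y | X_i) from data and shows the gain is zero at the root but perfect after conditioning on the other attribute.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(records, attr, target="y"):
    """IG(Y | X_attr) = H(Y) - H(Y | X_attr), estimated from the records."""
    n = len(records)
    h_y = entropy([r[target] for r in records])
    h_y_given_x = sum(
        len(subset) / n * entropy(subset)
        for v in {r[attr] for r in records}
        for subset in [[r[target] for r in records if r[attr] == v]]
    )
    return h_y - h_y_given_x

# XOR dataset: Y = X1 xor X2, so each Xi alone is independent of Y.
data = [{"x1": a, "x2": b, "y": a ^ b} for a in (0, 1) for b in (0, 1)]

print(info_gain(data, "x1"), info_gain(data, "x2"))  # 0.0 0.0 at the root
# Restricted to X2 = 0 (one branch deeper), X1 predicts Y perfectly:
print(info_gain([r for r in data if r["x2"] == 0], "x1"))  # 1.0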
b.3 (True or False, 1.5 pts): The maximum depth of the decision tree must be less than m + 1.
Answer: True, because the attributes are categorical, so each attribute can be split at most once along any root-to-leaf path; a path therefore contains at most m splits, and the depth is at most m.
b.4 (True or False, 1.5 pts): Suppose the data has R records; the maximum depth of the decision tree must be less than 1 + log_2 R.
Answer: False, because the tree may be unbalanced. For example, with R = 4 records (1,0,0) -> 1, (0,1,0) -> 1, (0,0,1) -> 1, (0,0,0) -> 0 (i.e., Y = X_1 or X_2 or X_3), each split peels off a single pure leaf, so the greedy tree reaches depth 3, which is not less than 1 + log_2 4 = 3.
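A short sketch (assuming ties between equal-gain attributes are broken arbitrarily; all names are illustrative) that builds the greedy tree for the dataset above and reports its depth:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(records, attr):
    """IG(Y | X_attr) estimated from the records."""
    n = len(records)
    h_y = entropy([r["y"] for r in records])
    cond = sum(
        len(s) / n * entropy([r["y"] for r in s])
        for v in {r[attr] for r in records}
        for s in [[r for r in records if r[attr] == v]]
    )
    return h_y - cond

def tree_depth(records, attrs):
    """Depth of the greedy (maximum-information-gain) tree, no pruning."""
    if len({r["y"] for r in records}) == 1 or not attrs:
        return 0  # pure leaf, or no attributes left to split on
    best = max(attrs, key=lambda a: info_gain(records, a))
    return 1 + max(
        tree_depth([r for r in records if r[best] == v], attrs - {best})
        for v in {r[best] for r in records}
    )

# Y = X1 or X2 or X3: each split peels off one pure leaf -> a chain.
data = [
    {"x1": 1, "x2": 0, "x3": 0, "y": 1},
    {"x1": 0, "x2": 1, "x3": 0, "y": 1},
    {"x1": 0, "x2": 0, "x3": 1, "y": 1},
    {"x1": 0, "x2": 0, "x3": 0, "y": 0},
]
print(tree_depth(data, {"x1", "x2", "x3"}))  # 3, not < 1 + log2(4) = 3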
b.5 (True or False, 1.5 pts): Suppose one of the attributes has R distinct values, and it has a unique value in each record. Then the decision tree will certainly have depth 0 or 1 (i.e., it will be a single node, or else a root node directly connected to a set of leaves).
Answer: True, because that attribute will have perfect information gain: each of its values identifies a single record, so splitting on it produces "pure" buckets which can be split no more. (If the labels are already pure, the tree is a single leaf of depth 0.)
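A quick numeric check of the b.5 reasoning (a sketch; the rid attribute and the random labels are made up for illustration): a record-unique attribute drives the conditional entropy to zero, so its gain equals H(Y), the maximum possible.

from collections import Counter
from math import log2
import random

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# Hypothetical data: "rid" takes a distinct value in every record.
random.seed(0)
data = [{"rid": i, "y": random.choice([0, 1])} for i in range(8)]

h_y = entropy([r["y"] for r in data])
# Each rid bucket holds exactly one record, so every bucket is pure:
h_y_given_rid = sum(entropy([r["y"]]) / len(data) for r in data)  # = 0.0
print(h_y - h_y_given_rid, h_y)  # IG(Y | rid) equals H(Y)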