What are Extra-Trees?

Extra-Trees, short for extremely randomised trees, are an extension of Random Forests. Geurts et al. (2006) observed that the optimal cutting point chosen at each split in a Random Forest depends on the distribution of the independent variables in that particular training subsample, which can introduce errors. They proposed Extra-Trees, in which further generalisation is achieved by also randomising the cutting point. Unlike Breiman’s Random Forest algorithm, Extra-Trees are grown on the full training sample rather than on a bootstrap subsample. There is also a computational advantage: because no search for the optimal cutting point is needed, training is typically faster than for Random Forests.
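To make the difference in cutting points concrete, here is a minimal sketch for a single numeric feature and a regression target. It is an illustration, not the reference implementation: the function names (`split_score`, `best_cut`, `random_cut`) and the toy data are assumptions of this example, and variance reduction is assumed as the split score.

```python
import random
from statistics import pvariance


def split_score(xs, ys, cut):
    """Variance reduction from splitting at x < cut (higher is better)."""
    left = [y for x, y in zip(xs, ys) if x < cut]
    right = [y for x, y in zip(xs, ys) if x >= cut]
    if not left or not right:
        return 0.0  # a cut that leaves one side empty gains nothing
    n = len(ys)
    weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return pvariance(ys) - weighted


def best_cut(xs, ys):
    """Optimised cut: search the midpoints between sorted distinct values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: split_score(xs, ys, c))


def random_cut(xs, rng):
    """Randomised cut: draw uniformly between the feature's min and max."""
    return rng.uniform(min(xs), max(xs))


# Hypothetical toy data: two well-separated groups.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

rng = random.Random(0)  # seeded only to make the sketch repeatable
optimised = best_cut(xs, ys)
randomised = random_cut(xs, rng)
print(optimised, split_score(xs, ys, optimised))
print(randomised, split_score(xs, ys, randomised))
```

The optimised cut is tied to the exact values in the sample, while the randomised cut only depends on the feature's range. In the algorithm described by Geurts et al., one such random cut is drawn for each of several randomly chosen features and the best-scoring of those candidates is kept.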

As with Random Forests, hyperparameters such as the minimum leaf size, the number of trees in the forest and the maximum depth of the trees can be tuned by cross-validation or, for time-series data, by time-series cross-validation.
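Time-series cross-validation differs from ordinary cross-validation in that each test fold must come after the data used for training. One common scheme is an expanding window, sketched below. The function name `expanding_window_splits` and the fold-sizing rule are assumptions of this example; libraries such as scikit-learn provide a comparable splitter (`TimeSeriesSplit`).

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training always precedes testing."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last test fold absorbs any leftover samples.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


# Ten time-ordered observations, three folds.
splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each candidate setting (for example a given minimum leaf size) would be fitted on the training indices and scored on the test indices of every fold, and the setting with the best average score kept.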
