Instance Reduction for Avoiding Overfitting in Decision Trees

Journal of Intelligent Systems 30 (1):438-459 (2021)

Abstract

Decision tree learning is one of the most practical classification methods in machine learning and is used to approximate discrete-valued target functions. However, decision trees may overfit the training data, which limits their ability to generalize to unseen instances. In this study, we investigated the use of instance reduction techniques to smooth the decision boundaries before training the decision trees. Noise filters such as ENN, RENN, and ALLKNN remove noisy instances, while DROP3 and DROP5 may remove genuine instances. Extensive empirical experiments were conducted on 13 benchmark datasets from the UCI Machine Learning Repository, with and without intentionally introduced noise. Empirical results show that eliminating border instances improves the classification accuracy of decision trees and reduces the tree size, which in turn reduces the training and classification times. In datasets without intentionally added noise, applying noise filters without the built-in Reduced Error Pruning gave the best classification accuracy. ENN, RENN, and ALLKNN outperformed decision tree learning without pruning in 9, 9, and 8 out of 13 datasets, respectively. The datasets reduced using ENN and RENN without built-in pruning were more effective when noise was intentionally introduced at different ratios.
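To make the noise-filtering idea concrete, the following is a minimal sketch of ENN (Edited Nearest Neighbor) and its repeated variant RENN, written in plain Python. It is not the authors' implementation: the abstract does not state the neighborhood size or distance measure, so `k=3` and Euclidean distance are assumptions, and the names `knn_majority`, `enn`, `renn`, and `toy_data` are hypothetical. ENN drops every instance whose label disagrees with the majority label of its k nearest neighbors; RENN repeats that pass until nothing more is removed.

```python
import math
from collections import Counter


def knn_majority(data, idx, k):
    """Majority label among the k nearest neighbours of data[idx] (itself excluded)."""
    x, _ = data[idx]
    neighbours = sorted(
        (math.dist(x, other_x), j)
        for j, (other_x, _) in enumerate(data)
        if j != idx
    )[:k]
    votes = Counter(data[j][1] for _, j in neighbours)
    return votes.most_common(1)[0][0]


def enn(data, k=3):
    """One ENN pass: keep instances whose label matches their k-NN majority."""
    if len(data) <= k:  # too few instances to vote; leave the set unchanged
        return list(data)
    return [
        inst for i, inst in enumerate(data)
        if knn_majority(data, i, k) == inst[1]
    ]


def renn(data, k=3):
    """RENN: repeat ENN until a pass removes no instance."""
    while True:
        reduced = enn(data, k)
        if len(reduced) == len(data):
            return reduced
        data = reduced


# Toy data (assumed for illustration): two clusters plus one mislabeled point.
toy_data = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"), ((0.1, 0.1), "a"),
    ((0.05, 0.05), "b"),  # label "b" inside the "a" cluster: treated as noise
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((1.0, 1.1), "b"), ((1.1, 1.1), "b"),
]

cleaned = enn(toy_data)
cleaned_repeated = renn(toy_data)
```

In the workflow the abstract describes, a filtered set such as `cleaned` would then be passed to the decision tree learner in place of the original training data.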


Similar books and articles

Generation of simple decision trees based on symbiotic evolution. 志村 正道 大谷 紀子 - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:399-404.
Combating discrimination using Bayesian networks. Koray Mancuhan & Chris Clifton - 2014 - Artificial Intelligence and Law 22 (2):211-238.
Decision trees using region partitioning by composite attributes: Dtmacc. Inazumi Hiroshige Kushi Yusuke - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:44-52.

