Study of Overfitting by Machine Learning Methods Using Generalization Equations

by Nageswara S Rao

Publication Type

Conference Paper

Book Title

Proceedings of International Conference on Information Fusion

Publication Date

June 2023

Publisher Location

United States of America

Conference Name

International Conference on Information Fusion

Conference Location

Charleston, South Carolina, United States of America

Conference Sponsor

ISIF

Conference Date

Jun 26, 2023 - Jun 30, 2023

Abstract

The training error of Machine Learning (ML) methods has been extensively used for performance assessment, and its low values have been used as a main justification for complex methods such as estimator fusion, ensembles, and hyperparameter tuning. We present two practical cases where independent tests indicate that the low training error reflects overfitting rather than generalization ability. We derive a generic form of the generalization equations that separates the training error terms of ML methods from their epistemic terms, which correspond to approximation and learnability properties. This form provides a framework to account for both terms separately and thereby ensure high overall generalization performance. For regression estimation tasks, we derive conditions under which hyperparameter tuning, and fusion and ensemble methods, achieve performance enhancements over their constituent methods. We present experimental measurements and ML estimates that illustrate the analytical results for throughput profile estimation of a data transport infrastructure.
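The abstract's central observation, that a low training error can coexist with a much higher error on independent tests, can be illustrated with a minimal sketch. This is not the paper's method or data: the ground-truth function `true_fn`, the noise level, the sample sizes, and the memorizing 1-nearest-neighbor estimator are all assumptions chosen only to make the effect visible on synthetic data.

```python
import random

def true_fn(x):
    # Hypothetical ground-truth profile, assumed purely for illustration.
    return 10.0 / (1.0 + x)

def make_data(n, rng, noise=1.0):
    # Draw n noisy samples of true_fn on [0, 10].
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def one_nn_predict(train_x, train_y, x):
    # Memorizing estimator: return the label of the nearest training point.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def mse(preds, ys):
    # Mean squared error between predictions and observed values.
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(0)                  # fixed seed for repeatability
train_x, train_y = make_data(50, rng)   # training sample
test_x, test_y = make_data(500, rng)    # independent test sample

train_err = mse([one_nn_predict(train_x, train_y, x) for x in train_x], train_y)
test_err = mse([one_nn_predict(train_x, train_y, x) for x in test_x], test_y)
gap = test_err - train_err              # empirical generalization gap

print(f"training MSE         = {train_err:.3f}")
print(f"independent-test MSE = {test_err:.3f}")
print(f"gap                  = {gap:.3f}")
```

Because the estimator reproduces every training label, its training error is zero by construction, so that number says nothing about how well it predicts new measurements. Only the independent test error does, which is the distinction the abstract draws.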
