Randomized Sampling for Large Data Applications of SVM

by Erik M Ferragut, Jason A Laska

Publication Type

Conference Paper

Publication Date

December, 2012

Conference Name

International Conference on Machine Learning and Applications

Conference Location

Boca Raton, Florida, United States of America

Conference Sponsor

IEEE

Conference Date

Dec 12, 2012 - Dec 15, 2012

Abstract

A trend in machine learning is the application of existing algorithms to ever-larger datasets. Support Vector Machines (SVM) have been shown to be very effective but have been difficult to scale to large-data problems. Some approaches have sought to scale SVM training by approximating and parallelizing the underlying quadratic optimization problem. This paper pursues a different approach. Our algorithm, which we call Sampled SVM, builds a new SVM training algorithm on top of an existing one, using randomized data sampling to extend SVMs to large-data applications. Experiments on several datasets show that our method is faster than, and comparably accurate to, both the original SVM algorithm it is based on and the Cascade SVM, the leading data organization approach for SVMs in the literature. Further, we show that our approach is more amenable to parallelization than Cascade SVM.
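The abstract does not spell out the steps of Sampled SVM, so the following is only a rough, hypothetical illustration of the general idea it describes: wrapping an existing SVM trainer with randomized data sampling. It is not the authors' algorithm. Several assumptions are made here. The "existing" base trainer is a swapped-in linear SVM trained with Pegasos subgradient steps. The wrapper starts from a random sample of the data, retrains on that working set, and then adds a random batch of margin violators (points with y·f(x) < 1) drawn from outside the working set. All names (`pegasos_train`, `sampled_svm`, `make_blobs`, `sample_size`, `max_rounds`) are hypothetical.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def decision(w, x):
    # Linear decision value f(x); the last weight acts as a bias term.
    return dot(w, list(x) + [1.0])

def pegasos_train(X, y, lam=0.01, iters=2000, rng=None):
    """Hypothetical stand-in for an 'existing' SVM trainer: a linear SVM
    fitted with Pegasos subgradient steps (the bias is regularized too)."""
    rng = rng or random.Random(0)
    Xb = [list(x) + [1.0] for x in X]  # append constant feature for the bias
    w = [0.0] * len(Xb[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)
        margin = y[i] * dot(w, Xb[i])   # hinge condition uses the current w
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def sampled_svm(X, y, base_train, sample_size=20, max_rounds=10, seed=0):
    """Hypothetical sampling wrapper: train the base trainer on a random
    working set, then grow it with randomly chosen margin violators."""
    rng = random.Random(seed)
    n = len(X)
    working = set(rng.sample(range(n), min(sample_size, n)))
    w = None
    for _ in range(max_rounds):
        idx = sorted(working)
        w = base_train([X[i] for i in idx], [y[i] for i in idx], rng=rng)
        # Points outside the working set that violate y * f(x) >= 1.
        violators = [i for i in range(n)
                     if i not in working and y[i] * decision(w, X[i]) < 1.0]
        if not violators:
            break
        working.update(rng.sample(violators, min(sample_size, len(violators))))
    return w, working

def make_blobs(n_per_class, seed=1):
    # Two well-separated Gaussian clusters labelled +1 and -1 (toy data).
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
            y.append(label)
    return X, y

X, y = make_blobs(100)
w, working = sampled_svm(X, y, pegasos_train)
acc = sum(1 for xi, yi in zip(X, y) if yi * decision(w, xi) > 0) / len(X)
print(f"accuracy={acc:.2f}, trained on {len(working)} of {len(X)} points")
```

The base trainer is passed in as a function, which mirrors the abstract's point that the new training algorithm is built from an existing one; any trainer with the same call shape could be substituted in this sketch.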
