Skip to main content
SHARE
Publication

Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases...

by Luna Xu, Seung-hwan Lim, Min Li, Ali Butt, Ramakrishnan Kannan
Publication Type
Conference Paper
Journal Name
IEEE International Conference on Big Data
Book Title
2017 IEEE International Conference on Big Data (Big Data)
Publication Date
Page Numbers
273 to 282
Conference Name
2017 IEEE International Conference on Big Data (Big Data 2017)
Conference Location
Boston, Massachusetts, United States of America
Conference Sponsor
IEEE
Conference Date
-

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are key to supporting large scale data analysis that require efficient processing over millions of data samples. To this end, we present, ARION, a hardware acceleration based approach for scaling-up individual tasks of Spark, a popular data-parallel analytics platform. We support both linear algebraic operations of between two dense matrices, and between sparse and dense matrices in distributed environments. ARION provides a flexible control of acceleration according to matrix density, along with efficient scheduling based on runtime resource utilization. We demonstrate the benefit of our approach for general matrix multiplication operations over large matrices with up to four billion elements by using Gramian matrix computation that is commonly used in machine learning. Experiments show that our approach achieves more than 2× and 1.5× end-to-end performance speedups for dense and sparse matrices, respectively, and up to 57.04× faster computation compared to MLlib, a state of the art Spark-based implementation. This work is sponsored in part by the NSF under the grants: CNS-1565314, CNS-1405697, and CNS-1615411. The manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.