Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures

by Yongseok P Soh, Ramakrishnan Kannan, Piyush K Sao, Jee Choi
Publication Type: Conference Paper
Book Title: ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
Publication Date:
Page Numbers: 107–116
Volume: 22
Publisher Location: New York, New York, United States of America
Conference Name: ICPP24: The 53rd International Conference on Parallel Processing
Conference Location: Gotland, Sweden
Conference Sponsor: ACM
Conference Date: -

This study presents the first constrained sparse tensor factorization (cSTF) framework that optimizes and fully offloads computation to massively parallel GPU architectures, along with the first performance characterization of cSTF on GPU architectures. In contrast to prior work on tensor factorization, where the matricized tensor times Khatri-Rao product (MTTKRP) is the primary performance bottleneck, our systematic analysis of the cSTF algorithm on GPUs reveals that adding constraints creates an additional bottleneck in the update operation for many real-world sparse tensors. While executing the update operation on the GPU yields a significant speedup over its CPU counterpart, it remains a major bottleneck. To further accelerate the update operation, we propose cuADMM, a new update algorithm that leverages algorithmic and code optimization strategies to minimize both computation and data movement on GPUs. As a result, our framework delivers significantly improved performance over the prior state of the art. On 10 real-world sparse tensors, our framework achieves geometric mean speedups of 5.1× (max 41.59×) and 7.01× (max 58.05×) on the NVIDIA A100 and H100 GPUs, respectively, over the state-of-the-art SPLATT library running on a 26-core Intel Ice Lake Xeon CPU.
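
For context on the two kernels the abstract refers to, the sketch below illustrates, in plain NumPy, a mode-0 MTTKRP over a sparse tensor in COO format and an ADMM-style nonnegativity-constrained factor update of the kind a cSTF solver alternates between. This is a minimal sequential sketch for illustration only, not the paper's GPU implementation or its cuADMM algorithm; the function names, the COO layout, and the step-size heuristic rho = trace(G)/R are assumptions made for this example.

import numpy as np

def mttkrp_mode0(coords, vals, B, C, num_rows):
    # Mode-0 MTTKRP for a 3-way sparse tensor in COO form:
    # for each nonzero x at (i, j, k), M[i, :] += x * (B[j, :] * C[k, :]).
    rank = B.shape[1]
    M = np.zeros((num_rows, rank))
    for (i, j, k), x in zip(coords, vals):
        M[i, :] += x * (B[j, :] * C[k, :])
    return M

def admm_nonneg_update(A, M, B, C, num_inner=10):
    # ADMM-style nonnegativity-constrained update of factor A, given the
    # MTTKRP result M and the other two factors B and C (illustrative only).
    rank = A.shape[1]
    G = (B.T @ B) * (C.T @ C)            # Hadamard product of Gram matrices
    rho = np.trace(G) / rank             # step-size heuristic (assumption)
    lhs = G + rho * np.eye(rank)         # reused across inner iterations
    U = np.zeros_like(A)                 # scaled dual variable
    for _ in range(num_inner):
        A_aux = np.linalg.solve(lhs, (M + rho * (A + U)).T).T  # least-squares step
        A = np.maximum(A_aux - U, 0.0)                         # projection (proximal) step
        U = U + A - A_aux                                      # dual update
    return A

# Tiny usage example on a random sparse 3-way tensor (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I, J, K, R, nnz = 50, 40, 30, 8, 500
    coords = np.stack([rng.integers(0, d, nnz) for d in (I, J, K)], axis=1)
    vals = rng.random(nnz)
    A, B, C = (rng.random((d, R)) for d in (I, J, K))
    M = mttkrp_mode0(coords, vals, B, C, I)
    A = admm_nonneg_update(A, M, B, C)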