
Design and analysis of CXL performance models for tightly-coupled heterogeneous computing...

by Anthony M Cabrera, Aaron R Young, Jeffrey S Vetter
Publication Type
Conference Paper
Journal Name
International Workshop on Extreme Heterogeneity Solutions
Book Title
ExHET '22: Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions
Publication Date
Page Numbers
1 to 6
Issue
1
Publisher Location
New York, United States of America
Conference Name
International Workshop on Extreme Heterogeneity Solutions
Conference Location
Seoul, South Korea
Conference Sponsor
ACM
Conference Date

Truly heterogeneous systems allow each part of a partitioned workload to be mapped to the hardware that delivers the best performance for it. However, current practice requires that communication between devices from different vendors be staged through host memory. To date, there is no widely adopted solution that lets accelerators transfer data to one another directly. Compute Express Link (CXL), a new cache-coherent interconnect protocol, aims to enable easier, fine-grained data sharing between accelerators. In this work, we analyze existing methods for designing heterogeneous applications in which GPUs and FPGAs work collaboratively, and then explore the benefits of a CXL-enabled system. Specifically, we develop a test application that uses both an NVIDIA P100 GPU and a Xilinx U250 FPGA to demonstrate current communication limitations. From this application, we capture overall execution time as well as throughput measurements on the FPGA and GPU. We use these measurements as inputs to novel CXL performance models, which show that using CXL caching instead of host memory results in a 1.31X speedup, while a more tightly-coupled, pipelined implementation on CXL-enabled hardware would result in a 1.45X speedup.
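One way to reason about the comparison described above is with a simple analytical model. Staging data through host memory adds a copy to every batch, a cheaper direct transfer shrinks that cost, and pipelining overlaps stages so that, after the first batch, throughput is set by the slowest stage. The Python sketch below illustrates that style of model under stated assumptions. It is not the authors' model: the `StageTimes` fields, the helper functions, and every number are hypothetical, and the speedups it produces are not the paper's reported figures.

```python
from dataclasses import dataclass


@dataclass
class StageTimes:
    """Hypothetical per-batch stage times in seconds (placeholders, not measurements)."""
    gpu_compute: float        # GPU kernel time for one batch
    fpga_compute: float       # FPGA kernel time for one batch
    host_staged_copy: float   # assumed GPU -> host memory -> FPGA copy time per batch
    cxl_transfer: float       # assumed cost of direct coherent sharing via CXL


def serial_time(t: StageTimes, transfer: float, batches: int) -> float:
    """Every batch runs GPU compute, the transfer, then FPGA compute back to back."""
    return batches * (t.gpu_compute + transfer + t.fpga_compute)


def pipelined_time(t: StageTimes, transfer: float, batches: int) -> float:
    """Standard pipeline estimate: the first batch pays the full latency,
    then one batch completes per interval of the slowest (bottleneck) stage."""
    stages = (t.gpu_compute, transfer, t.fpga_compute)
    return sum(stages) + (batches - 1) * max(stages)


def speedup(baseline: float, improved: float) -> float:
    """Ratio of baseline time to improved time."""
    return baseline / improved


if __name__ == "__main__":
    # Made-up inputs chosen only to exercise the model (hypothetical values).
    t = StageTimes(gpu_compute=2.0, fpga_compute=3.0,
                   host_staged_copy=2.5, cxl_transfer=0.5)
    n = 10
    host = serial_time(t, t.host_staged_copy, n)      # 75.0
    cxl = serial_time(t, t.cxl_transfer, n)           # 55.0
    cxl_pipe = pipelined_time(t, t.cxl_transfer, n)   # 32.5
    print(f"serial CXL vs host staging:    {speedup(host, cxl):.2f}x")
    print(f"pipelined CXL vs host staging: {speedup(host, cxl_pipe):.2f}x")
```

In a real study, the placeholder values would be replaced with measured kernel throughputs and transfer times; with a single batch the pipelined and serial estimates coincide, since there is nothing to overlap.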