Abstract
Toxicogenomics studies gene and protein activity in response to drug treatments or toxic exposures. Because both drugs and genes are numerous, toxicogenomics data are naturally high dimensional, with dimensionality reaching into the millions. In addition, the distribution of toxicogenomics data is often highly skewed, and the data contain rare but important signals that represent a cell's or organism's response to toxicity. The combination of high dimensionality and extreme skew makes clustering analysis especially challenging. We present a study of clustering toxicogenomics data using classical approaches such as principal component analysis as well as deep learning approaches such as autoencoders. Our experiments show that these approaches fail both to preserve rare signals and to produce high-quality clusters. We then explore augmenting matrix factorization with deep learning techniques such as attention mechanisms to produce latent representations for clustering; this technique preserves rare signals after dimensionality reduction better than prior approaches. Furthermore, we combine our augmented matrix factorization with an autoencoder-like mechanism to balance cluster separability against reconstruction error. Our experiments demonstrate that the proposed approach yields better clusters.