Efficient graph representation framework for chemical molecule similarity tasks

by Jiaji Ma, Seung-hwan Lim

Publication Type

Conference Paper

Book Title

2023 IEEE Sixth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)

Publication Date

January, 2024

Page Numbers

113 to 120

Publisher Location

New Jersey, United States of America

Conference Name

2023 IEEE Sixth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)

Conference Location

Laguna Hills, California, United States of America

Conference Sponsor

IEEE

Conference Date

Sep 25, 2023 - Sep 27, 2023

Abstract

Graph data has emerged in numerous scientific domains, and machine learning techniques are widely used to analyze and learn from such data for prediction and decision making. Machine learning techniques can address complex problems by leveraging the structural information in graphs. However, graphs cannot be used directly by existing machine learning algorithms unless they are encoded as vectors, so the efficient representation of graphs is a substantial challenge in graph machine learning. In this paper, we propose a novel two-stage framework for representing chemical molecule graphs that builds on the strengths of Graph Isomorphism Networks (GINs) and Siamese autoencoders. In the first stage, a GIN model is constructed and trained on the structural information of chemical molecule graphs. Node attributes, edge attributes, and edge indices are used as input data, while graph attributes are used as labels. The GIN model captures the structural characteristics of graphs and can accurately predict graph attributes, i.e., molecular properties. It also generates graph embeddings: vectors that encode the structural information of graphs. In the second stage, the graph embedding vectors are further optimized for downstream similarity tasks while preserving the graph structural information. A Siamese autoencoder is constructed and trained to reduce the dimensionality of the graph embedding vectors while maximizing the preservation of the structural information in the original high-dimensional vectors. The resulting low-dimensional graph embeddings can be used for tasks such as approximate nearest neighbor search. The experimental results demonstrate the effectiveness of our proposed framework in accurately predicting graph similarity.
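To make the first stage more concrete, the following is a minimal sketch of a GIN-style sum aggregation followed by a sum readout, which turns a small molecule graph into a fixed-length vector. It is an illustration only and not the authors' model: the function names (`gin_layer`, `graph_embedding`, `cosine_similarity`), the two-dimensional one-hot atom features, and the weight matrices are all assumptions chosen for readability, the weights are arbitrary rather than trained, edge attributes are ignored, and a single linear map with ReLU stands in for the MLP of a full GIN layer.

```python
import math


def gin_layer(features, edges, weight, eps=0.0):
    """One GIN-style update: ReLU(W * ((1 + eps) * h_v + sum of neighbor h_u))."""
    dim = len(features[0])
    # Start each node's aggregate from its own (scaled) features.
    agg = [[(1.0 + eps) * x for x in row] for row in features]
    # Add neighbor features; edges are undirected, so both endpoints receive a message.
    for u, v in edges:
        for k in range(dim):
            agg[u][k] += features[v][k]
            agg[v][k] += features[u][k]
    # Linear map plus ReLU (assumption: stands in for the per-layer MLP).
    return [
        [max(0.0, sum(w * x for w, x in zip(w_row, row))) for w_row in weight]
        for row in agg
    ]


def graph_embedding(features, edges, weights):
    """Apply the layers in order, then sum node vectors into one graph vector."""
    h = features
    for weight in weights:
        h = gin_layer(h, edges, weight)
    return [sum(row[k] for row in h) for k in range(len(h[0]))]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical, untrained weights for two layers (illustration only).
WEIGHTS = [
    [[1.0, 0.5], [0.25, 1.0]],
    [[1.0, -0.5], [0.5, 1.0]],
]

C, O = [1.0, 0.0], [0.0, 1.0]  # one-hot atom types (assumed feature encoding)
PATH = [(0, 1), (1, 2)]        # three heavy atoms in a chain

emb_a = graph_embedding([C, C, O], PATH, WEIGHTS)  # C-C-O
emb_b = graph_embedding([O, C, C], PATH, WEIGHTS)  # same molecule, nodes relabeled
emb_c = graph_embedding([C, O, C], PATH, WEIGHTS)  # C-O-C, a different structure

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
```

Because both the neighbor aggregation and the readout are sums, relabeling the nodes of the same molecule leaves the embedding unchanged, while a structurally different molecule with the same atoms maps to a different vector. In the framework described above, vectors of this kind would then be compressed by the Siamese autoencoder before similarity search; that stage is not shown here.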
