Publication

Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targeted Object Removal from Images

by Arka Daw, Megan Hong-thanh Chung, Maria Mahbub, Amir Sadovnik
Publication Type
Conference Paper
Book Title
AdvML-Frontiers 2024: The Third Workshop on New Frontiers in Adversarial Machine Learning @ NeurIPS'24
Publication Date
Page Numbers
1 to 4
Publisher Location
Vancouver, Canada
Conference Name
NeurIPS 2024: Annual Conference on Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Sponsor
N/A
Conference Date
-
Machine learning models are known to be vulnerable to adversarial attacks, but prior work has mostly focused on single modalities. With the rise of large multi-modal models (LMMs) like CLIP, which combine vision and language capabilities, new vulnerabilities have emerged. However, existing targeted attacks on multimodal models aim to completely change the model's output to whatever the adversary wants. In many realistic scenarios, an adversary might instead seek to make only subtle modifications to the output, so that the changes go unnoticed by downstream models or even by humans. We introduce Hiding-in-Plain-Sight (HiPS) attacks, a novel class of adversarial attacks that subtly modify model predictions by selectively concealing one or more target objects, as if they were absent from the scene. We propose two HiPS attack variants, HiPS-cls and HiPS-cap, and demonstrate their effectiveness in transferring to downstream image captioning models, such as CLIP-Cap, for targeted object removal from image captions.
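
To make the general idea concrete, the sketch below shows a generic PGD-style attack on CLIP's image encoder that lowers the image embedding's similarity to a target-object prompt while raising its similarity to a prompt describing the same scene without that object. This is an illustrative approximation only, not the paper's HiPS-cls or HiPS-cap objective: the prompts ("a photo of a dog", "a photo of a park"), the input file name scene.jpg, the loss form, and the attack budget (eps, alpha, steps) are all assumptions for demonstration, and the code relies on OpenAI's clip package.

```python
# Illustrative PGD-style sketch (not the authors' released HiPS code).
# Goal: perturb the image so CLIP's embedding moves away from the target-object
# prompt and toward a description of the same scene without that object.
import torch
import torchvision.transforms as T
import clip  # OpenAI's CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float().eval()  # fp32 weights for stable gradients

# Keep the image in [0, 1] pixel space so the L_inf budget is meaningful,
# and apply CLIP's normalization inside the attack loop.
to_tensor = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor()])
normalize = T.Normalize((0.48145466, 0.4578275, 0.40821073),
                        (0.26862954, 0.26130258, 0.27577711))
image = to_tensor(Image.open("scene.jpg").convert("RGB")).unsqueeze(0).to(device)  # assumed input

target_text = clip.tokenize(["a photo of a dog"]).to(device)   # object to conceal (assumed prompt)
benign_text = clip.tokenize(["a photo of a park"]).to(device)  # scene without the object (assumed)

with torch.no_grad():
    t_target = model.encode_text(target_text)
    t_benign = model.encode_text(benign_text)
    t_target = t_target / t_target.norm(dim=-1, keepdim=True)
    t_benign = t_benign / t_benign.norm(dim=-1, keepdim=True)

eps, alpha, steps = 8 / 255, 1 / 255, 50  # illustrative L_inf budget, step size, iterations
delta = torch.zeros_like(image, requires_grad=True)

for _ in range(steps):
    img_emb = model.encode_image(normalize(image + delta))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    # Lower similarity to the target object, raise similarity to the benign description.
    loss = (img_emb @ t_target.T).mean() - (img_emb @ t_benign.T).mean()
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()                 # descend on the loss
        delta.clamp_(-eps, eps)                            # project back into the budget
        delta.data = (image + delta).clamp(0, 1) - image   # keep adversarial pixels valid
    delta.grad.zero_()

adv_image = (image + delta).detach()  # hand to a downstream captioner such as CLIP-Cap
```

Under these assumptions, the resulting adv_image can be passed to a downstream captioning model such as CLIP-Cap to check whether the target object disappears from the generated caption, which is the transfer setting the paper evaluates.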