General Unlearning Strategy for Graph Neural Networks

We consider the problem of graph unlearning, in which a graph neural network (GNN) is trained to a specified accuracy and deployed, after which a sequence of requests arrives to delete graph elements (nodes, edges) from the model. As GNNs see growing real-world use, this problem is increasingly vital to address: for example, a user may want to hide their connections with others in a social graph, or relationships in a knowledge graph may become irrelevant or no longer hold.

To unlearn information from a trained GNN, its influence on both the GNN model weights and the representations of neighboring nodes in the graph must be removed. However, existing methods based on retraining or weight modification either degrade model weights shared across all nodes or are ineffective because deleted edges depend strongly on their local graph neighborhood. Recognizing these pitfalls, we formalize the required properties for graph unlearning as Deleted Edge Consistency and Neighborhood Influence and develop GNNDelete, a model-agnostic, layer-wise deletion operator that optimizes both properties for unlearning tasks.

GNNDelete updates latent representations to delete nodes and edges from the model while keeping the rest of the learned knowledge intact. Experiments on six real-world graphs and two knowledge graphs show that GNNDelete outperforms existing graph unlearning models by up to 36.9% in AUC on link prediction and by up to 22.5% in AUC on distinguishing deleted edges from non-deleted edges. GNNDelete is also efficient: for example, it takes 12.3x less time and uses 9.3x less space than retraining from scratch on a large knowledge graph.

Graph neural networks (GNNs) are increasingly used in real-world applications, and in most deployed GNNs the underlying graphs evolve over time. Traditional machine learning often works offline: a model is trained once on the full training dataset and then locked for inference, with few if any updates. In contrast, online training can update the model as new training data points become available. However, neither offline nor online learning can handle data deletion, the task of removing all traces of a data point from a model without sacrificing model performance. When data needs to be deleted from a model, the model must be updated accordingly. For example, GNNs must implement privacy provisions protecting personal data (e.g., the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR)), so endowing GNNs with data deletion capability is important, yet it remains sparsely studied in the literature.

Designing graph unlearning methods is nonetheless challenging. Removing data alone is insufficient to comply with recent demands for increased data privacy, because models trained on the original data may still contain information about its patterns and features. A naive approach is to delete the data and retrain the model from scratch, but this can be prohibitively expensive on large datasets.

We introduce GNNDelete, a general approach for graph unlearning. We formalize two key GNN deletion properties:

  • Deleted Edge Consistency: the predicted probabilities of deleted edges under the unlearned model should be similar to those of nonexistent edges. This property forces GNNDelete to unlearn information such that deleted edges appear as unconnected node pairs.
  • Neighborhood Influence: we establish a connection between graph unlearning and Granger causality to ensure that local subgraphs remain unaffected after deletion and thus maintain their original predictive dependencies. Existing graph unlearning methods do not consider this essential property, disregarding the influence of local connectivity and leading to suboptimal deletion.
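The two properties above can be expressed as simple loss terms. The sketch below is illustrative only: the function names, dot-product scoring, and random-pair sampling scheme are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch of the two GNNDelete properties as loss terms.
# Names (edge_score, deleted_edge_consistency_loss, etc.) are illustrative.
import random

def edge_score(z_u, z_v):
    """Dot-product link score between two node embeddings (lists of floats)."""
    return sum(a * b for a, b in zip(z_u, z_v))

def deleted_edge_consistency_loss(emb, deleted_edges, num_samples=100, seed=0):
    """Deleted Edge Consistency: scores of deleted edges should match the
    typical score of randomly sampled node pairs (likely nonexistent edges)."""
    rng = random.Random(seed)
    nodes = list(emb)
    loss = 0.0
    for u, v in deleted_edges:
        rand_scores = [
            edge_score(emb[rng.choice(nodes)], emb[rng.choice(nodes)])
            for _ in range(num_samples)
        ]
        target = sum(rand_scores) / num_samples
        loss += (edge_score(emb[u], emb[v]) - target) ** 2
    return loss / max(len(deleted_edges), 1)

def neighborhood_influence_loss(emb_after, emb_before, neighborhood):
    """Neighborhood Influence: representations of nodes in the local subgraph
    around a deleted edge should stay close to their pre-deletion values."""
    loss = 0.0
    for n in neighborhood:
        loss += sum((a - b) ** 2 for a, b in zip(emb_after[n], emb_before[n]))
    return loss / max(len(neighborhood), 1)
```

In this sketch, the first term pushes deleted edges toward the score distribution of unconnected pairs, while the second penalizes drift in the representations of untouched neighboring nodes.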

Using both properties, we develop GNNDelete, a layer-wise deletion operator that updates node representations. Upon receiving a deletion request, GNNDelete freezes the model and learns small, gated weight matrices that are shared across all nodes. Unlike existing methods that retrain several small models from scratch or directly update model weights, which can be inefficient and suboptimal, GNNDelete applies its small learnable matrices at inference time without changing the GNN model weights, achieving both efficiency and scalability. To optimize GNNDelete, we specify a novel objective function that satisfies Deleted Edge Consistency and Neighborhood Influence, yielding strong overall deletion.
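The mechanics of the deletion operator can be sketched as follows. This is a minimal illustration under our own assumptions: the class name `DeletionOperator`, the per-layer matrix `W_d`, and the `affected_nodes` argument are ours, not the paper's API, and the gating is simplified to a single learnable matrix initialized to the identity.

```python
# Illustrative sketch of a GNNDelete-style layer-wise deletion operator:
# the pretrained GNN weights stay frozen, and only a small extra matrix
# per layer is learned and applied to nodes affected by the deletion
# (e.g., the k-hop neighborhood of a deleted edge). All names are ours.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class DeletionOperator:
    def __init__(self, dim):
        # Learnable per-layer matrix, initialized to the identity so that
        # before any training the operator leaves representations unchanged.
        self.W_d = [[1.0 if i == j else 0.0 for j in range(dim)]
                    for i in range(dim)]

    def __call__(self, h, affected_nodes):
        # Transform only the nodes affected by the deletion request; all
        # other representations (and the frozen GNN weights) stay intact.
        return {
            v: matvec(self.W_d, h_v) if v in affected_nodes else h_v
            for v, h_v in h.items()
        }
```

In practice, `W_d` would be trained with the objective combining Deleted Edge Consistency and Neighborhood Influence; the identity initialization reflects the design choice that an untrained operator should be a no-op.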

Publication

GNNDelete: A General Unlearning Strategy for Graph Neural Networks
Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik
International Conference on Learning Representations, ICLR 2023

@inproceedings{cheng2023gnndelete,
  title     = {GNNDelete: A General Unlearning Strategy for Graph Neural Networks},
  author    = {Cheng, Jiali and Dasoulas, George and He, Huan and Agarwal, Chirag and Zitnik, Marinka},
  booktitle = {International Conference on Learning Representations, ICLR},
  year      = {2023}
}

Code

A PyTorch implementation, together with documentation and usage examples, is available in the GitHub repository.

Latest News

Jan 2023:   GNNDelete at ICLR 2023

Jan 2023:   New Network Principle for Molecular Phenotypes

Dec 2022:   Can we shorten rare disease diagnostic odyssey?

New preprint! Geometric deep learning for diagnosing patients with rare genetic diseases. Implications for using deep learning on sparsely-labeled medical datasets. Thankful for this collaboration with Zak Lab. Project website.

Nov 2022:   Can AI transform the way we discover new drugs?

Our conversation with Harvard Medicine News highlights recent developments and new features in Therapeutics Data Commons.

Oct 2022:   New Paper in Nature Biomedical Engineering

New paper on graph representation learning in biomedicine and healthcare published in Nature Biomedical Engineering.

Sep 2022:   New Paper in Nature Chemical Biology

Our paper on artificial intelligence foundation for therapeutic science is published in Nature Chemical Biology.

Sep 2022:   Self-Supervised Pre-Training at NeurIPS 2022

New paper on self-supervised contrastive pre-training accepted at NeurIPS 2022. Project page. Thankful for this collaboration with the Lincoln National Laboratory.

Sep 2022:   Best Paper Honorable Mention Award at IEEE VIS

Our paper on user-centric AI for drug repurposing received the Best Paper Honorable Mention Award at IEEE VIS 2022. Thankful for this collaboration with Gehlenborg Lab.

Sep 2022:   Multimodal Representation Learning with Graphs

Aug 2022:   On Graph AI for Precision Medicine

The recording of our tutorial on using graph AI to advance precision medicine is available. Tune into four hours of interactive lectures about state-of-the-art graph AI methods and applications in precision medicine.

Aug 2022:   Evaluating Explainability for GNNs

New preprint! We introduce a resource for broad evaluation of the quality and reliability of GNN explanations, addressing challenges and providing solutions for GNN explainability. Project website.

Jul 2022:   New Frontiers in Graph Learning at NeurIPS

Excited to organize the New Frontiers in Graph Learning workshop at NeurIPS.

Jul 2022:   AI4Science at NeurIPS

We are excited to host the AI4Science meeting at NeurIPS discussing AI-driven scientific discovery, implementation and verification of AI in science, the influence AI has on the conduct of science, and more.

Jul 2022:   Graph AI for Precision Medicine at ISMB

Jul 2022:   Welcoming Fellows and Summer Students

Welcoming research fellow Julia Balla and three summer students, Nicholas Ho, Satvik Tripathi, and Isuru Herath.

Jun 2022:   Broadly Generalizable Pre-Training Approach

Excited to share a preprint on a self-supervised method for pre-training. The project website includes evaluations on eight datasets, including electrodiagnostic testing, human daily activity recognition, and health state monitoring.

Jun 2022:   Welcoming New Postdocs

Excited to welcome George Dasoulas and Huan He, new postdocs joining us this Summer.

May 2022:   George Named the 2022 Wojcicki Troper Fellow

May 2022:   New preprint on PrimeKG

New preprint on building knowledge graphs to enable precision medicine applications.

May 2022:   Building KGs to Support Precision Medicine

Zitnik Lab  ·  Artificial Intelligence in Medicine and Science  ·  Harvard  ·  Department of Biomedical Informatics