Evaluating Explainability for Graph Neural Networks

GraphXAI is a resource to systematically evaluate and benchmark the quality of GNN explanations. A key component is a novel and flexible synthetic dataset generator called ShapeGGen that can automatically generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) together with ground-truth explanations that are robust to known pitfalls of explainable algorithms.

As graph AI models are increasingly used in high-stakes applications, it becomes essential to ensure that the relevant stakeholders can understand and trust their functionality. Only if the stakeholders clearly understand the behavior of these models, they can evaluate when and how much to rely on these models, and detect potential biases or errors in them. To this end, several approaches have been proposed to explain the predictions of GNNs. Based on the techniques they employ, these approaches can be broadly categorized into perturbation-based, gradient-based, and surrogate-based models.

To ensure that GNN explanations are reliable, it is important to correctly evaluate their quality. However, evaluating the quality of GNN explanations is a rather nascent research area with relatively little work. The approaches proposed thus far mainly leverage ground-truth explanations associated with specific datasets. However, this strategy is prone to several pitfalls:

  • For instance, there could be multiple underlying rationales (redundant/non-unique explanations) that could generate the true class labels and a given ground-truth explanation may only capture one of those, but the GNN model trained on the data may be relying on an entirely different rationale. In such a case, evaluating the explanation output by a state-of-the-art method using the ground-truth explanation is incorrect because the underlying GNN model itself is not relying on that ground-truth explanation.

  • In addition, even if there is a unique ground-truth explanation which generates the true class labels, the GNN model trained on the data could be a weak predictor which uses an entirely different rationale for making predictions. Post hoc explanations of such a model should not be evaluated based on the ground-truth explanation either.

  • Lastly, the ground-truth explanations corresponding to some of the existing benchmark datasets can be recovered using trivial baselines (e.g., random node or edge as explanation), and such datasets are not good candidates for reliably evaluating explanation quality.

Overview of GraphXAI

GraphXAI is a resource for systematic benchmarking and evaluation of GNN explainability methods. The process to evaluate explanation methods is to choose a graph problem and a GNN architecture to train, then train the GNN model and use a GNN explainer on its predictions to generate explanations. Finally, we compare explanations with a problem-given ground truth to provide a performance score for the GNN explainer. To this end, GraphXAI provides the following:

  • Dataset generator D} that can generate diverse types of graphs G, including homophilic, heterophilic, and attributed graphs suitable for the study of graph explainability. Prevailing benchmark datasets are designed for benchmarking GNN predictors and typically consist of a graph or a set of graphs and associated ground-truth label information. While these datasets are sufficient for studying GNN predictors, they cannot be readily used for studying GNN explainers because they lack a critical component, namely information on ground-truth explanations. GraphXAI addresses this critical gap by providing the SHapeGraph generator to create graphs with ground-truth explanations that are uniquely suited for studying GNN explainers.

  • GNN predictor f that is a user-specified GNN model trained on a dataset produced by D and optimized to predict labels for a particular downstream task.

  • GNN explanation method(s) O that takes a prediction f(u) and returns an explanation M(u) = O(f, u) for it.

  • Explanation quality metrics P such that each metric takes a set of explanations and evaluates them for correctness relative to ground-truth explanations.

When taken together, GraphXAI provides all the necessary functionality needed to systematically benchmark and evaluate GNN explainability methods. Further, it addresses the above mentioned pitfalls of state-of-the-art evaluation setups for GNN explanation methods.

GraphXAI includes the following:

  • novel generator ShapeGGen to automatically generate diverse types of XAI-ready benchmark datasets, including homophilic, heterophilic, and attributed graphs, each accompanied by ground-truth explanations,

  • graph and explanation functions compatible with deep learning frameworks, such as PyTorch and PyTorch Geometric libraries,

  • training and visualization functions for GNN explainers,

  • utility functions to support the development of new GNN explainers, and

  • comprehensive set of performance metrics to evaluate the correctness of explanations produced by GNN explainers relative to ground-truth explanations.

ShapeGGen Data Generator

ShapeGGen is a generator of XAI-ready graph datasets supported by graph theory and particularly suitable for benchmarking GNN explainers and study their limitations.

ShapeGGen generates graphs by combining subgraphs containing any given motif and additional nodes. The number of motifs in a k-hop neighborhood determines the node label (in the figure, we use a 1-hop neighborhood for labeling, and nodes with two motifs in its 1-hop neighborhood are highlighted in red). Feature explanations are some mask over important node features (green striped), with an option to add a protected feature (shown in purple) whose correlation to node labels is controllable. Node explanations are nodes contained in the motifs (horizontal striped nodes) and edge explanations (bold lines) are edges connecting nodes within motifs.


Evaluating Explainability for Graph Neural Networks
Chirag Agarwal*, Owen Queen*, Himabindu Lakkaraju and Marinka Zitnik
Scientific Data 2023 [arXiv]

* Equal Contribution

  title={Evaluating Explainability for Graph Neural Networks},
  author={Agarwal, Chirag and Queen, Owen and Lakkaraju, Himabindu and Zitnik, Marinka},
  journal={Scientific Data},
  publisher={Nature Publishing Group}


Datasets and Pytorch implementation of GraphXAI are available in the GitHub repository.


Latest News

Mar 2023:   New Paper in Nature Machine Intelligence

New paper with NASA in Nature Machine Intelligence on biomonitoring and precision health in deep space supported by artificial intelligence.

Mar 2023:   New Paper in Nature Machine Intelligence

Mar 2023:   TxGNN - Zero-shot prediction of therapeutic use

Mar 2023:   GraphXAI published in Scientific Data

Feb 2023:   Welcoming New Postdoctoral Fellows

A warm welcome to postdoctoral fellows Ruth Johnson and Wanxiang Shen. We are thrilled to have them joining us soon and look forward to working together.

Feb 2023:   New Preprint on Distribution Shifts

Feb 2023:   PrimeKG published in Scientific Data

Jan 2023:   GNNDelete published at ICLR 2023

Jan 2023:   New Network Principle for Molecular Phenotypes

Dec 2022:   Can we shorten rare disease diagnostic odyssey?

New preprint! Geometric deep learning for diagnosing patients with rare genetic diseases. Implications for using deep learning on sparsely-labeled medical datasets. Thankful for this collaboration with Zak Lab. Project website.

Nov 2022:   Can AI transform the way we discover new drugs?

Our conversation with Harvard Medicine News highlights recent developments and new features in Therapeutics Data Commons.

Oct 2022:   New Paper in Nature Biomedical Engineering

New paper on graph representation learning in biomedicine and healthcare published in Nature Biomedical Engineering.

Sep 2022:   New Paper in Nature Chemical Biology

Our paper on artificial intelligence foundation for therapeutic science is published in Nature Chemical Biology.

Sep 2022:   Self-Supervised Pre-Training at NeurIPS 2022

New paper on self-supervised contrastive pre-training accepted at NeurIPS 2022. Project page. Thankful for this collaboration with the Lincoln National Laboratory.

Sep 2022:   Best Paper Honorable Mention Award at IEEE VIS

Our paper on user-centric AI of drug repurposing received the Best Paper Honorable Mention Award at IEEE VIS 2022. Thankful for this collaboration with Gehlenborg Lab.

Sep 2022:   Multimodal Representation Learning with Graphs

Aug 2022:   On Graph AI for Precision Medicine

The recording of our tutorial on using graph AI to advance precision medicine is available. Tune into four hours of interactive lectures about state-of-the-art graph AI methods and applications in precision medicine.

Aug 2022:   Evaluating Explainability for GNNs

New preprint! We introduce a resource for broad evaluation of the quality and reliability of GNN explanations, addressing challenges and providing solutions for GNN explainability. Project website.

Jul 2022:   New Frontiers in Graph Learning at NeurIPS

Excited to organize the New Frontiers in Graph Learning workshop at NeurIPS.

Jul 2022:   AI4Science at NeurIPS

We are excited to host the AI4Science meeting at NeurIPS discussing AI-driven scientific discovery, implementation and verification of AI in science, the influence AI has on the conduct of science, and more.

Zitnik Lab  ·  Artificial Intelligence in Medicine and Science  ·  Harvard  ·  Department of Biomedical Informatics