| Abstract |
|---|
| Explainable artificial intelligence (XAI) plays a crucial role in mitigating the risks associated with the non-transparency of black-box artificial intelligence (AI) systems. However, despite their advantages, XAI methods have been shown to compromise the privacy of individuals whose data are used to train or query the underlying models. Prior research has demonstrated privacy attacks that exploit explanations to infer sensitive personal information about individuals. At present, there is a lack of effective defenses against such privacy attacks on explanations, particularly when vulnerable XAI techniques are deployed in production environments or offered through machine-learning-as-a-service systems. To address this gap, this study investigates the use of privacy-enhancing technologies (PETs) as a defense mechanism against attribute inference attacks on explanations generated by feature-based XAI methods. We empirically evaluate three types of PETs, namely synthetic training data, differentially private training, and noise addition, across two categories of feature-based XAI methods. Our findings reveal varying levels of effectiveness among the mitigation strategies, as well as trade-offs between privacy, utility, and system performance. In the best scenario, integrating PETs into the explanation process reduced attack success by 49.47% while preserving model utility and explanation quality. Based on our evaluation, we propose strategies for effectively integrating PETs into XAI to maximize privacy protection and minimize the risk of sensitive information leakage. |