| Abstract |
|---|
| Explainable artificial intelligence (XAI) plays a crucial role in mitigating the risks associated with the non-transparency of black-box artificial intelligence (AI) systems. However, despite their advantages, XAI methods have been shown to compromise the privacy of individuals whose data are used to train or query the underlying models. Prior research has demonstrated privacy attacks that exploit explanations to infer sensitive personal information about individuals. At present, there is a lack of effective defenses against such privacy attacks on explanations, particularly when vulnerable XAI techniques are deployed in production environments or offered through machine-learning-as-a-service systems. To address this gap, this study investigates the use of privacy-enhancing technologies (PETs) as a defense mechanism against attribute inference attacks on explanations generated by feature-based XAI methods. We empirically evaluate three types of PETs, namely synthetic training data, differentially private training, and noise addition, across two categories of feature-based XAI methods. Our findings reveal varying levels of effectiveness among the mitigation strategies, as well as trade-offs between privacy, utility, and system performance. In the best scenario, integrating PETs into the explanation process reduced attack success by 49.47% while preserving model utility and explanation quality. Based on our evaluation, we propose strategies for effectively integrating PETs into XAI to maximize privacy protection and minimize the risk of sensitive information leakage. |