Abstract

In federated learning, standard machine learning (ML) techniques are adapted so they can be applied to data held by separate participants without exchanging that data and while preserving privacy. Other data modelling techniques, such as singular value decomposition, have been similarly federated, enabling federated principal component analysis (PCA), a popular preprocessing step for ML tasks. Supervised PCA improves on standard PCA by using labeled data to retain information more relevant to supervised ML problems. However, a federated version of supervised PCA does not exist in the literature. In this paper, we propose a federated version of supervised PCA and its dual and kernel variations, called FeS-PCA, dual FeS-PCA, and FeSK-PCA, respectively. We keep FeS-PCA and dual FeS-PCA private using random orthogonal matrix masking, while FeSK-PCA is kept private using an approximation of the standard approach. We evaluate our proposed approaches by recreating visualization, classification, and regression experiments from the original unfederated supervised PCA paper, and we further add a real-world federated dataset to test the scalability and fidelity of our approach. Our analysis and results indicate that FeS-PCA and dual FeS-PCA are faithful, lossless, and private versions of their unfederated counterparts. Furthermore, despite being an approximation, FeSK-PCA achieves nearly identical performance to standard kernel supervised PCA in many cases, with the added benefits of reduced runtime and a smaller memory footprint.
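The masking protocol itself is not spelled out in the abstract; as a minimal illustrative sketch (not the paper's actual protocol), the following shows why left-multiplying a data matrix by a random orthogonal matrix can hide individual samples while leaving the statistics that PCA depends on unchanged. All variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matrix: 100 samples, 5 features.
X = rng.normal(size=(100, 5))

# Draw a random orthogonal matrix P via QR decomposition of a Gaussian matrix.
P, _ = np.linalg.qr(rng.normal(size=(100, 100)))

# Mask the data: each masked row is a mixture of all original samples,
# so no individual sample is revealed.
X_masked = P @ X

# Because P.T @ P = I, the feature covariance (and hence the PCA solution)
# is unchanged by the masking.
assert np.allclose(X_masked.T @ X_masked, X.T @ X)

# The singular values are preserved as well.
assert np.allclose(np.linalg.svd(X_masked, compute_uv=False),
                   np.linalg.svd(X, compute_uv=False))
```

This invariance under orthogonal transformations is what makes such masking attractive for federated PCA-style methods: participants can share masked data that is useless for reconstructing individual records yet yields exactly the same decomposition.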