Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences
Research Progress

Progress in Homomorphic Encryption-Based Logistic Regression Training at CIGIT

30, 2024

With the rapid development of artificial intelligence technologies, the volume of generated data has risen sharply, bringing significant risks of privacy breaches. Advances in quantum computing challenge traditional cryptographic frameworks, and regions such as the European Union and the United States are actively promoting legislation to protect personal data privacy. In response, technologies such as Secure Multi-Party Computation (MPC), Homomorphic Encryption (HE), and Differential Privacy have become effective tools for safeguarding user privacy.

Logistic regression is a relatively simple algorithm with a straightforward computational process and practical value in real-world applications. Most existing privacy-preserving methods rely on first-order gradient descent, which requires a large number of iterations. In the MPC setting this increases communication overhead, while in the HE setting it raises computational cost. Moreover, some methods rely on a trusted third party, which may raise concerns about privacy breaches, while others based on homomorphic encryption involve only a single participant. In addition, most methods focus on binary classification, with limited discussion of multiclass problems.

Recently, the Center for Automated Reasoning and Cognition at our institute published a paper titled "Privacy-preserving Logistic Regression Model Training Scheme by Homomorphic Encryption" at ICICS 2024. The paper presents a novel privacy-preserving logistic regression scheme in which two users in a horizontally distributed scenario train a model interactively on the data held by both parties. The scheme uses Newton's method to solve the logistic regression problem, which reduces the number of iterations and therefore the communication overhead caused by interaction. The Newton update direction is computed with the conjugate gradient method, which avoids the division operations that matrix inversion would require in the ciphertext domain. The scheme also extends efficiently from binary classification to multiclass settings.
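
The combination of Newton's method and conjugate gradient can be illustrated on unencrypted data. The following is a minimal plaintext sketch under stated assumptions, not the paper's two-party protocol: it contains no encryption and runs on a single machine, and every name in it (`train_newton_cg`, `conjugate_gradient`, the regularization weight `lam`, and so on) is hypothetical. The outer loop performs Newton iterations and the inner loop performs conjugate gradient steps that approximately solve H d = g using only Hessian-vector products, so the Hessian is never inverted. The conjugate gradient step sizes in this sketch are still computed with ordinary scalar divisions; how the paper handles such quantities in the ciphertext domain is not covered here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hessian_vector_product(X, p, d, lam):
    # Computes (X^T diag(p * (1 - p)) X + lam * I) d without forming the Hessian.
    out = [lam * dj for dj in d]
    for xi, pi in zip(X, p):
        s = pi * (1.0 - pi) * dot(xi, d)
        for j, xij in enumerate(xi):
            out[j] += s * xij
    return out

def conjugate_gradient(hvp, g, inner_iters):
    # Approximately solves H d = g using only matrix-vector products (no inversion).
    d = [0.0] * len(g)
    r = list(g)          # residual g - H d, with d = 0 initially
    q = list(g)          # current search direction
    rs = dot(r, r)
    for _ in range(inner_iters):
        if rs < 1e-20:   # residual already negligible
            break
        Hq = hvp(q)
        alpha = rs / dot(q, Hq)
        d = [di + alpha * qi for di, qi in zip(d, q)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hq)]
        rs_new = dot(r, r)
        q = [ri + (rs_new / rs) * qi for ri, qi in zip(r, q)]
        rs = rs_new
    return d

def train_newton_cg(X, y, outer_iters=8, inner_iters=None, lam=1e-2):
    # Plaintext L2-regularized logistic regression trained with Newton-CG.
    n_features = len(X[0])
    inner_iters = inner_iters or n_features
    w = [0.0] * n_features
    for _ in range(outer_iters):
        p = [sigmoid(dot(xi, w)) for xi in X]
        # Gradient of the regularized negative log-likelihood.
        g = [lam * wj for wj in w]
        for xi, pi, yi in zip(X, p, y):
            for j, xij in enumerate(xi):
                g[j] += (pi - yi) * xij
        d = conjugate_gradient(
            lambda v: hessian_vector_product(X, p, v, lam), g, inner_iters
        )
        w = [wj - dj for wj, dj in zip(w, d)]  # full Newton step
    return w

# Toy data: a bias column plus one feature, with non-separable labels.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
w = train_newton_cg(X, y, outer_iters=10)
```

In this sketch, `outer_iters` and `inner_iters` trade accuracy against the number of rounds. In an interactive setting each round would presumably involve communication, which is why a second-order method that needs few outer iterations is attractive.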

Fig.1 Comparison of communication (different data dimensions)

Fig.2 Comparison of communication (Feature dimension = 9, different number of iterations)

(a) Fixed outer loop (b) Fixed inner loop

Fig.3 Comparison of communication (Feature dimension = 90, different number of iterations)

(a) Fixed outer loop (b) Fixed inner loop

This research provides a new approach for efficiently training privacy-preserving logistic regression models, improving the efficiency and practicality of model training while protecting user privacy. Master’s student Weijie Miao from Chongqing Research Institute is the first author of the paper, and Researcher Wenyuan Wu is the corresponding author. This work was supported by key projects from the Ministry of Science and Technology and the CAS Western Young Scholar Program. 

[Paper link](http://icics2024.aegean.gr/wp-content/uploads/2024/08/150560255.pdf)