Abstract
Generative Adversarial Networks (GANs) have demonstrated significant potential for generating synthetic data in applications involving sensitive information, such as healthcare and finance. However, two major issues arise when GANs are applied to sensitive datasets: (i) the model may memorize training samples, compromising the privacy of individuals, especially when the data contains personally identifiable information (PII), and (ii) there is little control over the specificity of the generated samples, which limits their utility for tailored use cases. To address these challenges, we propose a novel framework that integrates differential privacy with latent representation learning, ensuring privacy while providing control over the specificity of the generated data. Our approach ensures that the synthetic data does not reveal individual training points, and by learning effective latent codes it enables the generation of specific, meaningful samples. We evaluate our method on the MNIST dataset, showing that it preserves privacy while exhibiting the expected privacy-utility trade-off: stronger privacy guarantees lead to lower downstream classification accuracy. We also highlight the computational cost, as training incurs roughly a tenfold increase in time compared to a standard GAN. Finally, we extend our approach to the CelebA dataset, demonstrating how privacy and specificity can be jointly controlled to generate high-quality, private synthetic data.
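To ground the central idea named in the abstract, the sketch below illustrates one common way of making GAN training differentially private: applying DP-SGD (per-sample gradient clipping plus calibrated Gaussian noise) to the discriminator, so the generator only ever sees a privatized signal about the real data. This is a minimal, generic sketch of the technique, not the paper's exact method; the network sizes and the hyperparameters (clip_norm, noise_multiplier, lr) are illustrative assumptions.

```python
# Hypothetical sketch of one DP-GAN training step (not the paper's exact model):
# the discriminator D is updated with DP-SGD, the generator G trains normally.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784          # e.g. flattened 28x28 MNIST images
clip_norm, noise_multiplier, lr = 1.0, 1.1, 1e-4  # assumed hyperparameters

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=lr)
opt_D = torch.optim.Adam(D.parameters(), lr=lr)
bce = nn.BCEWithLogitsLoss()

def dp_discriminator_step(real_batch):
    """One DP-SGD update of D: clip each per-sample gradient, add noise."""
    batch_size = real_batch.size(0)
    fake_batch = G(torch.randn(batch_size, latent_dim)).detach()
    summed_grads = [torch.zeros_like(p) for p in D.parameters()]
    for real_x, fake_x in zip(real_batch, fake_batch):
        D.zero_grad()
        loss = bce(D(real_x.unsqueeze(0)), torch.ones(1, 1)) + \
               bce(D(fake_x.unsqueeze(0)), torch.zeros(1, 1))
        loss.backward()
        # Clip the per-sample gradient to bound any individual's influence.
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in D.parameters()))
        scale = min(1.0, clip_norm / (total_norm + 1e-6))
        for g_sum, p in zip(summed_grads, D.parameters()):
            g_sum += p.grad * scale
    # Add Gaussian noise calibrated to the clipping norm, then average.
    for g_sum, p in zip(summed_grads, D.parameters()):
        noise = torch.randn_like(g_sum) * noise_multiplier * clip_norm
        p.grad = (g_sum + noise) / batch_size
    opt_D.step()

def generator_step(batch_size):
    """Standard generator update; G only sees the privately trained D."""
    opt_G.zero_grad()
    fake = G(torch.randn(batch_size, latent_dim))
    loss = bce(D(fake), torch.ones(batch_size, 1))
    loss.backward()
    opt_G.step()  # stray grads on D are cleared at the next D step

real = torch.rand(32, data_dim) * 2 - 1  # stand-in for a real data batch
dp_discriminator_step(real)
generator_step(32)
```

The per-sample loop also makes the abstract's cost remark concrete: clipping each example's gradient individually is what drives the large slowdown relative to standard minibatch GAN training.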