Abstract
Generative Adversarial Networks (GANs) have demonstrated significant potential in
generating synthetic data for various applications, including domains that handle sensitive
information, such as healthcare and finance. However, two major issues arise when GANs are applied to sensitive
datasets: (i) the model may memorize training samples, compromising the privacy of individuals,
especially when the data includes personally identifiable information (PII), and (ii) there is a lack of
control over the specificity of the generated samples, which limits their utility for tailored use cases. To address these challenges, we propose a novel framework that integrates differential
privacy with latent representation learning, protecting individual records while providing control over the
specificity of the generated data. Our approach guarantees that the synthetic data does not reveal
individual data points, and by learning effective latent codes it enables the generation of specific
and meaningful samples. We evaluate our method on the MNIST dataset, showing that it preserves
privacy while exhibiting a privacy-utility trade-off: stronger privacy guarantees lead to lower
classification accuracy. We also highlight the computational cost, as training
takes roughly ten times longer than for a standard GAN. Finally, we extend our
approach to the CelebA dataset, demonstrating how privacy and specificity can be controlled to
generate high-quality, private synthetic data.
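As a rough sketch of the kind of privacy mechanism the abstract alludes to, the following PyTorch fragment implements a DP-SGD-style discriminator update (per-sample gradient clipping plus Gaussian noise). This is an illustrative assumption about the training recipe, not the paper's exact procedure; the function and hyperparameter names (dp_discriminator_step, CLIP_NORM, NOISE_MULT) are hypothetical.

    # Minimal sketch of a differentially private discriminator update,
    # assuming the standard DP-SGD recipe: clip each per-sample gradient
    # to an l2 bound, sum, add calibrated Gaussian noise, then average.
    # Illustrative only; not the paper's confirmed implementation.
    import torch

    CLIP_NORM = 1.0    # per-sample gradient l2 bound C (assumed value)
    NOISE_MULT = 1.1   # noise multiplier sigma (assumed value)

    def dp_discriminator_step(disc, opt, loss_fn, batch, labels):
        """One private update of the discriminator parameters."""
        params = [p for p in disc.parameters() if p.requires_grad]
        summed = [torch.zeros_like(p) for p in params]

        # Microbatching: compute and clip each sample's gradient separately.
        for x, y in zip(batch, labels):
            loss = loss_fn(disc(x.unsqueeze(0)), y.unsqueeze(0))
            grads = torch.autograd.grad(loss, params)
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = (CLIP_NORM / (norm + 1e-12)).clamp(max=1.0)
            for s, g in zip(summed, grads):
                s.add_(g * scale)

        # Add Gaussian noise scaled to the clipping bound, average, and step.
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * NOISE_MULT * CLIP_NORM
            p.grad = (s + noise) / len(batch)
        opt.step()
        opt.zero_grad()

In a GAN, only the discriminator touches the real training data, so privatizing its updates in this fashion is a common route to a differentially private generator; the per-sample loop above also illustrates why such training can be roughly an order of magnitude slower than a standard GAN update.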