Controlled and ID-Aware Data Generation in Face Recognition
dataset
posted on 2025-07-01, 16:34 authored by Haiyu Wu
While face recognition techniques have achieved remarkable performance in real-world applications, important issues still need to be addressed. Gender and race bias, as well as identity privacy problems, are among the top concerns due to their significant societal impact. Gender and race bias result in unequal accuracy between genders and across races. The identity privacy problem is related to the collection of training sets, as these sets are typically gathered without obtaining permission from the individuals represented in the dataset.
Our previous work has shown that facial attributes, such as facial hair, hairstyle, and face exposure, can significantly affect face recognition performance. We demonstrate that bias can be largely mitigated by balancing the distribution of these attributes in both the training set and the test set. The privacy problem has been exacerbated by government regulations (e.g., the General Data Protection Regulation, or GDPR), which protect identity privacy but also hinder the development of more powerful face recognition techniques.
To address these problems, this proposed research aims to design a controlled face image generation model that can create images of non-existent identities to form a synthetic training set while controlling attribute distributions. We then observe that the test sets include only pose and age variations, which is insufficient to measure the intra-class variation of the generated training sets. To this end, we propose three test sets that focus on two additional attribute variations and on identical twins. Lastly, we unlock the attribute control of the proposed model and conduct a comprehensive analysis to reveal the weaknesses of existing synthetic face recognition datasets and provide insights for future work in this area.