Original Image
Corrected Image
Abstract
Gaze correction aims to redirect a person’s gaze towards the camera by manipulating the eye region, and it can be considered a specific image resynthesis problem. Gaze correction has a wide range of real-life applications, such as establishing eye contact between remote users in video-conferencing systems. In this paper, we propose a novel method based on an inpainting model that learns from face images to fill in the eye region with new content representing the corrected gaze. Moreover, our model does not require training data labelled with specific head-pose and eye-angle information; thus, the training data is easy to collect. To preserve the identity information of the eye region in the original input, we propose a self-guided pretrained model that learns an angle-invariant feature. Experiments show that our model achieves very compelling gaze-correction results on an in-the-wild dataset collected from the web, which is introduced in detail below.
Paper
arxiv 1906.00805 , 2019
Citation
Jichao Zhang∗, Meng Sun∗, Jingjing Chen∗, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe
(* indicates equal contributions)
Bibtex
Code: Python
Network Architecture
More Details of the Network Architecture
Supplementary materials are shown below. The network architectures of GazeGAN are given in Table 1, Table 2 and Table 3. The Self-Guided network, which is employed to preserve identity information, takes as input the local eye image rescaled to 128×128 pixels. Note that both local eye images share the weights of the Self-Guided network. For the completion network, we use an encoder-decoder architecture that incorporates the angle-invariant feature learned by the Self-Guided network. These features are also used by the discriminator as additional information when determining whether the generated image is real or fake.
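The weight sharing and feature fusion described above can be sketched in PyTorch. This is a minimal illustration with hypothetical layer sizes (the real architecture is in Tables 1–3): a single Self-Guided encoder module is applied to both local eye crops, so its weights are shared, and its angle-invariant feature is broadcast spatially and concatenated into the completion network's bottleneck.

```python
# Minimal sketch, not the paper's exact architecture: layer widths,
# kernel sizes, and the fusion point are illustrative assumptions.
import torch
import torch.nn as nn

class SelfGuidedEncoder(nn.Module):
    """Encodes a 128x128 local eye crop into an angle-invariant feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class CompletionNetwork(nn.Module):
    """Encoder-decoder that inpaints the masked eye region, conditioned on
    the angle-invariant feature from the Self-Guided encoder."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),  # RGB + mask
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(128 + feat_dim, 128, 1)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, masked_img, mask, guide_feat):
        h = self.enc(torch.cat([masked_img, mask], dim=1))
        # Broadcast the guide feature over spatial positions, then fuse.
        g = guide_feat[:, :, None, None].expand(-1, -1, h.shape[2], h.shape[3])
        h = self.fuse(torch.cat([h, g], dim=1))
        return self.dec(h)

guide = SelfGuidedEncoder()
completion = CompletionNetwork()
left_eye = torch.randn(1, 3, 128, 128)
right_eye = torch.randn(1, 3, 128, 128)
face = torch.randn(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
# The same guide module encodes both crops, i.e. shared weights.
feat = guide(left_eye) + guide(right_eye)
out = completion(face * (1 - mask), mask, feat)
```

The key design choice mirrored here is that conditioning the inpainting decoder on a pose-independent eye descriptor lets the filled-in region keep the person's eye identity while the gaze direction is resynthesized.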
The following notation is used in the tables. h: height of the input images; w: width of the input images; C: number of output channels; K: kernel size; S: stride; P: padding method; IN: instance normalization; FC: fully-connected layer; SN: spectral normalization.
Table 1: Guided architecture
Table 2: Generator architecture
Table 3: Discriminator architecture
Dataset Introduction: the NewGaze Dataset
To evaluate the proposed method and the overall framework, we investigated the existing benchmark datasets. However, none of them suits our task of gaze correction in the wild. Thus, we collected a new dataset, called the NewGaze dataset. NewGaze consists of a set of 40000 unpaired images.
Note that the unpaired data is not labelled with specific eye-angle or head-pose information and is therefore very easy to collect.
The unpaired data, collected from CelebA-ID and the web, consists of two domains.
Domain X: 35000 face images with eyes staring at the camera;
Domain Y: 5000 face images with eyes not staring at the camera.
We crop all images to 256×256 pixels with a face-detection algorithm and compute the eye-mask region using a facial-landmark detection algorithm. As described above, we use all the data in domain X to train our model, while the data in domain Y serves only as the test set.
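The eye-mask step can be sketched as follows. This is a hypothetical illustration, not the authors' preprocessing code: it assumes eye landmarks are already available (e.g. from a 68-point facial-landmark detector) and builds a padded rectangular binary mask over them; the paper's actual mask shape and padding may differ.

```python
# Hypothetical sketch of computing an eye-region mask from landmarks.
import numpy as np

def eye_mask(landmarks, img_h=256, img_w=256, pad=8):
    """Binary mask covering the padded bounding box of one eye's landmarks.

    landmarks: iterable of (x, y) points for one eye.
    pad: extra pixels around the landmark bounding box (assumed value).
    """
    pts = np.asarray(landmarks)
    x0, y0 = pts.min(axis=0) - pad
    x1, y1 = pts.max(axis=0) + pad
    # Clamp the box to the image bounds.
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), img_w - 1), min(int(y1), img_h - 1)
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    mask[y0:y1 + 1, x0:x1 + 1] = 1
    return mask

# Example landmark positions (made up) for a left eye in a 256x256 crop.
left_eye_pts = [(90, 110), (100, 105), (110, 105),
                (120, 110), (110, 115), (100, 115)]
mask = eye_mask(left_eye_pts)
```

At training time such a mask selects the pixels the completion network must fill in, and its complement selects the pixels kept from the original face.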
Example of Domain X
Example of Domain Y
More Results
Note that all of our results are shown without any post-processing.
Large head pitch rotations
Large head yaw rotations
More Results in GIF