Original Image
Corrected Image
Abstract
Gaze correction aims to redirect a person’s gaze towards the camera by manipulating the eye region, and it can be considered a specific image resynthesis problem. Gaze correction has a wide range of real-life applications, such as establishing eye contact between remote users in video-conferencing systems. In this paper, we propose a novel method based on an inpainting model that learns from face images to fill in the eye region with new content representing the corrected gaze. Moreover, our model does not require training data labelled with specific head-pose and eye-angle information; thus, the training data is easy to collect. To preserve the identity information of the eye region in the original input, we propose a self-guided pretrained model that learns an angle-invariant feature. Experiments show that our model achieves very compelling gaze-correction results on an in-the-wild dataset collected from the web, which is introduced in detail below.
Paper
arxiv 1906.00805 , 2019
Citation
Jichao Zhang∗, Meng Sun∗, Jingjing Chen∗, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe
(* indicates equal contributions)
Bibtex
Code: Python
Network Architecture
More Details of the Network Architecture
Supplementary materials are shown below. The network architectures of GazeGAN are given in Table 1, Table 2 and Table 3. The Self-Guided network, which is employed to preserve identity information, takes as input the local eye image rescaled to 128×128 pixels. Note that both local eye images share the weights of the Self-Guided network. For the completion network, we use an encoder-decoder architecture that incorporates the angle-invariant feature learned by the Self-Guided network. These features are also used by the discriminator as additional information when determining whether the generated image is real or fake.
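The weight sharing and feature fusion described above can be sketched in PyTorch. This is a minimal illustration with hypothetical layer sizes (the real architecture is in Tables 1–3): a single Self-Guided encoder module is applied to both local eye crops, so its weights are shared, and its angle-invariant feature is broadcast spatially and concatenated into the completion network's bottleneck.

```python
# Minimal sketch, not the paper's exact architecture: layer widths,
# kernel sizes, and the fusion point are illustrative assumptions.
import torch
import torch.nn as nn

class SelfGuidedEncoder(nn.Module):
    """Encodes a 128x128 local eye crop into an angle-invariant feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class CompletionNetwork(nn.Module):
    """Encoder-decoder that inpaints the masked eye region, conditioned on
    the angle-invariant feature from the Self-Guided encoder."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),  # RGB + mask
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(128 + feat_dim, 128, 1)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, masked_img, mask, guide_feat):
        h = self.enc(torch.cat([masked_img, mask], dim=1))
        # Broadcast the guide feature over spatial positions, then fuse.
        g = guide_feat[:, :, None, None].expand(-1, -1, h.shape[2], h.shape[3])
        h = self.fuse(torch.cat([h, g], dim=1))
        return self.dec(h)

guide = SelfGuidedEncoder()
completion = CompletionNetwork()
left_eye = torch.randn(1, 3, 128, 128)
right_eye = torch.randn(1, 3, 128, 128)
face = torch.randn(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
# The same guide module encodes both crops, i.e. shared weights.
feat = guide(left_eye) + guide(right_eye)
out = completion(face * (1 - mask), mask, feat)
```

The key design choice mirrored here is that conditioning the inpainting decoder on a pose-independent eye descriptor lets the filled-in region keep the person's eye identity while the gaze direction is resynthesized.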
The following notation is used in the tables. h: height of the input images; w: width of the input images; C: number of output channels; K: kernel size; S: stride; P: padding method; IN: instance normalization; FC: fully-connected layer; SN: spectral normalization.
Table 1: Guided architecture
Table 2: Generator architecture
Table 3: Discriminator architecture
Dataset Introduction: the NewGaze Dataset
To evaluate the proposed method and the overall framework, we investigated the existing benchmark datasets. However, none of them suits our task of gaze correction in the wild. Thus, we collected a new dataset, called the NewGaze dataset. NewGaze consists of a set of 40000 unpaired images.
Note that the unpaired data is not labelled with specific eye-angle or head-pose information and is therefore very easy to collect.
The unpaired data, collected from CelebA-ID and the web, consists of two domains.
Domain X: 35000 face images with eyes staring at the camera;
Domain Y: 5000 face images with eyes not staring at the camera.
We crop all images to 256×256 pixels with a face-detection algorithm and compute the eye-mask region using a facial-landmark detection algorithm. As described above, we use all the data in domain X to train our model, while the data in domain Y serves only as the test set.
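The eye-mask step can be sketched as follows. This is a hypothetical illustration, not the authors' preprocessing code: it assumes eye landmarks are already available (e.g. from a 68-point facial-landmark detector) and builds a padded rectangular binary mask over them; the paper's actual mask shape and padding may differ.

```python
# Hypothetical sketch of computing an eye-region mask from landmarks.
import numpy as np

def eye_mask(landmarks, img_h=256, img_w=256, pad=8):
    """Binary mask covering the padded bounding box of one eye's landmarks.

    landmarks: iterable of (x, y) points for one eye.
    pad: extra pixels around the landmark bounding box (assumed value).
    """
    pts = np.asarray(landmarks)
    x0, y0 = pts.min(axis=0) - pad
    x1, y1 = pts.max(axis=0) + pad
    # Clamp the box to the image bounds.
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), img_w - 1), min(int(y1), img_h - 1)
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    mask[y0:y1 + 1, x0:x1 + 1] = 1
    return mask

# Example landmark positions (made up) for a left eye in a 256x256 crop.
left_eye_pts = [(90, 110), (100, 105), (110, 105),
                (120, 110), (110, 115), (100, 115)]
mask = eye_mask(left_eye_pts)
```

At training time such a mask selects the pixels the completion network must fill in, and its complement selects the pixels kept from the original face.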
Example of Domain X
Example of Domain Y
More Results
Note that all of our results are shown without any post-processing.
Large head pitch rotations
Large head yaw rotations
More Results in GIF