Vol. 47 No. SIG 10(CVIM 15) July 2006

Gaze Estimation from Low Resolution Images Insensitive to Segmentation Error

Yasuhiro Ono, Takahiro Okabe and Yoichi Sato
Institute of Industrial Science, The University of Tokyo

We propose an appearance-based method for estimating gaze directions from low resolution images. When gaze directions are estimated from low resolution images, errors in the segmentation of eye regions are inevitable. To improve the accuracy of gaze estimation, our method introduces two key ideas: training on a set of eye-region images with artificially added segmentation error, and using N-mode SVD (Singular Value Decomposition) to separate image variation due to gaze direction from that due to segmentation error. N-mode SVD allows feature vectors of the gaze direction to be extracted. In this paper, we describe the details of the proposed method and report experimental results demonstrating its advantage over the conventional PCA (Principal Component Analysis)-based method and over the subspace method, in which a subspace is constructed for each class.

1.

2),5),10),22) 6) 8),20)
1),12),21) 14) 9) N-mode SVD (Singular Value Decomposition) 15) 18) Vasilescu N-mode SVD 16) PCA (Principal Component Analysis)

2.

(1-a) (1-b) N-mode SVD 15) (1-c)
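Let $i$ ($1 \le i \le I$) index the gaze direction, $j$ ($1 \le j \le J$) the segmentation, and $k$ ($1 \le k \le K$) the pixel. The training images form a third-order tensor $\mathcal{D}$ with elements $D_{ijk}$.

Fig. 1 Flowchart of our proposed method.

2.1 N-mode SVD

N-mode SVD 15) extends SVD to $N$-th order tensors; here we use the case $N = 3$ (3-mode SVD). Following Vasilescu et al., the third-order tensor $\mathcal{D}$ is unfolded into matrices. For each pixel $k$, let $F_k \in \mathbb{R}^{I \times J}$ be the matrix whose $(i, j)$ element is $D_{ijk}$. The matrix $D_{\mathrm{gaze}} \in \mathbb{R}^{I \times KJ}$ is
\[ D_{\mathrm{gaze}} = [F_1\ F_2\ \cdots\ F_K]. \tag{1} \]
The other unfoldings of $\mathcal{D}$ are given in Appendix A.1. The SVD of $D_{\mathrm{gaze}}$ is
\[ D_{\mathrm{gaze}} = U_{\mathrm{gaze}} \Sigma_{\mathrm{gaze}} V_{\mathrm{gaze}}^{T}, \tag{2} \]
which gives $U_{\mathrm{gaze}} \in \mathbb{R}^{I \times I}$; $U_{\mathrm{cut}} \in \mathbb{R}^{J \times J}$ and $U_{\mathrm{pixel}} \in \mathbb{R}^{K \times K}$ are obtained in the same way (Appendix A.1). From $U_{\mathrm{gaze}}$ and $U_{\mathrm{cut}}$, vectors $\mathbf{a}_i \in \mathbb{R}^{I}$ ($1 \le i \le I$) and $\mathbf{b}_j \in \mathbb{R}^{J}$ ($1 \le j \le J$) are defined by
\[ [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_I]^{T} \overset{\mathrm{def}}{=} U_{\mathrm{gaze}}, \qquad [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_J]^{T} \overset{\mathrm{def}}{=} U_{\mathrm{cut}}. \tag{3} \]

When N-mode SVD is applied to an $N$-th order tensor $\mathcal{D}$, the mode-1 matrix is $D_{\mathrm{gaze}} \in \mathbb{R}^{I_1 \times L}$ with $I_1 \le L$ and $L \overset{\mathrm{def}}{=} I_2 I_3 \cdots I_N$, and its SVD costs $2 L I_1^2 + 4 I_1^3$ operations 4), where $I_n$ ($1 \le n \le N$) is the dimension of mode $n$.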
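The unfolding of Eq. (1) and the SVD of Eq. (2) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the tensor is random toy data rather than eye images, the function names `unfold_gaze` and `left_singular_vectors` are hypothetical, and the vectors of Eq. (3) are read as the rows of $U_{\mathrm{gaze}}$.

```python
import numpy as np

def unfold_gaze(D):
    """Eq. (1): D_gaze = [F_1 F_2 ... F_K], where F_k = D[:, :, k] is I x J."""
    I, J, K = D.shape
    # Move the pixel axis ahead of the segmentation axis so that columns
    # are grouped into K consecutive blocks of width J (one block per F_k).
    return D.transpose(0, 2, 1).reshape(I, K * J)

def left_singular_vectors(matrix):
    """Eq. (2): return U from the SVD matrix = U S V^T."""
    U, _, _ = np.linalg.svd(matrix, full_matrices=True)
    return U

# Toy sizes for illustration; the experiments use I=20, J=25, K=48.
I, J, K = 4, 3, 6
rng = np.random.default_rng(0)
D = rng.standard_normal((I, J, K))      # stand-in for the training tensor

D_gaze = unfold_gaze(D)                 # shape (I, K*J)
U_gaze = left_singular_vectors(D_gaze)  # shape (I, I), orthogonal
# Assumed reading of Eq. (3): a_i is the i-th row of U_gaze.
a_vectors = [U_gaze[i] for i in range(I)]
```

Because the columns of the unfolding can be permuted without changing its left singular vectors (up to sign), the exact column order matters only for matching the block layout of Eq. (1).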
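Using the elements $(U_{\mathrm{gaze}})_{il}$, $(U_{\mathrm{cut}})_{jm}$ and $(U_{\mathrm{pixel}})_{kn}$ of $U_{\mathrm{gaze}}$, $U_{\mathrm{cut}}$ and $U_{\mathrm{pixel}}$, each element $D_{ijk}$ of the third-order tensor is written as
\[ D_{ijk} = \sum_{l=1}^{I} \sum_{m=1}^{J} \sum_{n=1}^{K} Z_{lmn} (U_{\mathrm{gaze}})_{il} (U_{\mathrm{cut}})_{jm} (U_{\mathrm{pixel}})_{kn}, \tag{4} \]
where $\mathcal{Z} \in \mathbb{R}^{I \times J \times K}$ is the core tensor. In mode-product notation,
\[ \mathcal{D} = \mathcal{Z} \times_1 U_{\mathrm{gaze}} \times_2 U_{\mathrm{cut}} \times_3 U_{\mathrm{pixel}}. \tag{5} \]
Here $\times_1 U_{\mathrm{gaze}}$ denotes the mode-1 product: as in Eq. (4), the elements $(U_{\mathrm{gaze}})_{il}$ are multiplied with $Z_{lmn}$ and summed over the first index $l$ of $\mathcal{Z}$. The core tensor is computed as
\[ \mathcal{Z} = \mathcal{D} \times_1 U_{\mathrm{gaze}}^{T} \times_2 U_{\mathrm{cut}}^{T} \times_3 U_{\mathrm{pixel}}^{T}. \tag{6} \]
An image $\mathbf{d}$ with coefficient vectors $\mathbf{a}$ and $\mathbf{b}$ is expressed by Eq. (8), using the third-order tensor $\mathcal{B}$ of Eq. (7):
\[ B_{ijk} \overset{\mathrm{def}}{=} \sum_{l=1}^{K} Z_{ijl} (U_{\mathrm{pixel}})_{kl}, \tag{7} \]
\[ d_k = \sum_{i=1}^{I} \sum_{j=1}^{J} B_{ijk}\, a_i b_j = \sum_{i=1}^{I} \sum_{j=1}^{J} B_{k(ij)}\, a_i b_j. \tag{8} \]
$B_{k(ij)}$ denotes the element of the matrix $B_{\mathrm{pixel}}$ at position $(k,\ I(j-1)+i)$, where $B_{\mathrm{pixel}}$ is the pixel-mode unfolding of $\mathcal{B}$.

2.2

$B_{k(ij)}$ is computed from the training images as follows. The third-order tensor $\mathcal{D}$ is built from the training images and unfolded into $D_{\mathrm{gaze}}$ as in Section 2.1. The SVD of $D_{\mathrm{gaze}}$ gives $U_{\mathrm{gaze}}$ (Eq. (2)) and hence $\mathbf{a}_i$ ($1 \le i \le I$) via Eq. (3). Likewise, Eq. (19) gives $U_{\mathrm{cut}}$ and hence $\mathbf{b}_j$ ($1 \le j \le J$) via Eq. (3). Finally, $B_{k(ij)}$ is obtained from Eq. (7) for $1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$.

2.3

Given $B_{k(ij)}$, the coefficients of an input image must be estimated. Two methods for estimating coefficients under N-mode SVD have been proposed by Vasilescu et al.: the first 16) and the second 19).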
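Equations (4) to (8) can be checked numerically with a small sketch. It assumes random toy data in place of eye images, reads $\mathbf{a}_i$ and $\mathbf{b}_j$ as rows of $U_{\mathrm{gaze}}$ and $U_{\mathrm{cut}}$, and uses the hypothetical helper names `mode_u` and `synthesize`; it is not the authors' implementation.

```python
import numpy as np

def mode_u(D, mode):
    """Left singular vectors of the unfolding of D along axis `mode`."""
    unfolded = np.moveaxis(D, mode, 0).reshape(D.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=True)
    return U

def synthesize(B, a, b):
    """Eq. (8): d_k = sum_i sum_j B_ijk a_i b_j."""
    return np.einsum('ijk,i,j->k', B, a, b)

# Toy sizes for illustration only.
I, J, K = 4, 3, 6
rng = np.random.default_rng(0)
D = rng.standard_normal((I, J, K))

U_gaze, U_cut, U_pixel = (mode_u(D, m) for m in range(3))

# Eq. (6): core tensor Z = D x1 U_gaze^T x2 U_cut^T x3 U_pixel^T.
Z = np.einsum('ijk,il,jm,kn->lmn', D, U_gaze, U_cut, U_pixel)

# Eq. (5): multiplying the core tensor back recovers D.
D_rebuilt = np.einsum('lmn,il,jm,kn->ijk', Z, U_gaze, U_cut, U_pixel)

# Eq. (7): B_ijk = sum_l Z_ijl (U_pixel)_kl.
B = np.einsum('ijl,kl->ijk', Z, U_pixel)

# Eq. (8) with a = a_1 and b = b_2 (zero-based rows 1 and 2)
# reproduces the training image stored at D[1, 2, :].
d = synthesize(B, U_gaze[1], U_cut[2])
```

The last line illustrates why $\mathcal{B}$ is the only quantity that has to be kept after training: any training image is recovered from its pair of coefficient vectors.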
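In our setting, $K = 48$, $I = 20$ and $J = 25$, so the condition $K \ge IJ$ does not hold 16),19). We therefore estimate the coefficients $\mathbf{a}$ and $\mathbf{b}$ of the 3-mode SVD so that Eq. (8) best reproduces the input image $\hat{\mathbf{d}}$, using the cost function
\[ f(\mathbf{a}, \mathbf{b}) \overset{\mathrm{def}}{=} \sum_{k=1}^{K} \Bigl( \hat{d}_k - \sum_{i=1}^{I} \sum_{j=1}^{J} B_{k(ij)}\, a_i b_j \Bigr)^{2}. \tag{9} \]
The estimates $(\hat{\mathbf{a}}, \hat{\mathbf{b}})$ are
\[ (\hat{\mathbf{a}}, \hat{\mathbf{b}}) = \arg\min_{\mathbf{a} \in \mathbb{R}^{I},\ \mathbf{b} \in \mathbb{R}^{J}} f(\mathbf{a}, \mathbf{b}). \tag{10} \]
Equation (10) is minimized by alternately fixing $\mathbf{b}$ and solving for $\mathbf{a}$, then fixing $\mathbf{a}$ and solving for $\mathbf{b}$ 3),13). This requires $K \ge (I + J)$, whereas the second method of Vasilescu et al. 19) requires $K \ge IJ$.

Setting $\partial f / \partial a_i = 0$ ($1 \le i \le I$) and $\partial f / \partial b_j = 0$ ($1 \le j \le J$) in Eq. (9) gives
\[ \mathbf{a} = M^{+} \hat{\mathbf{d}}, \qquad (M)_{ki} \overset{\mathrm{def}}{=} \sum_{j=1}^{J} B_{k(ij)}\, b_j, \tag{11} \]
\[ \mathbf{b} = N^{+} \hat{\mathbf{d}}, \qquad (N)_{kj} \overset{\mathrm{def}}{=} \sum_{i=1}^{I} B_{k(ij)}\, a_i, \tag{12} \]
where $M^{+}$ denotes the pseudo-inverse of $M$. Starting from an initial value $\mathbf{b}^{(0)}$, Eqs. (11) and (12) are applied alternately until the change $\Delta f(n) \overset{\mathrm{def}}{=} f(\mathbf{a}^{(n)}, \mathbf{b}^{(n)}) - f(\mathbf{a}^{(n-1)}, \mathbf{b}^{(n-1)})$ becomes sufficiently small, where $\mathbf{a}^{(n)}$ and $\mathbf{b}^{(n)}$ are the estimates at iteration $n$. The resulting $\mathbf{a}$, normalized in L2 norm, is taken as $\hat{\mathbf{a}}$. The initial value $\mathbf{b}^{(0)}$ is chosen by Eq. (13):
\[ (\mathbf{a}^{(0)}, \mathbf{b}^{(0)}) = \arg\min_{\mathbf{a} \in \{\mathbf{a}_1, \ldots, \mathbf{a}_I\},\ \mathbf{b} \in \{\mathbf{b}_1, \ldots, \mathbf{b}_J\}} \sum_{k=1}^{K} \Bigl( \hat{d}_k - \sum_{i=1}^{I} \sum_{j=1}^{J} B_{k(ij)}\, a_i b_j \Bigr)^{2}. \tag{13} \]

The training vector nearest to $\hat{\mathbf{a}}$ is then found:
\[ i(1) = \arg\min_{i \in \{1, 2, \ldots, I\}} \| \hat{\mathbf{a}} - \mathbf{a}_i \|^{2}. \tag{14} \]
The indices $i(2)$ and $i(3)$ of the second and third nearest vectors are found in the same way, giving three vectors $\mathbf{a}_{i(1)}$, $\mathbf{a}_{i(2)}$ and $\mathbf{a}_{i(3)}$.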
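A minimal sketch of the alternating minimization of Eqs. (9) to (13), followed by the nearest-neighbor search of Eq. (14), is given below. The tensor `B` and the training coefficient sets `A_train` and `B_train` are random stand-ins rather than values learned from eye images, all function names are hypothetical, and the L2 normalization step is omitted.

```python
import numpy as np

def synthesize(B, a, b):
    """Eq. (8): d_k = sum_i sum_j B_ijk a_i b_j."""
    return np.einsum('ijk,i,j->k', B, a, b)

def initial_guess(B, d_hat, A_train, B_train):
    """Eq. (13): exhaustive search over pairs of training coefficients."""
    recon = np.einsum('ijk,pi,qj->pqk', B, A_train, B_train)
    cost = np.sum((recon - d_hat) ** 2, axis=2)
    p, q = np.unravel_index(np.argmin(cost), cost.shape)
    return A_train[p], B_train[q]

def alternate(B, d_hat, b0, max_iter=100, tol=1e-12):
    """Alternate Eqs. (11) and (12) until f of Eq. (9) stops decreasing."""
    b, f_prev = b0.copy(), np.inf
    for _ in range(max_iter):
        M = np.einsum('ijk,j->ki', B, b)   # (M)_ki = sum_j B_k(ij) b_j
        a = np.linalg.pinv(M) @ d_hat      # Eq. (11)
        N = np.einsum('ijk,i->kj', B, a)   # (N)_kj = sum_i B_k(ij) a_i
        b = np.linalg.pinv(N) @ d_hat      # Eq. (12)
        f = float(np.sum((d_hat - synthesize(B, a, b)) ** 2))
        if f_prev - f < tol:               # change in f is small enough
            break
        f_prev = f
    return a, b, f

# Toy problem with K >= I + J; all values are synthetic.
rng = np.random.default_rng(0)
I, J, K = 4, 3, 12
B = rng.standard_normal((I, J, K))
A_train = rng.standard_normal((5, I))      # stand-ins for the a_i
B_train = rng.standard_normal((6, J))      # stand-ins for the b_j
d_hat = synthesize(B, A_train[1], B_train[2])

_, b0 = initial_guess(B, d_hat, A_train, B_train)
a_hat, b_hat, f_final = alternate(B, d_hat, b0)

# Eq. (14): index of the training vector nearest to the estimate.
i1 = int(np.argmin(np.sum((A_train - a_hat) ** 2, axis=1)))
```

Each half-step is an ordinary linear least-squares problem, which is why only $K \ge (I + J)$ observations are needed rather than $K \ge IJ$.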
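The weights $w_p$ for the three nearest vectors $\mathbf{a}_{i(p)}$ ($p = 1, 2, 3$) are determined by minimizing
\[ \epsilon = \Bigl\| \hat{\mathbf{a}} - \sum_{p=1}^{3} w_p\, \mathbf{a}_{i(p)} \Bigr\|^{2} \tag{15} \]
subject to $\sum_{p=1}^{3} w_p = 1$ and $0 \le w_p \le 1$ ($p = 1, 2, 3$). The gaze direction $\mathbf{g}$ is then estimated as
\[ \mathbf{g} = \sum_{p=1}^{3} w_p\, \mathbf{g}(p), \tag{16} \]
where $\mathbf{g}(p)$ ($1 \le p \le 3$) is the gaze direction corresponding to $\mathbf{a}_{i(p)}$.

2.4

3.

3.1

Oka 11)

3.2

Camera: IEEE1394, Point Grey Research Flea
PC: OS: Windows XP, CPU: Intel Pentium4 3.0 GHz
1,280 × 1,024, 18, 50 cm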
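Fig. 2 Positions where crosshairs are displayed to grab training images (left) and test images (right).

Fig. 3 An example of grabbed face images.

Fig. 4 Candidates of eye corners.

Fig. 5 A schematic diagram of segmented eye images.

Training images were grabbed at 20 gaze points and test images at 32 gaze points (Fig. 2). Face images (Fig. 3) were used at three resolutions: 144 × 144, 72 × 72 and 36 × 36 pixels. Following Oka et al. 11), eye-corner candidates within ±1 pixel were considered (Fig. 4), which gives 25 segmentations per image (Fig. 5). The segmented eye images measure 48 × 16, 24 × 8 and 12 × 4 pixels for the 144 × 144, 72 × 72 and 36 × 36 images, respectively.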
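Fig. 6 Examples of training images for different gaze points.

Fig. 7 Examples of training images for different segmentations.

Fig. 8 Gaze estimation error for different image resolutions.

Fig. 9 Gaze estimation error against the dimension of gaze coefficients.

The 20 training gaze points and 25 segmentations give 20 × 25 = 500 training images, and the 32 test gaze points give 32 × 25 = 800 test images. Figs. 6 and 7 show examples of training images of 12 × 4 = 48 pixels. From the training images, $\mathbf{a}_i$ ($1 \le i \le I$), $\mathbf{b}_j$ ($1 \le j \le J$) and $B_{k(ij)}$ ($1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$) were computed as described in Section 2.2, and $\hat{\mathbf{a}}$ was estimated as described in Section 2.3.

3.3

The proposed 3-mode SVD method was compared with the PCA-based method (Appendix A.2) and the subspace method (Appendix A.3). Fig. 8 shows the gaze estimation error of the 3-mode SVD and PCA-based methods for eye images of 12 × 4 = 48, 24 × 8 = 192 and 48 × 16 = 768 pixels. Fig. 9 shows the error of the 3-mode SVD and PCA-based methods against the dimension of the gaze coefficients.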
Fig. 10 Gaze estimation error for each segmentation, estimated by our method and the PCA-based method.

Fig. 11 Gaze estimation error for each segmentation, estimated by our method and the subspace method.

Fig. 12 Gaze estimation error for each individual (left), gaze estimation error averaged by all individuals (right).

Fig. 10 shows the gaze estimation error of the 3-mode SVD and PCA-based methods for each segmentation at 12 × 4 pixels, and Fig. 11 shows the corresponding comparison with the subspace method. Fig. 12 shows the error of the 3-mode SVD and PCA-based methods for five individuals, A, B, C, D and E.

4.
C 2 13224051

1) Baluja, S. and Pomerleau, D.: Non-intrusive gaze tracking using artificial neural networks, CMU CS Technical Report, CMU-CS-94-102 (1994).
2) Beymer, D. and Flickner, M.: Eye Gaze Tracking Using an Active Stereo Head, Proc. IEEE CVPR 2003, pp.ii-451–458 (2003).
3) Buchanan, A. and Fitzgibbon, A.: Damped Newton Algorithms for Matrix Factorization with Missing Data, Proc. IEEE CVPR 2005, pp.316–322 (2005).
4) Chan, T.: An Improved Algorithm for Computing the Singular Value Decomposition, ACM Trans. MS, Vol.8, No.1, pp.72–83 (1982).
5) Hutchinson, T., White, Jr. K., Martin, W., Reichert, K. and Frey, L.: Human-Computer Interaction Using Eye-Gaze Input, IEEE Trans. SMAC, Vol.19, No.6, pp.1527–1534 (1989).
6) Ishikawa, T., Baker, S., Matthews, I. and Kanade, T.: Passive Driver Gaze Tracking with Active Appearance Models, Proc. WCITS 2004 (2004).
7) ConDensation CVIM 2005-150-3, pp.17–24 (2005).
8) Matsumoto, Y. and Zelinsky, A.: An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement, Proc. IEEE FG 2000, pp.499–504 (2000).
9) Vol.J64-D, No.3, pp.276–283 (1981).
10) Ohno, T. and Mukawa, N.: A Free-head, Simple Calibration, Gaze Tracking System That Enables Gaze-Based Interaction, Proc. ACM ETRA 2004, pp.115–122 (2004).
11) Oka, K., Sato, Y., Nakanishi, Y. and Koike, H.: Head pose estimation system based on particle filtering with adaptive diffusion control, Proc. IAPR MVA 2005, pp.586–589 (2005).
12) Stiefelhagen, R., Yang, J. and Waibel, A.: Tracking Eyes and Monitoring Eye Gaze, Proc. WPUI, pp.98–100 (1997).
13) Shum, H., Ikeuchi, K. and Reddy, R.: Principal Component Analysis with Missing Data and Its Application to Polyhedral Object Modeling, IEEE Trans. PAMI, Vol.17, No.9, pp.854–867 (1995).
14) Tan, K., Kriegman, D. and Ahuja, N.: Appearance-based Eye Gaze Estimation, Proc. IEEE WACV, pp.191–195 (2002).
15) Vasilescu, M.A.O. and Terzopoulos, D.: Multilinear Analysis of Image Ensembles: TensorFaces, Proc. ECCV 2002, pp.447–460 (2002).
16) Vasilescu, M.A.O. and Terzopoulos, D.: Multilinear Image Analysis for Facial Recognition, Proc. IAPR ICPR 2002, pp.ii-20511–20514 (2002).
17) Vasilescu, M.A.O.: Human Motion Signatures: Analysis, Synthesis, Recognition, Proc. IAPR ICPR 2002, pp.iii-30456–30460 (2002).
18) Vasilescu, M.A.O. and Terzopoulos, D.: TensorTextures: Multilinear Image-Based Rendering, Proc. ACM SIGGRAPH 2004, Vol.23, No.3, pp.336–342 (2004).
19) Vasilescu, M.A.O. and Terzopoulos, D.: Multilinear Independent Components Analysis, Proc. IEEE CVPR 2005, Vol.1, pp.547–553 (2005).
20) Wang, J., Sung, E. and Venkateswarlu, R.: Eye Gaze Estimation from a Single Image of One Eye, Proc. IEEE ICCV 2003, pp.i-136–143 (2003).
21) Xu, L., Machin, D. and Sheppard, P.: A Novel Approach to Real-time Non-intrusive Gaze Finding, British Machine Vision Conference, pp.428–437 (1998).
22) Yoo, D. and Chung, M.: Non-intrusive Eye Gaze Estimation without Knowledge of Eye Pose, Proc. IEEE FG 2004, pp.785–790 (2004).

A.1

Let $D_{ijk}$ ($1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$) be the elements of the tensor $\mathcal{D}$. For each $i$, let $G_i \in \mathbb{R}^{J \times K}$ be the matrix with elements $(G_i)_{jk} = D_{ijk}$, and for each $j$, let $H_j \in \mathbb{R}^{K \times I}$ be the matrix with elements $(H_j)_{ki} = D_{ijk}$. These matrices are used to unfold the third-order tensor $\mathcal{D}$.
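The matrix $D_{\mathrm{cut}} \in \mathbb{R}^{J \times IK}$ is
\[ D_{\mathrm{cut}} = [G_1\ G_2\ \cdots\ G_I], \tag{17} \]
and the matrix $D_{\mathrm{pixel}} \in \mathbb{R}^{K \times JI}$ is
\[ D_{\mathrm{pixel}} = [H_1\ H_2\ \cdots\ H_J]. \tag{18} \]
The SVD of $D_{\mathrm{cut}}$,
\[ D_{\mathrm{cut}} = U_{\mathrm{cut}} \Sigma_{\mathrm{cut}} V_{\mathrm{cut}}^{T}, \tag{19} \]
gives $U_{\mathrm{cut}} \in \mathbb{R}^{J \times J}$, and the SVD of $D_{\mathrm{pixel}}$,
\[ D_{\mathrm{pixel}} = U_{\mathrm{pixel}} \Sigma_{\mathrm{pixel}} V_{\mathrm{pixel}}^{T}, \tag{20} \]
gives $U_{\mathrm{pixel}} \in \mathbb{R}^{K \times K}$.

A.2 PCA

In the PCA-based method, let $\mathbf{d}_{ij} \in \mathbb{R}^{K}$ ($1 \le i \le I$, $1 \le j \le J$) be the training image for gaze direction $i$ and segmentation $j$. The matrix $D$ is
\[ D = [\mathbf{d}_{11}, \mathbf{d}_{21}, \ldots, \mathbf{d}_{I1}, \mathbf{d}_{12}, \mathbf{d}_{22}, \ldots, \mathbf{d}_{I2}, \ldots, \mathbf{d}_{1J}, \mathbf{d}_{2J}, \ldots, \mathbf{d}_{IJ}]. \tag{21} \]
The SVD $D = U \Sigma V^{T}$ gives $U$, and the coefficient vector $\mathbf{c}_{ij}$ of $\mathbf{d}_{ij}$ is $\mathbf{c}_{ij} = U^{T} \mathbf{d}_{ij}$. For each gaze direction $i$, the mean coefficient vector is $\mathbf{c}_i = \bigl( \sum_{j=1}^{J} \mathbf{c}_{ij} \bigr) / J$. For an input image $\mathbf{d}$, the coefficient vector $\mathbf{c} = U^{T} \mathbf{d}$ is computed, the three vectors nearest to $\mathbf{c}$ are selected from $\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_I\}$, and the gaze direction is estimated as in Section 2.3.

A.3

As in the PCA-based method, let $\mathbf{d}_{ij} \in \mathbb{R}^{K}$ ($1 \le i \le I$, $1 \le j \le J$) be the training image for gaze direction $i$ and segmentation $j$. For each segmentation $j$, the matrix $D_j \in \mathbb{R}^{K \times I}$ is
\[ D_j = [\mathbf{d}_{1j}, \mathbf{d}_{2j}, \ldots, \mathbf{d}_{Ij}]. \tag{22} \]
The SVD of $D_j$ gives a basis $U_j \in \mathbb{R}^{K \times I}$ ($K > I$) for the subspace $L_j$ of class $j$:
\[ D_j = U_j \Sigma_j V_j^{T}, \qquad U_j = \bigl[ \mathbf{u}_j^{(1)}, \mathbf{u}_j^{(2)}, \ldots, \mathbf{u}_j^{(I)} \bigr]. \tag{23} \]
With $K'$ ($K' \le I < K$) the dimension of $L_j$, the projection matrix $P_j \in \mathbb{R}^{K \times K}$ onto $L_j$ is
\[ P_j = U'_j {U'_j}^{T}, \qquad U'_j \overset{\mathrm{def}}{=} \bigl[ \mathbf{u}_j^{(1)}, \mathbf{u}_j^{(2)}, \ldots, \mathbf{u}_j^{(K')} \bigr]. \tag{24} \]
Each training image $\mathbf{d}_{ij}$ is projected onto the subspace of its class $j$ by $P_j$: $\mathbf{a}_{ij} = P_j \mathbf{d}_{ij}$. For an input image $\mathbf{d}$, the distance to $L_j$ is $S_j(\mathbf{d}) = \| \mathbf{d} - P_j \mathbf{d} \|$. The class $\hat{j}$ whose subspace $L_{\hat{j}}$ is nearest is selected, and $\mathbf{d}$ is projected onto $L_{\hat{j}}$ by $P_{\hat{j}}$: $\hat{\mathbf{a}} = P_{\hat{j}} \mathbf{d}$. The three training vectors in $L_{\hat{j}}$ nearest to $\hat{\mathbf{a}}$ are selected, and the gaze direction is estimated as in Section 2.3.

(Received September 20, 2005)
(Accepted March 20, 2006)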
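The class selection step of the subspace method in Appendix A.3 (Eqs. (22) to (24)) can be sketched as follows. The training matrices are random stand-ins for eye images, the subspace dimension is set to $K' = I$ purely for the toy example, and the names `class_projector` and `classify` are hypothetical.

```python
import numpy as np

def class_projector(D_j, dim):
    """Eqs. (23), (24): P_j = U'_j U'_j^T from the first `dim` left
    singular vectors of the K x I class matrix D_j."""
    U, _, _ = np.linalg.svd(D_j, full_matrices=False)
    U_sub = U[:, :dim]
    return U_sub @ U_sub.T

def classify(d, projectors):
    """Select the class minimizing S_j(d) = ||d - P_j d|| and return
    its index together with the projection P_j d."""
    distances = [np.linalg.norm(d - P @ d) for P in projectors]
    j_hat = int(np.argmin(distances))
    return j_hat, projectors[j_hat] @ d

# Toy sizes with K > I; all data are synthetic.
rng = np.random.default_rng(0)
I, J, K = 3, 4, 10
class_matrices = [rng.standard_normal((K, I)) for _ in range(J)]
projectors = [class_projector(D_j, dim=I) for D_j in class_matrices]

# An input equal to a training image of class 2 lies in that subspace,
# so its distance to L_2 is zero up to rounding.
d = class_matrices[2][:, 1]
j_hat, a_hat = classify(d, projectors)
```

After the class is chosen, the remaining steps are the nearest-neighbor search and interpolation of Section 2.3, applied to the projected training images of that class.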