Vol. 1, No. 2, pp. 41–49 (July 2008)

Person-independent Monocular Tracking of Face and Facial Actions

Yusuke Sugano†1 and Yoichi Sato†1

This paper presents a monocular method of tracking faces and facial actions using a multilinear face model that treats interpersonal and intrapersonal shape variations separately. We created this method by integrating two different frameworks: particle filter-based tracking for time-dependent facial action and pose estimation, and incremental bundle adjustment for person-dependent shape estimation. This combination, together with multilinear face models, is the key to tracking faces and facial actions of arbitrary people in real time with no pre-learned individual face models. Experiments using real video sequences demonstrate the effectiveness of our method.

†1 Institute of Industrial Science, The University of Tokyo

1. Introduction

Estimating the 3D pose and facial actions of a face from monocular video is a key technology for applications such as human-computer interaction (HCI) and intelligent transport systems (ITS), and many model-based tracking methods have been proposed5),8),9),17),18). Most of them5),8),9),17), however, require a face model of the target person to be prepared or learned in advance, which limits their applicability to arbitrary users. Bregler et al.1) recover non-rigid 3D shape directly from image streams by extending Tomasi-Kanade factorization, and methods based on the AAM (Active Appearance Model)6),16) can fit a deformable model to unseen faces; Gross et al.6), however, report that such generic models perform considerably worse than person-specific ones.

© 2008 Information Processing Society of Japan
Zhu et al.16) recover non-rigid shape in real time via active appearance models, and Dornaika and Davoine4) track faces and facial actions with an appearance-based 3D model. Vlasic et al.13) propose a multilinear face model that separates interpersonal (shape) and intrapersonal (action) variations, and DeCarlo and Metaxas2) refine shape parameters during tracking using model-based optical flow residuals. Our method builds on these ideas: it tracks with a multilinear model while refining the person-dependent shape parameters online.

Fig. 1 System overview.

Figure 1 shows an overview of our method, which alternates two steps. In the estimation step, the 3D head pose and the facial action parameters of each frame are estimated with a particle filter while the person-dependent shape parameters are kept fixed. In the modeling step, the shape parameters are refined by incremental bundle adjustment over keyframes collected during tracking. As tracking proceeds, the shape estimate adapts to the target person, which in turn improves the pose and action estimation.
The remainder of this paper is organized as follows. Section 2 describes our multilinear face model, Section 3 describes shape estimation by incremental bundle adjustment, Section 4 describes the particle filter-based estimation of pose and facial actions, Section 5 reports experimental results, and Section 6 concludes the paper.

2. Multilinear Face Model

Our face model represents a face as the 3D positions of K feature points, stacked into a shape vector M ∈ R^3K (Fig. 2). To handle interpersonal and intrapersonal variations separately, we build a multilinear model by applying N-mode SVD (Singular Value Decomposition), proposed by Vasilescu and Terzopoulos11),12), to a data tensor T ∈ R^(3K×S×A) whose three modes correspond to feature points, shape (S persons), and action (A facial actions) (Fig. 3).

N-mode SVD decomposes the data tensor as

  T = C ×_feature U_feature ×_shape U_shape ×_action U_action  (1)
    = M ×_shape U_shape ×_action U_action,  (2)

where ×_i denotes the mode-i product, each orthogonal mode matrix U_i is obtained as the left singular vectors of the SVD of the mode-i unfolding of T, C ∈ R^(3K×S×A) is the core tensor, and M = C ×_feature U_feature.

Fig. 2 Example of facial deformation.
Fig. 3 Data tensor.

Truncating the shape and action mode matrices to Ǔ_shape and Ǔ_action yields the reduced model

  T ≈ M ×_shape Ǔ_shape ×_action Ǔ_action,  (3)

with the mode tensor M truncated accordingly.
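As a concrete illustration of the decomposition in Eqs. (1)–(2), the following numpy sketch computes each mode matrix from the SVD of the corresponding tensor unfolding and recovers the data tensor from the core. The helper names (`unfold`, `mode_product`, `n_mode_svd`) are my own, and a random tensor stands in for the real face data.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, U, mode):
    """Mode-n product T x_n U: multiply matrix U into axis `mode` of tensor T."""
    Tm = np.moveaxis(T, mode, 0)
    out = np.tensordot(U, Tm, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def n_mode_svd(T):
    """Return the mode matrices U_i (left singular vectors of each unfolding)
    and the core tensor C such that T = C x_0 U_0 x_1 U_1 x_2 U_2."""
    Us = [np.linalg.svd(unfold(T, n), full_matrices=False)[0]
          for n in range(T.ndim)]
    C = T
    for n, U in enumerate(Us):
        C = mode_product(C, U.T, n)  # project each mode onto its basis
    return Us, C

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 4, 3))   # toy (3K x S x A) data tensor
Us, C = n_mode_svd(T)
R = C
for n, U in enumerate(Us):
    R = mode_product(R, U, n)        # reassemble from the core
print(np.allclose(R, T))  # True: full (untruncated) mode matrices reconstruct T exactly
```

Truncating columns of the shape and action mode matrices before the reconstruction gives the reduced model of Eq. (3).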
The data tensor is built from the 3D facial feature point data of S persons performing A facial actions, collected in our previous work18). With the truncated model of Eq. (3), an arbitrary face shape is parameterized by a shape (person) parameter vector s ∈ R^S and an action parameter vector a ∈ R^A as

  M(a, s) = M̄ + M ×_shape s^T ×_action a^T,  (4)

where M̄ is the mean shape. Writing the truncated mode matrices as Ǔ_shape = (š_1, ..., š_S)^T and Ǔ_action = (ǎ_1, ..., ǎ_A)^T, the rows š_i and ǎ_i are the parameter vectors that reproduce the S training persons and A training actions. Their statistics (the standard deviation σ_s of {š_i}, and the mean ā and standard deviation σ_a of {ǎ_i}) are used later as priors on s and a. The pose of the face is denoted by p.

3. Shape Estimation by Incremental Bundle Adjustment

The person-dependent shape parameter s is estimated from multiple frames by bundle adjustment3),10),14). Since the whole image sequence is not available during online tracking, the adjustment is performed incrementally over a growing set of keyframes (Fig. 4).

Fig. 4 Flow of incremental bundle adjustment.

3.1 Cost Function
For the i-th keyframe, let p_i denote the 6-DOF head pose and a_i the action parameter, while the shape parameter s is shared by all keyframes. Substituting a_i and s into Eq. (4) gives the shape M_i(a_i, s), whose projection onto the image plane is

  m_i = P(p_i, M_i(a_i, s)),  (5)

where P is the camera projection function and m_i ∈ R^2K stacks the image coordinates of the K feature points. With m̂_i denoting the feature point positions actually observed in keyframe i, the cost over the keyframes is

  F_t = Σ_i ||m̂_i − m_i(p_i, a_i, s)||^2.  (6)
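The projection and cost of Eqs. (5)–(6) can be sketched as follows, assuming for illustration a normalized pinhole camera; `project`, `reprojection_cost`, and the toy `model` are stand-ins for the paper's actual implementation, not reproductions of it.

```python
import numpy as np

def project(p, M):
    """Stand-in for P(p, M) in Eq. (5): pinhole projection with focal length 1.
    p = (R, t) rotation matrix and translation; M = (K, 3) feature points."""
    R, t = p
    X = M @ R.T + t              # transform into camera coordinates
    return X[:, :2] / X[:, 2:3]  # perspective divide

def reprojection_cost(keyframes, s, model):
    """F_t of Eq. (6): sum of squared residuals over all keyframes.
    `model(a, s)` returns the (K, 3) shape of Eq. (4); names are illustrative."""
    F = 0.0
    for p_i, a_i, m_hat in keyframes:
        m_i = project(p_i, model(a_i, s))
        F += np.sum((m_hat - m_i) ** 2)
    return F

# Toy model: mean shape plus one shape basis scaled by s[0].
mean = np.array([[0., 0., 5.], [1., 0., 6.], [0., 1., 5.], [1., 1., 6.]])
basis = np.array([[0., 0., 1.]] * 4)
model = lambda a, s: mean + s[0] * basis

p = (np.eye(3), np.zeros(3))
a, s_true = np.zeros(1), np.array([0.5])
m_hat = project(p, model(a, s_true))       # "observed" points for this keyframe
keyframes = [(p, a, m_hat)]
```

With perfect observations the cost is zero at the true shape parameter and positive elsewhere, which is what the bundle adjustment exploits.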
At each frame t, the current frame f_t, together with the estimated pose p_t and action a_t, is added to the keyframe set in the manner of the incremental bundle adjustment of Zhang and Shan15), and the cost of Eq. (6) is minimized over the latest n keyframes.

3.2 Constrained Optimization
Minimizing Eq. (6) without constraints can move the parameters away from plausible faces, so we impose constraints C_pi, C_ai, and C_s and solve the problem with the LM (Levenberg-Marquardt) method7):

  min_{p_i},{a_i},s F_t,  subject to p_i ∈ C_pi, a_i ∈ C_ai, s ∈ C_s.  (7)

The pose of each keyframe is allowed to deviate from the tracked estimate p̂_i only by a margin λ_p:

  C_pi = {p_i | p̂_i − λ_p ≤ p_i ≤ p̂_i + λ_p}.  (8)

The shape parameter is restricted to within two standard deviations of the training samples,

  C_s = {s | s̄ − 2σ_s ≤ s ≤ s̄ + 2σ_s},  (9)

which covers about 95% of the samples under a Gaussian assumption; here s̄ and σ_s are the element-wise mean and standard deviation of {š_i}, and C_ai is defined in the same way from ā and σ_a. To stabilize tracking, the shape parameter actually used is the running average of the estimates s obtained so far:

  s^(t) = ((t − 1)/t) s^(t−1) + (1/t) s.  (10)

4. Estimation of Pose and Facial Actions

The estimation step in Fig. 1 consists of two sub-steps: a pose estimation step and a feature-point recalculation step.

4.1 Pose Estimation Step
Given the current shape estimate s^(t−1), Eq. (4) reduces to a model of pose p_t and action a_t only:

  M_t(a_t) = M̄ + M_t ×_action a_t^T,  (M_t = M ×_shape s^(t−1)T)  (11)

The state to be estimated at frame t is thus the (6 + A)-dimensional vector x_t = (p_t^T, a_t^T)^T. We estimate it with a particle filter that maintains N particles {(u^(i)_t; π^(i)_t)} (i = 1...N), each consisting of a (6 + A)-dimensional state hypothesis u^(i)_t and its weight π^(i)_t, propagated from the previous set {(u^(i)_{t−1}; π^(i)_{t−1})}.
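A minimal sketch of the box constraints of Eqs. (8)–(9) and the running average of Eq. (10); the helper names are illustrative, and the projection-onto-box step is one simple way to enforce such constraints inside an LM loop, not necessarily the paper's.

```python
def clamp(x, lo, hi):
    """Project a parameter vector onto a box constraint such as Eqs. (8)-(9)."""
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]

def update_running_average(s_avg, s_new, t):
    """Eq. (10): incremental mean of the shape estimates up to frame t (t >= 1)."""
    return [(t - 1) / t * m + v / t for m, v in zip(s_avg, s_new)]
```

Feeding the estimates 1.0, 2.0, 3.0 through `update_running_average` for t = 1, 2, 3 yields their mean 2.0, so the tracker sees a smoothed shape parameter rather than each raw per-step estimate.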
Each particle is propagated as

  u^(i)_t = u_{t−1} + τ v_{t−1} + ω,  (12)

where u_{t−1} is drawn from the previous particle set {(u^(i)_{t−1}; π^(i)_{t−1})} by weighted resampling, v_{t−1} is the velocity of the estimated state x at frame t − 1 (the velocity of the action components a_t is set to 0), τ is a coefficient, and ω is zero-mean Gaussian noise as in our previous method18); for the action components, the standard deviation of ω is set to κσ_a with κ = 0.2. The weight of each particle is then computed as

  π^(i)_t ∝ exp( −(K − N(u^(i)_t))^2 / (2σ^2) ) exp( −(1/2) Σ_{b=1}^{A} ((a^(i)_{t,b} − ā_b) / ς_b)^2 ),  (13)

where N(u^(i)_t) is the number of the K feature points whose templates are successfully matched around the positions predicted by u^(i)_t (we use σ = 1.0), and a^(i)_{t,b} is the b-th action component of particle i. The second factor is a prior that keeps the action parameters a^(i)_t close to the training mean ā, with ς_b derived from σ_a. The state estimate x_t is obtained as the weighted mean of the particles {(u^(i)_t; π^(i)_t)}, and the initial state x_0 is given by face detection with the OKAO vision library.

Fig. 5 Finding true feature points.

4.2 Feature-point Recalculation Step
The feature point positions m_t obtained by projecting the estimated state of Section 4.1 through Eq. (5) are not accurate enough to refine the face shape. We therefore recalculate the observed positions m̂_t (Fig. 5) in the manner of Gokturk et al.5), by minimizing the energy

  E_t = ρ { ||Î_t − Î_{t−1}||^2 + ||Î_t − Î_1||^2 } + ɛ ||m̂_t − m_t||^2,  (14)

where Î_t denotes the image intensities sampled in a 16 × 16 ROI around each of the K feature points m̂_t. The first term keeps the appearance of each point consistent with the previous frame and with the first frame, the second term keeps m̂_t close to the prediction m_t, and ρ and ɛ balance the two terms (we use ɛ = 4000).

5. Experiments

We compared our method with particle filter-based tracking that uses a generic face model18).
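The propagation and weighting of Eqs. (12)–(13) might be sketched as below. The resampling scheme and function names are illustrative assumptions, and the template matching count `n_matched` (the N(u^(i)_t) of Eq. (13)) is taken as given rather than computed from images.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(particles, weights, v_prev, tau, noise_std):
    """Eq. (12): resample by weight, then drift by velocity and Gaussian noise.
    particles: (N, D) state hypotheses; weights: (N,) normalized weights."""
    N = len(particles)
    idx = rng.choice(N, size=N, p=weights)  # weighted resampling step
    return particles[idx] + tau * v_prev + rng.normal(0.0, noise_std,
                                                      particles.shape)

def weight(n_matched, K, actions, a_mean, a_std, sigma=1.0):
    """Eq. (13): matching likelihood times a Gaussian prior on the actions.
    n_matched: (N,) count of successfully matched templates per particle."""
    w = np.exp(-((K - n_matched) ** 2) / (2 * sigma ** 2))
    w *= np.exp(-0.5 * np.sum(((actions - a_mean) / a_std) ** 2, axis=1))
    return w / w.sum()  # normalize so the weights form a distribution
```

A particle that matches all K templates receives a much larger weight than one matching only half of them, which is what concentrates the filter around the true pose.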
Table 1 Comparison of estimation errors.

                                        x [mm]  y [mm]  z [mm]  roll [deg.]  pitch [deg.]  yaw [deg.]
Particle filter-based       Mean         6.14    4.71   51.32      0.34         6.54          3.34
estimation using the
generic PCA model           Std. Dev.    4.88    4.09   38.29      0.29         4.71          2.73
Our method using the        Mean         3.26    4.37   20.18      0.41         3.12          2.33
multilinear model           Std. Dev.    2.62    2.83   11.18      0.27         2.49          1.98

The baseline is a particle filter-based method that uses a generic PCA face model built from the same training data. All experiments were run on a PC with an Intel Core 2 Duo E6700 CPU and 3.0 GB of memory running Windows XP, with 640 × 480 input images captured by an IEEE1394 camera; the test sequence is 1800 frames long. We used 16 × 16 templates, N = 1000 particles, and n = 7 keyframes for the incremental bundle adjustment, optimized with the LM method. One modeling step takes about 90 [ms], and the estimation step runs at about 32 [ms/frame], so the method tracks in real time.

Fig. 6 Result images: the right column shows actual estimation results of our method using the multilinear model, and the center column shows results of the generic model-based method. The left column shows these results rendered from a different viewpoint.

Table 1 and Figs. 6 and 7 summarize the results. Compared with the generic PCA model, our method using the multilinear model reduces the mean translation error, most notably in the depth direction (from 51.32 mm to 20.18 mm), as well as the mean pitch error, because the person-dependent shape is progressively refined during tracking.

6. Conclusion

We presented a person-independent monocular method for tracking faces and facial actions in real time by combining particle filter-based pose and action estimation with incremental bundle adjustment of the person-dependent shape in a multilinear face model. Experiments with real video sequences demonstrated that the method outperforms tracking with a generic face model without requiring any pre-learned individual face model.
Fig. 7 Estimation results: x, y, and z are the horizontal, vertical, and depth-directional translation, and roll, pitch, and yaw are the rotation around the z, y, and x axes, respectively. The bottom graph shows the facial shape estimation error in the model coordinate system.

References

1) Bregler, C., Hertzmann, A. and Biermann, H.: Recovering Non-rigid 3D Shape from Image Streams, Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Vol.2, pp.690–696 (2000).
2) DeCarlo, D. and Metaxas, D.: Adjusting Shape Parameters Using Model-based Optical Flow Residuals, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.24, No.6, pp.814–823 (2002).
3) Del Bue, A., Smeraldi, F. and Agapito, L.: Non-rigid Structure from Motion Using Non-parametric Tracking and Non-linear Optimization, Proc. IEEE Workshop on Articulated and Non-Rigid Motion, Vol.1 (2004).
4) Dornaika, F. and Davoine, F.: On Appearance Based Face and Facial Action Tracking, IEEE Trans. Circuits and Systems for Video Technology, Vol.16, No.9, pp.1107–1124 (2006).
5) Gokturk, S.B., Bouguet, J.Y. and Grzeszczuk, R.: A Data-driven Model for Monocular Face Tracking, Proc. IEEE Int. Conf. Computer Vision, Vol.2, pp.701–708 (2001).
6) Gross, R., Matthews, I. and Baker, S.: Generic vs. Person Specific Active Appearance Models, Image and Vision Computing, Vol.23, No.11, pp.1080–1093 (2005).
7) Lourakis, M.I.A.: levmar: Levenberg-Marquardt Nonlinear Least Squares Algorithms in C/C++ (2004). http://www.ics.forth.gr/~lourakis/levmar/
8) Matthews, I. and Baker, S.: Active Appearance Models Revisited, Int. J. Computer Vision, Vol.60, No.2, pp.135–164 (2004).
9) Munoz, E., Buenaposada, J.M. and Baumela, L.: Efficient Model-based 3D Tracking of Deformable Objects, Proc. IEEE Int. Conf. Computer Vision, pp.877–882 (2005).
10) Vacchetti, L., Lepetit, V. and Fua, P.: Stable Real-time 3D Tracking Using Online and Offline Information, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.26, No.10, pp.1380–1384 (2004).
11) Vasilescu, M.A.O. and Terzopoulos, D.: Multilinear Analysis of Image Ensembles: TensorFaces, Proc. European Conf. Computer Vision, pp.447–460 (2002).
12) Vasilescu, M.A.O. and Terzopoulos, D.: Multilinear Image Analysis for Facial Recognition, Proc. Int. Conf. Pattern Recognition (ICPR '02), Vol.2, pp.511–514 (2002).
13) Vlasic, D., Brand, M., Pfister, H. and Popovic, J.: Face Transfer with Multilinear Models, ACM Trans. Graphics (Proc. ACM SIGGRAPH 2005), Vol.24, No.3, pp.426–433 (2005).
14) Xin, L., Wang, Q., Tao, J., Tang, X., Tan, T. and Shum, H.: Automatic 3D Face Modeling from Video, Proc. IEEE Int. Conf. Computer Vision, Vol.2, pp.1193–1199 (2005).
15) Zhang, Z. and Shan, Y.: Incremental Motion Estimation through Modified Bundle Adjustment, Proc. IEEE Int. Conf. Image Processing, Vol.2, pp.343–346 (2003).
16) Zhu, J., Hoi, S.C.H. and Lyu, M.R.: Real-time Non-rigid Shape Recovery via Active Appearance Models for Augmented Reality, Proc. 9th European Conf. Computer Vision, pp.186–197 (2006).
17) Zhu, Z. and Ji, Q.: Robust Real-time Face Pose and Facial Expression Recovery, Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp.681–688 (2006).
18) Vol.47, No.SIG10 (CVIM15), pp.185–194 (2006) (in Japanese).

(Received September 21, 2007)
(Accepted March 10, 2008)