Pose Estimation by Regression Analysis with Depth Information

Yoshiki Agata 1 and Hironobu Fujiyoshi 1

A method for estimating the pose of a human from a depth image by using regression analysis is proposed. With conventional pose estimation methods that use appearance features, it is sometimes difficult to obtain correct results because only two-dimensional information is used. The proposed method uses depth information acquired from a TOF camera to achieve highly accurate pose estimation. For effective use of the depth information, we propose the Depth Difference Feature (DDF). Because the DDF is calculated as the difference in the average depth of two regions, it can be used to distinguish the body from occluding objects and from the background behind the body. A comparison with the results obtained by the conventional method using appearance features (D-HOG and HOG features) confirmed that the mean recall of the proposed method was 0.07 better than D-HOG and 0.16 better than HOG.

1. Introduction

[Japanese body text not recoverable from this copy. The introduction surveys appearance-based pose estimation using HOG features 1)2)3)4) and depth- and example-based methods 5)6)7)8), including the work of Shotton et al. 5), and outlines the organization of Sections 2-4.]

1 Chubu University
c 2011 Information Processing Society of Japan
Fig. 1 The flow of the proposed method.
Fig. 2 3D human model.

2. Related Work

[Japanese body text not recoverable. Sec. 2.1 discusses tracking-based approaches 1)9)10), Sec. 2.2 discusses detection- and learning-based approaches 3)4)7) and 5)11)12)13)14), and Sec. 2.3 summarizes the position of this work.]

3. Proposed Method

[Japanese body text not recoverable. The proposed method estimates a 3D human pose from a single TOF depth image by regression, following the flow shown in Fig. 1.]
Fig. 3 Examples of training samples.
Fig. 4 Division into blocks.

3.1 Training Samples
Training samples are generated from a 3D human model with 19 joints (Fig. 3). Each joint position is a 3D coordinate (x, y, z), so a pose is represented by a 19 x 3 = 57-dimensional vector. Samples are generated at distances from 1 m to 4 m.

3.2 Depth Difference Feature
The depth image is acquired with a TOF (time-of-flight) camera, which measures distance from the round-trip time of emitted LED light. We use the MESA SR-4000, whose measurable range is 0.3 m to 5.0 m and whose resolution is 176 x 144 pixels. The detection window of 64 x 128 pixels 15) is divided into 16 x 16-pixel blocks (Fig. 4), giving M = 32 blocks. The Depth Difference Feature D(i, j) is the difference between the average depths of blocks i and j (Fig. 5):

D(i, j) = \frac{1}{N}\sum_{n=1}^{N} d_n^i - \frac{1}{N}\sum_{n=1}^{N} d_n^j \qquad (1)

where N is the number of pixels in a block and d_n^i is the depth of the n-th pixel in block i. The feature vector is D = {D(i, j)}, i = 1, 2, ..., M-1, j = 2, 3, ..., M (i < j).
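The block division and Eq. (1) above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code; the function name and the random test window are hypothetical.

```python
import numpy as np

def depth_difference_feature(depth_window, block=16):
    """Compute Depth Difference Features for one detection window.

    The window (e.g. 128 x 64 pixels, height x width) is divided into
    non-overlapping `block` x `block` cells; the feature is the difference
    of mean depth between every block pair (i < j), as in Eq. (1).
    """
    h, w = depth_window.shape
    # Mean depth of each block, scanned in row-major order.
    means = np.asarray([
        depth_window[r:r + block, c:c + block].mean()
        for r in range(0, h, block)
        for c in range(0, w, block)
    ])
    # All pairwise differences D(i, j) with i < j.
    i, j = np.triu_indices(len(means), k=1)
    return means[i] - means[j]

# A 64 x 128 window with 16 x 16 blocks gives M = 32 blocks and
# C(32, 2) = 496 feature dimensions, matching Sec. 3.3.
feat = depth_difference_feature(np.random.rand(128, 64))
print(feat.shape)  # (496,)
```

Because each component is a difference of two local depth averages, the feature is invariant to a constant depth offset of the whole body, which is what lets it separate the body from the background and from occluders.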
Fig. 5 Depth difference feature.
Fig. 6 Examples of difficult situations for the conventional method.

3.3 Learning by Regression Analysis
Each training sample pairs a 496-dimensional feature vector x = (x_1, x_2, ..., x_496) with a 57-dimensional pose vector y = (y_1, y_2, ..., y_57). For n training samples, let X = (x_1, x_2, ..., x_n)^T be the n x 496 feature matrix and Y = (y_1, y_2, ..., y_n)^T the n x 57 pose matrix. The coefficient matrix A is obtained by least squares:

A := \arg\min_A \| XA - Y \|^2 \qquad (2)

which has the closed-form solution

A = (X^T X)^{-1} X^T Y \qquad (3)

where A is a 496 x 57 matrix.

3.4 Pose Estimation
At estimation time, the feature vector x is computed from the depth image acquired by the TOF camera, and the 57-dimensional pose is estimated with the learned coefficient matrix A:

\hat{y} = A^T x \qquad (4)

4. Experiments
4.1 Evaluation Measure
Accuracy is evaluated by recall (Fig. 7):

\mathrm{Recall} = \frac{\text{true positive}}{\text{total pixel}} \qquad (5)

A recall of 1 indicates perfect agreement with the ground truth.

4.2 Effect of the Number of Training Samples
[Japanese body text largely not recoverable.] The number of training samples was varied from 100 to 1,000 in steps of 100, and the estimation accuracy was evaluated at each step.
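The training of Eqs. (2)-(3) and the estimation of Eq. (4) amount to a few lines of linear algebra. The sketch below uses random stand-in data, since the paper's training set is not available, and solves the least-squares problem with `np.linalg.lstsq` rather than the explicit inverse of Eq. (3), which is the numerically preferable equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: n samples of 496-dim DDF features (X)
# paired with 57-dim joint-position vectors (Y), as in Sec. 3.3.
n = 900
X = rng.standard_normal((n, 496))   # n x 496 feature matrix
Y = rng.standard_normal((n, 57))    # n x 57 pose matrix (19 joints x 3)

# Eqs. (2)-(3): least-squares coefficient matrix A (496 x 57).
A, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Eq. (4): pose estimation for a new feature vector x.
x = rng.standard_normal(496)
y_hat = A.T @ x                     # 57-dim estimated pose
print(y_hat.shape)  # (57,)
```

The estimated vector would then be reshaped to 19 rows of (x, y, z) joint coordinates.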
Fig. 7 Evaluation method.
Fig. 8 Precision for number of training samples.

The estimation error is also evaluated per joint. For the 19 estimated joint positions (x', y', z') and the ground-truth positions (x, y, z), the mean error E is

E = \frac{1}{N}\sum_{n=1}^{N} \sqrt{(x_n - x'_n)^2 + (y_n - y'_n)^2 + (z_n - z'_n)^2} \qquad (6)

where N = 19. Fig. 8 shows the accuracy for each number of training samples.

Table 1 Average recall of each action.

  Action | DDF  | D-HOG | HOG
  WAVE   | 0.76 | 0.67  | 0.63
  WALK   | 0.70 | 0.60  | 0.53
  RUN    | 0.59 | 0.57  | 0.41
  Mean   | 0.68 | 0.61  | 0.52

4.3 Comparison with Appearance Features (HOG, D-HOG)
The proposed feature (DDF) was compared, using 900 training samples, with HOG features computed from the intensity image (HOG) and from the depth image (D-HOG) for three actions: hand-waving, walking, and running. Figs. 9-11 show examples of estimated poses, and Figs. 12-14 compare the recall of the three features. As shown in Table 1, DDF achieved a higher recall than both appearance features for every action: for hand-waving, 0.13 higher than HOG and 0.09 higher than D-HOG; for walking, 0.17 higher than HOG and 0.1 higher than D-HOG; for running, 0.17 higher than HOG and 0.01 higher than D-HOG. Fig. 15 shows estimated poses for HOG, D-HOG, and DDF.
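Under the assumption that "total pixel" in Eq. (5) counts the ground-truth body pixels (the usual definition of recall), the two evaluation measures can be sketched as follows; the function names and toy masks are hypothetical.

```python
import numpy as np

def recall(pred_mask, gt_mask):
    """Eq. (5): fraction of ground-truth pixels labelled correctly."""
    true_positive = np.logical_and(pred_mask, gt_mask).sum()
    total_pixel = gt_mask.sum()
    return true_positive / total_pixel

def mean_joint_error(est, gt):
    """Eq. (6): mean Euclidean distance over N joints.

    est, gt: (N, 3) arrays of (x, y, z) joint positions.
    """
    return np.linalg.norm(est - gt, axis=1).mean()

gt = np.zeros((4, 4), bool); gt[:2] = True     # 8 ground-truth pixels
pred = np.zeros((4, 4), bool); pred[0] = True  # 4 of them recovered
print(recall(pred, gt))                        # 0.5
```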
IPSJ SIG Technical Report

Fig. 9 Examples of estimated pose for hand-waving.
Fig. 10 Examples of estimated pose for walking.
Fig. 11 Examples of estimated pose for running.
Fig. 12 Precision for hand-waving.
Fig. 13 Precision for walking.
Fig. 14 Precision for running.
Fig. 15 Examples of estimated pose for each feature.

5. Conclusion

[Japanese body text not recoverable.] We proposed a pose estimation method based on regression analysis with the Depth Difference Feature computed from TOF depth images. In the evaluation, the mean recall of the proposed method was 0.07 better than D-HOG and 0.16 better than HOG.
References
1) [Japanese title], MIRU, pp.70-77 (2006).
2) [Japanese title mentioning HOG and 3D], MIRU, pp.960-965 (2008).
3) [Japanese title on 3D pose], MIRU, pp.589-594 (2010).
4) [Japanese title mentioning tree-based filtering], MIRU, pp.63-69 (2006).
5) Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. and Blake, A.: Real-Time Human Pose Recognition in Parts from Single Depth Images, CVPR (2011).
6) Luo, X., Berendsen, B., Tan, R.T. and Veltkamp, R.C.: Human Pose Estimation for Multiple Persons Based on Volume Reconstruction, ICPR (2010).
7) Baysal, S., Kurt, M.C. and Duygulu, P.: Recognizing Human Actions Using Key Poses, ICPR (2010).
8) Jiang, H.: 3D Human Pose Reconstruction Using Millions of Exemplars, ICPR (2010).
9) Deutscher, J., Blake, A. and Reid, I.: Articulated Body Motion Capture by Annealed Particle Filtering, CVPR, pp.126-133 (2000).
10) Ye, L., Zhang, Q. and Guan, L.: Use Hierarchical Genetic Particle Filter to Figure Articulated Human Tracking, ICME, pp.1561-1564 (2008).
11) Andriluka, M., Roth, S. and Schiele, B.: Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, CVPR, pp.1014-1021 (2009).
12) Bissacco, A., Yang, M.H. and Soatto, S.: Fast Human Pose Estimation Using Appearance and Motion via Multi-dimensional Boosting Regression, CVPR, pp.1-8 (2007).
13) Ferrari, V., Marin-Jimenez, M. and Zisserman, A.: Pose Search: Retrieving People Using Their Pose, CVPR, pp.1-8 (2009).
14) Xia, X., Yang, W., Li, H. and Zhang, S.: Part-based Object Detection Using Cascades of Boosted Classifiers, ACCV, pp.556-565 (2009).
15) [Japanese reference], pp.355-364 (2010).