NAIST-IS-MT0151097, February 7, 2003
Wearable Virtual Tablet: Fingertip Drawing Device Using a Stereo Camera

Toshiyuki Matsubara

Abstract

In this thesis, we propose the Wearable Virtual Tablet (WVT), which acquires 3D information by employing a pair of head-mounted cameras. Because a human fingertip has no texture, however, it is difficult to 1) establish correspondences between feature points in the two images observed by the different cameras and 2) reconstruct the 3D position of a feature point from such a correspondence by stereo vision. We therefore estimate the position and posture of a finger by comparing a 3D finger model with a 3D volumetric region obtained by the Volume Intersection Method. Because 1) the distance between the two cameras mounted on the head is very short and 2) they look in almost the same direction, the reconstructed volume is elongated along the viewing direction of the cameras. To avoid confusing the true and false shapes of the finger contained in the reconstructed volume, we search for the finger region based on the contour information of the finger detected in the observed images. The developed system works well without being affected by variations in illumination, even in outdoor scenes.

Keywords: Wearable Computer, Input Device, Stereo Camera, Volume Intersection Method

Master's Thesis, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-MT0151097, February 7, 2003.
1. 1.1 PDA Personal Digital Assistant ON [1] 1
[2] [3] [4] [5] [6] [6] Head Mounted Display HMD 3 3 1 2 2
3 1 2 3
1.2 3 [6] 3 3 1.2.1 3 3 [9] 3 3 3 3 3 3 4
3 PC PC 3 3 1.2.2 [10] 60cm 5 4 CCD 3 4 4 5 CCD 2 3 3 3 [19] 5
[Figure 6. Labels: camera 1, camera 2, camera 3; image 1, image 2, image 3; object; object silhouette; visual cone 1, visual cone 2; visual hull]
6 3 PC
1.3 3
3 2 3 ( 7 ) 7 3 2 3 3 3 3 3 visual cone 3 3 3 7
3 3 [6] 3 [6] 8
2. 2.1 3 3 3 3 2.2 HMD CANON VH-2002 VH-2002 HMD 8 HMD 9 [6] LCD VH-2002 LCD LCD 1 HMD 1 200[msec] 9
2 8 CANON VH-2002 9 CANON VH-2002 2.3 2.3.1 3 3 3 3 2 2 3 Tsai [11] 2 51 10
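2 2 [12]
2.3.2 RGB
The RGB images captured by the two CCD cameras are converted to grayscale with the NTSC weighting of equation (1):

\[ \mathrm{Gray} = 0.299R + 0.587G + 0.114B \tag{1} \]

where Gray is the gray level and R, G, B are the red, green, and blue values of a pixel.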
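Equation (1) is a weighted sum of the three colour channels. The following is a minimal sketch of that conversion, assuming an image stored as rows of (R, G, B) tuples; the function names are illustrative and do not come from the thesis.

```python
def rgb_to_gray(r, g, b):
    """Gray level of one pixel using the NTSC weights of equation (1)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def image_to_gray(image):
    """Convert rows of (R, G, B) tuples into rows of gray levels."""
    return [[rgb_to_gray(r, g, b) for (r, g, b) in row] for row in image]
```

Because the three weights sum to one, a pure white pixel keeps its full intensity after conversion.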
2.4 10 10 3 3 3 0 12
10 10 3 3 10 3 10 10 3 3 4 3 4.4 13
3. 3 3 3 3 3 3 3 3
3.1 3 3 3 3 3 3 3 3
3.1.1 Kanade-Lucas-Tomasi Feature Tracker [13] KLT KLT X Y (x, y) X Y g

\[ g = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)^{T} \tag{2} \]

\[ G = \int_{W} g\, g^{T} w \, dA \tag{3} \]
w W G W G 3 KLT 3 KLT 11,12 11 12 13,14 13 13 15
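The matrix G of equations (2) and (3) can be accumulated numerically over a window W. The sketch below assumes a grayscale image stored as a list of rows, central-difference gradients, and a uniform weight w = 1. Scoring a window by the smaller eigenvalue of G follows Shi and Tomasi [13]; whether the thesis uses exactly this score is an assumption.

```python
import math


def gradient(img, x, y):
    """Central-difference gradient g = (dI/dx, dI/dy) at pixel (x, y)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def structure_matrix(img, cx, cy, half=1):
    """Sum g g^T over the (2*half+1)^2 window centred at (cx, cy), with w = 1.

    Returns the three distinct entries (Gxx, Gxy, Gyy) of the symmetric matrix G.
    """
    gxx = gxy = gyy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx, gy = gradient(img, x, y)
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
    return gxx, gxy, gyy


def min_eigenvalue(gxx, gxy, gyy):
    """Smaller eigenvalue of the 2x2 matrix [[gxx, gxy], [gxy, gyy]]."""
    mean = (gxx + gyy) / 2.0
    diff = (gxx - gyy) / 2.0
    return mean - math.sqrt(diff * diff + gxy * gxy)
```

A window on a straight edge gives a smaller eigenvalue near zero, while a corner gives two large eigenvalues, which is why corners are easier to track.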
13 0 [frame] 14 1 [frame] 3.1.2 3 Step1. : Step2. : 3 2 Step3. : 16
2 CCD RGB (1) $(x_l, y_l)$ $W_l$ $(x_r, y_r)$ $W_r$ SSD (Sum of Squared Difference) (4)

\[ \mathrm{SSD} = \sum_{P} \sum_{Q} \left( I(x_l, y_l) - I(x_r, y_r) \right)^{2} \tag{4} \]

$I(x, y)$ $(x, y)$ $P, Q$ $W_l$ $W_r$ [14] $W$ $W_{sub}$ SSD $W$ $W_{sub}$
Step4. : (5) [18]
\[ s(i, j) = \frac{1}{c\,|i - j|\,\sqrt{2\pi}} \exp\left( -\frac{|d_j - d_i|^{2}}{2c^{2}\,|i - j|^{2}} \right) \tag{5} \]

$i$ $d_i$ $i$ $d_j$ $c$ (5) disparity gradient $|d_j - d_i| / |i - j|$ $i$ $j$
Step5. : (5) 15 16 3 17 15 ( ) 16 15 16 17 17
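The window comparison of equation (4) and the support function of equation (5) can be sketched as follows. Searching only along the same image row (rectified cameras), scalar disparities, and the default c = 1 are assumptions made for illustration; all function names are hypothetical.

```python
import math


def ssd(left, right, xl, yl, xr, yr, half=1):
    """Equation (4): SSD between windows centred at (xl, yl) and (xr, yr)."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[yl + dy][xl + dx] - right[yr + dy][xr + dx]
            total += diff * diff
    return total


def best_match_on_row(left, right, xl, yl, half=1):
    """Column in the right image (same row) whose window has the smallest SSD."""
    width = len(right[0])
    candidates = range(half, width - half)
    return min(candidates, key=lambda xr: ssd(left, right, xl, yl, xr, yl, half))


def support(pi, di, pj, dj, c=1.0):
    """Equation (5): support that match j (position pj, disparity dj) gives match i."""
    dist = math.hypot(pi[0] - pj[0], pi[1] - pj[1])
    disparity_gradient = abs(dj - di) / dist
    gauss = math.exp(-disparity_gradient ** 2 / (2.0 * c * c))
    return gauss / (c * dist * math.sqrt(2.0 * math.pi))
```

Neighbouring matches with similar disparities (a small disparity gradient) support each other strongly, so an isolated match whose disparity disagrees with its neighbours receives little support.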
17 19
3.1.3 3 3 18 L, R $P(X, Y, Z)$ $(x_l, y_l)$, $(x_r, y_r)$ L 3

\[ X = \frac{b f_r x_l}{f_r x_l - f_l x_r} \tag{6} \]

\[ Y = \frac{b f_r y_l}{f_r x_l - f_l x_r} \tag{7} \]

\[ Z = \frac{b f_r f_l}{f_r x_l - f_l x_r} \tag{8} \]

$f_l, f_r$ $b$ 18
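Equations (6) to (8) translate directly into code. The sketch below assumes image coordinates measured from each camera's principal point, focal lengths f_l and f_r in pixels, a baseline b along the X axis, and a result expressed in the left camera's coordinate system. The function name is illustrative.

```python
def triangulate(xl, yl, xr, fl, fr, b):
    """3D point (X, Y, Z) from a stereo correspondence, equations (6)-(8).

    xl, yl : image coordinates in the left camera
    xr     : horizontal image coordinate in the right camera
    fl, fr : focal lengths of the left and right cameras
    b      : baseline length between the two cameras
    """
    denom = fr * xl - fl * xr
    if denom == 0:
        raise ValueError("zero disparity: the point is at infinity")
    x = b * fr * xl / denom
    y = b * fr * yl / denom
    z = b * fr * fl / denom
    return x, y, z
```

With equal focal lengths the depth reduces to Z = b f / (x_l - x_r), so a short baseline b yields small disparities and makes depth sensitive to pixel-level errors.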
3.1.4 3 3 3 3 19 19 20 20 3 3 21
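20 3 3 3 LMedS (Least Median of Squares) [15] 3 LMedS (9) LMedS (12) 5 LMedS

\[ \mathrm{LMedS} = \min \operatorname{med}_{i} \varepsilon_i^{2} \tag{9} \]

$\varepsilon_i$ 3 3

\[ \varepsilon_i = \frac{|a x_i + b y_i + c z_i + d|}{\sqrt{a^{2} + b^{2} + c^{2}}} \tag{10} \]

$ax + by + cz + d = 0$ 3 $(x_i, y_i, z_i)$ 3 LMedS 3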
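1. 3 3
2. 3 3
3. LMedS

$q$ 1 $P$

\[ P = 1 - \{ 1 - (1 - \varepsilon)^{F} \}^{q} \tag{11} \]

$\varepsilon$ $F$ $F = 3$, $\varepsilon = 0.3$, $P = 0.01$, $q$ 11 LMedS $\varepsilon_i$

\[ \hat{\sigma} = C \sqrt{\operatorname{med}_{i} \varepsilon_i^{2}} \tag{12} \]

$C = 1.4826$ [16] $2.5\hat{\sigma}$ $\varepsilon_i$ [16] 3 3 3 3 3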
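The LMedS plane fit of equations (9) to (12) can be sketched as follows: draw three points at random, fit a plane, score it by the median of the squared residuals, keep the best of q trials, and then reject points whose residual exceeds 2.5 times the robust scale estimate. Reading the value 0.01 as the accepted probability that none of the q samples is outlier-free is an assumption; it is the reading under which ε = 0.3 and F = 3 give q = 11. All function names are hypothetical.

```python
import math
import random
import statistics


def num_samples(outlier_ratio, sample_size, failure_prob):
    """Smallest q for which {1 - (1 - eps)^F}^q <= failure_prob."""
    bad_sample = 1.0 - (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(failure_prob) / math.log(bad_sample))


def plane_from_points(p1, p2, p3):
    """Unit-normal plane (a, b, c, d) through three points, or None if collinear."""
    u = [p2[k] - p1[k] for k in range(3)]
    v = [p3[k] - p1[k] for k in range(3)]
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        return None
    a, b, c = (comp / norm for comp in n)
    return a, b, c, -(a * p1[0] + b * p1[1] + c * p1[2])


def residual(plane, p):
    """Equation (10); the normal is already unit length."""
    a, b, c, d = plane
    return abs(a * p[0] + b * p[1] + c * p[2] + d)


def lmeds_plane(points, q, rng):
    """Return (plane, inliers) minimising the median squared residual, equation (9)."""
    best = None
    for _ in range(q):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate (collinear) sample
        med = statistics.median(residual(plane, p) ** 2 for p in points)
        if best is None or med < best[0]:
            best = (med, plane)
    if best is None:
        raise ValueError("no non-degenerate sample was drawn")
    med, plane = best
    sigma = 1.4826 * math.sqrt(med)  # equation (12)
    inliers = [p for p in points if residual(plane, p) <= 2.5 * sigma]
    return plane, inliers
```

For example, `num_samples(0.3, 3, 0.01)` evaluates to 11, matching the value of q quoted with equation (11).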
3.2 HMD 3 2 3 3 KLT 3 3 3 3 3 3 3 3 3 3 3 3 3 3 21 21 3 3 0[frame] 21(a) 3 21(b) 21 a 3 3 3 24
3 21(c),(d) 21(e) 3 21(f) 3 3 3 25
(a) 0 [frame] (b) 1 [frame] (c) 9 [frame] (d) 20 [frame] (e) 33 [frame] (f) 34 [frame] 21 26
4. 3 3 3 3 3 3 3 [6] CCD RGB 3 3 3 1. (4.1 ): 2. (4.2 ): 3. 3 (4.3 ): 3 3 27
4.1 3 3 k- [17] 22 K- 23 2 28
( 22) 22 RGB 23 CCD 24 24 5 5[pixel] 4 4 23 25 24 25 29
26 27 (a) 26 1 (b) (a) 27 2 (b) 27 K- 30
4.2 27 28 28 31
4.2.1 2 100 7 29 29 4.2.2 32
30 30 30 33
(a) 0 [frame] (b) 1 [frame] (c) 9 [frame] (d) 20 [frame] 30 34
4.3 3 3 2 3 3 3 3 visual hull visual hull visual hull 35
4.3.1 3 ATM ( 31 ) ( 32 ) 31 32 33 33 34 20 20pixel W tips 36
W tips W tips W tips 20cm 40cm 33 34 37
4.3.2 3 3 3 3 3 3 3 3 3 3 3 $t$ $V$ (13)

\[ V = \frac{V_1 - V_2}{t_1 - t_2}\,(t - t_2) + V_2 \tag{13} \]

$(t_2, V_2)$ 2 $(t_1, V_1)$ 1
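Equation (13) is a straight line through two earlier samples (t_1, V_1) and (t_2, V_2), evaluated at time t. A minimal sketch follows, assuming V is a scalar (a vector quantity would be handled one component at a time); the function name is illustrative.

```python
def predict_linear(t, t1, v1, t2, v2):
    """Value at time t on the line through (t1, v1) and (t2, v2), equation (13)."""
    if t1 == t2:
        raise ValueError("the two samples must have different time stamps")
    return (v1 - v2) / (t1 - t2) * (t - t2) + v2
```

For samples taken at frames 1 and 2 with values 10 and 14, the prediction for frame 3 continues the same slope and gives 18.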
4.3.3 3 3 visual cone visual hull 35 visual hull HMD visual hull 2 visual hull 35 36 HMD visual hull [mm] finger visual hull camera 1 camera 2 35 36 visual hull 39
visual hull visual hull visual hull ( 37 visual hull TLL TLR TRL finger TRR visual cone L visual cone R 37 visual hull 4.3.4 3 visual cone 37 visual hull visual hull visual cone 3 2 visual cone 2 37 T LL T RL T LR T RR 37 visual cone visual cone 2 visual cone visual cone 40
visual cone 3 3 3 visual cone 3 38 40 3 3 2 visual cone 3 39 40 center of intersection 3 3 outline of a finger outline Left outline Right center of intersection intersection Left finger intersection Right visual cone L visual cone R 38 39 3 40 outline Left L outline Right L, Plane Left L Plane Right L outline Left R,outline Right R,Plane Left R,Plane Right R 41
finger center of intersection intersection Left 3D Projected Plane LeftL 3D Projected PlaneRightL image L intersection Right 3D Projected Plane LeftR 3D Projected Plane RightR image R outline Left L outline Left R Projection CenterL outline Right R outline Right R Projection CenterR camera L camera R 40 visual cone Plane Left L Plane Left R (3 ) intersection Left intersection Right 42
4.3.5 visual cone 4.3.1 2 α β CCD α,β 3 41 offset fingertip true line of a finger false line of a finger offset false outline of a finger outline of a finger true outline of a finger 41 42 LMedS LmedS 42 LMedS 43
$(x_i, y_i)$ $(\min(x_i) < X < \max(x_i))$, $(\min(y_i) < Y < \max(y_i))$ $(x_i, y_i)$ (14) $i$ $d$ $(x_k, y_k)$ $D$ = (14) 0 $(x_k, y_k)$ $k = 1$ $d$ (14) (15)

\[ \varepsilon_i = \frac{|a x_i + b y_i + c|}{\sqrt{a^{2} + b^{2}}} + D \tag{15} \]

$ax + by + c = 0$ $|a x_i + b y_i + c| / \sqrt{a^{2} + b^{2}}$ (15) LMedS $D$ 42 43 43
4.3.6 3 40 3 2 3 P 1(x 1,y 1,f),P2(x 2,y 2,f) O(0, 0, 0) 3 3 f outline Left L Plane Left L outline Left R Plane Left R Plane Left L Plane Left R 2 3 intersection Left intersection Right intersection Left intersection Right visual hull 3 3 intersection Left intersection Right visual hull visual cone intersection Left,intersection Right 2 3 3 3 visual hull 3 visual cone intersection Left,intersection Right XY Z =0 X 3 fingerline n =(l, m, n) 3 fingerline veca 0 = x 0,y 0, 0 2 3 3 44 3 visual hull 3 voxel voxel 3 voxel 2 voxel voxel 3 visual hull 45
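Section 4.3.6 appears to build, for each camera, a plane through the projection centre O(0, 0, 0) and two outline points P1(x1, y1, f) and P2(x2, y2, f), and then to intersect Plane Left L with Plane Left R to obtain the 3D line intersection Left. The sketch below shows that geometry under the assumption that both planes are expressed in one common coordinate frame (for example the left camera's, with the right projection centre offset by the baseline). The helper names are hypothetical.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_through(center, p1, p2):
    """Plane through a projection centre and two outline points.

    Returned as (n, d) with the plane being all X such that n . X = d.
    """
    n = cross(sub(p1, center), sub(p2, center))
    return n, dot(n, center)


def intersect_planes(plane_a, plane_b):
    """Line of intersection of two planes, as (point_on_line, direction)."""
    (n1, d1), (n2, d2) = plane_a, plane_b
    direction = cross(n1, n2)
    norm2 = dot(direction, direction)
    if norm2 == 0.0:
        raise ValueError("the planes are parallel")
    # point = ((d1 * n2 - d2 * n1) x direction) / |direction|^2 lies on both planes
    w = tuple(d1 * b - d2 * a for a, b in zip(n1, n2))
    point = tuple(comp / norm2 for comp in cross(w, direction))
    return point, direction
```

Applying the same construction to the right-hand outlines would give intersection Right, so the finger is bounded by two 3D lines rather than by the full elongated visual hull.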
44 3 6mm 3 6mm 45 3 visual hull 3 16 3 SSD R = SSD + D ist c (16) SSD 4 D ist 3 3 c 45 46
10mm 9mm 6mm center of the fingertip 45 3 3
4.4 3 3 $distance_t$ : $distance_t$ $t$ 3 3 $d$ $d$ $C$

\[ distance_t = \max d_t - \min d_t + C \tag{17} \]

$C$ : 3 3 $distance_t$ 3
2 48
5. Pentium4 2.8GHz 2 OS linux kernel 2.4.19 CANON VH-2002 2 320 240pixel 46 46 46 49
5.1 HMD HMD 30cm 46) 46 3pixel ROC (Receiver Operating Characteristic) ROC ROC false positive true positive true positive fraction, false positive fraction 5.1 true positive fraction false positive fraction 1 true positive, false positive fraction 46 47 ROC
47 ROC 47 3 4.3.2 2 3 2 47 3 51
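The true positive fraction and false positive fraction that make up an ROC curve can be computed as in the sketch below. The detection scores, ground-truth labels, and thresholds are hypothetical inputs; the thesis's own measurements are not reproduced here.

```python
def roc_points(scores, labels, thresholds):
    """(false positive fraction, true positive fraction) for each threshold.

    scores : detection score per sample (higher means more confident)
    labels : True if the sample really is a positive, else False
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    points = []
    for th in thresholds:
        tp = sum(1 for s, label in zip(scores, labels) if label and s >= th)
        fp = sum(1 for s, label in zip(scores, labels) if not label and s >= th)
        points.append((fp / negatives, tp / positives))
    return points
```

Sweeping the threshold from high to low traces the curve from (0, 0) towards (1, 1); a detector is better the closer the curve passes to the corner (0, 1).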
5.2 46 30mm 46, 2 2 1.02 0.6 1.76 2 [pixel] 2 52
48 3pixel 3pixel 48 90% 1.8pixel 12mm 48
5.3 46 [6] 3 HMD 49 50 49 50 54
46, 3 0.77 0.50 0.98 3 [pixel] 51 3pixel 3pixel 51 3 51 55
6. [6] 3 4.3 3 3 3 3 3 4.5[frame/s] HMD HMD 56
57
Setalaphruk Vachirasuk CREST JST CREST 58
[1] half keyboard. http://halfkeyboard.com/
[2] Lightglove. http://www.lightglove.com/
[3] Jun Rekimoto. GestureWrist and GesturePad: Unobtrusive Wearable Interaction Devices. In ISWC 2001, 2001.
[4] ar. In WISS 2000, 2000.
[5] VIS2001-103, Vol. 25, No. 85, pp. 47-52, 2001.
[6] In MIRU2002, 2002.
[7] K. Oka, Y. Sato, and H. Koike. Real-time Tracking of Multiple Fingertips and Gesture Recognition for Augmented Desk Interface Systems. In Proc. of 2002 IEEE International Conference on Automatic Face and Gesture Recognition (FG 2002), May 2002.
[8] Vol. J80-D-II, No. 1, pp. 44-55, 1997.
[9] N. Shimada, Y. Shirai, Y. Kuno, and J. Miura. 3-D Pose Estimation and Model Refinement of An Articulated Object from A Monocular Image Sequence. Proc. of The 3rd Conf. on Face and Gesture Recognition, pp. 268-273, 1998.
[10] Etsuko Ueda, Yoshio Matsumoto, Masakazu Imai, and Tsukasa Ogasawara. Hand Pose Estimation Using Multi-Viewpoint Silhouette Images. Proceedings of 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), pp. 1989-1996, Maui, USA, Oct 29-Nov 03, 2001.
[11] R. Y. Tsai. An efficient and accurate camera calibration technique for 3D machine vision. CVPR, pp. 364-374, 1986.
[12] 99-CVIM-118-10 (1999-05), pp. 67-74, 1999.
[13] Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
[14] R. Kumar, H. S. Sawhney, Y. Guo, S. Hsu, and S. Samarasekere. 3D Manipulation of Motion Imagery. Proc. Int. Conf. on Image Processing, pp. 17-20, 2000.
[15] P. J. Rousseeuw. Least Median of Squares Regression. J. American Stat. Assoc., Vol. 79, pp. 871-880, 1984.
[16] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, New York, 1987.
[17] No. 134.
[18] K. Prazdny. Detection of binocular disparities. Biol. Cybern., pp. 93-99, 1985.
[19] A. Laurentini. How far 3D shapes can be understood from 2D silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(2), pp. 188-195, 1995.