3: 2: 2. 2 Semi-supervised learning Semi-supervised learning [5,6] Semi-supervised learning Self-training [13] [14] Self-training Self-training Semi-s

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 599-8531 1-1 E-mail: tsukada@m.cs.osakafu-u.ac.jp, {masa,kise}@cs.osakafu-u.ac.jp Semi-supervised learning 2 2 Segmentation by RecognitionLabeling by RecognitionSemi-supervised Learning 1. [1, 2] 1 (i) (ii) (iii) (iv) NEOCR [3] Street View House Numbers Dataset [4] (ii) (iv) 1: (ii) (iii) (iv) Semi-supervised learning [5, 6] 2 2. Semi-supervised learning 2. 1 [7] [8 11] [12] [2] [2] [2] [2] 1 1 1

3: 2: 2. 2 Semi-supervised learning Semi-supervised learning [5,6] Semi-supervised learning Self-training [13] [14] Self-training Self-training Semi-supervised learning Self-training Nearest Neighbor Nearest Neighbor 3. 3. 1 2 (I) PCA-SIFT [15] Dense Sampling (II) [16] d nn d 2nn d nn d 2nn < t d (1) t d 3 (III)Reference Point(RP) [17] RP RP RP (IV) (i)rp d c RP (ii) RP n c (iii) RP x y (V) 4 2 x d x x d x d x d x x S x S x S x = S d x dx y 2

S dx dy S d y S dy S d x dx d x 4: d y 5: 3. 2 5 (I) (II) (III) (IV) (III) (V) (II)(IV) Reliability Check 3. 2. 1 Dense Sampling ID 3. 1 (1) 1/N i N i i ID 3. 2. 2 Reliability Check Reliability Check s 1 2 s 2 t l < s 2 s 1 < t u (2) t l t s 1 (1) 4. 1 3. 1 2 3. 2 3 4. 1 Street View House Numbers Dataset Street View House Numbers Dataset [4] Full Numbers Full Numbers 4. 2 1 train 3

70 60 # labeled data : 100 # labeled data : 500 # labeled data : 1000 Recall [%] 50 40 30 20 10 10 20 30 40 50 60 70 80 Precision [%] (a) 100 (b) 1000 6: 1Recall-Precision 7: 1 96 96[pixels] 200 100500 1,000 extra 1,000 t d d c n c 0.93035 RecallPrecision d c n c [18] 6 6 7 100 1,000 7 8 8 1 4. 3 2 2 3 1 3. 2 2 Reliability Check 3 10 10,000 train extra 8: 1 test t d t l t u 0.900.2 4. 3. 1 1 (2) a. b. (1) c. (1) d. b d a c b d (1) b d a c ab cd 4

1: 2-1 a. b. c. d. [%] 46.3 44.8 46.3 44.8 [%] 42.7 50.8 50.3 64.7 41,360 46,834 30,624 33,531 [%] 63.7 68.2 81.4 89.0 2: 3 [%] 77.5 [%] 78.8 4,037 [%] 73.0 4. 3. 2 Reliability Check 9(a) 1,000 10,000 1,000 (Ground Truth) Correct CheckCorrect Check (1) Reliability Check Ground Truth Correct Check Correct Check Reliability Check 9(b) 9(a) Reliability Check Recognition Rate [%] # of Retrained Data 100 90 80 70 60 50 Ground Truth Correct Check Proposed Method 40 0 2000 4000 6000 8000 10000 100000 90000 80000 70000 60000 50000 40000 30000 20000 10000 # of Unlabeled Data per Class (a) Ground Truth Correct Check Proposed Method 0 0 2000 4000 6000 8000 10000 # of Unlabeled Data per Class (b) 9: 2-2 4. 3. 3 10 5

Recognition Rate [%] 80 70 60 50 40 30 20 # of Labeled Data: 10 # of Labeled Data: 50 10 # of Labeled Data: 500 # of Labeled Data: 100 # of Labeled Data: 1000 0 0 100 500 1000 # of Labeled Data + Unlabeled Data per Class 10: 2-3 4. 4 3 3 2 Reliability Check Reliability Check (2) 2 2 train 1,000 extra 5,000 13,215 test t d d c n c t l t u 0.930550.30.7 2 1.3% 4,037 73% 2,947 2 Reliability Check Reliability Check 5. Reliabiilty Check JST CREST [1] M. Iwamura, T. Tsuji, and K. Kise, Memory-based recognition of camera-captured characters, Proc. DAS, 2010. [2] M. Iwamura, T. Kobayashi, and K. Kise, Recognition of multiple characters in a scene image using arrangement of local features, Proc. ICDAR, 2011. [3] A.D. Robert Nagy and K. Meyer-Wegener, NEOCR: A configurable dataset for natural image text recognition, Proc. CBDAR, 2011. [4] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A.Y. Ng, Reading digits in natural images with unsupervised feature learning, NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. [5] O. Chapelle, B. Schölkopf, and A. Zien eds., Semisupervised learning, Cambridge, MIT Press, Sept. 2006. [6] X. Zhu and A.B. Goldberg, Introduction to semi-supervised learning, Morgan and Claypool Publishers, Sept. 2009. [7] J.-J. Lee, P.-H. Lee, S.-W. Lee, A.L. Yuille, and C. Koch, Adaboost for text detection in natural scene, Proc. IC- DAR, 2011. [8] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, Proc. CVPR, 2010. [9] P. Sanketi, H. Shen, and J.M. Coughlan, Localizing blurry and low-resolution text in natural images, Proc. IEEE Workshop on Applications of Computer Vision, 2011. [10] C. Yao, Z. Tu, and Y. Ma, Detecting texts of arbitrary orientations in natural images, Proc. CVPR, 2012. [11] L. Neumann and J. Matas, Real-time scene text localization and recognition, Proc. CVPR, 2012. [12] Y.-F. Pan, X. Hou, and C.-L. Liu, A hybrid approach to detect and localize texts in natural scene images, IEEE Trans. on Image Processing, vol.20, no.3, pp.800 813, March 2011. [13] M. Tsukada, M. Iwamura, and K. Kise, Expanding recognizable distorted characters using self-corrective recognition, Proc. DAS, 2012. [14] V. Frinken, M. Baumgartner, A. Fischer, and H. Bunke, Semi-supervised learning for cursive handwriting recognition using keyword spotting, Proc. ICFHR, 2012. [15] Y. Ke and R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, Proc. CVPR, pp.506 513, 2004. [16] pp.73 78PRMU2012-142Feb. 2013 [17] M. Klinkigt and K. Kise, Using a reference point for local configuration of sift-like features for object recognition with serious background clutter, IPSJ Trans. on Computer Vision and Applications, vol.3, pp.110 121, Dec. 2011. [18] C. Wolf and J.-M. Jolion, Object count/area graphs for the evaluation of object detection and segmentation algorithms, IJDAR, vol.8, no.4, 2006. 6