Incremental Linefeed Insertion into Lecture Transcription for Automatic Captioning

Masaki Murata (1), Tomohiro Ohno (2) and Shigeki Matsubara (3)

(1) Graduate School of Information Science, Nagoya University
(2) Graduate School of International Development, Nagoya University
(3) Information Technology Center, Nagoya University

A captioning system that supports real-time understanding of spoken documents such as lectures and commentaries is needed. Sentences in monologues tend to be long, so a single sentence is often displayed over multiple lines on the screen; linefeeds must therefore be inserted so that the text is easy to read. This paper proposes a technique for incrementally inserting linefeeds into Japanese spoken monologue, as an elemental technique for generating readable captions. The method uses machine learning to insert linefeeds into a sentence appropriately and incrementally, based on information such as dependencies, clause boundaries, pauses and line length. An experiment using Japanese speech data has shown the effectiveness of the technique.

1. [Section text lost in extraction; only the citation markers 1), 2), 4), 5), 6) and 7) survive.]

c 200 Information Processing Society of Japan
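[Remainder of Section 1 lost in extraction; the surviving figures are a recall of 79.35% and a precision of 74.90%.]

2. [Section text lost in extraction; the term "linefeed point" survives.]

3. [Opening text lost in extraction.]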
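[Figure: an example sentence segmented into bunsetsus: 1 戦争が (the war) / 2 終わりまして (ended, and) / 3 それから (after that) / 4 今日までの (up to today) / 5 五十年間を (the fifty years) / 6 便宜的に (for convenience) / 7 分けますと (if we divide them) / 8 私の (my) / 考えでは (in ... view). Legend: each box is a bunsetsu; the numbers 1 to 8 are the bunsetsu numbers used in Fig. 5.]

[List items (1) to (6), which refer to panels (a) to (l) of Fig. 5, are lost in extraction. Item (2) cites CBAP 8).]

3.1 [Heading and opening text lost in extraction.]

A sentence consisting of $n$ bunsetsus is written $B = b_1 \cdots b_n$, and the result of linefeed insertion is written $R = r_1 \cdots r_n$. Here $r_i$ indicates whether a linefeed is inserted right after bunsetsu $b_i$: $r_i = 1$ if it is, and $r_i = 0$ otherwise. When the sentence is divided into $m$ lines, the $j$-th line is written $L_j = b^j_1 \cdots b^j_{n_j}$ $(1 \le j \le m)$, so that $r^j_k = 0$ for $1 \le k < n_j$ and $r^j_k = 1$ for $k = n_j$.

3.1.1 [Heading lost in extraction.]

Given the bunsetsu sequence $B$, the model assigns a probability $P(R \mid B)$ to each insertion result $R$; $P(R \mid B)$ is computed by Eq. (1).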
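The notation above (one binary label per bunsetsu, with a 1 closing each line) can be made concrete with a small sketch. The function names, the placeholder bunsetsus `b1` to `b6` and the label sequence are illustrative assumptions, not material from the paper.

```python
from typing import List


def lines_from_labels(bunsetsus: List[str], labels: List[int]) -> List[List[str]]:
    """Group B = b_1..b_n into lines L_1..L_m according to R = r_1..r_n."""
    if len(bunsetsus) != len(labels):
        raise ValueError("B and R must have the same length")
    if labels and labels[-1] != 1:
        raise ValueError("the last label must be 1: every line ends with a linefeed")
    lines: List[List[str]] = []
    current: List[str] = []
    for bunsetsu, r in zip(bunsetsus, labels):
        current.append(bunsetsu)
        if r == 1:  # a linefeed right after this bunsetsu closes the current line
            lines.append(current)
            current = []
    return lines


def labels_from_lines(lines: List[List[str]]) -> List[int]:
    """Inverse mapping: r = 0 inside a line, r = 1 on the last bunsetsu of a line."""
    return [1 if k == len(line) - 1 else 0 for line in lines for k in range(len(line))]


B = ["b1", "b2", "b3", "b4", "b5", "b6"]  # placeholder bunsetsus (hypothetical)
R = [0, 0, 1, 0, 0, 1]                    # illustrative labels, not taken from the paper
print(lines_from_labels(B, R))
```

With these labels the six bunsetsus fall into two lines of three, and `labels_from_lines` recovers `R` from the lines, which shows that the two representations carry the same information.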
[Fig. 5: twelve panels, (a) to (l), illustrating the linefeed insertion decisions (改行挿入判定) made over bunsetsus 1 to 8 of the example 今日までの / 五十年間を / 便宜的に / 分けますと / 私の. Labels appearing in the panels: bunsetsu (文節), clause (節), clause boundary (節境界), dependency relation (係り受け関係), "does not depend on the immediately following bunsetsu" (直後の文節に係らない), linefeed inserted (改行アリ), linefeed not inserted (改行ナシ).]
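$$
\begin{aligned}
P(R \mid B) &= P(r^1_1 = 0, \ldots, r^1_{n_1-1} = 0, r^1_{n_1} = 1, \ldots, r^m_1 = 0, \ldots, r^m_{n_m-1} = 0, r^m_{n_m} = 1 \mid B) \\
&= P(r^1_1 = 0 \mid B) \times \cdots \\
&\quad \times P(r^1_{n_1-1} = 0 \mid r^1_{n_1-2} = 0, \ldots, r^1_1 = 0, B) \\
&\quad \times P(r^1_{n_1} = 1 \mid r^1_{n_1-1} = 0, \ldots, r^1_1 = 0, B) \times \cdots \\
&\quad \times P(r^m_1 = 0 \mid r^{m-1}_{n_{m-1}} = 1, B) \times \cdots \\
&\quad \times P(r^m_{n_m-1} = 0 \mid r^m_{n_m-2} = 0, \ldots, r^m_1 = 0, r^{m-1}_{n_{m-1}} = 1, B) \\
&\quad \times P(r^m_{n_m} = 1 \mid r^m_{n_m-1} = 0, \ldots, r^m_1 = 0, r^{m-1}_{n_{m-1}} = 1, B)
\end{aligned}
\tag{1}
$$

Here $P(r^j_k = 1 \mid r^j_{k-1} = 0, \ldots, r^j_1 = 0, r^{j-1}_{n_{j-1}} = 1, B)$ is the probability that a linefeed is inserted right after $b^j_k$, given $B$ and given that the $j$-th line starts at $b^j_1$. Likewise, $P(r^j_k = 0 \mid r^j_{k-1} = 0, \ldots, r^j_1 = 0, r^{j-1}_{n_{j-1}} = 1, B)$ is the probability that no linefeed is inserted right after $b^j_k$. Because a linefeed always follows the last bunsetsu of a sentence, $P(r^m_{n_m} = 1 \mid r^m_{n_m-1} = 0, \ldots, r^m_1 = 0, r^{m-1}_{n_{m-1}} = 1, B) = 1$.

3.1.2 [Heading lost in extraction.]

[Text lost in extraction: this subsection discusses the two conditional probabilities $P(r^j_k = 1 \mid \ldots)$ and $P(r^j_k = 0 \mid \ldots)$ above, cites 7), and lists items concerning $b^j_k$. The list is not recoverable beyond the values 0.2, 1.0 and 3.0.]

4. [Heading lost in extraction.]

4.1 [Text lost in extraction; it cites the corpus 10), and the figure 20,707 survives.]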
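The factorization in Eq. (1) can be sketched in code as a running product over bunsetsus, where each factor depends only on where the current line started. The callable `p_linefeed`, the toy model `toy_model` and the 0.5 decision threshold are hypothetical stand-ins: the paper's trained model and its actual decision rule are not recoverable from this text.

```python
from typing import Callable, List, Sequence

# p_linefeed(line_start, i) stands for P(r_i = 1 | current line started at index
# line_start, no linefeed since, B). It is a hypothetical interface, not the paper's.
LinefeedModel = Callable[[int, int], float]


def sequence_probability(labels: Sequence[int], p_linefeed: LinefeedModel) -> float:
    """Multiply the conditional factors of Eq. (1) for a complete label sequence R."""
    n = len(labels)
    prob = 1.0
    line_start = 0
    for i, r in enumerate(labels):
        # The factor for the sentence-final bunsetsu is fixed to 1, as in the text.
        p1 = 1.0 if i == n - 1 else p_linefeed(line_start, i)
        prob *= p1 if r == 1 else (1.0 - p1)
        if r == 1:
            line_start = i + 1  # the next line starts after the linefeed
    return prob


def decide_incrementally(n: int, p_linefeed: LinefeedModel, threshold: float = 0.5) -> List[int]:
    """Assumed greedy rule: label each bunsetsu as soon as it arrives."""
    labels: List[int] = []
    line_start = 0
    for i in range(n):
        r = 1 if p_linefeed(line_start, i) > threshold else 0
        labels.append(r)
        if r == 1:
            line_start = i + 1
    return labels


def toy_model(line_start: int, i: int) -> float:
    """Toy assumption: the linefeed probability grows with the current line length."""
    return min(1.0, (i - line_start + 1) / 4)


print(sequence_probability([0, 0, 1, 0, 1], toy_model))
print(decide_incrementally(6, toy_model))
```

The greedy function never looks ahead, which is what makes it incremental; whether the paper decides each position this way or with some bounded lookahead is an open assumption here.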
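[Text lost in extraction; it cites the maximum entropy toolkit 11), and the values 1,000 and 20 survive.]

4.2 [Heading lost in extraction.]

$$
\text{recall} = \frac{\text{number of correctly inserted linefeeds}}{\text{number of linefeeds in the correct data}}
$$

$$
\text{precision} = \frac{\text{number of correctly inserted linefeeds}}{\text{number of automatically inserted linefeeds}}
$$

$$
F = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
$$

[Text lost in extraction; it mentions Julius 12).]

4.3 [Heading lost in extraction.]

Table 1
Method                      Recall                  Precision               F-measure
Proposed method             79.35% (5,711/7,197)    74.90% (5,711/7,625)    77.06
Sentence-based method 7)    81.21% (5,845/7,197)    79.47% (5,845/7,355)    80.33

[Fig. 7: cumulative ratio [%] (vertical axis, 0 to 100) against delay time [s] (horizontal axis, 0 to 24) for the proposed method (本手法) and the sentence-based method (文単位の手法).]

[Text lost in extraction; it cites 7), and the values 1.5 and 7.14 survive.]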
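The recall, precision and F-measure of Section 4.2 can be recomputed from the counts in Table 1 as a quick check. This is a minimal sketch: the function name is hypothetical, and the denominator 7,197 is an assumption inferred from the ratio 5,845 / 81.21%.

```python
from typing import Tuple


def recall_precision_f(correct: int, inserted: int, gold: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) as percentages.

    correct  : linefeeds inserted by the system that also appear in the correct data
    inserted : all linefeeds inserted by the system
    gold     : all linefeeds in the correct data
    """
    if inserted == 0 or gold == 0 or correct == 0:
        return 0.0, 0.0, 0.0
    recall = correct / gold
    precision = correct / inserted
    f_measure = 2 * precision * recall / (precision + recall)
    return 100 * recall, 100 * precision, 100 * f_measure


# Counts read from Table 1 (the gold count 7,197 is the assumed denominator).
for name, correct, inserted in [("first row", 5711, 7625), ("second row", 5845, 7355)]:
    r, p, f = recall_precision_f(correct, inserted, gold=7197)
    print(f"{name}: recall {r:.2f}%, precision {p:.2f}%, F {f:.2f}")
```

Both rows reproduce the F-measures printed in the table (77.06 and 80.33), which supports reading the shared denominator as the number of linefeeds in the correct data.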
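Table 2
                     Recall                  Precision                F-measure
[row label lost]     89.24% (1,517/1,700)    100.00% (1,517/1,517)    94.31
[row label lost]     76.30% (4,194/5,497)    68.66% (4,194/6,108)     72.28

Table 3 (values in %; row labels lost in extraction)
83.29 (698/838)
98.81 (581/588)
99.09 (109/110)
100.00 (31/31)
100.00 (12/12)

4.4 [Heading and text lost in extraction; the text refers to the F-measure and to Tables 2 and 3. Surviving figures, as extracted: 183, 1,700, 61, 65.5%, 1,607, 1,456, 0.60%, 2.88%, 15%, 151, 2.72%.]

5. [Section text lost in extraction; it restates the recall of 79.35% and the precision of 74.90%.]

[Acknowledgment text lost in extraction; the grant reference "(B) (No. 21700157)" survives.]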
1) [Authors and title lost in extraction] vol.1, no.12, pp.1024-102 (2008).
2) G. Boulianne, J.-F. Beaumont, M. Boisvert, J. Brousseau, P. Cardinal, C. Chapdelaine, M. Comeau, P. Ouellet and F. Osterrath: Computer-Assisted Closed-Captioning of Live TV Broadcasts in French, Proc. 9th ICSLP, no.mon2a2o-1, pp.273-276 (2006).
3) J. Xue, R. Hu and Y. Zhao: New Improvements in Decoding Speed and Latency for Automatic Captioning, Proc. 9th ICSLP, no.wed1cap-8, pp.1630-1633 (2006).
4) C. Munteanu, G. Penn and R. Baecker: Web-Based Language Modelling for Automatic Lecture Transcription, Proc. 8th Interspeech, no.thd.p3a-2, pp.2353-2356 (2007).
5) [Authors and title lost in extraction] D vol.j0-d, no.3, pp.808-814 (2007).
6) [Authors and title lost in extraction] vol.j84-d-ii, no.6, pp.888-87 (2001).
7) [Authors and title lost in extraction] vol.nl-188, pp.37-44 (2008).
8) [Authors and title lost in extraction; the title mentions CBAP] vol.11, no.3, pp.3-68 (2004).
9) T. Ohno, S. Matsubara, H. Kashioka, T. Maruyama, H. Tanaka and Y. Inagaki: Dependency Parsing of Japanese Monologue Using Clause Boundaries, Language Resources and Evaluation, vol.40, no.3-4, pp.263-27 (2007).
10) S. Matsubara, A. Takagi, N. Kawaguchi and Y. Inagaki: Bilingual Spoken Monologue Corpus for Simultaneous Machine Interpretation Research, Proc. 3rd LREC, pp.153-15 (2002).
11) L. Zhang: Maximum entropy modeling toolkit for python and c++, http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html (2007) [Online; accessed 6-September-2007].
12) [Authors and title lost in extraction; the title mentions Julius] vol.20, no.1, pp.41 4 (2005).
13) T. Kudo and Y. Matsumoto: Japanese Dependency Analysis using Cascaded Chunking, Proc. 6th CoNLL, pp.63-6 (2002).