2012 2013 3 31 ( : A9TB2096)
Twitter i
1 1.1 Twitter 2011 ( *) ( )!! 1.2 1 ( * ) ( ) Mecab[1] 1
[Figure: MeCab morphological analysis output]
2 2.1 He is cooooooooooooooolll cooooooooooooooolll Brody [2] cooooool cooollll cool 2.2 [3] 20 (* *) ( )( ) 4
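The relation between lengthened variants such as "cooooool" and "cooollll" and the base form "cool" can be illustrated with a small sketch. It assumes a simple heuristic: variants that become identical once every run of repeated characters is collapsed are grouped together, and the most frequent variant in a corpus is taken as the base form. The function names and the toy corpus are hypothetical, and this is not necessarily the exact procedure of [2].

```python
import re
from collections import Counter, defaultdict


def canonical_key(word):
    """Collapse every run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


def build_base_forms(tokens):
    """Map each canonical key to its most frequent variant (assumed heuristic)."""
    counts = Counter(t.lower() for t in tokens)
    groups = defaultdict(list)
    for form, n in counts.items():
        groups[canonical_key(form)].append((n, form))
    # Prefer the most frequent variant; break ties in favour of the shorter one.
    return {
        key: max(forms, key=lambda p: (p[0], -len(p[1])))[1]
        for key, forms in groups.items()
    }


def normalize(word, base_forms):
    """Return the base form for a word, or the word itself if its key is unknown."""
    return base_forms.get(canonical_key(word), word)


# Toy corpus (hypothetical): the plain form is the most frequent variant.
corpus = ["cool", "cool", "cool", "cooooool", "cooollll"]
table = build_base_forms(corpus)
print(normalize("cooooooooooooooolll", table))  # cool
```

Such a heuristic only covers lengthening; it cannot repair substitutions, which is one reason to treat normalization as a more general editing problem.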
3 3.1 1 3 1 3.1 3.1 1
おはよおぉぉぉ: keep (残), keep (残), keep (残), delete (削), delete (削), delete (削), replace with う (う に置換)
Figure 3.1: 1
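The per-character labeling shown for おはよおぉぉぉ can be turned back into a normalized string mechanically. The following is a minimal sketch, assuming one label per character and the hypothetical label names `keep`, `delete`, and `replace:X`; the actual label inventory used in this work may differ.

```python
def apply_labels(text, labels):
    """Apply one edit label per character and return the resulting string.

    Label names are hypothetical:
      "keep"       -> copy the character unchanged
      "delete"     -> drop the character
      "replace:X"  -> emit X instead of the character
    """
    if len(text) != len(labels):
        raise ValueError("need exactly one label per character")
    out = []
    for ch, label in zip(text, labels):
        if label == "keep":
            out.append(ch)
        elif label == "delete":
            continue
        elif label.startswith("replace:"):
            out.append(label[len("replace:"):])
        else:
            raise ValueError(f"unknown label: {label!r}")
    return "".join(out)


# Labels in the order given for the example: keep x3, delete x3, replace with う.
labels = ["keep"] * 3 + ["delete"] * 3 + ["replace:う"]
print(apply_labels("おはよおぉぉぉ", labels))  # おはよう
```

Framing normalization this way reduces it to sequence labeling: a model only has to predict one label per input character.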
3.2 brat[4] brat 3.2 brat Mecab 1,,*,*,,,,,, /,,,*,*,*,*,*,,*,*,*,*,,,,,,,*,*,*,*,,,,, brat 3.3 3.3 verb noun part symb 3.3 3.4 noun( ) aux( ) 3.3 3.4 3.3 7
( A 3.4 CRF(Conditional Random Fields)[5] CRF CRFsuite[6] 3.5 3 2 2 : 1 : 1 : 2 : True False 8
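A CRF tagger such as CRFsuite [6] consumes, for each position in a sequence, a set of attributes describing that position. The sketch below shows what a character-level feature extractor could look like. The feature template (a window of surrounding characters plus a boolean flag for a repeated character) is an assumption made for illustration, not the feature set of this work, and the part-of-speech features obtained from MeCab are omitted.

```python
def char_features(chars, i, window=2):
    """Return attribute strings for position i (hypothetical feature template)."""
    feats = []
    # Surrounding characters within the window; positions outside the text are padded.
    for off in range(-window, window + 1):
        j = i + off
        ch = chars[j] if 0 <= j < len(chars) else "<PAD>"
        feats.append(f"c[{off}]={ch}")
    # Boolean feature: is this character identical to the previous one?
    feats.append(f"same_as_prev={i > 0 and chars[i] == chars[i - 1]}")
    return feats


def sequence_features(text, window=2):
    """Return one attribute list per character of the text."""
    chars = list(text)
    return [char_features(chars, i, window) for i in range(len(chars))]


for ch, feats in zip("おはよおぉぉぉ", sequence_features("おはよおぉぉぉ")):
    print(ch, feats)
```

Each attribute list would be paired with the gold label of the same character to form one training item.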
3.2: brat 3.3: replace with う (う に置換) 3.4:
[Figure: MeCab morphological analysis output]
4 4.1 ( ) 1 4.1.1 2 1 2 4.1 3 1 4.2 1 2 4.3 4 1 2 11
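Text before normalization: おはようううございまつ
  -> おはよううございまつ (delete う, cost: 1)
  -> おはようございまつ (delete う, cost: 1)
  -> おはようございます (replace つ with す, cost: 1)
Manually normalized text: おはようございます (total cost: 3)
Figure 4.1: 1

Text after normalization by model 1: おはよううございます
  -> おはようございます (delete う, cost: 1)
Manually normalized text: おはようございます (total cost: 1)
Figure 4.2: 2

Text after normalization by model 2: うはようううございまつ
  -> おはようううございまつ (replace う with お, cost: 1)
  -> おはよううございまつ (delete う, cost: 1)
  -> おはようございまつ (delete う, cost: 1)
  -> おはようございます (replace つ with す, cost: 1)
Manually normalized text: おはようございます (total cost: 4)
Figure 4.3: 3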
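The costs in these examples correspond to a character-level edit distance in which every deletion, insertion, and substitution costs 1. A minimal sketch of that computation follows, assuming the standard Levenshtein distance; the function name is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit cost for delete, insert, and substitute."""
    # prev[j] holds the distance between the processed prefix of source and target[:j].
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,             # delete s
                curr[j - 1] + 1,         # insert t
                prev[j - 1] + (s != t),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


gold = "おはようございます"
print(edit_distance("おはようううございまつ", gold))  # 3 (before normalization)
print(edit_distance("おはよううございます", gold))    # 1 (model output 1)
print(edit_distance("うはようううございまつ", gold))  # 4 (model output 2)
```

Comparing the distance before and after normalization shows whether a model moved the text closer to the manual normalization (3 to 1 for output 1) or further away (3 to 4 for output 2).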
1 2 4.2 2 1 1 2 1 2 4.3 Hottolink Twitter 2011 3 11 2011 3 29 2 1 1000 500 500 1000 1495 500 731 500 764 URL 3 3 13
Table 4.1:
1      0.3796
2      0.4188
3      0.3691
4      0.4672
5      0.4463
3,     0.3469
3,     0.4450
3,,    0.4267
4.4 0.8770 1 0.7866 2 0.7657 4.1 3 4.5 4.1 1 2 3
500 15
5 Twitter 16
[1] Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. Applying conditional random fields to Japanese morphological analysis. Proceedings of EMNLP. 2004.
[2] Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: using word lengthening to detect sentiment in microblogs. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.
[3],,,.. 8.1 (2009): 23-28.
[4] Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. BRAT: a Web-based Tool for NLP-Assisted Text Annotation. EACL 2012 (2012): 102.
[5] John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
[6] Naoaki Okazaki. CRFsuite: a fast implementation of conditional random fields (CRFs). URL http://www.chokkan.org/software/crfsuite (2007).