NAIST-IS-MT1151062, March 15, 2013
Extraction of Rumor Corrective Information from Twitter posts in Disaster

Hiroshi Takahashi

Abstract

Twitter, a micro-blogging service, is widely used in Japan. Because information on Twitter is updated in real time, people used it as a means of emergency communication in place of phone or e-mail during the Great East Japan Earthquake in 2011. At that time, however, many rumors with no information source were posted and spread on Twitter. False rumors may interfere with the transmission of important information, so it is important to build an environment that prevents people from spreading them. To this end, previous work has studied the task of extracting rumor-corrective information using the rumor marker "rumor". However, previous research has not investigated how the choice of rumor marker affects the variety of corrective information that is extracted. In this study, we use the additional rumor markers "misinformation" and "lie" to clarify this influence.

Keywords: Text mining, Twitter, Natural Language Processing

Master's Thesis, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-MT1151062, March 15, 2013.
1 1 1.1 Twitter 1 Twitter Twitter 2012 12 2 2 Twitter 3 140 1 Twitter A Twitter Twitter Twitter 2011 3 11 Twitter Twitter [1] Twitter 1.1 1 https://twitter.com 2 https://twitter.com/twj/status/281052580849778688 3 http://twinavi.jp/guide
1.1: Twitter Twitter 1.2 [2] Twitter 1.1 Twitter web Twitter [3] [4] 2
3 2 Twitter Twitter 2.1 Twitter [4] 1 URL SVM F 88.8 2 2.2 Twitter
Castillo [5] CHAT NEWS NEWS TRUE FALSE CHAT NEWS F 92.4 TRUE FALSE F 86.0 Message-based User-based Propagation-based Topic-based Castillo Twitter Monitor [6] 1 Twitter API Tweet URL bursty [7] 2.3 Twitter Duan [9] pointless babble Cheng [10] 1 http://www.twittermonitor.net/ 4
Cheng 26% 0.42% 5
7 3 3.1 3.2 2 3.3 3
3.1: 3.1 RT @ username: 8
3.1: 3 http://www.xxx.xxx 2 30 3 0 6 http://xxx.xxx.xxx 1000 530 369 266 3.1 3.2 3.3 9
3.2: twitter 3.3: http://www.xxx.xxx.xxx 10
MARKER CONTEXT LENGTH P NAME O NAME L NAME P NOUN QUESTION EXCLAMATION 3.4: window size ±3 1 window size ±1 ±5 ±3 3.4 3.4.1 3 400 400 3.4.2 3.4 11
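The window of ±3 around the rumor marker mentioned for Table 3.4 can be illustrated with a short sketch. It assumes the tweet has already been split into tokens (the thesis uses MeCab for Japanese) and that the context is taken on both sides of the first occurrence of the marker; the function name `context_window` and the English toy tokens are hypothetical and only serve the illustration.

```python
def context_window(tokens, marker, size=3):
    """Return the tokens within `size` positions of the first occurrence
    of `marker`, excluding the marker itself.

    Assumption: `tokens` is an already-tokenized tweet; real input would
    come from a morphological analyzer rather than whitespace splitting.
    """
    if marker not in tokens:
        return []
    i = tokens.index(marker)
    left = tokens[max(0, i - size):i]      # up to `size` tokens before the marker
    right = tokens[i + 1:i + 1 + size]     # up to `size` tokens after the marker
    return left + right


# Hypothetical toy tweet, tokenized by hand.
tokens = ["i", "heard", "that", "story", "is", "a",
          "rumor", "please", "do", "not", "spread", "it"]
print(context_window(tokens, "rumor"))
```

Changing `size` to 1 or 5 gives the narrower and wider windows that the table note refers to.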
TAG RT CC URL 3.5: Twitter URL 3.4.3 Twitter 3.5 Twitter URL Twitter Twitter 140 URL Twitter 3.4.4 3.6 3.4.5 SVM Support vector machine SVM libsvm 1 RBF grid.py 1 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
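The classifier is an SVM trained with libsvm using an RBF kernel, with parameters searched by grid.py. As a rough illustration of what that search covers, the sketch below evaluates the RBF kernel and enumerates a (C, gamma) grid. The ranges are grid.py's defaults (log2 C from -5 to 15 in steps of 2, log2 gamma from 3 to -15 in steps of -2); whether the thesis kept these defaults is an assumption, and the function names are hypothetical. The cross-validation that grid.py runs for each pair is omitted.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def parameter_grid(log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """Enumerate (C, gamma) pairs on a log2 grid.

    Each range is (begin, end, step) with an inclusive end. The defaults
    mirror grid.py's default ranges; treating them as the thesis settings
    is an assumption.
    """
    def steps(begin, end, step):
        values = []
        v = begin
        while (step > 0 and v <= end) or (step < 0 and v >= end):
            values.append(v)
            v += step
        return values

    return [(2.0 ** c, 2.0 ** g)
            for c, g in product(steps(*log2c), steps(*log2g))]


grid = parameter_grid()
print(len(grid))                                   # number of (C, gamma) candidates
print(rbf_kernel([1.0, 0.0], [0.0, 1.0], gamma=0.5))
```

In practice each candidate pair would be scored by cross-validation on the training data and the best-scoring pair used to train the final model.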
FOLLOW FOLLOWER LISTED CREATED POST FOLLOW MAX FOLLOWER MAX LISTED MAX CREATED MAX POST MAX 3.6: 13
15 4 4.1 4.2 Project 311 1 Twitter Japan 1 8000 2011 3 11 2 2011 3 9 4 4 Twitter API 2011 3 2012 11 4.3 1 https://sites.google.com/site/prj311/
4.1: 4.1 1 8000 240 2012 11 Twitter API 75% 18% 7% 16
bot bot Twitter bot bot bot bot BOT 4.2 1. URL 2. 3. MeCab 0.994 IPA 2.7.0 2 11,700 134,300 227,700 4.4 [3] 2 http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html 17
4.2: 3 CONTEXT 18
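4.5

\[ \mathrm{Precision} = \frac{\text{number of correctly extracted instances}}{\text{number of extracted instances}} \tag{4.1} \]

\[ \mathrm{Recall} = \frac{\text{number of correctly extracted instances}}{\text{number of correct instances}} \tag{4.2} \]

\[ F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4.3} \]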
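The evaluation measures of Section 4.5 can be computed directly from counts of correct and incorrect extractions. The following is a minimal sketch, assuming the standard definitions of precision and recall and the F-measure as their harmonic mean; the function name and the counts in the usage line are hypothetical and are not results from the thesis.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-measure from raw counts.

    tp: instances extracted and correct
    fp: instances extracted but incorrect
    fn: correct instances that were not extracted
    Zero denominators yield 0.0 instead of raising an error.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because F is a harmonic mean, it stays close to the lower of the two values, so a system cannot score well by maximizing only precision or only recall.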
21 5 5.1 8 Twitter 5.1 Tweet Twitter 1000 239 RT @smasuda: BLOG http://xxx.xxx 5.2 URL Twitter URL 2 5.3 Twitter
mixi Twitter URL http://xxx.xxx URL 2 Castillo [5] Twitter anpi NLP 1 Twitter 1 http://trans-aid.jp/anpi NLP/index.php/ 22
5.1: F 73.4 75.3 76.8 78.3 75.2 76.5 Twitter 75.2 77.4 77.6 78.2 5.2: F 69.2 69.4 75.2 76.8 72.0 73.7 Twitter 68.0 70.6 76.5 77.1 5.2 500 183 5.4 23
5.3: F 64.1 65.5 72.1 73.0 68.8 69.5 Twitter 65.1 67.1 71.2 73.4 5.4: F F F 58.8 66.7 69.2 69.3 69.2 70.8 7 26 1 11 5.6 5.7 5.5: F F F 82.4 90.3 82.4 77.8 82.4 82.4 24
5.6: 1 RT @anpui: 15 15 5.7: RT @xxxjorge666xxx: 25
27 6 Twitter Twitter Twitter Twitter
29 Twitter Japan
[1] 24 3
[2] : 1997
[3] 4, pp. 1-4, 2011.
[4] 18, pp. 891-894, 2012.
[5] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on Twitter. In WWW 2011, pp. 675-684, 2011.
[6] M. Mathioudakis and N. Koudas. TwitterMonitor: Trend detection over the Twitter stream. In Proceedings of the 2010 International Conference on Management of Data, pp. 1155-1158. ACM, 2010.
[7] Twitter 26 CD-ROM, 2012.
[8] Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. Rumor has it: Identifying misinformation in microblogs. In EMNLP, pp. 1589-1599, 2011.
[9] Yajuan Duan, Long Jiang, Tao Qin, Ming Zhou, and Heung-Yeung Shum. An empirical study on learning to rank of tweets. In COLING 2010, pp. 295-303, 2010.
[10] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You are where you tweet: A content-based approach to geo-locating Twitter users. In CIKM '10, pp. 759-768, 2010.
33 A Twitter Twitter Tweet Twitter 140 Time Line Twitter Twitter follow follower Retweet Reply @ Mention @ Quote
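Figure A.1: Example of a timeline

when reposting, a user can add their own comment at the beginning of the reposted tweet.

Lists: A feature for registering accounts of one's choosing and creating a timeline that shows only the tweets of the registered accounts.

Hashtag: A tag beginning with #, with which the poster can make explicit the topic that the tweet refers to.

Protect: Making an account private lets a user show their tweets only to their followers.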
35 B Twitter web 1 1 http://www.kotono8.com/2011/04/08dema.html
B.1: B.2: WHO 36
B.3: 4 B.4: 11 200km B.5: 37
B.6: 2010 8 B.7: 4 2 50km B.8: BoA Twitter T 38
B.9: UNICEF B.10: B.11: 15 100 24 B.12: 105 AC 39