DEIM Forum 2016 C5-1

182-8585 1-5-1
E-mail: saitoh-ryoh@uec.ac.jp, terada.minoru@uec.ac.jp

Twitter [...] Twitter [...] Bag of Words, Latent Semantic Indexing [...]

Twitter [...] Twitter [...]

1. [...]

SNS, SNS Twitter (footnote 1) [...] Twitter [...] Twitter [...] Twitter, ( 1) [...] 1,

Footnote 1: https://twitter.com/
[...] Twitter [...] ( ),

2. [...]

[1] [...] ( ) [...] Sungho Jeon [2], Twitter 4 URL, SVM [...] 12 176426, 5618. 1, URL F. [...] SVM [...] 4 SVM, F [...] [3],

Table 1 [...] [2] [...]

Step  Entered feature  Recall  Precision  F-score
1     [...]            0.3084  0.8291     0.4496
2     + [...]          0.5653  0.8611     0.6826
3     + URL            0.7699  0.7957     0.7825
4     + [...]          0.7697  0.7987     0.7839

( ) [4] Twitter. [...] [5], TV [...] 1 [...] [6] [...] 2 [...] Samuel Brody [7] [...]

3. [...]

3. 1 [...]
(SVM) [...] Bag of Words [...] SVM K (K-fold cross-validation)

3. 2 [...]

2015 7 2 Twitter, 5500 [...] 1:10 Yuxin Peng [8], SVM 2 [...] 10

3. 3 [...]

SVM [...] URL [...] # URL URL http:// https:// [...] @ @ [...] ( ) RT @ ( ) RT 2 [...] #. # 1 MeCab [...] [5] [...] 1.0%, 80% Bag of Words (BoW), Bag of Words, Bag of Words (footnote 2) [...] Latent Semantic Indexing (LSI) (footnotes 3, 4) 1 128 URL [...] 3, LSI

3. 4 [...]

SVM

Footnote 2: 3. 2 2100 BoW
Footnote 3: Latent Semantic Analysis (LSA)
Footnote 4: http://lsa.colorado.edu/
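The preprocessing and Bag of Words steps named in Section 3. 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: how URLs, `@` mentions, `RT @` markers, and `#` are actually handled is not recoverable from the text, so the rules in `normalize_tweet` are assumptions. The whitespace tokenizer is a stand-in for MeCab, and reading 1.0% and 80% as lower and upper document-frequency cut-offs is likewise an assumption. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumed patterns: ASCII-only URLs and user names, so that Japanese text
# directly following a link or mention is not swallowed by the match.
URL_RE = re.compile(r"https?://[\x21-\x7e]+")
RETWEET_RE = re.compile(r"^RT\s+@[A-Za-z0-9_]+:?\s*")
MENTION_RE = re.compile(r"@[A-Za-z0-9_]+")


def normalize_tweet(text):
    """Rewrite Twitter-specific markup (the exact handling is an assumption)."""
    text = RETWEET_RE.sub("", text)    # drop a leading "RT @user:" marker
    text = URL_RE.sub(" URL ", text)   # collapse each link into one token
    text = MENTION_RE.sub(" ", text)   # drop remaining @mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop '#'
    return " ".join(text.split())


def tokenize(text):
    """Whitespace split; a stand-in for a morphological analyzer such as MeCab."""
    return text.split()


def build_vocabulary(docs, min_df=0.01, max_df=0.80):
    """Keep tokens whose document-frequency ratio lies in [min_df, max_df]."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))   # count each token once per document
    n_docs = len(docs)
    return sorted(t for t, c in doc_freq.items() if min_df <= c / n_docs <= max_df)


def bag_of_words(tokens, vocabulary):
    """Count vector of one document over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]


tweets = [
    "RT @alice: big twist tonight http://example.com/ep1 #drama",
    "@bob tonight was great",
    "tonight I will watch the twist #drama",
]
docs = [tokenize(normalize_tweet(t)) for t in tweets]
vocabulary = build_vocabulary(docs)
vectors = [bag_of_words(d, vocabulary) for d in docs]
```

In this toy corpus the token `tonight` occurs in every tweet, so the 80% upper cut-off removes it from the vocabulary. The LSI step is not shown; it would apply a truncated singular value decomposition to these count vectors to obtain the 1 to 128 dimensions the section mentions.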
3. 4. 1 URL

2. [2], URL [...] Twitter, URL URL,

3. 4. 2 [...]

2. [7] [...] 4

3. 4. 3 [...]

10 = ( ) 2 3

3. 5 K [...]

K, K, K-1, 1 K, K-1 1, K. [...] 10 [...] Accuracy ( ), Precision ( ), Recall ( ), F1-score (F), Area Under the Curve (AUC)

4. [...]

4. 1 LSI

LSI 1 128 [...] 2 4 2 [...] 3 4 ( ), Accuracy 32
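The evaluation procedure named in Section 3. 5 (K-fold cross-validation with K = 10, scored by Accuracy, Precision, Recall, and F) can be sketched as follows. This is an illustration of the general technique under stated assumptions rather than the authors' code: the function names are hypothetical, the folds are taken in order without shuffling, the classifier itself is left out, and AUC is omitted because it needs ranked decision scores rather than hard labels.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists so every sample is tested exactly once."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F for one fold of hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f": f_score(precision, recall),
    }


folds = list(k_fold_indices(25, k=10))
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

As a check of the F definition against the figures quoted from [2], `round(f_score(0.8291, 0.3084), 4)` gives 0.4496, which matches the first row of that table (Recall 0.3084, Precision 0.8291, F-score 0.4496).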
Table 2 ( )

LSI dim.  Accuracy           AUC                Precision          Recall             F
1         0.571 (+/- 0.023)  0.593 (+/- 0.035)  0.596 (+/- 0.029)  0.572 (+/- 0.023)  0.543 (+/- 0.024)
2         0.572 (+/- 0.030)  0.573 (+/- 0.039)  0.595 (+/- 0.031)  0.574 (+/- 0.027)  0.547 (+/- 0.036)
4         0.589 (+/- 0.034)  0.581 (+/- 0.048)  0.609 (+/- 0.038)  0.591 (+/- 0.032)  0.571 (+/- 0.036)
8         0.600 (+/- 0.038)  0.610 (+/- 0.052)  0.622 (+/- 0.040)  0.601 (+/- 0.036)  0.583 (+/- 0.042)
16        0.595 (+/- 0.039)  0.649 (+/- 0.043)  0.627 (+/- 0.050)  0.597 (+/- 0.039)  0.570 (+/- 0.043)
32        0.639 (+/- 0.021)  0.688 (+/- 0.029)  0.661 (+/- 0.027)  0.640 (+/- 0.026)  0.626 (+/- 0.025)
64        0.633 (+/- 0.029)  0.706 (+/- 0.044)  0.701 (+/- 0.037)  0.635 (+/- 0.026)  0.601 (+/- 0.031)
128       0.631 (+/- 0.027)  0.726 (+/- 0.031)  0.698 (+/- 0.013)  0.633 (+/- 0.018)  0.598 (+/- 0.031)

32, 32 32 Accuracy 32 AUC, Accuracy 32 F, 32, 32, F 32, 32

4. 2 URL

1 128 URL 3 5, 4. 1, 4. 1, 32

Figure 5 (+URL)

4. 3 [...]

1 128 4 6, 4. 1 1 16 Accuracy F 8, AUC 32, 4. 1 4. 2

Figure 6 (+ )

4. 4 [...]

1 128 5 7, 4. 1, 4. 3 1 16. Accuracy, AUC, F 32, 4. 1 4. 2 32

Figure 7 (+ )

4. 5 [...]

1 128 6 8, 4. 1 [...] 4
Table 3 (+URL)

LSI dim.  Accuracy           AUC                Precision          Recall             F
1         0.605 (+/- 0.035)  0.628 (+/- 0.043)  0.626 (+/- 0.034)  0.606 (+/- 0.029)  0.588 (+/- 0.033)
2         0.602 (+/- 0.020)  0.590 (+/- 0.016)  0.630 (+/- 0.015)  0.603 (+/- 0.013)  0.581 (+/- 0.017)
4         0.618 (+/- 0.023)  0.633 (+/- 0.023)  0.638 (+/- 0.022)  0.619 (+/- 0.019)  0.604 (+/- 0.022)
8         0.621 (+/- 0.030)  0.640 (+/- 0.030)  0.638 (+/- 0.032)  0.622 (+/- 0.028)  0.610 (+/- 0.030)
16        0.620 (+/- 0.037)  0.663 (+/- 0.038)  0.644 (+/- 0.037)  0.621 (+/- 0.027)  0.605 (+/- 0.031)
32        0.652 (+/- 0.025)  0.697 (+/- 0.025)  0.668 (+/- 0.031)  0.653 (+/- 0.026)  0.645 (+/- 0.025)
64        0.649 (+/- 0.028)  0.715 (+/- 0.034)  0.695 (+/- 0.033)  0.651 (+/- 0.027)  0.628 (+/- 0.030)
128       0.646 (+/- 0.025)  0.733 (+/- 0.026)  0.694 (+/- 0.023)  0.648 (+/- 0.015)  0.624 (+/- 0.022)

Table 4 (+ )

LSI dim.  Accuracy           AUC                Precision          Recall             F
1         0.676 (+/- 0.029)  0.728 (+/- 0.023)  0.684 (+/- 0.029)  0.677 (+/- 0.030)  0.673 (+/- 0.031)
2         0.665 (+/- 0.034)  0.701 (+/- 0.029)  0.666 (+/- 0.035)  0.665 (+/- 0.034)  0.664 (+/- 0.034)
4         0.684 (+/- 0.030)  0.711 (+/- 0.043)  0.687 (+/- 0.030)  0.684 (+/- 0.030)  0.683 (+/- 0.030)
8         0.687 (+/- 0.027)  0.714 (+/- 0.038)  0.689 (+/- 0.029)  0.687 (+/- 0.028)  0.686 (+/- 0.027)
16        0.676 (+/- 0.029)  0.728 (+/- 0.023)  0.684 (+/- 0.029)  0.677 (+/- 0.030)  0.673 (+/- 0.031)
32        0.664 (+/- 0.026)  0.740 (+/- 0.032)  0.680 (+/- 0.032)  0.665 (+/- 0.027)  0.656 (+/- 0.025)
64        0.660 (+/- 0.019)  0.726 (+/- 0.011)  0.688 (+/- 0.023)  0.662 (+/- 0.018)  0.648 (+/- 0.021)
128       0.644 (+/- 0.033)  0.723 (+/- 0.028)  0.684 (+/- 0.030)  0.646 (+/- 0.027)  0.625 (+/- 0.036)

Table 5 (+ )

LSI dim.  Accuracy           AUC                Precision          Recall             F
1         0.642 (+/- 0.028)  0.692 (+/- 0.023)  0.648 (+/- 0.026)  0.642 (+/- 0.026)  0.638 (+/- 0.028)
2         0.651 (+/- 0.033)  0.702 (+/- 0.036)  0.651 (+/- 0.034)  0.651 (+/- 0.034)  0.651 (+/- 0.034)
4         0.668 (+/- 0.034)  0.723 (+/- 0.046)  0.668 (+/- 0.034)  0.668 (+/- 0.033)  0.668 (+/- 0.034)
8         0.675 (+/- 0.030)  0.731 (+/- 0.031)  0.676 (+/- 0.030)  0.675 (+/- 0.029)  0.674 (+/- 0.029)
16        0.669 (+/- 0.022)  0.731 (+/- 0.023)  0.673 (+/- 0.022)  0.668 (+/- 0.021)  0.666 (+/- 0.021)
32        0.679 (+/- 0.033)  0.751 (+/- 0.037)  0.684 (+/- 0.032)  0.678 (+/- 0.032)  0.676 (+/- 0.034)
64        0.673 (+/- 0.032)  0.751 (+/- 0.038)  0.674 (+/- 0.032)  0.674 (+/- 0.032)  0.673 (+/- 0.032)
128       0.653 (+/- 0.024)  0.727 (+/- 0.021)  0.653 (+/- 0.024)  0.653 (+/- 0.024)  0.652 (+/- 0.024)

Table 6 (+all)

LSI dim.  Accuracy           AUC                Precision          Recall             F
1         0.723 (+/- 0.030)  0.793 (+/- 0.028)  0.728 (+/- 0.030)  0.723 (+/- 0.030)  0.722 (+/- 0.030)
2         0.738 (+/- 0.028)  0.809 (+/- 0.028)  0.739 (+/- 0.028)  0.738 (+/- 0.029)  0.738 (+/- 0.029)
4         0.742 (+/- 0.025)  0.813 (+/- 0.030)  0.743 (+/- 0.027)  0.742 (+/- 0.026)  0.742 (+/- 0.025)
8         0.736 (+/- 0.035)  0.809 (+/- 0.027)  0.738 (+/- 0.034)  0.736 (+/- 0.036)  0.735 (+/- 0.036)
16        0.718 (+/- 0.040)  0.797 (+/- 0.032)  0.724 (+/- 0.038)  0.718 (+/- 0.039)  0.716 (+/- 0.040)
32        0.705 (+/- 0.030)  0.789 (+/- 0.033)  0.706 (+/- 0.029)  0.705 (+/- 0.029)  0.704 (+/- 0.030)
64        0.692 (+/- 0.025)  0.775 (+/- 0.016)  0.692 (+/- 0.025)  0.692 (+/- 0.026)  0.692 (+/- 0.025)
128       0.673 (+/- 0.023)  0.747 (+/- 0.025)  0.673 (+/- 0.024)  0.674 (+/- 0.023)  0.673 (+/- 0.024)

5. [...]

5. 1 LSI

BoW, LSI 1 128,
Figure 8 (+all)

5. 2 [2] [...]

1 F 6 4 F, 0.04. [...] F 0.7,

5. 3 URL

URL 1 128 2. URL 2. URL [...] URL

5. 4 [...]

( ) ( ),

5. 5 [...]

( )( ) ( )( )( ) [...]

5. 6 [...]

5. 6. 1 [...]

0-0, 0 (1 ) 0 [...]

5. 6. 2 [...]
6. [...]

Accuracy, AUC (Area Under the Curve), Precision, Recall, F, URL [...] 3 [...] 3. [...]

[1] [...] WISS2010, 41-46, 2010.
[2] Sungho Jeon, Sungchul Kim, and Hwanjo Yu. Don't Be Spoiled by Your Friends: Spoiler Detection in TV Program Tweets. Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[3] [...] Twitter [...] 19, 138-141, 2013.
[4] [...] Twitter [...] MVE, 110(457), 165-169, 2011.
[5] [...] 96 (GN), 2015.
[6] [...] SNS [...] 96 (GN), 2015.
[7] Samuel Brody, Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: using word lengthening to detect sentiment in microblogs. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 562-570, 2011.
[8] Yuxin Peng, Jia Yao. AdaOUBoost: adaptive over-sampling and under-sampling to boost the concept learning in large scale imbalanced data sets. Proceedings of the international conference on Multimedia information retrieval. ACM, 2010.