An Introduction to Natural Language Processing: Beyond Statistical Methods

1. Introduction

Natural language is the medium in which people exchange information, and natural language processing (NLP) is the field that handles it by computer. Since the 1990s, the spread of large corpora and of the Web has made statistical methods dominant, and shallow analyses such as morphological analysis and syntactic analysis have reached practical accuracy. The deeper goals pursued since the 1980s, such as semantic analysis, intention understanding, and language understanding in general, remain open. This article surveys the field from this perspective: Section 2 introduces application problems, Section 3 describes the fundamental analysis techniques, Section 4 surveys available language resources and tools, including those distributed through organizations such as ACL, LDC, and ELRA and through the Web, and Section 5 concludes.

2. Applications and Their Component Problems

2.1 Opinion Mining

Opinion mining collects and organizes evaluative text, such as product reviews posted on the Web 31). Given review sentences like example (1), the task is to extract structured records, as in examples (2) and (3), in which fields such as the target, the attribute, and the evaluation are filled in. Realizing this involves two component problems: information extraction and structurization, and paraphrase and entailment recognition.

2.2 Ambiguity

Natural language is pervasively ambiguous, and resolving this ambiguity (disambiguation) is the central problem of natural language analysis.

2.3 A Concrete Example

As a concrete example, Figure 1 lists candidate senses of an ambiguous expression: #1 "go", #2 "order", #3, #4 "excellent".

From text on the Web, relations such as "X is Y" can be extracted, as in examples (4)–(6); this is information extraction. Closely related applications include information retrieval, question answering, and machine translation. Examples (7a)–(7c) illustrate paraphrase: an expression such as (7b) can be recognized as conveying the same content as the others through rewriting around the shared element X 3, 30).

3. Fundamental Analysis Techniques

3.1 Analysis as Sequential Labeling

Let a sentence s be a sequence of m characters, s = c_1 ... c_m. Morphological analysis produces a word sequence w = w_1 ... w_n in which each word w_i carries a tag t_i, so the tag sequence is t = t_1 ... t_n. Equivalently, segmentation can be encoded by assigning each character c_i a boundary label b_i, yielding b = b_1 ... b_m. Under this encoding, analysis becomes sequential labeling: given s, predict w and t.
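The character-level encoding described above can be sketched as follows; this is a minimal illustration (the helper names are ours, not from the article), where each character receives "B" if it begins a word and "I" if it continues one, so that segmentation and sequential labeling become interconvertible.

```python
def words_to_labels(words):
    """Turn a segmented sentence into (characters, boundary labels)."""
    chars, labels = [], []
    for w in words:
        for i, c in enumerate(w):
            chars.append(c)
            labels.append("B" if i == 0 else "I")
    return chars, labels

def labels_to_words(chars, labels):
    """Invert the encoding: rebuild the word sequence from boundary labels."""
    words = []
    for c, b in zip(chars, labels):
        if b == "B" or not words:
            words.append(c)
        else:
            words[-1] += c
    return words

# round trip: segmentation -> labels -> segmentation
chars, labels = words_to_labels(["this", "is", "fun"])
assert labels_to_words(chars, labels) == ["this", "is", "fun"]
```

Because the two views are equivalent, a tagger that predicts one label per character solves segmentation.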

Figure 2: IOB-style named-entity labels for an example sentence (ORG-B marks the first word of an organization name, ORG-I a continuation, and O any other word).

Later analysis stages include dependency analysis and ellipsis (zero-anaphora) resolution.

3.2 Statistical Models

Given a sentence s, we want the most probable word and tag sequences w, t. A hidden Markov model (HMM) decomposes this as

  argmax_{w,t} P(w, t | s) = argmax_{w,t} P(w, t)
                           = argmax_{w,t} ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1}),

where P(w_i | t_i) is the word emission probability and P(t_i | t_{i-1}) the tag transition probability. Conditional random fields (CRFs) 12) are instead a discriminative model that represents the conditional probability directly. For an input x and output y, with feature functions f_i(x, y), i = 1, ..., n, and weights w_i,

  P(y | x) = exp(Σ_i w_i f_i(x, y)) / Σ_ŷ exp(Σ_i w_i f_i(x, ŷ)).

Whereas an HMM is restricted to emission and transition probabilities, a CRF can combine arbitrary, overlapping features. Reported accuracies are around 97–98% for morphological analysis and around 90% for dependency analysis.
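The HMM decoding objective above, argmax over ∏_i P(w_i | t_i) P(t_i | t_{i-1}), is solved exactly by the Viterbi algorithm. The following is a minimal sketch in log space; all probability tables are toy values for illustration, not from any real tagger.

```python
import math

def viterbi(words, tags, emit, trans, start):
    """emit[t][w], trans[prev][t], start[t] are probabilities."""
    # best[t] = log-prob of the best path ending in tag t; back stores pointers
    best = {t: math.log(start[t]) + math.log(emit[t].get(words[0], 1e-9))
            for t in tags}
    back = []
    for w in words[1:]:
        nxt, ptr = {}, {}
        for t in tags:
            p, arg = max((best[pt] + math.log(trans[pt][t]), pt) for pt in tags)
            nxt[t] = p + math.log(emit[t].get(w, 1e-9))
            ptr[t] = arg
        best, back = nxt, back + [ptr]
    # trace back from the best final tag
    t = max(best, key=best.get)
    path = [t]
    for ptr in reversed(back):
        t = ptr[t]
        path.append(t)
    return list(reversed(path))

tags = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.6, "run": 0.1}, "V": {"dogs": 0.05, "run": 0.7}}
print(viterbi(["dogs", "run"], tags, emit, trans, start))  # ['N', 'V']
```

Dynamic programming makes the search linear in sentence length rather than exponential in the number of tag sequences.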

A fundamental obstacle to supervised learning is data sparseness: annotated training data is never sufficient, and rare events are poorly estimated. Approaches that reduce the need for labeled data include semi-supervised learning, which also exploits unlabeled text, and active learning, which selects the most informative examples for annotation. For tasks such as named-entity recognition, bootstrapping, which iteratively grows a small seed set of examples, is widely used.

3.3 Knowledge for Deeper Analysis

Deeper analysis requires both linguistic knowledge and world knowledge. One representative form of linguistic knowledge is the case frame, which records, for each predicate, the arguments it takes; case frames drive predicate-argument structure analysis. A thesaurus organizes words into semantic classes. Such knowledge is what allows, for example, the structured records (2) and (3) of Section 2.1 to be derived.
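The bootstrapping idea mentioned above can be sketched in a few lines. This toy version (the corpus and seed set are illustrative, not from the article) alternates between harvesting the contexts in which known entities occur and extracting new tokens that appear in those contexts.

```python
corpus = [
    "president of Toyota said",
    "president of Honda said",
    "shares of Honda rose",
    "shares of Sony rose",
]
seeds = {"Toyota"}

def bootstrap(corpus, seeds, rounds=2):
    entities = set(seeds)
    for _ in range(rounds):
        # 1) collect (left, right) contexts around known entities
        patterns = set()
        for sent in corpus:
            toks = sent.split()
            for i, tok in enumerate(toks):
                if tok in entities and 0 < i < len(toks) - 1:
                    patterns.add((toks[i - 1], toks[i + 1]))
        # 2) extract new tokens occurring in a harvested context
        for sent in corpus:
            toks = sent.split()
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in patterns:
                    entities.add(toks[i])
    return entities

print(sorted(bootstrap(corpus, seeds)))  # ['Honda', 'Sony', 'Toyota']
```

Real systems additionally score patterns and candidates to keep the iteration from drifting, since a single noisy pattern can otherwise pull in unrelated words.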

Much of this knowledge can be acquired automatically from large text collections. Cooccurrence statistics 14) are the basic tool 32, 20): under the distributional hypothesis 14), words that occur in similar contexts tend to have similar meanings, so context distributions can substitute for hand-built semantic classes. The Web serves as a very large corpus for such acquisition, and Web-scale acquisition of lexical relations has been demonstrated 18).

4. Language Resources and Tools

4.1 Dictionaries and Corpora

The fundamental resources are the lexicon (dictionary) and the corpus; Table 1 lists representative resources for Japanese. The dictionaries of the JUMAN analyzer 37) and IPADIC 27) are widely used for morphological analysis. Thesauri 35) and large machine-readable dictionaries 28, 29) provide semantic classes, and IPAL and the EDR dictionary provide richer lexical descriptions. Web-derived resources are also growing: Wikipedia, with snapshots available for download, is increasingly used as a knowledge source.
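The distributional hypothesis described above can be made concrete with a tiny sketch: represent each word by counts of its cooccurring context words, then compare words by the cosine of their count vectors. The corpus below is a toy and the helper names are ours.

```python
import math
from collections import Counter

corpus = [
    "drink cold beer", "drink cold juice",
    "eat hot soup", "eat hot bread",
]

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words within +/- window of it."""
    vecs = {}
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            ctx = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

vecs = context_vectors(corpus)
# "beer" and "juice" share the context {drink, cold}; "soup" does not
assert cosine(vecs["beer"], vecs["juice"]) > cosine(vecs["beer"], vecs["soup"])
```

At Web scale the same idea is applied with association weights (e.g. pointwise mutual information) instead of raw counts.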

Table 1: Dictionaries and corpora (name and URI)
  IPADIC: http://chasen.naist.jp/hiki/chasen/
  JUMAN: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html
  http://www.kokken.go.jp/katsudo/kanko/data/
  IPAL
  EDR: http://www.iijnet.or.jp/edr/
  Wikipedia: http://download.wikimedia.org/
  http://www.nichigai.co.jp/sales/mainichi/mainichi-data.html
  http://www.kinokuniya.co.jp/02f/d13/2 13a001.htm
  http://sub.nikkeish.co.jp/gengo/zenbun.htm
  http://www.yomiuri.co.jp/cdrom/etc/oshirase.htm
  http://www.aozora.gr.jp/
  http://genpaku.org/
  http://nlp.kuee.kyoto-u.ac.jp/nl-resource/corpus.html
  RWC
  EDR: http://www.iijnet.or.jp/edr/
  http://www2.kokken.go.jp/ csj/public/index j.html
  Web case frames: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/caseframe.html
  NAIST Text Corpus: http://cl.naist.jp/nldata/corpus/

Among annotated corpora, the Kyoto University Text Corpus provides morphological and dependency annotation, and the EDR corpus provides large-scale semantic annotation. Case frames compiled automatically from the Web 6) and the NAIST Text Corpus are also available.

4.2 Analyzers

The basic tools are the morphological analyzer and the dependency analyzer. For morphological analysis, ChaSen 38) is built around IPADIC, JUMAN uses its own dictionary, and MeCab is also widely used.

MeCab is based on conditional random fields 10) and can be used with either the JUMAN dictionary or IPADIC 33). For dependency analysis, CaboCha is based on support vector machines (SVMs) 7, 9) and accepts ChaSen or MeCab output, while KNP builds on JUMAN.

4.3 Search and Annotation Tools

For full-text string search over large corpora, the suffix array 13, 39) is the standard data structure; sary and SUFARY are implementations. For dictionary lookup, the trie 19) is commonly implemented as a double array 26); darts is such a library. For managing corpora annotated with the analyzers of Section 4.2, ChaKi 15) (Figure 3) provides search and annotation facilities. Corpora annotated in XML can be maintained with generic XML editors such as the oXygen XML Editor, which runs on Windows, Mac OS X, Linux, and Solaris (also as an Eclipse plug-in) and supports DTD validation, XSLT, and XQuery.

4.4 Machine Learning Tools

Figure 3: The ChaKi corpus management tool.

For maximum entropy models, amis 16) is available. For SVMs, libsvm and svm_light 4) are widely used; svm_light also supports transductive learning, and tree kernels 2, 21) are available as an extension. BACT 11) is a boosting-based classifier for semi-structured text that combines decision stumps over substructures. YamCha 8) is an SVM-based chunker. For the conditional random fields of Lafferty et al. 12), MALLET and CRF++ are available; CRF++ can also output the marginal probability of each label.
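The marginal probabilities mentioned above can be illustrated on a toy CRF: for a tiny label set we enumerate every label sequence, score each as exp(Σ_i w_i f_i(x, y)), normalize, and sum over sequences to get the probability of each label at each position. Real implementations use forward-backward dynamic programming instead of enumeration; the features and weights below are illustrative only.

```python
import math
from itertools import product

LABELS = ["B", "I", "O"]

def score(x, y, weights):
    """Sum of simple emission and transition indicator features."""
    s = 0.0
    for i, (tok, lab) in enumerate(zip(x, y)):
        s += weights.get(("emit", tok, lab), 0.0)
        if i > 0:
            s += weights.get(("trans", y[i - 1], lab), 0.0)
    return s

def marginals(x, weights):
    seqs = list(product(LABELS, repeat=len(x)))
    exp_scores = [math.exp(score(x, y, weights)) for y in seqs]
    Z = sum(exp_scores)  # partition function
    marg = [{l: 0.0 for l in LABELS} for _ in x]
    for y, e in zip(seqs, exp_scores):
        for i, lab in enumerate(y):
            marg[i][lab] += e / Z
    return marg

weights = {("emit", "Kyoto", "B"): 2.0, ("trans", "B", "I"): 1.0}
m = marginals(["Kyoto", "Univ"], weights)
assert abs(sum(m[0].values()) - 1.0) < 1e-9  # marginals sum to one
assert m[0]["B"] == max(m[0].values())
```

Unlike the single best path, marginals quantify per-position confidence, which is useful when downstream components need to weigh alternative labelings.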

For mining frequent sequential patterns, PrefixSpan 17) is available; CloSpan 24) and BIDE 22) mine closed sequential patterns. For frequent subtrees 1, 25) there is FREQT, and gSpan 23) mines frequent subgraphs.

5. Conclusion

This article surveyed the problems, techniques, resources, and tools of natural language processing, with emphasis on statistical methods and Web-scale resources. Table 3 lists sources of further information.
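The PrefixSpan idea referenced above can be sketched compactly: recursively grow frequent prefixes and mine each projected database of suffixes. This toy version simplifies itemsets to single items per position and takes an absolute minimum support count.

```python
def prefixspan(db, minsup, prefix=None):
    """Return (pattern, support) pairs for all frequent subsequences."""
    prefix = prefix or []
    results = []
    # count items occurring in the (projected) database
    counts = {}
    for seq in db:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item, sup in sorted(counts.items()):
        if sup < minsup:
            continue
        new_prefix = prefix + [item]
        results.append((new_prefix, sup))
        # project: keep each suffix after the first occurrence of item
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        results += prefixspan(projected, minsup, new_prefix)
    return results

db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
for pattern, sup in prefixspan(db, minsup=2):
    print(pattern, sup)
```

Projection keeps the search confined to sequences that can still extend the current prefix, which is what gives PrefixSpan its efficiency over generate-and-test approaches.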

Table 2: Analysis and learning tools (name and URI)
  JUMAN: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html
  ChaSen: http://chasen.naist.jp/hiki/chasen/
  MeCab: http://mecab.sourceforge.jp/
  KNP: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html
  CaboCha: http://chasen.org/ taku/software/cabocha/
  sary (suffix array): http://sary.sourceforge.net/
  SUFARY (suffix array): http://nais.to/ yto/tools/sufary/
  darts (double array): http://www.chasen.org/ taku/software/darts/
  ChaKi: http://chasen.naist.jp/hiki/chaki/
  oXygen XML: http://www.oxygenxml.com/
  maxent: http://maxent.sourceforge.net/
  amis (feature forests): http://www-tsujii.is.s.u-tokyo.ac.jp/ yusuke/amis/
  libsvm (SVM): http://www.csie.ntu.edu.tw/ cjlin/libsvm/
  svm_light (SVM, transductive learning): http://www.cs.cornell.edu/people/tj/svm light/
  tree kernel for SVM (svm_light extension): http://ai-nlp.info.uniroma2.it/moschitti/tree-kernel.htm
  BACT (boosting): http://chasen.org/ taku/software/bact/
  YamCha (SVM chunker): http://chasen.org/ taku/software/yamcha/
  MALLET: http://mallet.cs.umass.edu/
  CRF++: http://www.chasen.org/ taku/software/crf++/
  prefixspan: http://chasen.org/ taku/software/prefixspan/
  FREQT: http://chasen.org/ taku/software/freqt/
  ILLIMINE (gSpan): http://illimine.cs.uiuc.edu/download/

References

1) K. Abe, S. Kawasoe, T. Asai, H. Arimura and S. Arikawa. Optimized Substructure Discovery for Semi-structured Data, In Proc. of PKDD-2002, pp. 1–14, 2002.
2) M. Collins and N. Duffy. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron, In Proc. of ACL-2002, pp. 263–270, 2002.
3) I. Dagan, O. Glickman and B. Magnini. The PASCAL Recognising Textual Entailment Challenge, In Proc. of the PASCAL Challenges Workshop on Recognising Textual Entailment, 2005.
4) T. Joachims. Making Large-Scale SVM Learning Practical, In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf, C. Burges and A. Smola (eds.), MIT Press, pp. 41–56, 1999.
5) D. Kawahara and S. Kurohashi. A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis, In Proc. of HLT-2006, pp. 176–183, 2006.
6) D. Kawahara and S. Kurohashi. Case Frame Compilation from the Web using High-Performance Computing, In Proc. of LREC-2006, pp. 1344–1347, 2006.
7) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis Based on Support Vector Machines, In Proc. of EMNLP/VLC-2000, pp. 18–25, 2000.
8) T. Kudo and Y. Matsumoto. Chunking with Support Vector Machines, In Proc. of NAACL-2001, pp. 192–199, 2001.
9) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis using Cascaded Chunking, In Proc. of CoNLL-2002, pp. 63–69, 2002.
10) T. Kudo, K. Yamamoto and Y. Matsumoto. Applying Conditional Random Fields to Japanese Morphological Analysis, In Proc. of EMNLP-2004, pp. 230–237, 2004.
11) T. Kudo and Y. Matsumoto. A Boosting Algorithm for Classification of Semi-Structured Text, In Proc. of EMNLP-2004, pp. 301–308, 2004.
12) J. Lafferty, A. McCallum and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, In Proc. of ICML-2001, pp. 282–289, 2001.
13) U. Manber and G. Myers. Suffix Arrays: A New Method for On-line String Searches, SIAM Journal on Computing, 22(5), pp. 935–948, 1993.
14) Y. Matsumoto. Lexical Knowledge Acquisition, The Oxford Handbook of Computational Linguistics, Chapter 21, pp. 395–413, 2005.
15) Y. Matsumoto, M. Asahara, K. Hashimoto, Y. Tono, A. Ohtani and T. Morita. An Annotated Corpus Management Tool: ChaKi, In Proc. of LREC-2006, pp. 1418–1421, 2006.
16) Y. Miyao and J. Tsujii. Maximum Entropy Estimation for Feature Forests, In Proc. of HLT-2002, 2002.
17) J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, In Proc. of ICDE-2001, pp. 215–224, 2001.

Table 3: Sources of further information

Textbooks:
  Foundations of Statistical Natural Language Processing. C. Manning and H. Schuetze. MIT Press (1999).
  The Oxford Handbook of Computational Linguistics. R. Mitkov (ed.). Oxford Univ. Press (2003).
  Handbook of Natural Language Processing. R. Dale, H. Moisl and H. Somers (eds.). Marcel Dekker (2000).
  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. ACM Press (1999).

Academic societies:
  ACL (The Association for Computational Linguistics)
  ICCL (International Committee on Computational Linguistics)
  AFNLP (Asia Federation of Natural Language Processing)

Journals:
  Computational Linguistics
  ACM Transactions on Speech and Language Processing
  ACM Transactions on Asian Language Information Processing
  Natural Language Engineering
  International Journal of Computer Processing of Oriental Languages

Archives and resource catalogs:
  ACL Anthology: http://acl.ldc.upenn.edu/
  http://nlp.kuee.kyoto-u.ac.jp/nlp Portal/lr-cat-j.html
  http://cl.naist.jp/
  LDC: http://www.ldc.upenn.edu/
  ELRA: http://www.elra.info/
  GSK: http://www.gsk.or.jp/
  http://nlp.kuee.kyoto-u.ac.jp/nlp Portal/
  http://www.ai-gakkai.or.jp/jsai/journal/mybookmark/
  http://pub.bookmark.ne.jp/nlp/
  LT-world: http://www.lt-world.org/

18) K. Shinzato and K. Torisawa. Acquiring Hyponymy Relations from Web Documents, In Proc. of HLT-NAACL-2004, pp. 73–80, 2004.
19) T. A. Standish. Data Structure Techniques, Addison-Wesley, Reading, Massachusetts, 1980.
20) K. Torisawa. Acquiring Inference Rules with Temporal Constraints by Using Japanese Coordinated Sentences and Noun-Verb Cooccurrences, In Proc. of HLT-NAACL-2006, pp. 57–64, 2006.
21) S. V. N. Vishwanathan and A. J. Smola. Fast Kernels on Strings and Trees, In Proc. of NIPS-2002, pp. 585–592, 2003.
22) J. Wang and J. Han. BIDE: Efficient Mining of Frequent Closed Sequences, In Proc. of ICDE-2004, pp. 79–90, 2004.
23) X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining, In Proc. of ICDM-2002, pp. 721–724, 2002.
24) X. Yan, J. Han and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets, In Proc. of SDM-2003, pp. 166–177, 2003.
25) M. J. Zaki. Efficiently Mining Frequent Trees in a Forest, In Proc. of KDD-2002, pp. 71–80, 2002.
26) Vol. J71-D, No. 9, pp. 1592–1600, 1988. (in Japanese)
27) ipadic version 2.7.0, 2003. (in Japanese)
28) 1997. (in Japanese)
29) CD-ROM, 1999. (in Japanese)
30) Vol. 11, No. 5, pp. 151–198, 2004. (in Japanese)
31) Vol. 13, No. 3, pp. 201–241, 2006. (in Japanese)
32) Vol. 45, No. 3, pp. 919–933, 2004. (in Japanese)
33) In Proc. of NLP-2005, pp. 592–595, 2005. (in Japanese)
34) Vol. 1, No. 1, pp. 35–57, 1994. (in Japanese)
35) 2004. (in Japanese)
36) 2006. (in Japanese)
37) JUMAN, 1992. (in Japanese)
38) ChaSen version 2.3.3, 2003. (in Japanese)
39) Suffix Array, Vol. 15, No. 6, p. 1142, 2000. (in Japanese)