An Introduction to Natural Language Processing: Beyond Statistical Methods

1 Introduction

Natural language and natural language processing; the growth of text on the Web since the 1990s; morphological analysis and syntactic analysis, developed since the 1980s; the harder problems of semantic analysis, intention understanding, and language understanding; the ACL, the LDC, and ELRA as key academic organizations and language-resource providers.
2

2.1 Opinion mining

Opinion mining extracts evaluative statements from Web text 31). Examples (1)-(3) give input sentences and their structured outputs in attribute = value form. Closely related tasks are information extraction and structurization and paraphrase and entailment recognition.
2.2 Ambiguity and disambiguation

Natural language is pervasively ambiguous, and natural language analysis is largely a matter of disambiguation.

2.3

An example sentence and its analysis (Figure 1); the numbered words are glossed #1 "go", #2 "order", #4 "excellent".

Figure 1: (caption lost in extraction)
Examples (4)-(7) move to larger applications: information extraction finds relations between entities X and Y in text, and related technologies include information retrieval, question answering, machine translation, and paraphrase and entailment recognition 3), 30).

3

Analysis of a sentence can be formalized as follows. An input sentence s is a character sequence s = c_1 ... c_m. Morphological analysis determines the word sequence w = w_1 ... w_n and, for each word w_i, its part-of-speech tag t_i, giving the tag sequence t = t_1 ... t_n. Equivalently, each character c_i of s can be assigned a label b_i, so that the output is the label sequence b = b_1 ... b_m; in this form the task is an instance of sequential labeling, and the goal is to recover w and t from s, as illustrated in the sketch below.
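To make the character-labeling view concrete, the following minimal sketch converts a segmented and tagged sentence into per-character labels b_1 ... b_m. The example sentence and the B/I label scheme are assumptions chosen for illustration, not details taken from the article.

# Word segmentation and tagging viewed as character-level sequential labeling.
# "B-" marks the first character of a word, "I-" the remaining characters,
# and the part-of-speech tag is appended to each label.

def to_char_labels(words, tags):
    """Convert a word/tag sequence into character labels b_1 ... b_m."""
    chars, labels = [], []
    for word, tag in zip(words, tags):
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(("B-" if i == 0 else "I-") + tag)
    return chars, labels

if __name__ == "__main__":
    words = ["太郎", "が", "学校", "へ", "行く"]
    tags = ["NOUN", "PARTICLE", "NOUN", "PARTICLE", "VERB"]
    for ch, label in zip(*to_char_labels(words, tags)):
        print(ch, label)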
Figure 2: A token sequence labeled with ORG-B, ORG-I, and O tags (org-b org-i org-i O O O O O), the labeling scheme used for named-entity chunks.

The same labeling view extends to further analyses such as dependency analysis and ellipsis or zero-anaphora resolution.

3.2

Given a sentence s, the word sequence w and tag sequence t can be found with a Hidden Markov Model (HMM):

  argmax_{w,t} P(w, t | s) = argmax_{w,t} P(w, t)
                           = argmax_{w,t} ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1}),

where the lexical probabilities P(w_i | t_i) and the tag-transition probabilities P(t_i | t_{i-1}) are estimated from an annotated corpus.

Conditional Random Fields (CRF) 12) are a discriminative model that models P(w, t | s) directly. For an input x and an output y, with feature functions f_i(x, y), i = 1, ..., n, and feature weights w_i, a CRF defines

  P(y | x) = exp( Σ_i w_i f_i(x, y) ) / Σ_{ŷ} exp( Σ_i w_i f_i(x, ŷ) ).

Unlike the HMM, the CRF can use arbitrary, overlapping features of the input. With such models, accuracies of about 97-98% are reported for morphological analysis, with around 90% for harder analyses such as dependency parsing.
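As a concrete illustration of the HMM factorization above, here is a minimal Viterbi decoder. It is a sketch under simplifying assumptions (the words are already segmented and the probability tables are toy values); it is not the implementation of any of the analyzers discussed later.

import math

def viterbi(words, tags, p_word_given_tag, p_tag_given_prev, start="<s>"):
    """Find argmax_t of prod_i P(w_i|t_i) P(t_i|t_{i-1}) for a fixed word sequence."""
    # best[i][t] = (log-probability of the best path ending with tag t at position i, previous tag)
    best = [{} for _ in words]
    for i, w in enumerate(words):
        for t in tags:
            emit = math.log(p_word_given_tag.get((w, t), 1e-10))
            if i == 0:
                best[i][t] = (math.log(p_tag_given_prev.get((t, start), 1e-10)) + emit, None)
            else:
                score, prev = max(
                    (best[i - 1][pt][0] + math.log(p_tag_given_prev.get((t, pt), 1e-10)) + emit, pt)
                    for pt in tags)
                best[i][t] = (score, prev)
    # Backtrack from the best final tag.
    t = max(best[-1], key=lambda tag: best[-1][tag][0])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return list(reversed(path))

if __name__ == "__main__":
    # Toy tables; a real analyzer estimates these from an annotated corpus.
    p_wt = {("time", "NOUN"): 0.1, ("flies", "VERB"): 0.2, ("flies", "NOUN"): 0.01}
    p_tt = {("NOUN", "<s>"): 0.6, ("VERB", "NOUN"): 0.4, ("NOUN", "NOUN"): 0.3}
    print(viterbi(["time", "flies"], ["NOUN", "VERB"], p_wt, p_tt))  # ['NOUN', 'VERB']

In a real analyzer the search additionally considers alternative segmentations of the input characters into words (a word lattice), which this sketch omits.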
Supervised models face the problem of data sparseness: annotated corpora are expensive to build. Semi-supervised learning, active learning, and, for tasks such as named-entity recognition, bootstrapping are therefore used to reduce the amount of annotation required.

3.3

Analysis also needs knowledge beyond annotated corpora, both linguistic knowledge and world knowledge: case frames, which support predicate-argument structure analysis, and thesauri. Such knowledge is what examples (2) and (3) of Section 2.1 implicitly rely on.
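One widely used way to acquire such lexical knowledge, or named-entity lists, with very little annotation is the bootstrapping mentioned above: start from a few seed instances, induce extraction patterns from their contexts, and use the patterns to harvest new instances. The following is a minimal sketch of that loop; the seed set, the corpus format, and the fixed-width context patterns are assumptions made purely for illustration.

import re

def bootstrap(corpus, seeds, rounds=2):
    """Toy pattern-instance bootstrapping over plain strings.

    corpus : list of sentences (strings)
    seeds  : initial set of known instance strings
    A "pattern" is simply the pair of 3-character contexts around a mention.
    """
    instances, patterns = set(seeds), set()
    for _ in range(rounds):
        # 1) Induce patterns from the current instances.
        for sent in corpus:
            for inst in instances:
                for m in re.finditer(re.escape(inst), sent):
                    left = sent[max(0, m.start() - 3):m.start()]
                    right = sent[m.end():m.end() + 3]
                    if left and right:
                        patterns.add((left, right))
        # 2) Apply the patterns to harvest new candidate instances.
        for sent in corpus:
            for left, right in patterns:
                for m in re.finditer(re.escape(left) + r"(\w+)" + re.escape(right), sent):
                    instances.add(m.group(1))
    return instances, patterns

Real bootstrapping systems also score patterns and candidates at each round to limit semantic drift, which this sketch omits.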
Such knowledge can also be acquired automatically from cooccurrence statistics 14), 32), 20). The underlying idea is the distributional hypothesis 14): words that occur in similar contexts tend to have similar meanings. Very large collections of Web text are now exploited for this kind of acquisition, for example to learn hyponymy relations 18).
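A minimal sketch of distributional similarity computed from cooccurrence counts follows; the window size, the toy corpus, and the use of cosine similarity are assumptions for illustration rather than details taken from the article.

from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    corpus = [["drink", "hot", "coffee"], ["drink", "hot", "tea"], ["read", "a", "book"]]
    vec = cooccurrence_vectors(corpus)
    # Words sharing contexts ("coffee", "tea") score higher than unrelated pairs.
    print(cosine(vec["coffee"], vec["tea"]), cosine(vec["coffee"], vec["book"]))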
The basic language resources are lexicons and corpora (Table 1). Dictionaries for morphological analysis include the JUMAN dictionary 37) and IPADIC 27); thesauri 35), 28), 29), the IPAL dictionary, and the EDR dictionary provide richer lexical information, and Wikipedia is increasingly used as a knowledge source. Large-scale resources are also built automatically, for example case frames compiled from the Web 6).

Table 1: Major language resources (URIs lost in extraction except the fragments "a001.htm" and "csj/public/index j.html", kept as-is)
  IPADIC
  JUMAN dictionary
  IPAL
  EDR dictionary
  Wikipedia
  RWC corpus
  CSJ (Corpus of Spontaneous Japanese)
  NAIST Text Corpus
  EDR corpus

The remainder of the article turns to tools, starting with morphological analyzers and dependency analyzers. The principal morphological analyzers are ChaSen 38), which uses IPADIC, as well as JUMAN and MeCab; a minimal usage sketch follows.
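The sketch below runs one of these analyzers from Python. It assumes the mecab-python3 binding and a default dictionary (e.g. IPADIC) are installed; the article itself names the analyzers but not this binding.

# Analyze a sentence with MeCab via the mecab-python3 binding (an assumption;
# any of the analyzers above could equally be run from the command line).
import MeCab

tagger = MeCab.Tagger()                        # uses the default dictionary
print(tagger.parse("すもももももももものうち"))   # one analyzed morpheme per line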
MeCab 10) can use either the JUMAN dictionary or IPADIC 33). For dependency analysis, CaboCha is based on support vector machines (SVM) 7), 9) and takes ChaSen or MeCab output as input, while KNP works with JUMAN.

4.2

ChaKi 15) is a management and search tool for annotated corpora (Figure 3). Annotated corpora are also commonly encoded in XML; the oXygen XML Editor runs on Windows, Mac OS X, Linux, and Solaris, XML editing is also available in Eclipse, and standards such as DTD, XSLT, and XQuery apply.

Figure 3: ChaKi.

4.3

For searching large volumes of raw text, the suffix array 13), 39) supports fast string search; sary and SUFARY are implementations. For dictionary lookup, the double array 26), a compact implementation of the trie 19), is used; darts is a double-array library. A small illustration of suffix-array search follows.
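The sketch below shows how a suffix array supports substring search by binary search. It builds the array naively (sorting all suffixes), unlike the efficient constructions used by tools such as sary and SUFARY, and the sample text is an assumption.

def build_suffix_array(text):
    """Starting positions of all suffixes of `text`, sorted lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, query):
    """All positions where `query` occurs in `text`, found by binary search on the suffix array."""
    # Lower bound: first suffix whose prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose prefix is > query.
    lo, hi = start, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "abra"))  # [0, 7]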
4.4

Machine learning tools underlie most of the methods above. amis 16) is a maximum entropy estimator (feature forests). For SVMs, libsvm and svm_light 4) are widely used; svm_light also supports transductive learning, and tree kernels 2), 21) extend SVMs to tree-structured inputs (implemented as an extension of svm_light). BACT 11) classifies semi-structured text by boosting decision stumps over substructures. YamCha is an SVM-based chunker 8). For conditional random fields (Lafferty et al. 12)), MALLET and CRF++ are available; CRF++ can also output marginal probabilities.
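To illustrate the boosting-of-decision-stumps idea behind BACT, here is a much-simplified AdaBoost sketch over binary feature vectors. BACT itself boosts stumps over subtree features mined from the input trees; the flat feature representation here is an assumption made to keep the example short.

import math

def train_adaboost(examples, labels, rounds=10):
    """AdaBoost with one-feature decision stumps.

    examples : list of dicts mapping feature name -> 0/1
    labels   : list of +1 / -1 labels
    Returns a list of (weight, feature, polarity) weak classifiers.
    """
    n = len(examples)
    w = [1.0 / n] * n
    features = sorted({f for x in examples for f in x})
    model = []
    for _ in range(rounds):
        # Pick the stump (feature, polarity) with the lowest weighted error.
        best = None
        for f in features:
            for polarity in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, examples, labels)
                          if polarity * (1 if x.get(f, 0) else -1) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, polarity))
        # Re-weight the examples: mistakes receive more weight.
        w = [wi * math.exp(-alpha * y * polarity * (1 if x.get(f, 0) else -1))
             for wi, x, y in zip(w, examples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * polarity * (1 if x.get(f, 0) else -1) for alpha, f, polarity in model)
    return 1 if score >= 0 else -1

BACT additionally searches the space of subtree features efficiently at every boosting round, which is the part this sketch leaves out.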
Pattern mining tools extract recurring structures from analyzed text: prefixspan 17) mines frequent sequential patterns, CloSpan 24) and BIDE 22) mine closed sequential patterns, frequent subtree mining 1), 25) is available in FREQT, and gspan 23) mines frequent subgraphs. Table 2 lists the tools together with their URIs; a simplified sketch of the prefixspan idea follows.
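The following is a compact, non-optimized illustration of the prefix-projection idea behind prefixspan; the minimum-support threshold and the toy sequence database are assumptions for the example.

from collections import Counter

def prefixspan(sequences, min_support):
    """Frequent sequential patterns as {pattern (tuple): support}.

    Simplified PrefixSpan: each database entry is a list of items, and a pattern is a
    (not necessarily contiguous) subsequence occurring in at least `min_support` entries.
    """
    patterns = {}

    def mine(prefix, projected):
        # Count the items that can extend the current prefix.
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = support
            # Project the database: keep the suffix after the first occurrence of `item`.
            new_projected = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, new_projected)

    mine((), sequences)
    return patterns

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c", "b", "c"], ["a", "b"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(db, min_support=2).items()):
        print(pattern, support)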
5

Concluding remarks on natural language processing and the Web. Table 3 lists textbooks, academic organizations, journals, and language-resource providers for further study.

Table 2: Tools and their URIs (only path fragments of the URIs survived extraction; they are kept as-is)
  JUMAN                 morphological analyzer
  ChaSen                morphological analyzer
  MeCab                 morphological analyzer
  KNP                   dependency analyzer
  CaboCha               dependency analyzer                           taku/software/cabocha/
  sary                  suffix array
  SUFARY                suffix array                                  yto/tools/sufary/
  darts                 double array                                  taku/software/darts/
  ChaKi                 annotated corpus management tool
  oxygen                XML editor
  amis                  maximum entropy estimator (feature forests)   yusuke/amis/
  libsvm                SVM                                           cjlin/libsvm/
  svm_light             SVM (transductive learning)                   light/
  tree kernel for SVM   tree kernel extension of svm_light
  bact                  boosting-based classifier                     taku/software/bact/
  YamCha                SVM-based chunker                             taku/software/yamcha/
  MALLET                machine learning toolkit (including CRF)
  CRF++                 conditional random fields                     taku/software/crf++/
  prefixspan            sequential pattern mining                     taku/software/prefixspan/
  FREQT                 frequent tree mining                          taku/software/freqt/
  ILLIMINE              graph mining (includes gspan)
References

1) K. Abe, S. Kawasoe, T. Asai, H. Arimura and S. Arikawa. Optimized Substructure Discovery for Semi-structured Data. In Proc. of PKDD-2002, pp. 1-14.
2) M. Collins and N. Duffy. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proc. of ACL-2002.
3) I. Dagan, O. Glickman and B. Magnini. The PASCAL Recognising Textual Entailment Challenge. In Proc. of the PASCAL Challenges Workshop on Recognising Textual Entailment.
4) T. Joachims. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf, C. Burges and A. Smola (eds.), MIT Press, pp. 41-56.
5) D. Kawahara and S. Kurohashi. A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis. In Proc. of HLT-2006.
6) D. Kawahara and S. Kurohashi. Case Frame Compilation from the Web using High-Performance Computing. In Proc. of LREC-2006.
7) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis Based on Support Vector Machines. In Proc. of EMNLP/VLC-2000.
8) T. Kudo and Y. Matsumoto. Chunking with Support Vector Machines. In Proc. of NAACL-2001.
9) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis using Cascaded Chunking. In Proc. of CoNLL-2002.
10) T. Kudo, K. Yamamoto and Y. Matsumoto. Applying Conditional Random Fields to Japanese Morphological Analysis. In Proc. of EMNLP-2004.
11) T. Kudo and Y. Matsumoto. A Boosting Algorithm for Classification of Semi-Structured Text. In Proc. of EMNLP-2004.
12) J. Lafferty, A. McCallum and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of ICML-2001.
13) U. Manber and G. Myers. Suffix Arrays: A New Method for On-line String Searches. SIAM Journal on Computing, 22(5).
14) Y. Matsumoto. Lexical Knowledge Acquisition. In The Oxford Handbook of Computational Linguistics, Chapter 21.
15) Y. Matsumoto, M. Asahara, K. Hashimoto, Y. Tono, A. Ohtani and T. Morita. An Annotated Corpus Management Tool: ChaKi. In Proc. of LREC-2006.
16) Y. Miyao and J. Tsujii. Maximum Entropy Estimation for Feature Forests. In Proc. of HLT-2002.
17) J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proc. of ICDE-2001.
Table 3: Further reading and resources

Textbooks:
  Foundations of Statistical Natural Language Processing. C. Manning and H. Schuetze. MIT Press (1999).
  The Oxford Handbook of Computational Linguistics. R. Mitkov (ed.). Oxford Univ. Press (2003).
  Handbook of Natural Language Processing. R. Dale, H. Moisl and H. Somers (eds.). Marcel Dekker (2000).
  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. ACM Press (1999).
  Several Japanese-language textbooks (titles lost in extraction; years 1996, 1999, 2003, 2005).

Organizations:
  ACL (The Association for Computational Linguistics), ICCL (International Committee on Computational Linguistics), AFNLP (Asian Federation of Natural Language Processing).

Journals:
  Computational Linguistics; ACM Transactions on Speech and Language Processing; ACM Transactions on Asian Language Information Processing; Natural Language Engineering; International Journal of Computer Processing of Oriental Languages; the ACL Anthology.

Language resource providers:
  LDC, ELRA, GSK, LT-world.

18) K. Shinzato and K. Torisawa. Acquiring Hyponymy Relations from Web Documents. In Proc. of HLT-NAACL-2004.
19) T. A. Standish. Data Structure Techniques. Addison-Wesley, Reading, Massachusetts.
20) K. Torisawa. Acquiring Inference Rules with Temporal Constraints by Using Japanese Coordinated Sentences and Noun-Verb Co-occurrences. In Proc. of HLT-NAACL-2006.
21) S. V. N. Vishwanathan and A. J. Smola. Fast Kernels on Strings and Trees. In Proc. of NIPS-2002.
22) J. Wang and J. Han. BIDE: Efficient Mining of Frequent Closed Sequences. In Proc. of ICDE-2004, pp. 79-90.
23) X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. In Proc. of ICDM-2002.
24) X. Yan, J. Han and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets. In Proc. of SDM-2003.
25) M. J. Zaki. Efficiently Mining Frequent Trees in a Forest. In Proc. of KDD-2002.
26) (In Japanese.) IEICE Transactions, Vol. J71-D, No. 9.
27)-39) (In Japanese; bibliographic details largely lost in extraction.) Surviving fragments include: "version 2.3.3"; "Suffix Array", Vol. 15, No. 6, p. 1142; "ipadic version 2.7.0"; a CD-ROM publication; journal articles in Vol. 11, No. 5, Vol. 13, No. 3, Vol. 45, No. 3, and Vol. 1, No. 1, pp. 35-57; and a paper in NLP-2005.