An Introduction to Natural Language Processing: Beyond Statistical Methods

1 Introduction

Natural language and natural language processing; the growth of text on the Web since the 1990s; morphological analysis and syntactic analysis, developed since the 1980s; the harder problems of semantic analysis, intention understanding, and language understanding; the ACL, the LDC, and ELRA as key academic organizations and language-resource providers.
2

2.1 Opinion mining

Opinion mining extracts evaluative statements from Web text 31). Examples (1)-(3) give input sentences and their structured outputs in attribute = value form. Closely related tasks are information extraction and structurization and paraphrase and entailment recognition.
2.2 Ambiguity and disambiguation

Natural language is pervasively ambiguous, and natural language analysis is largely a matter of disambiguation.

2.3

An example sentence and its analysis (Figure 1); the numbered words are glossed #1 "go", #2 "order", #4 "excellent".

Figure 1: (caption lost in extraction)
Examples (4)-(7) move to larger applications: information extraction finds relations between entities X and Y in text, and related technologies include information retrieval, question answering, machine translation, and paraphrase and entailment recognition 3), 30).

3

Analysis of a sentence can be formalized as follows. An input sentence s is a character sequence s = c_1 ... c_m. Morphological analysis determines the word sequence w = w_1 ... w_n and, for each word w_i, its part-of-speech tag t_i, giving the tag sequence t = t_1 ... t_n. Equivalently, each character c_i of s can be assigned a label b_i, so that the output is the label sequence b = b_1 ... b_m; in this form the task is an instance of sequential labeling, and the goal is to recover w and t from s, as illustrated in the sketch below.
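To make the character-labeling view concrete, the following minimal sketch converts a segmented and tagged sentence into per-character labels b_1 ... b_m. The example sentence and the B/I label scheme are assumptions chosen for illustration, not details taken from the article.

# Word segmentation and tagging viewed as character-level sequential labeling.
# "B-" marks the first character of a word, "I-" the remaining characters,
# and the part-of-speech tag is appended to each label.

def to_char_labels(words, tags):
    """Convert a word/tag sequence into character labels b_1 ... b_m."""
    chars, labels = [], []
    for word, tag in zip(words, tags):
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(("B-" if i == 0 else "I-") + tag)
    return chars, labels

if __name__ == "__main__":
    words = ["太郎", "が", "学校", "へ", "行く"]
    tags = ["NOUN", "PARTICLE", "NOUN", "PARTICLE", "VERB"]
    for ch, label in zip(*to_char_labels(words, tags)):
        print(ch, label)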
Figure 2: A token sequence labeled with ORG-B, ORG-I, and O tags (org-b org-i org-i O O O O O), the labeling scheme used for named-entity chunks.

The same labeling view extends to further analyses such as dependency analysis and ellipsis or zero-anaphora resolution.

3.2

Given a sentence s, the word sequence w and tag sequence t can be found with a Hidden Markov Model (HMM):

  argmax_{w,t} P(w, t | s) = argmax_{w,t} P(w, t)
                           = argmax_{w,t} ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1}),

where the lexical probabilities P(w_i | t_i) and the tag-transition probabilities P(t_i | t_{i-1}) are estimated from an annotated corpus.

Conditional Random Fields (CRF) 12) are a discriminative model that models P(w, t | s) directly. For an input x and an output y, with feature functions f_i(x, y), i = 1, ..., n, and feature weights w_i, a CRF defines

  P(y | x) = exp( Σ_i w_i f_i(x, y) ) / Σ_{ŷ} exp( Σ_i w_i f_i(x, ŷ) ).

Unlike the HMM, the CRF can use arbitrary, overlapping features of the input. With such models, accuracies of about 97-98% are reported for morphological analysis, with around 90% for harder analyses such as dependency parsing.
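As a concrete illustration of the HMM factorization above, here is a minimal Viterbi decoder. It is a sketch under simplifying assumptions (the words are already segmented and the probability tables are toy values); it is not the implementation of any of the analyzers discussed later.

import math

def viterbi(words, tags, p_word_given_tag, p_tag_given_prev, start="<s>"):
    """Find argmax_t of prod_i P(w_i|t_i) P(t_i|t_{i-1}) for a fixed word sequence."""
    # best[i][t] = (log-probability of the best path ending with tag t at position i, previous tag)
    best = [{} for _ in words]
    for i, w in enumerate(words):
        for t in tags:
            emit = math.log(p_word_given_tag.get((w, t), 1e-10))
            if i == 0:
                best[i][t] = (math.log(p_tag_given_prev.get((t, start), 1e-10)) + emit, None)
            else:
                score, prev = max(
                    (best[i - 1][pt][0] + math.log(p_tag_given_prev.get((t, pt), 1e-10)) + emit, pt)
                    for pt in tags)
                best[i][t] = (score, prev)
    # Backtrack from the best final tag.
    t = max(best[-1], key=lambda tag: best[-1][tag][0])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return list(reversed(path))

if __name__ == "__main__":
    # Toy tables; a real analyzer estimates these from an annotated corpus.
    p_wt = {("time", "NOUN"): 0.1, ("flies", "VERB"): 0.2, ("flies", "NOUN"): 0.01}
    p_tt = {("NOUN", "<s>"): 0.6, ("VERB", "NOUN"): 0.4, ("NOUN", "NOUN"): 0.3}
    print(viterbi(["time", "flies"], ["NOUN", "VERB"], p_wt, p_tt))  # ['NOUN', 'VERB']

In a real analyzer the search additionally considers alternative segmentations of the input characters into words (a word lattice), which this sketch omits.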
Supervised models face the problem of data sparseness: annotated corpora are expensive to build. Semi-supervised learning, active learning, and, for tasks such as named-entity recognition, bootstrapping are therefore used to reduce the amount of annotation required.

3.3

Analysis also needs knowledge beyond annotated corpora, both linguistic knowledge and world knowledge: case frames, which support predicate-argument structure analysis, and thesauri. Such knowledge is what examples (2) and (3) of Section 2.1 implicitly rely on.
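One widely used way to acquire such lexical knowledge, or named-entity lists, with very little annotation is the bootstrapping mentioned above: start from a few seed instances, induce extraction patterns from their contexts, and use the patterns to harvest new instances. The following is a minimal sketch of that loop; the seed set, the corpus format, and the fixed-width context patterns are assumptions made purely for illustration.

import re

def bootstrap(corpus, seeds, rounds=2):
    """Toy pattern-instance bootstrapping over plain strings.

    corpus : list of sentences (strings)
    seeds  : initial set of known instance strings
    A "pattern" is simply the pair of 3-character contexts around a mention.
    """
    instances, patterns = set(seeds), set()
    for _ in range(rounds):
        # 1) Induce patterns from the current instances.
        for sent in corpus:
            for inst in instances:
                for m in re.finditer(re.escape(inst), sent):
                    left = sent[max(0, m.start() - 3):m.start()]
                    right = sent[m.end():m.end() + 3]
                    if left and right:
                        patterns.add((left, right))
        # 2) Apply the patterns to harvest new candidate instances.
        for sent in corpus:
            for left, right in patterns:
                for m in re.finditer(re.escape(left) + r"(\w+)" + re.escape(right), sent):
                    instances.add(m.group(1))
    return instances, patterns

Real bootstrapping systems also score patterns and candidates at each round to limit semantic drift, which this sketch omits.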
Such knowledge can also be acquired automatically from cooccurrence statistics 14), 32), 20). The underlying idea is the distributional hypothesis 14): words that occur in similar contexts tend to have similar meanings. Very large collections of Web text are now exploited for this kind of acquisition, for example to learn hyponymy relations 18).
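A minimal sketch of distributional similarity computed from cooccurrence counts follows; the window size, the toy corpus, and the use of cosine similarity are assumptions for illustration rather than details taken from the article.

from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    corpus = [["drink", "hot", "coffee"], ["drink", "hot", "tea"], ["read", "a", "book"]]
    vec = cooccurrence_vectors(corpus)
    # Words sharing contexts ("coffee", "tea") score higher than unrelated pairs.
    print(cosine(vec["coffee"], vec["tea"]), cosine(vec["coffee"], vec["book"]))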
The basic language resources are lexicons and corpora (Table 1). Dictionaries for morphological analysis include the JUMAN dictionary 37) and IPADIC 27); thesauri 35), 28), 29), the IPAL dictionary, and the EDR dictionary provide richer lexical information, and Wikipedia is increasingly used as a knowledge source. Large-scale resources are also built automatically, for example case frames compiled from the Web 6).

Table 1: Major language resources (URIs lost in extraction except the fragments "a001.htm" and "csj/public/index j.html", kept as-is)
  IPADIC
  JUMAN dictionary
  IPAL
  EDR dictionary
  Wikipedia
  RWC corpus
  CSJ (Corpus of Spontaneous Japanese)
  NAIST Text Corpus
  EDR corpus

The remainder of the article turns to tools, starting with morphological analyzers and dependency analyzers. The principal morphological analyzers are ChaSen 38), which uses IPADIC, as well as JUMAN and MeCab; a minimal usage sketch follows.
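The sketch below runs one of these analyzers from Python. It assumes the mecab-python3 binding and a default dictionary (e.g. IPADIC) are installed; the article itself names the analyzers but not this binding.

# Analyze a sentence with MeCab via the mecab-python3 binding (an assumption;
# any of the analyzers above could equally be run from the command line).
import MeCab

tagger = MeCab.Tagger()                        # uses the default dictionary
print(tagger.parse("すもももももももものうち"))   # one analyzed morpheme per line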
MeCab 10) can use either the JUMAN dictionary or IPADIC 33). For dependency analysis, CaboCha is based on support vector machines (SVM) 7), 9) and takes ChaSen or MeCab output as input, while KNP works with JUMAN.

4.2

ChaKi 15) is a management and search tool for annotated corpora (Figure 3). Annotated corpora are also commonly encoded in XML; the oXygen XML Editor runs on Windows, Mac OS X, Linux, and Solaris, XML editing is also available in Eclipse, and standards such as DTD, XSLT, and XQuery apply.

Figure 3: ChaKi.

4.3

For searching large volumes of raw text, the suffix array 13), 39) supports fast string search; sary and SUFARY are implementations. For dictionary lookup, the double array 26), a compact implementation of the trie 19), is used; darts is a double-array library. A small illustration of suffix-array search follows.
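The sketch below shows how a suffix array supports substring search by binary search. It builds the array naively (sorting all suffixes), unlike the efficient constructions used by tools such as sary and SUFARY, and the sample text is an assumption.

def build_suffix_array(text):
    """Starting positions of all suffixes of `text`, sorted lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, query):
    """All positions where `query` occurs in `text`, found by binary search on the suffix array."""
    # Lower bound: first suffix whose prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose prefix is > query.
    lo, hi = start, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "abra"))  # [0, 7]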
4.4

Machine learning tools underlie most of the methods above. amis 16) is a maximum entropy estimator (feature forests). For SVMs, libsvm and svm_light 4) are widely used; svm_light also supports transductive learning, and tree kernels 2), 21) extend SVMs to tree-structured inputs (implemented as an extension of svm_light). BACT 11) classifies semi-structured text by boosting decision stumps over substructures. YamCha is an SVM-based chunker 8). For conditional random fields (Lafferty et al. 12)), MALLET and CRF++ are available; CRF++ can also output marginal probabilities.
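To illustrate the boosting-of-decision-stumps idea behind BACT, here is a much-simplified AdaBoost sketch over binary feature vectors. BACT itself boosts stumps over subtree features mined from the input trees; the flat feature representation here is an assumption made to keep the example short.

import math

def train_adaboost(examples, labels, rounds=10):
    """AdaBoost with one-feature decision stumps.

    examples : list of dicts mapping feature name -> 0/1
    labels   : list of +1 / -1 labels
    Returns a list of (weight, feature, polarity) weak classifiers.
    """
    n = len(examples)
    w = [1.0 / n] * n
    features = sorted({f for x in examples for f in x})
    model = []
    for _ in range(rounds):
        # Pick the stump (feature, polarity) with the lowest weighted error.
        best = None
        for f in features:
            for polarity in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, examples, labels)
                          if polarity * (1 if x.get(f, 0) else -1) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, polarity))
        # Re-weight the examples: mistakes receive more weight.
        w = [wi * math.exp(-alpha * y * polarity * (1 if x.get(f, 0) else -1))
             for wi, x, y in zip(w, examples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * polarity * (1 if x.get(f, 0) else -1) for alpha, f, polarity in model)
    return 1 if score >= 0 else -1

BACT additionally searches the space of subtree features efficiently at every boosting round, which is the part this sketch leaves out.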
Pattern mining tools extract recurring structures from analyzed text: prefixspan 17) mines frequent sequential patterns, CloSpan 24) and BIDE 22) mine closed sequential patterns, frequent subtree mining 1), 25) is available in FREQT, and gspan 23) mines frequent subgraphs. Table 2 lists the tools together with their URIs; a simplified sketch of the prefixspan idea follows.
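The following is a compact, non-optimized illustration of the prefix-projection idea behind prefixspan; the minimum-support threshold and the toy sequence database are assumptions for the example.

from collections import Counter

def prefixspan(sequences, min_support):
    """Frequent sequential patterns as {pattern (tuple): support}.

    Simplified PrefixSpan: each database entry is a list of items, and a pattern is a
    (not necessarily contiguous) subsequence occurring in at least `min_support` entries.
    """
    patterns = {}

    def mine(prefix, projected):
        # Count the items that can extend the current prefix.
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = support
            # Project the database: keep the suffix after the first occurrence of `item`.
            new_projected = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, new_projected)

    mine((), sequences)
    return patterns

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c", "b", "c"], ["a", "b"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(db, min_support=2).items()):
        print(pattern, support)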
5

Concluding remarks on natural language processing and the Web. Table 3 lists textbooks, academic organizations, journals, and language-resource providers for further study.

Table 2: Tools and their URIs (only path fragments of the URIs survived extraction; they are kept as-is)
  JUMAN                 morphological analyzer
  ChaSen                morphological analyzer
  MeCab                 morphological analyzer
  KNP                   dependency analyzer
  CaboCha               dependency analyzer                           taku/software/cabocha/
  sary                  suffix array
  SUFARY                suffix array                                  yto/tools/sufary/
  darts                 double array                                  taku/software/darts/
  ChaKi                 annotated corpus management tool
  oxygen                XML editor
  amis                  maximum entropy estimator (feature forests)   yusuke/amis/
  libsvm                SVM                                           cjlin/libsvm/
  svm_light             SVM (transductive learning)                   light/
  tree kernel for SVM   tree kernel extension of svm_light
  bact                  boosting-based classifier                     taku/software/bact/
  YamCha                SVM-based chunker                             taku/software/yamcha/
  MALLET                machine learning toolkit (including CRF)
  CRF++                 conditional random fields                     taku/software/crf++/
  prefixspan            sequential pattern mining                     taku/software/prefixspan/
  FREQT                 frequent tree mining                          taku/software/freqt/
  ILLIMINE              graph mining (includes gspan)
References

1) K. Abe, S. Kawasoe, T. Asai, H. Arimura and S. Arikawa. Optimized Substructure Discovery for Semi-structured Data. In Proc. of PKDD-2002, pp. 1-14.
2) M. Collins and N. Duffy. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proc. of ACL-2002.
3) I. Dagan, O. Glickman and B. Magnini. The PASCAL Recognising Textual Entailment Challenge. In Proc. of the PASCAL Challenges Workshop on Recognising Textual Entailment.
4) T. Joachims. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf, C. Burges and A. Smola (eds.), MIT Press, pp. 41-56.
5) D. Kawahara and S. Kurohashi. A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis. In Proc. of HLT-2006.
6) D. Kawahara and S. Kurohashi. Case Frame Compilation from the Web using High-Performance Computing. In Proc. of LREC-2006.
7) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis Based on Support Vector Machines. In Proc. of EMNLP/VLC-2000.
8) T. Kudo and Y. Matsumoto. Chunking with Support Vector Machines. In Proc. of NAACL-2001.
9) T. Kudo and Y. Matsumoto. Japanese Dependency Analysis using Cascaded Chunking. In Proc. of CoNLL-2002.
10) T. Kudo, K. Yamamoto and Y. Matsumoto. Applying Conditional Random Fields to Japanese Morphological Analysis. In Proc. of EMNLP-2004.
11) T. Kudo and Y. Matsumoto. A Boosting Algorithm for Classification of Semi-Structured Text. In Proc. of EMNLP-2004.
12) J. Lafferty, A. McCallum and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of ICML-2001.
13) U. Manber and G. Myers. Suffix Arrays: A New Method for On-line String Searches. SIAM Journal on Computing, 22(5).
14) Y. Matsumoto. Lexical Knowledge Acquisition. In The Oxford Handbook of Computational Linguistics, Chapter 21.
15) Y. Matsumoto, M. Asahara, K. Hashimoto, Y. Tono, A. Ohtani and T. Morita. An Annotated Corpus Management Tool: ChaKi. In Proc. of LREC-2006.
16) Y. Miyao and J. Tsujii. Maximum Entropy Estimation for Feature Forests. In Proc. of HLT-2002.
17) J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proc. of ICDE-2001.
Table 3: Further reading and resources

Textbooks:
  Foundations of Statistical Natural Language Processing. C. Manning and H. Schuetze. MIT Press (1999).
  The Oxford Handbook of Computational Linguistics. R. Mitkov (ed.). Oxford Univ. Press (2003).
  Handbook of Natural Language Processing. R. Dale, H. Moisl and H. Somers (eds.). Marcel Dekker (2000).
  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. ACM Press (1999).
  Several Japanese-language textbooks (titles lost in extraction; years 1996, 1999, 2003, 2005).

Organizations:
  ACL (The Association for Computational Linguistics), ICCL (International Committee on Computational Linguistics), AFNLP (Asian Federation of Natural Language Processing).

Journals:
  Computational Linguistics; ACM Transactions on Speech and Language Processing; ACM Transactions on Asian Language Information Processing; Natural Language Engineering; International Journal of Computer Processing of Oriental Languages; the ACL Anthology.

Language resource providers:
  LDC, ELRA, GSK, LT-world.

18) K. Shinzato and K. Torisawa. Acquiring Hyponymy Relations from Web Documents. In Proc. of HLT-NAACL-2004.
19) T. A. Standish. Data Structure Techniques. Addison-Wesley, Reading, Massachusetts.
20) K. Torisawa. Acquiring Inference Rules with Temporal Constraints by Using Japanese Coordinated Sentences and Noun-Verb Co-occurrences. In Proc. of HLT-NAACL-2006.
21) S. V. N. Vishwanathan and A. J. Smola. Fast Kernels on Strings and Trees. In Proc. of NIPS-2002.
22) J. Wang and J. Han. BIDE: Efficient Mining of Frequent Closed Sequences. In Proc. of ICDE-2004, pp. 79-90.
23) X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. In Proc. of ICDM-2002.
24) X. Yan, J. Han and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets. In Proc. of SDM-2003.
25) M. J. Zaki. Efficiently Mining Frequent Trees in a Forest. In Proc. of KDD-2002.
26) (In Japanese.) IEICE Transactions, Vol. J71-D, No. 9.
27)-39) (In Japanese; bibliographic details largely lost in extraction.) Surviving fragments include: "version 2.3.3"; "Suffix Array", Vol. 15, No. 6, p. 1142; "ipadic version 2.7.0"; a CD-ROM publication; journal articles in Vol. 11, No. 5, Vol. 13, No. 3, Vol. 45, No. 3, and Vol. 1, No. 1, pp. 35-57; and a paper in NLP-2005.