

Latest Trends in Automated Essay-Scoring Systems

Tsunenori ISHIOKA (Research Division, the National Center for University Entrance Examinations; 2-19-23 Komaba, Meguro-ku, Tokyo 153-8501, Japan. E-mail: tunenori@rd.dnc.ac.jp)

Abstract With the aim of removing human error and providing critical feedback and suggestions for improvement, considerable research has been done on computer-based automated essay-scoring systems. Examples include e-rater, PEG, IEA, IntelliMetric, and BETSY. This paper summarizes how these systems work, identifies their features, and compares them. It then introduces Jess, an automated Japanese essay-scoring system, together with our analysis of its performance. Lastly, it discusses the difficulties that arise in processing Japanese text and related problems.

1 1.1 1960 Page(1966) Page Project Essay Grade, PEG ( ) PEG 0.78 0.85 ( ) (uncommon) Page proxies PEG PEG (contents) (Organization) (Style) 1980 Writers Workbench (WWB) (readability) WWB WWB 1 WWB NTT REVISE(, 1987) VOICE-TWIN(, 1993) COMET(, 1986) St.WORDS (, 1992) FleCS (, 1992),,, VOICE-TWIN,,,, /,,,. /, 1990 (Natural Language Processing, NLP) (Information Retrieval, IR) Graduate Management Admission Test, GMAT Analytical Writing Assessment, AWA 4

(syntax variety), (topic content), (organization of idea) Jill Burstein ETS 3 NLP IR NLP IR GMAT AWA 2 e-rater 400 6 2 10% 2 e-rater (Burstein, Kukich, Wolff, Lu, Chodorow, Braden-Harder, & Harris, 1998) PEG (Page, 1994) " Landauer TREC(Text REtrieval Conference) Latent Semantic Analysis Intelligent Essay Assessor, IEA (Foltz, Laham, & Landauer, 1999) IEA ( ) 3 15 3,296 2 0.86 IEA 0.85 IEA 0.83, 0.68, 0.66 2000 BETSY (Rudner & Liang, 2002) IntelliMetric (Elliot, 2003) Jess (, 2003b) ( ) ( ; ) ( ) (corpus, pl. corpora) (, rawcorpus) (, tagged corpus;, analyzed corpus) 5

(1999) N (1996) 600 1.2 (Keith, 1998; Page, Lavoie, & Keith, 1996) (Myford & Cline, 2002) Bennet & Bejar (1998) (rubric) (acceptable) 1 2 3 ( ) ( ) Bereiter(2003) ( ) ( ) Fridman 1980 (Bereiter) (rater) Jess e-rater (Kukich, 2000) 1 ( I concentrates", "this conclusions" ) 6

(pollution) N (, 1999 ) / N =20 ALEK(Assessment of Lexical Knowledge) 79% / ( ) ( ) ( ) 2 (centering theory) (rough-shift) (Grosz, Joshi, & Weinstein, 1995) (continue) > (retain) > (smooth-shift) > (rough-shift) 100 AWA 2 ( ) ( ) ( ) Calfee (2000) WWB 1.3 Shermis(2002) 3 1 1492 2 7

2 Electronic Essay Rater, e-rater (Burstein et al., 1998), Project Essay Grade, PEG (Page, Poggio, & Keith, 1997), Intelligent Essay Assessor, IEA (Foltz et al, 1999), IntelliMetric (Elliot, 2003), Bayesian Essay Test Scoring system, BETSY (Rudner et al., 2002),. 3 (, ), Jess (, 2003b). 4,,. 2 2.1 Electronic Essay Rater, e-rater ( ) Graduate Management Admission Test, GMAT,,. Educational Testing Service, ETS Burstein, 2000 ETS Technologies,. E-rater GMAT,.,, 6 2.. 1 4. e-rater 1, Burstein&Wolska(2003) 97%. Burstein et al.(1998), 89%,,. E-rater 3. (Structure):,,. MSNLP(2004),,,,., (would, could, should, might, may).,,, 8

, 1. (Organization):.,., (discourse).,., e-rater cue word (Quirk, Greenbaum, Leech, & Svartvik, 1985)., In summary" In conclusion", perhaps" possibly",. this" these",.., (APA, Annotation Program).., e-rater,,. (Contents):.,,.,,. e-rater,, 1 6,,., TF(Term Frequency), IDF(Inverse Document Frequency) TF IDF., 1 6,.,.,,,. E-rater,.,,, 57.,, 8-12., 8-12,, 75. 75,. 1. 9

2. 3. 4. 5. 6. 7. (complement clause) 8. (summary words) 9. (detail words) 10. (rhetorical words),,.,., e-rater http://www.ets.org/research/erater. html. Burstein et al.(1998) Kukichi(2000) e- rater (2001). E-rater Powers, Burstein, Chodorow, Fowles, & Kukich (2000) 2 GPA / 9 ( e-rater ) GRE N =721ο890 e-rater 5% GRE 6 5 6 1 2 e-rater 9 E-rater, Critique (Critique Writing Analysis Tool) Criterion (Online Essay Evaluation Service). E-rater ( ), Critique,,,,,,. Critique, Burstein(2003),, http://www.ets.org/critique/research.html., ETS Criterion, c-rater (short answer), (free answer) (conceptional information).,. 2.2 Project Essay Grade, PEG, Page (Page,1966). PEG, SAT. PEG, 10

.,. Page, 2 ( ). trins,,,, trinsic., proxes. trins (approximation), trins. ( ) proxes., trins, proxes (Page, 1994). tris, proxes. proxes,., Microsoft WORD Grammatik-5,,,., (proxes ) Page. 1994 26,, 4,,. 26 PEG 0.8.,, 5. 1. Content 2. Organization 3. Style 4. Mechanics 5. Creativity PEG (Page,1966), 2. 1993 PEG, http://134.68.49.185/pegdemo/ref.asp. 2.3 Intelligent Essay Assessor, IEA Landauer Foltz http://lsa.colorado.edu/, Knowledge Analysis Technologies (K-A-T)., IEA, K-A-T. IEA,,. IEA, Deerwester, Dumais, Furnas, Landauer, & Harshman (1990) Latent Semantic Analysis ( Latent Semantic Indexing, LSI )., ( bag of words" ). IEA LSI t d X(t, d ) X = T 0 S 0 D 0 0. (coccurrence) 11

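In the decomposition $X = T_0 S_0 D_0'$, the matrices $T_0$ and $D_0$ satisfy $T_0 T_0' = T_0' T_0 = I_t$ and $D_0 D_0' = D_0' D_0 = I_d$, where $I_t$ and $I_d$ are the identity matrices of order $t$ and $d$, and $0 \le d \le t$. $S_0$ is the diagonal matrix of singular values. Keeping the $k$ largest values of $S_0$ gives $S$, and keeping the corresponding $k$ columns of $T_0$ and $D_0$ gives $T$ and $D$. Then

$$\hat{X} = T S D' \qquad (1)$$

and $\hat{X}$ approximates $X$. Here $T$ is $t \times k$, $S$ is $k \times k$, and $D'$ is $k \times d$. X'X, (1), TS, D'., T 1, SD'. For the Bellcore collections (ENCY: 56,530 and 25,629; NEWS: 35,796 and 19,660), Deerwester et al. (1990) take $k$ to be about 50 to 100, whereas IEA takes 100 to 300.

An essay $e$ is represented by a $t$-dimensional term vector $x_e$, and the corresponding row of $D$, a $1 \times k$ vector, is $d_e = x_e' T S^{-1}$. Likewise, $d_q$ is the $k$-dimensional vector for $q$. The similarity between the two is

$$r(d_e, d_q) = \frac{(d_e, d_q)}{\|d_e\| \, \|d_q\|} \qquad (2)$$

where $\|\cdot\|$ denotes the norm. Equation (2) is the cosine of the angle between $d_e$ and $d_q$ and is called the cosine similarity., r(d_e, d_q) r(x_e, x_q) TF (term frequency) (Luhn, 1957). TF, (inverse document frequency) Jones (1972) IDF TF IDF, (Allan, Carbonell, Doddington, Yamron, & Yang, 1998). (e-rater) TF IDF. IEA, e, IEA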
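The TF-IDF weighting and the cosine similarity of equation (2) can be illustrated with a minimal sketch in plain Python. It assumes whitespace-style token lists and the common weighting tf × log(N/df); the function names and toy tokens are hypothetical, and the SVD reduction to $k$ dimensions is deliberately left out, so the vectors compared here are the sparse term vectors rather than $d_e$ and $d_q$. The same cosine function would apply unchanged to the reduced vectors.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Weight = term frequency * log(N / document frequency); this is one
    common TF-IDF variant, chosen here as an assumption.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Equation (2): inner product divided by the product of the norms."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: two essays and one unrelated text.
docs = [["essay", "score"], ["essay", "grade"], ["weather"]]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shared term "essay" only
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms
```

Because a term that occurs in every document gets weight log(1) = 0, it contributes nothing to the similarity, which is the intended effect of the IDF factor.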

10, min(kd e k kd q k) 10 Landauer 20 http://lsa.colorado. edu/paper.html pdf Foltz et al.(1999) IEA LSI (Landauer, Laham, & Foltz, 2003), 3 ffl Content: ffl Style: ffl Mechanics: Overall( ) A-D, F Landauer et al.(2003) GMAT N =2; 263 0.86 IEA (single raters) 0.85 N =1; 033 0.75 IEA 0.73 IEA (N =2; 263) 0.88 (N =1; 033) 0.78 (N =3; 394) IEA 0.85 3 IEA 0.83 0.68, 0.66 3 70% 80% 10% 20% 11% Chung & O'Neil (1997) IEA K-A-T IEA,. 1 (prompt) 300 500, 20 30, IEA 1 100.,. Foltz et al.(1999), 20. OS 1999 e-rater Sun Workstation, Ultra-2, Solaris, 137MHz 2 LSI, Web,., Jess LSI, Intel Pentium III, 800MHz, RedHat7.2 0.5.. LSI 1999, 2001 (, 1999b). (Jess) 2002 (, 2002) 13

LSI,,.,. 2.4 IntelliMetric / Vantage Learning,. 1991 Vantage R & D, 1995,,,, 1997 7,. 1997 11 IntelliMetric, 1998 2., Vantage Learning,. http://www.intellimetric.com/demosite/demo.html /. 11 (10 million dollars)., Vantage Learning,.,,.,. Vantage Learning (CogniSearch), (Quantum Reasoning), (IntelliMetric),,.,. (decision tree; Berry & Linoff, 1997 ) CART (Classification and regression trees), CHAID(Chi-squared automatic interaction detection) C4.5 C5.0 IntelliMetric,, 5. ffl Focus & Meaning:,. ffl Development & Content: ffl Organization: ffl Language Use & Style:, ffl Mechanics & Conventions:, 1 6, 6.,, 4,,, 1 4 4 (Sepos, 2000)., 72 (Features).,,. Vantage Learning IntelliMetric, 2. 14

1. 2, IntelliMetric,. 1,202 6, 2 1 95%, IntelliMetric 99%. 2.,.,. IntelliMetric,,. Vantage Learning, 1., IntelliMetric., /,,. IntelliMetric,,,. Writing Ability t, i y i, e i y i = t + e i, IntelliMetric e i.,, 2..,,. Elliot(2003) ffl 300 ( 50) ffl 6 1 6 20 ffl 2,,,.,. 2001, (Eleanor Chute) IntelliMetric, 6 4, (Chute, 2001)., (Chief Operating Officer) Scott Elliott, 3% 7%, (too unusual to grade).,. IntelliMetric Elliot(1999) 7 (International, Age7), 11, 14 (External Measures of Writing) IntelliMetric 2 IntelliMetric 0.60 0.64 IntelliMetric 15

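0.58 0.60 Elliot ( )

2.5 Bayesian Essay Test Scoring system, BETSY

BETSY Rudner, (Rudner et al., 2002).,, 4 6,.,, (Appropriate), (Partial), (Inappropriate) 3., 3., / /.

For feature $i$, let $u_i = 1$ indicate that the feature is present in the essay, and let $P_i(u_i = 1 \mid A)$, $P_i(u_i = 1 \mid R)$, and $P_i(u_i = 1 \mid I)$ be the probabilities of this event for the three classes $A$, $R$, and $I$. Suppose, for example, that

$$P_i(u_i = 1 \mid A) = 0.7, \quad P_i(u_i = 1 \mid R) = 0.6, \quad P_i(u_i = 1 \mid I) = 0.1.$$

With no prior information about the writer's ability, the priors are equal: $P(A) = P(R) = P(I) = 0.33$ (that is, 1/3). By Bayes' theorem,

$$P(A \mid u_i = 1) = \frac{P(u_i = 1 \mid A)\, P(A)}{P(u_i = 1)} = \frac{0.7 \times 0.33}{P(u_i = 1)} = \frac{0.233}{P(u_i = 1)},$$

$$P(R \mid u_i = 1) = \frac{0.6 \times 0.33}{P(u_i = 1)} = \frac{0.200}{P(u_i = 1)}, \qquad P(I \mid u_i = 1) = \frac{0.1 \times 0.33}{P(u_i = 1)} = \frac{0.033}{P(u_i = 1)}.$$

Eliminating $P(u_i = 1)$ by normalization gives

$$P(A \mid u_i = 1) = \frac{0.233}{0.233 + 0.200 + 0.033} = 0.5, \quad P(R \mid u_i = 1) = \frac{0.200}{0.233 + 0.200 + 0.033} = 0.429, \quad P(I \mid u_i = 1) = \frac{0.033}{0.233 + 0.200 + 0.033} = 0.071.$$

These posteriors then serve as the new $P(A)$, $P(R)$, $P(I)$.

There are two models (McCallum & Nigam, 1998). Under the Bernoulli model, the probability of essay $d_i$ given class $c_j$ is

$$P(d_i \mid c_j) = \prod_{t=1}^{V} \left[ B_{it} P(w_t \mid c_j) + (1 - B_{it})\,(1 - P(w_t \mid c_j)) \right],$$

where $V$ is the number of features and $B_{it} \in \{0, 1\}$ indicates whether feature $t$ appears in essay $i$. $P(w_t \mid c_j)$ is the probability that $w_t$ appears in an essay of class $c_j$, estimated as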
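The worked Bayes update for a single feature can be checked numerically. The following is a minimal sketch in Python, assuming exact priors of 1/3 where the text rounds to 0.33; the function name `posterior` and the dictionary layout are illustrative choices, not part of BETSY itself.

```python
def posterior(prior, likelihood):
    """One Bayes update: P(class | u_i = 1) for every class.

    prior[c]      -- P(c) before observing the feature
    likelihood[c] -- P(u_i = 1 | c)
    """
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(u_i = 1)
    return {c: joint[c] / evidence for c in joint}


# Values taken from the example in the text (A, R, I are the three classes).
prior = {"A": 1 / 3, "R": 1 / 3, "I": 1 / 3}
likelihood = {"A": 0.7, "R": 0.6, "I": 0.1}

post = posterior(prior, likelihood)
print({c: round(p, 3) for c, p in post.items()})
# {'A': 0.5, 'R': 0.429, 'I': 0.071}
```

Feeding `post` back in as the prior for the next observed feature repeats the update, one feature at a time.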

P (w t jc j )= 1+ D X j i=1 J + D j D j c j J multinomial,. P (d i jc j )= VY t=1 B it P (w t jc j ) N it N it, w t i. P (w t jc j ) w t c j P (w t jc j )= 1+ V + N it! D X j N it i=1 XD: N it D: (unigram) " Mitchell(1997) BETSY 2 462 80 ( 40 ) 80 64 (80%), BETSY. 2.6 (holistic) 2 e-rater, PEG 10 IEA IntelliMetric BETSY, IntelliMetric BETSY ( ) A,B,C,D,E,F 6 17 1=1
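As an illustration of the Bernoulli model in Section 2.5, here is a minimal Python sketch of training and classification. It assumes the smoothed estimate P(w_t | c_j) = (1 + number of class-c_j essays containing w_t) / (J + D_j), with J the number of classes and D_j the number of training essays in class c_j; this reading of the estimator, the function names, and the toy essays are all assumptions made for the example.

```python
import math


def train_bernoulli(essays, labels, vocab):
    """Estimate class priors and P(w_t | c_j) from tokenized essays."""
    classes = sorted(set(labels))
    n_classes = len(classes)  # J; assumed >= 2 so every estimate stays below 1
    model = {}
    for c in classes:
        members = [set(e) for e, lab in zip(essays, labels) if lab == c]
        d_j = len(members)  # D_j
        model[c] = {
            "prior": d_j / len(essays),
            "p": {
                w: (1 + sum(w in m for m in members)) / (n_classes + d_j)
                for w in vocab
            },
        }
    return model


def log_likelihood(essay, params, vocab):
    """log P(d_i | c_j): product over ALL features, present or absent."""
    present = set(essay)
    total = 0.0
    for w in vocab:
        p = params["p"][w]
        total += math.log(p if w in present else 1.0 - p)
    return total


def classify(essay, model, vocab):
    """Return the class with the largest log prior + log likelihood."""
    return max(
        model,
        key=lambda c: math.log(model[c]["prior"])
        + log_likelihood(essay, model[c], vocab),
    )


# Hypothetical toy training set with two score classes.
essays = [
    ["clear", "thesis", "evidence"],
    ["thesis", "evidence"],
    ["off", "topic"],
    ["topic", "vague"],
]
labels = ["A", "A", "I", "I"]
vocab = sorted({w for e in essays for w in e})

model = train_bernoulli(essays, labels, vocab)
print(classify(["thesis", "clear"], model, vocab))
```

Working in log space avoids underflow when the vocabulary is large, since the likelihood is a product of one factor per feature.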

A F A B C A C A C B Rudner & Gagne (2001) PEG, e-rater, IEA IEA e-rater PEG (writing quality) 0.75 0.85 ( Page, 1996 Page, 1997) Chung et al.(1997) (IEA) Shermis(2002) (holistic score) 1 (single raters) ( ) 500 1,000 (descriptive) Elliot(2003) IntelliMetric 2 ( e- rater PEG ) IntelliMetric 2 6 1, Wresch(1993), 1. 2, 3. 4,. 5, BETSY e-rater PEG / IEA PEG IntelliMetric BETSY 18

1: (Wresch,1993 ) ( ) e-rater,, tricked" Powers et al, (2000) PEG,,,, / IEA,, LSI / IntelliMetric,,,, BETSY ; Page et al.(1996, 1997) Landauer et al. (2003) Elliot (2003) Rudner et al. (2002) 3 Jess 3.1,,,,,,.,,.,,.,., 2002, 2001., ( ),. http://www.aozora.gr.jp/.,, JUMAN (, http://chasen.aist-nara.ac.jp/;, ), Breakfast, NTT, KNP SAX, BUP, MSLR.,,.,, ( )., (1999a,b),, 19

,. Jess ( ), Jess e-rater,,, (1), (2), (3) 3. 3 ( )., 5,2,3, 10. (1988),. 6.. 68% 0, Jess. 3.2, 3.3, 3.4. 3.5,. 3.6. 3.2 Jess ( / ) (1995), (1996), (1), (2), (3) (big word, ), (4),.,, CD-ROM,.,.,,,,. 1.5. (1). 1., 2., 3., 4. / 5. ( ) 6. (2) (Yule,1944), K. K, n f[n], : K = T S S 2 10; 000 20

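where $S = \sum_{n \ge 1} n \, f[n]$ and $T = \sum_{n \ge 1} n^2 f[n]$, so that $K = 10^4 \, (T - S) / S^2$. $S$ is the first moment and $T$ the second moment of the frequency distribution $f[n]$. n 2,,, T. T, 1 K 0 S, (T − S)/S^2. 10,000. K,,., K 87.3, 101.3.,, K. Tweedie & Baayen (1998). (3),,.,.,,,.,.,, 4, 3 ( 25%) 5. 6,., 25%,. (4),.,. 3.3,.,,.,.,,.,.,.., (1997).. :....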
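Yule's characteristic K, as defined by the formula $K = 10^4 (T - S)/S^2$ with $S = \sum n f[n]$ and $T = \sum n^2 f[n]$, is straightforward to compute. The sketch below is a minimal Python version; it assumes the text has already been split into word tokens (for Japanese this would require a morphological analyzer, which is outside this example), and the function name is illustrative.

```python
from collections import Counter


def yule_k(tokens):
    """Yule's K = 10^4 * (T - S) / S^2.

    f[n] is the number of distinct words occurring exactly n times,
    S = sum(n * f[n]) is the total token count, and
    T = sum(n^2 * f[n]) is the second moment.
    """
    word_counts = Counter(tokens)
    freq_of_freq = Counter(word_counts.values())  # f[n]
    s = sum(n * f for n, f in freq_of_freq.items())
    t = sum(n * n * f for n, f in freq_of_freq.items())
    if s == 0:
        return 0.0  # empty text: define K as 0
    return 10_000 * (t - s) / (s * s)


print(yule_k(["a", "b", "c"]))  # every word occurs once, so T = S and K = 0
print(yule_k(["a", "a", "b"]))  # repetition raises K
```

When every word occurs exactly once, T equals S and K is 0; heavier repetition of the same words gives a larger K, so a larger value indicates a less varied vocabulary.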

:,,,., ( ), (,, ), (,, ). :.,,,,,,,.. :,,.. : A B, B. A B, A, B. :, A.,. :,.,. :,,.,,, 4, 8. Jess, (discourse, ),.,,,.,,.,, (,1999). Jess,,. 3.4, TREC(Text REtrieval Conference) Latent Semantic Indexing, LSI. IEA,. X (sparse matrix).,, X,., Berry(1992) SVDPACK. 8, - (1999a),. X Duff, Grimes, & Lewis (1989) Harwell-Boeing sparse matrix format 22

.,,.,,., IPA (THiMC097), ( ), - ( ), - (, ), - - ( ), - - ( ), (,,,, ), (,,, ).,,,,,, -,,,. 3.5 e-rater http://www.etctechnologies.com/html/eraterdemo.html, 7 (7 )., 6, 6, 5, 4, 2 1, 3 3. Web, Jess. 2. 2 e-rater, 3 Jess, 4. 2: e-rater Jess CPU( ) A 4 6.9(4.1) 687 1.00 B 3 5.1(3.0) 431 1.01 C 6 8.3(5.0) 1,884 1.35 D 2 3.1(1.9) 297 0.94 E 3 7.9(4.7) 726 0.99 F 5 8.4(5.0) 1,478 1.14 G 3 6.0(3.6) 504 0.95 Jess 5, 2, 3 10, e-rater, 6. e-rater Jess,. e-rater ( ),, Jess. C, e-rater 6, Jess,, 6 5., 7 ( ), e-rater/jess., (, ), e-rater/jess. 2 5 Jess (CPU ). Plat'Home Standard System 801S; Intel Pentium III 800MHz; RedHat7.2. Jess C, jgawk, jsed, C, 1., 23

, / kakasi(http://kakasi.namazu.org/). UNIX. Web http://zaza.rd.dnc.ac.jp/ jess/. Windows 2004 Windows 3.6 Jess., 800 1,600,,.,, (, 2003a).,, Jess.,,,,.,,, -. 4,,.,. 4.1,,,.,, 600 800., GMAT AWA, 2, 30. 1. (issue) (analysis of an issue):.. 2. (argument) (analysis of an argument):,.. 30 300 400, 800., 400 200, 800 1,600., 1,600 400 4. 24

., 6 ( 4 )5 6., 600 800, (Writung Ability),. 600 800,., ( ) 850, 1 365 20., 20, /. 850 ( ).,,,.,,, 0.5 (, 2003a ). (Powers et al., 2000),. 1,600,.,. 4.2 (discourse analysis) Marcu (2000) 4.3,,. UNIX (EUC), EUC, JIS, JIS, Web., JIS ( ). Windows( JIS) 1 2 3.,. Jess,,,. 25

5 2 ( ) 7 (C)( 16500628)

Allan, J., Carbonell, J., Doddington, G., Yamron, J. and Yang, Y. (1998). Topic Detection and Tracking Pilot Study Final Report, Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, 194–218. Available online: http://ciir.cs.umass.edu/pubfiles/ir-137.pdf
Bennet, R.E. & Bejar, I.I. (1998). Validity and automated scoring: It's not only the scoring, Educational Measurement: Issues and Practice, 17(4), 9–17.
Bereiter, C. (2003). Foreword. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berry, M.W. (1992). Large scale singular value computations, International Journal of Supercomputer Applications, 6 (1), 13–49.
Berry, M.J.A. & Linoff, G.S. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, Inc.
Bayesian Essay Test Scoring system, BETSY, http://edres.org/betsy/
Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M.D. (1998). Automated Scoring Using A Hybrid Feature Identification Technique. In the Proceedings of the Annual Meeting of the Association for Computational Linguistics, August, 1998. Montreal, Canada. Available online: http://www.ets.org/research/erater.html
Burstein, J. & Wolska, M. (2003). Toward evaluation of writing style: Finding overly repetitive word use in student essays. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary.
Calfee, R. (2000). To Grade or Not To Grade, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 35–37.

Chase, C.I. (1979). The impact of achievement expectations and handwriting quality on scoring essay tests, Journal of Educational Measurement, 16 (1), 293–297.
Chase, C.I. (1986). Essay test scoring: interaction of relevant variables, Journal of Educational Measurement, 23 (1), 33–41.
Chung, G. & O'Neil, Jr. H. F. (1997). Methodological Approaches to Online Scoring of Essays, CSE Technical Report 461, Center for the Study of Evaluation, National Center for Research on Evaluation, Standards and Student Testing. Available online: http://www.cse.ucla.edu/cresst/reports/tech461.pdf
Chute, E. (2001). PG writers take intellimetric software for a test drive, PG news, post-gazette.com. Available online: http://www.post-gazette.com/regiostate/20011216essaysidep9.asp
Cooper, P.L. (1984). The assessment of writing ability: a review of research, GRE Board Research Report, GREB No.82-15R. Available online: http://www.gre.org/reswrit.html#theassessmentofwriting
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (7), 391–407.
Duff, I.S., Grimes, R.G., & Lewis, J.G. (1989). Sparse matrix test problem, ACM Trans. Math. Software, 15, 1–14.
Electronic Essay Rater, e-rater, http://www.ets.org/erater/index.html
Elliot, S. (1999). Construct validity of IntelliMetric with international assessment, Yardley, PA: Vantage Technologies (RB-323).
Elliot, S. (2003). IntelliMetric: From Here to Validity, 71–86. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Foltz, P.W., Laham, D., & Landauer, T.K. (1999). Automated Essay Scoring: Applications to Educational Technology. In Proceedings of EdMedia '99.
(1986). COMET,, OS 86-21, 15–22.
(1992). St.WORDS, 45, 6C-1. 275–276.
Grosz, B. J., Joshi, A. K., & Weinstein, S. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203–225.
Huang, X.D., Ariki, Y., & Jack, M.A. (1990). Hidden Markov Models for Speech Recognition, Edinburgh University Press, Edinburgh.
Hughes, D.C., Keeling, B., & Tuck, B.F. (1983). The effects of instructions to scorers intended to reduce context effects in essay scoring, Educational and Psychological Measurement, 43, 1047–1050.
Intelligent Essay Assessor, IEA, http://www.knowledge-technologies.com/
(1987). (REVISE), 36(9), 1159–1167.

(1993).,, 34 (10), 1249–1258.
IntelliMetric, http://www.intellimetric.com/
(1999a).,,, 28 (2), 107–121. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/jjassvd.{pdf,ps}
(1999b). /, /, : 11-188613, : 2001-14341.
(2001). e-rater,, 24, 71–76.
(2002).,,, 2002-31300.
(2003a). Jess:, 2003,, 298–299.
(2003b). Jess,, 16 (1), 3–18. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/jess kt.{pdf,ps}
Ishioka, T. & Kameda, M. (2004). Automated Japanese Essay Scoring System: Jess, DEXA 2004 (15th International Conference on Database and Expert Systems Applications), Zaragoza Spain, 4–8. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/ishioka T Jess.ps
Jones, K.S. (1972). A Statistical Interpretation of Term Specificity and its Application in Retrieval, Journal of Documentation, 28 (1), 11–21.
Keith, T. Z. (1998). Construct Validity of PEG, American Educational Research Association, San Diego, CA.
(1999)., 4,.
Kukich, K. (2000). Beyond Automated Essay Scoring, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 22–27.
Landauer, T.K., Laham, D., & Foltz, P.W. (2000). The Intelligent Essay Assessor, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 27–31.
Landauer, T.K., Laham, D., & Foltz, P.W. (2003). Automated Scoring and Annotation of Essays with the Intelligent Essay Assessor, 87–112. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Luhn, H.P. (1957). A Statistical Approach to Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development, 1 (4), 307–317.
(1995)., 1000 3,.
Marcu, D. (2000). The Theory and Practice of Discourse Parsing and Summarization, MIT Press, Cambridge, Massachusetts.

Marshall, J.C. & Powers, J.M. (1969). Writing neatness, composition errors and essay grades, Journal of Educational Measurement, 6 (2), 97–101.
McCallum, A. & Nigam, K. (1998). A comparison of event models for Naive Bayes Text Classification. AAAI-98 Workshop on Learning for Text Categorization. Available online: http://citeseer.nj.nec.com/mccallum98comparison.html
Meyer, G. (1939). The choice of questions on essay examinations, Journal of Educational Psychology, 30 (3), 161–171.
Mitchell, T. (1997). Machine Learning, WCB/McGraw-Hill.
MSNLP (2004). http://research.microsoft.com/nlp/
Myford, C.M. & Cline, F. (2002). Looking for Patterns in Disagreements: A Facets Analysis of Human... Rater's and e-rater Scores on Essay Written for the Graduate Management Admission Test (GMAT), Annual Meeting of the American Educational Research Association, April 1-5, 2002, New Orleans, LA. Available online: http://www.ets.org/research/dload/aera2002-myf.pdf
( )(1996)., 15,.
(1997).,,.
(1992). FleCS : 45, 151–152.
Page, E.B. (1966). The imminence of Grading Essays by Computer, Phi Delta Kappan, 238–243.
Page, E.B. (1994). New Computer Grading of Student Prose, Using Modern Concepts and Software, Journal of Experimental Education, 62(2), 127–142.
Page, E.B., Lavoie, M.J., & Keith, T.Z. (1996). Computer Grading of Essay Traits in Student Writing, Annual Meeting of the National Council on Measurement in Education, New York.
Page, E.B., Poggio, J.P., & Keith, T.Z. (1997). Computer analysis of student essays: Finding trait differences in the student profile. AERA/NCME Symposium on Grading Essays by Computer.
Powers, D.E., Burstein, J.C., Chodorow, M., Fowles, M.E., & Kukich, K. (2000). Comparing the validity of automated and human essay scoring (GRE No. 98-08a). Princeton, NJ: Educational Testing Service.
Project Essay Grade, PEG, http://134.68.49.185/pegdemo/
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language, Longman.
Rudner, L. & Gagne, P. (2001). An overview of three approaches to scoring written essays by computer. Practical Assessment, Research & Evaluation, 7(26). Available online: http://pareonline.net/getvn.asp?v=7&n=26
Rudner, L.M. & Liang, L. (2002). Automated essay scoring using Bayes' theorem, National Council on Measurement in Education, New Orleans, LA. Available online: http://ericae.net/betsy/papers/n2002e.pdf

Shermis, M.D., Koch, C.M., Page, E., Keith, T.Z., & Harrington, S. (2002). Trait Rating for Automated Essay Grading, Educational and Psychological Measurement, 62(1), 5–18.
Sepos, M. (2000). Grading essay tests is going online in PA., 2000-11-06, Philadelphia Business Journal. Available online: http://philadelphia.bizjournals.com/philadelphia/stories/2000/11/06/fofus7.html
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective, Computers and the Humanities, 32, 323–352.
(1988).,, 28, 143–164.
Williams, R. (2001). Automated Essay Grading: An evaluation of four conceptual models, Teaching and Learning Forum 2001. Available online: http://lsn.curtin.edu.au/tlf/tlf2001/williams.html
Wresch, W. (1993). The Imminence of Grading Essays by Computer - 25 Years Later. Computers and Composition, 10(2), 45–58. Available online: http://corax.cwrl.utexas.edu/cac/archiveas/v10/10 2 html/10 2 5 Wresch.html
Yule, G.U. (1944). The Statistical Study of Literary Vocabulary, Cambridge University Press, Cambridge.

( ) (1) (2) (1985), (1992) (3)
• Evaluation of criteria for information retrieval, Systems and Computers in Japan, 35 (1), 42–49, 2004. (Translated from Denshi Joho Tsushin Gakkai Ronbunshi, J86-D-I (5), 293–300, 2003)
• Jess,, 16 (1), 3–18, 2003.
• Maximum likelihood estimation of Weibull parameters for two independent competing risks, IEEE Trans. on Reliability, R-40 (1), 71–74, 1991.