
Latest Trends in Automated Essay-Scoring Systems

Tsunenori ISHIOKA (Research Division, the National Center for University Entrance Examinations; 2-19-23 Komaba, Meguro-ku, Tokyo 153-8501, Japan. E-mail: tunenori@rd.dnc.ac.jp)

Abstract With the aim of removing human error and providing critical feedback and suggestions for improvement, considerable research has been done on computer-based automated essay-scoring systems. Examples include e-rater, PEG, IEA, IntelliMetric, and BETSY. This paper summarizes how these systems work, describes their features, and compares them. An automated Japanese essay-scoring system named Jess is then introduced, together with our analysis of its performance. Lastly, difficulties arising from the processing of Japanese text, and related problems, are discussed.

1 1.1 1960 Page(1966) Page Project Essay Grade, PEG ( ) PEG 0.78 0.85 ( ) (uncommon) Page proxies PEG PEG (contents) (Organization) (Style) 1980 Writers Workbench (WWB) (readability) WWB WWB 1 WWB NTT REVISE(, 1987) VOICE-TWIN(, 1993) COMET(, 1986) St.WORDS (, 1992) FleCS (, 1992),,, VOICE-TWIN,,,, /,,,. /, 1990 (Natural Language Processing, NLP) (Information Retrieval, IR) Graduate Management Admission Test, GMAT Analytical Writing Assessment, AWA 4

(syntax variety), (topic content), (organization of idea) Jill Burstein ETS 3 NLP IR NLP IR GMAT AWA 2 e-rater 400 6 2 10% 2 e-rater (Burstein, Kukich, Wolff, Lu, Chodorow, Braden-Harder, & Harris, 1998) PEG (Page, 1994) " Landauer TREC(Text REtrieval Conference) Latent Semantic Analysis Intelligent Essay Assessor, IEA (Foltz, Laham, & Landauer, 1999) IEA ( ) 3 15 3,296 2 0.86 IEA 0.85 IEA 0.83, 0.68, 0.66 2000 BETSY (Rudner & Liang, 2002) IntelliMetric (Elliot, 2003) Jess (, 2003b) ( ) ( ; ) ( ) (corpus, pl. corpora) (, rawcorpus) (, tagged corpus;, analyzed corpus) 5

(1999) N (1996) 600 1.2 (Keith, 1998; Page, Lavoie, & Keith, 1996) (Myford & Cline, 2002) Bennet & Bejar (1998) (rubric) (acceptable) 1 2 3 ( ) ( ) Bereiter(2003) ( ) ( ) Fridman 1980 (Bereiter) (rater) Jess e-rater (Kukich, 2000) 1 ( I concentrates", "this conclusions" ) 6

(pollution) N (, 1999 ) / N =20 ALEK(Assessment of Lexical Knowledge) 79% / ( ) ( ) ( ) 2 (centering theory) (rough-shift) (Grosz, Joshi, & Weinstein, 1995) (continue) > (retain) > (smooth-shift) > (rough-shift) 100 AWA 2 ( ) ( ) ( ) Calfee (2000) WWB 1.3 Shermis(2002) 3 1 1492 2 7
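The preference ordering of centering transitions cited above (continue > retain > smooth-shift > rough-shift; Grosz, Joshi, & Weinstein, 1995) can be illustrated with a minimal sketch. The four-way rule below is the usual one from the centering literature and is an assumption about what a scoring system would compute. The function names, the inputs (the backward-looking center of the previous and current utterance and the preferred center of the current one), and the idea of summarizing an essay by its share of rough-shifts are hypothetical choices made for illustration.

```python
# Transitions listed from most to least coherent, as in the ordering above.
PREFERENCE = ["continue", "retain", "smooth-shift", "rough-shift"]


def classify_transition(cb_prev, cb_cur, cp_cur):
    """Classify the centering transition between two adjacent utterances.

    cb_prev: backward-looking center of the previous utterance
    cb_cur:  backward-looking center of the current utterance
    cp_cur:  preferred (highest-ranked forward-looking) center of the current one
    """
    same_center = cb_cur == cb_prev
    center_is_preferred = cb_cur == cp_cur
    if same_center:
        return "continue" if center_is_preferred else "retain"
    return "smooth-shift" if center_is_preferred else "rough-shift"


def rough_shift_ratio(transitions):
    """Share of rough-shifts in a list of transition labels (0.0 if empty)."""
    if not transitions:
        return 0.0
    return transitions.count("rough-shift") / len(transitions)


if __name__ == "__main__":
    labels = [
        classify_transition("essay", "essay", "essay"),
        classify_transition("essay", "essay", "rater"),
        classify_transition("essay", "rater", "score"),
    ]
    print(labels, rough_shift_ratio(labels))
```

A higher rough-shift ratio would suggest weaker local coherence under this reading; how such a figure is weighted in an actual scoring model is not specified here.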

2 Electronic Essay Rater, e-rater (Burstein et al., 1998), Project Essay Grade, PEG (Page, Poggio, & Keith, 1997), Intelligent Essay Assessor, IEA (Foltz et al, 1999), IntelliMetric (Elliot, 2003), Bayesian Essay Test Scoring system, BETSY (Rudner et al., 2002),. 3 (, ), Jess (, 2003b). 4,,. 2 2.1 Electronic Essay Rater, e-rater ( ) Graduate Management Admission Test, GMAT,,. Educational Testing Service, ETS Burstein, 2000 ETS Technologies,. E-rater GMAT,.,, 6 2.. 1 4. e-rater 1, Burstein&Wolska(2003) 97%. Burstein et al.(1998), 89%,,. E-rater 3. (Structure):,,. MSNLP(2004),,,,., (would, could, should, might, may).,,, 8

, 1. (Organization):.,., (discourse).,., e-rater cue word (Quirk, Greenbaum, Leech, & Svartvik, 1985)., In summary" In conclusion", perhaps" possibly",. this" these",.., (APA, Annotation Program).., e-rater,,. (Contents):.,,.,,. e-rater,, 1 6,,., TF(Term Frequency), IDF(Inverse Document Frequency) TF IDF., 1 6,.,.,,,. E-rater,.,,, 57.,, 8-12., 8-12,, 75. 75,. 1. 9

2. 3. 4. 5. 6. 7. (complement clause) 8. (summary words) 9. (detail words) 10. (rhetorical words),,.,., e-rater http://www.ets.org/research/erater. html. Burstein et al.(1998) Kukichi(2000) e- rater (2001). E-rater Powers, Burstein, Chodorow, Fowles, & Kukich (2000) 2 GPA / 9 ( e-rater ) GRE N =721ο890 e-rater 5% GRE 6 5 6 1 2 e-rater 9 E-rater, Critique (Critique Writing Analysis Tool) Criterion (Online Essay Evaluation Service). E-rater ( ), Critique,,,,,,. Critique, Burstein(2003),, http://www.ets.org/critique/research.html., ETS Criterion, c-rater (short answer), (free answer) (conceptional information).,. 2.2 Project Essay Grade, PEG, Page (Page,1966). PEG, SAT. PEG, 10

.,. Page, 2 ( ). trins,,,, trinsic., proxes. trins (approximation), trins. ( ) proxes., trins, proxes (Page, 1994). trins, proxes. proxes,., Microsoft WORD Grammatik-5,,,., (proxes ) Page. 1994 26,, 4,,. 26 PEG 0.8.,, 5.

1. Content
2. Organization
3. Style
4. Mechanics
5. Creativity

PEG (Page,1966), 2. 1993 PEG, http://134.68.49.185/pegdemo/ref.asp.

2.3 Intelligent Essay Assessor, IEA

Landauer Foltz http://lsa.colorado.edu/, Knowledge Analysis Technologies (K-A-T)., IEA, K-A-T. IEA,,. IEA, Deerwester, Dumais, Furnas, Landauer, & Harshman (1990) Latent Semantic Analysis ( Latent Semantic Indexing, LSI )., ( "bag of words" ). IEA LSI $t$ $d$ $X$ ($t \times d$)

$$X = T_0 S_0 D_0'$$

. (co-occurrence) $X$ - $T_0$ $D_0$, $T_0' T_0 = T_0 T_0' = I_t$, $D_0' D_0 = D_0 D_0' = I_d$., $I_t$ $I_d$ $t$, $d$. 0» d» t. 0, $S_0$. $S_0$ $k$, $S$., $T_0$ $D_0$ $k$, $T$ $D$.,

$$\hat{X} = T S D' \qquad (1)$$

, $X$ $\hat{X}$. $T$ $t \times k$, $S$ $k \times k$, $D'$ $k \times d$., $X'X$, (1), $TS$, $D'$., T 1, $SD'$. ( Bellcore, ENCY 56,530 25,629 NEWS 35,796 19,660), Deerwester et al. (1990) $k$ 50 to 100, IEA 100 to 300.

1 ( ) $e$, $t$ $x_e$,, $D$ $1 \times k$

$$d_e = x_e' T S^{-1}.$$

$q$ $k$ $d_q$., $r(d_e, d_q)$,.

$$r(d_e, d_q) = \frac{(d_e, d_q)}{\|d_e\| \, \|d_q\|} \qquad (2)$$

, $\|\cdot\|$. (2) $d_e$ $d_q$,, (2) (cosine similarity)., $r(d_e, d_q)$ $r(x_e, x_q)$ TF (term frequency) (Luhn, 1957). TF, (inverse document frequency) Jones (1972) IDF TF IDF, ( Allan, Carbonell, Doddington, Yamron, & Yang, 1998 ). (e-rater ) TF IDF. IEA, $e$, IEA
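The cosine similarity of Equation (2) and the TF-IDF weighting mentioned alongside it can be sketched in a few lines. This is a minimal illustration under stated assumptions: it applies the cosine measure directly to TF-IDF vectors rather than to the k-dimensional LSI vectors $d_e$ and $d_q$ (the SVD step is omitted), it uses the plain logarithmic form of IDF, and the function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn a list of token lists into sparse {term: tf * idf} dictionaries."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Inner product of u and v divided by the product of their norms."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        ["essay", "score", "computer"],
        ["essay", "score", "rater"],
        ["weather", "rain"],
    ]
    v0, v1, v2 = tf_idf_vectors(docs)
    print(cosine_similarity(v0, v1), cosine_similarity(v0, v2))
```

Essays that share weighted vocabulary obtain a similarity above zero, while essays with no terms in common obtain exactly zero. Projecting into the LSI space first is what allows related but non-identical wording to count as similar.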

10, min(kd e k kd q k) 10 Landauer 20 http://lsa.colorado. edu/paper.html pdf Foltz et al.(1999) IEA LSI (Landauer, Laham, & Foltz, 2003), 3 ffl Content: ffl Style: ffl Mechanics: Overall( ) A-D, F Landauer et al.(2003) GMAT N =2; 263 0.86 IEA (single raters) 0.85 N =1; 033 0.75 IEA 0.73 IEA (N =2; 263) 0.88 (N =1; 033) 0.78 (N =3; 394) IEA 0.85 3 IEA 0.83 0.68, 0.66 3 70% 80% 10% 20% 11% Chung & O'Neil (1997) IEA K-A-T IEA,. 1 (prompt) 300 500, 20 30, IEA 1 100.,. Foltz et al.(1999), 20. OS 1999 e-rater Sun Workstation, Ultra-2, Solaris, 137MHz 2 LSI, Web,., Jess LSI, Intel Pentium III, 800MHz, RedHat7.2 0.5.. LSI 1999, 2001 (, 1999b). (Jess) 2002 (, 2002) 13

LSI,,.,. 2.4 IntelliMetric / Vantage Learning,. 1991 Vantage R & D, 1995,,,, 1997 7,. 1997 11 IntelliMetric, 1998 2., Vantage Learning,. http://www.intellimetric.com/demosite/demo.html /. 11 (10 million dollars)., Vantage Learning,.,,.,. Vantage Learning (CogniSearch), (Quantum Reasoning), (IntelliMetric),,.,. (decision tree; Berry & Linoff, 1997 ) CART (Classification and regression trees), CHAID(Chi-squared automatic interaction detection) C4.5 C5.0 IntelliMetric,, 5. ffl Focus & Meaning:,. ffl Development & Content: ffl Organization: ffl Language Use & Style:, ffl Mechanics & Conventions:, 1 6, 6.,, 4,,, 1 4 4 (Sepos, 2000)., 72 (Features).,,. Vantage Learning IntelliMetric, 2. 14

1. 2, IntelliMetric,. 1,202 6, 2 1 95%, IntelliMetric 99%. 2.,.,. IntelliMetric,,. Vantage Learning, 1., IntelliMetric., /,,. IntelliMetric,,,. Writing Ability t, i y i, e i y i = t + e i, IntelliMetric e i.,, 2..,,. Elliot(2003) ffl 300 ( 50) ffl 6 1 6 20 ffl 2,,,.,. 2001, (Eleanor Chute) IntelliMetric, 6 4, (Chute, 2001)., (Chief Operating Officer) Scott Elliott, 3% 7%, (too unusual to grade).,. IntelliMetric Elliot(1999) 7 (International, Age7), 11, 14 (External Measures of Writing) IntelliMetric 2 IntelliMetric 0.60 0.64 IntelliMetric 15

0.58 0.60 Elliot ( )

2.5 Bayesian Essay Test Scoring system, BETSY

BETSY Rudner, (Rudner et al., 2002).,, 4 6,.,, (Appropriate), (Partial), (Inappropriate) 3., 3., / /. $P_i(u_i = 1 \mid A)$, $P_i(u_i = 1 \mid R)$, $P_i(u_i = 1 \mid I)$. $i$,, $u_i$. $A$, $R$, $I$, / /.., : $P_i(u_i = 1 \mid A) = 0.7$; : $P_i(u_i = 1 \mid R) = 0.6$; : $P_i(u_i = 1 \mid I) = 0.1$.,, / /. Ability,, $P(A) = P(R) = P(I) = 0.33$., $P(A)$, $P(R)$, $P(I)$.

$$P(A \mid u_i = 1) = \frac{P(u_i = 1 \mid A)\, P(A)}{P(u_i = 1)}.$$

$$P(A \mid u_i = 1) = \frac{0.7 \times 0.33}{P(u_i = 1)} = \frac{0.233}{P(u_i = 1)},$$

$$P(R \mid u_i = 1) = \frac{0.6 \times 0.33}{P(u_i = 1)} = \frac{0.200}{P(u_i = 1)}, \qquad P(I \mid u_i = 1) = \frac{0.1 \times 0.33}{P(u_i = 1)} = \frac{0.033}{P(u_i = 1)}.$$

$P(u_i = 1)$,

$$P(A \mid u_i = 1) = \frac{0.233}{0.233 + 0.200 + 0.033} = 0.5, \quad P(R \mid u_i = 1) = \frac{0.200}{0.233 + 0.200 + 0.033} = 0.429, \quad P(I \mid u_i = 1) = \frac{0.033}{0.233 + 0.200 + 0.033} = 0.071.$$

, (I)., $P(A)$, $P(R)$, $P(I)$.. 2 (McCallum & Nigam, 1998) Bernoulli, $d_i$ $c_j$

$$P(d_i \mid c_j) = \prod_{t=1}^{V} \left[ B_{it}\, P(w_t \mid c_j) + (1 - B_{it}) \bigl(1 - P(w_t \mid c_j)\bigr) \right].$$

, $V$, $B_{it} \in \{0, 1\}$ $t$ $i$. $P(w_t \mid c_j)$ $w_t$ $c_j$ 1
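The worked example above (feature probabilities 0.7, 0.6 and 0.1 for the three classes, and equal priors) can be checked numerically. The sketch below is a minimal illustration of the Bayes update, not BETSY's implementation; the function name and the dictionary layout are hypothetical, and the priors are taken as exactly 1/3 rather than the rounded 0.33. Repeating the update feature by feature, with each posterior serving as the next prior, corresponds to the product form of the Bernoulli model.

```python
def update(prior, p_present, observed=True):
    """Return posterior class probabilities after observing one feature.

    prior:     {class: P(class)}
    p_present: {class: P(u_i = 1 | class)}
    observed:  True if the feature appears in the essay, False otherwise
    """
    joint = {}
    for cls, p in prior.items():
        likelihood = p_present[cls] if observed else 1.0 - p_present[cls]
        joint[cls] = likelihood * p
    evidence = sum(joint.values())  # P(u_i = 1), or P(u_i = 0) when not observed
    return {cls: value / evidence for cls, value in joint.items()}


if __name__ == "__main__":
    prior = {"A": 1 / 3, "R": 1 / 3, "I": 1 / 3}
    p_present = {"A": 0.7, "R": 0.6, "I": 0.1}
    posterior = update(prior, p_present, observed=True)
    for cls, p in posterior.items():
        print(cls, round(p, 3))
```

With these inputs the posteriors agree with the values 0.5, 0.429 and 0.071 given in the text.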

P (w t jc j )= 1+ D X j i=1 J + D j D j c j J multinomial,. P (d i jc j )= VY t=1 B it P (w t jc j ) N it N it, w t i. P (w t jc j ) w t c j P (w t jc j )= 1+ V + N it! D X j N it i=1 XD: N it D: (unigram) " Mitchell(1997) BETSY 2 462 80 ( 40 ) 80 64 (80%), BETSY. 2.6 (holistic) 2 e-rater, PEG 10 IEA IntelliMetric BETSY, IntelliMetric BETSY ( ) A,B,C,D,E,F 6 17 1=1

A F A B C A C A C B Rudner & Gagne (2001) PEG, e-rater, IEA IEA e-rater PEG (writing quality) 0.75 0.85 ( Page, 1996 Page, 1997) Chung et al.(1997) (IEA) Shermis(2002) (holistic score) 1 (single raters) ( ) 500 1,000 (descriptive) Elliot(2003) IntelliMetric 2 ( e- rater PEG ) IntelliMetric 2 6 1, Wresch(1993), 1. 2, 3. 4,. 5, BETSY e-rater PEG / IEA PEG IntelliMetric BETSY 18

1: (Wresch,1993 ) ( ) e-rater,, tricked" Powers et al, (2000) PEG,,,, / IEA,, LSI / IntelliMetric,,,, BETSY ; Page et al.(1996, 1997) Landauer et al. (2003) Elliot (2003) Rudner et al. (2002) 3 Jess 3.1,,,,,,.,,.,,.,., 2002, 2001., ( ),. http://www.aozora.gr.jp/.,, JUMAN (, http://chasen.aist-nara.ac.jp/;, ), Breakfast, NTT, KNP SAX, BUP, MSLR.,,.,, ( )., (1999a,b),, 19

,. Jess ( ), Jess e-rater,,, (1), (2), (3) 3. 3 ( )., 5,2,3, 10. (1988),. 6.. 68% 0, Jess. 3.2, 3.3, 3.4. 3.5,. 3.6.

3.2 Jess ( / ) (1995), (1996), (1), (2), (3) (big word, ), (4),.,, CD-ROM,.,.,,,,. 1.5.

(1). 1., 2., 3., 4. / 5. ( ) 6.

(2) (Yule,1944), $K$. $K$, $n$ $f[n]$, :

$$K = 10{,}000 \times \frac{T - S}{S^2},$$

,

$$S = \sum_{n \ge 1} n \, f[n], \qquad T = \sum_{n \ge 1} n^2 \, f[n].$$

$S$ 1. $T$ 2, $n^2$,,, $T$. $T$, 1 $K$ 0 $S$, $(T - S)/S^2$. 10,000. $K$,,., $K$ 87.3, 101.3.,, $K$. Tweedie & Baayen (1998).

(3),,.,.,,,.,.,, 4, 3 ( 25%) 5. 6,., 25%,.

(4),.,.

3.3,.,,.,.,,.,.,.., (1997).. :....
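Yule's characteristic K, defined above from the sums S and T, is straightforward to compute. The sketch below is a minimal illustration that takes f[n] to be the number of distinct words occurring exactly n times, the usual reading of Yule's measure. The function name and the whitespace-tokenized English input are hypothetical; Japanese text would first need morphological analysis to obtain the tokens.

```python
from collections import Counter


def yule_k(tokens):
    """Yule's K = 10,000 * (T - S) / S^2 for a list of word tokens."""
    word_counts = Counter(tokens)
    # f[n]: number of distinct words that occur exactly n times.
    freq_of_freq = Counter(word_counts.values())
    s = sum(n * f for n, f in freq_of_freq.items())      # equals the total token count
    t = sum(n * n * f for n, f in freq_of_freq.items())
    if s == 0:
        raise ValueError("cannot compute K for an empty text")
    return 10_000 * (t - s) / (s * s)


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat"
    print(yule_k(text.split()))
```

A text in which every word occurs once gives K = 0, and K grows as words are repeated more often, so smaller values indicate a richer vocabulary.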

:,,,., ( ), (,, ), (,, ). :.,,,,,,,.. :,,.. : A B, B. A B, A, B. :, A.,. :,.,. :,,.,,, 4, 8. Jess, (discourse, ),.,,,.,,.,, (,1999). Jess,,. 3.4, TREC(Text REtrieval Conference) Latent Semantic Indexing, LSI. IEA,. X (sparse matrix).,, X,., Berry(1992) SVDPACK. 8, - (1999a),. X Duff, Grimes, & Lewis (1989) Harwell-Boeing sparse matrix format 22

.,,.,,., IPA (THiMC097), ( ), - ( ), - (, ), - - ( ), - - ( ), (,,,, ), (,,, ).,,,,,, -,,,.

3.5 e-rater http://www.etctechnologies.com/html/eraterdemo.html, 7 (7 )., 6, 6, 5, 4, 2 1, 3 3. Web, Jess. 2. 2 e-rater, 3 Jess, 4.

2: e-rater Jess CPU( )

        e-rater   Jess                  CPU( )
A       4         6.9 (4.1)    687      1.00
B       3         5.1 (3.0)    431      1.01
C       6         8.3 (5.0)    1,884    1.35
D       2         3.1 (1.9)    297      0.94
E       3         7.9 (4.7)    726      0.99
F       5         8.4 (5.0)    1,478    1.14
G       3         6.0 (3.6)    504      0.95

Jess 5, 2, 3 10, e-rater, 6. e-rater Jess,. e-rater ( ),, Jess. C, e-rater 6, Jess,, 6 5., 7 ( ), e-rater/jess., (, ), e-rater/jess. 2 5 Jess (CPU ). Plat'Home Standard System 801S; Intel Pentium III 800MHz; RedHat7.2. Jess C, jgawk, jsed, C, 1.,
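One way to gauge how closely the two systems agree on the seven essays of Table 2 is a correlation coefficient over the paired scores. The sketch below computes Pearson's r by hand from the e-rater and Jess columns of the table. The choice of Pearson's r is an assumption made for illustration and is not a statistic reported in the text, and the variable and function names are hypothetical.

```python
import math

# Scores for essays A to G, copied from Table 2.
E_RATER = [4, 3, 6, 2, 3, 5, 3]                # 6-point scale
JESS = [6.9, 5.1, 8.3, 3.1, 7.9, 8.4, 6.0]     # 10-point scale


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(round(pearson_r(E_RATER, JESS), 2))
```

Because Pearson's r is unaffected by linear rescaling, the result is the same whether Jess scores are used on the 10-point scale or converted to the 6-point scale. With only seven essays the figure is at best a rough indication of agreement.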

, / kakasi(http://kakasi.namazu.org/). UNIX. Web http://zaza.rd.dnc.ac.jp/ jess/. Windows 2004 Windows 3.6 Jess., 800 1,600,,.,, (, 2003a).,, Jess.,,,,.,,, -. 4,,.,. 4.1,,,.,, 600 800., GMAT AWA, 2, 30. 1. (issue) (analysis of an issue):.. 2. (argument) (analysis of an argument):,.. 30 300 400, 800., 400 200, 800 1,600., 1,600 400 4. 24

., 6 ( 4 )5 6., 600 800, (Writing Ability),. 600 800,., ( ) 850, 1 365 20., 20, /. 850 ( ).,,,.,,, 0.5 (, 2003a ). (Powers et al., 2000),. 1,600,.,. 4.2 (discourse analysis) Marcu (2000) 4.3,,. UNIX (EUC), EUC, JIS, JIS, Web., JIS ( ). Windows( JIS) 1 2 3.,. Jess,,,.

5 2 ( ) 7 (C)( 16500628)

Allan, J., Carbonell, J., Doddington, G., Yamron, J., & Yang, Y. (1998). Topic Detection and Tracking Pilot Study Final Report, Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, 194-218. Available online: http://ciir.cs.umass.edu/pubfiles/ir-137.pdf
Bennet, R.E. & Bejar, I.I. (1998). Validity and automated scoring: It's not only the scoring, Educational Measurement: Issues and Practice, 17(4), 9-17.
Bereiter, C. (2003). Foreword. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berry, M.W. (1992). Large scale singular value computations, International Journal of Supercomputer Applications, 6(1), 13-49.
Berry, M.J.A. & Linoff, G.S. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, Inc.
Bayesian Essay Test Scoring system, BETSY, http://edres.org/betsy/
Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M.D. (1998). Automated Scoring Using A Hybrid Feature Identification Technique. In the Proceedings of the Annual Meeting of the Association of Computational Linguistics, August, 1998. Montreal, Canada. Available online: http://www.ets.org/research/erater.html
Burstein, J. & Wolska, M. (2003). Toward evaluation of writing style: Finding overly repetitive word use in student essays. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary.
Calfee, R. (2000). To Grade or Not To Grade, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 35-37.

Chase, C.I. (1979). The impact of achievement expectations and handwriting quality on scoring essay tests, Journal of Educational Measurement, 16(1), 293-297.
Chase, C.I. (1986). Essay test scoring: interaction of relevant variables, Journal of Educational Measurement, 23(1), 33-41.
Chung, G. & O'Neil, Jr., H.F. (1997). Methodological Approaches to Online Scoring of Essays, CSE Technical Report 461, Center for the Study of Evaluation, National Center for Research on Evaluation, Standards and Student Testing. Available online: http://www.cse.ucla.edu/cresst/reports/tech461.pdf
Chute, E. (2001). PG writers take intellimetric software for a test drive, PG news, post-gazette.com. Available online: http://www.post-gazette.com/regiostate/20011216essaysidep9.asp
Cooper, P.L. (1984). The assessment of writing ability: a review of research, GRE Board Research Report, GREB No.82-15R. Available online: http://www.gre.org/reswrit.html#theassessmentofwriting
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(7), 391-407.
Duff, I.S., Grimes, R.G., & Lewis, J.G. (1989). Sparse matrix test problem, ACM Trans. Math. Software, 15, 1-14.
Electronic Essay Rater, e-rater, http://www.ets.org/erater/index.html
Elliot, S. (1999). Construct validity of IntelliMetric with international assessment, Yardley, PA: Vantage Technologies (RB-323).
Elliot, S. (2003). IntelliMetric: From Here to Validity, 71-86. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Foltz, P.W., Laham, D., & Landauer, T.K. (1999). Automated Essay Scoring: Applications to Educational Technology. In Proceedings of EdMedia '99.
(1986). COMET,, OS 86-21, 15-22.
(1992). St.WORDS, 45, 6C-1. 275-276.
Grosz, B.J., Joshi, A.K., & Weinstein, S. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203-225.
Huang, X.D., Ariki, Y., & Jack, M.A. (1990). Hidden Markov Models for Speech Recognition, Edinburgh University Press, Edinburgh.
Hughes, D.C., Keeling, B., & Tuck, B.F. (1983). The effects of instructions to scorers intended to reduce context effects in essay scoring, Educational and Psychological Measurement, 43, 1047-1050.
Intelligent Essay Assessor, IEA, http://www.knowledge-technologies.com/
(1987). (REVISE), 36(9), 1159-1167.

(1993).,, 34(10), 1249-1258.
IntelliMetric, http://www.intellimetric.com/
(1999a).,,, 28(2), 107-121. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/jjassvd.{pdf,ps}
(1999b). /, /, : 11-188613, : 2001-14341.
(2001). e-rater,, 24, 71-76.
(2002).,,, 2002-31300.
(2003a). Jess:, 2003,, 298-299.
(2003b). Jess,, 16(1), 3-18. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/jess kt.{pdf,ps}
Ishioka, T. & Kameda, M. (2004). Automated Japanese Essay Scoring System: Jess, DEXA 2004 (15th International Conference on Database and Expert Systems Applications), Zaragoza, Spain, 4-8. Available online: http://www.rd.dnc.ac.jp/~tunenori/doc/ishioka T Jess.ps
Jones, K.S. (1972). A Statistical Interpretation of Term Specificity and its Application in Retrieval, Journal of Documentation, 28(1), 11-21.
Keith, T.Z. (1998). Construct Validity of PEG, American Educational Research Association, San Diego, CA.
(1999)., 4,.
Kukich, K. (2000). Beyond Automated Essay Scoring, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 22-27.
Landauer, T.K., Laham, D., & Foltz, P.W. (2000). The Intelligent Essay Assessor, The Debate on Automated Essay Grading, IEEE Intelligent Systems, 15(5), 27-31.
Landauer, T.K., Laham, D., & Foltz, P.W. (2003). Automated Scoring and Annotation of Essays with the Intelligent Essay Assessor, 87-112. In Shermis, M. & Burstein, J. eds. Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Luhn, H.P. (1957). A Statistical Approach to Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development, 1(4), 307-317.
(1995)., 1000 3,.
Marcu, D. (2000). The Theory and Practice of Discourse Parsing and Summarization, MIT Press, Cambridge, Massachusetts.

Marshall, J.C. & Powers, J.M. (1969). Writing neatness, composition errors and essay grades, Journal of Educational Measurement, 6(2), 97-101.
McCallum, A. & Nigam, K. (1998). A comparison of event models for Naive Bayes Text Classification. AAAI-98 Workshop on Learning for Text Categorization. Available online: http://citeseer.nj.nec.com/mccallum98comparison.html
Meyer, G. (1939). The choice of questions on essay examinations, Journal of Educational Psychology, 30(3), 161-171.
Mitchell, T. (1997). Machine Learning, WCB/McGraw-Hill.
MSNLP (2004). http://research.microsoft.com/nlp/
Myford, C.M. & Cline, F. (2002). Looking for Patterns in Disagreements: A Facets Analysis of Human... Rater's and e-rater Scores on Essay Written for the Graduate Management Admission Test (GMAT), Annual Meeting of the American Educational Research Association, April 1-5, 2002, New Orleans, LA. Available online: http://www.ets.org/research/dload/aera2002-myf.pdf
( ) (1996)., 15,.
(1997).,,.
(1992). FleCS : 45, 151-152.
Page, E.B. (1966). The imminence of Grading Essays by Computer, Phi Delta Kappan, 238-243.
Page, E.B. (1994). New Computer Grading of Student Prose, Using Modern Concepts and Software, Journal of Experimental Education, 62(2), 127-142.
Page, E.B., Lavoie, M.J., & Keith, T.Z. (1996). Computer Grading of Essay Traits in Student Writing, Annual Meeting of the National Council on Measurement in Education, New York.
Page, E.B., Poggio, J.P., & Keith, T.Z. (1997). Computer analysis of student essays: Finding trait differences in the student profile. AERA/NCME Symposium on Grading Essays by Computer.
Powers, D.E., Burstein, J.C., Chodorow, M., Fowles, M.E., & Kukich, K. (2000). Comparing the validity of automated and human essay scoring (GRE No. 98-08a). Princeton, NJ: Educational Testing Service.
Project Essay Grade, PEG, http://134.68.49.185/pegdemo/
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language, Longman.
Rudner, L. & Gagne, P. (2001). An overview of three approaches to scoring written essays by computer. Practical Assessment, Research & Evaluation, 7(26). Available online: http://pareonline.net/getvn.asp?v=7&n=26
Rudner, L.M. & Liang, L. (2002). Automated essay scoring using Bayes' theorem, National Council on Measurement in Education, New Orleans, LA. Available online: http://ericae.net/betsy/papers/n2002e.pdf

Shermis, M.D., Koch, C.M., Page, E., Keith, T.Z., & Harrington, S. (2002). Trait Rating for Automated Essay Grading, Educational and Psychological Measurement, 62(1), 5-18.
Sepos, M. (2000). Grading essay tests is going online in PA., 2000-11-06, Philadelphia Business Journal. Available online: http://philadelphia.bizjournals.com/philadelphia/stories/2000/11/06/fofus7.html
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective, Computers and the Humanities, 32, 323-352.
(1988).,, 28, 143-164.
Williams, R. (2001). Automated Essay Grading: An evaluation of four conceptual models, Teaching and Learning Forum 2001. Available online: http://lsn.curtin.edu.au/tlf/tlf2001/williams.html
Wresch, W. (1993). The Imminence of Grading Essays by Computer - 25 Years Later. Computers and Composition, 10(2), 45-58. Available online: http://corax.cwrl.utexas.edu/cac/archiveas/v10/10 2 html/10 2 5 Wresch.html
Yule, G.U. (1944). The Statistical Study of Literary Vocabulary, Cambridge University Press, Cambridge.

( ) (1) (2) (1985), (1992) (3)
• Evaluation of criteria for information retrieval, Systems and Computers in Japan, 35(1), 42-49, 2004. (Translated from Denshi Joho Tsushin Gakkai Ronbunshi, J86-D-I (5), 293-300, 2003)
• Jess,, 16(1), 3-18, 2003.
• Maximum likelihood estimation of Weibull parameters for two independent competing risks, IEEE Trans. on Reliability, R-40(1), 71-74, 1991.