
Ritsumeikan Social Sciences Review, Vol. 52, No. 3, December 2016

A Two-Step Approach to Quantitative Content Analysis: KH Coder Tutorial using Anne of Green Gables (Part I)

HIGUCHI Koichi
Associate Professor, Faculty of Social Sciences, Ritsumeikan University

Abstract: This article introduces a two-step approach to performing quantitative content analysis of text data. First, an outline of the approach is briefly described. Second, the procedure of using the approach to analyze the novel Anne of Green Gables is described as a tutorial. Third, the features of the approach are discussed with reference to the results of the analysis.

The tutorial section of this article allows readers to simulate the same analysis on their own personal computers. We use free software and most of the necessary operations are illustrated in figures. The subject of the analysis is the popular novel Anne of Green Gables. It is pointed out that the heroine Anne's foster mother Marilla plays an essential role in the novel and that Marilla is more important than Anne's best friend Diana, and Gilbert with whom Anne has a faint romance. In the analysis of the tutorial, we confirm whether the quantitative analysis based on the two-step approach also illustrates the importance of Marilla.

The first half of this article is published here. It is planned that the second half will be published in this bulletin in the near future.

Keywords: quantitative content analysis, KH Coder, Anne of Green Gables, tutorial

1 Introduction

1.1 Two-Step Approach

This article introduces a two-step approach to quantitative content analysis of text data. Content analysis has been extensively employed to analyze qualitative data, such as text in the field of social sciences and humanities. In this article, first, an outline of the two-step approach is described. Second, the procedure of applying the approach to the novel Anne of Green Gables is described as a tutorial, allowing readers to simulate the same analysis on their own personal computers. Third, the features of the approach are discussed with reference to the results of the analysis.

The author has proposed a quantitative content analysis approach that comprises the following two steps (Higuchi 2004, 2014).

Step 1: Extract words automatically from data and statistically analyze them to obtain a whole picture and explore the features of the data while avoiding the prejudices of the researcher.

Step 2: Specify coding rules, such as "if there is a particular expression, we regard it as an appearance of the concept A", and extract concepts from the data. Then, statistically analyze the concepts to deepen the analysis.
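
To make the idea of a coding rule in Step 2 concrete, the following Python sketch counts the paragraphs in which each concept appears. It is only an illustration under stated assumptions: the rule table `CODING_RULES`, its expressions, and the function names are hypothetical and are not KH Coder's own rule syntax.

```python
import re

# Hypothetical coding rules: concept name -> expressions that signal it.
CODING_RULES = {
    "Marilla": ["marilla", "miss cuthbert"],
    "School": ["school", "teacher", "lesson"],
}

def apply_coding_rules(paragraph, rules=CODING_RULES):
    """Return the set of concepts whose expressions occur in the paragraph."""
    text = paragraph.lower()
    found = set()
    for concept, expressions in rules.items():
        # \b restricts matches to whole words or whole phrases.
        if any(re.search(r"\b" + re.escape(e) + r"\b", text) for e in expressions):
            found.add(concept)
    return found

def concept_counts(paragraphs, rules=CODING_RULES):
    """Count, for each concept, the number of paragraphs in which it appears."""
    counts = {concept: 0 for concept in rules}
    for paragraph in paragraphs:
        for concept in apply_coding_rules(paragraph, rules):
            counts[concept] += 1
    return counts
```

The resulting per-concept counts are the kind of quantity that Step 2 then analyzes statistically; because the rules are written by the researcher, they carry the researcher's own perspective in a way that Step 1 deliberately avoids.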

The analysis procedure of Step 1 is similar to the method "text mining" and is performed almost automatically. The same results can therefore be obtained no matter who analyzes the data, and the results are hardly contaminated by the prejudices or hypotheses of the researcher. Meanwhile, it is sometimes difficult to use one's own perspective or pursue one's own research questions. In such cases, the researcher can proceed to Step 2 and perform coding, which is a procedure conventionally used for content analysis. By performing coding, the researcher can take a closer look at any aspect of interest in the data.

1.2 KH Coder: Practical Free Software

To allow anyone to easily carry out analysis by adopting the above two-step approach, the author has been developing and distributing free software called KH Coder. The software could analyze Japanese text only when it was first published in 2001. Currently, in addition to English and Japanese, it supports Catalan, Chinese, French, German, Italian, Korean, Portuguese, Russian, Slovene, and Spanish text. As far as the author knows, more than 1000 studies using KH Coder have been published as of November 2016. While most of these studies have been published in Japanese, more than 100 studies have been published in English.

KH Coder uses Stanford POS Tagger to extract words from English data, R for statistical analysis, and MySQL to organize and retrieve the data. These software programs, including KH Coder, have been used by many researchers. Additionally, since the source code is open to the public¹, anyone can check what the software does if necessary. In other words, KH Coder is not a closed black box but is open to verification by third parties. This openness is desirable especially for academic uses.

2 Anne of Green Gables as the Subject of Analysis

2.1 Purpose of Analysis

The novel Anne of Green Gables describes how an orphan, Anne, grows up after being adopted by a family comprising 60-year-old Matthew and his younger sister Marilla. Anne becomes good friends with Diana, a girl of the same age living in the neighborhood, and competes with a boy named Gilbert at school. Anne speaks and laughs a lot and is gradually accepted as a member of the family.

It has been pointed out that the foster mother Marilla plays an important role in this story.

    The development of the story really follows the education of Marilla. The relationship between Anne and Marilla is the central, most complex relationship in the novel, to which even Anne's relationship with Matthew and Diana (and with Gilbert, which follows behind all of these) must yield (Doody 1997).

Doody (1997) states that Marilla is more important than Anne's best friend Diana, and Gilbert with whom Anne has a faint romance. Doody (1997) focuses on the changes in Marilla, who gradually learns to love a child through her experience of bringing up a little girl, Anne, and points out that the changes in Marilla are the center of the story².

The main purpose of conducting the analysis described in this article is to confirm whether the quantitative analysis can also illustrate the importance of Marilla. Additionally, readers who happen to have read Anne of Green Gables will see the analysis results of a known story, and so will be able to confirm the features and reliability of the analysis approach by comparing the results with the story they remember. Furthermore, the most efficient way for readers to learn how to analyze data using KH Coder is to simulate the same analysis as described in this article on their own personal computers. It is also useful to check how the results change if the reader chooses options other than the examples given in this article in the analysis window.

2.2 Preparation of Data

To analyze text data using KH Coder, you must prepare the data as a plain text file or as an Excel file in the format shown in Figure 1.

[Figure 1: Preparing data (tutorial_en\anne.xls)]

In the data used here (Figure 1), the text of Anne of Green Gables is entered in the first column (column A). Each cell is filled with one paragraph in the same order as the paragraphs are written in the original novel. This column is named "text". The chapter number that contains the paragraph is entered in each cell in the second column (column B), which is named "chapter". Moreover, the whole story is divided into four parts (chapters 1 to 7, 8 to 19, 20 to 28, and 29 to 36), and the part that contains the paragraph is entered in the "part" column (column C).

In KH Coder, columns such as "chapter" and "part" are called "variables". By preparing these columns, you can find the characteristic words of each chapter or part (Figure 11). Additionally, when you retrieve a sentence, you can check which chapter and which part contain the sentence (Figure 7). Preparing useful information as variables will greatly help your analysis.

3 Installation and Setup of KH Coder

3.1 Download and Installation

This section describes the installation procedure assuming a personal computer running a Windows operating system; the procedure does not apply to Linux or Macintosh operating systems. Linux and Macintosh users should refer to the relevant part of the manual rather than this section. Once installation is completed, however, the procedure for analysis, which is described from the next section, applies to all Linux, Macintosh and Windows platforms.

It is recommended to install spreadsheet software in advance for browsing tables created by KH Coder. You can use free software such as LibreOffice or OpenOffice as well as Microsoft Excel.

At present, KH Coder can be downloaded from http://khc.sourceforge.net/en/. The latest version at the time of writing is 3.Alpha.08. Download the file for Windows (*.exe file) from this URL and unzip it as shown in Figure 2. Then, double-click the unzipped "kh_coder.exe" to start the software (Figure 3)³.

[Figure 2: Installing KH Coder]

If the menu and other text in the KH Coder window are displayed in Japanese, look for the indication of "Interface Language: Japanese" on the lower right of the window. This text is always displayed in English. To change the interface language to English, click "Japanese" in this text and change it to "English".

[Figure 3: Starting KH Coder]
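
For readers who prepare their own data, the three-column layout of Section 2.2 ("text", "chapter", "part") can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the input is assumed to be a dictionary mapping chapter numbers to lists of paragraphs, and the output is CSV text only for illustration, to be opened in spreadsheet software and saved in one of the formats the article names (plain text or Excel). The part boundaries and the labels such as "01-07" follow the article.

```python
import csv
import io

# Part boundaries as given in Section 2.2: chapters 1-7, 8-19, 20-28, 29-36.
PART_RANGES = [(1, 7), (8, 19), (20, 28), (29, 36)]

def part_label(chapter):
    """Return a label such as '01-07' for the part containing the chapter."""
    for first, last in PART_RANGES:
        if first <= chapter <= last:
            return f"{first:02d}-{last:02d}"
    raise ValueError(f"chapter {chapter} is outside 1-36")

def build_rows(paragraphs_by_chapter):
    """Yield one (text, chapter, part) row per paragraph, in story order."""
    for chapter in sorted(paragraphs_by_chapter):
        for paragraph in paragraphs_by_chapter[chapter]:
            yield paragraph, chapter, part_label(chapter)

def to_csv(paragraphs_by_chapter):
    """Return the rows as CSV text with the header text, chapter, part."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "chapter", "part"])
    writer.writerows(build_rows(paragraphs_by_chapter))
    return buffer.getvalue()
```

Deriving the "part" value from the chapter number, rather than typing it by hand, keeps the two variables consistent across all rows.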

3.2 Designating English Stop Words

Common words found in all kinds of writing, such as "a", "an", and "the", are not important words for content analysis. It may be desirable to remove such words from analysis results, such as a word frequency list. KH Coder allows users to remove such words from the scope of analysis and retrieval by designating them as "stop words". Figure 4 shows the procedure for designating stop words.

In Figure 4, the example list of stop words included with KH Coder is used. However, the words to be designated as stop words may vary depending on the purpose of analysis and data. In such cases, you can add and delete stop words in the window shown in Figure 4⁴.

[Figure 4: Designating English stop words]

4 Step 1: Overview of the Novel

4.1 Creating a Project and Pre-processing

To perform analysis using KH Coder, you should register the data file to be analyzed in KH Coder as a "project" and execute pre-processing. Be aware that if you move or delete the data file after creating a project, you will be unable to continue analysis.

The project creation procedure from (1) to (4) shown in Figure 5 only needs to be carried out once. Afterward, a list of already-created projects will be displayed if you click "Project", then "Open" on the menu after starting KH Coder, allowing you to choose the project for which you want to resume analysis from the list.

By procedure (5) shown in Figure 5, words are extracted from the data and processed into a database with their POS names identified. This processing may take several tens of seconds. You can start actual analyses after the pre-processing is completed.

[Figure 5: Preparing for data analysis]

4.2 Frequent Words and Their Contexts

As the first step of analysis, let us check the words frequently appearing in Anne of Green Gables. By the procedure shown in Figure 6, a list of the 150 most frequently occurring words is displayed. Table 1 shows the top 30 words among them; the words to which the author pays most attention are hatched or underlined in this table.

[Figure 6: Creating a word frequency list]

Table 1 shows that the words representing the main characters appear frequently: "ANNE" appears 1138 times, "MARILLA" 849 times, "Diana" 414 times, and "Matthew" 361 times. The character name that most frequently appears next to the heroine "ANNE" is not her best friend of the same age "Diana" but Anne's foster mother "MARILLA". Furthermore, the frequency of "MARILLA" (849 times) is more than double that of "Diana" (414 times). Judging only from the appearance frequency, it is obvious that Marilla plays an important role in this story. Additionally, judging from the fact that the number of occurrences of "Gilbert" is less than that of "Lynde", which is the family name of Anne's neighbor, Gilbert is considered to play only a limited role. However, such an issue should be discussed not only on the basis of the number of occurrences but also through a more detailed analysis.

Table 1 also shows several common words such as "say" and "think", which are likely to appear frequently in any story. Additionally, some other words help us infer the theme of the story. They represent that an orphan "girl" or "child" heroine gets adopted, finds a "home", and goes to "school", and represent the color of her "hair" about which she once had a complex.

When interpreting any analysis result of KH Coder, not limited to a word frequency list such as Table 1, be sure to confirm how each word is used in the original data. Even if you obtain analysis results such as "a certain word frequently appears" or "a certain word is characteristic of a certain part", they mean nothing unless you understand the meaning of the word in the specific data you analyze. This is because even the same word may have different meanings in different contexts or usage. In addition, if the list contains a strange word that makes you think "Why does such a word appear frequently?" or "Why is this word listed as a characteristic word?", there may be a chance of making a discovery. By investigating how such a word is used in the text, you may discover characteristics of the data that you had not realized before.

Considering the above, the author has developed functionalities not only for statistical analysis but also for flexible data retrieval. Among such retrieval functions, the KWIC concordance is convenient for checking the context in which the word is used. Figure 7 shows how to retrieve data using this function. The "Document" window displaying the whole paragraph (Figure 7) also shows the values of the variables that indicate in which chapter and which part the paragraph is contained.

Note that KH Coder extracts and counts every word after converting it to its original form. For example, the 952 occurrences of "say" in Table 1 include those of "say", "says", "saying" and "said".
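
The counting rule just described (stop words are dropped and inflected forms are merged into one original form before counting) can be sketched in Python as follows. This is an illustration only, not KH Coder's implementation: the article states that KH Coder relies on Stanford POS Tagger to extract words, whereas this sketch assumes a tiny hand-written lookup table `BASE_FORMS` and a tiny hypothetical stop word set `STOP_WORDS`.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "and", "of"}
BASE_FORMS = {"says": "say", "saying": "say", "said": "say",
              "children": "child"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(paragraphs, top_n=150):
    """Return the top_n (word, count) pairs after removing stop words
    and converting every token to its original form."""
    counts = Counter()
    for paragraph in paragraphs:
        for token in tokenize(paragraph):
            if token in STOP_WORDS:
                continue
            counts[BASE_FORMS.get(token, token)] += 1
    return counts.most_common(top_n)
```

With this rule, "says" and "said" both add to the count of "say", which is why a single row of a frequency list can stand for several surface forms.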
Furthermore, retrieval is performed for the original forms in principle. This is the reason why the search results of "child" also include those of "children" in Figure 7. Unless the settings are modified, the words designated as stop words, prepositions, conjunctions, and the like are excluded from the scope of analysis and retrieval.

Table 1: List of the 30 most frequent words

Words      Freq     Words      Freq     Words      Freq
ANNE       1138     little      283     want        149
say         952     girl        267     home        136
MARILLA     849     thing       260     child       134
think       486     tell        252     Barry       132
Diana       414     look        246     school      128
know        364     good        225     sit         126
Matthew     361     feel        215     night       117
just        358     time        208     really      116
come        353     eye         152     hair        114
make        286     Lynde       151     Gilbert     113
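
The KWIC (keyword in context) concordance mentioned in this section can be illustrated with a short Python sketch. It is a simplified stand-in under stated assumptions: the function name `kwic` and the `width` parameter are hypothetical, and the sketch matches surface forms only, whereas the article notes that KH Coder's retrieval works on original forms (so that "child" also finds "children").

```python
import re

def kwic(paragraphs, keyword, width=3):
    """Return (left context, matched word, right context, paragraph index)
    tuples for every occurrence of keyword, ignoring case."""
    results = []
    for index, paragraph in enumerate(paragraphs):
        tokens = re.findall(r"[A-Za-z']+", paragraph)
        for i, token in enumerate(tokens):
            if token.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                results.append((left, token, right, index))
    return results
```

Returning the paragraph index alongside each hit mirrors the article's point that a retrieved passage should be traceable to its chapter and part through the variables attached to that paragraph.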

[Figure 7: Checking the context where the word is used]

4.3 Co-occurrence Network of Words

We next explore what words are used together frequently by generating a co-occurrence network of major words. Generally speaking, you will be able to read the main themes of the data by seeing the groups of frequently occurring words that are often used together. For example, if the three words "new", "dress", and "pretty" frequently co-occur in the data, it can be supposed that there is a theme of fashion or dressing up in the data. In the case of the Anne of Green Gables data, you can also see the links between the characters to infer the role of each character.

The co-occurrence network has been traditionally used in content analysis to statistically express the data (Osgood 1959, Danowski 1993). In this procedure, we visualize the co-occurrence structure in data by drawing a network connecting words that tend to be used together. Since it is a network, we must see whether words are connected by lines. There is not much meaning to the positions of words. Even if two words are nearby, it does not mean that the degree of co-occurrence is strong unless those words are connected by a line⁵.

Figure 8 shows the procedure for generating a co-occurrence network using KH Coder. In Figure 8, 123 frequently occurring words that appear 50 times or more are designated as the scope of analysis. By default, KH Coder generates a network by connecting 60 pairs of the most strongly co-occurring words by lines. Words not connected by lines are removed from the result diagram. To display more co-occurrences (lines) and words, change the number in the box indicated as "Top 240" in Figure 8. In Figure 8, this number is changed from the default value of 60 to 240. In some cases, however, if you increase the number of co-occurrences (lines) to be displayed as above, the lines may be concentrated on a small part of the network depending on the data structure, making the results hard to read. In such cases, you can make the part with dense lines easier to read by using the "Draw the minimum spanning tree only" option. In the co-occurrence network displayed as a result, several groups of words strongly connected with one another are automatically detected and displayed with different colors. In Figure 9, to make it easy to distinguish the groups in black and white, the boundaries between the groups are indicated with thick dashed lines and each group is given a number in parentheses.

[Figure 8: Generating a co-occurrence network]

Although we have configured detailed options in Figure 8, in actual analysis it is a good idea to first click the "OK" button without changing any option and look at the result. We may then try increasing the number of co-occurrences (lines) to be displayed and selecting other options to see how the result changes. Without repeating the operation shown in Figure 8 from the beginning, we can change options by clicking the "Config" button on the screen displaying the result. Through trial and error, it would be beneficial to pursue a visualization suited to the characteristics of the data and the purpose of analysis. A diagram that contains necessary information, that is easy to read, and that has functional beauty would be ideal.

[Figure 9: Co-occurrence network of frequently occurring words]

Regarding the links between the characters, "Diana", "Marilla", and "Matthew" are connected close to "Anne" in part (1) of the co-occurrence network (Figure 9). This suggests that the story depicts the close relationships between Anne and her best friend Diana, foster mother Marilla, and foster father Matthew, whereas "Gilbert" is in the rather remote part (9) and connected to "Anne" via "school". "Jane", "Ruby", and "Josie" are in part (8) and also connected via "school" as Anne's schoolmates. It is supposed that the themes described in the scenes where Anne's schoolmates appear include study, which is represented by "read" and "book", and fashion, which is represented by "dress", "pretty", and "new". Anne's gossipy neighbor Mrs. "Lynde" is found in part (2) and is connected to Anne via "know" and "tell".

The above links between characters show that Marilla is connected more closely to Anne than Gilbert and the schoolgirls are, meaning that many paragraphs are used to depict the relationship between Anne and Marilla. This is further evidence that Marilla plays an important role in the story. However, Anne's best friend Diana is also close to Anne. Therefore, in Figure 9, it is not clear whether Marilla or Diana plays the more important role.

Next, regarding the links of words other than characters, "feel" and "imagine" are directly connected with relatively many other words in part (5). Such co-occurrence with many other words is thought to mean that the story depicts scenes where the characters "feel" and "imagine" in various ways, by combining "feel" and "imagine" with many other words. The word "feel" is also found in the list of top 30 frequently occurring words (Table 1). From these results, we can infer that feelings of the characters were essential in the story.

4.4 Correspondence Analysis: Characteristics of Each Part

The links between the main characters were discussed in the previous section, but in which part of the novel does each character appear? By seeing how the major words, not limited to the names of characters, change with the progress of the story, we can overview the story flow. This section therefore divides the data into four parts and visualizes the characteristic words of each part employing correspondence analysis (Greenacre 2007).

Figure 10 shows the procedure for correspondence analysis. The correspondence analysis uses Column C (i.e., the "part" column) in Figure 1. This column includes four kinds of values, such as "01-07" and "08-19". For example, "01-07" means that the text in that line is contained in one of chapters 1 through 7. These columns are called variables in KH Coder. In Figure 10, "Words x Variables" and then "part" are selected. The analysis uses 123 frequently occurring words that appear 50 times or more and especially focuses on 40 words for which the number of appearances appreciably changes between parts⁶.

[Figure 10: Executing correspondence analysis]

Figure 11 shows the results of correspondence analysis. The values of the "part" variable, such as "01-07", and words, such as "Anne" and "Marilla", are plotted with squares and circles, respectively. Using correspondence analysis, you can explore the correspondence between the variable and words by plotting them on the same diagram. The area of each circle is proportional to the number of occurrences of each word. Therefore, the more frequently the word appears, the larger the circle becomes. The area of each square is proportional to the number of words in the text of that value.

[Figure 11: Correspondence analysis of words and variables]

In correspondence analysis, uncharacteristic words uniformly found in all parts are plotted near the origin (0, 0) (i.e., the point at which the ordinate and abscissa are both zero) whereas words having strong characteristics are located away from the origin. For example, "Cuthbert" is plotted far from the origin in the upper left of Figure 11, meaning that the word has strong characteristics. We then ask, what are the characteristics of the word "Cuthbert"? The word is far away in the direction of "01-07" as seen from the origin, which means the word appears especially frequently in part "01-07". Reading the characteristics of each part from the words plotted in a similar direction as seen from the origin, as above, is the basic way of interpreting correspondence analysis.

In addition, it is also effective to see where each value of the variable is located. In Figure 11, "01-07" and "08-19" are away from other values, but "20-28" and "29-36" are close to each other. This means that frequently occurring words are similar for "20-28" and "29-36", which suggests that these two parts have similar contents.

We can read the following characteristics of each part by viewing the words plotted in Figure 11 as described above and seeing how they are used in the text. First, part "01-07" describes the livelihood of the "Cuthbert" siblings, Marilla and "Matthew". They decided to adopt a "boy" from an orphanage to help them run their farm, but a girl named Anne was sent by mistake. At this stage, Anne is often called "child" instead of by her own name. Anne, who likes to "imagine" various things, is eventually allowed to "stay" with the "Cuthbert" family.

Next, in "08-19", Anne becomes a good friend with "Diana" "Barry", a girl living in her neighborhood, and starts going to "school". At "school", "Gilbert" teases Anne about her red hair, so she comes to hate him. Later, in "20-28" and "29-36", Anne and Diana go separate ways, and Anne's schoolmates, such as "Josie", "Jane", and "Ruby", become characteristic. Anne also learns a lot through interactions with adult women such as Mrs. "Allan", a wife of a minister, and Miss "Stacy", Anne's schoolteacher.

The orphan Anne is accepted first by the "Cuthbert" family, next by the "Barry" family in her neighborhood, and eventually by the local society including her school. Figure 11 shows that the story as a whole progresses in this way. The fact that "MARILLA" is located near the origin in Figure 11 indicates that she appears almost evenly throughout the four parts.
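
The selection of lines for the co-occurrence network in Section 4.3 (keep frequent words, score every pair, draw only the strongest pairs) can be sketched in Python. This is a sketch under explicit assumptions, not KH Coder's algorithm: the article does not name the co-occurrence measure, so the Jaccard coefficient is assumed here; frequency is simplified to the number of paragraphs containing a word rather than the number of occurrences; and the function name and parameters are hypothetical.

```python
from itertools import combinations

def jaccard_edges(paragraph_words, min_freq=2, top_edges=60):
    """Return the top_edges strongest (score, word_a, word_b) pairs.

    paragraph_words: list of sets of words, one set per paragraph.
    A pair's score is the Jaccard coefficient: paragraphs containing both
    words divided by paragraphs containing at least one of them.
    """
    # Record the paragraph indices in which each word occurs.
    occurrences = {}
    for index, words in enumerate(paragraph_words):
        for word in words:
            occurrences.setdefault(word, set()).add(index)
    # Keep only words found in at least min_freq paragraphs.
    frequent = sorted(w for w, idx in occurrences.items() if len(idx) >= min_freq)
    edges = []
    for a, b in combinations(frequent, 2):
        both = len(occurrences[a] & occurrences[b])
        if both:
            either = len(occurrences[a] | occurrences[b])
            edges.append((both / either, a, b))
    # Strongest pairs first; ties are broken alphabetically.
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))
    return edges[:top_edges]
```

Words that appear in none of the returned pairs would simply not be drawn, which matches the article's remark that words not connected by lines are removed from the diagram. Raising `top_edges` corresponds to the change from 60 to 240 described above.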

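
The statement in Section 4.4 that words found uniformly in all parts are plotted near the origin can be illustrated with the standardized residuals on which correspondence analysis is based. The Python sketch below computes only these residuals for a word-by-part count table; a full correspondence analysis then applies a singular value decomposition to this matrix to obtain the plotted coordinates, and that step is omitted here. The function name and the input format (a dictionary of per-part counts) are assumptions made for illustration, and every row and column is assumed to have a non-zero total.

```python
from math import sqrt

def standardized_residuals(table):
    """Return {word: [residual for each part]} for a count table.

    table: {word: [count in part 1, count in part 2, ...]}.
    Each residual is (observed share - expected share) / sqrt(expected share),
    where the expected share is the product of the word's overall share
    and the part's overall share.
    """
    words = list(table)
    n_parts = len(table[words[0]])
    total = sum(sum(row) for row in table.values())
    row_mass = {w: sum(table[w]) / total for w in words}
    col_mass = [sum(table[w][j] for w in words) / total for j in range(n_parts)]
    residuals = {}
    for w in words:
        residuals[w] = [
            (table[w][j] / total - row_mass[w] * col_mass[j])
            / sqrt(row_mass[w] * col_mass[j])
            for j in range(n_parts)
        ]
    return residuals
```

A word whose counts are exactly proportional to the sizes of the parts gets residuals of zero everywhere and therefore sits at the origin, while a word concentrated in one part gets a large positive residual for that part and is pulled in its direction.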
4.5 Closing Remarks for Step 1

In the previous sections, we confirmed frequently appearing characters and words from the frequency list (Table 1) and saw links mainly between the characters in the co-occurrence network (Figure 9). We also read the flow of the story throughout the novel from the correspondence analysis (Figure 11). Marilla appears far more frequently than all other characters except the heroine Anne (Table 1), and her relationship with Anne appears to be almost as strong as Diana's (Figure 9). She appears not sporadically but throughout all four parts of the story (Figure 11).

Even from only the results of Step 1 of the analysis, which automatically extracts words and statistically analyzes them, we can understand the importance of Marilla to some extent. In Step 2, the researcher can focus on an aspect of his/her own interest to see what role Marilla plays in the story in more detail.

The analysis so far has not interpreted the meanings or roles of all the words in the figures and tables. They were not left out merely because Anne of Green Gables serves here only as an example for the tutorial. Even in actual research, we cannot interpret the meanings of all the words, because the words in the figures and tables are extracted mechanically, and therefore inevitably include words not related to the purpose of the analysis and words that do not interest the researcher. Although it is desirable to interpret as many words as possible, it is impossible to interpret all 50 or 100 words including such irrelevant words.

Consequently, what words draw attention and how they are interpreted will vary largely depending on the researcher's interest, while anyone can obtain the same figures and tables if the settings are the same. Since this is not automatic summarization but analysis, such variation naturally occurs depending on the researcher's point of view. Additionally, such variation will lead to creative and original analysis.

Notes

1 A source code is a form of software that is easy for a human to check and edit. Source codes are normally kept secret in the case of commercial software.

2 The importance of Marilla is discussed also in Japan (Kawabata 2008, Matsumoto 2008, Yamamoto 2008).
3 The Windows key used in the operation shown in Figure 3 (1) is normally located near the lower-left corner of the keyboard. If you cannot find the Windows key, start Explorer and open "C:\khcoder3" manually.

4 The designated stop words can be restored to the scope of analysis and retrieval as follows. Click [Pre-Processing] and then [Select Words to Analyze] from the menu of KH Coder, check "OTHER" in the "parts of speech" pane in the window displayed, and then click "OK". The words designated as stop words are given a special POS (part of speech) name of "OTHER". Those words having a POS name of "OTHER" are removed from the scope of analysis and retrieval unless the above operation is performed.
In addition to the words manually designated as stop words, words not representing the contents of the writing, such as prepositions and conjunctions, are automatically classified as "OTHER". Please refer to the manual for details of the POS system of KH Coder and how to modify the system.

5 Positions of words are arranged to make the network easy to see. For example, it is better for lines connecting words to intersect or overlap as little as possible. Because we use random numbers to compute this positioning, the word placement may differ depending on the version of KH Coder and the operating system. However, even though the positioning of words varies, which words are connected by lines and the grouping result shown in Figure 9 do not change. Refer to the manual for details of KH Coder's algorithm for generating co-occurrence networks.

6 As with the co-occurrence network above, in actual analysis, it is a good idea to first click on "OK" and see the result without changing detailed options. In this example, once you select the variable "part", you can click "OK" without changing anything else. After that, you can try other settings, like selecting "Bubble plot" and reducing the number of words from the default "60" to "40". With such small trial and error, the quality of results can be appreciably improved.

References

Danowski, J. A., 1993, "Network Analysis of Message Content," W. D. Richards Jr. & G. A. Barnett eds., Progress in Communication Sciences IV, Norwood, NJ: Ablex, 197-221.

Doody, M. A., 1997, "Introduction," W. E. Barry, M. A. Doody & M. E. D. Jones eds., The Annotated Anne of Green Gables, New York: Oxford University Press, 9-34.

Greenacre, M. J., 2007, Correspondence Analysis in Practice, 2nd ed., Boca Raton, FL: Chapman & Hall/CRC.

Higuchi, K., 2004, "Quantitative Analysis of Textual Data: Differentiation and Coordination of Two Approaches," Sociological Theory and Methods, 19(1): 101-15 (Written in Japanese).

Higuchi, K., 2014, Quantitative Text Analysis for Social Researchers: A Contribution to Content Analysis, Kyoto, Japan: Nakanishiya Publishing (Written in Japanese).

Iker, H. P. & N. I. Harway, 1969, "Computer Systems Approach toward the Recognition and Analysis of Content," G. A. Gerbner, O. R. Holsti, K. Krippendorff, W. J. Paisley & P. J. Stone eds., The Analysis of Communication Content: Developments in Scientific Theories and Computer Techniques, New York: Wiley & Sons, 381-486.

Kawabata, Y., 2008, "Surprise of Marilla Cuthbert," Katsura, Y. and Shirai, S. eds., The World of Masterpieces We Want to Know More 10: Anne of Green Gables, Kyoto, Japan: Minerva, 109-19 (Written in Japanese).

Matsumoto, Y., 2008, Journey to the Anne of Green Gables: Hidden Love and Mystery, Tokyo, Japan: NHK Publishing (Written in Japanese).

Osgood, C. E., 1959, "The Representational Model and Relevant Research Methods," I. d. S. Pool ed., Trends in Content Analysis, Urbana, IL: University of Illinois Press, 33-88.

Osgood, C. E., G. J. Suci & P. H. Tannenbaum, 1957, The Measurement of Meaning, Urbana, IL: University of Illinois Press.

Saporta, S. & T. A. Sebeok, 1959, "Linguistic and Content Analysis," I. d. S. Pool ed., Trends in Content Analysis, Urbana, IL: University of Illinois Press, 131-50.

Stone, P. J., 1997, "Thematic Text Analysis: New Agendas for Analyzing Text Content," C. W. Roberts ed., Text Analysis for the Social Sciences, Mahwah, NJ: Lawrence Erlbaum, 35-54.

Yamamoto, S., 2008, From Anne Shirley to Jane Eyre: Introducing English Literature in University Classes, Tokyo, Japan: University of Tokyo Press (Written in Japanese).

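Practicing Quantitative Content Analysis with the Integrated Approach (1): KH Coder Tutorial Using Anne of Green Gables

HIGUCHI Koichi
Associate Professor, Faculty of Social Sciences, Ritsumeikan University

This article introduces "quantitative text analysis", which the author has proposed as a method for practicing quantitative content analysis, together with a new analysis example. In quantitative text analysis there are several variations in the concrete procedure for analyzing data (Higuchi 2014); this article takes up the procedure called the "integrated approach", referred to in the English text as the two-step approach. First, the approach and KH Coder, the free software that the author develops and distributes in order to realize it, are briefly outlined. Second, the procedure for analyzing the novel Anne of Green Gables with this approach is described in the form of a tutorial that readers can follow on their own PCs. Third, the features of the approach are discussed in light of the results of the analysis.

The integrated approach introduced here joins two approaches that have been used in conventional content analysis. Conventional content analysis has mostly used either the Correlational approach or the Dictionary-based approach to analyze text data quantitatively. The Correlational approach uses statistical methods such as cluster analysis to explore the themes in the data, for example by finding groups of words that frequently appear in the same documents. This approach is sometimes called the Statistical Association approach. By contrast, the Dictionary-based approach classifies words and documents according to criteria specified by the analyst, rather than by statistical methods, and then performs quantitative analysis. Although the two approaches rest on very different ideas, parts of them have been easily confused in actual analysis. The integrated approach introduced in this article first distinguishes those easily confused parts clearly and then joins the two.

The tutorial in this article uses the integrated approach to analyze the original text of the novel Anne of Green Gables. The novel depicts how the orphan heroine Anne is taken in by the siblings Matthew and Marilla and grows up. It has been pointed out that the foster mother Marilla plays a very large role in this story: Marilla is said to be more central than Anne's best friend Diana or Gilbert, with whom Anne has a faint romance. It has also been pointed out that Anne of Green Gables is a story of an adult's maturation and second chance at life, in which Marilla learns to love a child and thereby becomes happy herself. The analysis in this article confirms whether this importance of Marilla can also be read from quantitative analysis.

The first half of this article is published here. The second half is planned to be published in a future issue of this journal.

Keywords: quantitative content analysis, KH Coder, Anne of Green Gables, tutorial, quantitative text analysis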