Improvement of Classification Accuracy in an ISCO Automatic Coding System: Results of Experiments Using both the SSM Dataset and the JGSS Dataset


Improvement of Classification Accuracy in an ISCO Automatic Coding System: Results of Experiments Using both the SSM Dataset and the JGSS Dataset

Kazuko TAKAHASHI
Faculty of International Studies, Keiai University

In social surveys, occupation coding is required whenever occupation data are collected through open-ended questions. Manual occupation coding is time-consuming and complicated, and it can produce inconsistent results when the coders are not experts. For this reason, an automatic coding system that combines a rule-based method with Support Vector Machines (SVMs) has been developed and used for SSM occupation codes, which are commonly used in Japanese social surveys. Recently, coders have often been asked to assign both SSM occupation codes and ISCO (International Standard Classification of Occupations) codes, so an automatic coding system for ISCO codes is also needed. This paper reports the results of experiments designed to improve the classification accuracy of the ISCO automatic coding system, using real datasets from the 2005 SSM survey and the JGSS.

Key Words: JGSS, ISCO automatic coding system, Support Vector Machines

1. ISCO International Standard Classification of Occupation SSM JGSS 2 1 1984 (1) 2004, 2006 (2) 1988 ILO ISCO -88 ISCO Bureau of Statistics, International Labour Office 2001 ISCO-68 JSCO ; Japanese Standard Classification of Occupation JSCO JSCO SSM Social Stratification and Social Mobility 95 SSM 1995 SSM 1995 2000, 2004 SSM ISCO 2 ISCO 4 10 28 116 390 task duty job SSM 3 196 ISCO ISCO-68 skill level skill specialization 2008 SSM ISCO (3) (4) Kunz 2003, Creecy et al. 1992, Riviere 1994, 2004 2-gram (5) 3-gram 2 SSM 2001; 2002; 2003; 2004; 2005, Takahashi et al. 2005 ISCO SSM ISCO SSM SSM ISCO ISCO 194
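Note 5 illustrates n-gram features with the phrase "work at the office": its word 2-grams are "work at", "at the", and "the office", and its character 2-grams run from "wo", "or" to "ic", "ce". A minimal Python sketch of both extractions follows. The function names are hypothetical, and computing character n-grams inside each word (so that no n-gram spans a space) is an assumption, since the note does not survive in enough detail to show how spaces were handled.

```python
def word_ngrams(text, n=2):
    """Return the word n-grams of a whitespace-tokenized answer."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=2):
    """Return character n-grams, computed within each word (assumption)."""
    grams = []
    for token in text.split():
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


answer = "work at the office"
print(word_ngrams(answer))  # ['work at', 'at the', 'the office']
print(char_ngrams(answer))  # ['wo', 'or', 'rk', 'at', 'th', 'he', 'of', 'ff', 'fi', 'ic', 'ce']
```

For Japanese responses the tokens would presumably come from a morphological analyzer such as JUMAN (note 13) rather than from whitespace splitting.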

2 SSM ISCO ISCO (6) ISCO ISCO 400 2007; 2008 ISCO 2003 SSM 767 2.2 2010 ISCO (7) ISCO 2005 SSM 2005SSM JGSS ISCO 2008 SSM 2005SSM JGSS JGSS (8) JGSS SSM 2005 SSM ISCO ISCO ISCO 2 2005SSM JGSS ISCO 3 4 5 195

2. SSM ISCO 2.1 SSM SSM 3 1 (9) 2000 2 (10) 2004 3 4 (11) 2005, Takahashi et al. 2005 add-code add-code JGSS-2003 80.7 (12) 2004 ISCO 1 2.2 ISCO 2.2.1 add-code add-code ISCO 2006 SSM ISCO add-code 2 SSM add-code SSM SSM SSM 2 2008 2003 SSM SSM SSM 3.3 9.6 SSM SSM SSM SSM SSM 2.2.2 1 ISCO SSM 2008 2008 1 196

1 2008 4 4 2 3 1 7 1.3 5 2008 ISCO 3. 3.1 ISCO 2 1 2 SSM JGSS 1 SSM 2 SSM 4 2005 SSM baseline (13) (13) 10 13 SSM 200 SSM 3.2 3.2.1 SVM (14) SVM 2 one-versus-rest Kressel 1999 SVM SSM 3.2.2 2005SSM JGSS JGSS-2006 JGSS-2008 JGSS-2010 2 2005SSM 4,133 5,542 2,915 3,499 16,089 JGSS-2006 JGSS-2008 JGSS-2010 ISCO SSM 2,224 1,375 2,570 197
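Section 3.2.1 names the one-versus-rest method for extending the binary SVM to many occupation codes. The decision rule can be sketched as below, assuming one binary scorer per code whose output is a signed margin. The helper names are hypothetical and the scores are invented for illustration; the codes 3410, 3412, and 3418 are ISCO codes that appear later in the text.

```python
def one_vs_rest_labels(labels, target):
    """Relabel a multiclass training set for one binary classifier:
    +1 for the target code, -1 for every other code."""
    return [1 if label == target else -1 for label in labels]


def one_vs_rest_predict(scores):
    """Pick the code whose binary classifier gives the largest score."""
    return max(scores, key=scores.get)


train_codes = ["3410", "3418", "3412", "3418"]
print(one_vs_rest_labels(train_codes, "3418"))  # [-1, 1, -1, 1]

# Invented margins from three hypothetical binary classifiers for one answer.
scores = {"3410": -0.42, "3412": 0.15, "3418": 0.88}
print(one_vs_rest_predict(scores))  # 3418
```

With this scheme a classifier has to be trained for every code that occurs in the training data, so rare codes end up with very few positive examples.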

2 2 5 4/5 1/5 5 5 2 3-1 3-2 2 3.1 4 1 2 1-1 2005SSM JGSS-2006 JGSS-2008 JGSS-2010 5 1 2005SSM & JGSS-2006 2005SSM & JGSS-2006 & JGSS-2008 1-2 2005SSM & JGSS-2006 & JGSS-2008 & JGSS-2010 5 2005SSM 2005SSM JGSS-2006 2 JGSS-2008 2 JGSS-2010 3-1 JGSS-2006 JGSS-2006 & JGSS-2008 JGSS-2010 3-2 2005SSM & JGSS-2006 2005SSM & JGSS-2006 & JGSS-2008 3.2.3 classification accuracy recall 4. 2 3.1 4.1 1-1 1-2 1-1 SSM 3 2.2.2 2005SSM 198
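The fragments that survive in Section 3.2.2 (five parts, 4/5 and 1/5) are consistent with five-fold cross-validation, and Section 3.2.3 names classification accuracy as the measure. The sketch below shows that protocol under those assumptions. `train_and_predict` is a hypothetical callback standing in for the actual classifier, and the interleaved fold assignment is an illustrative choice, not something the text confirms.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; fold j holds items i with i % k == j."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def accuracy(gold, predicted):
    """Share of answers whose predicted code equals the gold code."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def cross_validate(answers, codes, train_and_predict, k=5):
    """Average accuracy over k folds (4/5 training, 1/5 test when k=5)."""
    results = []
    for train, test in k_fold_indices(len(answers), k):
        predicted = train_and_predict(
            [answers[i] for i in train],
            [codes[i] for i in train],
            [answers[i] for i in test],
        )
        results.append(accuracy([codes[i] for i in test], predicted))
    return sum(results) / k
```

Any classifier can be plugged in as long as it takes training answers, training codes, and test answers, and returns one predicted code per test answer.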

3 4 2005SSM JGSS-2006 JGSS-2008 JGSS-2010 15,271 1,779 1,086 2,056 baseline 0.6834 0.5323 0.5356 0.6051 0.6832 0.5323 0.4945 0.6051 0.7448 0.5863 0.6571 0.6882 0.7425 0.5863 0.6531 0.6882 1-2 2005SSM JGSS 4 SSM 2005SSM 6.1 JGSS 17.4 1-1 13 2.2.2 1-1 2.2.2 7 5 1 4 4 4 2005SSM & JGSS-2006 17,050 2005SSM & JGSS-2006 & JGSS-2008 18,136 2005SSM & JGSS-2006 & JGSS-2008 & JGSS-2010 20,192 baseline 0.6780 0.6308 0.5513 0.6785 0.6342 0.5536 0.7368 0.7156 0.7252 0.6833 0.6849 0.6851 4.2 2 3-1 3-2 JGSS 2005SSM 2 2005SSM 2005SSM JGSS 5 2005SSM 5 2005SSM 2005SSM 6.6 199

10.3 JGSS 3 JGSS-2008 2005SSM JGSS-2006 JGSS-2010 4 ISCO JGSS-2010 3 ISCO 3418 5249 5164 2008 (15) JGSS-2010 3410 3412 0110 JGSS-2010 JGSS 2005SSM JGSS-2010 2005SSM JGSS JGSS-2010 SSM SSM 5 2005SSM JGSS 2005SSM 16,089 2005SSM JGSS-2006 JGSS-2008 JGSS-2010 0.6834 0.5899 0.5770 0.5697 0.6832 0.5863 0.5925 0.5805 0.7448 0.6093 0.6699 0.6997 0.7425 0.6057 0.7015 0.6482 0.7010 0.5978 0.6352 0.6245 3-1 2005SSM JGSS 6 JGSS-2006 JGSS-2008 JGSS-2010 6 JGSS JGSS-2006 JGSS-2008 SSM 61.5 JGSS 5 8.5 2005SSM 200

6 JGSS JGSS-2010 JGSS-2006 JGSS-2006 & JGSS-2008 2,224 3,581 0.5148 0.5529 0.5148 0.5502 0.5852 0.6148 0.5852 0.6109 3-2 2005SSM JGSS 7 JGSS-2010 2005SSM JGSS-2006 JGSS-2008 SSM 5 7 2005SSM 4.6 SSM 5 7 2005SSM JGSS 7 2005SSM JGSS JGSS-2010 2005SSM & JGSS-2006 2005SSM & JGSS-2006 & JGSS-2008 18,313 19,670 0.5724 0.6016 0.5856 0.5969 0.6412 0.6533 0.6191 0.6521 ISCO 20,000 1 SSM 2 SSM 2005SSM SSM 2005SSM JGSS JGSS 1 201

1 3 X Y JGSS-2010 JGSS 2005SSM 2005SSM JGSS 1 5 (16) Takahashi et al. 2008 1 JGSS-2006 & JGSS-2008 2 2005SSM 3 2005SSM & JGSS-2006 & JGSS-2008 5. ISCO SSM JGSS SSM SSM ISCO SSM ISCO SSM ISCO 1 2 ISCO ISCO ISCO Web SSM ISCO 202

Acknowledgement General Social Surveys JGSS JGSS 2005 SSM 2005 SSM 1 SOC 2000 ASCO2 SOC NOC-S2001 353 340 449 520 ASCO2 SOC 986 821 2 1970 1980 Rubin 2004 3 SSM95 ISCO-88 2005 SSM 2004 2003 1 1 30 40 2005 SSM 2005 SSM ISCO SSM ISCO SSM ISCO 2008 4 Precision Data AIOCS PACE ACTR SICORE Keogh 1998 5 work at the office 2-gram work at at the the office 2-gram wo or ic ce 6 ISCO 7 2006 8 JGSS 34,521 JGSS-2003 15,000 8.3 9 1995 SSM 1996, 2001, 2006 ROCCO Rule-based OCcupation COding ROCCO JGSS 10 SVM Vapnik 1998, Sebastiani 2002 SVM 11 12 13,300 20,000 13 JUMAN 1998 203

14 http://chasen.org/~taku/software/tinysvm/ 3.3.1
15 3 1
16 SVM

1995 SSM, 1995, SSM 95 1995 SSM.
1995 SSM, 1996, 1995 SSM 1995 SSM.
2005 SSM, 2004, SSM95 ISCO-88.
Bureau of Statistics, International Labour Office, 2001, Coding Occupation and Industry, Bureau of Statistics; International Labour Office.
Creecy, R. H., Masand, B. M., Smith, S. J., and Waltz, D. L., 1992, Trading MIPS and Memory for Knowledge Engineering, Communications of the ACM 35(8): 48-63.
Dumais, S., Platt, J., Heckerman, D., and Sahami, M., 1998, Inductive Learning Algorithms and Representations for Text Categorization, Proceedings of the ACM-CIKM98: 145-155.
Gillman, D. W., and Appel, M. V., 1999, Developing an Automated Industry and Occupation Coding System for CENSUS 2000, 2000 Proceeding of the American Statistical Association Annual Meeting, Government Statistics Section.
, 1984,.
, 2006, 2005 SSM SSM95.
Joachims, T., 1998, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, Proceedings of the European Conference on Machine Learning: 137-142.
, 2008, SSM EGP SIOPS ISEI 2005 SSM 12 16 19 2005 SSM : 69-94.
Keogh, G., 1998, Automatically Coding Occupation Description from the 1996 Census of Population of Ireland, Technical report, Central Statistics Office (CSO).
Kunz, C., 2003, CENSUS: OCCUPATION (Census Paper No.03/06), Australian Bureau of Statistics.
, 1998, JUMAN version 3.61.
Kressel, U., 1999, Pairwise Classification and Support Vector Machines, Schölkopf, B., Burges, C. J. C., and Smola, A. J. [eds.], Advances in Kernel Methods: Support Vector Learning, The MIT Press, 255-268.
, 2006,.
, 2001, SSJ Data Archive Research Paper Series JGSS-2000.
, 2006, No.57.
, 2004, 2 : 46-77.
Riviere, P., 1997, SICORE - general automatic coding system, Statistical Data Editing Vol.2 Methods and Techniques, United Nations Statistical Commission and Economic Commission for Europe, 222-231.
Rubin, D. B., 2004, Multiple Imputation for Nonresponse in Surveys, John Wiley & Sons, Hoboken, New Jersey.
Sebastiani, F., 2002, Machine Learning in Automated Text Categorization, ACM Computing Surveys 34(1): 1-47.
, 2004,.
, 2000, SSM 15(1): 149-164.
, 2001, A 2 8(1): 31-52.
, 2002, JGSS-2000 General Social Surveys 1: 171-183.
, 2003, JGSS-2001 General Social Surveys 2: 179-191.
, 2004, ROCCO SVM General Social Surveys 3: 163-174.
, 2004, 15(1): 177-196.
, 2005, 12(2): 3-24.
Takahashi, K., Takamura, H., and Okumura, M., 2005, Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules, Ho, T. B., Cheung, D., and Liu, H. [eds.], Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Subseries: Lecture Notes in Artificial Intelligence 3518: 269-279, Springer-Verlag, Berlin Heidelberg.
, 2008, ISCO 2005 SSM 12 16 19 2005 SSM : 47-68.
, 2008, 15(2): 3-38.
Takahashi, K., Takamura, H., and Okumura, M., 2008, Direct Estimation of Class Membership Probabilities for Multiclass Classification Using Multiple Scores, Knowledge and Information Systems (KAIS) 19(2): 185-210, Springer-Verlag, London.
, 2010, 24 https://kaigi.org/jsai/webprogram/2010/pdf/260.pdf
, 2008, SSM ISCO-88 2005 SSM - 16 19 2005 SSM, 31-47.
, 2008,.
Vapnik, V., 1998, Statistical Learning Theory, John Wiley, New York.