2003 02673006 1,.,,.,.,,. 2 SQL,.,,.,.,, SQL., Apriori[1]., [2].,.,.,. 3... ( 1)..,. SQL [3], [4]. 1: 4 ( ). 0.4%, 2.5%, MRSA, 17, 98. 2. 2: 5,. [1] R.Srikant, R.Agrawal, Mining Generalized Association Rules, the 21st VLDB Conference, 1995. [2],,,, SIG-FAI-A101-2(9/21). [3],, 56, 2003. [4],,, 2002.
1,.,.,.,.,.,. [10].,,..,, [11, 12, 13].,.,., 1995 1998 4.,,,, 178.,,, 2 [5].,,.,.,.,,.,,.,.,, [19]. Apriori[1, 2] 2
, SQL [3, 6]., Apriori.,.,.,.,, [4, 15, 16].,,., XML[7]. XSLT[8],., Web, VML.,,, [14, 18].,,.,., Internet Explorer,.,,., ASP(Active Server Pages) [9].,.,,.. 2 Apriori. 3, SQL, 4. 5, 6. 3
2,, SQL Apriori[1, 2]. 2.1, 1995 1998 4 Microsoft Excel. 2.1. CSV, 1 178 8, 32MB. ID,,,,,,,,,,,,,,,,, WBC,,,,,,,,,,, biocode,vitek,, MIC PCG, PCs, Aug, PCs, CEPs1, CEPs2, CEPs3, CEPs4, CEPs, AGs,MLs,CPs,TCs,CBPs,VCM,RFP,FOM 2.1,.,. 2.1, 2.1.,. 4
PCG : PCs : Aug : PCs- : CEPs-1 : 1 CEPs-2 : 2 CEPs-3 : 3 CEPs-4 : 4 CEPs- : AGs : MLs : TCs : LCMs : CPs : CBPs : VCM : 2.1,., 2.2., 2.2 MRSA( ) [11, 12, 13, 19]., 4 MRSA [17]. = Staphylococcus aureus,pcg= R, PCs= R,CEPs-1= R,AGs= R = St.aureus(MRSA) = St.aureus = St.aureus MRSA,, CSV Microsoft Access, Microsoft Access 5
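[9]. Microsoft Access, 8,. Microsoft Access, SQL [15, 14, 16]. SQL 3.1.

2.2 Association rules (Apriori)

An association rule has the form $X \Rightarrow Y$. Rules are mined with the Apriori algorithm of Agrawal et al. [1, 2].

Let $X$ be a set of items. A transaction database is a set

$$D = \{(tid, T_{tid}) \mid T_{tid} \subseteq X\},$$

where $tid$ is a transaction ID and $T_{tid}$ is a transaction. When the IDs are not needed, the database is written $D = \{T \mid T \subseteq X\}$. As Figure 2.3 shows, a transaction can be represented as a row of bits, with 1 for an item that is present and 0 for an item that is absent.

tid   i_1  i_2  i_3  i_4  ...  i_{m-1}  i_m   transaction
1     0    1    0    1    ...  0        1     T_1 = {i_2, i_4, ..., i_m}
2     1    0    1    1    ...  1        0     T_2 = {i_1, i_3, i_4, ..., i_{m-1}}
3     0    1    1    0    ...  1        1     T_3 = {i_2, i_3, ..., i_{m-1}, i_m}
...
n     1    1    0    1    ...  1        0     T_n = {i_1, i_2, i_4, ..., i_{m-1}}

Figure 2.3

For $A, B \subseteq X$ with $A \cap B = \emptyset$ and $B \neq \emptyset$, the expression $A \Rightarrow B$ is an association rule. When $A = \{A_1, \ldots, A_k\}$ and $B = \{B_1, \ldots, B_h\}$, the rule is written $A_1 \wedge \cdots \wedge A_k \Rightarrow B_1 \wedge \cdots \wedge B_h$.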
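Here $A_1 \wedge \cdots \wedge A_k$ is called the premise and $B_1 \wedge \cdots \wedge B_h$ the consequence of the rule. A further class of rules is termed a definite association rule.

For an itemset $A \subseteq X$, the frequency of $A$ in $D$ is

$$\mathrm{freq}_D(A) = |\{T \in D \mid A \subseteq T\}|.$$

The support and the confidence of a rule $A \Rightarrow B$ in $D$ are

$$\mathrm{supp}_D(A \Rightarrow B) = \frac{\mathrm{freq}_D(A \cup B)}{|D|}, \qquad \mathrm{conf}_D(A \Rightarrow B) = \frac{\mathrm{freq}_D(A \cup B)}{\mathrm{freq}_D(A)}.$$

Given a minimum support $\mathit{minsupp}$ and a minimum confidence $\mathit{minconf}$, the task is to find the rules in $D$ that satisfy both thresholds. It is solved in two steps:

1. Find the itemsets whose support reaches the minimum support (large itemsets).
2. Use the result of step 1 to generate the rules.

For step 1, Agrawal et al. proposed the Apriori algorithm. An itemset with $k$ items is called a $k$-itemset. $L_k$ and $C_k$ are defined as follows:

$$L_k = \{\langle X, n \rangle \mid X \text{ is a large } k\text{-itemset},\ \mathrm{freq}(X) = n\} \quad (k \geq 1),$$
$$C_k = \{\langle X, n \rangle \mid X \text{ is a candidate } k\text{-itemset},\ \mathrm{freq}(X) = n\} \quad (k \geq 2).$$

Elements are written $c = \langle X, n \rangle \in C_k$, and the itemsets of the two collections are denoted

$$L_k.item = \{X \mid \langle X, n \rangle \in L_k\}, \qquad C_k.item = \{X \mid \langle X, n \rangle \in C_k\}.$$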
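The definitions of frequency, support and confidence can be checked on a few transactions. The following is a minimal Python sketch under stated assumptions: the function names `freq`, `supp` and `conf` mirror the symbols in the text, while the four-transaction database and its item names are hypothetical and do not come from the hospital data.

```python
def freq(db, items):
    """Number of transactions in db that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in db if items <= t)

def supp(db, a, b):
    """Support of the rule a => b: freq(a | b) / |db|."""
    return freq(db, set(a) | set(b)) / len(db)

def conf(db, a, b):
    """Confidence of the rule a => b: freq(a | b) / freq(a)."""
    fa = freq(db, a)
    return freq(db, set(a) | set(b)) / fa if fa else 0.0

# Hypothetical toy database (an assumption for illustration only).
db = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
)]

print(freq(db, {"a"}))         # transactions containing a
print(supp(db, {"a"}, {"b"}))  # share of transactions containing a and b
print(conf(db, {"a"}, {"b"}))  # share of a-transactions that also contain b
```

Here `a` occurs in three of the four transactions and `a` together with `b` in two, so the rule a => b has support 2/4 and confidence 2/3.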
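ID    Items
100   3, 4, 5
200   1, 5, 7, 9, 10
300   5, 9, 10
400   4, 6, 8, 10
500   1, 2, 5, 7, 9, 10
600   3, 7, 10
700   1, 4, 7, 9
800   1, 5, 7, 10

Figure 2.4 Database D

C_1                  L_1                  L_2
Itemset  Count       Itemset  Count       Itemset   Count
{1}      4           {1}      4           {1, 5}    3
{2}      1           {3}      2           {1, 7}    4
{3}      2           {4}      3           {1, 9}    3
{4}      3           {5}      4           {1, 10}   3
{5}      4           {7}      5           {5, 7}    3
{6}      1           {9}      3           {5, 9}    3
{7}      5           {10}     5           {5, 10}   3
{8}      1                                {7, 9}    3
{9}      3                                {7, 10}   4
{10}     5                                {9, 10}   2

Figure 2.5 C_1, L_1 and L_2 for database D

$L_k$ is the collection of large $k$-itemsets, and $C_k$ is the collection of candidate itemsets.

Example 1. Consider the database $D$ of Figure 2.4 with a minimum support of 25%. The 1-itemsets $C_1$ counted in $D$ and the large 1-itemsets $L_1$ are shown in Figure 2.5. From $L_1$, the candidates $C_2$ are generated: {1, 3}, {1, 4}, {1, 5}, {1, 7}, {1, 9}, {1, 10}, {3, 4}, {3, 5}, {3, 7}, {3, 9}, {3, 10}, and so on. Counting them yields $L_2$, also shown in Figure 2.5. $L_3$, $L_4$ and $L_5$ are shown in Figure 2.6.

Step 2 of Apriori proceeds as follows: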
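L_3                    L_4                       L_5
Itemset      Count     Itemset         Count     Itemset             Count
{1, 5, 7}    3         {1, 5, 7, 9}    2         {1, 5, 7, 9, 10}    2
{1, 5, 9}    2         {1, 5, 7, 10}   3
{1, 5, 10}   3         {1, 5, 9, 10}   2
{1, 7, 9}    3         {1, 7, 9, 10}   2
{1, 7, 10}   3         {5, 7, 9, 10}   2
{1, 9, 10}   2
{5, 7, 9}    2
{5, 7, 10}   3
{5, 9, 10}   2
{7, 9, 10}   2

Figure 2.6 L_3, L_4 and L_5 for database D

Let $\mathrm{freq}(L)$ be the frequency of a large itemset $L$. For every non-empty proper subset $A \subset L$ such that

$$\frac{\mathrm{freq}(L)}{\mathrm{freq}(L - A)} \geq \mathit{minconf},$$

the rule $(L - A) \Rightarrow A$ is output. This is repeated for every large itemset $L$ and every $A \subset L$.

Apriori,, Apriori.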
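The level-wise search and the rule generation described above can be sketched in Python. This is an illustrative sketch rather than the thesis implementation: the function names `apriori` and `generate_rules` are assumptions, and the counts are recomputed from the transaction list as printed in Figure 2.4, so a few individual counts may differ from the tabulated figures.

```python
from itertools import combinations

def apriori(db, min_count):
    """Return {k: {itemset: count}} holding the large k-itemsets of db."""
    items = sorted({i for t in db for i in t})
    level = {}
    for i in items:
        n = sum(1 for t in db if i in t)
        if n >= min_count:
            level[frozenset([i])] = n
    large, k = {}, 1
    while level:
        large[k] = level
        prev = set(level)
        # Join step: unions of two large k-itemsets that have k+1 items.
        joined = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be large.
        candidates = [c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k))]
        level = {}
        for c in candidates:
            n = sum(1 for t in db if c <= t)
            if n >= min_count:
                level[c] = n
        k += 1
    return large

def generate_rules(large, min_conf):
    """Return (premise, consequence, confidence) for rules (L - A) => A."""
    counts = {s: n for lvl in large.values() for s, n in lvl.items()}
    rules = []
    for itemset, n in counts.items():
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                premise = itemset - frozenset(a)
                confidence = n / counts[premise]  # freq(L) / freq(L - A)
                if confidence >= min_conf:
                    rules.append((premise, frozenset(a), confidence))
    return rules

# Transactions as listed in Figure 2.4.
D = [frozenset(t) for t in (
    {3, 4, 5}, {1, 5, 7, 9, 10}, {5, 9, 10}, {4, 6, 8, 10},
    {1, 2, 5, 7, 9, 10}, {3, 7, 10}, {1, 4, 7, 9}, {1, 5, 7, 10},
)]

large = apriori(D, 0.25 * len(D))   # minimum support of 25%
strong = generate_rules(large, 0.9)
print([len(large[k]) for k in sorted(large)])
```

The lookup `counts[premise]` is safe because every subset of a large itemset is itself large, which is the property the prune step relies on.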
3 SQL, SQL.,.. 3.1 SQL 2.2, Apriori.,,. SQL(Structured Query Language) [3]. SQL[6],. SQL 3.1. SELECT / / FROM / / WHERE / / GROUP BY / / HAVING / /; 3.1 SQL FROM, SELECT. GROUP BY,. WHERE HAVING,. WHERE, HAVING GROUP BY. GROUP BY,, Apriori., 23. Apriori 1, 23., SQL 1 10
. SQL 1, SQL. HAVING count, GROUP BY. WHERE,,. 2 area_d, ID 00000, 3 3.2 SQL. SELECT FROM area_d WHERE ID = 00000 and = GROUP BY HAVING count(*)=>3; 3.2 2 SQL WHERE, ID = 00000 =, GROUP BY. HAVING count 3 SELECT. SQL,. 3.3 5. ( ) 3.3 3.2,.,.,, A B B., area_d. area_b area_d., 11
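The procedure works on the two tables area_d and area_b. Starting from k=1, it consists of the following 5 steps:

1. k=2
2. k area_b.
3. 2 k, k area_d
4. k, 5.
5. (k+1) 2.

Apriori 1, 2.,. SQL., 2.

As defined in Section 2.2, a rule $A \Rightarrow B$ must satisfy the minimum support $\mathit{min\_sup}$:

$$\mathrm{supp}_D(A \Rightarrow B) = \frac{\mathrm{freq}_D(A \cup B)}{|D|} > \mathit{min\_sup}.$$

Here $|D|$ is the number of records in area_d, so the condition becomes:

$$\mathrm{freq}_D(A \cup B) > \mathit{min\_sup} \cdot |D|.$$

The corresponding SQL is shown in Figure 3.4. SELECT GROUP BY, k, k. WHERE,,. SELECT, GROUP BY, WHERE. SQL,

    SELECT / /, count(*)
    FROM area_b
    WHERE / /
    GROUP BY / /
    HAVING count(*) >= min_sup * |D|;

Figure 3.4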
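The support test expressed with GROUP BY and HAVING can be tried against an in-memory SQLite database. The sketch below rests on several assumptions: the table names `area_d` and `area_b` follow the text, but the columns `ward` and `bacterium`, the ten rows, and the 15% threshold are hypothetical stand-ins for attribute names and values that are not recoverable here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: the real attribute names are not known.
con.execute("CREATE TABLE area_d (id INTEGER, ward TEXT, bacterium TEXT)")
rows = [
    (1, "A", "MRSA"), (2, "A", "MRSA"), (3, "A", "MRSA"),
    (4, "A", "MSSA"), (5, "A", "MSSA"),
    (6, "B", "MRSA"), (7, "B", "MSSA"), (8, "B", "MSSA"),
    (9, "C", "MRSA"), (10, "C", "MRSA"),
]
con.executemany("INSERT INTO area_d VALUES (?, ?, ?)", rows)

# area_b holds the records of area_d that satisfy the conclusion B.
con.execute(
    "CREATE TABLE area_b AS SELECT * FROM area_d WHERE bacterium = 'MRSA'"
)

n = con.execute("SELECT count(*) FROM area_d").fetchone()[0]  # |D|
min_sup = 0.15  # assumed threshold for this toy data

# count(*) per group is freq_D(A u B); HAVING keeps the large itemsets.
large_1 = con.execute(
    "SELECT ward, count(*) FROM area_b "
    "GROUP BY ward HAVING count(*) >= ? ORDER BY ward",
    (min_sup * n,),
).fetchall()
print(large_1)
```

With these rows, wards A and C reach the threshold of 1.5 records and ward B, with a single MRSA record, does not.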
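The count(*) returned by this SELECT is $\mathrm{freq}_D(A \cup B)$.

Step 3 works on the itemsets obtained in step 2. A rule $A \Rightarrow B$ must satisfy the minimum confidence $\mathit{min\_conf}$:

$$\mathrm{conf}_D(A \Rightarrow B) = \frac{\mathrm{freq}_D(A \cup B)}{\mathrm{freq}_D(A)} > \mathit{min\_conf}.$$

This requires both $\mathrm{freq}_D(A \cup B)$ and $\mathrm{freq}_D(A)$, which are obtained with SQL. The SQL of Figure 3.5 computes $\mathrm{freq}_D(A)$ for the itemsets whose $\mathrm{freq}_D(A \cup B)$ was obtained in step 2.

    SELECT / /, count(*)
    FROM area_d
    WHERE / /
    GROUP BY / /

Figure 3.5

Using $\mathrm{freq}_D(A)$, the rules that satisfy the following condition are kept:

$$\mathrm{freq}_D(A \cup B) \geq \mathit{min\_conf} \cdot \mathrm{freq}_D(A).$$

After step 3, steps 4 and 5 lead back to step 2.

In the MRSA case, area_d is the table area_sa, and area_b, which holds the MRSA records, is the table area_mrsa.

Example 3. Consider the table area, a minimum support of 5%, a minimum confidence of 20%, and MRSA as the conclusion. Suppose there are 10,000 records, 1,000 of which are MRSA. The 10,000 records are obtained with the SQL of Figure 3.6 and the 1,000 MRSA records with the SQL of Figure 3.7; they are stored as area_sa and area_mrsa.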
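    SELECT count(*)
    FROM area

    SELECT count(*)
    FROM area
    WHERE MRSA;

Figure 3.6, Figure 3.7

MRSA, 1 SQL 3.8. 3.8 SQL. 3.8 SQL WHERE,. $\mathrm{freq}_D(A \cup B)$, $\mathrm{freq}_D(A \cup B)$ $\mathrm{freq}_D(A)$. $\mathrm{freq}_D(A \cup B)$ is the count(*) returned by the first SQL statement of Figure 3.8, and $\mathrm{freq}_D(A)$ is the count(*) returned by the second. With the minimum confidence of 20% from Example 3, the two values from Figure 3.8 must satisfy:

$$\mathrm{freq}_D(A \cup B) \geq 0.2 \cdot \mathrm{freq}_D(A)$$

freq D (A) 1. 1.

    SELECT , count(*)
    FROM area_b
    GROUP BY
    HAVING count(*) >= 0.05*10000;

    SELECT , count(*)
    FROM area_d
    WHERE / /
    GROUP BY ;

Figure 3.8 SQL for Example 3

3.2

3.1 SQL,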
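The confidence check combines the results of the two queries: the counts from area_b give freq_D(A ∪ B) and the counts from area_d give freq_D(A). A minimal Python sketch follows, assuming both result sets have already been fetched into dictionaries; the function name `confident_rules`, the premise labels and the counts are hypothetical.

```python
def confident_rules(freq_ab, freq_a, min_conf):
    """Keep premises A with freq(A u B) >= min_conf * freq(A).

    freq_ab maps a premise to freq_D(A u B) (counted in area_b);
    freq_a maps the same premise to freq_D(A) (counted in area_d).
    Returns {premise: confidence} for the rules that pass.
    """
    return {
        premise: n_ab / freq_a[premise]
        for premise, n_ab in freq_ab.items()
        if n_ab >= min_conf * freq_a[premise]
    }

# Hypothetical counts; both premises are assumed to have passed
# the support test already.
freq_ab = {"ward=A": 600, "ward=B": 500}
freq_a = {"ward=A": 2000, "ward=B": 4000}

kept = confident_rules(freq_ab, freq_a, min_conf=0.2)
print(kept)
```

With a minimum confidence of 20%, the premise `ward=A` is kept because 600 is at least 0.2 × 2000, while `ward=B` is dropped because 500 is below 0.2 × 4000.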
,,. MRSA, :,,,,,,,,,,,,,, biocode, lactamese, VITEK 3.3,,.,, [4].,,..,,,.,, 1, 2 : --> --> --> --> --> 3.9. --> --> --> --> : --> --> --> : --> --> --> : 3.9 15
3.4 XML(eXtensible Markup Language). XML,. XML 3.10. < > < > < > < > < > </ > < > </ > : </ > < > : </ > : </ > < > : </ > </ > < > < > : </ > </ > < > : </ > </ > 3.10,,.,. 4.2. 16
4..,,.,,. 4.1,, [5].,,,.,,..,,.,.,, 5, ( 4.1). 1. 2., 3., 4., 5., 17
4.1 5 5,.,,,. 2 5,,., 3.1 SQL Apriori.,, [18]. 4.2 4.3,. 4.2 3.1 SQL. 8. 1..,,,, ( 4.2). 2., SQL FROM area_d SQL. 18
3. 2 area_d,, SQL WHERE area_b. 4. k =1. 5. area_b k SQL. 6. 5, area_d k SQL. 7. k (k+1), k = k+1 5. 8. 8. ( 4.4). 4.2 2,, 4.3., 4.5.,. 19
Figure 4.3
Figure 4.4 (1)
4.5 2 4.3., 3.1 SQL..,.,.., 2.2.,,., XML, XSLT, VML, Web.,. XML. XML 3.2, 1, [7]. XML, XML XSL XSLT(XSL Transformations) [8]. XML,. XSLT, HTML, ( 4.6).., XSLT 21
4.6., XSLT.,, Web ( 4.7). Web Web Excel,, (Microsoft PivotTableR) (Microsoft Pivot ChartR) Microsoft ActiveXR. (, ),, Microsoft Execel., Gif, HTML.,, ( 4.8) VML. VML(Vector Markup Language) XML. 4.4,.,.., Internet Explorer, InternetExplorer.,,, DHTML. DHTML(Dynamic HTML) Web HTML. DHTML ( 4.9). 22
Figure 4.7 Web
Figure 4.8 VML
4.9., ( 4.10). 4.5,.,.,.,., ASP(Active Server Pages) [9]. ASP Microsoft, Web., Web., JavaScript VBScript., 4.11., 4.4 DTML. : SQL 24
Figure 4.10 (diagram labels: XML, XSLT, HTML)
Figure 4.11
5,. 5.1, MRSA. : OS: CPU: : WindowsXP Professional Intel Pentium 1000MHz 752MByte RAM, : 95 01 98 12 :MRSA :0.4% :2.5% :,,,,,,,,,,,,,, biocode, lactamese, VITEK, MRSA 17., 17 3 : 1. MRSA 2., 3. 4. 1,. 2, WBC,,,,,,,,, MIC 11 27
. 3,,,.,. 4.,,,., 5 : : : : : : ID. ID, 5 : : : : (VML) : : (VML) : 5.2, 5, 98. 1 5.1 10. 5.1 1. k 5.2.,. 5.1 = = 2,, 2., MRSA.,,.. 5.3 31, 5.4. 3 1.,. 5.5,,.,., : 28
(%) (%) = 1.78 2.8 = 0.45 2.74 = ( ) 0.75 2.92 = ( ) 1.64 2.53 1= 0.53 4.75 = 1.91 2.66 = 2.01 2.73 = 0.47 4.16 = 0.83 2.76 lactamese= - 2.29 3 5.1 1 k 1 10 2 32 3 37 4 18 5 1 5.2 { =, = ( ), = ( ), lactamese= - } {MRSA },., ID,,. 3.1,. Apriori 1 21,, 10., Apriori.,., 5.1 1 = ( ) =, = 29
k    Count
1    8
2    14
3    8
4    2

Figure 5.3

=., 5.6. 5.6 10.
(%) (%) = 1.78 2.8 = 0.45 2.74 = ( ) 0.75 2.92 = ( ) 1.64 2.53 = 0.53 4.75 = 0.47 4.16 = 0.83 2.76 lactamese= - 2.29 3 =, = 0.41 3.4 =, = ( ) 0.63 3.36 =, = ( ) 1.34 3.14 =, = 0.41 4.76 =, = 0.61 3.16 =, lactamese= - 1.7 3.44 =, lactamese= - 0.43 3.38 = ( ), = ( ) 0.55 2.88 = ( ), lactamese= - 0.71 3.54 = ( ), = 0.63 3.32 = ( ), lactamese= - 1.54 3.1 =, lactamese= - 0.51 5.79 =, lactamese= - 0.47 5.03 =, lactamese= - 0.77 3.32 =, = ( ), = ( ) 0.53 3.76 =, = ( ), lactamese= - 0.59 4 =, = ( ), = 0.51 4.12 =, = ( ), lactamese= - 1.26 3.81 =, =, lactamese= - 0.57 3.74 = ( ), = ( ), lactamese= - = ( ), =, lactamese= - =, = ( ), = ( ), lactamese= - =, 1= ( ), =, lactamese= - 0.51 3.44 0.57 3.79 0.49 4.46 0.47 4.76 5.4 :MRSA, :0.4%, :2.5%, :15 31
Figure 5.5
(%)    (%)             (ms)              (ms)
0.3    2      202      178918   139      162193
0.3    2.5    101      160401   76       142874
0.3    3      10       25071    8        22783
0.4    2      107      100327   74       89879
0.4    2.5    40       66245    32       61348
0.4    3      4        16892    2        15512
0.5    2      66       72861    48       64573
0.5    2.5    24       47036    23       45285
0.5    3      2        13217    1        12138

Figure 5.6
6,. SQL,., ASP. Oracle, SQL Server, VBA,., IE,. ActiveX XML, Web,.,., IE., ASP, ActiveX, XML,,,.,. SQL,,.,..,., SQL,.,.,,.,.,..,.,.,. 34
,,.,,,.,,.,,,. 35
[1] R. Srikant, R. Agrawal, Mining Generalized Association Rules, the 21st VLDB Conference, 1995.
[2] S. Sarawagi, S. Thomas, R. Agrawal, Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications, ACM SIGMOD Record Volume 27, 1998.
[3] S. Thomas, S. Sarawagi, Mining Generalized Association Rules and Sequential Patterns Using SQL Queries, Knowledge Discovery and Data Mining, 1998.
[4],,,, SIG-FAI-A101-2(9/21).
[5],,, 2002.
[6], SQL,, 2001.
[7], XML,, 2001.
[8] PROJECT KySS/, XSLT+XPath,, 2001.
[9], ASP Web,, 2001.
[10],,, 25 1, 2002.
[11], (1) MRSA, 50, 2003.
[12], (2) MRSA,, 50, 2003.
[13], (3) MRSA DNF, 50, 2003.
[14], (4), 50, 2003.
[15],, 56, 2003.
[16],,, 2003.
[17], XML,, 2001.
[18],,, 2002.
[19],,, 1999.