Microsoft Excelを用いたケモメトリクス計算(4)–主成分回帰–

J. Comput. Chem. Jpn., Vol. 10, No. 1, pp. 32 43 (2011) 2011 Society of Computer Chemistry, Japan Microsoft Excel(4) *, 183-85093-5-8 *e-mail: yosimura@cc.tuat.ac.jp (Received: October 21, 2010; Accepted for publication: December 1, 2010; Advance publication: March 23, 2011) Microsoft Excel (Excel) 4 NIPALS ( ) Lambert Beer : Microsoft Excel,,,, NIPALS 1 [1 4] Microsoft Excel ( Excel ) (principal component regression: PCR) [1,3 5] Excel PCR (principal component analysis: PCA) [1 4,6] (multiple linear regression: MLR) PCR MLR PCA 3 (33 ) [7 12]255 Excel [13] PCA " " Excel PCA [14] PCA PCR MLR Lambert Beer [14](LBA) 32 DOI: 10.2477/jccj.H2218

Microsoft Windows Excel Mac OS Excel Excel 5.0 (45 ) 2 2.1 [14] MLR PCR y xm M y = f x (1) m= 1 m m fm (y' y ) MLR M N M N MLR N M MLR (principal component regression: PCR) MLR PCR 2 1 (x1) 2 (x2) 4 n 1 (x1n) 2 (x2n) x1x2 Figure 124 ( Figure 1 Appendix )MLR x1 (x1n) x2 (x2n) {Figure 1 (a)} Figure 1 (b) x1x2 p1 p2 n p1p2 t1n, t2n t1n, t2n p1p2 t1n, t2nmlr PCR PCA pi pi ( i ) pi tin si pi si i = 1, 2, tin yn tin Figure 1. The difference of axes and explanatory variables between MLR and PCA. DOI: 10.2477/jccj.H2218 33

( t t ) t = (2) i i1 in MLR PCR pi PCA, PCR [1 4] 2.2 nonlinear iterative partial least-squares (NIPALS) [4] NIPALS [14] L(PCA ) N M (l 1 ~l M) n an ( ) ( ) cn A C PCR 2.2.1 PCA pi, ti 1. ( ) Acal Ccal Aavg, Cavg X1, Y ( ) X x x A A 1 = 1 N = avg, ( ) Y = y y = C C (3) 1 N avg X1, Y1 PCR i = 1 2. pi( ) 3. ti si t = p X (4) T i i i 4. pi T i i T i i p i = X t X t p i = p i 3. 5. 5. Xi+1 X = X p t i+ 1 i i i (7) i = i+12. pi M > N pi i = N Xi0 (0) pi (L < N) i = L 2.2.2 PCR 6. pi, ti P, T 7. Y T Q Y Y T Y = QT (8) Q T T 1 Q = YT ( TT ) (9) 8. F T F = QP (10) 9. f0 f0 = cavg Fa avg (11) a c c = Fa + f (12) 0 (6) T s i = tit i (5) 34 J. Comput. Chem. Jpn., Vol. 10, No. 1, (2011)

Figure 2. Preparation of a worksheet (Sheet1) for generating the spectra of calibration and unknown samples. The ranges surrounded by dashed lines have to be filled by values following this Figure. The values in the ranges surrounded by bold solid lines are calculated by formulas shown in Table 1. Table 1. Procedures to construct worksheet (Sheet1) for simulating spectra. address formula or value etc input style, constraints defined name 1 C2:E4 2 I3:M5 Conc fill this ranges with values following Figure 2. but you can > 0 (C4:E4) 3 P3:T4 change them into optional values under constraints. 0 (other cells) 4 T7 cun3max 5 T9 noise 6 I3:M4 Ccal naming only 7 P3:T5 Cun 8 C5:E5 = 4*LN (2)/C4:E4^2 array* 1 9 B12:B32 2980 to 3020 with 2 intervals in this example. But optional. 10 C12:E32 = C2:E2*EXP (-C5:E5*(B12:B32-C3:E3)^2) array* 1 K 11 I12:M32 = MMULT (K,Conc) Acal 12 N3:N4 input a formula " = AVERAGE(I3:M3)" into N3. normal* 2 cavg 13 N12:N32 Then copy N3 and paste N4 and N12:N32 aavg 14 I8:M9 = Conc-cavg Y 15 I35:M55 = Acal-aavg array* 1 X 16 P5:T5 = cun3max*rand () 17 P12:T32 = MMULT (K,Cun)+noise*(RAND ()-0.5) Aun *1 array input: when inputting an array formula in cell(s), keep pressing [Ctrl] and [Shift] key, and then hit [Enter] key. *2 normal input: when inputting a formula in a cell, hit only [Enter] key. DOI: 10.2477/jccj.H2218 35

Figure 3. The spectra of the chemical species 1, 2 and 3. *1: chemical species 1, *2: chemical species 2 and *3: chemical species 3. 2.2.3 10. ( ) aun (12) c c un = Faun + f 0 (13) Nun Aun f0 Nun F0 C un = FAun + F 0 (14) C 3 ExcelPCR Excel Sheet1 Sheet2 PCA Sheet3 PCR 3.1 Sheet1 Figure 2 [14] 14 3037 53 Figure 4. The spectra of 5 samples for calibration. Table 1 Table 1 ([Ctrl] [Shift] [Enter] [15] ) Figure 2 Sheet1 Acal Aun Conc Cun3 ( ) cavg aavg 1 N3 N4N12:N32 Acal ( Acal RAND Acal Sheet21 Acal p1 ) 3 (cun3max) T7 AunT9 0 1, 2, 3 Figure 3, 4 36 J. Comput. Chem. Jpn., Vol. 10, No. 1, (2011)

3.2 PCA Figure 5 PCA Sheet2 1p14 2027 45 Table 2 Sheet2 piexcel [13] (1) Figure 5 p I2:I22 (2)pi G2 ( :[Alt] [E] [S] [V] [Enter]) (3) [F4] pi ti Sheet3 B27:F47 Xi+1B2:F22pi p2 X30 5 p5 2 p2 3 p3 H24 si si pi si si i = 1, 2, s1 ( = 1.463) > s2 ( = 0.886) > s3 (»0) Figure 5. Preparation of a worksheet (Sheet2) for getting principal component vectors (pi), and score vectors (ti). This Figure shows the initial state for getting p1. The iteration calculation is proceeded by copying I2:I22 and pasting to G2:G22 as value. Then press [F2] key until reaching convergence. Table 2. Procedures to construct worksheet (Sheet2) of initial state for PCA. address formula or value input style, etc referred equation defined name 1 G2:G22 1 (G2), 0 (other cells) initial values of pi pi 2 B2:F22 = X Xi 3 B24:F24 = MMULT (TRANSPOSE (pi),xi) (4) ti 4 H2:H22 = MMULT (Xi,TRANSPOSE (ti)) arrray (6) Xt 5 I2:I22 = Xt/SQRT (SUMSQ (Xt)) (6) 6 B27:F47 = Xi-pi*ti (7) 7 H24 = SUMSQ (ti) normal (5) DOI: 10.2477/jccj.H2218 37

Figure 6. Preparation of a worksheet (Sheet3) for constructing PCR model and predicting the concentration of unknown samples. The values of p1, p2 t1 and t2 are copied from Sheet2, after convergence on each step. The PCR model is constructed from 1st to 31st lows. The concentrations of unknown samples are predicted in 32nd to 35th lows. Table 3. Procedures to construct worksheet (Sheet3) for PCR and predicting the concentration of the unknown samples. address formula or value, etc input style 1 C2:G3 = Y array 2 C5:G6 3 B10:C30 C5:G5 C6:G6 B10:B30 C10:C30 copy form B24:F24 on Sheet2 after the calculation of the first component is converged, and paste to this range as value. copy form B24:F24 on Sheet2 after the calculation of the second component is converged, and paste to this range as value. copy form G2:G22 on Sheet2 after the calculation of the first component is converged, and paste to this range as value. copy form G2:G22 on Sheet2 after the calculation of the second component is converged, and paste to this range as value. 4 D10:E11 = MMULT (TRANSPOSE (P),P) array 5 F10:G11 = MMULT (T,TRANSPOSE (T)) array 6 I2:J3 = MMULT (MMULT (Y,TRANSPOSE (T)),MINVERSE (MMULT (T,TRANSPOSE (T)))) referred equation defined name T array (9) Q 7 J8:AD9 = MMULT (Q,TRANSPOSE (P)) array (10) F 8 J10:J11 = cavg-mmult (F,aavg) array (11) f0 9 C34:G35 = MMULT (F,Aun)+f0 array (12) P 38 J. Comput. Chem. Jpn., Vol. 10, No. 1, (2011)

3.3 PCR PCR Figure 6 Sheet331 Table 3 pi ti pi+1 ti+1 Sheet2 pi, ti, Sheet3p1, t1 Sheet3 B10:B30, C5:G5 Sheet2 p2, t2 Sheet3 C10:C30, C6:G6p1, p2 B10:C30 P t1, t2 C5:G6 TD10:E11 P T PF10:G11 TT T PCR ti T Y (9) Y T Q P Q F (10) (J8:AD9) (11) f0(j10:j11) 3.4 Sheet3 (Figure 6) 34, 35 Table 3 C34:G35 " = MMULT(F,Aun)+f0" Acal Aun C un Sheet1 P3:T4 4 RCR MLR LBA 22 () [14] MLR LBA ( 3) PCR, MLR, LBA 3 conc3 (0.03, 0.06, 0.09, 0.06, 0.03)3 c3max = 0.01, noise = 0.1 3 1000 1000 j C un Cun average of relative errors : ARE ARE = 1000 2 5 ccal ln, i cun ln, i ccal ln, i (15) i= 1 l= 1 n= 1 10000 i i MLR, LBA PCR 2 (PCR2) 3 (PCR3) MLR 2 (2994, 3004) (MLR2)3 (2994, 3000, 3004) (MLR3) 2 3 PCR3, MLR3 LBA 2 2 Table 4 Table 4 (15) ARE ARE MLR LBA [14] PCR2 LBA 2 ( p1 p2 12) MLR2 3 PCR2 LBA PCR2 DOI: 10.2477/jccj.H2218 39

Table 4. Comparison of quantitative performance among PCR, MLR and LBA conditions conc3 non non non fix* 1 fix* 1 fix* 1 fix* 1 cun3max 0.1 0 0.1 0.1 0 0.1 0.1 noise 0 0.01 0.01 0 0.01 0.01 0.01* 7 ARE* 2 PCR2* 3 0.0922 0.0044 0.0923 0.0574 0.1101 0.0576 0.0565 MLR2* 4 0.0877 0.0095 0.0881 0.0553 0.1089 0.0561 - LBA 0.0922 0.0044 0.0923 0.0604 0.1132 0.0605 - PCR3* 5 - - - 0.0000 0.0069 0.0069 0.0088 MLR3* 6 - - - 0.0000 0.0193 0.0193 - PCR4* 8 - - - - - - 0.0095 *1 conc3 is fixed, (0.03 0.06 0.09 0.06 0.03). *2 average of relative errors calculated by eq. (15). *3, *5 and *8 PCR using 2, 3 and 4 principal components, respectively. *4 MLR using 2 wavelengths: 2994 and 3004. *5 PCR using 3 principal components. *6 MLR using 3 wavelengths: 2994, 3000 and 3004. *7 errors are included in not only Aun, but also Acal. 3 PCR3 MLR3 Table 4PCR3 MLR3 PCR2 MLR2 PCR3 MLR3 ( )PCR3 MLR33 3 3 ( ) MLR3 PCR3 MLR PCR PCR MLR PCR Table 4 Acal Sheet1 I12:M32 " = MMULT(K,Conc)+noise*(RAND()-0.5)" Sheet2 I12:M32 I12:M32 Sheet2, 3 PCA4Figure 414 ARE PCR2 PCR3 ARE 4 PCR4ARE 3 PCR Acal Ccal Ccal Aun Cun 40 J. Comput. Chem. Jpn., Vol. 10, No. 1, (2011)

Figure 7. Comparison between the spectra of pure chemical species and principal component vectors. Traces p1 and p2 are similar to the spectra of chemical species 1 and 2, respectively, but do not coincide with them. Especially, p1 takes negative values at over 3000 wavelength. *1: chemical species 1, *2: chemical species 2. Figure 8. Score plot of calibration samples. The value near each plot is the sample number. The horizontal axis corresponds to p1, while the vertical axis corresponds to p2. ( ARE ) (cross validation: CV) [4] 5 pi Figure 1 pi (xm) xm an pi tin pi an M an Figure 7 p1, p2 p112994 p223004 p11p22 p13000 pi 12 Figure 8p1p2 (p1) 1 (p2) 2 11 2 6 PCA PCR Excel PCA PCA PCA MLR PCR DOI: 10.2477/jccj.H2218 41

PCR Table 4 MLR MLR PCR xn = an a avg, n = n avg y c c (iv) aavg, cavg N Aavg Cavg ( N columns ) A = a a, avg avg avg ( N columns ) C = c c (v) avg avg avg Excel Excel http://www.tuat.ac.jp/~mt2459/chemom/ n (n = 1, 2,, N) l m(m = 1, 2,, M) amn A n l (l = 1, 2,, L) cln C n ( ) an cn a 11 1N = ( 1 N ) =, A a a a c M 1 a a MN 11 1N = ( 1 N ) = C c c c L1 A, C aavgm, cavgl a avgm 1 N N n = 1 = a, cavg mn l 1 N N n = 1 c c LN ln (i) = c (ii) aavg, cavg a avg1 c avg1 aavg =, cavg = (iii) a avgm c avgl an aavg cn cavg yn, xn X1 Y ( ) X x x A A 1 = 1 N = avg, ( ) Y = y y = C C (vi) 1 N avg [1] T. Hasegawa, Supekutoru Teiryô Bunseki, Kodansha Scientific Ltd. (2005) [2] T. Mitsui, Kemometorikkusu no Kiso to Ôyô, - Bunseki Kagakuto Tahenryô Kaisekihô, IPC Inc. (2003) [3] Y. Ozaki, A. Uda, T. Akai, Kagakusha no tameno Tahenryô Kaiseki, Kodansha Scientific Ltd. (2002) [4] Y. Miyashita, S. Sasaki, Kemometorikkusu - Kagaku Patân Ninsiki to Tahenryô Kaiseki, Kyotitsu Shuppan Co., Ltd. (1995) [5] W. F. Massy, J. Am. Stat. Assoc, 60, 234 (1965).[Medline] [CrossRef] [6] S. Wold, K. Esbensen, P. Geladi, Chemometrics Intell. Lab. Syst., 2, 37 (1987). [CrossRef] [7] K. Arai, S. Ishimura, Tahenryô Kaiseki no Hanashi, Tokyo Tosho Co., Ltd. (2000) [8] O. Uchida, Suguwakaru EXCEL niyoru Tahenryô Kaiseki, Tokyo Tosho Co., Ltd. (2000) [9] T. Yoshimuta, Y. Aoyama, Toranjisuta Gijutu Special Gijutsuhsa no tameno Excel Katsuyô Kenkyû, CQ Publishing Co., Ltd (2002) [10] M. Ogura, Excel de Kantan Tahenryô Kaiseki, Kodansha Scientific Ltd (2006) [11] T. Suga, Excel de Manabu Tahenryô Kaiseki Nyûmon, Ohmsha Ltd. (2007) [12] Y. Nagata, M. Muneuchi, Tahenryô Kaiseki Nyûmon, Saiensu Sha Co. Ltd. (2009) [13] N. Yoshimura, A. Shigetani, M. Takayanagi, J. Comput. Chem. Jpn., 8, 183 (2009). [CrossRef] 42 J. Comput. Chem. Jpn., Vol. 10, No. 1, (2011)

[14] N. Yoshimura, A. Shigetani, M. Takayanagi, J. Comput. Chem. Jpn., 9, 109 (2010). [CrossRef] [15] N. Yoshimura, M. Takayanagi, J. Comput. Chem. Jpn., 7, 71 (2008). [CrossRef] Chemometrics Calculations with Microsoft Excel (4) Principal Component Regression Norio YOSHIMURA *, Koji FUKUHARA, Kenichiro MITSUKI and Masao TAKAYANAGI Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwaicho Fuchu, Tokyo 183-8509 Japan * yosimura@cc.tuat.ac.jp Although chemometrics has become widely used recently for analyzing experimental chemical data, there exist only a few instructions for the proper usage of chemometrics other than those in some introductory books. As the fourth step of chemometrics calculations with Microsoft Excel (Excel), the principal component regression is performed on worksheets. Three worksheets were constructed for generating the spectra of model calibration samples and unknown samples, solving principal component analysis by NIPALS algorithm and calculating principal component regression. The quantitative performance of principal component regression was compared with that of multiple linear regression or the analysis based on Lambert-Beer law. Principal component regression was found to be superior to the other two methods. Keywords: Microsoft Excel, Chemometrics, Principal component analysis, Principal component regression and NIPALS algorithm DOI: 10.2477/jccj.H2218 43