Journal Article
On the Limited Sample Effect of the Optimum Classifier by Bayesian Approach: The Case of Independent Sample Size for Each Class
(Special Section on Learning for Pattern Recognition: Fundamentals and Applications)
Han, Xuexian; Wakabayashi, Tetsushi; Kimura, Fumitaka; Miyake, Yasuji
The Transactions of the Institute of Electronics, Information and Communication Engineers D-II, Information and Systems, II - Pattern Processing, 1999
http://hdl.handle.net/10076/11104
On the Limited Sample Effect of the Optimum Classifier by Bayesian Approach: The Case of Independent Sample Size for Each Class

Xuexian HAN, Tetsushi WAKABAYASHI, Fumitaka KIMURA, and Yasuji MIYAKE
Faculty of Engineering, Mie University, Tsu-shi, 514-8507 Japan

1. Introduction

… [1] … [2], [5], [6] … Aitchison and Dunsmore [7] … [8] … [6] …

The Transactions of the IEICE D-II, Vol. J82-D-II, No. 4, pp. 621-630, April 1999
2. The Bayesian Approach

In the Bayesian approach, the class-conditional density is obtained by integrating over the parameter:

p(X|χ) = ∫ p(X|θ) p(θ|χ) dθ    (1)

where χ denotes the set of learning samples of a class, p(X|θ) is the conditional density of X given the parameter θ, and p(θ|χ) is the posterior density of θ given χ, obtained from a prior density p(θ) [9]. In the conventional (plug-in) approach, on the other hand, θ is replaced by a point estimate θ̂(χ), e.g. the maximum likelihood estimate, and

p(X|χ) ≃ p(X|θ̂(χ))    (2)

is used instead [10].

3. The Optimum Discriminant Function

3.1 Derivation

3.1.1 Predictive density

When p(X|θ) is the n-variate normal density, the predictive density (1) becomes a multivariate t distribution [6]:

p(X|χ) = (N̂π)^{-n/2} |Σ̂|^{-1/2} (Γ((N̂+1)/2) / Γ((N̂-n+1)/2)) [1 + (1/N̂)(X-M)ᵀ Σ̂⁻¹ (X-M)]^{-(N̂+1)/2}    (3)

where N̂ = N + N₀, Σ̂ = (1-α)Σ + αΣ₀, α = N₀/(N+N₀); X is the n-dimensional feature vector, M and Σ are the sample mean vector and the sample covariance matrix, N is the learning sample size, Σ₀ and N₀ are the parameters of the prior density, and Γ(·) is the gamma function.

3.1.2 Discriminant function

Taking -2 ln of p(X|χ)P(ω) gives the discriminant function

g(X) = -2 ln p(X|χ)P(ω) = (N̂+1) ln[1 + (1/N̂)(X-M)ᵀ Σ̂⁻¹ (X-M)] + ln|Σ̂| - 2 ln D - 2 ln P(ω)    (4)

D = (N̂π)^{-n/2} Γ((N̂+1)/2) / Γ((N̂-n+1)/2)

where P(ω) is the a priori probability of class ω. When N₀ = 0 (α = 0), Σ̂ = Σ and N̂ = N. In the following, the prior covariance is taken as Σ₀ = σ²I, where I is the identity matrix and σ² is a prior parameter; Sects. 4.1 and 4.2 treat the case N₀ = 0, and Sect. 4.3 also examines the case N₀ ≠ 0 with the prior mean taken at the sample mean M.
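As a numerical sketch of the two densities compared above, the following code (assuming NumPy is available; all function and variable names are mine) evaluates the predictive density of Eq. (3) and the plug-in Gaussian of Eq. (2). For large N̂ the two agree; for small N̂ the t form has heavier tails.

```python
import math
import numpy as np

def predictive_density(x, M, Sigma_hat, N_hat):
    """Bayesian predictive density in the multivariate t form of Eq. (3)."""
    n = len(M)
    d = x - M
    q = float(d @ np.linalg.solve(Sigma_hat, d))  # (X-M)^T Sigma_hat^{-1} (X-M)
    log_p = (math.lgamma((N_hat + 1) / 2) - math.lgamma((N_hat - n + 1) / 2)
             - (n / 2) * math.log(N_hat * math.pi)
             - 0.5 * math.log(float(np.linalg.det(Sigma_hat)))
             - ((N_hat + 1) / 2) * math.log1p(q / N_hat))
    return math.exp(log_p)

def plugin_density(x, M, Sigma):
    """ML plug-in Gaussian density of Eq. (2)."""
    n = len(M)
    d = x - M
    q = float(d @ np.linalg.solve(Sigma, d))
    return math.exp(-0.5 * q) / math.sqrt((2 * math.pi) ** n * float(np.linalg.det(Sigma)))

M = np.zeros(2)
S = np.eye(2)
x = np.array([1.0, 0.0])
p_t = predictive_density(x, M, S, N_hat=10000.0)  # large N_hat: close to Gaussian
p_g = plugin_density(x, M, S)
```

With a small N̂ (e.g. N̂ = 5) the same call returns a visibly larger value in the tails, which is the source of the robustness of the Bayesian classifier to small learning samples.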
3.2 Modified discriminant function

Substituting Σ₀ = σ²I into (4) and rewriting it with the eigenvalues and eigenvectors of Σ (see Appendix 1.) yields [11]

g(X) = (N+N₀+1) ln[1 + (1/(N₀σ²)) { ‖X-M‖² - Σ_{i=1}^{k} ((1-α)λᵢ / ((1-α)λᵢ + ασ²)) {Φᵢᵀ(X-M)}² }] + Σ_{i=1}^{k} ln{(1-α)λᵢ + ασ²} - 2 ln P(ω)    (5)

where λᵢ and Φᵢ are the i-th eigenvalue (in decreasing order) and eigenvector of Σ, k is the number of dominant eigenvalues retained, and terms common to all classes are dropped.

3.3 Relation to the projection distances

When the a priori probabilities P(ω) are equal and the term Σ_{i=1}^{k} ln{(1-α)λᵢ + ασ²} is neglected, the factor (N+N₀+1) and the monotone logarithm do not change the ordering of (5), which then reduces to

g(X) = ‖X-M‖² - Σ_{i=1}^{k} ((1-α)λᵢ / ((1-α)λᵢ + ασ²)) {Φᵢᵀ(X-M)}²    (6)

Furthermore, when λᵢ ≫ (N₀/N)σ² for i ≤ k, the weights in (6) approach 1 and (6) reduces to the projection distance [12]

g(X) = Σ_{i=k+1}^{n} {Φᵢᵀ(X-M)}² = ‖X-M‖² - Σ_{i=1}^{k} {Φᵢᵀ(X-M)}²    (7)

i.e. the squared distance from X to the k-dimensional K-L subspace spanned by Φ₁, …, Φ_k through M.

Fig. 1  Decision boundaries of projection distance and modified projection distance.

4. Evaluation of the Limited Sample Effect

For comparison, the quadratic discriminant function obtained by the maximum likelihood plug-in (2) is also evaluated:

g(X) = (X-M)ᵀ Σ⁻¹ (X-M) + ln|Σ| - 2 ln P(ω)    (8)
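The relation between the weighted form (6) and the projection distance (7) can be sketched as follows (assuming NumPy; helper names are mine). A point lying in the subspace spanned by the first k eigenvectors through M has projection distance zero, while the modified form (6) keeps a small positive residual.

```python
import numpy as np

def eig_desc(Sigma):
    # eigenvalues/eigenvectors of the covariance, sorted in decreasing order
    lam, Phi = np.linalg.eigh(Sigma)
    order = np.argsort(lam)[::-1]
    return lam[order], Phi[:, order]

def modified_projection_distance(x, M, lam, Phi, k, alpha, sigma2):
    """Eq. (6): weights (1-a)l_i / ((1-a)l_i + a*s2) on the first k axes."""
    d = x - M
    proj = Phi[:, :k].T @ d                       # Phi_i^T (X - M), i = 1..k
    w = (1 - alpha) * lam[:k] / ((1 - alpha) * lam[:k] + alpha * sigma2)
    return float(d @ d - np.sum(w * proj ** 2))

def projection_distance(x, M, lam, Phi, k):
    """Eq. (7): the limit of Eq. (6) when lambda_i >> (N0/N) sigma^2."""
    d = x - M
    proj = Phi[:, :k].T @ d
    return float(d @ d - np.sum(proj ** 2))

Sigma = np.diag([4.0, 1.0, 0.01])
lam, Phi = eig_desc(Sigma)
M = np.zeros(3)
x = M + 3.0 * Phi[:, 0]          # a point inside the 1-D principal subspace
d_proj = projection_distance(x, M, lam, Phi, k=1)   # -> 0: x lies in the subspace
d_mod = modified_projection_distance(x, M, lam, Phi, k=1, alpha=0.2, sigma2=1.0)
```

The strictly positive `d_mod` is what bends the decision boundary of Fig. 1 away from the pure projection distance.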
Fig. 2  Theoretical mean error rate (%) vs. sample size N₁ with fixed total sample size (N₁+N₂ = 6, σ₁² = σ₂² = 1.0), univariate case.

Fig. 3  Theoretical mean error rate (%) vs. sample size N₁ with fixed total sample size (N₁+N₂ = 6, σ₁² = 4.0, σ₂² = 0.5), univariate case.

4.1 Univariate case

Figures 2 and 3 show the theoretical mean error rate of the optimum discriminant function based on the t distribution for a two-class univariate problem with the total sample size fixed at N₁+N₂ = 6: in Fig. 2 the class variances are equal (σ₁² = σ₂² = 1.0), and in Fig. 3 they differ (σ₁² = 4.0, σ₂² = 0.5). The error rate is computed as described in Appendix 2.

4.2 Multivariate case

Fig. 4  Mean error rate (%) vs. sample size N₁ with fixed total sample size (N₁+N₂ = 40), 8-variate case.

Figure 4 shows the mean error rate for an 8-variate two-class problem, estimated by simulation (5000 samples, 1000 trials).
The 8-variate two normal populations are specified by

diag Σ = (8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73)    (9)

M₁ = (0, 0, 0, …, 0)ᵀ
M₂ = (3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01)ᵀ    (10)

and the total sample size is fixed at N₁ + N₂ = 40.

4.3 Handwritten numeral recognition

4.3.1 Feature extraction

A 400-dimensional feature vector is used [11], and the dimensionality is varied over n = 3, 6, 9, 12, 20, 32, 48, 64, 100, 144, 196, 256, 400. The feature is computed as follows: the normalized input image is smoothed (r = 5) and binarized; the gradient is computed with the Roberts operator, and its direction is quantized into 32 directions at intervals of π/16; the image is partitioned into 9×9 = 81 blocks; with 1-4-6-4-1 weighting the spatial resolution is reduced from 9×9 to 5×5 and the 32 directions are merged into 16, giving 5×5×16 = 400 features; finally the variable transformation y = x^u (u = 0.4) is applied.

Table 1  Sample size of each class (case of nearly common learning sample size).

Class  Learning  Test
0      30        1752
1      3100      1678
2      2891      1386
3      3088      1556
4      398       1380
5      2937      1546
6      75        1650
7      2805      1481
8      2886      1291
9      3945      1241
Total  29877     14961

Table 2  Sample size of each class (case of independent learning sample size).

Class  Learning  Test
0      2000      1752
1      200       1678
2      20        1386
3      2000      1556
4      200       1380
5      20        1546
6      2000      1650
7      200       1481
8      20        1291
9      2000      1241
Total  8660      14961
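The 8-variate experiment of Fig. 4 can be reproduced in miniature with the class parameters of Eqs. (9) and (10). The sketch below (assuming NumPy; taking Σ₁ = I for the first class and Eq. (9) as the second-class covariance is my assumption) estimates the error rate of the plug-in quadratic discriminant (8) for one learning-sample split with N₁ = N₂ = 20.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class parameters of Eqs. (9), (10); Sigma1 = I is an assumption of this sketch.
M1 = np.zeros(8)
M2 = np.array([3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01])
S1 = np.eye(8)
S2 = np.diag([8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73])

def quad_g(x, M, S):
    """Plug-in quadratic discriminant of Eq. (8) with equal priors."""
    d = x - M
    return float(d @ np.linalg.solve(S, d)) + float(np.log(np.linalg.det(S)))

def trial(N1, N2, n_test=500):
    # draw learning samples and estimate (M, Sigma) for each class
    X1 = rng.multivariate_normal(M1, S1, N1)
    X2 = rng.multivariate_normal(M2, S2, N2)
    est = [(X.mean(axis=0), np.cov(X, rowvar=False)) for X in (X1, X2)]
    # test on fresh samples from both classes
    errors = 0
    for true_cls, (Mt, St) in ((0, (M1, S1)), (1, (M2, S2))):
        for x in rng.multivariate_normal(Mt, St, n_test):
            pred = 0 if quad_g(x, *est[0]) < quad_g(x, *est[1]) else 1
            errors += (pred != true_cls)
    return errors / (2 * n_test)

err = trial(N1=20, N2=20)   # N1 + N2 = 40 as in the experiment
```

Averaging `trial` over many splits and sweeping N₁ with N₁+N₂ = 40 fixed gives a curve of the kind plotted in Fig. 4.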
Fig. 5  Recognition rate of handwritten numeral recognition (case of nearly common learning sample size).

4.3.2 Database

The experiments used handwritten numeral databases [11], [13], [14] … 14,946 … In total 44,838 samples were used: 29,877 for learning and 14,961 for testing, divided per class as shown in Tables 1 and 2.

4.3.3 …

To make the weight α common to all classes when the learning sample size N differs from class to class, the prior sample size N₀ is set for each class as

N₀ = (α / (1-α)) N    (11)

with a common α.

Fig. 6  Recognition rate of optimum discriminant function (case of independent learning sample size).

… 99.0% … the highest recognition rate, 99.31%, is obtained at n = 144 … 32 … 48 … [15].

4.3.4 …
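Eq. (11) is simply the inversion of α = N₀/(N+N₀): choosing N₀ proportionally to each class's N keeps the mixing weight α identical across classes even when the per-class sample sizes differ by orders of magnitude, as in Table 2. A minimal sketch (pure Python; the value α = 0.3 is an arbitrary illustration, not a value from the paper):

```python
def prior_size_for_alpha(alpha, N):
    """Eq. (11): N0 = alpha/(1-alpha) * N, so that N0/(N+N0) == alpha."""
    return alpha / (1.0 - alpha) * N

# per-class learning sample sizes as in Table 2 (2000 / 200 / 20)
sizes = [2000, 200, 20]
alpha = 0.3
N0s = [prior_size_for_alpha(alpha, N) for N in sizes]
alphas = [N0 / (N + N0) for N0, N in zip(N0s, sizes)]   # all equal to alpha
```

A class with few samples thus gets a proportionally small N₀, so the prior never overwhelms it relative to the better-sampled classes.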
Table 3  Ratio of computation cost: … 10 … 1 … 0.65 … 1 … 10 …

Fig. 7  Recognition rate of handwritten numeral recognition (case of independent learning sample size).

… the peaking phenomenon of the mean recognition accuracy [2], [16] … (Fig. 7) … 3.3 …

4.3.5 …

… the ratio of computation cost … 0.65 … 1/10 … (Table 3).

5. …

(1) … (2) … (3) … (4) … (5) … Σ₀ (= σ²I) … t distribution …
… [11]

6. Conclusion

(1) … N₀ … (2) … (Sect. 4.3) … (3) … (4) … (5) …

References
[1] …, 1988.
[2] J.M. Van Campenhout, "On the peaking of Hughes' mean recognition accuracy: The resolution of an apparent paradox," IEEE Trans. Syst., Man & Cybern., vol.SMC-8, no.5, pp.390-395, May 1978.
[3] W.G. Waller and A.K. Jain, "On the monotonicity of the performance of Bayesian classifiers," IEEE Trans. Inf. Theory, vol.IT-24, pp.392-394, 1978.
[4] J.M. Van Campenhout, "Topics in measurement selection," Handbook of Statistics, vol.2, pp.793-803, North-Holland, 1982.
[5] D. Lindley, "The Bayesian approach," Scand. J. Statist., vol.5, pp.1-26, 1978.
[6] D.G. Keehn, "A note on learning for Gaussian properties," IEEE Trans. Inf. Theory, vol.IT-11, no.1, pp.126-132, Jan. 1965.
[7] B.D. Ripley, Pattern Recognition and Neural Networks, p.5, Cambridge University Press, 1996.
[8] S.J. Raudys and A.K. Jain, "Small sample size effects in statistical pattern recognition: Recommendations for practitioners," IEEE Trans. Pattern Anal. & Mach. Intell., vol.13, no.3, pp.252-264, March 1991.
[9] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, p.5, John Wiley & Sons, New York, 1973.
[10] …, vol.77, no.8, pp.853-864, Aug. 1994.
[11] …, IEICE Trans. D-II, vol.J77-D-II, no.10, pp.2046-2053, Oct. 1994.
[12] …, vol.4, no.1, pp.106-112, Jan. 1983.
[13] …, IEICE Technical Report, PRU92-33, Sept. 1992.
[14] …, IEICE Technical Report, PRU93-46, Sept. 1993.
[15] K. Fukunaga and R.R. Hayes, "Effects of sample size in classifier design," IEEE Trans. Pattern Anal. & Mach. Intell., vol.PAMI-11, no.8, pp.873-885, Aug. 1989.
[16] G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inf. Theory, vol.IT-14, no.1, pp.55-63, Jan. 1968.

Appendix

1. Derivation of Eq. (5)

Substituting Σ₀ = σ²I gives

Σ̂ = (1-α)Σ + ασ²I    (A·1)

Since

{(1-α)Σ + ασ²I}Φᵢ = (1-α)λᵢΦᵢ + ασ²Φᵢ = {(1-α)λᵢ + ασ²}Φᵢ    (i = 1, 2, …, n)    (A·2)

the eigenvalues of Σ̂ are (1-α)λᵢ + ασ² and its eigenvectors are the Φᵢ (i = 1, 2, …, n). Hence

Y = (X-M)ᵀ Σ̂⁻¹ (X-M) = Σ_{i=1}^{n} (1/((1-α)λᵢ + ασ²)) {Φᵢᵀ(X-M)}²    (A·3)

Assuming (1-α)λᵢ ≪ ασ² for i > k, (A·3) is approximated by
Y ≈ Σ_{i=1}^{k} (1/((1-α)λᵢ + ασ²)) {Φᵢᵀ(X-M)}² + (1/(ασ²)) Σ_{i=k+1}^{n} {Φᵢᵀ(X-M)}²    (A·4)

Since Σ_{i=k+1}^{n} {Φᵢᵀ(X-M)}² = ‖X-M‖² - Σ_{i=1}^{k} {Φᵢᵀ(X-M)}², (A·4) can be rewritten as

Y ≈ (1/(ασ²)) [‖X-M‖² - Σ_{i=1}^{k} ((1-α)λᵢ / ((1-α)λᵢ + ασ²)) {Φᵢᵀ(X-M)}²]    (A·5)

Similarly,

ln|Σ̂| = Σ_{i=1}^{n} ln{(1-α)λᵢ + ασ²}    (A·6)
      ≈ Σ_{i=1}^{k} ln{(1-α)λᵢ + ασ²} + (n-k) ln(ασ²)    (A·7)

Substituting (A·5)-(A·7) and α = N₀/(N+N₀) into (4) yields (5).

2. Theoretical mean error rate for the univariate case

2.1 Case of N₁ = N₂

From (4), the discriminant function for class ωᵢ is equivalent to

gᵢ(x) = σᵢ^{2/(N̂+1)} [1 + (1/N̂) ((x-mᵢ)/σᵢ)²]    (i = 1, 2)    (A·8)

The decision boundary is given by h(x) = 0, where

h(x) = g₁(x) - g₂(x)
     = σ₁^{2/(N̂+1)} {1 + (1/N̂)((x-m₁)/σ₁)²} - σ₂^{2/(N̂+1)} {1 + (1/N̂)((x-m₂)/σ₂)²}
     = (a-b)x² - 2(am₁-bm₂)x + am₁² - bm₂² + c

with a = (1/N̂)σ₁^{2/(N̂+1)}/σ₁², b = (1/N̂)σ₂^{2/(N̂+1)}/σ₂², c = σ₁^{2/(N̂+1)} - σ₂^{2/(N̂+1)}. The roots α, β of h(x) = 0 are

α = β = (m₁ + m₂)/2    (σ₁ = σ₂)    (A·9)

α, β = {(am₁ - bm₂) ∓ √(ab(m₁-m₂)² - (a-b)c)} / (a-b)    (σ₁ ≠ σ₂)    (A·10)

Assuming σ₁ ≥ σ₂ and P(ω₁) = P(ω₂) = 1/2, the mean error rate is

ε = P(ω₁)ε₁ + P(ω₂)ε₂ = P(ω₁)P(error|χ, ω₁) + P(ω₂)P(error|χ, ω₂) = A + B + C

A = ∫_{-∞}^{α} p(x|χ, ω₂)P(ω₂) dx = (1/2) Φ_N̂((α-m₂)/σ₂)

B = ∫_{α}^{β} p(x|χ, ω₁)P(ω₁) dx = (1/2) {Φ_N̂((β-m₁)/σ₁) - Φ_N̂((α-m₁)/σ₁)}

C = ∫_{β}^{∞} p(x|χ, ω₂)P(ω₂) dx = (1/2) {1 - Φ_N̂((β-m₂)/σ₂)}    (A·11)

where Φ_N̂ is the distribution function of the t density t_N̂(x):

Φ_N̂(x₀) = ∫_{-∞}^{x₀} t_N̂(x) dx    (A·12)
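The step from (A·3) to (A·5) replaces 1/((1-α)λᵢ + ασ²) by 1/(ασ²) for i > k; the identity behind the rewriting is exact whenever the trailing eigenvalues vanish. The following check (assuming NumPy; the concrete numbers are arbitrary) confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
alpha, sigma2 = 0.25, 1.5

# covariance of rank k: trailing eigenvalues are exactly zero
lam = np.array([5.0, 3.0, 0.0, 0.0, 0.0, 0.0])
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))     # random orthonormal eigenvectors
Sigma = Q @ np.diag(lam) @ Q.T
Sigma_hat = (1 - alpha) * Sigma + alpha * sigma2 * np.eye(n)   # (A.1)

d = rng.normal(size=n)                            # stands for X - M
Y_direct = float(d @ np.linalg.solve(Sigma_hat, d))            # (A.3)

proj = Q.T @ d                                    # Phi_i^T (X - M)
w = (1 - alpha) * lam[:k] / ((1 - alpha) * lam[:k] + alpha * sigma2)
Y_formula = (float(d @ d) - float(np.sum(w * proj[:k] ** 2))) / (alpha * sigma2)  # (A.5)
```

With nonzero trailing λᵢ the two values differ by the approximation error controlled by the assumption (1-α)λᵢ ≪ ασ².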
2.2 Case of N₁ ≠ N₂

In this case the discriminant functions are the predictive densities themselves:

gᵢ(x) = Dᵢ [1 + (1/N̂ᵢ) ((x-mᵢ)/σᵢ)²]^{-(N̂ᵢ+1)/2}

Dᵢ = (N̂ᵢπ)^{-1/2} (1/σᵢ) Γ((N̂ᵢ+1)/2) / Γ(N̂ᵢ/2)    (i = 1, 2)    (A·13)

The decision boundary h(x) = g₁(x) - g₂(x) = 0 is solved numerically by Newton's method

x_{k+1} = x_k - h(x_k)/h′(x_k)

with

h(x) = D₁ {1 + (1/N̂₁)((x-m₁)/σ₁)²}^{-(N̂₁+1)/2} - D₂ {1 + (1/N̂₂)((x-m₂)/σ₂)²}^{-(N̂₂+1)/2}

h′(x) = -((N̂₁+1)/N̂₁) ((x-m₁)/σ₁²) D₁ {1 + (1/N̂₁)((x-m₁)/σ₁)²}^{-(N̂₁+3)/2} + ((N̂₂+1)/N̂₂) ((x-m₂)/σ₂²) D₂ {1 + (1/N̂₂)((x-m₂)/σ₂)²}^{-(N̂₂+3)/2}    (A·14)

The mean error rate is then obtained from (A·11).
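The Newton iteration of (A·13), (A·14) can be sketched directly (pure Python; the parameter values m₁ = 0, m₂ = 2, σ₁ = σ₂ = 1, N̂₁ = 10, N̂₂ = 30 and the starting point are arbitrary illustrations):

```python
import math

def t_like(x, m, sigma, N):
    """g_i of (A.13): univariate predictive density with N-hat = N."""
    D = (math.gamma((N + 1) / 2) / math.gamma(N / 2)) / (sigma * math.sqrt(N * math.pi))
    return D * (1 + ((x - m) / sigma) ** 2 / N) ** (-(N + 1) / 2)

def t_like_deriv(x, m, sigma, N):
    """d/dx of t_like, i.e. one term of h'(x) in (A.14)."""
    D = (math.gamma((N + 1) / 2) / math.gamma(N / 2)) / (sigma * math.sqrt(N * math.pi))
    u = 1 + ((x - m) / sigma) ** 2 / N
    return -((N + 1) / N) * ((x - m) / sigma ** 2) * D * u ** (-(N + 3) / 2)

def boundary(m1, s1, N1, m2, s2, N2, x0, iters=50):
    """Newton's method x_{k+1} = x_k - h(x_k)/h'(x_k) for h = g1 - g2."""
    x = x0
    for _ in range(iters):
        h = t_like(x, m1, s1, N1) - t_like(x, m2, s2, N2)
        hp = t_like_deriv(x, m1, s1, N1) - t_like_deriv(x, m2, s2, N2)
        x -= h / hp
    return x

x_star = boundary(0.0, 1.0, 10, 2.0, 1.0, 30, x0=1.0)
```

Because N̂₁ ≠ N̂₂ gives the two classes different tail weights, the boundary is shifted slightly away from the midpoint of the means; feeding the resulting α, β into (A·11) yields the theoretical mean error rate.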