On the Statistical Bias Found in the Horse Racing Data (1)
Akio NODA
Mathematics

Abstract: The purpose of the present paper is to report the type of statistical bias that the author has found in horse racing data (based on []). To explain this type of bias, consider a race with $m$ participants, and denote by $\{a, b, c\}$ ($1 \le a < b < c \le m$) the set of numbers of the racehorses that finish first, second and third. Since the number of each participant is determined by lot, we are led to the following null hypothesis:

$H_0$: the set $\{a, b, c\}$ is nothing but the result of random sampling from the set $\{1, 2, \dots, m\}$.

By studying the probability distributions of various random variables arising from the random sampling of $H_0$, we can examine, by means of the chi-square test, how far the frequency distributions observed in the above data deviate from those expected under $H_0$. Our method of condensing the original data consists of studying two random variables, $R = c - a$ (the range) and $D = \min\{b - a, c - b\}$ (the adjacent interval of the three numbers), together with the following pair of partitions of the whole event: $A_0 = \{2b = a + c\}$, $A_1 = \{2b < a + c\}$, $A_2 = \{2b > a + c\}$; $B_0 = \{a + b = c\}$, $B_1 = \{a + b < c\}$, $B_2 = \{a + b > c\}$. In this paper, Part (1) of the series, we take up three racetracks, Chukyo, Hanshin and Kyoto, and examine all races with $m = 16$ (and also $m = 14$) held at these racetracks. Specifically, we summarize the original data in two kinds of contingency tables: one corresponding to the joint probability distribution of $(R, D)$, and the other to the probability table of the product events $A_i B_j$ ($i, j = 0, 1, 2$). Performing chi-square tests on these contingency tables, we are able to detect certain types of statistical bias at each racetrack.
Furthermore, these results reveal an interesting dependence of the type of detected bias on the racetrack, which suggests that the individual character of each racetrack can be extracted from long-term race records [].

Keywords: contingency table, chi-square test, random sampling as the null hypothesis, adjacent interval of three numbers.
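Let $\{a, b, c\}$ ($1 \le a < b < c \le m$) be the set of numbers of the first three finishers. Under $H_0$, each of the
$$\binom{m}{3} = \frac{m(m-1)(m-2)}{6}$$
possible sets $\{a, b, c\}$ occurs with the same probability. We study the two random variables
$$R = c - a, \qquad D = \min\{b - a,\ c - b\},$$
and the two partitions
$$A_0 = \{2b = a + c\},\quad A_1 = \{2b < a + c\},\quad A_2 = \{2b > a + c\},$$
$$B_0 = \{a + b = c\},\quad B_1 = \{a + b < c\},\quad B_2 = \{a + b > c\}.$$
Thus $A_0$ is the event that $b$ is the midpoint of $a$ and $c$, and $A_1 = \{b - a < c - b\}$. The observed data are compared with the distribution of $(R, D)$ and with the probabilities of the nine product events $A_i B_j$ ($i, j = 0, 1, 2$) under $H_0$.

Data for $m = 16$: $N_1 = 148$, $N_2 = 9$, $N_3 = 96$; $n_1 = 616$, $n_2 = 656$, $n_3 = 590$.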
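A small brute-force sketch can make these definitions concrete. Assuming only the Python standard library, it enumerates every set $\{a, b, c\}$ that is equally likely under $H_0$ and classifies a set by $R$, $D$ and the indices $i$, $j$ of the events $A_i$ and $B_j$ containing it. The function names `all_sets` and `classify` are hypothetical, chosen for illustration.

```python
from itertools import combinations

def all_sets(m):
    """All C(m, 3) sets {a, b, c}, 1 <= a < b < c <= m, as increasing tuples."""
    return list(combinations(range(1, m + 1), 3))

def classify(a, b, c):
    """Return (R, D, i, j) for a < b < c, where the set lies in A_i and B_j.

    A_0: 2b = a + c, A_1: 2b < a + c, A_2: 2b > a + c
    B_0: a + b = c,  B_1: a + b < c,  B_2: a + b > c
    """
    r = c - a                      # range R
    d = min(b - a, c - b)          # adjacent interval D
    s = 2 * b - (a + c)
    i = 0 if s == 0 else (1 if s < 0 else 2)
    t = a + b - c
    j = 0 if t == 0 else (1 if t < 0 else 2)
    return r, d, i, j

sets16 = all_sets(16)
print(len(sets16))           # 560 = 16 * 15 * 14 / 6
print(classify(3, 5, 16))    # (13, 2, 1, 1)
```

For example, the set $\{3, 5, 16\}$ has range 13 and adjacent interval 2, and lies in $A_1 B_1$ because $10 < 19$ and $8 < 16$.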
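Here $n_k$ ($k = 1, 2, 3$) is the number of observed sets $\{a, b, c\}$ in the $k$-th data set, and the frequencies of the events $A_i B_j$ are compared with their expectations under $H_0$ by the $\chi^2$ test. For $m = 14$: $n_1 = 140$, $N_1$; $n_2 = 67$; $n_3 = 5$.

Under $H_0$, let $x_{rd}$ be the number of sets $\{a, b, c\}$ with $R = r$ and $D = d$ ($2 \le r \le m - 1$, $1 \le d \le \lfloor (m-1)/2 \rfloor$), so that $P(R = r, D = d) = x_{rd} / \binom{m}{3}$. For each of the $m - r$ positions of $a$, the middle number $b$ must satisfy $b - a = d$ or $c - b = d$; hence, for odd $r$,
$$x_{rd} = \begin{cases} 2(m - r), & 1 \le d \le \frac{r-1}{2},\\[2pt] 0, & d \ge \frac{r+1}{2}, \end{cases}$$
and, for even $r$,
$$x_{rd} = \begin{cases} 2(m - r), & 1 \le d \le \frac{r}{2} - 1,\\[2pt] m - r, & d = \frac{r}{2},\\[2pt] 0, & d \ge \frac{r}{2} + 1. \end{cases}$$

[Table: joint distribution of $(R, D)$ under $H_0$ for $m = 16$; columns $D = 1$, $D = 2$, $D = 3$, $D = 4$, $D \ge 5$.]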
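The case analysis for $x_{rd}$ can be checked against exhaustive enumeration. This is a sketch, assuming that $x_{rd}$ counts sets (not probabilities), so that the counts for all $(r, d)$ add up to $\binom{m}{3}$; `x_rd_formula` and `x_rd_enumerated` are illustrative names.

```python
from collections import Counter
from itertools import combinations

def x_rd_formula(m, r, d):
    """Closed form for the number of sets {a, b, c} in {1..m} with R = r, D = d."""
    if not (2 <= r <= m - 1) or d < 1:
        return 0
    if 2 * d < r:      # b - a = d or c - b = d: two distinct positions for b
        return 2 * (m - r)
    if 2 * d == r:     # b is the midpoint of a and c: one position
        return m - r
    return 0           # the smaller gap cannot exceed r / 2

def x_rd_enumerated(m):
    """Count (R, D) pairs over all C(m, 3) sets by brute force."""
    counts = Counter()
    for a, b, c in combinations(range(1, m + 1), 3):
        counts[(c - a, min(b - a, c - b))] += 1
    return counts

for m in (14, 16):
    table = x_rd_enumerated(m)
    mismatches = [
        (r, d) for r in range(2, m) for d in range(1, m)
        if x_rd_formula(m, r, d) != table[(r, d)]
    ]
    print(m, "mismatches:", mismatches)   # expected: an empty list for both m
```

Summing $x_{rd}$ over $d$ for fixed $r$ gives $(r - 1)(m - r)$ in both parity cases, which is the numerator of $P(R = r)$ up to the factor 6.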
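Summing $x_{rd}$ over $d$ or over $r$ gives the marginal distributions
$$P(R = r) = \frac{6(r-1)(m-r)}{m(m-1)(m-2)} \quad (2 \le r \le m - 1),$$
$$P(D = d) = \frac{6(m-2d)^2}{m(m-1)(m-2)} \quad \left(1 \le d \le \frac{m-1}{2}\right).$$

For the partitions, consider the map $f$ sending $(a, b, c)$ to $(\alpha, \beta, \gamma)$ with
$$\alpha = m + 1 - c,\quad \beta = m + 1 - b,\quad \gamma = m + 1 - a.$$
It is a one-to-one map of the sample space onto itself; since $2\beta - (\alpha + \gamma) = -(2b - (a + c))$, it leaves $A_0$ invariant and exchanges $A_1$ and $A_2$, so $P(A_1) = P(A_2)$. Similarly, the map $h$ with
$$\alpha = b - a,\quad \beta = b,\quad \gamma = c$$
is one-to-one and satisfies $2\beta - (\alpha + \gamma) = a + b - c$ and $\alpha + \beta - \gamma = 2b - (a + c)$; that is, it interchanges the roles of the two partitions and sends $A_i B_j$ onto $A_j B_i$. Hence
$$P(A_0) = P(B_0), \qquad P(A_1) = P(A_2) = P(B_1) = P(B_2).$$
Writing $p_m = P(A_0) = P(B_0)$, we have $P(A_1) = P(A_2) = P(B_1) = P(B_2) = (1 - p_m)/2$, where
$$p_m = \frac{6\,\lfloor m/2 \rfloor\,\lfloor (m-1)/2 \rfloor}{m(m-1)(m-2)},$$
since $A_0$ consists of the arithmetic progressions $(a, a + e, a + 2e)$ with $a + 2e \le m$. For example, $p_{16} = 56/560 = 1/10$.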
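The marginal formulas and $p_m$ can be evaluated exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` and `math.comb` (Python 3.8 or later); the helper names are hypothetical. The closed form for $p_m$ counts the arithmetic progressions $(a, a + e, a + 2e)$ in $\{1, \dots, m\}$, of which there are $\lfloor m/2 \rfloor \lfloor (m-1)/2 \rfloor$, and divides by $\binom{m}{3}$.

```python
from fractions import Fraction
from math import comb

def p_range(m, r):
    """P(R = r) = 6 (r - 1)(m - r) / (m (m - 1)(m - 2)) for 2 <= r <= m - 1."""
    return Fraction(6 * (r - 1) * (m - r), m * (m - 1) * (m - 2))

def p_gap(m, d):
    """P(D = d) = 6 (m - 2d)^2 / (m (m - 1)(m - 2)) for 1 <= d <= (m - 1) / 2."""
    return Fraction(6 * (m - 2 * d) ** 2, m * (m - 1) * (m - 2))

def p_m(m):
    """P(A_0) = P(B_0): number of progressions (a, a+e, a+2e) over C(m, 3)."""
    return Fraction((m // 2) * ((m - 1) // 2), comb(m, 3))

for m in (14, 16):
    total_r = sum(p_range(m, r) for r in range(2, m))
    total_d = sum(p_gap(m, d) for d in range(1, (m - 1) // 2 + 1))
    p = p_m(m)
    # both totals should be exactly 1; (1 - p) / 2 is P(A_1) = P(A_2) = P(B_1) = P(B_2)
    print(m, total_r, total_d, p, (1 - p) / 2)
```

For $m = 16$ this prints totals of 1, $p_{16} = 1/10$ and $(1 - p_{16})/2 = 9/20$; for $m = 14$ it gives $p_{14} = 3/26$.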
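Since $h$ sends $A_i B_j$ onto $A_j B_i$, the table of $P(A_i B_j)$ is symmetric:
$$P(A_0 B_1) = P(B_0 A_1), \qquad P(A_0 B_2) = P(B_0 A_2), \qquad P(A_1 B_2) = P(A_2 B_1).$$
Together with the marginal probabilities, the nine probabilities $P(A_i B_j)$ ($i, j = 0, 1, 2$) are therefore determined by those of $A_0 B_0$, $A_0 B_1$ and $A_2 B_2$; let $y$, $z$, $w$ denote the numbers of sets in these three events. For instance, $A_0 B_0 = \{(a, 2a, 3a)\}$, so $y = \lfloor m/3 \rfloor$.

[Table: $A_i B_j$ for $m = 16$; rows $A_0$, $A_1$, $A_2$; columns $B_0$, $B_1$, $B_2$.]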
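The symmetry of the $A_i B_j$ table can be checked numerically. This sketch assumes the index convention 0 for "=", 1 for "<", 2 for ">" used in the definitions of $A_i$ and $B_j$; `sign_index`, `ab_counts` and the Python function `h` (mirroring the map $(a, b, c) \mapsto (b - a, b, c)$) are hypothetical names.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sign_index(v):
    """Index convention of the partitions: 0 for '=', 1 for '<', 2 for '>'."""
    return 0 if v == 0 else (1 if v < 0 else 2)

def ab_counts(m):
    """Number of sets {a, b, c} in each product event A_i B_j under H_0."""
    counts = Counter()
    for a, b, c in combinations(range(1, m + 1), 3):
        i = sign_index(2 * b - (a + c))   # A_0, A_1, A_2
        j = sign_index(a + b - c)         # B_0, B_1, B_2
        counts[(i, j)] += 1
    return counts

def h(a, b, c):
    """The map (a, b, c) -> (b - a, b, c); applying it twice returns (a, b, c)."""
    return b - a, b, c

m = 16
counts = ab_counts(m)
total = sum(counts.values())
table = [[Fraction(counts[(i, j)], total) for j in range(3)] for i in range(3)]
for i in range(3):
    print(f"A_{i}:", [counts[(i, j)] for j in range(3)])
symmetric = all(table[i][j] == table[j][i] for i in range(3) for j in range(3))
print("symmetric:", symmetric)
```

For $m = 16$ the first row is `[5, 16, 35]`, consistent with $y = \lfloor 16/3 \rfloor = 5$ and with $56/560 = p_{16}$.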
$\chi^2$ tests are carried out for $m = 16$ on the observed sets $\{a, b, c\}$ classified by the events $A_i B_j$, and likewise for $m = 14$.

[Table: $A_i B_j$ for $m = 16$; rows $A_0$, $A_1$, $A_2$; columns $B_0$, $B_1$, $B_2$.]
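Under $H_0$, the $\chi^2$ statistic is examined for the diagonal cells $A_i B_i$ ($i = 0, 1, 2$) and for the cells $A_1 B_0$, $A_0 B_0$ and $A_2 B_1$, with $v = 8$. For the tables of $A_i B_j$ ($i, j = 0, 1, 2$), $\chi^2 = 40.515$ with $v = 16$.

[Table: $(R, D)$ for $m = 16$; cells indexed by $R = 2, 3, \dots, 9$ and $D = 1, 2, 3, 4$.]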
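For the tests themselves, a minimal Pearson chi-square sketch might look as follows. It assumes fully specified cell probabilities under $H_0$ for the goodness-of-fit statistic ($v$ = number of cells minus 1, e.g. $9 - 1 = 8$ for $A_i B_j$) and row-by-column expected counts for a homogeneity table ($v = (r - 1)(c - 1)$). Reading $v = 16$ as $(3 - 1)(9 - 1)$, three racetracks against nine cells $A_i B_j$, is an assumption that the text does not confirm. The tail probability uses a closed form valid only for even $v$ (covering $v = 4, 8, 16$); for general $v$, a library routine such as `scipy.stats.chi2.sf` would be the usual choice. All function names are hypothetical.

```python
import math

def chi_square_gof(observed, probs):
    """Pearson statistic for observed counts against cell probabilities under H_0.

    Degrees of freedom: number of cells minus 1.
    """
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def chi_square_contingency(table):
    """Pearson statistic for homogeneity of the rows of an r x c table.

    Expected count of a cell: row_total * col_total / grand_total;
    degrees of freedom: (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            stat += (o - e) ** 2 / e
    return stat

def chi2_sf_even(x, v):
    """Upper tail P(chi^2_v > x) for even v: exp(-x/2) * sum_{k < v/2} (x/2)^k / k!."""
    if v <= 0 or v % 2:
        raise ValueError("v must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, v // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total
```

As a sanity check, `chi2_sf_even(15.507, 8)` is close to 0.05, the familiar 5% point for $v = 8$.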
[Table, continued: $R = 10, 11, 12, 13$ and $R \ge 14$; $D = 1, 2, 3, 4$ and $D \ge 5$.]

For the distribution of $D$ derived from the $(R, D)$ table, the classes $D \ge 5$ are pooled, and the $\chi^2$ test has $v = 4$.
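When the distribution of $D$ is reduced to the five classes $D = 1, 2, 3, 4$ and $D \ge 5$, the expected cell probabilities for $m = 16$ follow from $P(D = d) = 6(m - 2d)^2 / (m(m-1)(m-2))$, and five classes give $v = 5 - 1 = 4$. The sketch assumes this pooling, as suggested by the $D \ge 5$ column; `pooled_gap_probs` is a hypothetical helper.

```python
from fractions import Fraction

def pooled_gap_probs(m, last=5):
    """P(D = 1), ..., P(D = last - 1) and the pooled tail P(D >= last) under H_0."""
    denom = m * (m - 1) * (m - 2)
    cells = [Fraction(6 * (m - 2 * d) ** 2, denom) for d in range(1, last)]
    d_max = (m - 1) // 2                      # largest possible adjacent interval
    tail = sum(Fraction(6 * (m - 2 * d) ** 2, denom) for d in range(last, d_max + 1))
    return cells + [tail]

probs = pooled_gap_probs(16)
print([str(p) for p in probs])   # ['7/20', '9/35', '5/28', '4/35', '1/10']
```

Multiplying these probabilities by the number of observed races gives the expected frequencies against which the observed counts of $D$ would be compared.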
$H_0$, $D$, $R$ (v=1), $\chi^2$: R 5, R 8, $R = 7$, R 9, R 10; $\chi^2 = 11.0400$. $H_0$: R=, R 4, R; $\chi^2$: R 4, R 5, R; $\chi^2$ =.894, $v = 4$; $\chi^2$.
$m = 16$: $R$, $D$, $A_i B_j$. $m = 14$: $\chi^2$ for $A_i B_j$; $A_0 B_0$, $A_1 B_0$, $A_0 B_1$; $v = 7$; $H_0$; A B. $m = 16$: D=, D=, D; $\chi^2 = 6.05$; $H_0$; $\chi^2$; $H_0$: R 4, R 5 6, R 9 10, R 11, R 1; $\chi^2 = 9.8958$; $H_0$: $R = 4$, 5, v=; $\chi^2 = 1.9668$; $H_0$. $m = 16$: $\chi^2 = 9.74$; $H_0$; 1 m 15 m.
The Analysis of Contingency Tables