II 3ADT2206 19 3 1 1
Contents
1
2
  2.1 DNA
  2.2 RNA
  2.3
  2.4
  2.5
3
  3.1
    3.1.1 SAMBA
  3.2
    3.2.1
    3.2.2
    3.2.3 k-tuple
  3.3
    3.3.1 FASTA
    3.3.2 BLAST
  3.4
    3.4.1
    3.4.2 ClustalW
1
DNA DNA DNA DNA
2.1 DNA
DNA: DeoxyriboNucleic Acid. DNA RNA DNA RNA DNA (Figure 1) 2 (Bioinformatics) Biology, Information science
Figure 1: DNA
4 (Figures 2, 3, 4, 5) Adenine, Cytosine, Guanine, Thymine
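Figure 2: Adenine (structural formula)
Figure 3: Cytosine (structural formula)
Figure 4: Guanine (structural formula)
Figure 5: Thymine (structural formula)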
4 A (Adenine), C (Cytosine), G (Guanine), T (Thymine) A T G C mRNA DNA DNA (cDNA; complementary DNA) DNA bp (base pair) kbp (1,000 bp) Mbp (1,000 kbp) Gbp (1,000 Mbp) 4
Figure 6:
(Nomenclature Committee of the International Union of Biochemistry) 1 3 mRNA (messenger RNA) tRNA (transfer RNA)
2.2 RNA
RNA: RiboNucleic Acid. 4 4 Adenine, Cytosine, Guanine, Uracil DNA DNA DNA A G C U (Uracil) Figure 6 rRNA (ribosomal RNA) RNA siRNA (small interfering RNA) miRNA (micro RNA) mRNA RNA tRNA RNA rRNA RNA RNA mRNA
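Code  Bases          Meaning
G     G              Guanine
A     A              Adenine
T     T              Thymine
C     C              Cytosine
R     A or G         purine
Y     C or T         pyrimidine
M     A or C         amino
K     G or T         keto
S     G or C         strong interaction (3 hydrogen bonds)
W     A or T         weak interaction (2 hydrogen bonds)
H     A, C or T      not G (H follows G in the alphabet)
B     C, G or T      not A (B follows A)
V     A, C or G      not T or U (U in RNA; V follows U)
D     A, G or T      not C (D follows C)
N     A, C, G or T   any base
Table 1: Base codes

2.3
mRNA ( ) 2.3 3 1 1 1 3

2.4
DNA RNA 15 30 April 14, 2003 DNA George I. Bell 1974 (Los Alamos National Laboratory) (Theoretical Biology and Biophysics Group) GenBank Table 4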
1st    2nd base                             3rd
base   U             C     A      G         base
U      Phe           Ser   Tyr    Cys       U
       Phe           Ser   Tyr    Cys       C
       Leu           Ser   STOP   STOP      A
       Leu           Ser   STOP   Trp       G
C      Leu           Pro   His    Arg       U
       Leu           Pro   His    Arg       C
       Leu           Pro   Gln    Arg       A
       Leu           Pro   Gln    Arg       G
A      Ile           Thr   Asn    Ser       U
       Ile           Thr   Asn    Ser       C
       Ile           Thr   Lys    Arg       A
       Met (START)   Thr   Lys    Arg       G
G      Val           Ala   Asp    Gly       U
       Val           Ala   Asp    Gly       C
       Val           Ala   Glu    Gly       A
       Val           Ala   Glu    Gly       G
Table 2: Genetic code

1974  George I. Bell
1979  Goad, GenBank
1980  (EMBL)
1982
1992  GenBank
1984  DNA (DDBJ)
1989  AceDB
GenBank, EMBL, DDBJ
Table 4: DNA
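1-letter  3-letter  Amino acid
A         Ala       Alanine
C         Cys       Cysteine
D         Asp       Aspartic acid
E         Glu       Glutamic acid
F         Phe       Phenylalanine
G         Gly       Glycine
H         His       Histidine
I         Ile       Isoleucine
K         Lys       Lysine
L         Leu       Leucine
M         Met       Methionine
N         Asn       Asparagine
P         Pro       Proline
Q         Gln       Glutamine
R         Arg       Arginine
S         Ser       Serine
T         Thr       Threonine
V         Val       Valine
W         Trp       Tryptophan
X         Xxx       Unknown
Y         Tyr       Tyrosine
Z         Glx       Glutamic acid or glutamine
Table 3: Amino acid codes

DNA
2.5
Fred Blattner, Hamilton Smith Table 5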
Year  Strain  Genome size [kb]
1995  KW20    1,830
1997  K12     4,639
1997  26695   4,214
1997  S288C   12,069
1998          97,000
2000          137,000
2000          115,428
2001  O157    4,100
2001          2,840
2001  CT18    4,809
2001          3,100,000
2002          14,000
2002          2,700,000
2005          38,000
Table 5:

3
DNA (A, G, C, T) DNA DNA 1977 Maxam Gilbert Sanger

3.1
Equation (1):
\[
H(i, j) = \max
\begin{cases}
0 \\
H(i-1, j-1) + d(x_i, y_j) \\
H(i-1, j) - g \\
H(i, j-1) - g
\end{cases}
\tag{1}
\]
\[ H(i, 0) = H(0, j) = 0 \]
d(x_i, y_j): score for aligning x_i with y_j
g: gap penalty
1 x_i y_j 2 3
\[ X = (x_1, x_2, \ldots, x_i, \ldots, x_n) \tag{2} \]
\[ Y = (y_1, y_2, \ldots, y_j, \ldots, y_m) \tag{3} \]
Table 6 H(i, j) x_i y_j
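X \ Y      y_1      y_2      ...  y_j      ...  y_m
           A        T        ...           ...  G
x_1   G    H(1,1)   H(1,2)   ...  H(1,j)   ...  H(1,m)
x_2   T    H(2,1)   H(2,2)   ...  H(2,j)   ...  H(2,m)
...
x_i        H(i,1)   H(i,2)   ...  H(i,j)   ...  H(i,m)
...
x_n   A    H(n,1)   H(n,2)   ...  H(n,j)   ...  H(n,m)
Table 6: H(i, j)

Figure 7 shows the matrix for ATCGCGAG and GCTCAGCGTGAT.

         A   T   C   G   C   G   A   G
     0   0   0   0   0   0   0   0   0
 G   0   0   0   0   2   1   2   1   2
 C   0   0   0   2   1   4   3   2   1
 T   0   0   2   1   1   3   3   2   1
 C   0   0   1   4   3   3   2   2   1
 A   0   2   1   3   3   2   2   4   3
 G   0   1   1   2   5   4   4   3   6
 C   0   0   0   3   4   7   6   5   5
 G   0   0   0   2   5   6   9   8   7
 T   0   0   2   1   4   5   8   8   7
 G   0   0   1   1   3   4   7   7  10
 A   0   2   1   0   2   3   6   9   9
 T   0   1   4   3   2   2   5   8   8
Figure 7:

The two parameters g and d(x_i, y_j) are set as follows:
\[ g = 1, \qquad d(x_i, y_j) = \begin{cases} +2, & x_i = y_j \\ -1, & \text{otherwise} \end{cases} \]
Tracing back from the maximum value in Figure 7 (10) gives the alignment of the two sequences shown in Figure 8.

T C - G C G A G
T C A G C G T G
Figure 8:

To compute one cell H(i, j), the values H(i-1, j-1), H(i-1, j), and H(i, j-1) are required (Figure 9). With one processor P_{i,j} assigned to each cell, the computation proceeds as shown in Figures 10, 11, 12, and 13, and processor P_{i,j} operates at the time t given by equation (4).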
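The recurrence of equation (1) and the worked example of Figures 7 and 8 can be sketched in Python. This is a minimal illustration of a Smith-Waterman style local alignment, not the implementation used in this work. It assumes the scores stated for the example (match +2, mismatch -1, gap penalty 1) and takes the two example sequences to be ATCGCGAG and GCTCAGCGTGAT; the function name `smith_waterman` and the tie-breaking order in the traceback are hypothetical choices made for the sketch.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=1):
    """Fill H(i, j) as in equation (1) and trace back from the maximum cell.

    Returns (best score, aligned part of x, aligned part of y).
    The scoring values are assumptions taken from the worked example.
    """
    n, m = len(x), len(y)
    # Row 0 and column 0 stay at 0: H(i, 0) = H(0, j) = 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + d,   # align x_i with y_j
                          H[i - 1][j] - gap,     # gap in y
                          H[i][j - 1] - gap)     # gap in x
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: follow the move that produced each cell until a 0 is reached.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        d = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + d:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = smith_waterman("ATCGCGAG", "GCTCAGCGTGAT")
print(score)  # 10
print(ax)     # TC-GCGAG
print(ay)     # TCAGCGTG
```

The sketch indexes rows by X and columns by Y, as in equation (1), so its matrix is the transpose of the layout drawn in Figure 7; the maximum value and the recovered alignment are unaffected by that choice.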
Figure 9: (diagram: processor P(i, j) holding H(i, j), surrounded by its neighbours P(i-1, j-1), P(i-1, j), P(i, j-1), P(i, j+1), P(i+1, j), and P(i+1, j+1))
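The dependency pattern of Figure 9 means that all cells on one anti-diagonal of the matrix can be computed at the same time, since each needs only values from the two preceding anti-diagonals. The Python sketch below groups the cells into such steps and compares the number of steps with the number of cells. It is an illustration under stated assumptions: the step index is simply i + j (the exact time origin used in equation (4) may differ by a constant), one processor per cell is assumed, and the names `wavefront_schedule` and `speedup` are hypothetical.

```python
def wavefront_schedule(n, m):
    """Group cells (i, j), 1 <= i <= n, 1 <= j <= m, by anti-diagonal i + j.

    Cells in the same group do not depend on each other, so they could be
    evaluated in parallel (assumption: one processor per cell).
    """
    steps = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            steps.setdefault(i + j, []).append((i, j))
    return [steps[k] for k in sorted(steps)]


def speedup(n, m):
    """Ratio of sequential cell updates (n * m) to parallel steps (n + m - 1)."""
    return (n * m) / (n + m - 1)


schedule = wavefront_schedule(8, 12)
print(len(schedule))             # 19 parallel steps for 96 cells
print(round(speedup(8, 12), 2))  # 5.05
```

When one sequence is much longer than the other, the ratio approaches the length of the shorter sequence, which is why a linear array with one processor per query residue is an attractive design.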
Figure 10: No. 1
Figure 11: No. 2
Figure 12: No. 3
Figure 13: No. 4

\[ t = i + j + 1 \tag{4} \]
t = 0 1 1
The speed-up Sp is given by equation (5).
\[ S_p = \frac{l_g \times l_b}{l_g + (l_b - 1)} \simeq l_g \tag{5} \]
l_g:
l_b:

3.1.1 SAMBA
SAMBA: Systolic Accelerator for Molecular Biological Applications. Figure 14 SAMBA SAMBA 1995 SAMBA Web SAMBA BLAST FASTA SSEARCH BISP BioSCAN
LSI DB FPGA
Figure 14: SAMBA
KESTREL RAPID-2 FPGA SPLASH-2 PeRLe-1 FPGA SAMBA SAMBA 1 FPGA SAMBA

3.2
3 k-tuple 2
/ / 1 Needleman Wunsch Smith Waterman 2 Dayhoff PAM250 DNA 2 3 2 k-tuple FASTA BLAST k-tuple 2 FASTA BLAST

3.2.1
1970 A.J. Gibbs, G.A. McIntyre 2
1.
2.
3.
Figure 15: ATTGCGT, GCTTAGCGTGAT
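A dot matrix comparison of this kind can be sketched in a few lines of Python: one sequence is written along the top, the other down the side, and a mark is placed wherever the two residues are identical. The sketch below is an illustration only. It takes the two example sequences to be ATTGCGT and GCTTAGCGTGAT (an assumption), uses no window filtering, and the function names are hypothetical.

```python
def dot_matrix(x, y):
    """Boolean grid with one row per residue of y and one column per residue of x."""
    return [[xc == yc for xc in x] for yc in y]


def render(x, y):
    """Text rendering: '*' marks identical residues, '.' everything else."""
    lines = ["  " + " ".join(x)]
    for yc, row in zip(y, dot_matrix(x, y)):
        lines.append(yc + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


def longest_diagonal_run(x, y):
    """Length of the longest unbroken diagonal of dots (longest common substring)."""
    best = 0
    run = [[0] * (len(x) + 1) for _ in range(len(y) + 1)]
    for i in range(1, len(y) + 1):
        for j in range(1, len(x) + 1):
            if y[i - 1] == x[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                best = max(best, run[i][j])
    return best


print(render("ATTGCGT", "GCTTAGCGTGAT"))
print(longest_diagonal_run("ATTGCGT", "GCTTAGCGTGAT"))  # 4, the shared stretch GCGT
```

Diagonal runs of marks correspond to stretches that the two sequences share, which is what the eye picks out in the plot; isolated marks are chance identities.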
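   A  T  T  G  C  G  T
G  .  .  .  *  .  *  .
C  .  .  .  .  *  .  .
T  .  *  *  .  .  .  *
T  .  *  *  .  .  .  *
A  *  .  .  .  .  .  .
G  .  .  .  *  .  *  .
C  .  .  .  .  *  .  .
G  .  .  .  *  .  *  .
T  .  *  *  .  .  .  *
G  .  .  .  *  .  *  .
A  *  .  .  .  .  .  .
T  .  *  *  .  .  .  *
Figure 15:

Figure 15 GCGT Figure 15 2 1 RNA Gibbs McIntyre 2 2 DNA DNA 20 DNA 4 DNA 15 10 10 15 2 3 2 20 5 2 3
1.
2. PAM250 BLOSUM62
3.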
2.
3.

3.2.2
?? (Figure 16 2 )
A A T G G A
B A G A G
Figure 16:

Equation (6):
\[ g(L) = \alpha + \beta (L - 1) \tag{6} \]
In equation (6), L: , α: , β:
α β α 1/10 α β Dayhoff / 300 10 88 Needleman Wunsch 2 Needleman-Wunsch 1 1 2 1 0 ( 0 ) GCTTAGCGTGAT ATTGCGT 2 Figure 17 Figure 17 Figure 17 Figure 18 Figure 18 GCTTAGCGTGAT ATTGCGT Figure 19
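   A  T  T  G  C  G  T
G  .  .  .  1  .  1  .
C  .  .  .  .  1  .  .
T  .  1  1  .  .  .  1
T  .  1  1  .  .  .  1
A  1  .  .  .  .  .  .
G  .  .  .  1  .  1  .
C  .  .  .  .  1  .  .
G  .  .  .  1  .  1  .
T  .  1  1  .  .  .  1
G  .  .  .  1  .  1  .
A  1  .  .  .  .  .  .
T  .  1  1  .  .  .  1
Figure 17: Needleman-Wunsch

   A  T  T  G  C  G  T
G  .  .  .  1  .  1  .
C  .  .  .  .  2  .  .
T  .  1  1  .  .  .  3
T  .  1  2  .  .  .  1
A  1  .  .  .  .  .  .
G  .  .  .  3  .  1  .
C  .  .  .  .  4  .  .
G  .  .  .  1  .  5  .
T  .  1  1  .  .  .  6
G  .  .  .  2  .  1  .
A  1  .  .  .  .  .  .
T  .  2  1  .  .  .  2
Figure 18: Needleman-Wunsch ( )

T T - G C G T
T T A G C G T
Figure 19: Needleman-Wunsch

Figure 18 6 Figure 18 2 3 TT-GCGT ( ) 1 Figure 18 6 2 ( ) Smith Waterman DNA Smith Waterman Needleman-Wunsch ( ) Smith-Waterman 2 2 1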
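For comparison with the figures above, a global alignment score in the Needleman-Wunsch style can be computed with a standard dynamic programming sketch in Python. This is not necessarily the exact variant drawn in Figures 17 to 19: it uses the common textbook recurrence, and it assumes a match score of 1, a mismatch score of 0, and no gap penalty, with the example sequences taken as ATTGCGT and GCTTAGCGTGAT. The helper `gap_penalty` shows one reading of equation (6), with α as the cost of opening a gap and β as the cost of each further position; all names and parameter values are hypothetical.

```python
def gap_penalty(length, alpha, beta):
    """Gap cost in the form of equation (6): alpha + beta * (length - 1)."""
    return alpha + beta * (length - 1)


def needleman_wunsch(x, y, match=1, mismatch=0, gap=0):
    """Global alignment score with a constant per-position gap cost.

    Textbook recurrence; the default scores are assumptions for this sketch.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps are charged along the first row and column.
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align x_i with y_j
                          F[i - 1][j] - gap,    # gap in y
                          F[i][j - 1] - gap)    # gap in x
    return F[n][m]


print(needleman_wunsch("ATTGCGT", "GCTTAGCGTGAT"))  # 6
print(gap_penalty(3, alpha=10, beta=1))             # 12
```

With these scores the result counts the largest number of residues that can be matched in order, six here, corresponding to the residues T, T, G, C, G, T.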
3.2.3 k-tuple
k-tuple ( k-tuple ) 2

3.3
?? Needleman-Wunsch Smith-Waterman DNA W. Pearson, D. Lipman 1988 FASTA FASTA ( ) 2 BLAST 1990 S. Altschul BLAST D NCBI BLAST FASTA 2 BLAST DNA BLAST GAPPED-BLAST BLAST 3 BLAST PSI-BLAST (Position-Specific-Iterated BLAST)

3.3.1 FASTA
FASTA DNA 2 k-tuple k FASTA DNA
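The k-tuple idea can be illustrated with a short Python sketch: every word of length k in one sequence is stored in a lookup table with its positions, the other sequence is scanned word by word, and each shared word is credited to the diagonal (offset) on which it lies. This is a simplified illustration and not the FASTA program itself. The sequences ATTGCGT and GCTTAGCGTGAT, the value k = 2, and the function names are assumptions chosen for the example.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map every k-tuple (word of length k) in seq to its start positions."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def diagonal_hits(query, target, k):
    """Count shared k-tuples per diagonal, keyed by target_pos - query_pos."""
    index = ktuple_index(target, k)
    hits = defaultdict(int)
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            hits[t - q] += 1
    return dict(hits)


hits = diagonal_hits("ATTGCGT", "GCTTAGCGTGAT", k=2)
best = max(hits, key=hits.get)
print(best, hits[best])  # 2 3
```

The diagonal with the most shared words (offset 2, holding GC, CG, and GT) is the region a heuristic search would then examine in detail, instead of filling the whole dynamic programming matrix.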
FASTA Figure 20 NCBI BLAST (Figure 20: http://helix.genes.nig.ac.jp/homology/fasta.shtml) FASTA k-tuple 2 2 2 FASTA k-tuple 1 ( 20 400 ) 2 4 6 k-tuple FASTA FASTA3 FASTA FASTA3 (FASTA version 34) FASTA FASTA ( ; E) E (Smith-Waterman SSEARCH ) u λ FASTA FASTA
1.
2.
3.
4. (z )
5.
6. 1. 5.
7. z ( equation (7)
Figure 20: DDBJ Search and Analysis FASTA
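Z ) z ( equation (7) Z )
\[ P(Z > z) = 1 - \exp\left(-e^{-1.2825 z - 0.5772}\right) \tag{7} \]
In a database of D sequences, the probability that no score exceeds z is e^{-DP}, and the probability that at least one score exceeds z is E = 1 - e^{-DP}. For P < 0.1, E = DP, which gives equation (8).
\[ E(Z > z) = D \times P(Z > z) \tag{8} \]
8. z' = 50 + 10z
9.

3.3.2 BLAST
BLAST (Basic Local Alignment Search Tool) FASTA BLAST NCBI Figure 21 NCBI Web BLAST (Figure 21: http://www.ncbi.nlm.nih.gov/genomes/prokhits.cgi) BLAST NCBI BLAST BLAST WU-BLAST WU-BLAST NCBI-BLAST NCBI BLAST BLAST FASTA k-tuple FASTA BLAST BLOSUM62 BLAST 3 11 ( 6 3 ) FASTA FASTA BLAST
1.
2. 3
3. BLOSUM62 2 3
4. (T )
5. 3 4 3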
Figure 21: NCBI P BLAST
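6. 3
7. 2 (seed)
8. NCBI BLAST2 gapped BLAST BLAST T
9. HSP (high-scoring segment pair) S S HSP
10. HSP 2 HSP S x ρ equation (9)
\[ P(S \ge x) = 1 - \exp\left(-e^{-\lambda (x - u)}\right) \tag{9} \]
\[ u = \frac{\log(K m' n')}{\lambda} \]
K λ BLAST n m m'
\[ m' \approx m - \frac{\ln(Kmn)}{H} \tag{10} \]
\[ n' \approx n - \frac{\ln(Kmn)}{H} \tag{11} \]
2
\[ H = \frac{\ln(Kmn)}{l} \]
l m n l equations (10), (11) D S x E
\[ E \approx 1 - e^{-p(S > x) D} \tag{12} \]
In equation (12), for p < 0.1, E = pD. E D 2 E = 1 1
11. 2 HSP
12. Smith-Waterman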
HSP 1
13. E E

3.4
3 GCG PILEUP CLUSTALW T-COFFEE ( ) PSSM (Position-Specific Scoring Matrix) 2 DNA DNA RNA

3.4.1
2 2 2 3 DNA

3.4.2 ClustalW
1 ClustalW ClustalW
1.
2. Neighbor Joining
3.
4.
2 ClustalW 2 ClustalW
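As a closing illustration of the score statistics used in Sections 3.3.1 and 3.3.2, equations (7), (8), and (9) can be evaluated directly. The Python sketch below is a plain transcription of those formulas under stated assumptions: the function names are hypothetical, and the values used for D, K, λ, and the effective lengths in the example are invented for illustration and are not parameters of any particular FASTA or BLAST run.

```python
import math


def fasta_p_value(z):
    """Equation (7): P(Z > z) = 1 - exp(-exp(-1.2825 z - 0.5772))."""
    return 1.0 - math.exp(-math.exp(-1.2825 * z - 0.5772))


def expect_value(p, d):
    """Equation (8): expected number of hits in a database of d sequences."""
    return d * p


def blast_p_value(x, m_eff, n_eff, k_param, lam):
    """Equation (9) with u = ln(K m' n') / lambda.

    m_eff and n_eff stand for the effective lengths m' and n'.
    """
    u = math.log(k_param * m_eff * n_eff) / lam
    return 1.0 - math.exp(-math.exp(-lam * (x - u)))


# Illustrative values only (assumptions, not taken from the text).
p = fasta_p_value(5.0)
print(p < fasta_p_value(0.0))        # True: higher z scores are less likely by chance
print(expect_value(p, d=100000))

u = math.log(0.1 * 100 * 1000) / 0.3
print(round(blast_p_value(u, 100, 1000, k_param=0.1, lam=0.3), 3))  # 0.632
```

At x = u the probability in equation (9) equals 1 - 1/e, about 0.632, which is a convenient check that u acts as the location parameter of the extreme value distribution.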
[1] Dominique Lavenier: Speeding Up Genome Computations With a Systolic Accelerator
[2] Nova Ahmed, Yi Pan, Art Vandenberg: Parallel Algorithm for Multiple Genome Alignment on the Grid Environment
[3] Gong-Xin Yu, Al Geist, George Ostrouchov, Nagiza F. Samatova: An SVM-based Algorithm for Identification of Photosynthesis-specific Genome Features
[4] David W. Mount:
[5] Cynthia Gibas, Per Jambeck:
[6] :
[7] :
[8] , :
[9] , : DNA
[10] DDBJ Search and Analysis http://helix.genes.nig.ac.jp/
[11] NCBI http://www.ncbi.nlm.nih.gov/