ryamasi@hgc.jp
BLAST, Genome browser, InterProScan, PSORT, DBTSS, Seqlogo, JASPAR, Melina II, Panther, Babelomics, +α
>cdna_test
CCCCTGCCCTCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTCCAC
ACCCCCGCCCGGCACCCGCGTCCGCGCCATGGCCATCTACAAGCAGTCACAGCACATGACGGAGGTTGTG
AGGCGCTGCCCCCACCATGAGCGCTGCTCAGATAGCGATGGTCTGGCCCCTCCTCAGCATCTTATCCGAG
TGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAGAAACACTTTTCGACATAGTGTGGTGGTGCCCTA
TGAGCCGCCTGAGGTTGGCTCTGACTGTACCACCATCCACTACAACTACATGTGTAACAGTTCCTGCATG
GGCGGCATGAACCGGAGGCCCATCCTCACCATCATCACACTGGAAGACTCCAGTGGTAATCTACTGGGAC
GGAACAGCTTTGAGGTGCGTGTTTGTGCCTGTCCTGGGAGAGACCGGCGCACAGAGGAAGAGAATCTCCG
CAAGAAAGGGGAGCCTCACCACGAGCTGCCCCCAGGGAGCACTAAGCGAGCACTGCCCAACAACACCAGC
TCCTCTCCCCAGCCAAAGAAGAAACCACTGGATGGAGAATATTTCACCCTTCAGATCCGTGGGCGTGAGC
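The query above is a FASTA record: a header starting with ">" followed by sequence lines. As a minimal sketch of how such a record can be read before submitting it to a tool like BLAST, the following Python joins the sequence lines under each header. The function name `parse_fasta` is hypothetical, and only the first two sequence lines of cdna_test are used as sample input.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record name is the first word after ">"
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Sample input: the header and the first two sequence lines of cdna_test only
example = """>cdna_test
CCCCTGCCCTCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTCCAC
ACCCCCGCCCGGCACCCGCGTCCGCGCCATGGCCATCTACAAGCAGTCACAGCACATGACGGAGGTTGTG
"""

seqs = parse_fasta(example)
print(len(seqs["cdna_test"]))  # number of bases read for this record
```

Sequence lines are concatenated without separators, so a record wrapped at 70 columns and one written on a single line give the same result.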
NCBI (http://www.ncbi.nlm.nih.gov/)
UniGene EST cDNA GEO Gene Entrez Gene Structure Map viewer
http://www.geneontology.org/
1. AAAAAAAA 2.
NCBI: http://www.ncbi.nlm.nih.gov/genomes/
Ensembl: http://www.ensembl.org/index.html
UCSC Genome browser: http://genome.ucsc.edu/
http://genome.ucsc.edu/
Genome browser: download
26250
1. EST 2.
http://www.ebi.ac.uk/tools/interproscan/
http://psort.ims.u-tokyo.ac.jp/
WoLF PSORT
cDNA CDS
[Figure labels: genome; mRNA (full) AAAA; GenBank, RefSeq cDNA (TTTT); cDNA 5' (TTTT)]
Oligo-capping AP- 5 http://dbtss.hgc.jp
[Table: DBTSS release history, Ver. 1 to Ver. 5, for human and mouse (columns: Genome, #clones, #mapped, #NM, #LocusLink); cell values are not recoverable from the extracted text]
Ver. 5: 19753 / 22682 (87.1%), 14746 / 17213 (85.7%)
Ver. 6 (2007 Sep): SOLEXA 9
Case 1: 5' UTR / ORF / ignored
Case 2: TSS with max number of clones / Median locus of TSS / ignored
DBTSS ver. 5
WebLogo: http://weblogo.berkeley.edu/
SEQLOGO: http://www.bioinf.ebc.ee/ep/ep/
WebLogo
Ribosomal protein mRNA: TSS -10 to +10 (45)
Ribosomal protein (45) (880
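A sequence logo such as the one drawn around the TSS (-10 to +10) scales each column by its information content. As a rough sketch of that calculation, the Python below computes 2 - H bits per column for aligned DNA, where H is the Shannon entropy of the base frequencies. It omits the small-sample correction that logo tools typically apply, and the four toy sequences are hypothetical, not taken from DBTSS.

```python
import math
from collections import Counter


def information_content(aligned):
    """Return per-column information (bits) for equal-length DNA sequences.

    Each value is 2 - H, where H is the Shannon entropy of the column.
    No small-sample correction is applied (a simplifying assumption).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        bits.append(2.0 - entropy)
    return bits


# Hypothetical aligned sequences for illustration only
toy = ["TCTT", "TCAT", "TCGT", "TCCT"]
ic = information_content(toy)
for position, value in enumerate(ic, start=1):
    print(position, round(value, 3))
```

A fully conserved column reaches the 2-bit maximum, while a column with all four bases at equal frequency carries no information and would be drawn with zero height.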
http://microrna.sanger.ac.uk/
pre-mature
mature
TRANSFAC Public (DBTSS link): http://www.biobase.de/
JASPAR: http://jaspar.genereg.net
http://fantom.gsc.riken.jp/4/
MEME, Gibbs, CONSENSUS.
MEME
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name  Start  P-value              Site
-------------  -----  ---------            ---------------
SEQ8;            172  9.57e-10  CCCGGAGTAT CTCAATCGTAGATGA ATACCACTTT
SEQ3;            112  9.57e-10  GTTATATTGG CTCAATCGTAGATGA AACCAGACTC
SEQ5;            185  1.96e-09  ACGGGCAAGC CTCAATCGTAGAGGA T
SEQ6;            105  2.82e-09  GTCAGCCGGT CTCAATCGTAGATCA GAGGCGAGAA
SEQ4;            173  4.67e-09  GTTCGAGAGC CTCAATCGTAGATAA CCTCTCTGGC
SEQ2;            172  4.67e-09  AAGCGTCGTG CTCAATCGTAGATAA CAGAGGTCGG
SEQ10;             3  7.52e-09          TT CTCAATCGTAGAGTA TGCTTAGAGG
SEQ9;             93  7.52e-09  CGCCTAGAAA CTCAATCGTAGAGTA TCACGCACCG
SEQ1;             52  9.33e-09  CTTTACTCGG CTCAATCGTAGAGGC GGTGCCGCGA
SEQ7;            177  1.95e-08  AAGTCTTTGA CTCAATCGTAGACCC AACACTTGA
--------------------------------------------------------------------------------

Gibbs
MOTIF A
 1-1   53  tttactcggc TCAATCGTAG aggcggtgcc   62
 2-1  173  agcgtcgtgc TCAATCGTAG ataacagagg  182
 3-1  113  ttatattggc TCAATCGTAG atgaaaccag  122
 4-1  174  ttcgagagcc TCAATCGTAG ataacctctc  183
 5-1  186  cgggcaagcc TCAATCGTAG aggat       195
 6-1  106  tcagccggtc TCAATCGTAG atcagaggcg  115
 7-1  178  agtctttgac TCAATCGTAG acccaacact  187
 8-1  173  ccggagtatc TCAATCGTAG atgaatacca  182
 9-1   94  gcctagaaac TCAATCGTAG agtatcacgc  103
10-1    4         ttc TCAATCGTAG agtatgctta   13
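The sites reported above can be summarized as a count matrix, which is the usual starting point for a sequence logo or a JASPAR-style profile. The sketch below takes the ten 15-mer sites from the MEME listing, counts bases per column, and derives a simple majority consensus. The helper names `count_matrix` and `consensus` are hypothetical, and majority voting is only one simple way to collapse a matrix into a string; it is not what MEME or Gibbs compute internally.

```python
# The ten 15-mer sites from the MEME "Motif 1" listing
sites = [
    "CTCAATCGTAGATGA",  # SEQ8
    "CTCAATCGTAGATGA",  # SEQ3
    "CTCAATCGTAGAGGA",  # SEQ5
    "CTCAATCGTAGATCA",  # SEQ6
    "CTCAATCGTAGATAA",  # SEQ4
    "CTCAATCGTAGATAA",  # SEQ2
    "CTCAATCGTAGAGTA",  # SEQ10
    "CTCAATCGTAGAGTA",  # SEQ9
    "CTCAATCGTAGAGGC",  # SEQ1
    "CTCAATCGTAGACCC",  # SEQ7
]


def count_matrix(seqs):
    """Return one {base: count} dict per alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sites must have the same length")
    return [
        {base: sum(s[i] == base for s in seqs) for base in "ACGT"}
        for i in range(length)
    ]


def consensus(matrix):
    """Pick the most frequent base per column (ties go to A < C < G < T)."""
    return "".join(max("ACGT", key=column.get) for column in matrix)


matrix = count_matrix(sites)
motif = consensus(matrix)
print(motif)
for position, column in enumerate(matrix, start=1):
    print(position, column)
```

Only the last three columns vary among these sites. Positions 2 to 11 of the consensus correspond to the 10-mer TCAATCGTAG reported in the Gibbs listing.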
1. FASTA 2. 3. submit
1. 3. 2. sequence logo
1. 3. 2. sequence logo 4.
microarray,? GO regulation
http://www.pantherdb.org/
http://www.babelomics.org/
454: 500 bp × 1,000,000 reads
SOLiD, Solexa: 25 to 50 bp (70 bp) × 100,000,000 or more reads
1 run: 1T
mapping, assemble, Web
Mapping: Maq, SOAP, Bowtie, TopHat
Assemble: Velvet, GSassembly
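The read length and read count figures above multiply out to the raw bases produced per run. The short calculation below makes that comparison explicit. It uses only the numbers quoted on the slide, treats "100,000,000~" as exactly 100,000,000 reads (a lower bound), and does not attempt to model the "1T" figure, whose unit is not stated. The function name `bases_per_run` is hypothetical.

```python
def bases_per_run(read_length_bp, reads):
    """Raw bases per run = read length (bp) x number of reads."""
    return read_length_bp * reads


# Figures as quoted on the slide; the short-read count is a lower bound
runs = {
    "454 (500 bp)": bases_per_run(500, 1_000_000),
    "SOLiD/Solexa (25 bp)": bases_per_run(25, 100_000_000),
    "SOLiD/Solexa (50 bp)": bases_per_run(50, 100_000_000),
}

for platform, bases in runs.items():
    print(f"{platform}: {bases / 1e9:.1f} Gbp")
```

Under these assumptions the short-read platforms yield 5 to 10 times more raw bases per run than 454, despite reads that are 10 to 20 times shorter.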
References: Database issue, Web server issue
UNIX, R (http://www.r-project.org/), Perl, Ruby, Python, C++, C
refgene.txt
cut -f 3 refgene.txt | sort | uniq -c
Mac OSX
2. refgene.txt
3. cd ~/Desktop
4. cut -f 3 refgene.txt | sort | uniq -c
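The cut, sort, and uniq -c command above counts how often each value occurs in the third tab-separated column of refgene.txt. For readers who prefer a scripting language, here is a rough Python equivalent. The function name `count_column` and the sample rows are hypothetical. Unlike cut, this sketch skips lines that have fewer than three fields.

```python
from collections import Counter


def count_column(lines, column=3, sep="\t"):
    """Count values in a 1-based column of delimited text lines.

    Lines with fewer than `column` fields are skipped (cut behaves
    differently for such lines).
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts


# Hypothetical rows standing in for refgene.txt
sample = [
    "585\tNM_0001\tchr1\t+\n",
    "586\tNM_0002\tchr2\t-\n",
    "587\tNM_0003\tchr1\t+\n",
]

counts = count_column(sample)
for value in sorted(counts):  # sorted keys, like sort | uniq -c
    print(counts[value], value)

# To run on the real file instead (assumes refgene.txt is in the
# current directory):
# with open("refgene.txt") as handle:
#     counts = count_column(handle)
```

Passing an open file handle works because the function only iterates over lines, so the whole file never has to be held in memory.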
https://supcom.hgc.jp/japanese/
2T
http://www.hgc.jp/~ryamasi/others ryamasi@hgc.jp