Infobiologist
28 January 2003
FORTRAN
NEC PC-8001 (Z80): N-BASIC
NEC PC-9801/9821: N88-BASIC, MS-DOS
Macintosh, Windows: GUI only (dark period of programming)
Linux: GNU C, Perl, Tcl/Tk
Mac OS X (= Darwin or FreeBSD)
in vivo
Quantitative analysis of ¹³C-labeled compounds
BASIC
1. …
2. … (redundant …)
3. … (PC-8001, Brother typewriter, interface)
EST: Excel, BLAST, DBGET, Clustal W
SISEQ, 1995: Macintosh and Windows versions; CD
Macintosh: CodeWarrior
Windows: Visual C++
CodeWarrior on a Performa
Linux, 1996: Slackware from CD on a DOS/V PC
1997: Linux on a PCMCIA-equipped PC; Slackware CD
Linux: gcc; FASTA, GenBank
SISEQ: UNIX, Macintosh, Windows; SISEQ header; GenBank, FASTA
Seqfile

typedef struct {
    long int seqlen;
    int dtype;          /* data type: DNA, RNA, protein */
    int circular;
    char *ID;
    char AC[30];        /* accession */
    char OS[50];        /* organism species */
    char dbtype[30];
    char *header;
    char *seq;
} Seqfile;              /* Structure of description of a sequence. */
File format

/* Sequence file format */
enum filetype {
    RAW_SEQ,      /* 0 */
    FASTA_SEQ,    /* 1 */
    GENBANK_SEQ,  /* 2 */
    EMBL_SEQ,     /* 3 */
    SW_SEQ,       /* 4 */
    EMBL_UK,      /* 5 */
    PRF_SEQ,      /* 6 */
    CLUSTAL_SEQ,  /* 7 */
    PIR_SEQ,      /* 8 */
    PHYLIP_SEQ    /* 9 */
};
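The enum only names the formats; how SISEQ decides which one an input file is in is not described here. A minimal sketch, assuming detection from the first line of the file alone (guess_filetype is a hypothetical helper, not part of SISEQ): FASTA starts with '>', PIR/NBRF with a '>XX;' code, GenBank with LOCUS, EMBL and SwissProt with ID, Clustal alignments with CLUSTAL, and PHYLIP with two integers.

```c
#include <stdio.h>
#include <string.h>

/* Same numbering as the enum filetype on the slide. */
enum filetype { RAW_SEQ, FASTA_SEQ, GENBANK_SEQ, EMBL_SEQ, SW_SEQ,
                EMBL_UK, PRF_SEQ, CLUSTAL_SEQ, PIR_SEQ, PHYLIP_SEQ };

/* Hypothetical helper: guess the format from the first line of a file.
   Only some formats are covered; SW_SEQ, EMBL_UK and PRF_SEQ need more context. */
enum filetype guess_filetype(const char *line)
{
    int ntaxa, nchar;

    /* PIR/NBRF entries start with ">XX;" (e.g. ">P1;"), so test them before plain FASTA. */
    if (line[0] == '>' && strlen(line) >= 4 && line[3] == ';')
        return PIR_SEQ;
    if (line[0] == '>')
        return FASTA_SEQ;
    if (strncmp(line, "LOCUS", 5) == 0)
        return GENBANK_SEQ;
    /* EMBL and SwissProt entries both begin with "ID"; telling them apart needs later lines. */
    if (strncmp(line, "ID ", 3) == 0)
        return EMBL_SEQ;
    if (strncmp(line, "CLUSTAL", 7) == 0)
        return CLUSTAL_SEQ;
    /* A PHYLIP file begins with two integers: number of taxa and number of characters. */
    if (sscanf(line, "%d %d", &ntaxa, &nchar) == 2)
        return PHYLIP_SEQ;
    return RAW_SEQ;
}
```

Formats that share a first line (EMBL vs. SwissProt) would need a second look at later lines, which this sketch does not attempt.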
Exon_record

typedef struct {
    char AC[30];        /* Accession number of sequence segment */
    int nc;             /* normal (0) or complementary (1) strand */
    long begin;         /* beginning position */
    Boolean begin_ext;  /* actual beginning is located upstream, i.e., <begin */
    long end;           /* end position */
    Boolean end_ext;    /* actual end is located downstream, i.e., >end */
} Exon_record;          /* Structure of description of exons. */
/* Boolean is the Mac OS type (unsigned char); other platforms need an equivalent typedef. */
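To show how an Exon_record might be used, here is a hedged sketch of extracting the segment it describes from a sequence string. It assumes 1-based, inclusive begin/end positions (the GenBank feature-table convention) and that nc = 1 means the segment is read as the reverse complement; extract_exon and complement_base are hypothetical names, and begin_ext/end_ext are carried along without affecting extraction.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char Boolean;  /* stand-in for the Mac OS type */

typedef struct {
    char AC[30];
    int nc;            /* 0: normal strand, 1: complementary strand */
    long begin;        /* assumed 1-based, inclusive */
    Boolean begin_ext;
    long end;          /* assumed 1-based, inclusive */
    Boolean end_ext;
} Exon_record;

static char complement_base(char b)
{
    switch (b) {
    case 'A': return 'T'; case 'T': return 'A';
    case 'G': return 'C'; case 'C': return 'G';
    case 'a': return 't'; case 't': return 'a';
    case 'g': return 'c'; case 'c': return 'g';
    default:  return 'N';  /* ambiguity codes are not handled in this sketch */
    }
}

/* Returns a newly allocated copy of the segment (reverse-complemented when nc == 1),
   or NULL if the coordinates fall outside the sequence or begin > end. */
char *extract_exon(const char *seq, const Exon_record *ex)
{
    long seqlen = (long)strlen(seq);
    if (ex->begin < 1 || ex->end > seqlen || ex->begin > ex->end)
        return NULL;
    long n = ex->end - ex->begin + 1;
    char *out = malloc((size_t)n + 1);
    if (out == NULL)
        return NULL;
    for (long i = 0; i < n; i++) {
        if (ex->nc == 0)
            out[i] = seq[ex->begin - 1 + i];                  /* forward copy */
        else
            out[i] = complement_base(seq[ex->end - 1 - i]);   /* read backwards, complemented */
    }
    out[n] = '\0';
    return out;
}
```

Segments spanning the origin of a circular sequence (begin > end) are rejected here rather than joined.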
SISEQ (1): GenBank, EMBL, SwissProt (DNA); FASTA; Clustal *.aln; *.aln and PHYLIP *.phy
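Writing a Seqfile out as FASTA is the simplest of these conversions. A minimal sketch, assuming a header line of the form ">ID AC" (AC omitted when empty) and a caller-chosen line width; to_fasta is a hypothetical helper, not SISEQ's own output routine, and it measures the sequence with strlen rather than trusting seqlen.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long int seqlen;
    int dtype;
    int circular;
    char *ID;
    char AC[30];
    char OS[50];
    char dbtype[30];
    char *header;
    char *seq;
} Seqfile;

/* Render one Seqfile as a FASTA record wrapped at `width` columns.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *to_fasta(const Seqfile *s, int width)
{
    if (s == NULL || s->seq == NULL || width <= 0)
        return NULL;
    const char *id = s->ID ? s->ID : "";
    long len = (long)strlen(s->seq);
    long nlines = (len + width - 1) / width;
    size_t size = 1 + strlen(id) + 1 + strlen(s->AC) + 1   /* ">ID AC\n" */
                + (size_t)len + (size_t)nlines + 1;         /* wrapped sequence + NUL */
    char *out = malloc(size);
    if (out == NULL)
        return NULL;
    char *p = out;
    p += sprintf(p, ">%s", id);
    if (s->AC[0] != '\0')
        p += sprintf(p, " %s", s->AC);
    *p++ = '\n';
    for (long i = 0; i < len; i += width) {
        long n = (len - i < width) ? len - i : width;   /* last line may be short */
        memcpy(p, s->seq + i, (size_t)n);
        p += n;
        *p++ = '\n';
    }
    *p = '\0';
    return out;
}
```

The caller can then write the record with fputs(rec, fp) and free it.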
SISEQ (2): "siseq.cf" (Mac, Win); GC skew, codon usage, DNA, ORF; library libseq.a
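GC skew is conventionally defined as (G − C)/(G + C) and plotted in sliding windows along a genome. A minimal sketch of that calculation; gc_skew and gc_skew_profile are hypothetical names, not the libseq.a API, and windows do not wrap around circular sequences.

```c
#include <stddef.h>
#include <string.h>

/* GC skew of seq[start .. start+len-1]: (G - C) / (G + C); 0.0 if the window has no G or C.
   The caller keeps the window inside the sequence. */
double gc_skew(const char *seq, size_t start, size_t len)
{
    long g = 0, c = 0;
    for (size_t i = start; i < start + len && seq[i] != '\0'; i++) {
        if (seq[i] == 'G' || seq[i] == 'g') g++;
        else if (seq[i] == 'C' || seq[i] == 'c') c++;
    }
    return (g + c) ? (double)(g - c) / (double)(g + c) : 0.0;
}

/* Sliding-window profile: one value per window of `window` bases, advancing by `step`.
   Writes at most max_out values into out[] and returns how many were written. */
size_t gc_skew_profile(const char *seq, size_t window, size_t step,
                       double *out, size_t max_out)
{
    size_t n = strlen(seq), k = 0;
    if (window == 0 || step == 0)
        return 0;
    for (size_t pos = 0; pos + window <= n && k < max_out; pos += step)
        out[k++] = gc_skew(seq, pos, window);
    return k;
}
```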
SISEQ (3): SISEQ on UNIX; TACG
SISEQ (4): On Mac and Win, the single program "siseq" is run as "siseq command infile outfile"; on UNIX, each command is run directly as "command infile outfile". GUI: "siseq.tk", a graphical Tcl/Tk front end (as on Mac and Win).
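The two calling conventions can share one binary if the program looks at the name it was invoked under. A hedged sketch of that idea, assuming each UNIX command name is a link to the same executable; resolve_command and base_name are hypothetical helpers, not SISEQ's actual entry point.

```c
#include <string.h>

/* Strip any leading directory from a path (UNIX '/' separators assumed). */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Work out which command was requested:
     "siseq txtr in out"  -> "txtr", file arguments start at argv[2]
     "txtr in out"        -> "txtr", file arguments start at argv[1]
   Returns NULL when "siseq" is called without a command. */
const char *resolve_command(int argc, char *argv[], int *first_file)
{
    const char *self = base_name(argv[0]);
    if (strcmp(self, "siseq") == 0) {
        if (argc < 2)
            return NULL;
        *first_file = 2;
        return argv[1];
    }
    *first_file = 1;
    return self;
}
```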
SISEQ (5): setvar options (Mac siseq)
setvar form [circular/default]
setvar printline(line) [<number>/default]
setvar addseqg [true,true,1/false]. true: include introns in cdsnuc and extrna.
setvar seq_import [true,true,1/false]. true: enables import of external sequences.
default, mitochondrion, etc. + external file
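A sketch of how such setvar lines might be read, assuming the bracket lists the accepted spellings for true ("true" in either case, or "1") and that "false" or "0" means false. Setvar, parse_setvar and setvar_bool are hypothetical names; the 63-character limits are arbitrary.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char name[64];
    char value[64];
} Setvar;

/* Parse "setvar <name> <value>"; returns 1 on success, 0 otherwise. */
int parse_setvar(const char *line, Setvar *sv)
{
    return sscanf(line, " setvar %63s %63s", sv->name, sv->value) == 2;
}

static int equals_ignore_case(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;  /* equal only if both strings ended together */
}

/* "true" (any case) or "1" -> 1; "false" (any case) or "0" -> 0; anything else -> -1. */
int setvar_bool(const char *value)
{
    if (equals_ignore_case(value, "true") || strcmp(value, "1") == 0)
        return 1;
    if (equals_ignore_case(value, "false") || strcmp(value, "0") == 0)
        return 0;
    return -1;
}
```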
SISEQ (6): "siseq help"
************************************************
Available commands are:
 1. txtr
 2. getseq2
 3. toprot
 4. tofast
 5. seqcat
21. extcds
22. cdsnuc
23. extrna
24. genlist
25. getent
26. extint
27. noncod
31. getclu
32. chname
33. simtbl
91. codon (explanation)
99. Other commands.
************************************************
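The help listing pairs a menu number with each command name, which suggests a table that can be searched either way. A hedged sketch, assuming a command may be given as its number ("4") or its name ("tofast"); the Command type and find_command are hypothetical, and "99. Other commands." is left out.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int number;        /* menu number shown by "siseq help" */
    const char *name;  /* command name */
} Command;

static const Command commands[] = {
    {1, "txtr"}, {2, "getseq2"}, {3, "toprot"}, {4, "tofast"}, {5, "seqcat"},
    {21, "extcds"}, {22, "cdsnuc"}, {23, "extrna"}, {24, "genlist"},
    {25, "getent"}, {26, "extint"}, {27, "noncod"},
    {31, "getclu"}, {32, "chname"}, {33, "simtbl"}, {91, "codon"},
};

/* Look up a command by menu number or by name; returns NULL if nothing matches. */
const Command *find_command(const char *arg)
{
    char *end;
    long num = strtol(arg, &end, 10);
    int by_number = (end != arg && *end == '\0');  /* the whole argument was digits */
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (by_number ? commands[i].number == num
                      : strcmp(commands[i].name, arg) == 0)
            return &commands[i];
    }
    return NULL;
}
```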
UNIX: LAN, OS
UNIX with a graphical user interface: Mac OS X
Ways to use UNIX:
Mac OS X on G3, G4: UNIX (FreeBSD, Darwin)
X on Windows: Cygwin (X Window, gcc, Perl on Windows)
Dual boot: Linux and Windows
Mac: Virtual PC (another OS)
Mac OS X: UNIX programs such as BLAST run alongside Word, Excel, Illustrator, and Photoshop, with no need to swap between a Mac/Windows machine and a UNIX machine.
MacOS X X Window MacOS X X Window Terminal X Window Linux X Window open source X Window Unix X Window GUI UNIX Clustal X ssh X Window 29
Mac OS X: CD, root, XDarwin, NCBI toolbox (2002)
Mac OS X web page: http://www.molbiol.saitama-u.c.jp/~naoki/pne/
Languages on Mac OS X:
C, C++ (cc, c++)
(CodeWarrior, RealBASIC)
Shells (bash, tcsh)
AppleScript
(HyperCard: Classic)
Perl, Java, Tcl/Tk, Python, etc.
Fortran
Software resources

Name    | Function                                             | Source language | Platform
SISEQ   | Manipulation of large database sequences             | C               | UNIX, Mac, Win
GenoMap | Graphical representation of microarray data          | Tcl/Tk          | UNIX (Mac, Win)
LSORT   | Homology-based clustering of genome-encoded proteins | C               | UNIX

UNIX = Linux, Mac OS X, IRIX, Solaris
Discontinuous evolution of plastid genomic machinery
N. Sato (2001) Trends in Plant Science 6: 151-156
Clustering by the homology group method

ORF pool; annotation table
1st step: BLASTP (E-value), bl2ls2.pl -> groups of all possible homologues (homologue list)
2nd step: LSORT3 (E-value and multidomain) -> subgroups (homology groups)
Alternative 2nd step: homology region scanning
Homology group matrix (homologgroupsg.pl, tbsort6d.pl, etc.) -> lineage-specific proteins; homology group sequences
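The first step, collecting every ORF linked to another by a BLASTP hit below an E-value cutoff, amounts to single-linkage grouping. A generic union-find sketch of that idea follows; it is not the bl2ls2.pl or LSORT3 code, and the Hit layout (ORF indices plus E-value) is a hypothetical stand-in for parsed BLASTP output. Single linkage readily chains multidomain proteins into one large group, which is presumably why the second step splits groups again.

```c
#include <stddef.h>

/* One BLASTP hit between two ORFs, identified by index into the ORF pool (hypothetical layout). */
typedef struct {
    int query;
    int subject;
    double evalue;
} Hit;

/* Find the root of x, halving the path as we go. */
static int find_root(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Single-linkage grouping: ORFs joined by any hit with evalue <= threshold share a group.
   On return, group[i] is a group label (a root ORF index) for each of the n ORFs. */
void group_homologues(int n, const Hit *hits, size_t nhits, double threshold, int *group)
{
    for (int i = 0; i < n; i++)
        group[i] = i;                       /* every ORF starts in its own group */
    for (size_t h = 0; h < nhits; h++) {
        if (hits[h].evalue > threshold)
            continue;                       /* hit too weak to link the two ORFs */
        int a = find_root(group, hits[h].query);
        int b = find_root(group, hits[h].subject);
        if (a != b)
            group[b] = a;                   /* merge the two groups */
    }
    for (int i = 0; i < n; i++)
        group[i] = find_root(group, i);     /* flatten to final labels */
}
```

Running it at several thresholds (the slides mention E values from 1e-8 to 1e-30) gives progressively finer groupings.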
Example result

Group 268: 21 sequences. (Columns: ORF, length, 21 binary columns, annotation)
Syn_sll1237   299  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  protoporphyrinogen_ix_oxidase_hemg
TE_c47g5823   301  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  Predicted_rRNA_or_tRNA_methylase_sll1237_277_1e-74
Ana_alr0115   304  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  protoporphyrinogen_oxidase;_hemk
NP_c370g2     296  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  SAM-dependent_methyltransferases_sll1237_301_4e-82
Tel_tlr1836   291  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  protoporphyrinogen_ix_oxidase
S81_g810      296  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0  SAM-dependent_methyltransferases_sll1237_183_2e-46
PM1_g2057     289  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  SAM-dependent_methyltransferases_sll1237_156_2e-38
PM2_g45       306  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  SAM-dependent_methyltransferases_sll1237_168_9e-42
CA_c1066g1014 283  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0  SAM-dependent_methyltransferases
Ctep_CT1487   294  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0  HemK_protein
Rpal_g1394    289  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  SAM-dependent_methyltransferases_hemK_122_3e-28
Rpal_g5406    345  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  SAM-dependent_methyltransferases_yfcB_251_7e-67
ATH_At5g64150 377  1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0  putative_protein
BS_ywkE       288  1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0  similar_to_protoporphyrinogen_oxidase
EC_b1212      277  1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0  hemk_possible_protoporphyrinogen_oxidase
EC_b2330      421  1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0  yfcb_putative_adenine-specific_methylase
SC_YNL063W    315  0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0  Hypothetical_ORF
Rpal_g1713    259  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0  SAM-dependent_methyltransferases_TM0691_54_2e-07
BS_yabB       247  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0  similar_to_hypothetical_proteins
TE_c3g4023    239  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1  SAM-dependent_methyltransferases_PM1839_173_2e-43
EC_b2575      285  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1  yfic_putative_enzyme
Cyanobacterial genomes
An: Anabaena 7120
Np: Nostoc punctiforme
Sy: Synechocystis 6803
Comparison of cyanobacterial, plant, and other genomes
ORFs that might not have been acquired without cyanobacterial endosymbiosis (80)
Genomes used in the analysis (Jan 2003)

Name  | Species                            | Accession   | Length    | Proteins | Group           | GC %
Syn   | Synechocystis sp. PCC 6803         | AB001339    | 3,573,470 | 3,264    | Cyanobacteria   | 47.7
Ana   | Anabaena sp. PCC 7120              | BA000019    | 6,413,773 | 5,364    | Cyanobacteria   | 41.3
S81   | Synechococcus sp. WH8102           | JGI         | 2,434,431 | 2,514    | Cyanobacteria   | 59.4
Pm1   | Prochlorococcus marinus MED4       | JGI         | 1,657,995 | 1,694    | Cyanobacteria   | 30.8
Pm2   | Prochlorococcus marinus MIT9313    | JGI         | 2,410,873 | 2,251    | Cyanobacteria   | 50.7
Np    | Nostoc punctiforme PCC 73102       | JGI         | 9.2 Mb    | 7,281    | Cyanobacteria   | 41.4
TE    | Trichodesmium erythraeum           | JGI         | 6.5 Mb    | 4,841    | Cyanobacteria   | 33.6
Tel   | Thermosynechococcus elongatus BP-1 | BA000039    | 2,593,857 | 2,475    | Cyanobacteria   | 53.9
Ctep  | Chlorobium tepidum TLS             | AE006470    | 2,154,946 | 2,252    | Green-sulfur    | 56.5
CA    | Chloroflexus aurantiacus           | JGI         | 3,854,393 | 3,372    | Green non-sulfur| 56.6
Rpal  | Rhodopseudomonas palustris         | JGI         | 5,459,222 | 4,690    | Proteo alpha    | 65.0
EC    | Escherichia coli K-12 MG1655       | U00096      | 4,639,221 | 4,289    | Proteo gamma    | 50.8
BS    | Bacillus subtilis 168              | AL009126    | 4,214,814 | 4,100    | Low-GC Gram+    | 43.5
SC    | Saccharomyces cerevisiae           | NC001133-48 | 12.1 Mb   | 6,306    | Ascomycota      | 38.3
SCmt  | S. cerevisiae mitochondrion        | AJ011856    | 85,779    | 28       |                 | 17.1
CE    | Caenorhabditis elegans             | GenBank     | 100.1 Mb  | 17,083   | Nematoda        | 35.6
CEmt  | C. elegans mitochondrion           | X54252      | 13,794    | 10       |                 | 23.8
ATH   | Arabidopsis thaliana               | NC003070-4  | 116.4 Mb  | 25,545   | eudicotyledons  | 36.0
ATHmt | A. thaliana mitochondrion          | Y08501-2    | 366,924   | 117      |                 | 44.7
ATHcp | A. thaliana chloroplast            | AP000423    | 154,478   | 87       |                 | 36.3

Total = 97,563 proteins
Homology groups that are shared by Arabidopsis and photosynthetic prokaryotes. The groups were extracted by varying the threshold E value from 1e-8 to 1e-30, and classified according to the properties of Arabidopsis members.

Category | Cp-encoded | Nuc-encoded, with transit seq., annotation given | Nuc-encoded, with transit seq., no annotation | Nuc-encoded, no transit seq., annotation given | Nuc-encoded, no transit seq., no annotation | Total
8 Cy & Ath | 18 | 55 (19) | 26 | 3 | 6 | 108
8 Cy & Ath & 1-3 Ph (potential novel photosynthesis-related genes) | 8 | 25 (7) | 2 | 2 | 1 | 38
Total | 26 | 80 (26) | 28 | 5 | 7 | 146

Cy, cyanobacteria; Ath, Arabidopsis; Ph, photosynthetic bacteria; Cp, chloroplast; Nuc, nucleus; Transit seq., transit sequence.
Contribution of the cyanobacterial genome to plant (chloroplast and nuclear) genomes
Endosymbiotic origin of chloroplasts: the ancestor of chloroplasts must be a common ancestor of all the cyanobacteria analyzed.
GenoMap showing gene expression across the whole genome
Thanks to:
Saitama: S. Ehira, T. Hamano
Tokyo: M. Ohmori, M. Ikeuchi
Kazusa: S. Tabata, T. Kaneko
Kyushu: S. Kuhara, K. Tashiro