NGS (RNAseq)
NGS (Next Generation Sequencer) = "Now Generation Sequencer"
http://www.youtube.com/watch?v=womkfikwlxm
http://www.youtube.com/watch?v=mxkya9xcvbq
http://www.youtube.com/watch?v=nhcj8ptycfc
Illumina sequencers: MiSeq, NextSeq
@SRR001356.1 2023DAAXX:5:1:123:563 length=33
TGTCGGTCCAGCTCGGCCTTGGGCTCCGTTTTC
+SRR001356.1 2023DAAXX:5:1:123:563 length=33
-IIIIIIII8IIIIIIIIIII6IIIIIIIII9I
@SRR001356.2 2023DAAXX:5:1:123:476 length=33
TCTGAACCCGACTCCCTTTCGATCGGCCGCGGG
+SRR001356.2 2023DAAXX:5:1:123:476 length=33
IIIIIIIIIIIIIIIIIIIIIGIIIIIII-III
@SRR001356.3 2023DAAXX:5:1:121:746 length=33
GTGGCAGCGTTTTTGGGCCCGCCGCTTGCCGTT
+SRR001356.3 2023DAAXX:5:1:121:746 length=33
IIIII&IIIIIIIIIIIIIIIIHI1IIIIIIII
Applications of NGS
RNA: RNAseq (transcriptome sequencing)
DNA: ChIPseq (ChIP = Chromatin immunoprecipitation), Exome (exon sequencing), Re-sequence
NGS (RNAseq)
RNAseq: sequencing of RNA after conversion to cDNA.
Also called whole transcriptome shotgun sequencing (WTSS) or transcriptome sequencing.
Source: http://en.wikipedia.org/wiki/rna-seq
RNAseq analysis workflow: FASTQ (RNAseq reads), .fa -> 1. tophat (bowtie) -> .bam -> 2. cufflinks -> .gtf -> 3. cummeRbund
File formats and extensions
1 FASTA: .fa, .fasta
2 FASTQ: .fq, .fastq
3 SRA/SRA-lite: .sra, .lite.sra
4 SAM/BAM: .sam, .bam
5 GTF (GFF): .gtf, .gff
6 BED: .bed
7 VCF: .vcf
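As a small illustration of the extension list above, the following Python sketch guesses a file's format from its name. The function name `guess_format` and the `FORMATS` table are hypothetical helpers written for this example; they are not part of any of the tools discussed here.

```python
from pathlib import Path

# Extension-to-format table taken from the list above.
# Longer suffixes come first so that ".lite.sra" is matched before ".sra".
FORMATS = [
    (".lite.sra", "SRA-lite"),
    (".sra", "SRA"),
    (".fasta", "FASTA"),
    (".fa", "FASTA"),
    (".fastq", "FASTQ"),
    (".fq", "FASTQ"),
    (".sam", "SAM"),
    (".bam", "BAM"),
    (".gtf", "GTF"),
    (".gff", "GFF"),
    (".bed", "BED"),
    (".vcf", "VCF"),
]


def guess_format(filename: str) -> str:
    """Return the format name for a file, based only on its extension."""
    name = Path(filename).name.lower()
    for suffix, fmt in FORMATS:
        if name.endswith(suffix):
            return fmt
    return "unknown"
```

For example, `guess_format("accepted_hits.bam")` gives `"BAM"`. The extension is only a hint; the file contents are not inspected.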
1. FASTA
Extensions: .fa, .fasta
Line 1: ">" followed by a one-line description. Line 2 onward: the sequence.
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
Source: http://ja.wikipedia.org/wiki/fasta
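The FASTA layout above (a ">" description line followed by sequence lines) can be read with a few lines of Python. This is a minimal sketch, assuming well-formed input; `parse_fasta` and the toy records are hypothetical names and data made up for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {description: sequence} for FASTA text supplied line by line."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # description without the ">" marker
            records[header] = ""
        elif header is not None:
            records[header] += line    # sequence may span several lines
    return records


# Toy input (not real data): two short records.
example = """>seq1 toy record
LCLYTHIG
RNIYYGSY
>seq2
IENY
"""
records = parse_fasta(example.splitlines())
```

Sequence lines that belong to the same record are joined, so `records["seq1 toy record"]` holds one continuous string.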
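2. FASTQ
Format for NGS reads. Extensions: .fq, .fastq
Line 1: "@" followed by a one-line description
Line 2: the sequence
Line 3: "+"
Line 4: quality values for the sequence on line 2
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
Source: http://ja.wikipedia.org/wiki/fastq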
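To make the four-line FASTQ record concrete, here is a minimal Python sketch that splits records and converts the quality string to numeric scores. It assumes exactly four lines per record and the Phred+33 encoding (an assumption; older Illumina data may use a different offset). `FastqRecord`, `parse_fastq` and the toy read are hypothetical.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines; assumes four lines per record."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one score: ASCII code minus offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


# Toy read (not real data).
toy = ["@read1", "GATTACA", "+", "IIII5+!"]
reads = list(parse_fastq(toy))
```

With the Phred+33 assumption, the character "I" corresponds to a score of 40 and "!" to 0.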
3. SRA, SRA-lite
Archive formats for NGS reads; converted to FASTQ before analysis.
Extensions: .sra, .lite.sra
Convert to FASTQ with the SRA-toolkit: http://www.ncbi.nlm.nih.gov/traces/sra/?view=software
% fastq-dump -A SRR233129 SRR233129.lite.sra
4. SAM/BAM
SAM: text (ASCII). BAM: binary.
1:497:R:-272+13M17D24M 113 1 497 37 37M 15 100338662 0 CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG 0;==-==9;>>>>>=>>>>>>>>>>>=
19:20389:F:275+18M2D19M 99 1 17644 0 37M = 17919 314 TATGACTGCTAATAATACCTACACATGTTAGAACCAT >>>>>>>>>>>>>>>>>>>><<>>><<>>4:
19:20389:F:275+18M2D19M 147 1 17919 0 18M2D19M = 17644 -314 GTAGTACCAACTGTAAGTCCTTATCTTCATACTTTGT ;44999;499<8<8<<<8<<><<<<><
9:21597+10M2I25M:R:-209 83 1 21678 0 8M2I27M = 21469 -244 CACCACATCACATATACCAAGCCTGGCTGTGTCTTCT <;9<<5><<<<><<<>><<><>><9>
Source: http://genome.sph.umich.edu/wiki/sam
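The second column of each SAM record above (113, 99, 147, 83) is a bitwise FLAG. The following Python sketch decodes it using the bit meanings defined in the SAM specification; the short label strings and the function name `decode_flag` are my own choices for this example.

```python
from typing import List

# FLAG bits as defined in the SAM specification; labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]
```

For instance, `decode_flag(99)` reports a properly paired first read whose mate maps to the reverse strand, and `decode_flag(147)` describes the matching second read.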
5. GTF (GFF)
GTF: General Transfer Format. Identical to GFF (General Feature Format) version 2.
Example:
X Ensembl Repeat 2419108 2419128 42 . . hid=trf; hstart=1; hend=21
X Ensembl Repeat 2419108 2419410 2502 - . hid=alusx; hstart=1; hend=303
X Ensembl Repeat 2419108 2419128 0 . . hid=dust; hstart=2419108; hend=2419128
X Ensembl Pred.trans. 2416676 2418760 450.19 - 2 genscan=genscan00000019335
X Ensembl Variation 2413425 2413425 . + .
X Ensembl Variation 2413805 2413805 . + .
Source: http://asia.ensembl.org/info/website/upload/gff.html
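Each GFF/GTF line consists of up to nine tab-separated fields: seqname, source, feature, start, end, score, strand, frame, attributes. A minimal Python sketch of reading one line follows. `GffFeature` and `parse_gff_line` are hypothetical names, and the parser assumes tab-separated input with "." meaning "no value".

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> GffFeature:
    """Parse one tab-separated GFF/GTF line; the attribute column is optional."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    score = None if fields[5] == "." else float(fields[5])
    attributes = fields[8] if len(fields) > 8 else ""
    return GffFeature(fields[0], fields[1], fields[2], int(fields[3]),
                      int(fields[4]), score, fields[6], fields[7], attributes)


# One line from the example above, rebuilt with tab separators.
line = "\t".join(["X", "Ensembl", "Pred.trans.", "2416676", "2418760",
                  "450.19", "-", "2", "genscan=genscan00000019335"])
feature = parse_gff_line(line)
```

Coordinates are kept as given in the file; GFF positions are 1-based and inclusive, which differs from BED.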
6. BED
Example:
track name=pairedReads description="Clone Paired Reads" useScore=1
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
Source: http://genome.ucsc.edu/faq/faqformat.html#format1
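In the 12-column BED lines above, the last three columns give the block count, block sizes, and block starts relative to the feature start. The Python sketch below turns them into absolute block coordinates. `bed12_blocks` is a hypothetical helper and assumes whitespace-separated, 12-column input.

```python
from typing import List, Tuple


def bed12_blocks(line: str) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for every block of a 12-column BED line."""
    fields = line.split()
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED line")
    chrom, chrom_start = fields[0], int(fields[1])
    # Sizes and starts are comma-separated and may end with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    return [(chrom, chrom_start + s, chrom_start + s + size)
            for s, size in zip(starts, sizes)]


blocks = bed12_blocks(
    "chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512")
```

BED coordinates are 0-based and half-open, so the second block of cloneA ends exactly at the feature end, 5000.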
7. VCF
Variant Call Format
##fileformat=VCFv4.0
##fileDate=20110705
##reference=1000GenomesPilot-NCBI37
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3
2 4370 rs6057 G A 29 . NS=2;DP=13;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:52,51 1|0:48:8:51,51 1/1:43:5:.,.
2 7330 . T A 3 q10 NS=5;DP=12;AF=0.017 GT:GQ:DP:HQ 0|0:46:3:58,50 0|1:3:5:65,3 0/0:41:3
2 110696 rs6055 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
2 130237 . T . 47 . NS=2;DP=16;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:56,51 0/0:61:2
2 134567 microsat1 GTCT G,GTACT 50 PASS NS=2;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
Source: http://en.wikipedia.org/wiki/variant_call_format
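A VCF data line has eight fixed columns (CHROM to INFO), an optional FORMAT column, and one column per sample. The Python sketch below splits one such line into a dictionary. It is a simplified illustration, assuming tab-separated input and keeping all values as strings; `parse_vcf_record` is a hypothetical name, and real projects would normally use a dedicated VCF library.

```python
from typing import Dict, List


def parse_vcf_record(line: str, sample_names: List[str]) -> Dict:
    """Split one tab-separated VCF data line into its fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True   # flags such as DB have no value
    samples = {}
    if len(fields) > 9:
        keys = fields[8].split(":")                 # FORMAT column, e.g. GT:GQ:DP:HQ
        for name, column in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(keys, column.split(":")))
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alt": alt.split(","), "qual": qual, "filter": filt,
            "info": info_dict, "samples": samples}


# The rs6055 line from the example above, rebuilt with tab separators.
line = "\t".join(["2", "110696", "rs6055", "A", "G,T", "67", "PASS",
                  "NS=2;DP=10;AF=0.333,0.667;AA=T;DB", "GT:GQ:DP:HQ",
                  "1|2:21:6:23,27", "2|1:2:0:18,2", "2/2:35:4"])
record = parse_vcf_record(line, ["Sample1", "Sample2", "Sample3"])
```

In the GT field, "|" marks a phased genotype and "/" an unphased one; trailing FORMAT keys may be omitted per sample, as for Sample3 here.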
NGS 2000: (GEO) http://lifesciencedb.jp/geo/
oligoprobe, GeneSpring
Excel, Excel2003
NGS(RNAseq) +++ +++ +++ +++ - ++ - ++ + +++ + +++
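Normalization of expression values:
RPKM: Reads Per Kilobase per Million mapped reads. Read counts are scaled per 1,000 bases of transcript length and per 1 million mapped reads.
FPKM: Fragments Per Kilobase of exon per Million mapped fragments.
Reference: Nat Methods, 5(7):621-628.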
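The RPKM definition above corresponds to the formula RPKM = 10^9 x C / (N x L), where C is the number of reads mapped to a gene, N the total number of mapped reads, and L the gene (exon) length in bases. A minimal Python sketch follows; the function name `rpkm` and the example numbers are made up for illustration.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000-base gene, 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
```

FPKM uses the same arithmetic with fragments (read pairs) counted instead of individual reads.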
NGS WET&DRY: https://www.yodosha.co.jp/jikkenigaku/book/9784758101912/
The cat way: http://cat.hackingisbelieving.org/lecture/
% tophat -p 8 -r 100 -o output_dir/ips_01 bowtie2_indexes/mm9 ips_01_1.fastq
% cuffdiff -p 24 ensembl_gene.gtf -L ips_01,ips_02,hesc_01,hesc_02,fibroblast_01,fibroblast_02 -o results ips_01.bam,ips_2.bam hesc_1.bam,hesc_2.bam Fibroblast_01.bam,Fibroblast_02.bam
Tuxedo suite: bowtie, tophat, cufflinks
R + Bioconductor
0. Illumina iGenomes
Download Illumina iGenomes (human, UCSC hg19) and copy the following into the working directory.
Bowtie2 index: Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/
% cp -r Bowtie2Index/ 141025/
Gene annotation: Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
% cp genes.gtf 141025/
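1. Prepare fastq
.fastq files: pass (nothing to do).
.sra files: convert with fastq-dump.
fastq-dump is installed with homebrew as part of sratoolkit:
% brew install sratoolkit -v
If the formula is not found, tap first:
% brew tap homebrew/science -v
Convert .sra to .fastq:
% fastq-dump *.sra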
2. tophat (bowtie)
Install tophat:
% brew install tophat -v
bowtie2 and samtools are also required.
% tophat -p 8 -o tophat/cntl Bowtie2Index/genome GSE50491/SRR960409.fastq
tophat uses bowtie for the alignment.
tophat2
% mkdir tophat
% tophat -p 8 -o tophat/brd4 Bowtie2Index/genome GSE50491/SRR960410.fastq
-p: number of CPU threads
-r: mate inner distance (paired-end reads)
1 fastq: 1 (MacPro -p 12)
3. cufflinks
Install cufflinks:
% brew install cufflinks -v
% cufflinks -p 8 -o cufflinks_cntl tophat/cntl/accepted_hits.bam
% cufflinks -p 8 -o cufflinks_brd4 tophat/brd4/accepted_hits.bam
1 bam: 1.5 (MacPro -p 12)
cuffmerge
cuffmerge merges the assemblies.
% cuffmerge -p 8 -s hg19.fa assemblies.txt
cuffcompare
% cuffcompare -p8 -s hg19.fa -r genes.gtf merged_asm/merged.gtf
Relevant for transcript and splice variant analysis before cuffdiff.
cuffdiff
% cuffdiff -p 8 -L CNTL,BRD4 -o cuffdiff genes.gtf tophat/cntl/accepted_hits.bam tophat/brd4/accepted_hits.bam
Alternative to the homebrew version: the binary from http://cufflinks.cbcb.umd.edu/
% ~/Downloads/cufflinks-2.2.1.OSX_x86_64/cuffdiff -p 8 -L CNTL,BRD4 -o cuffdiff genes.gtf tophat/cntl/accepted_hits.bam tophat/brd4/accepted_hits.bam
4. cummeRbund
Start R:
% R
Install cummeRbund:
> source("http://bioconductor.org/biocLite.R")
> biocLite("cummeRbund")
cummeRbund in R Graphical Manual: http://rgm3.lab.nig.ac.jp/rgm/r_image_list?package=cummerbund&init=true
> browseVignettes("cummeRbund")
cummeRbund (following the manual)
> library("cummeRbund")
> setwd("~/desktop/141025")
> cuff.dir <- "cuffdiff"
> cuff <- readCufflinks(dir=cuff.dir)
> dens <- csDensity(genes(cuff))
> dens
Export FPKM values to CSV:
> gene.matrix <- fpkmMatrix(genes(cuff))
> write.csv(gene.matrix, file="fpkm.csv")
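Once the FPKM matrix has been written to fpkm.csv, it can be processed outside R as well. The Python sketch below computes a log2 fold change per gene. It assumes the CSV has gene identifiers in the first column and one column per condition named after the cuffdiff labels (CNTL, BRD4); the gene names, FPKM values, pseudocount, and the function name `log2_fold_changes` are hypothetical and not taken from the slides.

```python
import csv
import io
import math
from typing import Dict


def log2_fold_changes(csv_text: str, control: str = "CNTL",
                      treated: str = "BRD4",
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Return {gene_id: log2((treated + pc) / (control + pc))} from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    control_idx = header.index(control)
    treated_idx = header.index(treated)
    result = {}
    for row in reader:
        if not row:
            continue
        # The pseudocount avoids division by zero for unexpressed genes.
        ratio = (float(row[treated_idx]) + pseudocount) / \
                (float(row[control_idx]) + pseudocount)
        result[row[0]] = math.log2(ratio)
    return result


# Made-up table in the layout that write.csv produces (empty first header cell).
sample = '"","CNTL","BRD4"\n"GENE_A",3,15\n"GENE_B",7,1\n'
fold_changes = log2_fold_changes(sample)
```

For a real file, the text could be obtained with `open("fpkm.csv").read()`. Statistical testing of differential expression is what cuffdiff itself reports; this sketch only rearranges the exported values.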
Resources for R
(Using R) Microarray data analysis: http://www.iu.a.u-tokyo.ac.jp/~kadota/r.html
(Using R) Sequence analysis: http://www.iu.a.u-tokyo.ac.jp/~kadota/r_seq.html
Book: http://www.kyoritsu-pub.co.jp/bookdetail/9784320123700
CLC Genomics Workbench, Agilent, Avadis NGS, GeneSpring, TIBCO Spotfire
TogoTV: CLC Genomics Workbench http://togotv.dbcls.jp/20110628.html
TogoTV: RNA-seq by Avadis NGS http://togotv.dbcls.jp/20111124.html
TogoTV: ChIP-seq by Avadis NGS http://togotv.dbcls.jp/20120626.html
GeneSpring: https://www.youtube.com/user/genespringtv
Spotfire with cuffdiff:
% cuffdiff -p 8 Caenorhabditis_elegans.WBcel215.69.gtf -L N2,UV -o cuffdiff SRR454084.bam SRR454085.bam