GenomeJack Browser Appendix 3.1
MITSUBISHI SPACE SOFTWARE CO., LTD.
2014-09-18
Contents

1
    1.1 BED
    1.2 BED Graph
    1.3 TSV (TXT)
    1.4 CSV
    1.5 GFF
    1.6 GTF
    1.7 SAM / BAM
    1.8 VCF
    1.9 WIG / BigWig
2
    2.1 TSV
    2.2
3 Tutorials
    3.1 RNA-seq
        3.1.1
        3.1.2 TopHat
        3.1.3 Cufflinks
        3.1.4 GenomeJack
    3.2 Chip-seq
        3.2.1
        3.2.2 Bowtie
        3.2.3 MACS ChIP-seq
        3.2.4 GenomeJack
4 GenomeJack
    4.1 Web Start
    4.2 Web Start
CHAPTER 1

1.1 BED

A BED file has 3 required fields, followed by optional fields.

Required fields:
    1. chrom
    2. chromStart
    3. chromEnd

Optional fields:
    1. name
    2. score
    3. shade
    4. scoreInRange
    5. strand
    6. thickStart
    7. thickEnd
    8. itemRgb
    9. blockCount
    10. blockSizes
    11. blockStarts

Example:
    chr No.  chr.start  chr.end   name
    chr19    57684729   57684949  78
    chr19    57684999   57685913  79
    chr19    57717135   57717360  80
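The example rows above can be read programmatically. The following is a minimal sketch, not GenomeJack's own parser: it assumes whitespace-separated columns, and the names BedRecord and parse_bed_line are hypothetical. It reads only the three required fields and the optional name field.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """The three required BED fields plus the optional name (hypothetical type)."""
    chrom: str
    chrom_start: int
    chrom_end: int
    name: Optional[str]


def parse_bed_line(line: str) -> BedRecord:
    """Parse one BED data line; columns are assumed to be whitespace-separated."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"a BED line needs at least 3 fields: {line!r}")
    name = fields[3] if len(fields) > 3 else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


records = [
    parse_bed_line(text)
    for text in (
        "chr19 57684729 57684949 78",
        "chr19 57684999 57685913 79",
        "chr19 57717135 57717360 80",
    )
]
for record in records:
    print(record.chrom, record.chrom_start, record.chrom_end, record.name)
```

Lines with fewer than three fields are rejected rather than padded, so a malformed file fails early.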
1.2 BED Graph

The BED Graph format is used for continuous-valued data such as probability scores and transcriptome data. The data lines of a BED Graph file are based on the BED format and are preceded by a track definition line. The track definition line begins with track type=bedGraph.

Start, End: Start 1base

Track definition line parameters:

    Parameter           Values                   Default
    name                trackLabel               User Track
    description         centerLabel              User Supplied Track
    visibility          full | dense | hide      hide
    color               RRR,GGG,BBB              255,255,255
    altColor            RRR,GGG,BBB              128,128,128
    priority            N                        100
    autoScale           on | off                 on
    alwaysZero          on | off                 off
    gridDefault         on | off                 off
    maxHeightPixels     max:default:min          128:128:11
    graphType           bar | points             bar
    viewLimits          lower:upper
    yLineMark           real-value               0.0
    yLineOnOff          on | off                 off
    windowingFunction   maximum | mean | minimum maximum
    smoothingWindow     off | [2-16]             off

Example:
    track type=bedGraph name="bedgraph Format" description="bedgraph format" visibility=full color=200,100,0 altcolor=0,100,200 priority=20
    hs_ch01 59302000 59302300 -1.0
    hs_ch01 59302300 59302600 -0.75
    hs_ch01 59302600 59302900 -0.50
    hs_ch01 59302900 59303200 -0.25
    hs_ch01 59303200 59303500 0.0
    hs_ch01 59303500 59303800 0.25
    hs_ch01 59303800 59304100 0.50
    hs_ch01 59304100 59304400 0.75
    hs_ch01 59304400 59304700 1.00
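A track definition line is a sequence of key=value settings in which values may be quoted. The following is a minimal sketch of how such a line and its data lines could be split, assuming shell-like quoting; the function names parse_track_line and parse_data_line are hypothetical and are not part of GenomeJack.

```python
import shlex


def parse_track_line(line: str) -> dict:
    """Split a track definition line into a {parameter: value} dictionary."""
    tokens = shlex.split(line)  # keeps quoted values such as "bedgraph Format" together
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    settings = {}
    for token in tokens[1:]:
        key, separator, value = token.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {token!r}")
        settings[key] = value
    return settings


def parse_data_line(line: str) -> tuple:
    """Return (chrom, start, end, value) from one whitespace-separated data line."""
    chrom, start, end, value = line.split()
    return chrom, int(start), int(end), float(value)


track = parse_track_line(
    'track type=bedGraph name="bedgraph Format" description="bedgraph format" '
    "visibility=full color=200,100,0 altcolor=0,100,200 priority=20"
)
first = parse_data_line("hs_ch01 59302000 59302300 -1.0")
print(track["name"], track["visibility"], first)
```

Values are kept as strings so that settings such as color=200,100,0 can be interpreted by the caller.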
1.3 TSV (TXT)

TSV (tab-separated values) 100
1.4 CSV

CSV (comma-separated values) (,) 100

Example:
    #chr,start,end,data1,data2
    chr1,1,100,test1,test2
    chr1,101,200,test1,"test2_1,test2_2"
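In the last row of the example, a comma appears inside a double-quoted field, so the row cannot be split on commas alone. The following sketch uses Python's standard csv module to read the sample. Treating the line that starts with # as the column-name header is an assumption made for this illustration, and read_csv_rows is a hypothetical helper.

```python
import csv
import io

SAMPLE = (
    "#chr,start,end,data1,data2\n"
    "chr1,1,100,test1,test2\n"
    'chr1,101,200,test1,"test2_1,test2_2"\n'
)


def read_csv_rows(text: str):
    """Return (header, rows); a first line starting with '#' is taken as the header."""
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if header is None and fields[0].startswith("#"):
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        rows.append(fields)
    return header, rows


header, rows = read_csv_rows(SAMPLE)
print(header)
for row in rows:
    print(row)
```

The quoted value test2_1,test2_2 stays in a single column, so every row keeps five fields.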
1.5 GFF

GFF (General Feature Format) is a standard file format for describing features. A GFF line has 9 fields:

    1. seqname: name of the sequence (chromosome or scaffold)
    2. source: source of the feature
    3. feature: type of the feature, e.g. CDS, start_codon, stop_codon, exon
    4. start: start position of the feature
    5. end: end position of the feature
    6. score: a value between 0 and 1000
    7. strand: + or -
    8. frame: if the feature is a coding exon, the frame as a value 0-2; for anything other than a coding exon, "."
    9. group

Example:
    track name=regulatory description="TeleGene(tm) Regulatory Regions"
    chr11 TeleGene enhancer 1000000 1001000 500 + . touch1
    chr11 TeleGene promoter 1010000 1010100 900 + . touch1
    chr11 TeleGene promoter 1020000 1020000 800 - . touch2
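The nine fields can be mapped onto a small record type. This is a sketch under stated assumptions: columns are split on any whitespace (GFF files are normally tab-delimited), a "." in the score or frame column is read as "no value", and GffRecord and parse_gff_line are hypothetical names.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF line split into its nine fields (hypothetical type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: Optional[int]
    group: str


def parse_gff_line(line: str) -> GffRecord:
    """Parse one GFF data line; '.' in score or frame becomes None."""
    fields = line.strip().split(None, 8)  # at most 9 columns; group keeps inner spaces
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    seqname, source, feature, start, end, score, strand, frame, group = fields
    return GffRecord(
        seqname,
        source,
        feature,
        int(start),
        int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        group,
    )


record = parse_gff_line("chr11 TeleGene enhancer 1000000 1001000 500 + . touch1")
print(record)
```

Limiting the split to eight separators keeps the group column intact even when it contains spaces.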
1.6 GTF

GTF is an extension of GFF. The first eight fields of GTF are the same as in GFF. In GTF, the GFF group field holds a list of attributes such as gene_id, transcript_id, and exon number. Each attribute is a pair of a type (ex. gene_id) and a value (ex. AB000123.1) and ends with a semicolon (;). gene_id and transcript_id are required.

    1. seqname
    2. source
    3. feature: CDS, exon, start codon, stop codon
    4. start
    5. end
    6. score: 0~1000 (track line: useScore 1)
    7. strand: +, -
    8. frame: if the feature is CDS, 0~2 (ORF)
    9. group
        gene_id value: gene ID
        transcript_id value: transcript ID

Example:
    chr1 Cufflinks exon 16337 16711 1000 - . gene_id "CUFF.11"; transcript_id "CUFF.11.1"; exon_number "1"; FPKM "2.7373380791"; frac "1.000000"; conf_lo "0.000000"; conf_hi "6.046319"; cov "6.243354";
    chr1 Cufflinks transcript 564499 565139 1000 - . gene_id "CUFF.423"; transcript_id "CUFF.423.1"; FPKM "210.2223591982"; frac "0.563585"; conf_lo "187.674841"; conf_hi "232.769878"; cov "479.477689";
    chr1 Cufflinks exon 564499 565139 1000 - . gene_id "CUFF.423"; transcript_id "CUFF.423.1"; exon_number "1"; FPKM "210.2223591982"; frac "0.563585"; conf_lo "187.674841"; conf_hi "232.769878"; cov "479.477689";
    chr1 Cufflinks transcript 564657 565121 1000 + . gene_id "CUFF.425"; transcript_id "CUFF.425.1"; FPKM "241.0664985011"; frac "0.436415"; conf_lo "219.475829"; conf_hi "262.657168"; cov "549.827374";
    chr1 Cufflinks exon 564657 565121 1000 + . gene_id "CUFF.425"; transcript_id "CUFF.425.1"; exon_number "1"; FPKM "241.0664985011"; frac "0.436415"; conf_lo "219.475829"; conf_hi "262.657168"; cov "549.827374";
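The group field of the Cufflinks example is a list of type "value"; pairs. The sketch below extracts them into a dictionary. It assumes every value is double-quoted, as in the example lines, and parse_gtf_attributes is a hypothetical helper.

```python
import re

# One attribute: a type, whitespace, a double-quoted value, then a semicolon.
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(group: str) -> dict:
    """Return the attributes of a GTF group field as {type: value}, in file order."""
    return {key: value for key, value in ATTRIBUTE.findall(group)}


line = (
    'chr1 Cufflinks exon 16337 16711 1000 - . gene_id "CUFF.11"; '
    'transcript_id "CUFF.11.1"; exon_number "1"; FPKM "2.7373380791"; '
    'frac "1.000000"; conf_lo "0.000000"; conf_hi "6.046319"; cov "6.243354";'
)
columns = line.split(None, 8)  # the ninth column is the whole attribute list
attributes = parse_gtf_attributes(columns[8])
print(attributes["gene_id"], attributes["transcript_id"], float(attributes["FPKM"]))
```

All values are returned as strings; numeric attributes such as FPKM are converted by the caller.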
1.7 SAM / BAM

SAM is a format for NGS alignment data and is handled with SAMtools. Header lines in a SAM file begin with @. BAM is the binary form of SAM; SAMtools converts between SAM and BAM.

Example:

Header:
    @HD VN:1.0 SO:coordinate
    @SQ SN:hs_ch01 LN:197195432
    @PG ID:TopHat VN:1.3.0 CL:/home/hoge/tophat/tophat -o ./fuga /home/hoge/bowtie/indexes/hg19 hoge.fastq

Alignments:
    HWI-ST256_0153_A81NGJABXX:6:44:1779:132415#0 16 chr1 24267522 255 100M * 0 0 ACAAGTAGGAAAAGTAACTCAGAACAAGGGCAAAGGTCAACTCTGCTCAGCTCTTCCAAAGGTCATGCAAAGGTCATTCAAAGGTCATTCAAAGGTCATT bbedegggg bgggfggdgefffffeccdbcgeeegccbbagbgggdggggfgggeegggggggggggggggg gggggggfgggggggggfgggggggggg NM:i:0 NH:i:1
    HWI-ST256_0153_A81NGJABXX:6:41:20727:81915#0 16 chr1 24269023 255 75M * 0 0 GATTGATGGTTTGAGCTGTATAACCCAGTCCCATCTCTCTGGTTATGTCAGATTCAGTCACATGTCCCAAGCTCT bddgeee^ba[dbgbfeaefdf_eefffdfdda adbb^effefeefeffeegggfffffcefefeggegffgeeg NM:i:0 NH:i:1
    HWI-ST256_0153_A81NGJABXX:6:24:2715:19715#0 16 chr1 24601316 1 35M * 0 0 CGGCAGCCACAGTCAAGTAGCGCCCATGTCTTGGA a ddeedfe ^]UWV abaa^ax]vy[[z]y[ NM:i:0 NH:i:3 CC:Z:chr2 CP:i:25078220 HI:i:0
    HWI-ST256_0153_A81NGJABXX:6:62:8552:153328#0 16 chr1 24618408 3 100M * 0 0 TCTGTTTGGCGTAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAGAAGGATGAAGGGGTATGCTATATATTTTGTTAGTGGGTCTAGAATAA dfddfggg^ ggggggagggffegbgggeeeeegegead eaageeggggeaggggeggggdc]^a[xbabgg gggggggfgggggggggggggggggggg NM:i:0 NH:i:2 CC:Z:chrM CP:i:10909 HI:i:0
    HWI-ST256_0153_A81NGJABXX:6:48:17219:11563#0 16 chr1 24618419 3 100M * 0 0 TAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAGAAGGATGAAGGGGTATGCTATATATTTTGTTAGTGGGTCTAGAATAATGGAGATGCGA eedeefdf egafg_dccgc_deddcg]ggcfgfgggggfgfgggddddd_nyvu[dcyeagggggg gfgg gggfeffddddddeyfffdedffdebee NM:i:0 NH:i:2 CC:Z:chrM CP:i:10898 HI:i:0
    HWI-ST256_0153_A81NGJABXX:6:26:8890:148254#0 272 chr1 24618437 3 33M * 0 0 TACAATTATTCCTCATAGGGAGAGAAGGATGAA a Tfffefgggg_ gggggggggffggggggfeg NM:i:1 NH:i:2 CC:Z:chrM CP:i:10947 HI:i:
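The second column of each alignment line is a bitwise FLAG; the example records use the values 16 and 272. The following sketch decodes a FLAG into its bits. The bit values follow the SAM specification, while the short labels and the function name decode_flag are chosen for this illustration only.

```python
# FLAG bits as defined in the SAM specification; the labels are illustrative names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary_alignment",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


for flag in (16, 272):
    print(flag, decode_flag(flag))
```

A FLAG of 16 marks a read aligned to the reverse strand; 272 (256 + 16) marks a secondary alignment on the reverse strand.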
1.8 VCF

Variant Call Format (VCF) is a format for describing single nucleotide variants, insertions/deletions, copy number variants and structural variants. A VCF file consists of meta-information lines and data lines.

Example:
    ##fileformat=VCFv4.1
    ##fileDate=20090805
    ##source=myImputationProgramV3.1
    ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
    ##contig=<ID=20,length=62435964,assembly=B36,species="Homo sapiens">
    ##phasing=partial
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
    ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
    ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP,build 129">
    ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
    ##FILTER=<ID=q10,Description="Quality below 10">
    ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
    11 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
    11 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
    11 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
    11 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
    11 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
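Each data line carries a semicolon-separated INFO column and per-sample columns whose layout is given by the FORMAT column. The sketch below splits both for the first data line of the example. It assumes whitespace-separated columns (VCF files are tab-delimited) and the standard | and / genotype separators; parse_info and parse_sample are hypothetical helpers.

```python
def parse_info(info: str) -> dict:
    """Split an INFO column: key=value entries become lists, bare keys become True."""
    result = {}
    if info == ".":
        return result
    for entry in info.split(";"):
        key, separator, value = entry.partition("=")
        result[key] = value.split(",") if separator else True
    return result


def parse_sample(format_column: str, sample_column: str) -> dict:
    """Pair the FORMAT keys with the values of one sample column."""
    return dict(zip(format_column.split(":"), sample_column.split(":")))


line = (
    "11 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
    "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,."
)
columns = line.split()
info = parse_info(columns[7])
first_sample = parse_sample(columns[8], columns[9])
print(info)
print(first_sample)
```

Flag-type entries such as DB and H2 have no value and are therefore stored as True.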
1.9 WIG / BigWig

The WIG format is used for continuous data such as GC content and probability scores. WIG is related to the BedGraph format. WIG data comes in two forms, variableStep and fixedStep. A WIG file contains a track definition line and WIG declaration lines. BigWig files are created from WIG files with wigToBigWig.

variableStep

    variableStep chrom=chrN [span=windowSize]
    chromStartA dataValueA
    chromStartB dataValueB
    ... etc ...

Example:
    variableStep chrom=chr2
    300701 12.5
    300702 12.5
    300703 12.5
    300704 12.5
    300705 12.5

fixedStep

    fixedStep chrom=chrN start=position step=stepInterval [span=windowSize]
    dataValue1
    dataValue2
    ... etc ...

Example:
    fixedStep chrom=chr3 start=400601 step=100
    11
    22
    33

The values 11, 22, 33 are assigned to chr3 positions 400601, 400701, 400801. The default span is 1.
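The fixedStep example can be expanded into explicit positions by adding the step to the start once per data value. The following is a minimal sketch of that arithmetic; it handles fixedStep declarations only, treats positions as 1-based, and expand_fixed_step is a hypothetical helper.

```python
def expand_fixed_step(lines):
    """Expand fixedStep WIG lines into (chrom, first_base, last_base, value) tuples."""
    chrom = None
    position = step = span = None
    intervals = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue  # skip blank lines, track definition lines and comments
        if line.lower().startswith("fixedstep"):
            params = dict(token.split("=", 1) for token in line.split()[1:])
            chrom = params["chrom"]
            position = int(params["start"])
            step = int(params["step"])
            span = int(params.get("span", 1))  # span defaults to 1
            continue
        if chrom is None:
            raise ValueError("data value found before a fixedStep declaration")
        intervals.append((chrom, position, position + span - 1, float(line)))
        position += step
    return intervals


intervals = expand_fixed_step(
    ["fixedStep chrom=chr3 start=400601 step=100", "11", "22", "33"]
)
for interval in intervals:
    print(interval)
```

With the default span of 1, each value covers a single base, so the first and last base of every interval are equal.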
CHAPTER 2 TSV

2.1 TSV

Importing a TSV file into GenomeJack:

    1. (Mac OS X: control + click) [Import] > [TSV]
    2.
    3. Setup Input / Output Resources: Next
    4. Set Parameters / Execute: Create
       Note: Setting/Setting
    5. Header Setting / File Import Setting: 0, Next
       Note: 100 100
    6. Determine Regions / File Import Setting: F1 F2 F3, By Three Columns
       By One Column: 1
       By Three Columns:
    7. Column Settings / File Import Settings: Finish
       Use As Label
       Create Index
    8. Select
    9. Import Setting: OK
    10. [View] > [Show Table View] > [ ]
    11.
    12. Edit, OK
    13.
    14. CSV
    15.
2.2

    1. (Mac OS X: control + click) [Import] > [arrayprobe]
    2.
    3. Setup Input / Output Resources: Next
    4. Set Parameters / Execute: Create
    5. 1, Next
    6. Determine Regions / File Import Setting: Name, By One Column
       By One Column: 1
       By Three Columns:
    7. Column Settings / File Import Setting: ID, Create Index on, Finish
       Use As Label: ID
       Create Index
    8. Select
    9. Import Setting: OK
    10.
    11. (Mac OS X: control + click) [Import] > [array]
    12.
    13. Setup Input / Output Resources: Next
    14. Set Parameters / Execute: 014698_D_DNABack_BCLeft_20101001.txt, Select Probe Import Setting, Create
    15. Header Setting / Import Array Data Settings: 15, Next
    16. Determine Probe Position / Import Array Data Settings: ID, Doping Control Name, Next
    17. Column Number / Import Array Data Settings: 16, 85
    18. Column Setting / Import Array Data Settings: Finish
    19. Import Setting: OK
    20.
CHAPTER 3 Tutorials

This chapter contains two tutorials, RNA-Seq and ChIP-Seq, that go from FASTQ files to display in GenomeJack Browser.

3.1 RNA-seq

This tutorial uses paired-end RNA-seq FASTQ data (SRR064286, SRR064437) from the MCF-7 breast cancer cell line (cDNA). The results are displayed in GenomeJack on hg19.

    1.
    2. TopHat
    3. Cufflinks
    4. GenomeJack

3.1.1

This tutorial uses TopHat and Cufflinks. TopHat requires Bowtie2, so TopHat, Cufflinks, and Bowtie2 are all needed. TopHat performs the mapping and Cufflinks assembles the transcriptome.

    TopHat:
        http://tophat.cbcb.umd.edu/downloads/tophat-2.0.9.linux_x86_64.tar.gz
    Cufflinks:
        http://cufflinks.cbcb.umd.edu/downloads/cufflinks-2.1.1.linux_x86_64.tar.gz
    Bowtie2:
        http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.1.0/bowtie2-2.1.0-linux-x86_64.zip/download
    Bowtie index (hg19.zip, placed in the indexes directory under bowtie2-2.1.0):
        ftp://ftp.cbcb.umd.edu/pub/data/bowtie2_indexes/hg19.zip

Note: PATH

3.1.2 TopHat

    1. Map SRR064286_1.fastq and SRR064286_2.fastq to the human genome.

        tophat2 -r 200 --library-type fr-unstranded -o ./SRR064286_tophat_out path/to/bowtie_index SRR064286_1.fastq SRR064286_2.fastq

       The -o option sets the output directory (e.g. -o SRR064286_out).

    2. Map SRR064437 in the same way.

        tophat2 -r 200 --library-type fr-unstranded -o ./SRR064437_tophat_out path/to/bowtie_index SRR064437_1.fastq SRR064437_2.fastq

       The mapping result is written to accepted_hits.bam.

3.1.3 Cufflinks

    1. Run Cufflinks on each mapping result.

        cufflinks -o ./SRR064437_cufflinks_out -G refgene.gtf --library-type fr-unstranded SRR064437_tophat_out/accepted_hits.bam
        cufflinks -o ./SRR064286_cufflinks_out -G refgene.gtf --library-type fr-unstranded SRR064286_tophat_out/accepted_hits.bam

       refgene.gtf is the RefSeq annotation. It is obtained from the following URL by selecting human, RefSeq Genes, and GTF:

        http://genome.ucsc.edu/cgi-bin/hgtables?command=start

       The result is written to transcripts.gtf.

3.1.4 GenomeJack

Load transcripts.gtf into GenomeJack to view the FPKM values, for example for KREMEN2 in MCF-7.
3.2 Chip-seq

This tutorial uses the ChIP-seq data SRR020051 and SRR020053: PHF8 ChIP-seq in HeLa cells, and IgG ChIP-seq in HeLa cells (Control). The results are displayed in GenomeJack on hg19.

Note: PHF8, H3K4me3

    1.
    2. Bowtie
    3. SAMtools
    4. MACS ChIP-seq
    5. GenomeJack

3.2.1

This tutorial uses Bowtie2, SAMtools, and MACS. Bowtie2 is set up as in the RNA-seq tutorial.

    SAMtools:
        http://sourceforge.net/project/showfiles.php?group_id=246254
    MACS (ChIP-seq):
        https://github.com/taoliu/macs/

Note: PATH

3.2.2 Bowtie

    1. Map SRR020051.fastq to the human genome. The result is written in SAM format.

        bowtie2 -x path/to/hg19 -U SRR020051.fastq -S SRR020051.sam

    2. Map SRR020053.fastq in the same way.

        bowtie2 -x path/to/hg19 -U SRR020053.fastq -S SRR020053.sam

    3. Use SAMtools to convert SAM to BAM, because GenomeJack reads BAM.

        samtools view -bhS -o SRR020051.bam SRR020051.sam
        samtools sort SRR020051.bam SRR020051.sorted
        samtools index SRR020051.sorted.bam
        samtools view -bhS -o SRR020053.bam SRR020053.sam
        samtools sort SRR020053.bam SRR020053.sorted
        samtools index SRR020053.sorted.bam

       Note:
        view : converts SAM to BAM
        sort : sorts the BAM file
        index : indexes the BAM file

3.2.3 MACS ChIP-seq

    1. Call peaks for SRR020051 with SRR020053 as the control.

        macs2 -t SRR020051.sorted.bam -c SRR020053.sorted.bam -f BAM -n SRP010008 -g hs

       Output:
        SRP010008_peaks.bed
        SRP010008_summits.bed
        SRP010008_MACS_wiggle/treat/ wig.gz

3.2.4 GenomeJack

    1. Load SRP010008_peaks.xls into GenomeJack. The PHF8 ChIP-seq result can be viewed around PEX10.
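The bowtie2 and samtools steps of section 3.2.2 are identical for both samples, so they can be generated from the sample name. The sketch below only assembles and prints the command lines and does not run them. It reuses the options shown in the tutorial; the function name bam_preparation_commands is hypothetical, path/to/hg19 is the placeholder index path from the text, and the view option is written as -bhS (uppercase S for SAM input) on the assumption that this is the intended form. The samtools sort call uses the output-prefix form shown in the tutorial; newer SAMtools releases expect -o instead.

```python
import shlex


def bam_preparation_commands(sample: str, index: str = "path/to/hg19") -> list:
    """Return the mapping, conversion, sort and index commands for one sample."""
    return [
        ["bowtie2", "-x", index, "-U", f"{sample}.fastq", "-S", f"{sample}.sam"],
        # -b: BAM output, -h: keep the header, -S: input is SAM (assumed option set)
        ["samtools", "view", "-bhS", "-o", f"{sample}.bam", f"{sample}.sam"],
        # Output-prefix form as in the tutorial; writes <sample>.sorted.bam
        ["samtools", "sort", f"{sample}.bam", f"{sample}.sorted"],
        ["samtools", "index", f"{sample}.sorted.bam"],
    ]


for sample in ("SRR020051", "SRR020053"):
    for command in bam_preparation_commands(sample):
        # Quote each argument so the printed line can be pasted into a shell.
        print(" ".join(shlex.quote(argument) for argument in command))
```

Keeping each command as an argument list means the same data could later be passed to subprocess.run without re-parsing a shell string.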
CHAPTER 4 GenomeJack

This chapter covers GenomeJack and the Web Start version of GenomeJack.

4.1 Web Start

GenomeJack to Web Start:

    1. GenomeJACK: gj, lib, settings, temp
    2. gj: Web Start (GenomeJACK3.0)
    3. In genomejack.ini under gj, change

        "sessionfilepath": " /gj/genomejack.session"

       to

        "sessionfilepath": "Web Start /gj/genomejack.session"

    4. Web Start GenomeJack

4.2 Web Start

Web Start to GenomeJack:

    1. Web Start (GenomeJACK3.0): gj, settings, temp
    2. Web Start gj: GenomeJACK
    3. In genomejack.ini under gj, change

        "sessionfilepath": "Web Start /gj/genomejack.session"

       to

        "sessionfilepath": " /gj/genomejack.session"

    4. GenomeJack