####### this file describes the workflow of my analysis #######

step 1: build genome index of hg19
software: GMAP
input file: gencode.v19.chr_patch_hapl_scaff.fasta
output file: hg19_genecode (a folder that contains the genome index)
codes: available in Wu_4_scripts, file ToFU.sh

step 2: align fasta sequence to reference genome
software: GMAP
input file: hq_isoforms.fastq
output file: hq_isoforms.fastq.sam
codes: available in Wu_4_scripts, file ToFU.sh

step 3: sort sam file
software: terminal sort function
input file: hq_isoforms.fastq.sam
output file: hq_isoforms.fastq.sorted.sam
codes: available in Wu_4_scripts, file ToFU.sh

step 4: collapse redundant transcripts into unique ones
software: ToFU
input file: hq_isoforms.fastq, hq_isoforms.fastq.sorted.sam
output file: iso.collapsed.gff, iso.collapsed.rep.fq, iso.collapsed.group.txt
codes: available in Wu_4_scripts, file ToFU.sh

step 5: align collapsed sequence to reference genome
software: GMAP
input file: iso.collapsed.rep.fq
output file: iso.collapsed.rep.fq.sam
codes: available in Wu_4_scripts, file ToFU.sh

step 6: sort sam file
software: terminal sort function
input file: iso.collapsed.rep.fq.sam
output file: iso.collapsed.rep.fq.sorted.sam
codes: available in Wu_4_scripts, file ToFU.sh

step 7: annotate the sam file
software: ToFU
input file: iso.collapsed.rep.fq.sorted.sam
output file: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
codes: available in Wu_4_scripts, file ToFU.sh

step 8: parse the annotation file
software: ToFU
input: iso.collapsed.rep.fq, iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
output: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt
codes: available in Wu_4_scripts, file ToFU.sh

step 9: statistical summaries
software: R
input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt, hq_isoforms.fastq
output: Gene_1_isoform.png, Gene_over1_isoform.png, Genenumbers.png, isoformnumbers.png (and all read length distribution charts)
codes: barchart.R, stats_and_table.R

step 10: shared genes count
software: R
input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt
output: Table 2 to Table 5 in my thesis
codes: common_genes.R

step 11: gene differential expression analysis
software: R (EdgeR)
input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
output: Table 14 to Table 21 in my thesis
codes: differential expression.R, differential_express.R

step 12: gene ontology analysis
software: R (clusterProfiler)
input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
output: Figure 6 to Figure 21 in my thesis
codes: GO_analysis.R, GO_test.R

step 13: novel genes
software: R
input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
output: Figure 22 in my thesis
codes: alternative splicing.R

step 14: IGV visualization
software: IGV, Python
input: genepos.txt
output: folder visualization in Wu_3_figures
codes: composeIGV.py

additionally: TAPIS debug
software: Python
input: hq_isoforms.fastq.sam, gencode.v19.chr_patch_hapl_scaff.fasta
output: cmds.txt, alignPacBio.py, cleanAlignments.py, run_tapis.py
codes: tapis.sh
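
note on steps 3 and 6 (sorting the sam file): the real command is in ToFU.sh and is not reproduced here. As an illustration only, the sort can be sketched in Python. This sketch assumes the sort keys are the reference name (column 3) and the alignment start position (column 4), and that header lines start with "@" and must stay on top. The function names sort_sam_lines and sort_sam_file are hypothetical and are not part of ToFU or of my scripts.

```python
def sort_sam_lines(lines):
    """Return SAM lines with header lines first, followed by alignment
    records sorted by reference name (column 3) and start (column 4)."""
    headers = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            headers.append(line)  # header lines keep their original order
        else:
            records.append(line)

    def sort_key(record):
        fields = record.split("\t")
        # Assumption: column 3 is the reference name, column 4 the 1-based start.
        return (fields[2], int(fields[3]))

    return headers + sorted(records, key=sort_key)


def sort_sam_file(in_path, out_path):
    """Read a SAM file, sort it as above, and write the result."""
    with open(in_path) as src:
        sorted_lines = sort_sam_lines(src)
    with open(out_path, "w") as dst:
        for line in sorted_lines:
            dst.write(line + "\n")
```

For example, sort_sam_file("hq_isoforms.fastq.sam", "hq_isoforms.fastq.sorted.sam") would correspond to step 3. Reference names are compared as plain strings here, so "chr10" sorts before "chr2"; the ordering only needs to keep records of the same chromosome together and ordered by position.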
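
note on step 14 (IGV visualization): composeIGV.py itself is not reproduced here. As a rough sketch of the idea, a list of gene positions can be turned into an IGV batch script that takes one snapshot per gene. This sketch assumes genepos.txt is tab-separated with the columns gene, chromosome, start, end, which may not match the real file. The function names parse_gene_positions and build_igv_batch, and the track file name in the usage note, are hypothetical. The batch commands used (new, genome, load, snapshotDirectory, goto, snapshot, exit) are standard IGV batch commands.

```python
def parse_gene_positions(lines):
    """Parse lines of the assumed form: gene<TAB>chrom<TAB>start<TAB>end."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        gene, chrom, start, end = line.split("\t")[:4]
        regions.append((gene, chrom, int(start), int(end)))
    return regions


def build_igv_batch(regions, track_path, snapshot_dir, genome="hg19"):
    """Return the lines of an IGV batch script, one snapshot per region."""
    commands = [
        "new",
        f"genome {genome}",
        f"load {track_path}",
        f"snapshotDirectory {snapshot_dir}",
    ]
    for gene, chrom, start, end in regions:
        commands.append(f"goto {chrom}:{start}-{end}")
        commands.append(f"snapshot {gene}.png")
    commands.append("exit")
    return commands
```

A possible use: read genepos.txt, call build_igv_batch(regions, "alignments.bam", "visualization"), write the returned lines to a text file, and run that file from IGV as a batch script.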