####### This file describes the workflow of my analysis #######

step 1: build genome index of hg19
		software: GMAP
		input file: gencode.v19.chr_patch_hapl_scaff.fasta
		output file: hg19_genecode (a folder containing the genome index)
		codes: available in Wu_4_scripts, file ToFU.sh

step 2: align FASTQ sequences to reference genome
		software: GMAP
		input file: hq_isoforms.fastq
		output file: hq_isoforms.fastq.sam
		codes: available in Wu_4_scripts, file ToFU.sh		

step 3: sort SAM file
		software: Unix sort command (run in the terminal)
		input file: hq_isoforms.fastq.sam
		output file: hq_isoforms.fastq.sorted.sam
		codes: available in Wu_4_scripts, file ToFU.sh
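
The sorting in this step orders alignments by reference name and then by position, which is what the collapse step expects. The exact command used is in ToFU.sh; in the terminal it is typically something like `sort -k 3,3 -k 4,4n`. As an illustration only, the same ordering can be sketched in Python. The function names below are hypothetical and are not part of ToFU.sh; unlike the plain terminal sort, this sketch keeps SAM header lines at the top.

```python
def sort_sam_lines(lines):
    """Return SAM lines with headers first, then alignments sorted by (RNAME, POS).

    Assumption: alignment lines are tab-separated with the reference name in
    column 3 and the 1-based position in column 4, as in the SAM format.
    """
    headers = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("@")]

    def sort_key(line):
        fields = line.split("\t")
        # Reference name sorts as text, position sorts as a number.
        return (fields[2], int(fields[3]))

    return headers + sorted(records, key=sort_key)


def sort_sam_file(in_path, out_path):
    """Read a SAM file, sort it with sort_sam_lines, and write the result."""
    with open(in_path) as fh:
        lines = fh.read().splitlines()
    with open(out_path, "w") as out:
        for line in sort_sam_lines(lines):
            out.write(line + "\n")
```

For example, `sort_sam_file("hq_isoforms.fastq.sam", "hq_isoforms.fastq.sorted.sam")` would produce the sorted file named in this step.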

step 4: collapse redundant transcripts into unique ones
		software: ToFU
		input file: hq_isoforms.fastq, hq_isoforms.fastq.sorted.sam
		output file: iso.collapsed.gff, iso.collapsed.rep.fq, iso.collapsed.group.txt
		codes: available in Wu_4_scripts, file ToFU.sh
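
To check how many input reads were merged into each collapsed transcript, the group file can be read back in. The sketch below assumes that each line of iso.collapsed.group.txt holds a collapsed transcript ID, a tab, and a comma-separated list of member read IDs; this layout is an assumption and should be checked against the actual file. The function name and the IDs in the test data are hypothetical.

```python
def parse_group_lines(lines):
    """Map each collapsed transcript ID to the list of reads merged into it.

    Assumption: each non-empty line is '<collapsed_id>\\t<read1>,<read2>,...'.
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        collapsed_id, members = line.split("\t")
        groups[collapsed_id] = members.split(",")
    return groups


def group_sizes(groups):
    """Return the number of member reads per collapsed transcript."""
    return {cid: len(members) for cid, members in groups.items()}
```

Usage would be along the lines of `group_sizes(parse_group_lines(open("iso.collapsed.group.txt")))`.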

step 5: align collapsed sequences to reference genome
		software: GMAP
		input file: iso.collapsed.rep.fq
		output file: iso.collapsed.rep.fq.sam
		codes: available in Wu_4_scripts, file ToFU.sh

step 6: sort SAM file
		software: Unix sort command (run in the terminal)
		input file: iso.collapsed.rep.fq.sam
		output file: iso.collapsed.rep.fq.sorted.sam
		codes: available in Wu_4_scripts, file ToFU.sh

step 7: annotate the SAM file
		software: ToFU
		input file: iso.collapsed.rep.fq.sorted.sam
		output file: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
		codes: available in Wu_4_scripts, file ToFU.sh

step 8: parse the annotation file
		software: ToFU
		input: iso.collapsed.rep.fq iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
		output: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt
		codes: available in Wu_4_scripts, file ToFU.sh

step 9: statistical summaries
		software: R
		input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt, hq_isoforms.fastq
		output: Gene_1_isoform.png, Gene_over1_isoform.png, Genenumbers.png, isoformnumbers.png (and all read length distribution charts)
		codes: barchart.R, stats_and_table.R
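
The summaries in this step come from barchart.R and stats_and_table.R. Purely as an illustration of the counting behind the charts (genes with one isoform, genes with more than one isoform, total genes, total isoforms), here is a minimal Python sketch. It assumes the parsed annotation has already been reduced to unique (isoform ID, gene name) pairs; that reduction, the function names, and the example IDs are assumptions, not the actual layout of the parsed file.

```python
from collections import Counter


def isoforms_per_gene(pairs):
    """Count isoforms per gene from unique (isoform_id, gene_name) pairs."""
    return Counter(gene for _, gene in pairs)


def summarize(counts):
    """Return the totals that the bar charts in this step are based on."""
    return {
        "genes": len(counts),
        "isoforms": sum(counts.values()),
        "genes_1_isoform": sum(1 for n in counts.values() if n == 1),
        "genes_over_1_isoform": sum(1 for n in counts.values() if n > 1),
    }
```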

step 10: shared genes count
		software: R
		input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt.parsed.txt
		output: Table 2 to Table 5 in my thesis.
		codes: common_genes.R
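
The shared gene counts are produced by common_genes.R. The underlying idea is a set intersection between the gene lists of two samples, which can be sketched as follows. The sample names and the function name are hypothetical, and the sketch assumes a set of gene names has already been extracted from each sample's parsed annotation file.

```python
from itertools import combinations


def shared_gene_counts(genes_by_sample):
    """Count genes shared by every pair of samples.

    genes_by_sample maps a sample name to the set of gene names found in it.
    Returns a dict keyed by (sample_a, sample_b) with sample names sorted.
    """
    result = {}
    for a, b in combinations(sorted(genes_by_sample), 2):
        result[(a, b)] = len(genes_by_sample[a] & genes_by_sample[b])
    return result
```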

step 11: gene differential expression analysis
		software: R (edgeR)
		input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
		output: Table 14 to Table 21 in my thesis
		codes: differential expression.R, differential_express.R

step 12: gene ontology analysis
		software: R (clusterProfiler)
		input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
		output: Figure 6 to Figure 21 in my thesis
		codes: GO_analysis.R, GO_test.R

step 13: novel genes
		software: R
		input: iso.collapsed.rep.fq.sorted.sam.matchAnnot.txt
		output: Figure 22 in my thesis
		codes: alternative splicing.R

step 14: IGV visualization
		software: IGV, Python
		input: genepos.txt
		output: the visualization folder in Wu_3_figures
		codes: composeIGV.py
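
The actual script for this step is composeIGV.py. As a rough sketch of how gene positions can be turned into an IGV batch script, the function below writes one `goto` and one `snapshot` command per gene. It assumes each region is given as (gene name, chromosome, start, end); the real layout of genepos.txt may differ. The function name, the optional track list, and the example values are hypothetical.

```python
def igv_batch_lines(regions, snapshot_dir, genome="hg19", tracks=()):
    """Build the lines of an IGV batch script that snapshots each region.

    regions: iterable of (gene, chrom, start, end) tuples (assumed layout).
    tracks: optional paths of alignment or annotation files to load first.
    """
    lines = ["new", f"genome {genome}"]
    for track in tracks:
        lines.append(f"load {track}")
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for gene, chrom, start, end in regions:
        lines.append(f"goto {chrom}:{start}-{end}")
        lines.append(f"snapshot {gene}.png")
    lines.append("exit")
    return lines
```

Writing the result with `"\n".join(...)` to a text file gives a script that can be run through IGV's batch mode.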


additionally: TAPIS debugging
		software: Python
		input: hq_isoforms.fastq.sam, gencode.v19.chr_patch_hapl_scaff.fasta
		output: cmds.txt, alignPacBio.py, cleanAlignments.py, run_tapis.py
		codes: tapis.sh