From DRAP Wiki.


Data processing upstream of DRAP

Read Normalization

In silico read normalization uses the Perl script distributed with the Trinity package.

Command line example for paired reads:

/usr/local/bioinfo/src/trinityrnaseq/current/util/ --left F_Dr_1_ATCACG_L008_R1.fastq.gz --right F_Dr_1_ATCACG_L008_R2.fastq.gz --seqType fq --JM 128G --max_cov 30 --pairs_together --PARALLEL_STATS --JELLY_CPU 8 --output F_Dr_1_norm

Give --output a distinct value for each sample when several processes are launched from the same directory; otherwise their output files may collide.
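A minimal sketch of deriving one --output name per sample from the read file names. It assumes files are named <sample>_<barcode>_L<lane>_R1.fastq.gz, as in the example above; the function name output_dir_for is hypothetical. The loop is a dry run that only prints the arguments, so the normalization script itself still has to be put in front of them.

```shell
#!/usr/bin/env bash
# Derive a per-sample --output name from the R1 file name.
# Assumed naming scheme: <sample>_<barcode>_L<lane>_R1.fastq.gz
output_dir_for() {
    local r1
    r1=$(basename "$1")
    # Strip the barcode, lane and read suffix, then append _norm.
    echo "${r1%_[ACGT]*_L[0-9]*_R1.fastq.gz}_norm"
}

# Dry run: print the --left/--right/--output arguments for each read pair.
for r1 in *_R1.fastq.gz; do
    [ -e "$r1" ] || continue    # no matching file: skip the literal pattern
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz
    echo "--left $r1 --right $r2 --output $(output_dir_for "$r1")"
done
```

For F_Dr_1_ATCACG_L008_R1.fastq.gz this yields F_Dr_1_norm, the value used in the command line above.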

Job submission example:

qsub -N normalize -pe parallel_smp 8 -R y -l mem=8G,h_vmem=16G

Data processing downstream of DRAP


Corset hierarchically clusters the transcripts based on the proportion of shared reads. It needs BAM files that report all alignment locations for each read (bowtie2 --all or STAR).

See the corset project page.

Command line example for bowtie2:

bash -c ' -1 F_Dr_1_ATCACG_L008_R1.fastq.gz -2 F_Dr_1_ATCACG_L008_R2.fastq.gz | fastq_illumina_filter -N | | bowtie2 -x Zebrafish.fa --12 - --end-to-end --very-fast -a -p 4 2> F_Dr_1_to_Zebrafish.fa.bam.log | samtools view -bS - | samtools sort -m 2G - F_Dr_1_to_Zebrafish.fa'

Job submission example:

qarray -N bowtie2 -pe parallel_fill 4 -o err_log -e err_log -l mem=4G,h_vmem=16G

Job submission example for corset (csh syntax):

set bam = `\ls -1 *.bam`
set name = `echo $bam | sed -e 's/_to\S*//g' | tr ' ' ,`

qsub -N corset -q hypermemq -l mem=32G,h_vmem=512G -b y /save/sigenae/src/corset-0.94/corset -n $name $bam
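The sample names passed to corset -n are obtained by cutting each BAM file name at "_to". A bash sketch of the same derivation, assuming every BAM is named <sample>_to_<reference>.bam; the function name sample_names and the second file name are made up for illustration:

```shell
#!/usr/bin/env bash
# Build the comma-separated sample names for corset -n from BAM file names.
# Assumed naming scheme: <sample>_to_<reference>.bam
sample_names() {
    local names=() bam
    for bam in "$@"; do
        names+=("${bam%%_to_*}")    # keep everything before the first "_to_"
    done
    local IFS=,
    echo "${names[*]}"              # join the names with commas
}

# M_Dr_1_to_Zebrafish.fa.bam is a hypothetical second sample.
sample_names F_Dr_1_to_Zebrafish.fa.bam M_Dr_1_to_Zebrafish.fa.bam
```

The names come out in the same order as the BAM files, which is the order corset receives them on its command line.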

# contigs extraction after clustering -f Zebrafish.fa -stat | cut -f1,4 | sort -k1,1 > Zebrafish.longest_orf_length.tsv
sort -k1,1 clusters.txt > clusters.sort.txt
join -1 1 -2 1 clusters.sort.txt Zebrafish.longest_orf_length.tsv | tr " " "\t" | 1 2 | sort -k1,1 | perl -e 'map{chomp;@t=split("\t",$_);if($t[0]ne$c){print join("\t",$c,$r,$lo)."\n" if defined $c;($c,$r,$lo)=@t}else{($c,$r,$lo)=@t if($t[2]>$lo)}}<STDIN>;print"$c\t$r\t$lo\n" if defined $c' | cut -f2 > Zebrafish_corset.lst
cat Zebrafish.fa | Zebrafish_corset.lst > Zebrafish_corset.fa
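The selection step above keeps one contig per Corset cluster, the one with the longest ORF. A minimal awk sketch of that step alone, assuming a tab-separated input with the columns cluster, contig, ORF length; the function name best_per_cluster and the sample rows are made up for illustration:

```shell
#!/usr/bin/env bash
# Keep, for each cluster, the contig with the longest ORF.
# Input (stdin): tab-separated cluster, contig, ORF length.
# Output: one contig name per cluster, sorted; ties keep the first contig seen.
best_per_cluster() {
    awk -F'\t' '
        !($1 in len) || $3 + 0 > len[$1] { len[$1] = $3 + 0; best[$1] = $2 }
        END { for (c in best) print best[c] }
    ' | sort
}

# Made-up example rows: two contigs in one cluster, one in another.
printf 'Cluster-0.0\tcontigA\t300\nCluster-0.0\tcontigB\t450\nCluster-1.0\tcontigC\t120\n' |
    best_per_cluster
```

Unlike the Perl one-liner, this version does not need its input sorted by cluster, because it keeps the running best for every cluster in memory.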
