Home
Data processing upstream of DRAP
Read Normalization
In silico normalization uses the Perl script normalize_by_kmer_coverage.pl, which ships with the Trinity package.
Command line example for paired reads:
/usr/local/bioinfo/src/trinityrnaseq/current/util/normalize_by_kmer_coverage.pl --left F_Dr_1_ATCACG_L008_R1.fastq.gz --right F_Dr_1_ATCACG_L008_R2.fastq.gz --seqType fq --JM 128G --max_cov 30 --pairs_together --PARALLEL_STATS --JELLY_CPU 8 --output F_Dr_1_norm
Set the --output argument explicitly when several processes are launched from the same directory, so that each process writes to its own location.
Job submission example:
qsub -N normalize -pe parallel_smp 8 -R y -l mem=8G,h_vmem=16G normalize.sh
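The content of normalize.sh is not shown on this page. As a minimal sketch, the commands could be generated per sample as below, so that every process gets its own --output value. The function name normalize_cmd and the file naming pattern (sample name, index, lane, then _R1/_R2) are assumptions inferred from the example file names. Only the options come from the command line above.

```shell
#!/bin/sh
# Sketch only: print one normalization command per R1 file found in the
# current directory. Assumptions (not from the wiki): the R2 file has the same
# name with _R1 replaced by _R2, and the sample name is the file name without
# its _<index>_<lane>_R1.fastq.gz suffix.
NORMALIZE=/usr/local/bioinfo/src/trinityrnaseq/current/util/normalize_by_kmer_coverage.pl

normalize_cmd() (
    r1=$1
    r2=$(printf '%s\n' "$r1" | sed 's/_R1\.fastq\.gz$/_R2.fastq.gz/')
    # F_Dr_1_ATCACG_L008_R1.fastq.gz -> F_Dr_1
    sample=$(basename "$r1" | sed 's/_[ACGT][ACGT]*_L[0-9][0-9]*_R1\.fastq\.gz$//')
    echo "$NORMALIZE --left $r1 --right $r2 --seqType fq --JM 128G" \
         "--max_cov 30 --pairs_together --PARALLEL_STATS --JELLY_CPU 8" \
         "--output ${sample}_norm"
)

# Print the commands only; nothing is executed here.
for r1 in *_R1.fastq.gz; do
    [ -e "$r1" ] || continue    # the glob stays literal when nothing matches
    normalize_cmd "$r1"
done
```

The output can be reviewed first and then written to a script or piped to a shell; each sample ends up with a distinct --output name such as F_Dr_1_norm.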
Data processing downstream of DRAP
Corset
Hierarchical clustering of the transcripts based on the proportion of shared reads. Corset needs BAM files that report all alignment locations for each read (bowtie2 --all or STAR).
See the corset project page.
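To make "proportion of shared reads" concrete, here is a minimal sketch. It assumes a hypothetical two-column, tab-separated table (read ID, transcript ID, one line per reported alignment) extracted from the BAM files, and it assumes the proportion is taken relative to the transcript with fewer reads. This is our reading of how Corset describes its distance, not Corset's actual code, and the function name shared_read_proportion is illustrative.

```shell
#!/bin/sh
# Sketch only: proportion of reads shared by two transcripts.
# stdin: read_id <TAB> transcript_id, one line per alignment.
# Output: shared reads / min(reads on A, reads on B), or NA if either has none.
shared_read_proportion() (
    awk -F'\t' -v a="$1" -v b="$2" '
        $2 == a { in_a[$1] = 1 }    # reads aligned to transcript A
        $2 == b { in_b[$1] = 1 }    # reads aligned to transcript B
        END {
            for (r in in_a) { na++; if (r in in_b) nab++ }
            for (r in in_b) nb++
            m = (na < nb) ? na : nb
            if (m == 0) { print "NA"; exit }
            printf "%.3f\n", nab / m
        }'
)
```

A read that aligns to both transcripts is only visible here because the aligner reported all locations, which is why the BAM files must be produced with bowtie2 --all or an equivalent setting.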
Command line example for bowtie2:
bash -c 'fastq_interleave_paired_files.py -1 F_Dr_1_ATCACG_L008_R1.fastq.gz -2 F_Dr_1_ATCACG_L008_R2.fastq.gz | fastq_illumina_filter -N | fastq_interleaved_to_tabbed.pl | bowtie2 -x Zebrafish.fa --12 - --end-to-end --very-fast -a -p 4 2> F_Dr_1_to_Zebrafish.fa.bam.log | samtools view -bS - | samtools sort -m 2G - F_Dr_1_to_Zebrafish.fa'
Job submission example:
qarray -N bowtie2 -pe parallel_fill 4 -o err_log -e err_log -l mem=4G,h_vmem=16G bowtie.sh
Job submission example for corset:
set bam = `\ls -1 *.bam`
set name = `echo $bam | sed -e 's/_to\S*//g' | tr ' ' ,`
qsub -N corset -q hypermemq -l mem=32G,h_vmem=512G -b y /save/sigenae/src/corset-0.94/corset -n $name $bam
# contigs extraction after clustering
get_longest_orf.pl -f Zebrafish.fa -stat | cut -f1,4 | sort -k1,1 > Zebrafish.longest_orf_length.tsv
cat clusters.txt | sort -k1,1 > clusters.sort.txt
join -1 1 -2 1 clusters.sort.txt Zebrafish.longest_orf_length.tsv | tr " " "\t" | interchange_cols.pl 1 2 | sort -k1,1 | perl -e 'map{chomp;@t=split("\t",$_);if($t[0]ne$c){print join("\t",$c,$r,$lo)."\n";($c,$r,$lo)=@t}else{($c,$r,$lo)=@t if($t[2]>$lo)}}<STDIN>;print"$c\t$r\t$lo,$lc\n"' | cut -f2 > Zebrafish_corset.lst
cat Zebrafish.fa | fasta_extract.pl Zebrafish_corset.lst > Zebrafish_corset.fa
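The contig extraction step keeps, for each Corset cluster, the contig with the longest ORF. The same selection can be sketched with awk alone, which may be easier to follow than the join/perl pipeline. This is an illustrative equivalent, not the script used by the authors: the function name longest_orf_per_cluster is hypothetical, and it assumes clusters.txt holds contig and cluster columns and the length table holds contig and ORF length columns, both tab-separated.

```shell
#!/bin/sh
# Sketch only: print, for each cluster, the contig with the longest ORF.
# $1: clusters file, contig <TAB> cluster
# $2: lengths file,  contig <TAB> longest ORF length
# On ties, the first contig listed in the clusters file is kept.
longest_orf_per_cluster() (
    awk -F'\t' '
        NR == FNR { len[$1] = $2 + 0; next }    # first file: ORF lengths
        ($1 in len) && (!($2 in best) || len[$1] > bestlen[$2]) {
            best[$2] = $1                       # best contig for this cluster
            bestlen[$2] = len[$1]
        }
        END { for (c in best) print best[c] }
    ' "$2" "$1" | sort
)
```

Called as longest_orf_per_cluster clusters.txt Zebrafish.longest_orf_length.tsv, it would produce a contig list comparable to Zebrafish_corset.lst, sorted by contig name. Contigs without an entry in the length table are ignored.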