MulCyber

Home

From DRAP Wiki.

Data processing upstream DRAP

Read Normalization

In silico read normalization is done with the Perl script normalize_by_kmer_coverage.pl, which ships with the Trinity package.

Command line example for paired reads:

/usr/local/bioinfo/src/trinityrnaseq/current/util/normalize_by_kmer_coverage.pl --left F_Dr_1_ATCACG_L008_R1.fastq.gz --right F_Dr_1_ATCACG_L008_R2.fastq.gz --seqType fq --JM 128G --max_cov 30 --pairs_together --PARALLEL_STATS --JELLY_CPU 8 --output F_Dr_1_norm

Give each run its own --output directory when several processes are launched from the same directory, so that their output files do not collide.
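A minimal bash sketch of this point: a dry run that prints one normalization command per R1 file, each with its own --output directory. The function name normalize_cmd, the `_R1.fastq.gz` / `_R2.fastq.gz` naming pattern and the `<sample>_norm` directory naming are assumptions made for illustration; the options are copied from the example above.

```shell
#!/usr/bin/env bash
# Dry run: print one normalization command per R1 file, each with a distinct --output directory.
# NORMALIZE is the script path used in the example above.
NORMALIZE=/usr/local/bioinfo/src/trinityrnaseq/current/util/normalize_by_kmer_coverage.pl

normalize_cmd() {
    local left=$1
    # Assumption: the mate file differs only by _R1 / _R2 in its name.
    local right=${left/_R1.fastq.gz/_R2.fastq.gz}
    # Assumption: the output directory is named after the R1 file prefix.
    local sample=${left%_R1.fastq.gz}
    echo "$NORMALIZE --left $left --right $right --seqType fq --JM 128G --max_cov 30 --pairs_together --PARALLEL_STATS --JELLY_CPU 8 --output ${sample}_norm"
}

# Print the command for every R1 file given on the command line.
for left in "$@"; do
    normalize_cmd "$left"
done
```

Each printed line can then be written to its own script and submitted as in the job submission example.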

Job submission example:

qsub -N normalize -pe parallel_smp 8 -R y -l mem=8G,h_vmem=16G normalize.sh

Data processing downstream DRAP

Corset

Corset hierarchically clusters the transcripts based on the proportion of reads they share. It needs BAM files that report all alignment locations for each read (bowtie2 --all, or STAR).
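One way to check that a BAM file really reports several locations per read is to count the mapped records per read and mate. The bash sketch below is only an illustration: the function name max_alignments_per_read is hypothetical, and it reads SAM records on standard input, relying only on the standard SAM FLAG bits 0x4 (unmapped) and 0x40 (first mate).

```shell
#!/usr/bin/env bash
# Print the largest number of mapped records found for a single read/mate.
# A value above 1 indicates that multiple locations are reported.
max_alignments_per_read() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }       # skip unmapped records (FLAG bit 0x4)
        {
            mate = int($2 / 64) % 2         # 1 for the first mate (FLAG bit 0x40), else 0
            n[$1 "/" mate]++                # count records per read name and mate
        }
        END {
            max = 0
            for (k in n) if (n[k] > max) max = n[k]
            print max
        }
    '
}
```

Usage, assuming samtools is in the PATH: `samtools view F_Dr_1_to_Zebrafish.fa.bam | max_alignments_per_read`.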

See the corset project page.

Command line example for bowtie2:

bash -c 'fastq_interleave_paired_files.py -1 F_Dr_1_ATCACG_L008_R1.fastq.gz -2 F_Dr_1_ATCACG_L008_R2.fastq.gz | fastq_illumina_filter -N | fastq_interleaved_to_tabbed.pl | bowtie2 -x Zebrafish.fa --12 - --end-to-end --very-fast -a -p 4 2> F_Dr_1_to_Zebrafish.fa.bam.log | samtools view -bS - | samtools sort -m 2G - F_Dr_1_to_Zebrafish.fa'

Job submission example:

qarray -N bowtie2 -pe parallel_fill 4 -o err_log -e err_log -l mem=4G,h_vmem=16G bowtie.sh

Job submission example for corset:

set bam = `\ls -1 *.bam`
set name = `echo $bam | sed -e 's/_to\S*//g' | tr ' ' ,`

qsub -N corset -q hypermemq -l mem=32G,h_vmem=512G -b y /save/sigenae/src/corset-0.94/corset -n $name $bam
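The two `set` lines above use csh syntax: they list the BAM files, strip everything from `_to` onwards in each file name, and join the remaining sample names with commas for the -n option. A bash sketch of the same derivation follows; the function name sample_names is hypothetical. As with the sed expression, a sample name that itself contains `_to` would be truncated.

```shell
#!/usr/bin/env bash
# Build the comma-separated sample name list from BAM file names,
# e.g. F_Dr_1_to_Zebrafish.fa.bam becomes F_Dr_1.
sample_names() {
    local names=()
    local f
    for f in "$@"; do
        # Remove the longest suffix starting with "_to" (same effect as sed 's/_to\S*//').
        names+=("${f%%_to*}")
    done
    local IFS=,
    echo "${names[*]}"
}
```

Usage: `name=$(sample_names *.bam)`, then pass `-n "$name"` followed by the BAM files in the same order.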

# contig extraction after clustering: keep the contig with the longest ORF in each cluster
get_longest_orf.pl -f Zebrafish.fa -stat | cut -f1,4 | sort -k1,1 > Zebrafish.longest_orf_length.tsv
sort -k1,1 clusters.txt > clusters.sort.txt
join -1 1 -2 1 clusters.sort.txt Zebrafish.longest_orf_length.tsv | tr " " "\t" | interchange_cols.pl 1 2 | sort -k1,1 | perl -e 'map{chomp;@t=split("\t",$_);if(defined($c)&&$t[0]eq$c){($c,$r,$lo)=@t if($t[2]>$lo)}else{print join("\t",$c,$r,$lo)."\n" if defined($c);($c,$r,$lo)=@t}}<STDIN>;print"$c\t$r\t$lo\n" if defined($c)' | cut -f2 > Zebrafish_corset.lst
cat Zebrafish.fa | fasta_extract.pl Zebrafish_corset.lst > Zebrafish_corset.fa
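The selection step of the pipeline above (one contig per cluster, the one with the longest ORF) can also be sketched with awk. This is an alternative written for illustration, not the script used here: the function name longest_per_cluster is hypothetical, and the input is assumed to be tab-separated with cluster, contig and ORF length in columns 1 to 3, as produced after interchange_cols.pl. Unlike the Perl one-liner, it does not need sorted input.

```shell
#!/usr/bin/env bash
# Read "cluster<TAB>contig<TAB>orf_length" lines on stdin and print,
# for each cluster, the contig with the longest ORF (first one wins on ties).
longest_per_cluster() {
    awk -F'\t' '
        !($1 in best) || $3 + 0 > best[$1] {
            best[$1] = $3 + 0               # longest ORF length seen so far for this cluster
            contig[$1] = $2                 # contig carrying that ORF
        }
        END { for (c in best) print contig[c] }
    ' | sort
}
```

The output is sorted by contig name only to make it deterministic; it can be redirected to a list file such as Zebrafish_corset.lst.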
