Pyrocleaner

The Pyrocleaner is intended to clean the reads contained in an SFF file in order to ease the assembly process. It can filter sequences on several criteria: length, complexity, number of undetermined bases (which has been shown to correlate with poor quality) and multiple-copy reads. It can also clean paired-end SFF files, generating on one side an SFF file with the validated paired-end reads and on the other the sequences that can be used as shotgun reads. To install the Pyrocleaner, please refer to the Installation guide.

Announcements

  • pyrocleaner_galaxy_tool for pyrocleaner version 1.3 - provides all files needed to run the pyrocleaner within the Galaxy workflow manager. Also available from the Galaxy Tool Shed.
  • pyrocleaner_ergatis_component for pyrocleaner version 1.3 - provides all files needed to run the pyrocleaner within the Ergatis workflow manager.
  • pyrocleaner v1.3 - Use Seqio instead of Bio.SeqIO as the input/output sequence library to speed up processing and remove the Biopython dependency.
  • pyrocleaner v1.2 - Fix a bug when using --clean-pairends.
  • pyrocleaner v1.1 - Add the --aggressive option to keep only one read per cluster, and the --clean-quality option to clean reads based on base quality.
  • pyrocleaner v1.0 - First packaged version of the tool.

Usage

Usage: pyrocleaner -i file -o output -f format --clean-pairends --clean-length-std --clean-ns --clean-duplicated-reads --clean-complexity-win --clean-quality

Options:

 --version             show program's version number and exit
 -h, --help            show this help message and exit
 Input files options:
   -i FILE, --in=FILE  The file to clean, can be [sff|fastq|fasta]
   -q FILE, --qual=FILE
                       The quality file to use if input file is fasta
   -f FORMAT, --format=FORMAT
                       The format of the input file [sff|fastq|fasta] default
                       is sff
 Output files options:
   -o OUTPUT, --out=OUTPUT
                       The output folder where to store results
   -g FILE, --log=FILE
                       The log file name (default:pyrocleaner.log)
   -z, --split-pairends
                       Write split paired-end sequences into a fasta file
 Cleaning options:
   -p, --clean-pairends
                       Clean pairends
   -l, --clean-length-std
                       Filter reads shorter than the mean minus x*standard
                       deviation and reads longer than the mean plus
                       x*standard deviation
   -w, --clean-length-win
                       Filter reads whose length is outside [x:y]
   -n, --clean-ns      Filter reads with too many Ns
   -d, --clean-duplicated-reads
                       Filter duplicated reads
   -c, --clean-complexity-win
                       Filter low complexity reads computed on a sliding
                       window
   -u, --clean-complexity-full
                       Filter low complexity reads computed on the whole
                       sequence
   -k, --clean-quality
                       Filter low quality reads
 Processing options:
   -a NB_CPUS, --acpus=NB_CPUS
                       Number of processors to use
   -r RECURSION, --recursion=RECURSION
                       Recursion limit when computing duplicated reads
 Cleaning parameters:
   -b BORDER_LIMIT, --border-limit=BORDER_LIMIT
                       Minimal length between the spacer and the read
                       extremity (used with --clean-pairends option,
                       default=70)
   -m, --aggressive    Filter all duplicated reads gathered in a cluster,
                       keeping only one (used with --clean-duplicated-reads,
                       default=False)
   -e MISSMATCH, --missmatch=MISSMATCH
                       Maximum number of mismatched nucleotides (used with
                       --clean-pairends option, default=10)
   -j STD, --std=STD   Number of standard deviations to use (used with
                       --clean-length-std option, default=2)
   -1 MIN, --min=MIN   Minimal length (used with --clean-length-win option,
                       default=200)
   -2 MAX, --max=MAX   Maximal length (used with --clean-length-win option,
                       default=600)
   -3 QUALITY_THRESHOLD, --quality-threshold=QUALITY_THRESHOLD
                       At least one base has to have a quality equal to or
                       higher than this value (used with --clean-quality,
                       default=35)
   -s NS_PERCENT, --ns_percent=NS_PERCENT
                       Percentage of N to use to filter reads (used with
                       --clean-ns option, default=4)
   -t DUPLICATION_LIMIT, --duplication_limit=DUPLICATION_LIMIT
                       Limit size difference (used with --clean-duplicated-
                       reads, default=70)
   -v WINDOW, --window=WINDOW
                       The window size (used with --clean-complexity-win,
                       default=100)
   -x STEP, --step=STEP
                       The window step (used with --clean-complexity-win,
                       default=5)
   -y COMPLEXITY, --complexity=COMPLEXITY
                       Minimal complexity/length ratio (used with --clean-
                       complexity-win and --clean-complexity-full,
                       default=40)
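The two length filters (--clean-length-std with -j, and --clean-length-win with -1/-2) can be sketched in a few lines of Python. This is not Pyrocleaner's own code: the function names and the dict mapping read identifiers to sequences are hypothetical, and reading SFF/FASTQ/FASTA input is left out. The standard deviation estimator used by the tool is not documented here, so the population standard deviation below is an assumption.

```python
import statistics
from typing import Dict

# Hypothetical in-memory representation: read identifier -> nucleotide sequence.
Reads = Dict[str, str]


def filter_length_std(reads: Reads, n_std: float = 2.0) -> Reads:
    """Keep reads whose length lies within mean +/- n_std standard deviations."""
    if not reads:
        return {}
    lengths = [len(seq) for seq in reads.values()]
    mean = statistics.mean(lengths)
    # Assumption: population standard deviation; the tool may use another estimator.
    std = statistics.pstdev(lengths)
    low, high = mean - n_std * std, mean + n_std * std
    return {rid: seq for rid, seq in reads.items() if low <= len(seq) <= high}


def filter_length_win(reads: Reads, min_len: int = 200, max_len: int = 600) -> Reads:
    """Keep reads whose length lies within [min_len, max_len] (defaults of -1/-2)."""
    return {rid: seq for rid, seq in reads.items() if min_len <= len(seq) <= max_len}


if __name__ == "__main__":
    reads = {f"r{i}": "A" * 100 for i in range(9)}
    reads["long"] = "A" * 1000
    # mean = 190, std = 270, so the upper bound is 730 and the 1000 bp read is dropped
    print(sorted(filter_length_std(reads)))
    print(sorted(filter_length_win({"a": "A" * 150, "b": "A" * 300, "c": "A" * 700})))
```

With a small sample a single outlier inflates the standard deviation a lot, which is why the example uses nine reads of equal length around one long read.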
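The --clean-ns and --clean-quality filters reduce to simple per-read predicates. The sketch below follows the option descriptions: a read is removed when its N content exceeds --ns_percent, and kept by the quality filter when at least one base reaches --quality-threshold. The function names are hypothetical, and whether a read sitting exactly at the N limit is kept is not stated, so the strict comparison is an assumption.

```python
from typing import Sequence


def too_many_ns(seq: str, ns_percent: float = 4.0) -> bool:
    """Return True if the percentage of N bases exceeds ns_percent.

    Assumption: a read exactly at the threshold is kept (strict '>').
    """
    if not seq:
        return True  # assumption: empty reads are treated as failing
    n_count = seq.upper().count("N")
    return 100.0 * n_count / len(seq) > ns_percent


def passes_quality(quals: Sequence[int], threshold: int = 35) -> bool:
    """Return True if at least one base quality is >= threshold."""
    return any(q >= threshold for q in quals)


if __name__ == "__main__":
    read = "ACGT" * 23 + "N" * 8          # 100 bp read with 8% N
    print(too_many_ns(read))              # True
    print(passes_quality([20, 28, 36]))   # True
```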
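The help text does not say how complexity is measured, so the sketch below only illustrates the sliding-window mechanics of --clean-complexity-win (window size -v, step -x, minimal ratio -y) and the whole-sequence variant --clean-complexity-full. As a stand-in score it uses the Shannon entropy of the base composition scaled to 0-100; this is an assumption, not necessarily Pyrocleaner's measure, and unlike a real complexity measure it does not penalise tandem repeats such as ACGTACGT. Flagging a read as soon as any window falls below the threshold is also an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_complexity(window: str) -> float:
    """Stand-in complexity score in [0, 100]: Shannon entropy of the base
    composition, scaled so that equal A/C/G/T frequencies give 100."""
    if not window:
        return 0.0
    counts = Counter(window.upper())
    total = len(window)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # 2 bits is the maximum entropy over 4 bases; cap in case N adds a 5th symbol
    return min(100.0, 100.0 * entropy / 2.0)


def is_low_complexity_win(seq: str, window: int = 100, step: int = 5,
                          min_ratio: float = 40.0) -> bool:
    """Flag a read if any sliding window scores below min_ratio (assumption)."""
    if len(seq) <= window:
        return entropy_complexity(seq) < min_ratio
    last = len(seq) - window
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # also score the window ending at the read's last base
    return any(entropy_complexity(seq[s:s + window]) < min_ratio for s in starts)


def is_low_complexity_full(seq: str, min_ratio: float = 40.0) -> bool:
    """Whole-sequence variant: score the read as a single window."""
    return entropy_complexity(seq) < min_ratio


if __name__ == "__main__":
    mixed = "ACGTTGCA" * 15 + "A" * 120 + "ACGTTGCA" * 15
    print(is_low_complexity_win(mixed))   # True: the poly-A stretch fills a window
    print(is_low_complexity_full(mixed))
```

The sliding-window variant catches a low-complexity stretch inside an otherwise diverse read, which a single whole-sequence score can average away.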
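For --clean-pairends, the parameters suggest locating the spacer (linker) with up to --missmatch mismatches and requiring at least --border-limit bases on each side of it. A minimal sketch under those assumptions follows: the spacer is passed in as a parameter (the value in the example is a toy sequence, not a real 454 linker), matching uses Hamming distance only and therefore ignores insertions and deletions, and reads whose spacer leaves a flank that is too short are simply rejected, which may differ from the tool's behaviour. Reads without a spacer are returned as shotgun reads, in line with the description above. All function names are hypothetical.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_spacer(seq: str, spacer: str, max_mismatch: int) -> Optional[int]:
    """Return the start of the best spacer hit (fewest mismatches, leftmost on
    ties) with at most max_mismatch substitutions, or None if there is none."""
    best_pos, best_mm = None, max_mismatch + 1
    for start in range(len(seq) - len(spacer) + 1):
        mm = hamming(seq[start:start + len(spacer)], spacer)
        if mm < best_mm:
            best_pos, best_mm = start, mm
    return best_pos


def classify_read(seq: str, spacer: str, max_mismatch: int = 10,
                  border_limit: int = 70) -> Tuple[str, Tuple[str, ...]]:
    """Classify a read as 'pair' (two mates), 'shotgun' or 'rejected'."""
    pos = find_spacer(seq, spacer, max_mismatch)
    if pos is None:
        return "shotgun", (seq,)
    left, right = seq[:pos], seq[pos + len(spacer):]
    if len(left) >= border_limit and len(right) >= border_limit:
        return "pair", (left, right)
    return "rejected", ()  # assumption: too-short flanks discard the read


if __name__ == "__main__":
    toy_spacer = "GTACGTACGTAC"  # hypothetical spacer for illustration only
    kind, parts = classify_read("A" * 80 + toy_spacer + "C" * 80, toy_spacer, max_mismatch=1)
    print(kind, [len(p) for p in parts])  # pair [80, 80]
```

The default of 10 mismatches only makes sense for a spacer much longer than the 12 bp toy value, which is why the example lowers it to 1.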

How to cite

Mariette J, Noirot C, Klopp C. Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool. BMC Research Notes 2011, 4:149.
