RDXplorer

Downloads & Links

Download Software
Download README File

Support, Questions, Suggestions

Send email to makarovv at gmail.com

RDXplorer at a glance

What is RDXplorer

RDXplorer (Read Depth eXplorer) is a computational tool for detecting copy number variants (CNVs) in whole-genome human sequence data using read depth (RD) coverage. CNV detection is based on the Event-Wise Testing (EWT) algorithm published by our group (see Publications). Read depth coverage is estimated in non-overlapping 100 bp windows across an individual genome, based on the pileup generated by SAMTools.
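
To illustrate what windowed read depth means, here is a minimal sketch that bins per-base pileup depths into non-overlapping 100 bp windows. It is not RDXplorer's own code: the function name window_depth, the (position, depth) input format, and the choice of the mean as the per-window statistic are all assumptions made for illustration.

```python
WINDOW_SIZE = 100  # bp per window, as described above


def window_depth(pileup, chrom_length, window_size=WINDOW_SIZE):
    """Return the mean per-base depth for each non-overlapping window.

    pileup: iterable of (position, depth) pairs with 1-based positions
            (hypothetical input format; positions absent from the pileup
            are treated as depth 0).
    """
    n_windows = (chrom_length + window_size - 1) // window_size
    totals = [0] * n_windows
    for pos, depth in pileup:
        if 1 <= pos <= chrom_length:
            totals[(pos - 1) // window_size] += depth
    # Assumption: every window is divided by the full window size,
    # including a shorter final window.
    return [t / float(window_size) for t in totals]
```

For example, window_depth([(1, 10), (100, 20), (101, 30)], 250) places the first two positions in window 0 and the third in window 1, and leaves window 2 empty.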

Availability

Source code, supporting files and user manual are freely available for download for academic and non-profit use.

Release

Beta release, version 3.2. The latest update was uploaded on May 24, 2011.

What is new in the latest update

In response to requests from multiple users, it is now possible to select a specific chromosome for analysis.

Operating System

LINUX/UNIX/MAC OSX

Environment

Standalone machine or HPCC (high-performance computing cluster). Possible run modes:
  1. As a standalone program, by running the provided shell script (run.sh). In this mode, one BAM file is analyzed at a time.
  2. As an API, integrated into your own Python code to analyze multiple BAM files. Use the "rdxplorer_api.py" Python script as a starting point for including RDXplorer in your Python application.
  3. Submitted to an HPCC. You will need to write a submission shell script based on your HPCC specifications. Because of the variety of schedulers (SGE, PBS, etc.), writing the submission script is left to the local developer.

Input

RDXplorer accepts BAM files. The current version accepts the HG18 and HG19 builds; the user specifies the build at runtime, and the default is HG19. Please note:
  • The BAM files must be sorted.
  • Duplicates must be either removed or marked, using Picard's MarkDuplicates tool or SAMTools' rmdup tool.
  • When run, RDXplorer assumes that duplicates are marked with the "0x400" flag and attempts to exclude them while building the pileup. The original BAM file is not changed or moved.
  • A BAM file may contain one or more chromosomes. Accepted chromosomes are 1-22, X, Y.
  • THE USER MUST PROVIDE THE SAME REFERENCE FASTA FILE THAT WAS USED FOR ALIGNMENT. OTHERWISE, THE PROGRAM WILL NOT WORK.
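
Two of the input requirements above can be sketched as small checks. This is an illustrative sketch rather than RDXplorer's implementation: the helper names is_duplicate and is_accepted_chrom are hypothetical, and tolerating a leading "chr" in chromosome names is an assumption. The 0x400 bit itself is the SAM FLAG bit for PCR or optical duplicates.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate

# Chromosomes 1-22, X and Y, as listed above.
ACCEPTED_CHROMS = set([str(n) for n in range(1, 23)] + ["X", "Y"])


def is_duplicate(flag):
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def is_accepted_chrom(name):
    """Return True for chromosomes 1-22, X and Y.

    Assumption: a leading "chr" prefix (e.g. "chr7") is tolerated.
    """
    if name.startswith("chr"):
        name = name[3:]
    return name in ACCEPTED_CHROMS
```

A read with FLAG 1040 (0x400 + 0x10) would be treated as a duplicate, while a read with FLAG 99 would not.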

Output

For each chromosome found in the BAM file, the following output is generated:
  1. chrN.ewt - Event-wise testing (EWT) states
  2. chrN.gcc - GC corrected depth of coverage
  3. chrN.nzd - normalized depth of coverage
  4. chrN.sum - summary report (contains all variants)
  5. chrN.filt.sum - filtered summary report (default 10 windows/1000 bp)
  6. chrN.pdf - plot of variants
The summary report has the following columns:
  1. segStart - segment start, in 100 bp windows
  2. segEnd - segment end, in 100 bp windows
  3. state - 1 for deletion, 2 for normal, 3 for duplication
  4. length - length of the segment, in 100 bp windows
  5. copyEst - estimated copy number (the rounded median for the segment)
  6. segMedian - median normalized read-depth of the segment
  7. zstat - Z-score of the segment compared to the chromosome
  8. chrom - chromosome number (1-22, X, Y)
  9. posStart - genomic position start
  10. posEnd - genomic position end
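
As a rough illustration of how the summary columns might be used downstream, the sketch below labels each segment by its state code and keeps only segments of at least 10 windows. It assumes the rows have already been parsed into dictionaries keyed by the column names above; the on-disk layout of the .sum file is not described here, and reading "10 Win/1000bp" as a minimum segment length is an assumption. The function name filter_segments is hypothetical.

```python
# State codes as listed in the summary report columns.
STATE_NAMES = {1: "deletion", 2: "normal", 3: "duplication"}

MIN_WINDOWS = 10  # assumed meaning of the default "10 Win/1000bp" filter


def filter_segments(rows, min_windows=MIN_WINDOWS):
    """Keep segments spanning at least min_windows windows.

    rows: list of dicts with at least the keys "state" and "length"
          (hypothetical in-memory representation of summary rows).
    Each kept row gains a "call" key with a readable state name.
    """
    kept = []
    for row in rows:
        if row["length"] >= min_windows:
            call = STATE_NAMES.get(row["state"], "unknown")
            kept.append(dict(row, call=call))
    return kept
```

Given one deletion of 12 windows and one duplication of 4 windows, only the deletion would be kept.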

Prerequisites

  • Download and install the latest SAMTools from
    http://sourceforge.net/projects/samtools/files/samtools/

The current version is a Python/R hybrid and relies on the following technologies:

  • Python 2.6 - please make sure it is not an earlier version (such as 2.5, 2.4, etc).
  • NumPy and SciPy - numeric libraries for Python
  • Rpy2 - Interface between R and Python
    http://rpy.sourceforge.net/rpy2_download.html
    Works best with R-2.11 or later
  • R-2.12 recommended (R-2.11 is the minimum) - Please get the latest version of R. RDXplorer does not rely on any special R libraries, so a standard installation should suffice.

Installation

  1. Extract the contents of "rdxplorer.tgz" to any directory on your machine.
  2. cd to the "rdxplorer" directory and change permissions to 755 for all *.py files and the run.sh file.
  3. Perform the one-time configuration described below.

One Time Configuration Before You Use

Before first use, please edit 2 lines in "globals.py" according to your local setup. You must specify the full path to SAMTools (even if it is on your PATH). This makes life much easier if you decide to submit jobs to an HPCC to process multiple BAM files at a time.

To Run

To run, open "run.sh" and change the parameters according to your BAM file location. This script is deliberately minimal, keeping in mind that everyone will use RDXplorer differently (see Environment). It can easily be adapted to be called programmatically from your own Python program or shell script.

To see the list of accepted arguments and their types, see the README file or run "python rdxplorer.py" with no arguments.

Performance and Memory Requirements

4 GB of RAM is the recommended minimum. Adding more memory will improve performance.

Publications

Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009 Sep;19(9):1586-92. Epub 2009 Aug 5. PubMed PMID: 19657104; PubMed Central PMCID: PMC2752127.