Copyright (c) 2011 MSSM

FREE FOR ACADEMIC AND NON-COMMERCIAL USERS

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

RDXplorer, Version 2.0

RDXplorer (Read Depth eXplorer) is a computational tool for detecting copy number variants (CNVs) in whole human genome sequence data using read depth (RD) coverage. CNV detection is based on the Event-Wise Testing (EWT) algorithm recently published by our group (see Publications). Read depth coverage is estimated in non-overlapping intervals (100 bp windows) across an individual genome, based on the pileup generated by SAMTools.

The current version accepts the HG18 and HG19 builds. The user specifies the build at runtime. The default is HG19.

OPERATING SYSTEM

LINUX/UNIX/MAC OSX

ENVIRONMENT

Standalone machine or HPCC. Possible run modes:

1. As a standalone program, by running the provided shell script (run.sh). In this mode, one BAM file is analyzed at a time.
2. As an API, integrated into your own Python code to analyze multiple BAM files. Use the "rdxplorer_api.py" Python script as a starting point for including RDXplorer in your Python application.
3. Submitted to a High Performance Computing Cluster. You will need to write a submission shell script based on your HPCC specifications. Because of the many possibilities (SGE, PBS, etc.), writing this script is left to the local developer.

INPUT

RDXplorer accepts BAM files. The BAM files must be sorted. Duplicates must be either removed or marked using Picard/Samtools.
When run, RDXplorer assumes that duplicates are marked with the "0x400" flag and attempts to remove them while building the pileup. The original BAM file is not changed. A BAM file may contain one or more chromosomes. Accepted chromosomes are 1-22, X, Y.

USER MUST PROVIDE THE SAME REFERENCE FASTA FILE WHICH WAS USED FOR ALIGNMENT. OTHERWISE, THE PROGRAM WILL NOT WORK.

OUTPUT

For each chromosome found in the BAM file, the following output is generated:

chr.ewt      - Event-wise testing (EWT) states
chr.gcc      - GC corrected depth of coverage
chr.nzd      - normalized depth of coverage
chr.sum      - summary report (contains all variants)
chr.filt.sum - filtered summary report (default 10 Win/1000bp)
chr.pdf      - plot of variants

PREREQUISITES:

Download and install the latest SAMTools from
http://sourceforge.net/projects/samtools/files/samtools/

The current version is a Python/R hybrid and relies on the following technologies:

Python 2.6 - please make sure it is not an earlier version (such as 2.5, 2.4, etc.).
NumPy and SciPy - numeric libraries for Python
Rpy2 - interface between R and Python
    http://rpy.sourceforge.net/rpy2_download.html
    Works best with R-2.11 or later
HDF5 1.8 (HDF5 1.6 might work, but we developed with 1.8)
H5PY - interface between HDF5 and Python
    http://h5py.alfven.org/
R-2.12 recommended (R-2.11 is the minimum) - please get the latest version of R. RDXplorer does not rely on any special R libraries, and a standard installation should do.

INSTALLATION

1. Extract the contents of "rdxplorer.tgz" to any directory on your machine.
2. cd to the "rdxplorer" directory and change permissions to 775 for all *.py files and the run.sh file.
3. Perform the one-time configuration as described below.

ONE TIME CONFIGURATION BEFORE YOU USE:

Before you use RDXplorer, please edit 2 lines in "globals.py" according to your local specifications. You must specify the full path to SAMTools (even if it is on your PATH). This makes your life much easier if you decide to submit jobs to an HPCC to process multiple BAM files at a time.
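
Since the configuration requires a full path to SAMTools, it may help to check the path before writing it into "globals.py". The following is a minimal sketch, not part of RDXplorer: the function name is_usable_tool_path is hypothetical, and the Python interpreter is used as a stand-in for the SAMTools binary so that the example runs anywhere.

```python
import os
import sys

def is_usable_tool_path(path):
    """Return True if path is absolute and points to an executable file."""
    return (os.path.isabs(path)
            and os.path.isfile(path)
            and os.access(path, os.X_OK))

# Stand-in demonstration: the running Python interpreter plays the role of
# the SAMTools binary here. Replace it with your own SAMTools location.
print(is_usable_tool_path(sys.executable))

# A bare command name is not a full path, so it is rejected even if the
# tool happens to be on your PATH.
print(is_usable_tool_path("samtools"))
```

A path that passes this check on the login node may still be missing on the compute nodes of an HPCC, so it is worth running the same check there.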
TO RUN

To run, open "run.sh" and change the parameters according to your BAM file location. This shell script is very minimalistic, keeping in mind that everyone will be using RDXplorer differently (see ENVIRONMENT). It can easily be adapted to be called from your own program.

MEMORY REQUIREMENT

4 GB of RAM is the recommended minimum. Adding more memory will improve performance.

The program accepts the following arguments, in this order:

path2bam    - input BAM file. Type: String. Default: None
reference   - reference fasta file. Type: String. Default: None
wrkgdir     - Output directory. Type: String. Default: directory where the bam file is
gender      - Gender. Type: String. Default: M
hg          - Human Genome build. Type: String. Default: HG19
winSize     - Window size. Type: Integer. Default: 100
baseCopy    - Base copy number. Type: Integer. Default: 2
filter      - Filter Summary Table. Default: 10 win sizes, 1000 bp for 100bp Win
sumWithZero - Sum q0 and q20 depth while calculating windows average. Type: Boolean. Default: True. If "False", only q20 is used
debug       - Debug mode. Type: Boolean. Default: True (Print output)
delete      - Delete temporary files. Type: Boolean. Default: True (Delete tmp files)

To see the list of accepted arguments and their types at any time, please issue the "python rdxplorer.py" command (no arguments).
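
When calling RDXplorer from your own program, the argument order above can be captured in a small helper. This is only a sketch under assumptions: the function build_command and the file paths are hypothetical, and it assumes that all arguments are passed positionally, that Boolean values are spelled "True"/"False", and that the filter is given as a number of windows. Please check the output of "python rdxplorer.py" and the provided run.sh for the exact form.

```python
import os

def build_command(path2bam, reference, wrkgdir=None, gender="M", hg="HG19",
                  win_size=100, base_copy=2, filter_wins=10,
                  sum_with_zero=True, debug=True, delete=True,
                  script="rdxplorer.py"):
    """Build an argument list in the documented order (assumed positional)."""
    if wrkgdir is None:
        # Documented default: the directory where the BAM file is located.
        wrkgdir = os.path.dirname(path2bam)
    ordered = [path2bam, reference, wrkgdir, gender, hg, win_size,
               base_copy, filter_wins, sum_with_zero, debug, delete]
    return ["python", script] + [str(value) for value in ordered]

# Hypothetical BAM and reference paths, for illustration only.
# The commands are printed here, not executed.
for bam in ["/data/sample1.bam", "/data/sample2.bam"]:
    print(" ".join(build_command(bam, "/ref/hg19.fa")))
```

Returning a list rather than a single string means the result can be handed directly to subprocess without shell quoting, or joined into one line for an HPCC submission script.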