Get started

Command-line usage

After installing igv_snapshot_maker, run the following command in a terminal to verify the installation:

igv_snapshot_maker -h
usage: IGV_snapshot_maker [-h] [-o output directory] [-e Extend +/- N bp]
                        [-g genome] [--igv IGV_CMD] [-m IGV memory MB] -i
                        Input file [-n] [-b Target OS[Mac/Win]
                        original_prefix new_prefix] [-c config YAML file]

IGV_snapshot_maker.py v0.1.0-dev: Genenerate IGV snapshots

optional arguments:
  -h, --help            show this help message and exit
  -o output directory, --output output directory
                        Output directory for snapshots
  -e Extend +/- N bp, --extend Extend +/- N bp
                        Extend N (N=100 by default) base pairs in two
                        directions in IGV window
  -g genome             Name of the reference genome, Defaults to hg19
  --igv IGV_CMD         The command to run IGV (at CCAD)
  -m IGV memory (MB), --mem IGV memory (MB)
                        Amount of memory to allocate to IGV, in Megabytes (MB)
  -i Input file, --input Input file
                        Input file in YAML format
  -n, --norun           Do not run the batch script
  -b Target OS[Mac/Win] original_prefix new_prefix, --binding Target OS[Mac/Win] original_prefix new_prefix
                        Replace the original path prefix with new path prefix
                        after binding at the target OS.
  -c config YAML file, --config config YAML file
                        IGV setting in YAML format

Prepare the YAML input file

The only required input is a YAML file that specifies the BAM files and the regions of interest for the IGV snapshots. The file contains a list of entries. Each entry corresponds to one IGV session and has the following attributes:

Attributes of each entry specified in the YAML input file:

  • name (string): unique identifier of the session, also used as the name of the session's folder.
    Example: cdRCC_1929_03_T01

  • bam_files (list of strings): absolute file paths to the BAM files.
    Example:
    • /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0421.bam
    • /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0401.bam

  • snapshots (list of items): the regions of interest. Each item has 5 attributes: name, chr, start, stop, and ext.
    Example:
    • chr: '1'
    • ext: 200
    • name: cdRCC_1929_03_T01_INTER_SV00035_BP1
    • start: 104423883
    • stop: 104423984

An example of the YAML input file:

---
-
    bam_files:
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0421.bam
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0401.bam
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0403.bam
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0402.bam
    name: cdRCC_1929_03_T01
    snapshots:
        -
            chr: '1'
            ext: 200
            name: cdRCC_1929_03_T01_INTER_SV00035_BP1
            start: 104423883
            stop: 104423984
        -
            chr: '8'
            ext: 200
            name: cdRCC_1929_03_T01_INTER_SV00035_BP2
            start: 33776273
            stop: 33776374
-
    bam_files:
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK7006_2000.bam
        - /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK4017_0401.bam
    name: pRCC1_1654_01_T01
    snapshots:
        -
            chr: '6'
            ext: 200
            name: pRCC1_1654_01_T01_INTRA_SV00060_BP1
            start: 136376293
            stop: 136376293
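Typing mistakes in the input file are easy to catch before running igv_snapshot_maker. One example is an unquoted chr: 1, which YAML reads as an integer rather than a string. The sketch below checks entries against the attribute table. It assumes the entries have already been parsed into Python lists and dicts, for example with PyYAML's yaml.safe_load.

validate_entries is a hypothetical helper and is not part of igv_snapshot_maker. Its rules are also assumptions: session names must be unique non-empty strings, BAM paths must be absolute POSIX paths, each snapshot has the five listed attributes with chr kept as a string as in the example, and start must not exceed stop.

```python
import posixpath

# Assumed type of each snapshot attribute (chr is a string, as in the example).
SNAPSHOT_FIELDS = {"name": str, "chr": str, "start": int, "stop": int, "ext": int}


def validate_entries(entries):
    """Return a list of problems found in parsed YAML entries (empty if none)."""
    problems = []
    seen_names = set()
    for i, entry in enumerate(entries):
        where = f"entry {i}"
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"{where}: 'name' must be a non-empty string")
        elif name in seen_names:
            problems.append(f"{where}: duplicate session name {name!r}")
        else:
            seen_names.add(name)

        bam_files = entry.get("bam_files")
        if not isinstance(bam_files, list) or not bam_files:
            problems.append(f"{where}: 'bam_files' must be a non-empty list")
        else:
            for path in bam_files:
                # Server-side BAM paths are assumed to be absolute POSIX paths.
                if not isinstance(path, str) or not posixpath.isabs(path):
                    problems.append(f"{where}: BAM path {path!r} is not absolute")

        snapshots = entry.get("snapshots")
        if not isinstance(snapshots, list):
            problems.append(f"{where}: 'snapshots' must be a list")
            continue
        for j, snap in enumerate(snapshots):
            for key, expected in SNAPSHOT_FIELDS.items():
                value = snap.get(key)
                # bool is a subclass of int in Python, so reject it explicitly.
                if not isinstance(value, expected) or isinstance(value, bool):
                    problems.append(
                        f"{where}, snapshot {j}: {key!r} should be {expected.__name__}"
                    )
            start, stop = snap.get("start"), snap.get("stop")
            if isinstance(start, int) and isinstance(stop, int) and start > stop:
                problems.append(f"{where}, snapshot {j}: start > stop")
    return problems
```

An empty list means no problems were found. Otherwise, each string identifies the entry and snapshot that needs fixing.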

YAML libraries are available for common programming languages such as Perl, Python, and R, so users with some programming experience can easily generate the YAML input file that specifies the regions of interest. Helper scripts to convert other input formats to the YAML input format may also be provided upon request.
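As an illustration, the sketch below writes an input file in the same block layout as the example above using only the Python standard library. If PyYAML is installed, yaml.safe_dump would be the more usual choice. The sketch relies on the fact that a JSON string literal, as produced by json.dumps, is also a valid YAML double-quoted scalar for ordinary names and paths. This keeps chr a string.

sessions_to_yaml and the dictionary layout it expects are hypothetical and are not part of igv_snapshot_maker.

```python
import json


def _q(value):
    # A JSON string literal is also a valid YAML double-quoted scalar,
    # so json.dumps takes care of quoting and escaping.
    return json.dumps(str(value))


def sessions_to_yaml(sessions):
    """Render session dicts in the block layout of the example input file."""
    lines = ["---"]
    for session in sessions:
        lines.append("-")
        lines.append("    bam_files:")
        for bam in session["bam_files"]:
            lines.append(f"        - {_q(bam)}")
        lines.append(f"    name: {_q(session['name'])}")
        lines.append("    snapshots:")
        for snap in session["snapshots"]:
            lines.append("        -")
            lines.append(f"            chr: {_q(snap['chr'])}")  # quoted, so chr stays a string
            lines.append(f"            ext: {int(snap['ext'])}")
            lines.append(f"            name: {_q(snap['name'])}")
            lines.append(f"            start: {int(snap['start'])}")
            lines.append(f"            stop: {int(snap['stop'])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [{
        "name": "pRCC1_1654_01_T01",
        "bam_files": [
            "/data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK7006_2000.bam",
            "/data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK4017_0401.bam",
        ],
        "snapshots": [{
            "chr": "6", "ext": 200,
            "name": "pRCC1_1654_01_T01_INTRA_SV00060_BP1",
            "start": 136376293, "stop": 136376293,
        }],
    }]
    print(sessions_to_yaml(example), end="")
```

Redirecting the printed text to a file (for example, input.yaml) produces a file that can be passed to igv_snapshot_maker with -i.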

Run igv_snapshot_maker

Users usually prefer to run igv_snapshot_maker on the server, where the BAM files are easily accessible. In that case, IGV and the Unix command xvfb-run must be installed on the server so that the IGV snapshots can be generated without a display.
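Before running on a server, it can help to confirm that both programs are on the PATH. The minimal sketch below assumes IGV is started with a command named igv, as in the --igv examples in this section; adjust REQUIRED_COMMANDS if your installation differs. missing_commands is a hypothetical helper, not part of igv_snapshot_maker.

```python
import shutil

# Commands assumed to be needed on the server: "igv" (as used in the --igv
# examples) and "xvfb-run" (to run IGV without a display).
REQUIRED_COMMANDS = ("igv", "xvfb-run")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required commands were found on PATH.")
```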

If the snapshots generated by igv_snapshot_maker do not fully meet their needs, users can review the regions of interest interactively in IGV. For interactive review on a local computer, it is generally easier to mount the network drive that holds the BAM files than to transfer the large BAM files from the remote server. igv_snapshot_maker generates three types of IGV batch scripts:

Three types of IGV batch scripts generated by igv_snapshot_maker:

  • <SessionName>.bat: IGV batch script that generates all the snapshots of the session on the (remote) server.
    Example: cdRCC_1929_03_T01.bat

  • <SessionName>_ROIs.bat: IGV batch script that lists all the regions of interest for interactive inspection on a (local) desktop or laptop.
    Example: cdRCC_1929_03_T01_ROIs.bat

  • <SnapshotName>.bat: IGV batch script that regenerates one specific snapshot on a (local) desktop or laptop.
    Example: cdRCC_1929_03_T01_INTER_SV00035_BP1.bat
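The naming scheme above makes it possible to predict which scripts a session should produce. This is useful, for example, when checking the result of a dry run with -n.

The sketch below lists the expected paths for one session entry. The naming is inferred from the list above and the example dry-run output in this section. The default output folder IGV_Snapshots is likewise an assumption taken from that example. expected_batch_scripts is a hypothetical helper, not part of igv_snapshot_maker.

```python
from pathlib import PurePosixPath


def expected_batch_scripts(session, output_dir="IGV_Snapshots"):
    """List the batch scripts expected for one session entry (sorted)."""
    folder = PurePosixPath(output_dir) / session["name"]
    # One script for the whole session, one listing its ROIs,
    # and one per snapshot.
    names = [f"{session['name']}.bat", f"{session['name']}_ROIs.bat"]
    names += [f"{snap['name']}.bat" for snap in session["snapshots"]]
    return sorted(str(folder / name) for name in names)


if __name__ == "__main__":
    session = {
        "name": "pRCC1_1654_01_T01",
        "snapshots": [{"name": "pRCC1_1654_01_T01_INTRA_SV00060_BP1"}],
    }
    for path in expected_batch_scripts(session):
        print(path)
```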

The BAM file paths differ between these batch scripts, because mounting the network drive changes where the BAM files appear on the local computer. For example:

  • On the server side: /data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0421.bam

  • On the local computer:
    • Mac: /Volumes/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0421.bam

    • Windows: T:\data\DCEG_pRCC_SV\EAGLE_Kidney_BAM\GPK0149_0421.bam

Users can specify this path mapping with the -b option, which takes three parameters: the target OS (Mac or Win), the original path prefix, and the new path prefix. For instance:

# For Windows machines
igv_snapshot_maker -n -b Win '^/'  'T:\\' --igv "igv -m 20g " -i input.yaml

# For Mac machines
igv_snapshot_maker -n -b Mac '^/data'  '/Volumes' --igv "igv -m 20g " -i input.yaml

# The output from this dry-run (due to -n option)
tree IGV_Snapshots/
IGV_Snapshots/
├── cdRCC_1929_03_T01
│   ├── cdRCC_1929_03_T01.bat
│   ├── cdRCC_1929_03_T01_INTER_SV00035_BP1.bat
│   ├── cdRCC_1929_03_T01_INTER_SV00035_BP2.bat
│   └── cdRCC_1929_03_T01_ROIs.bat
└── pRCC1_1654_01_T01
    ├── pRCC1_1654_01_T01.bat
    ├── pRCC1_1654_01_T01_INTRA_SV00060_BP1.bat
    └── pRCC1_1654_01_T01_ROIs.bat

2 directories, 7 files
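The exact rewrite rule behind -b is not spelled out here. The sketch below shows one interpretation that reproduces both the Mac and the Windows path examples above. It assumes that original_prefix is a regular expression (suggested by the ^ anchors) applied with Python's re.sub, so that the shell argument 'T:\\' yields a single literal backslash. It also assumes that for Win the remaining / separators are converted to \. bind_path is a hypothetical helper, not igv_snapshot_maker's actual code.

```python
import re


def bind_path(path, target_os, original_prefix, new_prefix):
    """Rewrite a server path for a mounted drive (assumed -b behavior)."""
    # Assumption: original_prefix is a regex and new_prefix is a re.sub
    # replacement string, in which a doubled backslash means one backslash.
    bound = re.sub(original_prefix, new_prefix, path, count=1)
    if target_os == "Win":
        # Assumption: remaining POSIX separators become Windows separators.
        bound = bound.replace("/", "\\")
    elif target_os != "Mac":
        raise ValueError("target_os must be 'Mac' or 'Win'")
    return bound


if __name__ == "__main__":
    server_path = "/data/DCEG_pRCC_SV/EAGLE_Kidney_BAM/GPK0149_0421.bam"
    print(bind_path(server_path, "Mac", r"^/data", "/Volumes"))
    print(bind_path(server_path, "Win", r"^/", r"T:\\"))
```

Under these assumptions, the two calls print the Mac and Windows paths listed earlier in this section.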

Run igv_snapshot_maker on Biowulf

xvfb is already installed on Biowulf and CCAD, as on many other Linux systems.

# Start an interactive session at Biowulf first
sinteractive --cpus-per-task=4 --mem=32g --gres=lscratch:20

# Then load igv in the interactive session
module load igv
igv_snapshot_maker -g hg19 -i pRCC_SV.yaml -o pRCC_mac -c IGV_config.yaml -b Mac '^/data'  '/Volumes' --igv "igv -m 20g "

Example input and config files are available on GitHub.