Introduction to biobase

Bulker comes with a core manifest called biobase that includes more than 50 common bioinformatics tools. Biobase is useful for everyday interactive analysis, basic pipelines, and also as a starting point for more complicated manifests. In this tutorial we'll show how to activate biobase and use it for some interactive RNA-seq analysis, and to run a pipeline. I assume you've already gone through the install and configure instructions.

Loading the biobase crate

Let's load the biobase crate, which is available on the bulker registry. First I'm initializing an empty bulker config for this tutorial.

export BULKERCFG="biobase_example/bulker_config.yaml"
rm $BULKERCFG
bulker init -c $BULKERCFG

Guessing container engine is docker.
Wrote new configuration file: biobase_example/bulker_config.yaml

bulker load biobase -f $HOME/code/hub.bulker.io/bulker/biobase.yaml

Bulker config: /home/nsheff/code/bulker/docs_jupyter/biobase_example/bulker_config.yaml
Importing crate 'bulker/alpine:default' from '/home/nsheff/bulker_crates/bulker/alpine/default'.
Importing crate 'bulker/coreutils:default' from '/home/nsheff/bulker_crates/bulker/coreutils/default'.
Loading manifest: 'bulker/biobase:default'. Activate with 'bulker activate bulker/biobase:default'.
Commands available: aws, ascp, bamtools, bedClip, bedCommonRegions, bedGraphToBigWig, bedIntersect, bedItemOverlapCount, bedops, bedPileUps, bedToBigBed, bedtools, bigWigAverageOverBed, bigWigCat, bigWigSummary, bismark, bissnp, blast, bowtie2, bowtie, bsmap, bwa, cellranger, cufflinks, curl, cutadapt, faSplit, fastq-dump, fasterq-dump, vdb-config, fastqc, gatk, gt, hisat2, homer, kallisto, khmer, liftOver, macs2, mashmap, picard, pigz, prefetch, R, repeatmasker, rg, Rscript, sambamba, salmon, samtools, segway, seqkit, seqtk, skewer, samblaster, STAR, tabix, trim_galore, trimmomatic, vep, wigToBigWig

You can see this crate offers many common bioinformatics tools, like samtools and bowtie2. You can see this crate in your list of available crates:

bulker list

Available crates:
bulker/biobase:default -- /home/nsheff/bulker_crates/bulker/biobase/default

Before activating the crate, these commands are not available because they are not installed natively on this sytem:

samtools --version


Command 'samtools' not found, but can be installed with:

sudo apt install samtools

kallisto


Command 'kallisto' not found, but can be installed with:

sudo apt install kallisto

So, we activate the crate. In a shell you just type:

bulker activate biobase

Due to a limitation with jupyter, because bulker spawns a new shell by default, you have to things a bit differently...there 2 ways to use bulker with a jupyter note book:

You can simply activate the crate before starting the notebook (the easier way).
You can use this eval trick below to activate the crate in the existing jupyter shell.

Either way should have the same affect as typing bulker activate biobase outside of jupyter.

eval "$(bulker activate -e -p biobase)"

Bulker config: /home/nsheff/pCloudSync/env/bulker_config/zither.yaml
Activating bulker crate: biobase

It's worth emphasizing: that single line of code is all you need to do to "install" dozens of common bioinformatics tools. Now, in just a few minutes you've given yourself the ability to create a portable computing environment instantly, on demand.

Now we can inspect what commands are made available by this crate:

bulker inspect

Bulker config: /home/nsheff/code/bulker/docs_jupyter/biobase_example/bulker_config.yaml
Bulker manifest: bulker/biobase
Crate path: /home/nsheff/bulker_crates/bulker/biobase/default
Available commands: ['cellranger', 'fasterq-dump', 'cksum', 'hisat2', 'paste', 'chmod', 'aws', 'picard', 'arch', 'tail', 'unexpand', 'liftOver', 'cutadapt', 'expr', 'rm', 'STAR', 'base32', 'runcon', 'timeout', 'gt', 'samtools', 'nohup', 'rmdir', 'kallisto', 'sleep', 'vdir', 'bowtie2', 'pigz', 'samblaster', 'echo', 'base64', 'bwa', 'which', 'bash', 'shuf', 'homer', 'trimmomatic', 'salmon', 'stty', 'fastqc', 'cp', 'chgrp', 'ptx', 'blast', 'mkfifo', 'mashmap', 'seq', 'du', 'bigWigCat', 'logname', 'factor', 'vep', 'dircolors', 'bedCommonRegions', 'pwd', 'bigWigAverageOverBed', 'expand', 'sh', 'mkdir', 'whoami', 'bedops', 'R', 'bedPileUps', 'tsort', 'users', 'prefetch', 'fmt', 'truncate', 'csplit', 'repeatmasker', 'bissnp', 'gatk', 'uptime', 'tee', 'touch', 'false', 'od', 'mv', 'numfmt', 'bismark', 'seqtk', 'fastq-dump', 'b2sum', 'bamtools', 'groups', 'id', 'docker', 'seqkit', 'basename', 'cat', 'cufflinks', 'bedtools', 'nproc', 'unlink', 'md5sum', 'link', 'cut', 'sambamba', 'bowtie', 'pr', 'shred', 'bedItemOverlapCount', 'bsmap', 'bedGraphToBigWig', 'yes', 'lesspipe', 'dir', 'sha512sum', 'skewer', 'tr', 'split', 'hostname', 'macs2', 'Rscript', 'sha256sum', 'chown', 'tty', 'uname', 'pinky', 'ascp', 'vdb-config', 'grep', 'true', 'bedClip', 'wc', 'faSplit', 'printf', 'tabix', 'install', 'bigWigSummary', 'stdbuf', 'ln', 'bedIntersect', 'khmer', 'trim_galore', 'mknod', 'df', 'comm', 'sha224sum', 'realpath', 'dd', 'pathchk', 'uniq', 'nl', 'test', 'curl', 'ls', 'env', 'sha384sum', 'tac', 'segway', 'fold', 'bedToBigBed', 'who', 'wigToBigWig', 'hostid', 'printenv', 'sha1sum', 'join', 'nice', 'stat', 'date', 'awk', 'sed', 'sum', 'sync', 'rg', 'dirname', 'head', 'sort']

Any of these commands can be run as if they are native, but they will actually be running in containers.

samtools --version

samtools 1.10
Using htslib 1.10.2
Copyright (C) 2019 Genome Research Ltd.

kallisto

kallisto 0.46.2

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index 
    quant         Runs the quantification algorithm 
    bus           Generate BUS files for single-cell data 
    pseudo        Runs the pseudoalignment step 
    merge         Merges several batch runs 
    h5dump        Converts HDF5-formatted results to plaintext
    inspect       Inspects and gives information about an index
    version       Prints version information
    cite          Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>

cutadapt --version

Running an analysis

Now that we've proven we can run each of these commands, let's put them all together and run a whole pipeline. Since the environment we've just activated has all those commands available as if they were installed natively, all we have to do is run a bunch of commands in succession and they will automatically run in individual containers.

Let's grab some reads from SRA:

wget  https://sra-download.ncbi.nlm.nih.gov/traces/sra10/SRR/011497/SRR11773889

--2020-07-10 14:36:20--  https://sra-download.ncbi.nlm.nih.gov/traces/sra10/SRR/011497/SRR11773889
Resolving sra-download.ncbi.nlm.nih.gov (sra-download.ncbi.nlm.nih.gov)... 130.14.250.25, 130.14.250.24, 165.112.9.231
Connecting to sra-download.ncbi.nlm.nih.gov (sra-download.ncbi.nlm.nih.gov)|130.14.250.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15095418 (14M) [application/octet-stream]
Saving to: ‘SRR11773889’

SRR11773889         100%[===================>]  14.40M  49.3KB/s    in 2m 31s  

2020-07-10 14:38:53 (97.4 KB/s) - ‘SRR11773889’ saved [15095418/15095418]

fasterq-dump -L info SRR11773889 -O raw_reads

spots read      : 742,462
reads read      : 742,462
reads written   : 742,462

Now we want to try to align these. We'll use refgenie to easily pull some bowtie2 indexes. Refgenie is included in the biobase.

export REFGENIE="biobase_example/refgenie.yaml"
refgenie init -c $REFGENIE -s http://staging.refgenomes.databio.org
refgenie pull t7/bowtie2_index

Requirement already satisfied: refgenie in /home/nsheff/.local/lib/python3.5/site-packages (0.7.0.dev0)
Requirement already satisfied: ubiquerg>=0.4.8 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.4.10.dev0)
Requirement already satisfied: yacman>=0.6.1 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.6.2)
Requirement already satisfied: piper>=0.12.1 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.12.1)
Requirement already satisfied: logmuse>=0.2 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.2.4)
Requirement already satisfied: refgenconf>=0.4.0 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.5.0.dev0)
Requirement already satisfied: attmap>=0.12.5 in /home/nsheff/.local/lib/python3.5/site-packages (from refgenie) (0.12.9)
Requirement already satisfied: oyaml in /home/nsheff/.local/lib/python3.5/site-packages (from yacman>=0.6.1->refgenie) (0.9)
Requirement already satisfied: pyyaml>=3.13 in /home/nsheff/.local/lib/python3.5/site-packages (from yacman>=0.6.1->refgenie) (5.1)
Requirement already satisfied: pandas in /home/nsheff/.local/lib/python3.5/site-packages (from piper>=0.12.1->refgenie) (0.24.2)
Requirement already satisfied: psutil in /home/nsheff/.local/lib/python3.5/site-packages (from piper>=0.12.1->refgenie) (5.6.1)
Requirement already satisfied: future in /home/nsheff/.local/lib/python3.5/site-packages (from refgenconf>=0.4.0->refgenie) (0.17.1)
Requirement already satisfied: requests in /home/nsheff/.local/lib/python3.5/site-packages (from refgenconf>=0.4.0->refgenie) (2.21.0)
Requirement already satisfied: tqdm in /home/nsheff/.local/lib/python3.5/site-packages (from refgenconf>=0.4.0->refgenie) (4.32.1)
Requirement already satisfied: numpy>=1.12.0 in /home/nsheff/.local/lib/python3.5/site-packages (from pandas->piper>=0.12.1->refgenie) (1.16.2)
Requirement already satisfied: pytz>=2011k in /home/nsheff/.local/lib/python3.5/site-packages (from pandas->piper>=0.12.1->refgenie) (2018.9)
Requirement already satisfied: python-dateutil>=2.5.0 in /home/nsheff/.local/lib/python3.5/site-packages (from pandas->piper>=0.12.1->refgenie) (2.8.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/nsheff/.local/lib/python3.5/site-packages (from requests->refgenconf>=0.4.0->refgenie) (2019.3.9)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /home/nsheff/.local/lib/python3.5/site-packages (from requests->refgenconf>=0.4.0->refgenie) (1.24.3)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/nsheff/.local/lib/python3.5/site-packages (from requests->refgenconf>=0.4.0->refgenie) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /home/nsheff/.local/lib/python3.5/site-packages (from requests->refgenconf>=0.4.0->refgenie) (2.8)
Requirement already satisfied: six>=1.5 in /home/nsheff/.local/lib/python3.5/site-packages (from python-dateutil>=2.5.0->pandas->piper>=0.12.1->refgenie) (1.12.0)
WARNING: You are using pip version 19.2.3, however version 19.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Initializing refgenie genome configuration
Wrote new refgenie genome configuration file: peppro_example/refgenie.yaml
'hs38d1/fasta:default' archive size: 1.7MB
Downloading URL: http://staging.refgenomes.databio.org/v2/asset/hs38d1/fasta/archive
hs38d1/fasta:default: 1.74MB [00:00, 28.8MB/s]
Download complete: /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/fasta__default.tgz
Extracting asset tarball and saving to: /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/fasta/default
Default tag for 'hs38d1/fasta' set to: default
'hs38d1/bowtie2_index:default' archive size: 8.8MB
Downloading URL: http://staging.refgenomes.databio.org/v2/asset/hs38d1/bowtie2_index/archive
hs38d1/bowtie2_index:default: 9.22MB [00:00, 21.1MB/s]                          
Download complete: /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/bowtie2_index__default.tgz
Extracting asset tarball and saving to: /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/bowtie2_index/default
Default tag for 'hs38d1/bowtie2_index' set to: default

bowtie2 -x `refgenie seek rCRSd/bowtie2_index` raw_reads/SRR11773889.fastq -S aligned.sam

742462 reads; of these:
  742462 (100.00%) were unpaired; of these:
    742461 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    1 (0.00%) aligned >1 times
0.00% overall alignment rate

bowtie2 -x `refgenie seek t7/bowtie2_index` raw_reads/SRR11773889.fastq -S aligned.sam

742462 reads; of these:
  742462 (100.00%) were unpaired; of these:
    49655 (6.69%) aligned 0 times
    689848 (92.91%) aligned exactly 1 time
    2959 (0.40%) aligned >1 times
93.31% overall alignment rate

This pipeline also requires reference genome assets that are managed by refgenie. If you don't already have an initialized refgenie config, you can easily initialize one like this:

Now execute the example code to run it.

./peppro_example/peppro/pipelines/peppro.py \
  --sample-name test \
  --genome hs38d1 \
  --input peppro_example/peppro/examples/data/test_r1.fq.gz \
  --single-or-paired single \
  -O peppro_example/output/

### Pipeline run code and environment:

*              Command:  `./peppro_example/peppro/pipelines/peppro.py --sample-name test --genome hs38d1 --input peppro_example/peppro/examples/data/test_r1.fq.gz --single-or-paired single -O peppro_example/output/`
*         Compute host:  puma
*          Working dir:  /home/nsheff/code/bulker/docs_jupyter
*            Outfolder:  /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/
*  Pipeline started at:   (10-21 11:29:03) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  2.7.12
*          Pypiper dir:  `/home/nsheff/.local/lib/python2.7/site-packages/pypiper`
*      Pypiper version:  0.12.0
*         Pipeline dir:  `/home/nsheff/code/bulker/docs_jupyter/peppro_example/peppro/pipelines`
*     Pipeline version:  0.8.1
*        Pipeline hash:  05095a2cc78a2e210916f215dd0828940b894a6f
*      Pipeline branch:  * dev
*        Pipeline date:  2019-10-21 11:21:23 -0400

### Arguments passed to pipeline:

*           `TSS_name`:  `None`
*            `adapter`:  `fastp`
*          `anno_name`:  `None`
*         `complexity`:  `True`
*        `config_file`:  `peppro.yaml`
*              `cores`:  `1`
*           `coverage`:  `False`
*              `dedup`:  `seqkit`
*              `dirty`:  `False`
*  `ensembl_gene_body`:  `None`
*        `ensembl_tss`:  `None`
*          `exon_name`:  `None`
*       `force_follow`:  `False`
*    `genome_assembly`:  `hs38d1`
*              `input`:  `['peppro_example/peppro/examples/data/test_r1.fq.gz']`
*             `input2`:  `None`
*        `intron_name`:  `None`
*               `keep`:  `False`
*             `logdev`:  `False`
*            `max_len`:  `30`
*                `mem`:  `4000`
*          `new_start`:  `False`
*            `no_fifo`:  `False`
*      `output_parent`:  `peppro_example/output/`
*         `paired_end`:  `False`
*              `parts`:  `4`
*           `pre_name`:  `None`
*      `prealignments`:  `[]`
*           `protocol`:  `pro`
*            `recover`:  `False`
*        `sample_name`:  `test`
*              `scale`:  `False`
*             `silent`:  `False`
*   `single_or_paired`:  `single`
*                `sob`:  `False`
*           `testmode`:  `False`
*            `trimmer`:  `seqtk`
*          `umi_fastp`:  `False`
*            `umi_len`:  `0`
*          `verbosity`:  `None`

----------------------------------------

Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:
  Assets not existing: refgene_tss (--TSS-name), ensembl_tss (--pi-tss), ensembl_gene_body (--pi-body), refgene_pre_mRNA (--pre-name), feat_annotation (--anno-name), refgene_exon (--exon-name), refgene_intron (--intron-name)
Local input file: peppro_example/peppro/examples/data/test_r1.fq.gz

> `File_mb` 1.02    PEPPRO  _RES_

> `Read_type`   single  PEPPRO  _RES_

> `Genome`  hs38d1  PEPPRO  _RES_
Detected PRO input

### Merge/link and fastq conversion:  (10-21 11:29:03) elapsed: 0.0 _TIME_

Number of input file sets: 1
Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/raw/test.fastq.gz`  

> `ln -sf /home/nsheff/code/bulker/docs_jupyter/peppro_example/peppro/examples/data/test_r1.fq.gz /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/raw/test.fastq.gz` (20905)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.0GB.  
  PID: 20905;   Command: ln;    Return code: 0; Memory used: 0.0GB

Local input file: '/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/raw/test.fastq.gz'
Found .fastq.gz file
Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1.fastq`  

> `gzip -f -d -c /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/raw/test.fastq.gz > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1.fastq` (20907)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.002GB.  
  PID: 20907;   Command: gzip;  Return code: 0; Memory used: 0.002GB


> `Raw_reads`   12500   PEPPRO  _RES_

> `Fastq_reads` 12500   PEPPRO  _RES_
['/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/raw/test.fastq.gz']

### FASTQ processing:  (10-21 11:29:03) elapsed: 0.0 _TIME_

You set adapter arg to 'fastp' but you must select 'cutadapt' for single end data. Overriding.

> `cutadapt --version`
Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1_processed.fastq`  

> `( cutadapt -j 1 -m 18 -a TGGAATTCTCGGGTGCCAAGG /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1.fastq | seqtk trimfq -b 0 -L 30 - | seqtk seq -r - > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1_processed.fastq ) 2> /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/cutadapt/test_cutadapt.txt` (20985)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.057GB.  
  PID: 20985;   Command: ;  Return code: 0; Memory used: 0.057GB

Evaluating read trimming

> `Trimmed_reads`   11880   PEPPRO  _RES_

> `Trim_loss_rate`  4.96    PEPPRO  _RES_

### Plot adapter insertion distribution (10-21 11:29:06) elapsed: 2.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/cutadapt/test_adapter_insertion_distribution.pdf`  

> `Rscript ./peppro_example/peppro/tools/PEPPRO.R cutadapt -i /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/cutadapt/test_cutadapt.txt -o /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/cutadapt` (21191)
<pre>
Error: Install the "PEPPROr" R package before proceeding.
i.e. devtools::install_github("databio/peppro", subdir="PEPPROr")
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.057GB.  
  PID: 21191;   Command: Rscript;   Return code: 0; Memory used: 0.019GB

> `Adapter insertion distribution`  cutadapt/test_adapter_insertion_distribution.pdf    Adapter insertion distribution  cutadapt/test_adapter_insertion_distribution.png    PEPPRO  _OBJ_

### Prealignments (10-21 11:29:07) elapsed: 1.0 _TIME_

You may use `--prealignments` to align to references before the genome alignment step. See docs.

### Map to genome (10-21 11:29:07) elapsed: 0.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam`  

> `bowtie2 -p 1 --very-sensitive -X 2000 --rg-id test -x /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/bowtie2_index/default/hs38d1 -U /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/fastq/test_R1_processed.fastq | samtools view -bS - -@ 1  | samtools sort - -@ 1 -T /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/tmpjLFtWP -o /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam` (21283,21287,21313)
<pre>
11880 reads; of these:
  11880 (100.00%) were unpaired; of these:
    10476 (88.18%) aligned 0 times
    372 (3.13%) aligned exactly 1 time
    1032 (8.69%) aligned >1 times
11.82% overall alignment rate
</pre>
Command completed. Elapsed time: 0:00:02. Running peak memory: 0.057GB.  
  PID: 21287;   Command: samtools;  Return code: 0; Memory used: 0.018GB  
  PID: 21283;   Command: bowtie2;   Return code: 0; Memory used: 0.018GB  
  PID: 21313;   Command: samtools;  Return code: 0; Memory used: 0.018GB


> `samtools view -q 10 -b -@ 1 -U /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_fail_qc.bam /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam` (21527)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.057GB.  
  PID: 21527;   Command: samtools;  Return code: 0; Memory used: 0.018GB


> `samtools depth /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam | awk '{counter++;sum+=$3}END{print sum/counter}'`

> `Mapped_reads`    1404    PEPPRO  _RES_

> `QC_filtered_reads`   989.0   PEPPRO  _RES_

> `Aligned_reads`   415 PEPPRO  _RES_

> `Alignment_rate`  3.49    PEPPRO  _RES_

> `Total_efficiency`    3.32    PEPPRO  _RES_

> `Read_depth`  1.26    PEPPRO  _RES_

### Compress all unmapped read files (10-21 11:29:11) elapsed: 4.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam.bai`  

> `samtools index /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam` (21800)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.057GB.  
  PID: 21800;   Command: samtools;  Return code: 0; Memory used: 0.019GB


> `samtools idxstats /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam | grep -we 'chrM' -we 'chrMT' -we 'M' -we 'MT' -we 'rCRSd' -we 'rCRSd_3k'| cut -f 3`

> `samtools stats /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam | grep '^SN' | cut -f 2- | grep 'maximum length:' | cut -f 2-`

### Calculate NRF, PBC1, and PBC2 (10-21 11:29:13) elapsed: 2.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam.bai`  

> `samtools index /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam` (22028)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.057GB.  
  PID: 22028;   Command: samtools;  Return code: 0; Memory used: 0.019GB

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/test_bamQC.tsv`  

> `./peppro_example/peppro/tools/bamQC.py --silent -i /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam -c 1 -o /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/test_bamQC.tsv --silent` (22096)
<pre>
</pre>
Command completed. Elapsed time: 0:00:03. Running peak memory: 0.068GB.  
  PID: 22096;   Command: ./peppro_example/peppro/tools/bamQC.py;    Return code: 0; Memory used: 0.068GB


> `awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "NRF") c=i } getline; print $c }' /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/test_bamQC.tsv`

> `awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "PBC1") c=i } getline; print $c }' /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/test_bamQC.tsv`

> `awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "PBC2") c=i } getline; print $c }' /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/test_bamQC.tsv`

> `NRF` 0.89    PEPPRO  _RES_

> `PBC1`    0.95    PEPPRO  _RES_

> `PBC2`    30.75   PEPPRO  _RES_
Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_unmap.bam`  

> `samtools view -b -@ 1 -f 4  /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_unmap.bam` (22108)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 22108;   Command: samtools;  Return code: 0; Memory used: 0.019GB


> `samtools view -c -f 4 -@ 1 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_temp.bam`

> `Unmapped_reads`  10476.0 PEPPRO  _RES_

### Split BAM by strand (10-21 11:29:18) elapsed: 5.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam`,`/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam`  

> `samtools view -bh -F 20 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam` (22257)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 22257;   Command: samtools;  Return code: 0; Memory used: 0.019GB


> `samtools view -bh -f 16 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam` (22322)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 22322;   Command: samtools;  Return code: 0; Memory used: 0.019GB

Skipping TSS -- TSS enrichment requires TSS annotation file: refgene_tss
Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/chr_order.txt`  

> `samtools view -H /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_sort.bam | grep 'SN:' | awk -F':' '{print $2,$3}' | awk -F' ' -v OFS=' ' '{print $1,$3}' > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/chr_order.txt` (22386,22391,22395,22403)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 22386;   Command: samtools;  Return code: 0; Memory used: 0.018GB  
  PID: 22395;   Command: awk;   Return code: 0; Memory used: 0.004GB  
  PID: 22391;   Command: grep;  Return code: 0; Memory used: 0.001GB  
  PID: 22403;   Command: awk;   Return code: 0; Memory used: 0.004GB

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/chr_keep.txt`  

> `cut -f 1 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/chr_order.txt > /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/QC_hs38d1/chr_keep.txt` (22455)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.068GB.  
  PID: 22455;   Command: cut;   Return code: 0; Memory used: 0.0GB

Skipping PI -- Pause index requires 'TSS' and 'gene body' annotation files: ensembl_tss and ensembl_gene_body
Skipping FRiP -- Fraction of reads in pre-mRNA requires pre-mRNA annotation file: refgene_pre_mRNA

### Calculate fraction of reads in features (FRiF) (10-21 11:29:20) elapsed: 2.0 _TIME_


### Plot FRiF (10-21 11:29:20) elapsed: 0.0 _TIME_


> `samtools view -@ 1 -q 10 -c -F4 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam`

> `samtools view -@ 1 -q 10 -c -F4 /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam`

### Produce bigWig files (10-21 11:29:21) elapsed: 1.0 _TIME_

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_plus_body_0-mer.bw`  

> `samtools index /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam` (22609)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 22609;   Command: samtools;  Return code: 0; Memory used: 0.02GB


> `./peppro_example/peppro/tools/bamSitesToWig.py -i /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam -c /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/fasta/default/hs38d1.chrom.sizes -o /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_plus_body_0-mer.bw -p 1 --variable-step --tail-edge` (22674)
<pre>
Registering input file: '/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_plus.bam'
Temporary files will be stored in: 'tmp_test_plus_cuttrace_qU7o0L'
Processing with 1 cores...
stdin is empty of data
Reduce step (merge files)...
Merging 121 files into output file: '/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_plus_body_0-mer.bw'
Couldn't open tmp_test_plus_cuttrace_qU7o0L/JTFH01000350.1.txt.bw
</pre>
Command completed. Elapsed time: 0:01:19. Running peak memory: 0.068GB.  
  PID: 22674;   Command: ./peppro_example/peppro/tools/bamSitesToWig.py;    Return code: 0; Memory used: 0.057GB

Target to produce: `/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_minus_body_0-mer.bw`  

> `samtools index /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam` (31803)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.068GB.  
  PID: 31803;   Command: samtools;  Return code: 0; Memory used: 0.019GB


> `./peppro_example/peppro/tools/bamSitesToWig.py -i /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam -c /home/nsheff/code/bulker/docs_jupyter/peppro_example/hs38d1/fasta/default/hs38d1.chrom.sizes -o /home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_minus_body_0-mer.bw -p 1 --variable-step --tail-edge` (31872)
<pre>
Registering input file: '/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/aligned_hs38d1/test_minus.bam'
Temporary files will be stored in: 'tmp_test_minus_cuttrace_ljeMBw'
Processing with 1 cores...
Reduce step (merge files)...
Merging 124 files into output file: '/home/nsheff/code/bulker/docs_jupyter/peppro_example/output/test/signal_hs38d1/test_minus_body_0-mer.bw'
</pre>
Command completed. Elapsed time: 0:01:23. Running peak memory: 0.068GB.  
  PID: 31872;   Command: ./peppro_example/peppro/tools/bamSitesToWig.py;    Return code: 0; Memory used: 0.057GB

Starting cleanup: 5 files; 4 conditional files for cleanup

Cleaning up flagged intermediate files. . .

Cleaning up conditional list. . .

### Pipeline completed. Epilogue
*        Elapsed time (this run):  0:03:01
*  Total elapsed time (all runs):  0:02:59
*         Peak memory (this run):  0.0675 GB
*        Pipeline completed time: 2019-10-21 11:32:04

The pipeline has now completed successfully and we can explore the results:

tree peppro_example/output/

peppro_example/output/
└── test
    ├── aligned_hs38d1
    │   ├── test_fail_qc.bam
    │   ├── test_minus.bam
    │   ├── test_minus.bam.bai
    │   ├── test_plus.bam
    │   ├── test_plus.bam.bai
    │   ├── test_sort.bam
    │   ├── test_sort.bam.bai
    │   └── test_unmap.bam
    ├── cutadapt
    │   └── test_cutadapt.txt
    ├── fastq
    ├── fastqc
    ├── objects.tsv
    ├── PEPPRO_cleanup.sh
    ├── PEPPRO_commands.sh
    ├── PEPPRO_completed.flag
    ├── PEPPRO_log.md
    ├── PEPPRO_profile.tsv
    ├── QC_hs38d1
    │   └── test_bamQC.tsv
    ├── raw
    │   └── test.fastq.gz -> /home/nsheff/code/bulker/docs_jupyter/peppro_example/peppro/examples/data/test_r1.fq.gz
    ├── signal_hs38d1
    │   └── test_minus_body_0-mer.bw
    └── stats.tsv

8 directories, 19 files

We've successfully run a complete pipeline without having to install any of the software that runs the commands in the workflow. We're also able to interactively explore the environment that ran the workflow.

Conclusion

That's basically it. If you're a workflow developer, all you need to do is write your own manifest and distribute it with your workflow; in 3 lines of code, users will be able to run your workflow using modular containers, using the container engine of their choice.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search