Variant Calling
You can find this application in the demos folder of your Jupyter notebook environment.
- sample.csv
- sarek_workflow.ipynb
This tutorial demonstrates how to run a variant calling workflow using Camber, which simplifies configuring and executing genomics pipelines at scale. Variant calling is a key step in analyzing whole genome and targeted sequencing data, identifying germline and somatic mutations that can be important for understanding genetic diseases, cancer, and other biological questions. In this example, we use the nf-core/sarek pipeline, a standard workflow for variant detection that supports tumor/normal comparisons and joint variant calling.
The first step is to import the camber package:
import camber
Here’s an example of how to configure and execute a job:
- pipeline="nf-core/sarek": specifies the pipeline to run.
- engine_size="MICRO": sets the engine size used to run the job.
- num_engines=4: sets the number of engines available to run workflow tasks in parallel when possible.
Pipeline parameters must be defined in the params argument. To ensure the pipeline works as expected, note that:
- "--input": "./samplesheet.csv": the path to the samplesheet.csv file, relative to the current notebook. If you use local FASTQ files, the locations listed in samplesheet.csv are relative paths as well.
- "--outdir": "/camber_outputs": the location where the job's output data is stored.
- "--tools": "freebayes": specifies the tool used to perform variant calling.
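For reference, the file passed to --input is a plain CSV samplesheet. The sketch below writes a minimal one with Python's csv module. It assumes the FASTQ-input column layout described in the nf-core/sarek documentation (patient, sample, lane, fastq_1, fastq_2); the patient and sample names and the FASTQ paths are hypothetical placeholders, and the column set should be checked against the documentation of the pipeline release you run.

```python
import csv

# Column layout assumed from the nf-core/sarek docs for FASTQ input.
FIELDS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]


def write_samplesheet(path, rows):
    """Write rows (a list of dicts keyed by FIELDS) to a CSV samplesheet."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample; replace the names and FASTQ paths with your own data.
rows = [
    {
        "patient": "patient1",
        "sample": "sample1",
        "lane": "lane_1",
        "fastq_1": "./data/sample1_R1.fastq.gz",
        "fastq_2": "./data/sample1_R2.fastq.gz",
    },
]

write_samplesheet("samplesheet.csv", rows)
```

Writing the file next to the notebook keeps the "./samplesheet.csv" value used for --input valid.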
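nf_sarek_job = camber.nextflow.create_job(
    pipeline="nf-core/sarek",
    engine_size="MICRO",
    num_engines=4,
    params={
        "--input": "./samplesheet.csv",   # samplesheet, relative to this notebook
        "--outdir": "/camber_outputs",    # where the job's output data is stored
        "-r": "3.5.1",                    # pipeline revision (release) to run
        "--tools": "freebayes",           # variant calling tool
    },
)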
To check the job status:
nf_sarek_job.status
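If you would rather block until the job finishes than re-run the status cell by hand, a small polling helper can wrap the status check. This is only a sketch under stated assumptions: the terminal state names in TERMINAL_STATES are guesses rather than values confirmed by the Camber documentation, wait_for_status is a hypothetical helper, and it assumes that reading the status returns the current state on every access. A stand-in status source is used so the block runs on its own.

```python
import time

# Assumed terminal state names; check the values your Camber version reports.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_status(get_status, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = str(get_status()).upper()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in for a real job so the sketch runs without Camber.
fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
final_state = wait_for_status(lambda: next(fake_states), poll_seconds=0.01)
print(final_state)
```

With the job from this tutorial, the call would look like wait_for_status(lambda: nf_sarek_job.status).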
To monitor job execution, you can stream the job logs in real time with the read_logs method:
nf_sarek_job.read_logs()
When the job is done, you can browse and download its results and logs in two ways:
- View the data directly in the notebook environment by visiting the jobs/{JOB_ID} directory in the root of your notebook container.
- Go to the Stash UI and visit the jobs/{JOB_ID} directory.
The resulting VCF files from variant calling are available in the variant_calling directory and can be downloaded or further analyzed directly within this notebook.
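As a first look at the results inside the notebook, the sketch below counts variant records per chromosome using only the standard library. count_variants_by_chrom is a hypothetical helper, not part of Camber or sarek, and the demo file is a tiny stand-in VCF; point the function at a real file under the variant_calling directory, whose exact file names depend on your samples and tools.

```python
import gzip
from collections import Counter


def count_variants_by_chrom(vcf_path):
    """Count VCF data records per chromosome, skipping header and blank lines."""
    # VCF output is often gzip-compressed; pick the opener from the extension.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]  # first column is CHROM
            counts[chrom] += 1
    return counts


# Tiny stand-in VCF so the sketch runs on its own.
demo_vcf = "demo.vcf"
with open(demo_vcf, "w") as handle:
    handle.write("##fileformat=VCFv4.2\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    handle.write("chr1\t100\t.\tA\tG\t50\tPASS\t.\n")
    handle.write("chr1\t200\t.\tC\tT\t60\tPASS\t.\n")
    handle.write("chr2\t300\t.\tG\tA\t40\tPASS\t.\n")

print(count_variants_by_chrom(demo_vcf))
```

For anything beyond quick counts, a dedicated VCF library or bcftools is likely a better fit than line-by-line parsing.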