Variant Calling
You can find this application in the demos folder of your Jupyter notebook environment.
- sample.csv
- sarek_workflow.ipynb
This tutorial demonstrates how to run a variant calling workflow using Camber, which simplifies configuring and executing genomics pipelines at scale. Variant calling is a key step in analyzing whole genome and targeted sequencing data, identifying germline and somatic mutations that can be important for understanding genetic diseases, cancer, and other biological questions. In this example, we use the nf-core/sarek pipeline, a standard workflow for variant detection that supports tumor/normal comparisons and joint variant calling.
The first step is to import the camber package:
import camber
Here’s an example of how to configure and execute a job:
- command: the full Nextflow command that runs the nf-core/sarek pipeline. It takes the following options:
  - --input "./samplesheet.csv": the path to the samplesheet.csv file, relative to the current notebook. If you use local FASTQ files, the locations listed in samplesheet.csv are also relative paths.
  - --outdir "./outputs": the directory where the job's output data is stored.
  - --tools "freebayes": the tool used to perform variant calling.
  - -r 3.5.1: the pipeline revision to run.
- engine_size="SMALL": the engine size used to run the job.
- num_engines=4: the number of engines used to run workflow tasks in parallel when possible.
command = "nextflow run nf-core/sarek \
--input ./samplesheet.csv \
--outdir ./outputs \
--tools freebayes \
-r 3.5.1"
nf_sarek_job = camber.nextflow.create_job(
    command=command,
    engine_size="SMALL",
    num_engines=4,
)
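The job above reads its inputs from samplesheet.csv. As a minimal sketch of how such a file could be generated from Python, the helper below writes one row per sequencing lane. The column names are an assumption based on the FASTQ samplesheet layout commonly used with nf-core/sarek, and the patient, sample, and file names are hypothetical placeholders; check the pipeline documentation for the revision you run before relying on them.

```python
import csv
from pathlib import Path

# Assumed column layout for a FASTQ-based nf-core/sarek samplesheet;
# verify it against the documentation for the pipeline revision you use.
FIELDS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]


def write_samplesheet(rows, path="samplesheet.csv"):
    """Write one CSV row per sequencing lane and return the output path."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical example row: the identifiers and FASTQ paths are placeholders.
example_rows = [
    {
        "patient": "patient1",
        "sample": "sample1",
        "lane": "lane_1",
        "fastq_1": "./data/sample1_R1.fastq.gz",
        "fastq_2": "./data/sample1_R2.fastq.gz",
    }
]
```

Calling write_samplesheet(example_rows) next to the notebook would produce the file that --input points to in the command above.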
Check the job status:
nf_sarek_job.status
To monitor job execution, you can stream the job logs in real time with the read_logs method:
nf_sarek_job.read_logs()
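If you would rather block until the job finishes than check the status by hand, a small polling loop can help. The sketch below is not part of the Camber API: wait_for_job is a hypothetical helper, the terminal state names are assumptions, and it presumes that reading the status attribute returns the current state each time.

```python
import time

# Assumed terminal states; these names are hypothetical, so check the values
# that nf_sarek_job.status actually reports in your environment.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=None, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        status = str(get_status()).upper()
        if status in TERMINAL_STATES:
            return status
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"job still in state {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the job from this tutorial, the call would look like wait_for_job(lambda: nf_sarek_job.status). Passing a callable rather than the job object keeps the helper independent of any particular client library.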
When the job is done, you can browse and download the results in two ways:
- View the data directly in the notebook environment by opening the --outdir directory in the root of your notebook container.
- Go to the Stash UI and open the --outdir directory.
The resulting VCF files from variant calling are available in the variant_calling directory and can be downloaded or analyzed further directly within this notebook.
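As a quick first look at the results, the sketch below counts the variant records in each VCF file using only the Python standard library. It assumes the results live under ./outputs/variant_calling, matching the --outdir used in this tutorial; the exact subdirectory layout is not specified here, so the helper searches recursively. Both function names are hypothetical.

```python
import gzip
from pathlib import Path


def count_vcf_records(vcf_path):
    """Count variant records (non-header, non-empty lines) in a plain or gzipped VCF."""
    vcf_path = Path(vcf_path)
    opener = gzip.open if vcf_path.suffix == ".gz" else open
    with opener(vcf_path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def summarize_vcfs(results_dir="./outputs/variant_calling"):
    """Return {file path: record count} for every VCF found under results_dir."""
    root = Path(results_dir)
    vcfs = sorted(root.rglob("*.vcf")) + sorted(root.rglob("*.vcf.gz"))
    return {str(path): count_vcf_records(path) for path in vcfs}
```

Calling summarize_vcfs() after the job completes gives a per-file record count, which is a cheap sanity check before moving on to a dedicated VCF library for deeper analysis.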
Note: Files and folders saved in the demos directory are temporary and are reset after each JupyterHub session. To keep your data permanently, set --outdir to a different location.
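One way to change --outdir without retyping the whole command is to assemble the command string from its parts. The following is a minimal sketch: build_sarek_command is a hypothetical helper, and the output path shown is a placeholder, since which locations persist between sessions depends on your environment.

```python
import shlex


def build_sarek_command(outdir, samplesheet="./samplesheet.csv",
                        tools="freebayes", revision="3.5.1"):
    """Assemble the Nextflow command from this tutorial with a custom --outdir."""
    args = [
        "nextflow", "run", "nf-core/sarek",
        "--input", samplesheet,
        "--outdir", str(outdir),
        "--tools", tools,
        "-r", revision,
    ]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(args)


# Hypothetical location outside the demos directory; replace it with a path
# that is persistent in your environment.
command = build_sarek_command("../my_results/sarek")
```

The resulting string can be passed as the command argument when creating the job, exactly like the hand-written command earlier in this tutorial.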