Variant Calling

You can find this application in the demos folder of your Jupyter notebook environment.

    • sample.csv
    • sarek_workflow.ipynb
    This tutorial demonstrates how to run a variant calling workflow with Camber, which simplifies configuring and executing genomics pipelines at scale. Variant calling is a key step in analyzing whole-genome and targeted sequencing data: it identifies germline and somatic mutations that can be important for understanding genetic diseases, cancer, and other biological questions. In this example, we use the nf-core/sarek pipeline, a standard workflow for variant detection that supports tumor/normal comparisons and joint variant calling.

    The first step is to import the camber package:

    import camber

    Here’s an example of how to configure and execute a job:

    • command: The full Nextflow command to run the nf-core/sarek pipeline.

      • --input: "./samplesheet.csv": the path of the samplesheet, relative to the current notebook. If you use local FASTQ files, the file locations listed inside the samplesheet are relative paths as well.

      • --outdir: "./outputs": the directory where the job stores its output data.

      • --tools: "freebayes": the tool used to perform variant calling (FreeBayes).

      • -r: "3.5.1": the release of the nf-core/sarek pipeline to run, which pins the workflow version.

    • engine_size="SMALL": the size of the engine that performs the job.

    • num_engines=4: the number of engines used to run workflow tasks in parallel when possible.

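    # Build the Nextflow command; adjacent string literals are joined into one string.
    command = (
        "nextflow run nf-core/sarek "
        "--input ./samplesheet.csv "
        "--outdir ./outputs "
        "--tools freebayes "
        "-r 3.5.1"  # pin the pipeline release
    )
    # Submit the pipeline as a Camber Nextflow job.
    nf_sarek_job = camber.nextflow.create_job(
        command=command,
        engine_size="SMALL",
        num_engines=4,
    )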
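To run the pipeline on your own data, the samplesheet passed to --input must exist before the job is created. The sketch below writes one with Python's standard csv module. It assumes the nf-core/sarek FASTQ samplesheet columns patient, sample, lane, fastq_1 and fastq_2 (optional columns such as sex and status are left out); the file name my_samplesheet.csv and all sample values are hypothetical, and the FASTQ locations are relative paths as described for --input.

```python
import csv

# Column layout assumed from nf-core/sarek's FASTQ samplesheet format;
# optional columns (e.g. sex, status) are omitted in this sketch.
FIELDNAMES = ["patient", "sample", "lane", "fastq_1", "fastq_2"]


def write_samplesheet(path, rows):
    """Write one CSV row per sequencing lane, preceded by a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample: one patient, one paired-end lane, FASTQ files as relative paths.
rows = [
    {
        "patient": "patient1",
        "sample": "sample1",
        "lane": "lane_1",
        "fastq_1": "fastq/sample1_R1.fastq.gz",
        "fastq_2": "fastq/sample1_R2.fastq.gz",
    },
]
write_samplesheet("my_samplesheet.csv", rows)
```

You would then reference the file in the command string with --input ./my_samplesheet.csv. A separate file name is used so the samplesheet shipped with the demo is not overwritten.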

    Check the job status:

    nf_sarek_job.status
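Checking the status once only gives a snapshot. If you prefer to block the notebook until the job finishes, a small polling helper is one option. This is a minimal sketch under two assumptions that you should verify in your environment: that nf_sarek_job.status returns a fresh value each time it is read, and that the terminal status names look like the hypothetical ones in TERMINAL_STATUSES below. The helper itself is not part of the camber package.

```python
import time

# Hypothetical terminal status values; replace them with the values that
# nf_sarek_job.status actually reports in your Camber environment.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(get_status, terminal=TERMINAL_STATUSES, interval_s=30.0, timeout_s=None):
    """Call get_status() every interval_s seconds until it returns a terminal status.

    Raises TimeoutError if timeout_s seconds pass without reaching a terminal status.
    """
    start = time.monotonic()
    while True:
        status = get_status()
        if status in terminal:
            return status
        if timeout_s is not None and time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        time.sleep(interval_s)


# Usage (assumes .status is re-read on every access):
# final_status = wait_for_job(lambda: nf_sarek_job.status)
```

Passing a callable rather than the job object keeps the helper independent of the Camber API, so it can be tested with any status source.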

    To monitor job execution, you can show the job logs in real time with the read_logs method:

    nf_sarek_job.read_logs()

    When the job is done, you can browse and download its results in two ways:

    1. View the data directly in the notebook environment by opening the --outdir directory in the root of your notebook container.


    2. Open the Stash UI and navigate to the --outdir directory.


    The VCF files produced by variant calling are available in the variant_calling directory inside --outdir, and can be downloaded or analyzed further directly within this notebook.
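As a first step of analysis within the notebook, the sketch below locates the VCF files under the variant_calling directory and counts variant records per chromosome using only the Python standard library. It assumes the job used --outdir ./outputs and that the VCFs end in .vcf or .vcf.gz; the exact subdirectory layout below variant_calling is not assumed, which is why the search is recursive. For anything beyond quick counts, a dedicated VCF library is a better choice.

```python
import gzip
from collections import Counter
from pathlib import Path


def find_vcfs(outdir="./outputs"):
    """Return all .vcf and .vcf.gz files below <outdir>/variant_calling, sorted by path."""
    root = Path(outdir) / "variant_calling"
    if not root.is_dir():
        return []
    # Index files such as *.vcf.gz.tbi do not match these suffixes and are skipped.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".vcf", ".vcf.gz"))
    )


def count_records_per_chrom(vcf_path):
    """Count variant records per chromosome (first column), skipping '#' header lines."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            counts[line.split("\t", 1)[0]] += 1
    return counts


for vcf in find_vcfs():
    print(vcf, dict(count_records_per_chrom(vcf)))
```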

    Note: The files and folders saved in the demos directory are temporary and are reset after each JupyterHub session. To store your data permanently, change the value of --outdir to a location outside the demos directory.
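If a job has already written its results to the temporary demos directory, copying them elsewhere before the session ends is an alternative to rerunning with a different --outdir. This is a minimal sketch using shutil.copytree; the destination ~/sarek_results is a hypothetical example, and you should confirm which locations actually persist in your Camber environment before relying on it.

```python
import shutil
from pathlib import Path


def copy_outputs(src="./outputs", dest="~/sarek_results"):
    """Copy the job's output directory tree to dest and return the destination path.

    dest is a hypothetical location; copytree raises FileExistsError if it already exists.
    """
    src_path = Path(src)
    dest_path = Path(dest).expanduser()
    shutil.copytree(src_path, dest_path)
    return dest_path


# Usage: kept = copy_outputs("./outputs", "~/sarek_results")
```

Failing on an existing destination is deliberate here, so an earlier copy is never silently merged with new results.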