Variant Calling

You can find this application in the demos folder of your Jupyter notebook environment.

    • sample.csv
    • sarek_workflow.ipynb

    This tutorial demonstrates how to run a variant calling workflow using Camber, which simplifies configuring and executing genomics pipelines at scale. Variant calling is a key step in analyzing whole genome and targeted sequencing data, identifying germline and somatic mutations that can be important for understanding genetic diseases, cancer, and other biological questions. In this example, we use the nf-core/sarek pipeline, a standard workflow for variant detection that supports tumor/normal comparisons and joint variant calling.

    The first step is to import the camber package:

    import camber

    The job is configured and launched with camber.nextflow.create_job. Its main arguments are:

    • pipeline="nf-core/sarek": specifies the pipeline to run.
    • engine_size="MICRO": sets the size of the engines that run the job.
    • num_engines=4: sets the number of engines available to run workflow tasks in parallel when possible.

    Pipeline parameters are passed in the params argument. For the pipeline to work as expected, note the following:

    • "--input": "./samplesheet.csv": the path to the samplesheet.csv file, relative to the current notebook. If you use local FASTQ files, their locations inside samplesheet.csv are given as relative paths too.
    • "--outdir": "/camber_outputs": the location where the job's output data is stored.
    • "-r": "3.5.1": the pipeline revision to run, here release 3.5.1 of nf-core/sarek.
    • "--tools": "freebayes": the tool used to perform variant calling.

    The complete call looks like this:

    nf_sarek_job = camber.nextflow.create_job(
        pipeline="nf-core/sarek",
        engine_size="MICRO",
        num_engines=4,
        params={
            "--input": "./samplesheet.csv",
            "--outdir": "/camber_outputs",
            "-r": "3.5.1",
            "--tools": "freebayes",
        },
    )
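The samplesheet passed to --input is a plain CSV file. As a minimal sketch, assuming the FASTQ-input column layout described in the nf-core/sarek documentation (patient, sample, lane, fastq_1, fastq_2), it could be generated from Python as shown here. The patient and sample names, the FASTQ paths, and the output file name samplesheet_example.csv are hypothetical placeholders, not files shipped with this demo.

```python
import csv

# Column layout assumed from the nf-core/sarek docs for FASTQ input.
FIELDS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]

# Hypothetical sample; local FASTQ locations are written as relative paths.
rows = [
    {
        "patient": "patient1",
        "sample": "sample1",
        "lane": "lane_1",
        "fastq_1": "./data/sample1_R1.fastq.gz",
        "fastq_2": "./data/sample1_R2.fastq.gz",
    },
]


def write_samplesheet(path, rows):
    """Write the rows to `path` as a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# A separate file name is used so an existing samplesheet.csv is not overwritten.
write_samplesheet("samplesheet_example.csv", rows)
```

To use the generated file, point "--input" at it instead of "./samplesheet.csv".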

    Next, check the job status:

    nf_sarek_job.status
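For unattended runs it can be convenient to poll the status until the job finishes. The helper below is a generic sketch and not part of the camber API: wait_for_status is a hypothetical name, the terminal state names in the usage comment are assumptions, and it presumes that reading nf_sarek_job.status returns the current state each time. Check the values your job actually reports before relying on it.

```python
import time


def wait_for_status(get_status, terminal_states, interval=30.0, timeout=None):
    """Poll get_status() until it returns a value in terminal_states.

    Returns the final status, or raises TimeoutError once `timeout`
    seconds have elapsed without reaching a terminal state.
    """
    start = time.monotonic()
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError(f"job still in state {status!r}")
        time.sleep(interval)


# Usage (state names are assumptions, not confirmed Camber values):
# final = wait_for_status(lambda: nf_sarek_job.status, {"COMPLETED", "FAILED"})
```

Passing the status lookup as a callable keeps the helper independent of the job object, so it can be tried out with a stand-in function first.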

    To monitor job execution, you can display the job logs in real time with the read_logs method:

    nf_sarek_job.read_logs()

    When the job is done, you can browse and download its results and logs in two ways:

    1. View the data directly in the notebook environment by opening the jobs/{JOB_ID} directory in the root of your notebook container.

    2. Go to the Stash UI and open the jobs/{JOB_ID} directory.

    The resulting VCF files from variant calling are available in the variant_calling directory and can be downloaded or further analyzed directly within this notebook.
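As a starting point for that analysis, the sketch below counts the records in a VCF file using only the Python standard library. It is an illustration, not part of the pipeline: open_vcf and count_variants are hypothetical helper names, the file path in the usage comment is a placeholder, and the SNV test (single-base REF and ALT alleles) is a deliberately simple classification.

```python
import gzip

BASES = set("ACGTN")


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_variants(lines):
    """Count VCF records, split into SNVs and everything else."""
    counts = {"snv": 0, "other": 0}
    for line in lines:
        # Skip blank lines and the "##" / "#CHROM" header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        # An SNV has a single-base REF and only single-base ALT alleles.
        is_snv = ref in BASES and all(alt in BASES for alt in alts)
        counts["snv" if is_snv else "other"] += 1
    return counts


# Usage (placeholder path; substitute a real file from variant_calling):
# with open_vcf("path/to/variants.vcf.gz") as handle:
#     print(count_variants(handle))
```

For anything beyond quick counts, a dedicated VCF library or bcftools is likely a better fit than hand-written parsing.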