Metagenomic Profiling

You can find this application in the demos folder of your Jupyter notebook environment:

    • samplesheet.csv
    • mag_workflow.ipynb

Metagenomic analysis enables the study of microbial communities, providing insights into their diversity and their roles in different environments. The nf-core/mag pipeline offers a powerful, reproducible approach to metagenomic assembly and profiling. By using the Nextflow Engine on Camber, researchers can easily run and manage complex workflows, ensuring efficient, scalable analysis.
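The samplesheet describes the input reads for the pipeline. As an illustrative sketch only, here is one way to generate it programmatically; the column names follow the nf-core/mag samplesheet convention, and the sample name and read paths are hypothetical placeholders you would replace with your own data:

```python
import csv

# Illustrative samplesheet for nf-core/mag.
# The column names follow the nf-core/mag samplesheet convention;
# the sample name and read paths are hypothetical placeholders.
rows = [
    {
        "sample": "sample1",
        "group": "0",
        "short_reads_1": "data/sample1_R1.fastq.gz",
        "short_reads_2": "data/sample1_R2.fastq.gz",
        "long_reads": "",
    },
]

with open("samplesheet.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```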

    The first step is to import the nextflow package:

    from camber import nextflow

    Here’s an example of how to set up the configuration and execute a job:

    • pipeline="nf-core/mag": specifies the pipeline to run.
    • engine_size="XXSMALL": indicates the engine size used for the job.
    • num_engines=8: indicates the number of engines that run workflow tasks in parallel.

    Pipeline parameters must be defined in the params argument. To ensure the pipeline works as expected, note that:

    • "--input": "./samplesheet.csv": the path of the samplesheet.csv file, relative to the current notebook. If you use local FastQ files, their locations in the samplesheet.csv file are also relative paths.
    • "--outdir": "/camber_outputs": the location where the job's output data is stored.

    # Declare URLs to download necessary files
    kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
    centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
    busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
    nf_mag_job = nextflow.create_job(
        pipeline="nf-core/mag",
        engine_size="XXSMALL",
        num_engines=8,
        params={
            "--input": "samplesheet.csv",
            "--outdir": "/camber_outputs",
            "-r": "3.4.0",
            "--kraken2_db": kraken2_db,
            "--centrifuge_db": centrifuge_db,
            "--busco_db": busco_db,
            "--skip_krona": "true",
            "--skip_gtdbtk": "true",
            "--skip_maxbin2": "true",
        },
    )

    To check the job status:

    nf_mag_job.status
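If you want the notebook to block until the pipeline finishes, you can poll the status in a loop. This is a minimal sketch, assuming status returns a string and that the terminal state names include "COMPLETED" and "FAILED"; those names are an assumption for illustration, not taken from the Camber documentation:

```python
import time

def wait_for(job, poll_seconds=30, terminal=("COMPLETED", "FAILED")):
    """Poll job.status until it reaches a terminal state.

    NOTE: the terminal state names and the string-valued `status`
    attribute are assumptions for illustration, not confirmed Camber API.
    """
    state = job.status
    while state not in terminal:
        time.sleep(poll_seconds)
        state = job.status
    return state
```

For example, `wait_for(nf_mag_job)` would return once the job has finished, one way or the other.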

    To monitor job execution, you can stream the job logs in real time with the read_logs method:

    nf_mag_job.read_logs()

    When the job is done, you can explore and download its results and logs in two ways:

    1. Browse the data directly in the notebook environment:

    image

    2. Go to the Stash UI:

    image
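If you prefer to inspect the results programmatically instead, a generic sketch with the standard library is shown below. The "/camber_outputs" path comes from the --outdir parameter above; nothing else here is Camber-specific:

```python
from pathlib import Path

def summarize_outputs(outdir="/camber_outputs"):
    """Group files under the pipeline output directory by top-level subdirectory.

    "/camber_outputs" matches the --outdir parameter used when the job
    was created; pass a different path if you changed that parameter.
    """
    root = Path(outdir)
    summary = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            top = path.relative_to(root).parts[0]
            summary.setdefault(top, []).append(path.name)
    return summary
```

Calling `summarize_outputs()` returns a dict mapping each result subdirectory (e.g. per-tool folders produced by the pipeline) to the files it contains.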

    This tutorial demonstrates how Camber simplifies running the nf-core/mag pipeline. You can try it with your own metagenomic data, easily setting up the pipeline, monitoring job status, and retrieving results. With Camber’s cloud infrastructure, you can scale your analysis effortlessly and focus on deriving insights from your data.