Metagenomic Profiling
You can find this application in the demos
folder of your Jupyter notebook environment.
- samplesheet.csv
- mag_workflow.ipynb
Metagenomic analysis enables the study of microbial communities, providing insights into their diversity and roles in various environments. The nf-core/mag pipeline offers a powerful, reproducible approach to metagenomic assembly and profiling. By using the Nextflow Engine on Camber, researchers can easily run and manage complex workflows, ensuring efficient analysis and scalability.
The first step is to import the nextflow package:
from camber import nextflow
Here’s an example of how to set up the configuration and execute a job:
pipeline="nf-core/mag"
: specifies the pipeline to run.

engine_size="XXSMALL"
: sets the engine size used for the job.

num_engines=8
: sets the number of engines that run workflow tasks in parallel.
Pipeline parameters must be defined in the params
argument. To ensure the pipeline works as expected, please note that:

"--input": "./samplesheet.csv"
: the path of the samplesheet.csv file, relative to the current notebook. When using local FastQ files, the paths listed in the samplesheet.csv file are also relative.

"--outdir": "/camber_outputs"
: the location where the job's output data is stored.
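As a sketch, a minimal samplesheet.csv might look like the following. The column layout shown here follows the nf-core/mag samplesheet format, and the sample names and FastQ paths are illustrative; check the pipeline documentation for the exact schema of the release you run:

```csv
sample,group,short_reads_1,short_reads_2,long_reads
sample1,0,data/sample1_R1.fastq.gz,data/sample1_R2.fastq.gz,
sample2,0,data/sample2_R1.fastq.gz,data/sample2_R2.fastq.gz,
```

The trailing comma leaves the long_reads column empty for short-read-only samples.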
# Declare URLs to download necessary files
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
nf_mag_job = nextflow.create_job(
pipeline="nf-core/mag",
engine_size="XXSMALL",
num_engines=8,
params={
"--input": "samplesheet.csv",
"--outdir": "/camber_outputs",
"-r": "3.4.0",
"--kraken2_db": kraken2_db,
"--centrifuge_db": centrifuge_db,
"--busco_db": busco_db,
"--skip_krona": "true",
"--skip_gtdbtk": "true",
"--skip_maxbin2": "true",
},
)
Check the job status:
nf_mag_job.status
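Because status is a plain attribute, you can also poll it to block until the job finishes. The sketch below is an assumption-laden helper, not part of the Camber SDK: the terminal status strings "COMPLETED" and "FAILED" are guesses, so check the values your Camber environment actually reports.

```python
import time

def wait_for_job(job, poll_seconds=30, terminal_states=("COMPLETED", "FAILED")):
    """Poll job.status until it reaches a terminal state, then return it.

    The terminal_states defaults are assumptions; verify them against
    the status strings reported in your Camber environment.
    """
    while job.status not in terminal_states:
        time.sleep(poll_seconds)
    return job.status
```

For example, `wait_for_job(nf_mag_job)` returns once the pipeline has completed or failed.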
To monitor job execution, you can stream the job logs in real time with the read_logs
method:
nf_mag_job.read_logs()
When the job is done, you can explore and download the job's results and logs in two ways:
- Browse the data directly in the notebook environment:
- Go to the Stash UI:
This tutorial demonstrates how Camber simplifies running the nf-core/mag
pipeline. You can try it with your own metagenomic data, easily setting up the pipeline, monitoring job status, and retrieving results. With Camber’s cloud infrastructure, you can scale your analysis effortlessly and focus on deriving insights from your data.