Metagenome-Assembled Genomes
You can find this application in the demos folder of your Jupyter notebook environment. It includes the following files:
- samplesheet.csv
- mag_workflow.ipynb
This tutorial demonstrates how to run the nf-core/mag pipeline on the Nextflow Engine.
The first step is to import the nextflow package:
from camber import nextflow
Here’s an example of how to set up the configuration and execute a job:

pipeline="nf-core/mag"
: specifies the pipeline to run.

engine_size="XXSMALL"
: specifies the engine size used to perform the job.

num_engines=8
: specifies the number of engines that run workflow tasks in parallel.
Pipeline parameters must be defined in the params argument. To ensure the pipeline works as expected, take note of the following:

"--input": "./samplesheet.csv"
: the path of the samplesheet.csv file, relative to the current notebook. When using local FastQ files, the paths inside samplesheet.csv are relative as well.

"--outdir": "/camber_outputs"
: the location where the job's output data is stored.
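For reference, a minimal samplesheet might look like the sketch below. This is only an illustration with hypothetical sample names and file paths; the exact column layout depends on the nf-core/mag version you run, so check the pipeline's usage documentation:

sample,group,short_reads_1,short_reads_2,long_reads
sample1,0,data/sample1_R1.fastq.gz,data/sample1_R2.fastq.gz,
sample2,0,data/sample2_R1.fastq.gz,data/sample2_R2.fastq.gz,

Note that the FastQ paths are relative, matching the convention described above.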
# Declare URLs to download necessary files
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
nf_mag_job = nextflow.create_job(
    pipeline="nf-core/mag",
    engine_size="XXSMALL",
    num_engines=8,
    params={
        "--input": "samplesheet.csv",    # samplesheet path, relative to this notebook
        "--outdir": "/camber_outputs",   # where the job stores its output data
        "-r": "3.4.0",                   # pipeline revision
        "--kraken2_db": kraken2_db,
        "--centrifuge_db": centrifuge_db,
        "--busco_db": busco_db,
        "--skip_krona": "true",          # skip the Krona, GTDB-Tk, and MaxBin2 steps
        "--skip_gtdbtk": "true",
        "--skip_maxbin2": "true",
    },
)
The next step is to check the job status:
nf_mag_job.status
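If you prefer to wait for the run to finish programmatically, a simple polling loop like the sketch below works. The terminal status values ("COMPLETED", "FAILED") are assumptions here, not confirmed SDK constants, so check the camber documentation for the exact names:

import time

# Poll the job until it reaches a terminal state.
# NOTE: the status strings are assumed values, not confirmed SDK constants.
while nf_mag_job.status not in ("COMPLETED", "FAILED"):
    time.sleep(30)  # pause between polls
print("Final status:", nf_mag_job.status)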
To monitor job execution, you can stream the job logs in real time with the read_logs method:
nf_mag_job.read_logs()
When the job is done, you can explore and download its results and logs in two ways:
- Browse the data directly in the notebook environment:
- Go to the Stash UI: