camber.nextflow
The camber.nextflow module handles workloads that use the Nextflow framework and runs them on Kubernetes as the execution platform. It is a subclass of CamberEngine that encapsulates a Python interface to the Nextflow CLI.
Basic Usage
from camber import nextflow

nf_hello_job = nextflow.create_job(
    pipeline="nextflow-io/hello",
    engine_size="XSMALL",
    num_engines=4
)
# More nextflow workflow below
Methods
create_job
Creates a job to run a given pipeline using the NextflowEngine. Note that if the pipeline requires an output directory, it should be set to /camber_outputs; when the job finishes, the output data will appear in your private stash under jobs/<job_id>/outputs.
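As a minimal sketch of this convention (assuming an nf-core-style pipeline that accepts an --outdir parameter, as nf-core/sarek does in Example 2 below):

from camber import nextflow

# Point the pipeline's output directory at /camber_outputs so that, once the job
# finishes, the results appear in your private stash under jobs/<job_id>/outputs.
# "--outdir" follows the nf-core convention; other pipelines may use a different flag,
# and additional required params (such as --input) are omitted here for brevity.
job = nextflow.create_job(
    pipeline="nf-core/sarek",
    engine_size="XSMALL",
    params={"--outdir": "/camber_outputs"},
)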
Args
- pipeline: str - A remote Nextflow pipeline or a local one in your private stash. For remote pipelines, you can discover more at nf-core pipelines.
- engine_size: str - The size of the engine. One of XMICRO, MICRO, XXSMALL, XSMALL, SMALL, MEDIUM, or LARGE. Default is XSMALL.
- num_engines: Optional[int] - The number of engines; multiple engines handle pipeline tasks in parallel. Default is 1.
- params: Optional[Dict] - Parameters for the pipeline. If the pipeline requires input files passed via params, the paths of the input files should be relative to the current working directory. Example: {"--input": "./samplesheet.csv"}
Returns
CamberJob - An instance of the CamberJob class representing the created job.
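A minimal sketch of using the returned job; the job_id attribute and read_logs() method shown here are the ones used in the examples below:

from camber import nextflow

job = nextflow.create_job(pipeline="nextflow-io/hello", engine_size="XSMALL")

# The job ID also determines where outputs land in your stash: jobs/<job_id>/outputs
print(f"Submitted job ID: {job.job_id}")

# Print the most recent log lines for the job
job.read_logs(tail_lines=50)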
Examples
Example 1: Basic “Hello World”
This example demonstrates running a simple Nextflow pipeline that writes a greeting to a file.
1. Create your Nextflow pipeline:
Store this file in your Camber Stash (e.g., at ./my_nextflow_pipelines/hello.nf).
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.greeting = "Hello from Camber Nextflow!"
// Camber will provide params.outdir, which defaults to /camber_outputs in the job
params.outdir = "./results" // This default may be overridden by Camber.

process GREET {
    // Publish to a subdirectory within the Camber-managed output directory
    publishDir "${params.outdir}/greeting_output", mode: 'copy'

    output:
    path "greeting.txt"

    script:
    """
    echo "${params.greeting}" > greeting.txt
    """
}

workflow {
    GREET()
}
2. Python script to run the pipeline:
import camber.nextflow

# Path to your Nextflow script in Camber Stash
pipeline_path = "./my_nextflow_pipelines/hello.nf"  # Adjust if needed

# Custom parameters for the Nextflow pipeline
job_params = {
    "--greeting": "'My Custom Hello via Camber!'"
}

hello_job = camber.nextflow.create_job(
    pipeline=pipeline_path,
    engine_size="XSMALL",
    params=job_params
)

print(f"Submitted Nextflow job ID: {hello_job.job_id}")

# View logs directly
hello_job.read_logs(tail_lines=50)
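Once the job completes, and assuming Camber overrides params.outdir with /camber_outputs as noted in the script, the published file should appear in your private stash under jobs/<job_id>/outputs/greeting_output/greeting.txt.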
Example 2: End-to-End Nextflow: nf-core/sarek with Camber Stash
This tutorial demonstrates how to run the nf-core/sarek Nextflow pipeline on Camber. The pipeline will use a samplesheet uploaded to your Camber Stash to process genomic data. Output files will also be written back to a Camber-managed output directory, typically accessible via Stash.
1. Prepare and Upload Your Samplesheet to Camber Stash
The nf-core/sarek pipeline requires a samplesheet to define the input samples and their data.
1.1 Download the Samplesheet (samplesheet.csv):
The samplesheet for nf-core/sarek typically has columns like patient, sample, lane, fastq_1, and fastq_2.
You can download an example template or a pre-filled example of samplesheet.csv here: Download samplesheet.csv
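For reference, a minimal samplesheet for paired-end FASTQ input might look like the following; the patient and sample names and FASTQ paths are placeholders, and the nf-core/sarek documentation describes the full column specification:

patient,sample,lane,fastq_1,fastq_2
patient1,sample1,lane_1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz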
1.2 Upload to Stash:
- Open your Camber Stash in your web browser.
- Click the upload button.
- In the upload screen that appears, select the samplesheet.csv file from your local machine.
- Start the upload and wait for confirmation that the upload was successful.
Your samplesheet.csv should now be in your Stash.
2. Create and Run the Nextflow Job via Jupyter Notebook
Now, go to your Camber Hub and open or create a new Jupyter Notebook. You'll use the Camber SDK to define and launch the nf-core/sarek pipeline.
import camber.nextflow

nf_sarek_job = camber.nextflow.create_job(
    pipeline="nf-core/sarek",
    engine_size="MICRO",
    num_engines=4,
    params={
        "--input": "./samplesheet.csv",  # Adjust if needed
        "--outdir": "/camber_outputs",
        "-r": "3.5.1",
        "--tools": "freebayes",
    },
)

print(f"Submitted Nextflow job ID: {nf_sarek_job.job_id}")

# View logs directly
nf_sarek_job.read_logs(tail_lines=50)
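As described for create_job above, the pipeline's outputs (written to /camber_outputs) will appear in your private stash under jobs/<job_id>/outputs once the job finishes.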