RNNR is a bioinformatics task processing system for distributed computing environments. It is designed to work with workflow management systems, such as Cromwell and Nextflow, through the Task Execution Service (TES) API. The workflow manager submits tasks to RNNR, which distributes the processing load among computational nodes connected in a local network. The system is composed of a Main instance and one or more Worker instances. A distributed file system (for example, NFS) is required.
Source code is available at https://github.com/labbcb/rnnr
Run RNNR main server. Requires MongoDB server. The main server and worker instances do not manage any workflow or task files.
rnnr main
RNNR main server endpoint is http://localhost:8080/v1/tasks.
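Since RNNR speaks the TES API, the task list can be queried directly over HTTP as a quick sanity check (this assumes the standard GA4GH TES list-tasks operation):
curl http://localhost:8080/v1/tasks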
Run RNNR worker server. Requires Docker server.
rnnr worker
Add worker node.
rnnr add localhost
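To check that the node was registered, the node listing command used later in this document should show it:
rnnr nodes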
Cromwell uses its TES backend to submit jobs. For more information see https://cromwell.readthedocs.io/en/stable/backends/TES/. Get the latest Cromwell release at https://github.com/broadinstitute/cromwell/releases.
java -Dconfig.file=examples/cromwell.conf -jar cromwell-48.jar run examples/hello.wdl
Cromwell server mode is also supported and is the recommended mode for production settings.
java -Dconfig.file=examples/cromwell.conf -jar cromwell-48.jar server
Cromwell server endpoint is http://localhost:8000
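With the server running, workflows can be submitted to it using Cromwell's submit command (the same form used later in this document against the production host):
java -jar cromwell-48.jar submit --host http://localhost:8000 examples/hello.wdl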
Nextflow is a domain-specific language and workflow execution system that supports the GA4GH TES API. The following commands run a very simple workflow using Nextflow and RNNR.
export NXF_WORK=/home/nfs/tmp/nextflow-executions
export NXF_MODE=ga4gh
export NXF_EXECUTOR=tes
export NXF_EXECUTOR_TES_ENDPOINT='http://localhost:8080'
nextflow run examples/tutorial.nf
Cromwell and RNNR server instances can be deployed using Docker and Docker Swarm. In this section we show how we use these services at BCBLab (https://bcblab.org).
At BCBLab we have 5 computing servers and 1 NFS server in a local network.
The NFS server (hostname nfs) exports the /home/nfs directory over NFS.
All computing servers mount the NFS directory at the same path (/home/nfs); a rough sketch of this setup is given right after this overview.
This is very important for Cromwell to create hard links and to generate task commands.
All computing servers have Docker installed.
One computing server runs the Cromwell and RNNR main server instances, including their databases (hostname main).
This server does not execute tasks because the Cromwell instance uses a lot of memory when executing complex workflows.
The other 4 computing servers run RNNR worker server instances (hostnames worker1, worker2, worker3, worker4).
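Setting up NFS itself is outside RNNR's scope; as a rough sketch only (the subnet and export options below are illustrative assumptions), the export and mounts could look like this:
# On the nfs server: export /home/nfs to the local network
echo '/home/nfs 192.168.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
# On every computing server: mount the export at the same path expected by Cromwell
mkdir -p /home/nfs
mount -t nfs nfs:/home/nfs /home/nfs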
First we have to deploy RNNR main server.
docker network create rnnr
docker volume create rnnr-data
docker container run \
--rm \
--detach \
--name rnnr-db \
--network rnnr \
--volume rnnr-data:/data/db \
mongo:4
docker container run \
--detach \
--name rnnr \
--publish 8080:8080 \
--network rnnr \
welliton/rnnr:latest \
main --database mongodb://rnnr-db:27017
It creates the rnnr network for communication between RNNR and MongoDB, and the rnnr-data volume to store database files. It starts the rnnr-db container, mounting the previously created volume, and the rnnr container, exposing port 8080.
At this point we need a Cromwell configuration file. See examples/cromwell-docker.yml for an example. It sets the Cromwell workflow logs directory to /home/nfs/tmp/cromwell-workflow-logs and the workflow root to /home/nfs/tmp/cromwell-executions. It also sets the MySQL URL to jdbc:mysql://cromwell-db/cromwell?rewriteBatchedStatements=true, where cromwell-db is the name of the MySQL container. The actor factory cromwell.backend.impl.tes.TesBackendLifecycleActorFactory tells Cromwell to use the TES backend. The endpoint http://main:8080/v1/tasks is the RNNR main endpoint, where main is the hostname of the server that runs RNNR main. We copy this file to /etc/cromwell.conf.
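For reference only, the TES-related settings described above might be written as in the sketch below (a shell heredoc that creates /etc/cromwell.conf). The key names follow Cromwell's documented TES backend and database options, the MySQL credentials are placeholders, and examples/cromwell-docker.yml remains the authoritative configuration:
cat > /etc/cromwell.conf <<'EOF'
# Sketch only; see examples/cromwell-docker.yml for the complete configuration.
workflow-options {
  workflow-log-dir = "/home/nfs/tmp/cromwell-workflow-logs"
}
backend {
  default = TES
  providers {
    TES {
      actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
      config {
        root = "/home/nfs/tmp/cromwell-executions"
        endpoint = "http://main:8080/v1/tasks"
      }
    }
  }
}
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://cromwell-db/cromwell?rewriteBatchedStatements=true"
    user = "root"
    password = "secret"
  }
}
EOF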
Next deploy Cromwell in server mode.
docker network create cromwell
docker volume create cromwell-data
docker container run \
--rm \
--detach \
--name cromwell-db \
--network cromwell \
--volume cromwell-data:/var/lib/mysql \
-e MYSQL_DATABASE=cromwell \
-e MYSQL_ROOT_PASSWORD=secret \
mysql:5.7
docker run \
--name cromwell \
--detach \
--network cromwell \
--publish 8000:8000 \
--volume /etc/cromwell.conf:/application.conf \
--volume /home/nfs:/home/nfs \
-e JAVA_OPTS=-Dconfig.file=/application.conf \
broadinstitute/cromwell:48 server
These commands create the cromwell network and the cromwell-data volume. They create the cromwell-db container, setting MYSQL_DATABASE and MYSQL_ROOT_PASSWORD according to the /etc/cromwell.conf file. Finally, they create the cromwell container, exposing port 8000, mounting the /etc/cromwell.conf file as /application.conf inside the container, and mounting the /home/nfs NFS directory at the same path inside the container. We have tested Cromwell release version 48.
You may notice that these services do not require Docker at all. Cromwell delegates tasks to the RNNR main server, which runs them remotely on active worker nodes. However, deploying these services as Docker containers simplifies system management.
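Once the cromwell container is up, a quick check that it answers requests is to query its version endpoint (part of Cromwell's standard REST API):
curl http://main:8000/engine/v1/version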
Now, at each computing node, start an RNNR worker instance. Docker is a requirement. Task-related containers must not be started inside the RNNR worker container (nested containers will not work anyway), so we mount the host's Docker socket into the container.
docker container run \
--detach \
--name rnnr \
--publish 50051:50051 \
--volume /var/run/docker.sock:/var/run/docker.sock \
welliton/rnnr:latest \
worker
It creates the rnnr container, exposing port 50051 and mounting the Docker socket (/var/run/docker.sock).
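A quick way to confirm the worker is running is to inspect its container with standard Docker commands:
docker container ls --filter name=rnnr
docker container logs rnnr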
Finally, we add the worker nodes to the main server. For each server, we set the maximum computing resources below the machine's actual capacity (roughly 2 CPU cores less, and less memory than is physically available). Since RNNR is not aware of external processes, this avoids resource over-consumption.
rnnr enable --host main worker1 --cpu 14 --ram 180
rnnr enable --host main worker2 --cpu 14 --ram 130
rnnr enable --host main worker3 --cpu 48 --ram 120
rnnr enable --host main worker4 --cpu 48 --ram 70
Done. Cromwell is now available to run submitted workflows.
java -jar cromwell-48.jar submit --host http://main:8000 examples/hello.wdl
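Workflow progress can then be followed through Cromwell's REST API, for example by querying running workflows (a standard Cromwell endpoint):
curl 'http://main:8000/api/workflows/v1/query?status=Running'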
Export all tasks as JSON.
rnnr ls --all --format json > rnnr_tasks.json
Export worker nodes as JSON.
rnnr nodes --format json > rnnr_nodes.json
Direct dependencies
Generate Go code from the Protocol Buffers file
export GO111MODULE=on
go get google.golang.org/protobuf/cmd/protoc-gen-go \
google.golang.org/grpc/cmd/protoc-gen-go-grpc
protoc --go_out=. \
--go_opt=paths=source_relative \
--go-grpc_out=. \
--go-grpc_opt=paths=source_relative \
proto/worker.proto
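After regenerating the gRPC code, the project can be rebuilt with the standard Go toolchain to make sure everything still compiles:
go build ./...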
Build Docker image and publish to Docker Hub
docker build -t welliton/rnnr:latest .
docker push welliton/rnnr:latest
Start MongoDB inside container exposing 27017 TCP port
docker container run --rm --publish 27017:27017 mongo:4
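A local main server can then be pointed at this instance using the same --database flag shown in the deployment section:
rnnr main --database mongodb://localhost:27017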
Canonical error codes are used to differentiate gRPC network communication errors from other errors. Unavailable (14) always returns a NetworkError.