Enhancers

Pipenv

class snakeboost.PipEnv(root, flags='', packages=None, requirements=None)

Functions to handle the creation of pip virtualenvs for Snakemake rules

Creates a virtualenv in the directory of choice intended for use in Snakemake rules. Both packages and requirements.txt files can be specified, and all will be installed, first requirements.txt (in the order specified), then packages. The virtualenv is stored in a directory under root named according to the hash of the package names and the contents of the requirements.txt files. Thus, multiple virtualenvs can easily be created, but each venv will not be made more than once.
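The hash-based naming described above can be illustrated with a short sketch. This is not snakeboost's implementation: the function name `venv_dir_for`, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def venv_dir_for(
    root: Path,
    packages: Optional[Iterable[str]] = None,
    requirements: Optional[Iterable[Path]] = None,
) -> Path:
    """Return a directory under root keyed by the requested dependencies.

    Hypothetical helper: the hashing scheme here is an assumption.
    """
    digest = hashlib.sha256()
    # Hash the contents of each requirements file, in the order given.
    for req in requirements or []:
        digest.update(Path(req).read_bytes())
    # Hash the package identifiers themselves.
    for pkg in packages or []:
        digest.update(pkg.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return Path(root) / digest.hexdigest()[:16]


a = venv_dir_for(Path("/tmp/venvs"), packages=["numpy", "pandas==2.2.0"])
b = venv_dir_for(Path("/tmp/venvs"), packages=["numpy", "pandas==2.2.0"])
c = venv_dir_for(Path("/tmp/venvs"), packages=["numpy"])
```

The same dependency set always maps to the same directory (`a == b`), so an existing venv can be reused, while a different set (`c`) gets its own directory.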

Supports thread-safe installation, so multiple jobs depending on the same venv may be run simultaneously.
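One common way to get this kind of safety is an exclusive file lock around a check-then-install step. The sketch below assumes a POSIX system (it uses `fcntl.flock`); the lock file location, the `.installed` marker, and the `ensure_venv` name are hypothetical and are not taken from snakeboost.

```python
import fcntl
import tempfile
import threading
from pathlib import Path
from typing import Callable, List


def ensure_venv(venv_dir: Path, install: Callable[[Path], None]) -> None:
    """Run install(venv_dir) at most once, even with concurrent callers."""
    venv_dir.parent.mkdir(parents=True, exist_ok=True)
    lock_file = venv_dir.with_name(venv_dir.name + ".lock")  # hypothetical location
    done_marker = venv_dir / ".installed"  # hypothetical completion marker
    with open(lock_file, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            # Only the first caller to acquire the lock performs the install.
            if not done_marker.exists():
                install(venv_dir)
                done_marker.touch()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


install_calls: List[Path] = []


def fake_install(path: Path) -> None:
    """Stand-in for the real pip installation step."""
    path.mkdir(parents=True, exist_ok=True)
    install_calls.append(path)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "venv-abc123"
    workers = [
        threading.Thread(target=ensure_venv, args=(target, fake_install))
        for _ in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Four workers request the same venv, but only one of them runs the install; the others wait on the lock and then find the marker already in place.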

Parameters
  • root (Path or str) – The directory in which to place the virtualenv. Intended to be a temporary directory

  • flags (str) – Flags to include on every call of pip install (e.g. custom wheelhouse paths)

  • packages (List[str]) – List of packages to install. Can be any valid pip package identifier (with or without version specification)

  • requirements (List[str]) – List of paths to requirements.txt files

venv

Path to the venv dir

Type

str

bin

Path to the venv bin dir (e.g. venv/bin)

Type

str

python_path

Path of the python executable (e.g. venv/bin/python)

Type

str

property get_venv

Script to check for venv, installing if necessary

This can be embedded at the beginning of a shell script to ensure the existence of the venv.

Typically, this should NOT be used. Prefer make_venv() or any of the other methods of PipEnv.

Returns

Bash script to look for a venv and create one if necessary

Return type

str

make_venv(cmd)

Ensure existence of venv, then run any arbitrary command

Parameters

cmd – Command to run

Returns

Modified shell script

Return type

str

python(cmd)

Ensure existence of venv, then run python command

Prepends the path of the python executable to the shell script. This can be used to run a python file (with a fully resolved path) or a python module (using the -m flag).

When using multiple enhancers, this must ALWAYS be the last one before the command.

Parameters

cmd (str) – Command to run

Returns

Modified shell script

Return type

str

script(cmd)

Ensure existence of venv, then run python script

This prepends the path of the venv /bin directory to the shell script. The very first item in the script should thus be the name of an executable python script installed in the /bin dir.

When using multiple enhancers, this must ALWAYS be the last one before the command.

Parameters

cmd (str) – Command to run

Returns

Modified shell script

Return type

str
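Because these methods take a shell command and return a modified one, their effect can be modelled as plain string wrapping. The sketch below uses a hypothetical `SketchEnv` class, not the real PipEnv; the venv check in `make_venv` is a placeholder and does not reproduce the script returned by get_venv.

```python
from pathlib import Path


class SketchEnv:
    """Hypothetical stand-in that mimics the documented string wrapping."""

    def __init__(self, venv: str) -> None:
        self.venv = Path(venv)
        self.bin = self.venv / "bin"
        self.python_path = self.bin / "python"

    def make_venv(self, cmd: str) -> str:
        # Placeholder setup step (assumption): create the venv if it is missing.
        setup = f"test -d {self.venv} || python -m venv {self.venv}"
        return f"{setup} && {cmd}"

    def python(self, cmd: str) -> str:
        # The python executable goes in front of the command.
        return self.make_venv(f"{self.python_path} {cmd}")

    def script(self, cmd: str) -> str:
        # The bin directory goes in front of the command's first token.
        return self.make_venv(f"{self.bin}/{cmd}")


env = SketchEnv("/tmp/venvs/abc123")
python_cmd = env.python("-m json.tool data.json")
script_cmd = env.script("black --check src")
```

In this model the prefix attaches to the first token of whatever string is passed in, which suggests why python() and script() need to be applied directly to the command: if another wrapper had already put its own text in front, the path would be attached to that text instead.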

Pyscript

class snakeboost.Pyscript(snakefile_dir)

Functions to run python scripts

Runs python scripts similarly to the script directive in Snakemake, but can be used with Snakeboost PipEnvs. Like the script directive, inputs, outputs, params, and any other Snakemake data can be passed to the script.

Pyscript can be combined with any other snakeboost function. It should take the place of the bash script. It can also be combined with PipEnv by wrapping it with the PipEnv.script() function.

Currently, only items serializable as strings can be provided. This includes text, numbers, Paths, etc. Complex objects may be supported in the future.

The data will be provided to the script via SnakemakeArgs.

Example

To preserve named data, such as:

input:
    first="/path/to/first",
    second="/path/to/second"

the names of the data must be provided when calling the script. See the __call__() method for more details.

Parameters
  • snakefile_dir (Path or str) – Path to the snakemake app directory or Snakefile directory. This, combined with the script path provided later, should form a fully resolved path to the script, e.g. snakefile_dir/script_path.py

  • python_path (Path or str) – python executable with which to call the script

__call__(script, *, python_path=None, input=None, output=None, params=None, wildcards=None, resources=None, log=None)

Generate bash command to call python script.

Any data passed to the function will be passed to the script under the appropriate variable names. Data names can also be provided here using the parameters. Each parameter takes a list of variable names associated with the data type. For example, if there are three params, x, y, and z, the params argument here could be set to ["x", "z"]. This would cause x and z to be passed to the script. These arguments take precedence over data passed through the Pyscript methods.

Any data types not annotated via Pyscript methods or call parameters will be passed to the script as a List.

Parameters
  • script (str) – Path of the script to run. This, when combined with the snakefile_dir provided to Pyscript, should form a fully resolved path to the script.

  • input (List of str) –

  • output (List of str) –

  • params (List of str) –

  • wildcards (List of str) –

  • resources (List of str) –

  • log (List of str) –

Returns

Bash command to be passed to the snakemake shell directive

Return type

str

Raises

FileExistsError – Raised if the specified script does not exist
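The name-selection behaviour described for the input, output, params, and similar arguments can be modelled in a few lines. This is only a model of the documented behaviour under stated assumptions; `select_data` is a hypothetical helper, and how Pyscript actually transfers the values to the script is not shown.

```python
from typing import Dict, List, Optional, Sequence, Union


def select_data(
    values: Dict[str, str],
    names: Optional[Sequence[str]] = None,
) -> Union[Dict[str, str], List[str]]:
    """Pick named entries if names are given, else fall back to a plain list."""
    if names is None:
        # No annotation: the data type is passed on as an unnamed list.
        return list(values.values())
    # Annotated: only the listed names are passed, keyed by name.
    return {name: values[name] for name in names}


params = {"x": "1", "y": "2", "z": "3"}
named = select_data(params, ["x", "z"])
unnamed = select_data(params)
```

With `["x", "z"]` supplied, only x and z reach the script by name; with no names supplied, the values arrive as a list.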

input(**kwargs)

Set named inputs to the pyscript

Wrap this function around your rule inputs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.input(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict
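The reason for the double asterisk can be seen in a small model: the method records the keyword names for later use and hands back a dict, which must be unpacked so that each entry becomes a separate named item. `NameRecorder` and `rule_inputs` are hypothetical stand-ins, not part of snakeboost or Snakemake.

```python
from typing import Dict, List


class NameRecorder:
    """Hypothetical helper: remembers keyword names, returns the mapping unchanged."""

    def __init__(self) -> None:
        self.input_names: List[str] = []

    def input(self, **kwargs: str) -> Dict[str, str]:
        # Keep the names so a later call can refer to the data by name.
        self.input_names = list(kwargs)
        return kwargs


def rule_inputs(**named: str) -> Dict[str, str]:
    """Stand-in for a rule's input block, which accepts keyword entries."""
    return named


recorder = NameRecorder()
inputs = rule_inputs(
    **recorder.input(first="/path/to/first", second="/path/to/second")
)
```

Without the `**`, the dict would be passed as a single positional item instead of two named entries.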

log(**kwargs)

Set named logs to the pyscript

Wrap this function around your rule logs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.log(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict

output(**kwargs)

Set named outputs to the pyscript

Wrap this function around your rule outputs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.output(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict

params(**kwargs)

Set named params to the pyscript

Wrap this function around your rule params. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.params(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict

resources(**kwargs)

Set named resources to the pyscript

Wrap this function around your rule resources. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.resources(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict

wildcards(**kwargs)

Set named wildcards to the pyscript

Wrap this function around your rule wildcards. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.wildcards(…)

Returns

Dict of name, value pairs. This should be unpacked using a double asterisk

Return type

Dict

Tar

class snakeboost.Tar(root, inputs=None, outputs=None, modify=None, clear_mounts=None)

Functions to handle manipulation of .tar files in Snakemake

Supports the creation of new tarfile outputs, the modification of existing tarfiles, and the opening of existing tar files as inputs.

root

The directory in which to place the open tarfile directories. Intended to be a temporary directory

Type

Path or str

__call__(cmd)

Modify shell script to manipulate .tar files as directories

Parameters

cmd (str) – Command to run

Returns

Modified shell script

Return type

str

using(inputs=None, outputs=None, modify=None, clear_mounts=None)

Set inputs, outputs, and modifies for tarring, and other settings

Setting inputs and outputs

Use wildcard inputs and outputs using “{input.foo}” or similar, or any arbitrary path, e.g. “{params.atlas}”.

  • Inputs: Extracts tar file inputs into a directory of your choice. The tar file is renamed (with a .swap suffix) and a symlink of the same name as the tarfile is made to the unpacked directory. Upon completion or failure of the job, the symlink is automatically closed.

  • Modify: Opens the tarfile as with inputs. Upon successful completion of the job, the directory is packaged into a new tarfile, and the old tarfile is deleted.

  • Outputs: Creates a new directory symlinked by the name of the tarfile. Upon successful completion of the job, the directory is packaged into a tarfile. Previous tarfiles produced by the rule will be overwritten, as is usual for Snakemake; however, an error will be thrown if any output.swap is found (e.g. file.tar.gz.out)

All files are gzipped, so .tar.gz should be used as the extension for all inputs and outputs affected by the function.
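The input handling described above (unpack, rename the tarball with a .swap suffix, then symlink the original name to the unpacked directory) can be sketched in plain Python. Tar itself works by modifying the shell script, so this is only an illustration of the file layout: the function names, the `.d` mount naming, and the assumption that closing restores the tarball from its .swap copy are all hypothetical. A POSIX filesystem with symlink support is assumed.

```python
import tarfile
import tempfile
from pathlib import Path


def open_tar_input(tar_path: Path, mount_root: Path) -> Path:
    """Unpack tar_path under mount_root and replace it with a symlink."""
    mount = mount_root / (tar_path.name + ".d")  # hypothetical naming scheme
    mount.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as archive:
        archive.extractall(mount)
    # Move the real tarball aside, then point its old name at the directory.
    swap = tar_path.with_name(tar_path.name + ".swap")
    tar_path.rename(swap)
    tar_path.symlink_to(mount, target_is_directory=True)
    return swap


def close_tar_input(tar_path: Path, swap: Path) -> None:
    """Remove the symlink and put the original tarball back (assumed behaviour)."""
    tar_path.unlink()  # removes only the symlink, not the unpacked contents
    swap.rename(tar_path)


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "data.txt").write_text("hello")
    tarball = work / "input.tar.gz"
    with tarfile.open(tarball, "w:gz") as archive:
        archive.add(work / "data.txt", arcname="data.txt")

    swap_path = open_tar_input(tarball, work / "mounts")
    opened_as_dir = tarball.is_dir()  # the tarball name now resolves to a directory
    content = (tarball / "data.txt").read_text()
    close_tar_input(tarball, swap_path)
    restored = tarball.is_file() and not tarball.is_symlink()
```

While the input is open, a command can treat `input.tar.gz` as an ordinary directory; after closing, the name refers to the original archive again.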

Clearing mounts

Tar typically does not delete any extracted tarfile contents. This way, if multiple rules use the same input tarball, the file only needs to be unpacked once. A problem occurs, however, when one of those rules modifies the unpacked contents. Because the other rules read the same unpacked contents, the modifications will be propagated to all following rules, which is likely not desired. Thus, when closing an input tar file, Tar will check if the unpacked contents have been modified in any way. If modifications are found, the mount will be cleared, forcing future rules to unpack a fresh instance of the input tarball.

Checking for modifications may take a considerable amount of time on very large directories. In such cases, you may wish to manually set clear_mounts. True will force the clearing of input tarball mounts, and False will disable clearing. Note that you should never disable clearing to purposefully allow modifications made by one rule to propagate to another rule, as this can lead to inconsistent behaviour. Instead, save any modifications to a new tarball using output or save your modifications to the existing tarball using modify.
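To see why the modification check can be slow, consider one possible way to detect changes: fingerprint the whole directory before and after the rule runs. The sketch below is an assumption about how such a check could work, not a description of what Tar does internally; `snapshot` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory: Path) -> str:
    """Fingerprint a directory tree by hashing every path and file body."""
    digest = hashlib.sha256()
    for path in sorted(directory.rglob("*")):
        digest.update(str(path.relative_to(directory)).encode("utf-8"))
        digest.update(b"\0")  # separate the name from the content
        if path.is_file():
            digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp)
    (mount / "a.txt").write_text("original")
    before = snapshot(mount)
    unchanged = snapshot(mount) == before
    # Simulate a rule that edits the unpacked contents in place.
    (mount / "a.txt").write_text("edited by a rule")
    modified = snapshot(mount) != before
```

A check of this kind has to visit every file in the mount, so its cost grows with the size of the unpacked tarball. Setting clear_mounts explicitly avoids that cost.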

Parameters
  • inputs (List of str) – List of inputs. Use “{input.foo}” for wildcard paths

  • outputs (list of str) – List of outputs. Use “{output.foo}” for wildcard paths

  • modify (list of str) – List of files to modify

  • clear_mounts (bool, optional) – Force the deletion or preservation of tar directories following rule completion

Returns

A fresh Tar instance with the updated inputs, outputs, and modifies

Return type

Tar

X-server

class snakeboost.XvfbRun

Functions to enable virtual x11 servers on compute clusters

xvfb-run is only used if $DISPLAY is not set

__call__(cmd)

Start a virtual x11 server on compute clusters

Computers without graphic support, such as compute clusters, cannot typically run commands requiring an x-server. This function wraps commands with xvfb-run, which starts a virtual x-server. This command is thread safe.

Parameters

cmd (str) – The command to run

Returns

The modified shell script

Return type

str
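The conditional wrapping can be sketched as a function that returns a bash snippet. This is a hedged illustration, not the string XvfbRun actually produces: `xvfb_wrap` is a hypothetical name, the `-a` flag (which asks xvfb-run to pick a free server number) is an assumption about how concurrent jobs might be kept apart, and the braces in the output are not escaped for Snakemake's shell formatting.

```python
import shlex


def xvfb_wrap(cmd: str) -> str:
    """Return bash that runs cmd under xvfb-run only when $DISPLAY is unset or empty."""
    quoted = shlex.quote(cmd)  # keep the command intact as one bash -c argument
    return (
        'if [ -z "${DISPLAY:-}" ]; then '
        f"xvfb-run -a bash -c {quoted}; "
        f"else bash -c {quoted}; fi"
    )


wrapped = xvfb_wrap("python render.py --out fig.png")
```

The check happens when the generated script runs, so the same rule can work both on a workstation with a display and on a headless cluster node.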