Enhancers
Pipenv
- class snakeboost.PipEnv(root, flags='', packages=None, requirements=None)
Functions to handle the creation of pip virtualenvs for Snakemake rules
Creates a virtualenv in the directory of choice intended for use in Snakemake rules. Both packages and requirements.txt files can be specified, and all will be installed, first requirements.txt (in the order specified), then packages. The virtualenv is stored in a directory under root named according to the hash of the package names and the contents of the requirements.txt files. Thus, multiple virtualenvs can easily be created, but each venv will not be made more than once.
Supports thread-safe installation, so multiple jobs depending on the same venv may be run simultaneously.
- Parameters
root (Path or str) – The directory in which to place the virtualenv. Intended to be a temporary directory
flags (str) – Flags to include on every call of pip install (e.g. custom wheelhouse paths)
packages (List[str]) – List of packages to install. Can be any valid pip package identifier (with or without version specification)
requirements (List[str]) – List of paths to requirements.txt files
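The naming scheme described above can be sketched as follows. This is a minimal illustration, not the snakeboost implementation: the helper name `venv_dir`, the choice of SHA-256, and the 16-character directory name are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def venv_dir(
    root: Path,
    packages: Optional[Iterable[str]] = None,
    requirements: Optional[Iterable[Path]] = None,
) -> Path:
    """Return a directory under root keyed by the requested venv contents.

    Hypothetical helper: the digest and directory layout are assumptions.
    """
    digest = hashlib.sha256()
    # Requirements files contribute their contents, in the order given.
    for req in requirements or []:
        digest.update(Path(req).read_bytes())
        digest.update(b"\0")
    # Packages contribute their identifiers (name plus any version pin).
    for package in packages or []:
        digest.update(package.encode())
        digest.update(b"\0")
    return Path(root) / digest.hexdigest()[:16]
```

Because the directory name depends only on the package identifiers and file contents, two rules requesting the same set resolve to the same directory, while any change in a version pin or requirements file yields a new one.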
- property get_venv
Script to check for venv, installing if necessary
This can be embedded at the beginning of a shell script to ensure the existence of the venv.
Typically, this should NOT be used. Prefer make_venv() or any of the other methods of PipEnv.
- Returns
Bash script to look for a venv and create one if necessary
- Return type
- make_venv(cmd)
Ensure existence of venv and run an arbitrary command
- Parameters
cmd – Command to run
- Returns
Modified shell script
- Return type
- python(cmd)
Ensure existence of venv then run python command
Prepends the path of the python executable to the shell script. This can be used to run a python file (with a fully resolved path) or a python module (using the -m flag).
When using multiple enhancers, this must ALWAYS be the last one before the command.
- script(cmd)
Ensure existence of venv then run python script
This appends the path of the venv /bin directory to the shell script. The very first item in the script should thus be the name of an executable python script installed in the /bin dir.
When using multiple enhancers, this must ALWAYS be the last one before the command.
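The ordering rule for python() and script() can be illustrated with plain string wrappers. This is a sketch under assumptions: `with_python`, `with_bin`, and `timed` are hypothetical stand-ins, the venv path is made up, and the real methods also ensure the venv exists, which is omitted here.

```python
from pathlib import PurePosixPath


def with_python(venv: PurePosixPath, cmd: str) -> str:
    """Prefix cmd with the venv's python executable (hypothetical helper)."""
    return f"{venv / 'bin' / 'python'} {cmd}"


def with_bin(venv: PurePosixPath, cmd: str) -> str:
    """Prefix cmd with the venv's bin directory (hypothetical helper)."""
    return f"{venv / 'bin'}/{cmd}"


def timed(cmd: str) -> str:
    """Stand-in for any other enhancer that wraps a whole command."""
    return f"time {cmd}"


venv = PurePosixPath("/tmp/venvs/abc123")  # made-up location

# The python wrapper is applied last before the command, so it sits innermost.
good = timed(with_python(venv, "-m mymodule --flag"))

# Wrong order: the python path now lands in front of another wrapper's text.
bad = with_python(venv, timed("-m mymodule --flag"))

print(good)  # time /tmp/venvs/abc123/bin/python -m mymodule --flag
print(bad)   # /tmp/venvs/abc123/bin/python time -m mymodule --flag
```

Since the path is glued directly onto the start of the string it receives, anything another enhancer places in front of the command would be misread as the python argument or executable name.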
Pyscript
- class snakeboost.Pyscript(snakefile_dir)
Functions to run python scripts
Runs python scripts similarly to the script directive in Snakemake, but can be used with Snakeboost PipEnvs. Like the script directive, inputs, outputs, params, and any other Snakemake data can be passed to the script.
Pyscript can be combined with any other snakeboost function. It should take the place of the bash script. It can also be combined with PipEnv by wrapping it with the PipEnv.script() function.
Currently, only items serializable as strings can be provided. This includes text, numbers, Paths, etc. Complex objects may be supported in the future.
The data will be provided to the script via SnakemakeArgs.
Example
To preserve named data, such as:
input: first="/path/to/first", second="/path/to/second"
the names of the data must be provided when calling the script. See the __call__() method for more details.
- Parameters
snakefile_dir (Path or str) – Path to the snakemake app directory or Snakefile directory. This, combined with the script path provided later, should form a fully resolved path to the script, e.g. snakefile_dir/script_path.py
python_path (Path or str) – python executable with which to call the script
- __call__(script, *, python_path=None, input=None, output=None, params=None, wildcards=None, resources=None, log=None)
Generate bash command to call python script.
Any data passed to the function will be passed to the script under the appropriate variable names. Data names can also be provided here using the parameters. Each parameter takes a list of variable names associated with the data type. For example, if there are three params: x, y, and z, the params argument here could be set to [“x”, “z”]. This would cause x and z to be passed to the script. These arguments take precedence over data passed through the Pyscript methods.
Any data types not annotated via Pyscript methods or call parameters will be passed to the script as a List.
- Parameters
script (str) – Path of the script to run. This, when combined with the snakefile_dir provided to Pyscript, should form a fully resolved path to the script.
input (List of str) –
output (List of str) –
params (List of str) –
wildcards (List of str) –
resources (List of str) –
log (List of str) –
- Returns
Bash command to be passed to the snakemake shell directive
- Return type
- Raises
FileExistsError – Raised if the specified script does not exist
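The difference between named and unnamed data can be sketched as below. The command-line format that Pyscript actually generates is not documented here, so the `--params x={params.x}` layout and the helper `placeholder_args` are purely illustrative assumptions; only the `{params.x}` placeholder syntax is standard Snakemake.

```python
from typing import List, Optional


def placeholder_args(kind: str, names: Optional[List[str]] = None) -> str:
    """Build Snakemake-style placeholders for one data type.

    Hypothetical format, for illustration only.
    """
    if not names:
        # No names given: the whole collection is forwarded as a list.
        return f"--{kind} {{{kind}}}"
    # Names given: each listed entry is forwarded under its own name.
    pairs = " ".join(f"{name}={{{kind}.{name}}}" for name in names)
    return f"--{kind} {pairs}"


print(placeholder_args("params", ["x", "z"]))  # --params x={params.x} z={params.z}
print(placeholder_args("input"))               # --input {input}
```

With params x, y, and z defined on the rule, listing only x and z leaves y out of the generated arguments, matching the behaviour described above.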
- input(**kwargs)
Set named inputs to the pyscript
Wrap this function around your rule inputs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.input(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
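The wrap-and-unpack pattern works because the method hands back the same name, value pairs it received. A minimal sketch, where `NameRecorder` and `rule_input` are hypothetical stand-ins for Pyscript and for Snakemake's input directive:

```python
from typing import Any, Dict, List


class NameRecorder:
    """Hypothetical stand-in that remembers the names passed to input()."""

    def __init__(self) -> None:
        self.input_names: List[str] = []

    def input(self, **kwargs: Any) -> Dict[str, Any]:
        # Remember which names were used, then return the pairs unchanged.
        self.input_names = list(kwargs)
        return kwargs


def rule_input(**named: Any) -> Dict[str, Any]:
    """Stand-in for a directive that accepts named entries."""
    return named


recorder = NameRecorder()
declared = rule_input(
    **recorder.input(first="/path/to/first", second="/path/to/second")
)
```

The directive sees exactly the named entries it would have seen without the wrapper, and the recorder now knows the names `first` and `second`. The same pattern applies to log, output, params, resources, and wildcards.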
- log(**kwargs)
Set named logs to the pyscript
Wrap this function around your rule logs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.log(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
- output(**kwargs)
Set named outputs to the pyscript
Wrap this function around your rule outputs. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.output(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
- params(**kwargs)
Set named params to the pyscript
Wrap this function around your rule params. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.params(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
- resources(**kwargs)
Set named resources to the pyscript
Wrap this function around your rule resources. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.resources(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
- wildcards(**kwargs)
Set named wildcards to the pyscript
Wrap this function around your rule wildcards. Be sure to include a double asterisk before the function to unpack the dict, e.g. **pyscript.wildcards(…)
- Returns
Dict of name, value pairs. This should be unpacked using a double asterisk
- Return type
Dict
Tar
- class snakeboost.Tar(root, inputs=None, outputs=None, modify=None, clear_mounts=None)
Functions to handle manipulation of .tar files in Snakemake
Supports the creation of new tarfile outputs, the modification of existing tarfiles, and the opening of existing tar files as inputs.
- root
The directory in which to place the open tarfile directories. Intended to be a temporary directory
- Type
Path or str
- __call__(cmd)
Modify shell script to manipulate .tar files as directories
- using(inputs=None, outputs=None, modify=None, clear_mounts=None)
Set inputs, outputs, and modifies for tarring, and other settings
Setting inputs and outputs
Specify wildcard inputs and outputs using “{input.foo}” or similar, or any arbitrary path, e.g. “{params.atlas}”.
Inputs: Extracts tar file inputs into a directory of your choice. The tar file is renamed (with a .swap suffix) and a symlink of the same name as the tarfile is made to the unpacked directory. Upon completion or failure of the job, the symlink is automatically closed.
Modify: Opens the tarfile as with inputs. Upon successful completion of the job, the directory is packaged into a new tarfile, and the old tarfile is deleted.
Outputs: Creates a new directory symlinked by the name of the tarfile. Upon successful completion of the job, the directory is packaged into a tarfile. Previous tarfiles produced by the rule will be overwritten, as is usual for Snakemake; however, an error will be thrown if any output.swap is found (e.g. file.tar.gz.out).
All files are gzipped, so .tar.gz should be used as the extension for all inputs and outputs affected by the function.
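The input handling described above (unpack, rename with a .swap suffix, symlink under the original name) can be sketched in Python. Tar itself produces a shell script, so this is only a model of the file operations; `open_tar_input` and `close_tar_input` are hypothetical names.

```python
import tarfile
from pathlib import Path


def open_tar_input(tarball: Path, mount: Path) -> Path:
    """Unpack tarball into mount and replace it with a symlink to mount.

    Hypothetical helper; returns the path of the renamed (.swap) tarball.
    Only unpack archives you trust.
    """
    mount = mount.resolve()
    mount.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(mount)
    # Move the real tarball aside so its name can point at the directory.
    swap = tarball.with_name(tarball.name + ".swap")
    tarball.rename(swap)
    tarball.symlink_to(mount, target_is_directory=True)
    return swap


def close_tar_input(tarball: Path, swap: Path) -> None:
    """Remove the symlink and restore the original tarball."""
    tarball.unlink()
    swap.rename(tarball)
```

While the symlink is in place, a command given the tarball path sees a directory, so `data.tar.gz/a.txt` resolves to the unpacked file. Closing should happen on both success and failure, for example in a `try`/`finally` block.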
Clearing mounts
Tar typically does not delete any extracted tarfile contents. This way, if multiple rules use the same input tarball, the file only needs to be unpacked once. A problem occurs, however, when one of those rules modifies the unpacked contents. Because the other rules read the same unpacked contents, the modifications will be propagated to all following rules, which is likely not desired. Thus, when closing an input tar file, Tar will check if the unpacked contents have been modified in any way. If modifications are found, the mount will be cleared, forcing future rules to unpack a fresh instance of the input tarball.
Checking for modifications may take a considerable amount of time on very large directories. In such cases, you may wish to manually set clear_mounts. True will force the clearing of input tarball mounts, and False will disable clearing. Note that you should never disable clearing to purposefully allow modifications made by one rule to propagate to another rule, as this can lead to inconsistent behaviour. Instead, save any modifications to a new tarball using output or save your modifications to the existing tarball using modify.
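One way to picture the modification check and the clear_mounts override is a directory fingerprint compared before and after the job. How Tar actually detects changes is not specified here; hashing every file's contents, and the helpers `fingerprint` and `should_clear`, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def fingerprint(directory: Path) -> str:
    """Digest of every entry's relative path and file contents."""
    digest = hashlib.sha256()
    for path in sorted(directory.rglob("*")):
        digest.update(path.relative_to(directory).as_posix().encode())
        digest.update(b"\0")
        if path.is_file():
            digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def should_clear(
    before: str, directory: Path, clear_mounts: Optional[bool] = None
) -> bool:
    """Decide whether to clear a mount; an explicit clear_mounts wins."""
    if clear_mounts is not None:
        return clear_mounts
    # Walking and hashing everything is what gets slow on large directories.
    return fingerprint(directory) != before
```

Passing `clear_mounts=True` or `clear_mounts=False` skips the walk entirely, which is the time saving the paragraph above refers to.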
- Parameters
inputs (List of str) – List of inputs. Use “{input.foo}” for wildcard paths
outputs (List of str) – List of outputs. Use “{output.foo}” for wildcard paths
modify (List of str) – List of files to modify
clear_mounts (bool, optional) – Force the deletion or preservation of tar directories following rule completion
- Returns
A fresh Tar instance with the updated inputs, outputs, and modifies
- Return type
X-server
- class snakeboost.XvfbRun
Functions to enable virtual x11 servers on compute clusters
xvfb-run is only used if $DISPLAY is not set
- __call__(cmd)
Start a virtual x11 server on compute clusters
Computers without graphics support, such as compute clusters, typically cannot run commands requiring an x-server. This function wraps commands with xvfb-run, which starts a virtual x-server. This command is thread safe.
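The conditional wrapping can be sketched as follows. This is an illustration under assumptions: `xvfb_wrap` is a hypothetical helper, the check is done in Python rather than in the generated shell script, and the use of xvfb-run's `-a` option (pick a free server number automatically) is a guess at how concurrent jobs avoid clashing.

```python
import os
from typing import Mapping, Optional


def xvfb_wrap(cmd: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Wrap cmd with xvfb-run unless a display is already available.

    Hypothetical helper; env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    if env.get("DISPLAY"):
        # A display exists, so the command can run unchanged.
        return cmd
    # -a asks xvfb-run to choose an unused server number.
    return f"xvfb-run -a {cmd}"


print(xvfb_wrap("my_gui_tool --render", env={}))
# xvfb-run -a my_gui_tool --render
print(xvfb_wrap("my_gui_tool --render", env={"DISPLAY": ":0"}))
# my_gui_tool --render
```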