Cluster and batch workflows
This chapter is for everyone who installs StochasticGene.jl (`Pkg.add("StochasticGene")`) and needs to:
- run many `fit` jobs on a cluster (including NIH Biowulf) using command files (scheduler-agnostic) or swarm files (Biowulf naming) from the stage helpers [`make_fitscript`][makefitscript], [`make_fitscripts_from_csv`][makefitscriptsfromcsv], [`make_commandfile`][makecommandfile], [`make_commandfile_from_csv`][makecommandfilefromcsv], [`make_fitscripts_and_commandfile_from_csv`][makefitscriptsandcommandfilefromcsv], and the compatibility wrappers [`make_swarmfile_from_csv`][makeswarmfilefromcsv] and [`make_fitscripts_and_swarm_from_csv`][makefitscriptsandswarmfromcsv];
- follow the recommended coupled-model workflow: fit individual units first, then merge those fitted rates into one initial rate file for the coupled model.
Source entry points: the implementations live in `stage.jl` (stage-native scripts and command files), `biowulf.jl` (legacy and Biowulf-specific wrappers), and `io.jl` (merging rate tables). Function signatures and defaults are in the docstrings; this page is the narrative guide published with the GitHub-hosted documentation.
Stage API (preferred)
Use these in new code:
| Step | Functions |
|---|---|
| Write one per-key script | [`make_fitscript`][makefitscript] |
| Write scripts from a keys CSV | [`make_fitscripts_from_csv`][makefitscriptsfromcsv] |
| Build one launch command string | [`build_julia_script_command`][buildjuliascriptcommand] |
| Write command file from a script list | [`make_commandfile`][makecommandfile], [`write_julia_command_file`][writejuliacommandfile] |
| Write command file from a keys CSV | [`make_commandfile_from_csv`][makecommandfilefromcsv] |
| Write scripts + command file from CSV | [`make_fitscripts_and_commandfile_from_csv`][makefitscriptsandcommandfilefromcsv] |
| Biowulf naming wrappers | [`make_swarmfile_from_csv`][makeswarmfilefromcsv], [`make_fitscripts_and_swarm_from_csv`][makefitscriptsandswarmfromcsv] |
Gene panels: [`write_fitfile_genes`][writefitfilegenes] and [`makeswarm_genes`][makeswarmgenes] produce one shared script with `gene = ARGS[1]` and one swarm line per gene (still the recommended pattern for many genes, one model).
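The generated files are easy to picture: one script that reads the gene from `ARGS`, and one launcher line per gene. A Base-Julia sketch of those shapes (illustration only; the script name `fitfile.jl`, the gene list, and the script body are hypothetical, and the real writers add shared options and paths):

```julia
# Illustration only: the real writers add shared fit options and paths.
genes = ["MYC", "NANOG", "SOX2"]   # hypothetical gene panel

# One shared fit script: the gene arrives as the first command-line argument.
script = """
using StochasticGene
gene = ARGS[1]
fit(; gene = gene)   # shared options (resultfolder, maxtime, ...) would go here
"""

# One swarm line per gene, all pointing at the same script.
swarm_lines = ["julia fitfile.jl $(g)" for g in genes]
```

Submitting that swarm then runs the same model once per gene, in parallel.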
Coupled models: recommended workflow (single units → merge → coupled fit)
For coupled transcriptional units (e.g. enhancer + gene, with or without a hidden unit), the suggested workflow is:
1. **Fit each unit as its own single-unit model.** Run separate fits (e.g. enhancer-only and gene-only traces or histograms) so each produces a standard MCMC `rates_*.txt` (rows = posterior samples, columns = rate headers for that unit). Use normal [`fit`][fit] calls or batch them with [`makeswarm_models`][makeswarmmodels] / [`makeswarmfiles`][makeswarmfiles] in single-unit mode.
2. **Merge the fitted rates into one wide table.** Stack the columns from the two (or more) unit files and append coupling placeholder columns using [`create_combined_file`][createcombinedfile] for two-unit models, or [`create_combined_file_mult`][createcombinedfilemult] for models with more than two units. You choose `Nenh`/`Ngene` (or per-unit column counts) to match how each set of rates is laid out in your files (see docstrings). For many keys (e.g. from a CSV of model names), use [`create_combined_files`][createcombinedfiles] or [`create_combined_files_driver`][createcombinedfilesdriver], which call `create_combined_file` once per key and name outputs with [`combined_rates_key`][combinedrateskey].
3. **Run the coupled fit using the combined file as the starting rates.** Point `datapath`/`resultfolder`/`label` (or your run spec) at the merged file so the coupled MCMC warm-starts from the stacked single-unit posteriors. The coupled [`fit`][fit] uses tuple `G`, `R`, `coupling`, a joint datatype (e.g. `tracejoint`), etc. Coupling strengths are then estimated in the coupled run (the placeholder columns from step 2 get updated).
4. **Optional: batch everything on the cluster.** Use stage-native [`make_fitscripts_from_csv`][makefitscriptsfromcsv] + [`make_commandfile_from_csv`][makecommandfilefromcsv] (or [`make_fitscripts_and_commandfile_from_csv`][makefitscriptsandcommandfilefromcsv]) so each job runs `fit(; key=..., ...)` from prewritten `info_<key>` specs (see Run specification (info TOML)). For Biowulf-style `*.swarm` naming, use [`make_swarmfile_from_csv`][makeswarmfilefromcsv].
This order—individual fits → merge → coupled fit—is the standard way to get a sensible initial combined rate file for coupled models without fitting all parameters cold.
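Conceptually, the merge in step 2 is just column-stacking plus placeholder columns. A Base-Julia sketch of the idea (the matrices, placeholder value, and column counts are made up for illustration; the actual header handling and `Nenh`/`Ngene` logic live in [`create_combined_file`][createcombinedfile]):

```julia
# Hypothetical posteriors: rows = MCMC samples, columns = rates for each unit.
enh_rates  = [0.10 0.20; 0.11 0.21; 0.09 0.19]          # 3 samples × 2 enhancer rates
gene_rates = [1.0 2.0 3.0; 1.1 2.1 3.1; 0.9 1.9 2.9]    # 3 samples × 3 gene rates

n_coupling  = 1      # number of coupling-strength columns to append
placeholder = 0.0    # placeholder value; the coupled fit updates these columns

coupling = fill(placeholder, size(enh_rates, 1), n_coupling)
combined = hcat(enh_rates, gene_rates, coupling)        # 3 × 6 wide table
```

The coupled fit then warm-starts from the unit columns and estimates the coupling columns.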
Scheduler launch files and Biowulf swarm
Stage helpers do not submit jobs to a scheduler by themselves. They write files that you submit with Biowulf's `swarm` or another launcher (`sbatch`, GNU `parallel`, custom wrappers):
- `<commandfile>` (default `fit.commands`): one command line per run key (`julia ... fitscript_<key>.jl`).
- `<swarmfile>.swarm`: same content style, Biowulf-compatible extension via [`make_swarmfile_from_csv`][makeswarmfilefromcsv].
- `fitscript_<key>.jl` per key: typically calls `fit(; key="<key>", ...)` with shared options (`resultfolder`, `maxtime`, `samplesteps`, `warmupsteps`, `inference_method`, `device`, `parallel`, `gradient`, etc.).
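Each command-file line is just a `julia` invocation of one per-key script. The string being built can be sketched in Base Julia (illustration only; `command_line` is a hypothetical helper and the exact flags are assumptions, while the real [`build_julia_script_command`][buildjuliascriptcommand] defines the actual kwargs):

```julia
# Illustration only: mirrors the one-line-per-key idea, not the library function.
function command_line(key::AbstractString; project = ".", nprocs = 4, nthreads = 1)
    return "julia --project=$(project) -p $(nprocs) -t $(nthreads) fitscript_$(key).jl"
end

run_keys = ["modelA", "modelB"]   # hypothetical keys
lines = [command_line(k; project = "/path/to/StochasticGene.jl") for k in run_keys]
# write("fit.commands", join(lines, '\n'))   # one command line per run key
```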
Typical use on Biowulf
- Install StochasticGene in your Julia environment (see Installation, including Biowulf Installation).
- From Julia (interactive session or batch script), run something like:

```julia
using StochasticGene

# Stage-native pipeline: scripts + command file from a keys CSV.
# The CSV requires a key column (default: Model_name).
make_fitscripts_and_commandfile_from_csv(
    "keys.csv";
    filedir = "my_jobs",
    commandfile = "fit.commands",
    juliafile = "fitscript",
    project = "/path/to/your/StochasticGene.jl",
    nprocs = 4,
    nthreads = 1,
    resultfolder = "my_results",
    root = ".",
    maxtime = 72000.0,
    samplesteps = 1_000_000,
)

# If you specifically want .swarm naming:
# make_fitscripts_and_swarm_from_csv("keys.csv"; filedir="my_jobs", swarmfile="fit", ...)
```

- Submit the swarm from the shell (example):

```sh
cd my_jobs
swarm -g 4 -t 16 -b 1 --time 24:00:00 --module julialang -f fit.swarm
```

Adjust `-g`, `-t`, `--time`, and `--module` to match your allocation and the Julia module name on Biowulf.
Generating keys and `info_<key>` in bulk
- [`write_run_spec_preset`][writerunspecpreset]: write `info_<key>.jld2` + marker TOML for one key.
- [`makeswarm_models`][makeswarmmodels]: sweep single-unit `G`, `R`, `S`, `insertstep`, write presets, then call Biowulf-oriented writers.
- [`makeswarmfiles`][makeswarmfiles]: unified legacy Biowulf entry; coupled key lists (CSV, explicit `base_keys`, or H3 grids) or single-unit sweeps; writes presets and emits swarm + scripts.
- [`makeswarmfiles_h3_latent`][makeswarmfilesh3latent]: convenience for H3 latent key grids.
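When you do not already have a keys CSV, a key grid is just a Cartesian product of model settings. A Base-Julia sketch with a made-up key format (the real naming comes from [`makeswarm_models`][makeswarmmodels] / [`makeswarmfiles`][makeswarmfiles], which also write the presets):

```julia
# Hypothetical sweep over gene states G and pre-RNA steps R; key format is made up.
Gs = [2, 3]
Rs = [0, 2]

run_keys = ["G$(G)_R$(R)" for G in Gs for R in Rs]
# Each key then gets its own info_<key> preset and fitscript_<key>.jl.
```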
Swarm `julia -p`, `nchains`, merged `info_<key>`, and `root`
- **Parallel workers:** the swarm command should use `-p N` (or equivalent) consistent with how many chains run in parallel. For [`makeswarmfiles`][makeswarmfiles] / [`makeswarm_models`][makeswarmmodels], if you do not pass an explicit swarm-only `nchains=` in kwargs, the generated `-p` is taken from each run spec's `nchains` (e.g. coupled defaults often use 16), so it stays aligned with `fit(; …, nchains=…)`. See the [`makeswarmfiles`][makeswarmfiles] docstring. For NUTS/ADVI, `nchains` still controls how many independent chains are launched; within-chain parallelism follows each method's options (`parallel` on [`NUTSOptions`][NUTSOptions] / [`ADVIOptions`][ADVIOptions], set via [`load_options`][loadoptions] from the run spec).
- **Merged presets:** with `merge_existing_info=true` (the default), older `info_<key>.jld2` files are merged into new specs. Legacy `trace_specs` sometimes used a huge `t_end` (a historical "open end" sentinel). When saving, [`write_run_spec_preset`][writerunspecpreset] runs [`normalize_trace_specs_legacy_t_end!`][normalizetracespecslegacytend] so those values are rewritten to `t_end = -1.0`, matching current [`default_trace_specs_for_coupled`][defaulttracespecsforcoupled] and avoiding invalid frame indices in [`read_tracefiles`][readtracefiles].
- **`root` in generated fit scripts:** scripts list `root` exactly as in the run spec (no forced `abspath`). Use `root="."` if the job's working directory is the project root (set `cd` in the swarm file or submit from the right folder). Paths resolved in an interactive Biowulf session can differ from batch jobs; `"."` avoids baking in an interactive-only absolute path.
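The legacy `t_end` rewrite is a plain sentinel normalization. A sketch of the idea, assuming specs stored as named tuples and a hypothetical cutoff (the real [`normalize_trace_specs_legacy_t_end!`][normalizetracespecslegacytend] operates in place on the package's actual spec structures):

```julia
# Hypothetical trace specs; the huge t_end is a legacy "open end" sentinel.
specs = [(name = "unit1", t_end = 1.0e6), (name = "unit2", t_end = 480.0)]

legacy_cutoff = 1.0e5   # assumed threshold for "implausibly large", for illustration
normalized = [s.t_end > legacy_cutoff ? merge(s, (t_end = -1.0,)) : s for s in specs]
```

Only sentinel values are rewritten; genuine finite end times pass through untouched.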
Key-based naming
Many batch helpers assume a string key per run:
- `results/<resultfolder>/info_<key>.toml` and `info_<key>.jld2`
- `rates_<key>.txt`
See Run specification (info TOML). Presets for cluster reruns are written with [`write_run_spec_preset`][writerunspecpreset].
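The naming scheme is plain string interpolation on the key; a sketch (the key and folder values are hypothetical, matching the patterns above):

```julia
key = "modelA"               # hypothetical run key
resultfolder = "my_results"  # hypothetical result folder

info_toml = joinpath("results", resultfolder, "info_$(key).toml")
info_jld2 = joinpath("results", resultfolder, "info_$(key).jld2")
rates_txt = joinpath("results", resultfolder, "rates_$(key).txt")
```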
Combined rate files (io.jl) — reference
- [`read_rates_table`][readratestable], [`write_rates_table`][writeratestable]
- [`merge_coupled_two_unit_rates`][mergecoupledtwounitrates], [`merge_coupled_stacked_units`][mergecoupledstackedunits]
- [`create_combined_file`][createcombinedfile], [`create_combined_file_mult`][createcombinedfilemult]
- [`read_combined_file_specs_csv`][readcombinedfilespecscsv], [`create_combined_files_driver`][createcombinedfilesdriver], [`create_combined_files`][createcombinedfiles], [`create_combined_files_h3_latent`][createcombinedfilesh3latent]
After the coupled fit
Post-processing examples: [`write_correlation_functions`][writecorrelationfunctions], [`write_traces`][writetraces], and other analysis functions in the API Reference.
See also
- Run specification (info TOML)
- Coupled model analysis (example-focused; batch mechanics are on this page)
- Installation (includes a Biowulf subsection)
- Model fitting (`fit`)
Maintainer note: where to document what
| Topic | Canonical place |
|---|---|
| User workflows, stage command files, Biowulf submission, coupled merge order | This page (hosted docs) |
| `info_<key>` file format | runspectoml.md |
| README on GitHub | Short pointer + link to hosted documents |
| Exact function signatures | Docstrings in stage.jl / biowulf.jl / io.jl |
[ADVIOptions]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=ADVIOptions&type=code
[NUTSOptions]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=NUTSOptions&type=code
[buildjuliascriptcommand]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=buildjuliascriptcommand&type=code
[combinedrateskey]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=combinedrateskey&type=code
[createcombinedfile]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=createcombinedfile&type=code
[createcombinedfilemult]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=createcombinedfilemult&type=code
[createcombinedfiles]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=createcombinedfiles&type=code
[createcombinedfilesdriver]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=createcombinedfilesdriver&type=code
[createcombinedfilesh3latent]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=createcombinedfilesh3latent&type=code
[defaulttracespecsforcoupled]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=defaulttracespecsforcoupled&type=code
[fit]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=fit&type=code
[loadoptions]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=loadoptions&type=code
[makecommandfile]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makecommandfile&type=code
[makecommandfilefromcsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makecommandfilefromcsv&type=code
[makefitscript]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makefitscript&type=code
[makefitscriptsandcommandfilefromcsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makefitscriptsandcommandfilefromcsv&type=code
[makefitscriptsandswarmfromcsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makefitscriptsandswarmfromcsv&type=code
[makefitscriptsfromcsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makefitscriptsfromcsv&type=code
[makeswarmfilefromcsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makeswarmfilefromcsv&type=code
[makeswarmgenes]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makeswarmgenes&type=code
[makeswarmmodels]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makeswarmmodels&type=code
[makeswarmfiles]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makeswarmfiles&type=code
[makeswarmfilesh3latent]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=makeswarmfilesh3latent&type=code
[mergecoupledstackedunits]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=mergecoupledstackedunits&type=code
[mergecoupledtwounitrates]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=mergecoupledtwounitrates&type=code
[normalizetracespecslegacytend]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=normalizetracespecslegacytend%21&type=code
[readcombinedfilespecscsv]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=readcombinedfilespecscsv&type=code
[readratestable]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=readratestable&type=code
[readtracefiles]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=readtracefiles&type=code
[writecorrelationfunctions]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writecorrelationfunctions&type=code
[writefitfilegenes]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writefitfilegenes&type=code
[writejuliacommandfile]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writejuliacommandfile&type=code
[writeratestable]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writeratestable&type=code
[writerunspecpreset]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writerunspecpreset&type=code
[writetraces]: https://github.com/nih-niddk-mbs/StochasticGene.jl/search?q=writetraces&type=code