fit Function
The full keyword list, coupling description, key-based loading, and return values are maintained in the in-source docstring for fit. In the Julia REPL, use ?fit after using StochasticGene. This page is a short summary; details may lag the code.
Fit steady state or transient GM/GRSM model to RNA data for a single gene, write the result (through function finalize), and return fit results and diagnostics.
For coupled transcribing units, arguments transitions, G, R, S, insertstep, and trace become tuples of the single unit type, e.g. If two types of transcription models are desired with G=2 and G=3 then G = (2,3).
Syntax
fits = fit(; kwargs...)Arguments
Basic Model Parameters
G::Int = 2: Number of gene statesR::Int = 0: Number of pre-RNA stepsS::Int = 0: Number of splice sites (must be ≤ R - insertstep + 1)insertstep::Int = 1: Reporter insertion step (must be ≥ 1; ignored when R = 0)transitions::Tuple: Tuple of vectors specifying state transitions
Data Parameters
datatype::String = "rna": Data type:- "rna": mRNA count distributions
- "trace": Intensity traces
- "rnadwelltime": Combined RNA and dwell time data
- "tracejoint": Simultaneous recorded traces
datapath::String = "": Path to data file or folderdatacond::String = "": Data condition identifiercell::String = "": Cell typegene::String = "MYC": Gene namenalleles::Int = 1: Number of alleles
Fitting Parameters
nchains::Int = 2: Number of parallel chains for Metropolis–Hastings (inference=:mh). For NUTS (inference=:nuts), must be1(single chain).inference = :mh: Posterior algorithm —INFERENCE_MH(MH, default),INFERENCE_NUTS(NUTS via AdvancedHMC; usesamplesteps/warmupstepsasn_samples/n_adapts), orINFERENCE_ADVI(not wired throughfit; callrun_advi).steady_state_solver = :augmented: Passed to the likelihood when using NUTS.ad_likelihood = nothing: Passed to NUTS (nothingselects AD likelihood for RNA count data when appropriate).maxtime = 60: Maximum total wall time for the MH phase, including warmup and sampling together. A numeric value is seconds; you may also pass a string with suffixm(minutes) orh(hours), e.g."90m","2h". Setsamplestepslarge and usemaxtimeas the primary stop (e.g. cluster time limits). Seemaxtime_seconds. (NUTS stopping is controlled bysamplesteps/warmupsteps;maxtimeis not applied to NUTS in the currentfitwiring.)samplesteps::Int = 1000000: MH: max sampling steps (may stop earlier atmaxtime). NUTS:n_samples.warmupsteps::Int = 0: MH: discarded warmup (sharedmaxtimewith sampling). NUTS:n_adapts.propcv::Float64 = 0.01: Proposal distribution coefficient of variationtemp::Float64 = 1.0: MCMC temperature
Prior Parameters
priormean::Vector{Float64} = Float64[]: Mean rates for prior distributionpriorcv::Float64 = 10.0: Prior distribution coefficient of variationnoisepriors::Vector = []: Observation noise priorsfittedparams::Vector{Int} = Int[]: Indices of rates to fitfixedeffects::Tuple = tuple(): Fixed effects specification
Trace Parameters (for trace data)
traceinfo::Tuple = (1.0, 1., -1, 1.): Trace parameters:- Frame interval (minutes)
- Starting frame time (minutes)
- Ending frame time (-1 for last)
- Fraction of active traces
datacol::Int = 3: Data column indexprobfn::Function = prob_Gaussian: Observation probability functionnoiseparams::Int = 4: Number of noise parameterszeromedian::Bool = true: Subtract median from traces
Output Parameters
resultfolder::String = "test": Results output folderlabel::String = "": Output file labelinfolder::String = "": Folder for initial parametersinlabel::String = "": Label of initial parameter fileswritesamples::Bool = false: Write MCMC samples
Run specification and key-based naming
key = nothing: When nothing, fit uses the keyword arguments you pass (and defaults). When a string (e.g.key = "33il"), fit looks forinfo_<key>.tomlin the results folder; if found, loads that spec (kwargs override spec); if not found, uses kwargs and defaults. All outputs use that stem (e.g.rates_<key>.txt,info_<key>.toml). See Run specification (info TOML).
Returns
fits: MCMC fit results containing:- Posterior samples
- Log-likelihoods
- Acceptance rates
Examples
Basic RNA Histogram Fit
fits = fit(
G = 2,
R = 0,
transitions = ([1,2], [2,1]),
datatype = "rna",
datapath = "data/HCT116_testdata/",
gene = "MYC",
datacond = "MOCK"
)Trace Data Fit
fits = fit(
G = 3,
R = 2,
S = 2,
insertstep = 1,
transitions = ([1,2], [2,1], [2,3], [3,1]),
datatype = "trace",
datapath = "data/testtraces",
cell = "TEST",
gene = "test",
datacond = "testtrace",
traceinfo = (1.0, 1., -1, 1.),
noisepriors = [40., 20., 200., 10.],
nchains = 4
)Notes
Rate Order
- G transitions
- R transitions
- S transitions
- Decay
- Noise parameters
MCMC Convergence
- R-hat should be close to 1 (ideally < 1.05)
- Increase
maxtimeorsamplestepsif R-hat is high - Use
warmupstepsto improve proposal distribution
Proposal Covariance and Warmup
- First run with expensive models: Set
propcv=0.01andwarmupsteps> 0. The covariance is learned during warmup and saved toproposal-cov_*.jld2. - Subsequent runs with same model: Use
propcv=-0.01to attempt loading the saved covariance. If successful, warmup is automatically skipped (even ifwarmupsteps > 0). If loading fails, falls back to abs(propcv) and warmup runs normally. - Warmup adapts periodically every ~1/3 of warmup steps, targeting an acceptance rate that scales with dimension (30% for typical d=5–20 gene models).
- Time allocation: warmup receives
maxtime × (warmupsteps / total_steps). For very expensive steps (minutes each), increasemaxtimeor reduce total steps if warmup times out.
- First run with expensive models: Set
Key-Based Workflows
- Use
key="<name>"to load run specifications frominfo_<name>.tomland ensure consistent naming across runs. All outputs (includingproposal-cov_<name>.jld2) use that stem.
- Use