TJProdEst
Documentation for TJProdEst.
- TJProdEst.Results
- TJProdEst.Setup
- TJProdEst.bootstrap_tj_prodest
- TJProdEst.draw_sample
- TJProdEst.fastOLS
- TJProdEst.jt_data_prep
- TJProdEst.lagging_that_panel!
- TJProdEst.panel_lag!
- TJProdEst.polynomial_fnc_fast!
- TJProdEst.polynomial_fnc_fast!
- TJProdEst.res_struct_init
- TJProdEst.setup_struct_init
- TJProdEst.superscript_this!
- TJProdEst.tj_onestep_estimator
- TJProdEst.tj_print_res_bigestimator
- TJProdEst.tj_prod_est
- TJProdEst.tj_prod_reg!
- TJProdEst.tj_prod_reg!
- TJProdEst.tj_prodest_criterion
- TJProdEst.tj_prodest_estimation!
- TJProdEst.tj_std_error_stats
TJProdEst.Results — Type
Results
Mutable struct storing estimation results with nested NamedTuples for production function and ω law-of-motion parameters.
Fields
- point_estimates::NamedTuple: Point estimates with (prd_fnc, ω_lom) structure
- std_errors::NamedTuple: Standard errors
- variance::NamedTuple: Variance estimates
- p_values::NamedTuple: p-values for hypothesis tests
- t_statistics::NamedTuple: t-statistics
- conf_intervals::NamedTuple: Confidence intervals (2-element vectors)
- criterion_value::Float64: Final GMM criterion value
TJProdEst.Setup — Type
Setup
Immutable struct storing estimation configuration and data settings.
Fields
- output::Symbol: Output variable column name
- flexible_input::Symbol: Flexible input variable (e.g., materials)
- fixed_inputs::Union{Symbol,Vector{Symbol}}: Fixed input(s) (e.g., capital, labor)
- flexible_input_price::Symbol: Price for flexible input
- all_inputs::Vector{Symbol}: Concatenation of fixed_inputs and flexible_input
- output_price::Symbol: Output price variable
- ω_lom_degree::Int: Polynomial degree for ω law-of-motion
- ω_shifter::Union{Symbol, Vector{Symbol}}: Additional shifters in ω LOM
- time::Union{Symbol, Missing}: Time variable for panel data
- id::Union{Symbol, Missing}: Firm/panel identifier
- prd_fnc_form::String: Production function form (e.g., "CobbDouglas")
- std_err_estimation::Bool: Whether to compute standard errors
- std_err_type::String: Standard error method (e.g., "Bootstrap")
- boot_reps::Int: Number of bootstrap replications
- maximum_boot_tries::Int: Maximum retry attempts for bootstrap
- optimizer_options::NamedTuple: Optimization settings (bounds, startvals, optimizer, optim_options)
TJProdEst.bootstrap_tj_prodest — Method
bootstrap_tj_prodest(data, Setup, Results) -> Matrix{Float64}
Generate bootstrap estimates by resampling firms and re-estimating the model. Returns a matrix where each row contains parameter estimates from one bootstrap replication.
Arguments
- data::DataFrame: Estimation dataset
- Setup::Setup: Configuration with boot_reps and maximum_boot_tries
- Results::Results: Results struct (used for parameter structure)
Returns
Matrix{Float64}: Bootstrap estimates matrix of size (boot_reps × n_params), where n_params = production function params + ω LOM params. Each row is one bootstrap replication.
Details
- Uses panel bootstrap: samples firms with replacement, keeps entire time series
- Parallelized across bootstrap repetitions using Threads.@threads
- Retries failed estimations up to maximum_boot_tries times per replication
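The retry-and-thread pattern listed above can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: estimate_once is a hypothetical stand-in for re-estimating the model on a resampled panel, and its failures are simulated at random.

```julia
using Random

# Hypothetical stand-in for one re-estimation on a resampled panel.
# It fails at random to mimic a non-converging optimizer.
function estimate_once(rng::AbstractRNG, n_params::Int)
    rand(rng) < 0.3 && error("Estimation did not converge")
    return randn(rng, n_params)
end

# Fill one row per replication, retrying failed attempts up to `max_tries` times.
function bootstrap_sketch(boot_reps::Int, n_params::Int; max_tries::Int = 10, seed::Int = 1)
    estimates = fill(NaN, boot_reps, n_params)
    Threads.@threads for b in 1:boot_reps
        rng = MersenneTwister(seed + b)   # one RNG per replication keeps threads independent
        for attempt in 1:max_tries
            try
                estimates[b, :] .= estimate_once(rng, n_params)
                break                      # success: stop retrying
            catch
                # failed attempt: try again until the retry budget is exhausted
            end
        end
    end
    return estimates
end

boot = bootstrap_sketch(20, 3)
```

A replication whose attempts all fail keeps its NaN row in this sketch; how the package treats that case is not stated here.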
TJProdEst.draw_sample — Method
draw_sample(; data, id, sample_size=length(unique(data[!, id])), with_replacement=true) -> DataFrame
Draw bootstrap sample of firms from panel data. Assigns unique IDs to resampled firms to avoid duplicate ID issues in subsequent panel operations.
Keyword Arguments
- data::DataFrame: Panel dataset to sample from
- id::Symbol: Firm identifier column
- sample_size::Int: Number of firms to sample (default: all unique firms)
- with_replacement::Bool=true: Sample with or without replacement
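The idea of resampling whole firms and relabeling them can be sketched with plain vectors standing in for DataFrame columns. The function name draw_firms_sketch and the toy data are hypothetical and only illustrate the technique; the package itself works on a DataFrame.

```julia
using Random

# Toy panel stored as parallel vectors (a stand-in for DataFrame columns).
firm = [1, 1, 2, 2, 3, 3]
year = [2000, 2001, 2000, 2001, 2000, 2001]
y    = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]

# Resample firms with replacement, keep each firm's full time series,
# and give every draw a fresh id so repeated firms do not collide.
function draw_firms_sketch(rng, firm, year, y; sample_size = length(unique(firm)))
    ids = unique(firm)
    drawn = rand(rng, ids, sample_size)           # sampling with replacement
    new_id, new_year, new_y = Int[], Int[], Float64[]
    for (k, f) in enumerate(drawn)
        rows = findall(==(f), firm)               # all periods of the drawn firm
        append!(new_id, fill(k, length(rows)))    # k is unique per draw
        append!(new_year, year[rows])
        append!(new_y, y[rows])
    end
    return (id = new_id, year = new_year, y = new_y, drawn = drawn)
end

boot_sample = draw_firms_sketch(MersenneTwister(42), firm, year, y)
```

Because each draw gets its own id, a firm drawn twice appears as two distinct panel units, which keeps later within-id lag operations well defined.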
TJProdEst.fastOLS — Method
fastOLS(; Y, X, multicolcheck=true, force=false) -> Vector{Float64}
Compute OLS coefficients β̂ = (X'X)⁻¹X'Y using in-place Cholesky decomposition for minimal allocations. Optionally checks and handles multicollinearity.
Keyword Arguments
- Y::Union{Matrix, Vector}: Dependent variable(s)
- X::Union{Matrix{<:Number}, Vector{<:Number}}: Design matrix of regressors
- multicolcheck::Bool=true: Check for perfect multicollinearity
- force::Bool=false: Auto-drop multicollinear columns with warning instead of error
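As a rough illustration of the Cholesky route to β̂ = (X'X)⁻¹X'Y, the sketch below uses only the LinearAlgebra standard library. The name ols_cholesky_sketch is hypothetical, and the sketch omits the multicollinearity handling that fastOLS offers.

```julia
using LinearAlgebra

# OLS coefficients via a Cholesky factorization of X'X.
# Assumes X has full column rank; otherwise `cholesky!` throws a PosDefException.
function ols_cholesky_sketch(Y::AbstractVecOrMat, X::AbstractMatrix)
    XtX = X' * X                      # Gram matrix
    XtY = X' * Y
    F = cholesky!(Symmetric(XtX))     # factorizes in place, overwriting XtX
    return F \ XtY                    # solves (X'X) β = X'Y
end

X = [ones(5) [1.0, 2.0, 3.0, 4.0, 5.0]]
Y = 2.0 .+ 3.0 .* X[:, 2]             # exact linear relation, no noise
β = ols_cholesky_sketch(Y, X)
```

With noiseless data the recovered coefficients match the intercept 2 and slope 3 used to build Y, up to floating-point error.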
TJProdEst.jt_data_prep — Method
jt_data_prep(data::DataFrame, Setup::Setup) -> DataFrame
Prepare panel data for estimation by computing proxy variable (log flexible input share) and creating lagged variables. Returns filtered dataset with complete observations.
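A minimal sketch of the proxy variable, assuming "log flexible input share" means the log of flexible input expenditure over revenue, ln(Pᴹ·M / (Pʸ·Y)). The variable names and numbers are illustrative and are not the column names the package creates.

```julia
# Illustrative firm-period observations (made-up values).
M  = [10.0, 12.0]    # flexible input quantity
Pᴹ = [2.0, 2.0]      # flexible input price
Y  = [50.0, 60.0]    # output quantity
Pʸ = [1.0, 1.0]      # output price

# Log share of flexible input expenditure in revenue.
log_share = log.((Pᴹ .* M) ./ (Pʸ .* Y))
```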
TJProdEst.lagging_that_panel! — Method
lagging_that_panel!(; data, id, time, variable, lag_prefix="lag_", lags=1, drop_missings=false) -> DataFrame
Internal helper for panel_lag!. Computes lags using ShiftedArrays.lag within groups, validates time gaps (sets to missing if gap ≠ lags), joins back to data, and renames columns.
TJProdEst.panel_lag! — Method
panel_lag!(; data, id, time, variable, lag_prefix="lag_", lags=1, drop_missings=false, force=false) -> DataFrame
Compute panel lags in-place using ShiftedArrays.lag within groups. Sorts by id and time, creates lag columns with specified prefix, and validates time gaps.
Keyword Arguments
- data::DataFrame: Data frame to mutate
- id::Symbol: Panel identifier column
- time::Symbol: Time variable for ordering
- variable::Union{Vector{Symbol},Symbol}: Column(s) to lag
- lag_prefix::String="lag_": Prefix for lag column names
- lags::Int=1: Lag distance
- drop_missings::Bool=false: Drop rows with missing lags
- force::Bool=false: Remove existing lag columns if present
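The gap validation can be illustrated without DataFrames or ShiftedArrays. The sketch below, with the hypothetical name panel_lag_sketch, shifts values by position within each unit and blanks the lag whenever the time distance to the shifted row differs from lags; it is an assumption about the behavior described above, not the package code.

```julia
# Lag `x` within each panel unit; the lag is `missing` when the shifted
# observation is not exactly `lags` periods earlier.
function panel_lag_sketch(id::Vector, time::Vector{Int}, x::Vector{Float64}; lags::Int = 1)
    order = sortperm(collect(zip(id, time)))          # sort by id, then time
    lagged = Vector{Union{Missing, Float64}}(missing, length(x))
    for k in (lags + 1):length(order)
        cur, prev = order[k], order[k - lags]
        if id[cur] == id[prev] && time[cur] - time[prev] == lags
            lagged[cur] = x[prev]
        end
    end
    return lagged
end

id   = [1, 1, 1, 2, 2]
time = [2000, 2001, 2003, 2000, 2001]   # firm 1 has a gap between 2001 and 2003
x    = [1.0, 2.0, 3.0, 10.0, 20.0]
lag_x = panel_lag_sketch(id, time, x)
```

The first observation of each firm and the observation after the gap receive missing, so a lag never crosses firms or skips periods.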
TJProdEst.polynomial_fnc_fast! — Method
polynomial_fnc_fast!(poly_mat, degree; par_cal=false) -> poly_mat
In-place computation of polynomial terms. Fills columns 2 through degree of poly_mat with powers 2 through degree of column 1. Preallocated matrix avoids allocations during repeated calls.
Arguments
- poly_mat::Union{Array{<:Number}, SubArray{<:Number}}: Matrix with base values in column 1
- degree::Int: Highest polynomial degree to compute
Keyword Arguments
- par_cal::Bool=false: Use Threads.@threads for parallel computation
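What the in-place fill amounts to can be sketched with a sequential loop. fill_powers_sketch! is a hypothetical name used for illustration only; it ignores the par_cal option.

```julia
# Overwrite columns 2:degree with powers of column 1, leaving column 1 untouched.
function fill_powers_sketch!(poly_mat::AbstractMatrix, degree::Int)
    base = view(poly_mat, :, 1)            # no copy of the base column
    for d in 2:degree
        poly_mat[:, d] .= base .^ d        # fused broadcast writes directly into column d
    end
    return poly_mat
end

m = zeros(3, 3)
m[:, 1] .= [1.0, 2.0, 3.0]
fill_powers_sketch!(m, 3)
```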
TJProdEst.polynomial_fnc_fast! — Method
polynomial_fnc_fast!(poly_mat, degree; par_cal=false) -> poly_mat
In-place computation of polynomial terms from a base variable. This function mutates poly_mat by filling its columns with successive powers of the first column: the i-th column is set to (first column)^i.
This is a performance-optimized version designed for repeated evaluations where the matrix has already been preallocated. It avoids dynamic allocations at runtime by mutating the input matrix directly (hence the trailing !).
Arguments
- poly_mat::Union{Array{<:Number}, SubArray{<:Number}}: A matrix where the first column contains the base values and columns 2 through degree will be filled with polynomial terms. Must have at least degree columns.
- degree::Int: The highest polynomial degree to compute. For example, if degree=3, columns 2 and 3 will contain the squared and cubed values of column 1, respectively.
Keyword Arguments
- par_cal::Bool=false: If true, uses parallel computation via Threads.@threads to compute polynomial columns concurrently.
Returns
- Returns the mutated poly_mat with polynomial columns filled in-place.
Notes
- The function assumes poly_mat has been preallocated with sufficient columns (at least degree columns). No bounds checking is performed.
- Column 1 is never modified; it serves as the base for all polynomial terms.
- Columns 2 through degree are overwritten with powers 2 through degree.
- For small datasets or low degrees, par_cal=false (sequential) is typically faster due to threading overhead.
Example
# Preallocate a matrix with 3 columns for base values and polynomials up to degree 3
poly_mat = zeros(100, 3)
poly_mat[:, 1] .= rand(100) # Fill first column with base values
# Compute polynomial terms in-place
polynomial_fnc_fast!(poly_mat, 3)
# Now poly_mat[:, 2] contains squared values, poly_mat[:, 3] contains cubed values
TJProdEst.res_struct_init — Method
res_struct_init(Setup::Setup) -> Results
Initialize a Results struct with missing values based on Setup configuration. Creates nested NamedTuples for production function (constant + all_inputs) and ω law-of-motion (ω terms + shifters) parameters.
TJProdEst.setup_struct_init — Method
setup_struct_init(data, output, flexible_input, fixed_inputs, flexible_input_price,
output_price, ω_lom_degree, time, id, prd_fnc_form, options) -> Setup
Construct a Setup struct with all estimation configuration. Merges user-provided optimizer_options with defaults and builds all_inputs by concatenating fixed_inputs and flexible_input.
Arguments
- data::DataFrame: the input dataset.
- output::Symbol: dependent variable (output) column name.
- flexible_input::Vector{Symbol}: vector of flexible input column names.
- fixed_inputs::Vector{Symbol}: vector of fixed input column names.
- flexible_input_price::Symbol: price variable for the flexible input.
- output_price::Symbol: price variable for the output.
- ω_lom_degree::Int: degree for the ω polynomial (order of LOM terms).
- time::Union{Symbol, Missing}: time variable column (or missing).
- id::Union{Symbol, Missing}: firm identifier column (or missing).
- prd_fnc_form::String: production function form (e.g. "CobbDouglas").
- options::Dict{Symbol,Any}: extra options passed to the estimator.
Returns
Setup: a filled Setup struct ready to be used by the estimation routine.
TJProdEst.superscript_this! — Method
superscript_this!(c::String) -> Char
Convert first character of string to its Unicode superscript equivalent using superscript_map. Returns original character if no superscript exists.
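The lookup-with-fallback behavior can be sketched as below. The table superscript_sketch_map and the function superscript_sketch are hypothetical; the package's own superscript_map may cover a different set of characters.

```julia
# Hypothetical lookup table covering a few digits only.
superscript_sketch_map = Dict('1' => '¹', '2' => '²', '3' => '³', '4' => '⁴')

# Map the first character to its superscript, falling back to the character itself.
superscript_sketch(s::AbstractString) = get(superscript_sketch_map, first(s), first(s))

two = superscript_sketch("2")
x   = superscript_sketch("x")   # no superscript in the table, so 'x' is returned
```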
TJProdEst.tj_onestep_estimator — Method
tj_onestep_estimator(data, Setup, Results) -> NamedTuple
Perform one-step GMM estimation of production function parameters.
Arguments
- data::DataFrame: Prepared estimation dataset containing output, inputs, prices, and lagged variables. Should be the output of jt_data_prep.
- Setup::Setup: Configuration struct containing model specification (variable names, polynomial degree, optimizer settings, etc.).
- Results::Results: Results struct that will store the criterion value and is used to determine the structure of output estimates.
Returns
NamedTuple: A nested NamedTuple with two fields:
- prd_fnc: Production function parameters as a NamedTuple with keys like :constant, :K, :L, :M (depends on Setup.all_inputs)
- ω_lom: Productivity law-of-motion parameters as a NamedTuple with keys like :ω, :ω², etc. (depends on Setup.ω_lom_degree and Setup.ω_shifter)
Optimization Details
The optimization uses Optim.jl's numerical optimization routines, and all optimizer options can be set with Optim.Options(...). Box constraints can be imposed by providing the lower_bound and upper_bound arguments in TJProdEst.tj_prod_est together with an Optim.Fminbox optimizer.
Throws
- Throws an error with message "Estimation did not converge..." if the optimizer fails to converge.
- Throws an error for unsupported production function forms (currently only "CobbDouglas" is supported).
TJProdEst.tj_print_res_bigestimator — Method
tj_print_res_bigestimator(data::DataFrame, Results::Results, Setup::Setup)
Print formatted estimation results table using PrettyTables. Displays production function and ω law-of-motion parameters with standard errors, t-statistics, p-values, and confidence intervals.
TJProdEst.tj_prod_est — Method
tj_prod_est(; data, output, flexible_input, fixed_inputs, flexible_input_price, output_price, ω_lom_degree=1, ω_shifter=[], time, id, std_err_estimation=true, std_err_type="Bootstrap", boot_reps=200, maximum_boot_tries=10, optimizer_options=NamedTuple())
Top-level estimation entry point for production function estimation using the approach described in Trunschke and Judd (2024). Returns a tuple (Results, Setup) with parameter estimates and configuration.
Keyword Arguments
- data::DataFrame: Input dataset with (firm-time) panel structure
- output::Symbol: Output variable column name
- flexible_input::Symbol: Flexible input variable (e.g., materials)
- fixed_inputs::Union{Symbol,Vector{Symbol}}: Fixed input variable(s) (e.g., capital, labor)
- flexible_input_price::Symbol: Price of flexible input
- output_price::Symbol: Output price
- ω_lom_degree::Int=1: Polynomial degree for productivity law-of-motion
- ω_shifter::Union{Symbol,Vector{Symbol}}=[]: Optional productivity shifter variables
- time::Symbol: Time period identifier
- id::Symbol: Firm/unit identifier
- std_err_estimation::Bool=true: Whether to compute standard errors
- std_err_type::String="Bootstrap": Type of standard errors ("Bootstrap" only currently)
- boot_reps::Int=200: Number of bootstrap replications
- maximum_boot_tries::Int=10: Max retry attempts per failed bootstrap iteration
- optimizer_options::NamedTuple=NamedTuple(): Optimization settings (see Optim.jl)
Returns
NamedTuple: (Results, Setup) containing estimates and configuration
Example
results = tj_prod_est(
data = df,
output = :Y,
flexible_input = :M,
fixed_inputs = [:K, :L],
flexible_input_price = :Pᴹ,
output_price = :Pʸ,
time = :year,
id = :firm_id,
)
TJProdEst.tj_prod_reg! — Function
tj_prod_reg!(data, Setup, β, c) -> Nothing
In-place computation of productivity (ω) terms and the law-of-motion (LOM) regression for the production function estimation. This function implements the core calculations for the proxy variable approach, computing current and lagged productivity, estimating the productivity LOM via OLS, and calculating the structural error (ξ).
This is a mutating function (hence the !) that updates the preallocated arrays in the c cache NamedTuple in-place for efficiency.
Arguments
- data::DataFrame: The estimation dataset containing output, inputs, prices, and lagged variables. Must include columns for current and lagged values of all production function variables.
- Setup::Setup: The setup struct containing configuration parameters including variable names, polynomial degree, and production function form.
- β::Union{Vector{<:Number}, SubArray{...}}: Production function parameters vector. The ordering is: [constant, fixed_input_coeffs..., flexible_input_coeff]. For Cobb-Douglas: β = [α₀, αₖ₁, αₖ₂, ..., αₘ] where K's are fixed inputs and M is the flexible input.
- c::NamedTuple: Cache NamedTuple containing preallocated arrays for intermediate calculations. Must include fields: ω_array, ω_lom_array, ρ_hat, ξ_hat. These arrays are mutated in-place.
Returns
- Nothing (the function mutates c in-place)
Side Effects (Mutations)
The function updates the following fields in c:
- c.ω_array: Filled with current productivity (ω) computed from the production function residual after accounting for inputs and the proxy variable.
- c.ω_lom_array: First column filled with lagged productivity (lag_ω), then polynomial terms computed via polynomial_fnc_fast!, and ω-shifter columns (if present) that were preallocated during setup.
- c.ρ_hat: Filled with OLS coefficients from regressing ω on its lagged polynomial terms and ω-shifters (the LOM parameters).
- c.ξ_hat: Filled with structural error (innovation to productivity), computed as ξ = ω - LOM(lag_ω, ω_shifters).
Implementation Details
The productivity term ω is recovered from the production function by solving:
ω = ln(Y) + ln(Pᴹ·M / Pʸ·Y) - ln(M) - β₀ - βₘ - ∑βₖ·ln(K) - (βₘ - 1)·ln(M)
where the proxy variable relationship is used to invert for productivity.
The law-of-motion is estimated via OLS:
ω = ρ₀ + ρ₁·lag_ω + ρ₂·lag_ω² + ... + ρₚ·lag_ωᵖ + ρ_shifters·shifters + ξ
Example
# Called internally during GMM criterion evaluation
tj_prod_reg!(est_data, Setup, β_current, cache)
# cache.ξ_hat now contains the structural errors
# cache.ρ_hat contains the LOM parameters
See Also
- tj_prodest_criterion: Uses this function to compute moment conditions
- polynomial_fnc_fast!: Computes polynomial terms for the LOM
- fastOLS: Estimates the LOM coefficients
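The law-of-motion regression step can be sketched on synthetic data. Everything here is illustrative: the numbers are made up, there are no ω-shifters, and the generic least-squares solve stands in for fastOLS.

```julia
using LinearAlgebra

# Synthetic productivity series: ω = 0.5 + 0.8·lag_ω + ξ (values are made up).
lag_ω  = [0.1, 0.4, -0.2, 0.3, 0.0, 0.6]
ξ_true = [0.01, -0.02, 0.00, 0.02, -0.01, 0.00]
ω = 0.5 .+ 0.8 .* lag_ω .+ ξ_true

degree = 2
# Design matrix: constant, lag_ω, lag_ω², ..., lag_ω^degree
ω_lom = hcat(ones(length(lag_ω)), (lag_ω .^ d for d in 1:degree)...)

ρ_hat = ω_lom \ ω              # least-squares LOM coefficients
ξ_hat = ω .- ω_lom * ρ_hat     # innovation: ω minus the fitted law of motion
```

By construction the fitted innovations ξ_hat are orthogonal to every column of the design matrix, which is the property the moment conditions build on.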
TJProdEst.tj_prod_reg! — Function
tj_prod_reg!(data, Setup, β, c) -> Nothing
Compute productivity (ω) and its law-of-motion via in-place OLS regression. Mutates the cache c with productivity terms, LOM parameters, and structural errors.
Arguments
- data::DataFrame: Estimation dataset with output, inputs, and lagged variables
- Setup::Setup: Configuration (variable names, polynomial degree)
- β::Vector{<:Number}: Production function parameters [constant, fixed_inputs..., flexible_input]
- c::NamedTuple: Preallocated cache to mutate
Returns
Nothing: The function mutates the cache c in-place.
Side Effects
Updates c fields:
- ω_array: Current productivity
- ω_lom_array: Lagged productivity and polynomial terms
- ρ_hat: LOM coefficients (from OLS of ω on lag_ω polynomials + shifters)
- ξ_hat: Productivity innovations (ω - predicted LOM)
TJProdEst.tj_prodest_criterion — Method
tj_prodest_criterion(; data, Setup, β, weight, c)
Compute the GMM (Generalized Method of Moments) criterion function value for the production function estimation. This function evaluates the weighted squared sum of moment conditions given a candidate parameter vector β.
The criterion is minimized during optimization to find the parameter estimates that best satisfy the moment conditions (orthogonality between instruments and residuals).
Keyword arguments
- data::DataFrame: prepared estimation dataset with lagged variables and transformed columns.
- Setup::Setup: configuration struct containing model specification (inputs, variables, degree of ω polynomial, etc.).
- β::Vector{<:Number}: candidate parameter vector ordered as [constant, fixed_input_coeffs..., flexible_input_coeff].
- weight::Union{Array,UniformScaling}: weighting matrix for the moment conditions. Often set to identity matrix I for exactly identified models.
- c::NamedTuple: preallocated cache containing arrays for intermediate calculations (ϵ, ξ_hat, ρ_hat, m_mat, ω_array, etc.) to avoid repeated allocations.
Returns
Float64: the GMM criterion value (weighted sum of squared moments), scaled by sample size.
Notes
- The function computes moment conditions based on orthogonality between productivity shocks (ϵ, ξ) and observables (proxy variable, fixed inputs).
- Calls tj_prod_reg! internally to compute residuals and ω law-of-motion parameters.
- The criterion is minimized by the optimizer in tj_onestep_estimator.
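A generic GMM criterion of this kind can be sketched as a quadratic form in the sample moments. The instruments, residuals, and the division by n below are assumptions for illustration; the exact moments and scaling used by tj_prodest_criterion are not reproduced here.

```julia
using LinearAlgebra

# Instruments Z (n × q) and residuals ξ (length n); the numbers are illustrative only.
Z = [1.0 0.2; 1.0 -0.1; 1.0 0.4; 1.0 0.0]
ξ = [0.1, -0.2, 0.05, 0.05]

# Sample moments m = Z'ξ / n and criterion m' W m.
function gmm_criterion_sketch(Z, ξ, W)
    m = (Z' * ξ) ./ size(Z, 1)
    return dot(m, W * m)
end

Q = gmm_criterion_sketch(Z, ξ, I)    # identity weighting via UniformScaling
```

Passing I from LinearAlgebra avoids building an explicit identity matrix, which matches the weight::Union{Array,UniformScaling} argument type listed above.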
TJProdEst.tj_prodest_estimation! — Method
tj_prodest_estimation!(; data, Setup, Results)
Run the production estimation pipeline for the provided dataset. This high-level helper performs the core estimation steps and populates the Results object with point estimates and (optionally) standard error information.
Keyword arguments
- data::DataFrame: prepared data to use for estimation (usually the output of jt_data_prep).
- Setup::Setup: configuration struct describing inputs, options and model form.
- Results::Results: mutable results container that will be filled by the estimation routines.
Behavior
- Calls the single-step estimator tj_onestep_estimator to compute point estimates and writes them into Results.
- If Setup.std_err_estimation is true, calls tj_se_estimation! to compute standard errors (mutates Results).
Side effects
- This function mutates the Results object in-place. It does not return a new Results instance; it returns nothing implicitly.
Example
# prepare data and setup
results, setup = tj_prod_est(data = df, output = :Y, flexible_input = [:M], fixed_inputs = [:K,:L], flexible_input_price = :Pᴹ, output_price = :Pʸ, ω_lom_degree = 1, time = :year, id = :ID)
# run lower-level estimation directly (results is mutated)
tj_prodest_estimation!(data = prepared_df, Setup = setup, Results = results)
TJProdEst.tj_std_error_stats — Method
tj_std_error_stats(data, Setup, Results) -> Nothing
Compute standard errors, t-statistics, p-values, and confidence intervals via bootstrap resampling. Mutates the Results struct in-place with statistical inference results.
Arguments
- data::DataFrame: Estimation dataset
- Setup::Setup: Configuration including bootstrap settings (boot_reps, std_err_type)
- Results::Results: Results struct to update with inference statistics
Side Effects
Populates the following fields in Results:
- variance: Bootstrap variance estimates
- std_errors: Standard errors (√variance)
- t_statistics: t-statistics for hypothesis testing
- p_values: Two-sided p-values (assuming normality)
- conf_intervals: 95% confidence intervals (±1.96 × SE)
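How these statistics follow from a bootstrap estimates matrix can be sketched with the Statistics standard library. The matrix and point estimates are made-up numbers, and p-values are left out because they need a normal CDF, which would require an extra package such as Distributions.jl (an assumption, not something stated above).

```julia
using Statistics

# Rows are bootstrap replications, columns are parameters (made-up numbers).
boot  = [0.30 0.60; 0.34 0.58; 0.28 0.63; 0.32 0.59]
point = [0.31, 0.60]

variance = vec(var(boot; dims = 1))      # bootstrap variance per parameter
std_err  = sqrt.(variance)
t_stat   = point ./ std_err
# 95% interval as point estimate ± 1.96 × SE, stored as 2-element vectors
conf_int = [[p - 1.96 * s, p + 1.96 * s] for (p, s) in zip(point, std_err)]
```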
Notes
- Currently only supports bootstrap standard errors
- Uses bootstrap_tj_prodest to generate bootstrap samples