TJProdEst

Documentation for TJProdEst.

TJProdEst.Results - Type
Results

Mutable struct storing estimation results with nested NamedTuples for production function and ω law-of-motion parameters.

Fields

  • point_estimates::NamedTuple: Point estimates with (prd_fnc, ω_lom) structure
  • std_errors::NamedTuple: Standard errors
  • variance::NamedTuple: Variance estimates
  • p_values::NamedTuple: p-values for hypothesis tests
  • t_statistics::NamedTuple: t-statistics
  • conf_intervals::NamedTuple: Confidence intervals (2-element vectors)
  • criterion_value::Float64: Final GMM criterion value

TJProdEst.Setup - Type
Setup

Immutable struct storing estimation configuration and data settings.

Fields

  • output::Symbol: Output variable column name
  • flexible_input::Symbol: Flexible input variable (e.g., materials)
  • fixed_inputs::Union{Symbol,Vector{Symbol}}: Fixed input(s) (e.g., capital, labor)
  • flexible_input_price::Symbol: Price for flexible input
  • all_inputs::Vector{Symbol}: Concatenation of fixed_inputs and flexible_input
  • output_price::Symbol: Output price variable
  • ω_lom_degree::Int: Polynomial degree for ω law-of-motion
  • ω_shifter::Union{Symbol, Vector{Symbol}}: Additional shifters in ω LOM
  • time::Union{Symbol, Missing}: Time variable for panel data
  • id::Union{Symbol, Missing}: Firm/panel identifier
  • prd_fnc_form::String: Production function form (e.g., "CobbDouglas")
  • std_err_estimation::Bool: Whether to compute standard errors
  • std_err_type::String: Standard error method (e.g., "Bootstrap")
  • boot_reps::Int: Number of bootstrap replications
  • maximum_boot_tries::Int: Maximum retry attempts for bootstrap
  • optimizer_options::NamedTuple: Optimization settings (bounds, startvals, optimizer, optim_options)

TJProdEst.bootstrap_tj_prodest - Method
bootstrap_tj_prodest(data, Setup, Results) -> Matrix{Float64}

Generate bootstrap estimates by resampling firms and re-estimating the model. Returns a matrix where each row contains parameter estimates from one bootstrap replication.

Arguments

  • data::DataFrame: Estimation dataset
  • Setup::Setup: Configuration with boot_reps and maximum_boot_tries
  • Results::Results: Results struct (used for parameter structure)

Returns

  • Matrix{Float64}: Bootstrap estimates matrix of size (boot_reps × n_params) where n_params = production function params + ω LOM params. Each row is one bootstrap replication.

Details

  • Uses panel bootstrap: samples firms with replacement, keeps entire time series
  • Parallelized across bootstrap repetitions using Threads.@threads
  • Retries failed estimations up to maximum_boot_tries times per replication

TJProdEst.draw_sample - Method
draw_sample(; data, id, sample_size=length(unique(data[!, id])), with_replacement=true) -> DataFrame

Draw bootstrap sample of firms from panel data. Assigns unique IDs to resampled firms to avoid duplicate ID issues in subsequent panel operations.

Keyword Arguments

  • data::DataFrame: Panel dataset to sample from
  • id::Symbol: Firm identifier column
  • sample_size::Int: Number of firms to sample (default: all unique firms)
  • with_replacement::Bool=true: Sample with or without replacement
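
The resampling idea can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name resample_firms and the toy panel are hypothetical, and it assumes DataFrames.jl is available.

```julia
using DataFrames, Random

# Hypothetical helper (not the package's draw_sample): resample whole firms
# with replacement and relabel each draw so repeated firms get distinct IDs.
function resample_firms(df::DataFrame, id::Symbol; rng = Random.default_rng())
    firms = unique(df[!, id])
    draws = rand(rng, firms, length(firms))   # firm IDs drawn with replacement
    pieces = DataFrame[]
    for (new_id, firm) in enumerate(draws)
        block = df[df[!, id] .== firm, :]     # copy of the firm's full time series
        block[!, id] .= new_id                # fresh ID keeps duplicate draws apart
        push!(pieces, block)
    end
    return reduce(vcat, pieces)
end

# Toy balanced panel: 3 firms observed for 2 years each
panel = DataFrame(firm = [1, 1, 2, 2, 3, 3],
                  year = [1, 2, 1, 2, 1, 2],
                  y    = collect(1.0:6.0))
boot = resample_firms(panel, :firm; rng = MersenneTwister(1))
```

Because every draw receives a new ID, a firm sampled twice appears as two separate units, so later within-firm operations such as lagging do not mix the copies.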

TJProdEst.fastOLS - Method
fastOLS(; Y, X, multicolcheck=true, force=false) -> Vector{Float64}

Compute OLS coefficients β̂ = (X'X)⁻¹X'Y using in-place Cholesky decomposition for minimal allocations. Optionally checks and handles multicollinearity.

Keyword Arguments

  • Y::Union{Matrix, Vector}: Dependent variable(s)
  • X::Union{Matrix{<:Number}, Vector{<:Number}}: Design matrix of regressors
  • multicolcheck::Bool=true: Check for perfect multicollinearity
  • force::Bool=false: Auto-drop multicollinear columns with warning instead of error
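
A minimal sketch of the Cholesky route to β̂, using only the LinearAlgebra standard library. The function name ols_cholesky is hypothetical, and the multicollinearity handling described above is not reproduced here.

```julia
using LinearAlgebra

# Hypothetical stand-in for fastOLS: solve the normal equations X'X β = X'Y
# through a Cholesky factorization rather than an explicit inverse.
function ols_cholesky(X::AbstractMatrix, Y::AbstractVecOrMat)
    XtX = float(X' * X)
    XtY = X' * Y
    # Factorizes XtX in place; raises PosDefException when X'X is not
    # numerically positive definite (for example with collinear columns).
    F = cholesky!(Symmetric(XtX))
    return F \ XtY
end

X = [ones(5) [1.0, 2.0, 3.0, 4.0, 5.0]]   # constant plus one regressor
Y = 2.0 .+ 3.0 .* X[:, 2]                 # exact linear relationship
β = ols_cholesky(X, Y)
```

Solving with the factorization avoids forming (X'X)⁻¹ explicitly, which is both cheaper and numerically safer.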

TJProdEst.jt_data_prep - Method
jt_data_prep(data::DataFrame, Setup::Setup) -> DataFrame

Prepare panel data for estimation by computing proxy variable (log flexible input share) and creating lagged variables. Returns filtered dataset with complete observations.

TJProdEst.lagging_that_panel! - Method
lagging_that_panel!(; data, id, time, variable, lag_prefix="lag_", lags=1, drop_missings=false) -> DataFrame

Internal helper for panel_lag!. Computes lags using ShiftedArrays.lag within groups, validates time gaps (sets to missing if gap ≠ lags), joins back to data, and renames columns.

TJProdEst.panel_lag! - Method
panel_lag!(; data, id, time, variable, lag_prefix="lag_", lags=1, drop_missings=false, force=false) -> DataFrame

Compute panel lags in-place using ShiftedArrays.lag within groups. Sorts by id and time, creates lag columns with specified prefix, and validates time gaps.

Keyword Arguments

  • data::DataFrame: Data frame to mutate
  • id::Symbol: Panel identifier column
  • time::Symbol: Time variable for ordering
  • variable::Union{Vector{Symbol},Symbol}: Column(s) to lag
  • lag_prefix::String="lag_": Prefix for lag column names
  • lags::Int=1: Lag distance
  • drop_missings::Bool=false: Drop rows with missing lags
  • force::Bool=false: Remove existing lag columns if present
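
The gap validation can be illustrated with a small sketch. It is written without ShiftedArrays and is not the package code; the function name lag_with_gap_check is hypothetical, and it returns a sorted copy instead of mutating its input.

```julia
using DataFrames

# Hypothetical illustration of a gap-validated panel lag.
function lag_with_gap_check(df::DataFrame, id::Symbol, time::Symbol, var::Symbol; lags::Int = 1)
    out = sort(df, [id, time])
    n = nrow(out)
    lagged = Vector{Union{Missing, eltype(out[!, var])}}(missing, n)
    for i in (lags + 1):n
        j = i - lags                                       # row shifted back by `lags` positions
        same_unit = out[i, id] == out[j, id]               # do not cross unit boundaries
        right_gap = out[i, time] - out[j, time] == lags    # time distance must equal the lag
        if same_unit && right_gap
            lagged[i] = out[j, var]
        end
    end
    out[!, Symbol("lag_", var)] = lagged
    return out
end

# Unit 1 is missing period 3, so its period-4 lag must be missing
df = DataFrame(id = [1, 1, 1, 2, 2], t = [1, 2, 4, 1, 2],
               y = [10.0, 11.0, 12.0, 20.0, 21.0])
res = lag_with_gap_check(df, :id, :t, :y)
```

Like a positional shift within groups, this marks a lag as missing whenever the shifted row is not exactly lags periods earlier.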

TJProdEst.polynomial_fnc_fast! - Method
polynomial_fnc_fast!(poly_mat, degree; par_cal=false) -> poly_mat

In-place computation of polynomial terms. Fills columns 2 through degree of poly_mat with powers 2 through degree of column 1. Preallocated matrix avoids allocations during repeated calls.

Arguments

  • poly_mat::Union{Array{<:Number}, SubArray{<:Number}}: Matrix with base values in column 1
  • degree::Int: Highest polynomial degree to compute

Keyword Arguments

  • par_cal::Bool=false: Use Threads.@threads for parallel computation

TJProdEst.polynomial_fnc_fast! - Method
polynomial_fnc_fast!(poly_mat, degree; par_cal=false) -> poly_mat

In-place computation of polynomial terms from a base variable. This function mutates poly_mat by filling its columns with successive powers of the first column: the i-th column is set to (first column)^i.

This is a performance-optimized version designed for repeated evaluations where the matrix has already been preallocated. It avoids dynamic allocations at runtime by mutating the input matrix directly (hence the trailing !).

Arguments

  • poly_mat::Union{Array{<:Number}, SubArray{<:Number}}: A matrix where the first column contains the base values and columns 2 through degree will be filled with polynomial terms. Must have at least degree columns.
  • degree::Int: The highest polynomial degree to compute. For example, if degree=3, columns 2 and 3 will contain the squared and cubed values of column 1, respectively.

Keyword Arguments

  • par_cal::Bool=false: If true, uses parallel computation via Threads.@threads to compute polynomial columns concurrently.

Returns

  • Returns the mutated poly_mat with polynomial columns filled in-place.

Notes

  • The function assumes poly_mat has been preallocated with sufficient columns (at least degree columns). No bounds checking is performed.
  • Column 1 is never modified; it serves as the base for all polynomial terms.
  • Columns 2 through degree are overwritten with powers 2 through degree.
  • For small datasets or low degrees, par_cal=false (sequential) is typically faster due to threading overhead.

Example

# Preallocate a matrix with 3 columns for base values and polynomials up to degree 3
poly_mat = zeros(100, 3)
poly_mat[:, 1] .= rand(100)  # Fill first column with base values

# Compute polynomial terms in-place
polynomial_fnc_fast!(poly_mat, 3)
# Now poly_mat[:, 2] contains squared values, poly_mat[:, 3] contains cubed values

TJProdEst.res_struct_init - Method
res_struct_init(Setup::Setup) -> Results

Initialize a Results struct with missing values based on Setup configuration. Creates nested NamedTuples for production function (constant + all_inputs) and ω law-of-motion (ω terms + shifters) parameters.

TJProdEst.setup_struct_init - Method
setup_struct_init(data, output, flexible_input, fixed_inputs, flexible_input_price,
                  output_price, ω_lom_degree, time, id, prd_fnc_form, options) -> Setup

Construct a Setup struct with all estimation configuration. Merges user-provided optimizer_options with defaults and builds all_inputs by concatenating fixed_inputs and flexible_input.

Arguments

  • data::DataFrame: the input dataset.
  • output::Symbol: dependent variable (output) column name.
  • flexible_input::Vector{Symbol}: vector of flexible input column names.
  • fixed_inputs::Vector{Symbol}: vector of fixed input column names.
  • flexible_input_price::Symbol: price variable for the flexible input.
  • output_price::Symbol: price variable for the output.
  • ω_lom_degree::Int: degree for the ω polynomial (order of LOM terms).
  • time::Union{Symbol, Missing}: time variable column (or missing).
  • id::Union{Symbol, Missing}: firm identifier column (or missing).
  • prd_fnc_form::String: production function form (e.g. "CobbDouglas").
  • options::Dict{Symbol,Any}: extra options passed to the estimator.

Returns

  • Setup: a filled Setup struct ready to be used by the estimation routine.

TJProdEst.superscript_this! - Method
superscript_this!(c::String) -> Char

Convert first character of string to its Unicode superscript equivalent using superscript_map. Returns original character if no superscript exists.

TJProdEst.tj_onestep_estimator - Method
tj_onestep_estimator(data, Setup, Results) -> NamedTuple

Perform one-step GMM estimation of production function parameters.

Arguments

  • data::DataFrame: Prepared estimation dataset containing output, inputs, prices, and lagged variables. Should be the output of jt_data_prep.
  • Setup::Setup: Configuration struct containing model specification (variable names, polynomial degree, optimizer settings, etc.).
  • Results::Results: Results struct that will store the criterion value and is used to determine the structure of output estimates.

Returns

  • NamedTuple: A nested NamedTuple with two fields:
    • prd_fnc: Production function parameters as a NamedTuple with keys like :constant, :K, :L, :M (depends on Setup.all_inputs)
    • ω_lom: Productivity law-of-motion parameters as a NamedTuple with keys such as :ω² (depends on Setup.ω_lom_degree and Setup.ω_shifter)

Optimization Details

The optimization uses Optim.jl's numerical optimization routines, and all optimizer options can be set with Optim.Options(...). Box constraints can be imposed by providing the lower_bound and upper_bound arguments in TJProdEst.tj_prod_est together with an Optim.Fminbox optimizer.
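
As a rough illustration of box-constrained optimization in Optim.jl itself, the sketch below minimizes a toy objective that stands in for the GMM criterion. The names f, lower, upper, and start are hypothetical, and how TJProdEst passes these values through optimizer_options is not shown.

```julia
using Optim

# Toy objective standing in for the GMM criterion (illustration only)
f(β) = (β[1] - 0.3)^2 + (β[2] - 0.6)^2

lower = [0.0, 0.0]   # lower bounds of the box
upper = [1.0, 1.0]   # upper bounds of the box
start = [0.5, 0.5]   # starting values must lie inside the box

# Fminbox wraps an inner optimizer; gradients default to finite differences
res = optimize(f, lower, upper, start, Fminbox(LBFGS()),
               Optim.Options(iterations = 100))

Optim.converged(res) || error("Estimation did not converge")
β̂ = Optim.minimizer(res)
```

Checking Optim.converged before reading the minimizer mirrors the convergence error listed under Throws.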

Throws

  • Throws an error with message "Estimation did not converge..." if the optimizer fails to converge.
  • Throws an error for unsupported production function forms (currently only "CobbDouglas" is supported).

TJProdEst.tj_print_res_bigestimator - Method
tj_print_res_bigestimator(data::DataFrame, Results::Results, Setup::Setup)

Print formatted estimation results table using PrettyTables. Displays production function and ω law-of-motion parameters with standard errors, t-statistics, p-values, and confidence intervals.

TJProdEst.tj_prod_est - Method
tj_prod_est(; data, output, flexible_input, fixed_inputs, flexible_input_price, output_price, ω_lom_degree=1, ω_shifter=[], time, id, std_err_estimation=true, std_err_type="Bootstrap", boot_reps=200, maximum_boot_tries=10, optimizer_options=NamedTuple())

Top-level estimation entry point for production function estimation using the approach described in Trunschke and Judd (2024). Returns a tuple (Results, Setup) with parameter estimates and configuration.

Keyword Arguments

  • data::DataFrame: Input dataset with (firm-time) panel structure
  • output::Symbol: Output variable column name
  • flexible_input::Symbol: Flexible input variable (e.g., materials)
  • fixed_inputs::Union{Symbol,Vector{Symbol}}: Fixed input variable(s) (e.g., capital, labor)
  • flexible_input_price::Symbol: Price of flexible input
  • output_price::Symbol: Output price
  • ω_lom_degree::Int=1: Polynomial degree for productivity law-of-motion
  • ω_shifter::Union{Symbol,Vector{Symbol}}=[]: Optional productivity shifter variables
  • time::Symbol: Time period identifier
  • id::Symbol: Firm/unit identifier
  • std_err_estimation::Bool=true: Whether to compute standard errors
  • std_err_type::String="Bootstrap": Type of standard errors (currently only "Bootstrap" is supported)
  • boot_reps::Int=200: Number of bootstrap replications
  • maximum_boot_tries::Int=10: Max retry attempts per failed bootstrap iteration
  • optimizer_options::NamedTuple=NamedTuple(): Optimization settings (see Optim.jl)

Returns

  • NamedTuple: (Results, Setup) containing estimates and configuration

Example

results, setup = tj_prod_est(
    data = df,
    output = :Y,
    flexible_input = :M,
    fixed_inputs = [:K, :L],
    flexible_input_price = :Pᴹ,
    output_price = :Pʸ,
    time = :year,
    id = :firm_id,
)

TJProdEst.tj_prod_reg! - Function
tj_prod_reg!(data, Setup, β, c) -> Nothing

In-place computation of productivity (ω) terms and the law-of-motion (LOM) regression for the production function estimation. This function implements the core calculations for the proxy variable approach, computing current and lagged productivity, estimating the productivity LOM via OLS, and calculating the structural error (ξ).

This is a mutating function (hence the !) that updates the preallocated arrays in the c cache NamedTuple in-place for efficiency.

Arguments

  • data::DataFrame: The estimation dataset containing output, inputs, prices, and lagged variables. Must include columns for current and lagged values of all production function variables.
  • Setup::Setup: The setup struct containing configuration parameters including variable names, polynomial degree, and production function form.
  • β::Union{Vector{<:Number}, SubArray{...}}: Production function parameters vector. The ordering is: [constant, fixed_input_coeffs..., flexible_input_coeff]. For Cobb-Douglas: β = [α₀, αₖ₁, αₖ₂, ..., αₘ] where the K's are fixed inputs and M is the flexible input.
  • c::NamedTuple: Cache NamedTuple containing preallocated arrays for intermediate calculations. Must include fields: ω_array, ω_lom_array, ρ_hat, ξ_hat. These arrays are mutated in-place.

Returns

  • Nothing (the function mutates c in-place)

Side Effects (Mutations)

The function updates the following fields in c:

  • c.ω_array: Filled with current productivity (ω) computed from the production function residual after accounting for inputs and the proxy variable.
  • c.ω_lom_array: First column filled with lagged productivity (lag_ω), then polynomial terms computed via polynomial_fnc_fast!, and ω-shifter columns (if present) that were preallocated during setup.
  • c.ρ_hat: Filled with OLS coefficients from regressing ω on its lagged polynomial terms and ω-shifters (the LOM parameters).
  • c.ξ_hat: Filled with structural error (innovation to productivity), computed as ξ = ω - LOM(lag_ω, ω_shifters).

Implementation Details

The productivity term ω is recovered from the production function by solving:

ω = ln(Y) + ln((Pᴹ·M) / (Pʸ·Y)) - ln(M) - β₀ - ln(βₘ) - ∑βₖ·ln(K) - (βₘ - 1)·ln(M)

where the proxy variable relationship is used to invert for productivity.

The law-of-motion is estimated via OLS:

ω = ρ₀ + ρ₁·lag_ω + ρ₂·lag_ω² + ... + ρₚ·lag_ωᵖ + ρ_shifters·shifters + ξ
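
The law-of-motion regression can be sketched on simulated data. The data-generating values (0.1, 0.8, 0.05), the sample size, and the variable names are assumptions made for illustration; shifters are omitted and the package's cache-based in-place computation is not reproduced.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(42)
n, degree = 200, 2

# Simulated lagged productivity and a degree-2 law of motion with small noise
lag_ω = randn(rng, n)
ω = 0.1 .+ 0.8 .* lag_ω .+ 0.05 .* lag_ω .^ 2 .+ 0.01 .* randn(rng, n)

# Regressor matrix: constant, lag_ω, lag_ω², ..., lag_ω^degree
Z = hcat(ones(n), (lag_ω .^ p for p in 1:degree)...)

ρ_hat = Z \ ω            # least-squares estimate of the LOM coefficients
ξ_hat = ω .- Z * ρ_hat   # innovation: ω minus its predicted value
```

By construction the residual ξ_hat is orthogonal to every column of Z, including the constant.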

Example

# Called internally during GMM criterion evaluation
tj_prod_reg!(est_data, Setup, β_current, cache)
# cache.ξ_hat now contains the structural errors
# cache.ρ_hat contains the LOM parameters

See Also

  • tj_prodest_criterion: Uses this function to compute moment conditions
  • polynomial_fnc_fast!: Computes polynomial terms for the LOM
  • fastOLS: Estimates the LOM coefficients

TJProdEst.tj_prod_reg! - Function
tj_prod_reg!(data, Setup, β, c) -> Nothing

Compute productivity (ω) and its law-of-motion via in-place OLS regression. Mutates the cache c with productivity terms, LOM parameters, and structural errors.

Arguments

  • data::DataFrame: Estimation dataset with output, inputs, and lagged variables
  • Setup::Setup: Configuration (variable names, polynomial degree)
  • β::Vector{<:Number}: Production function parameters [constant, fixed_inputs..., flexible_input]
  • c::NamedTuple: Preallocated cache to mutate

Returns

  • Nothing: The function mutates the cache c in-place.

Side Effects

Updates c fields:

  • ω_array: Current productivity
  • ω_lom_array: Lagged productivity and polynomial terms
  • ρ_hat: LOM coefficients (from OLS of ω on lag_ω polynomials + shifters)
  • ξ_hat: Productivity innovations (ω - predicted LOM)

TJProdEst.tj_prodest_criterion - Method
tj_prodest_criterion(; data, Setup, β, weight, c)

Compute the GMM (Generalized Method of Moments) criterion function value for the production function estimation. This function evaluates the weighted squared sum of moment conditions given a candidate parameter vector β.

The criterion is minimized during optimization to find the parameter estimates that best satisfy the moment conditions (orthogonality between instruments and residuals).

Keyword arguments

  • data::DataFrame: prepared estimation dataset with lagged variables and transformed columns.
  • Setup::Setup: configuration struct containing model specification (inputs, variables, degree of ω polynomial, etc.).
  • β::Vector{<:Number}: candidate parameter vector ordered as [constant, fixed_input_coeffs..., flexible_input_coeff].
  • weight::Union{Array,UniformScaling}: weighting matrix for the moment conditions. Often set to the identity matrix I for exactly identified models.
  • c::NamedTuple: preallocated cache containing arrays for intermediate calculations (ϵ, ξ_hat, ρ_hat, mmat, ω_array, etc.) to avoid repeated allocations.

Returns

  • Float64: the GMM criterion value (weighted sum of squared moments), scaled by sample size.

Notes

  • The function computes moment conditions based on orthogonality between productivity shocks (ϵ, ξ) and observables (proxy variable, fixed inputs).
  • Calls tj_prod_reg! internally to compute residuals and ω law-of-motion parameters.
  • The criterion is minimized by the optimizer in tj_onestep_estimator.
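
A generic GMM criterion of this shape can be sketched as below. The function name gmm_criterion, the instrument matrix Z, and the 1/n scaling of the moments are assumptions for illustration; the package's exact moments and scaling may differ.

```julia
using LinearAlgebra

# Hypothetical sketch: sample moments g = Z'ξ / n and quadratic form g' W g
function gmm_criterion(ξ::AbstractVector, Z::AbstractMatrix, W = I)
    n = length(ξ)
    g = (Z' * ξ) ./ n      # one averaged moment per instrument column
    return dot(g, W * g)   # weighted sum of squared moments
end

Z = [1.0 0.0; 1.0 1.0; 1.0 2.0]                      # constant plus one instrument
q_zero  = gmm_criterion([1.0, -2.0, 1.0], Z)         # residual orthogonal to Z
q_ident = gmm_criterion([1.0, 1.0, 1.0], Z)          # identity weighting
q_diag  = gmm_criterion([1.0, 1.0, 1.0], Z, [2.0 0.0; 0.0 1.0])
```

The criterion is zero exactly when the residual is orthogonal to every instrument, which is the condition the optimizer tries to approach.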

TJProdEst.tj_prodest_estimation! - Method
tj_prodest_estimation!(; data, Setup, Results)

Run the production estimation pipeline for the provided dataset. This high-level helper performs the core estimation steps and populates the Results object with point estimates and (optionally) standard error information.

Keyword arguments

  • data::DataFrame: prepared data to use for estimation (usually the output of jt_data_prep).
  • Setup::Setup: configuration struct describing inputs, options and model form.
  • Results::Results: mutable results container that will be filled by the estimation routines.

Behavior

  • Calls the single-step estimator tj_onestep_estimator to compute point estimates and writes them into Results.
  • If Setup.std_err_estimation is true, calls tj_se_estimation! to compute standard errors (mutates Results).

Side effects

  • This function mutates the Results object in-place. It does not return a new Results instance; it returns nothing implicitly.

Example

# Build the configuration and an initial Results object
results, setup = tj_prod_est(data = df, output = :Y, flexible_input = :M, fixed_inputs = [:K, :L], flexible_input_price = :Pᴹ, output_price = :Pʸ, ω_lom_degree = 1, time = :year, id = :ID)
# Prepare the data, then run the lower-level estimation directly (results is mutated)
prepared_df = jt_data_prep(df, setup)
tj_prodest_estimation!(data = prepared_df, Setup = setup, Results = results)

TJProdEst.tj_std_error_stats - Method
tj_std_error_stats(data, Setup, Results) -> Nothing

Compute standard errors, t-statistics, p-values, and confidence intervals via bootstrap resampling. Mutates the Results struct in-place with statistical inference results.

Arguments

  • data::DataFrame: Estimation dataset
  • Setup::Setup: Configuration including bootstrap settings (boot_reps, std_err_type)
  • Results::Results: Results struct to update with inference statistics

Side Effects

Populates the following fields in Results:

  • variance: Bootstrap variance estimates
  • std_errors: Standard errors (√variance)
  • t_statistics: t-statistics for hypothesis testing
  • p_values: Two-sided p-values (assuming normality)
  • conf_intervals: 95% confidence intervals (±1.96 × SE)
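
How these statistics follow from a matrix of bootstrap estimates can be sketched as below. The helper name bootstrap_inference and the toy numbers are hypothetical, the sketch assumes Distributions.jl is available, and it returns flat vectors instead of the nested NamedTuples stored in Results.

```julia
using Statistics, Distributions

# Hypothetical helper: boot is (boot_reps × n_params), point has n_params entries
function bootstrap_inference(point::AbstractVector, boot::AbstractMatrix)
    variance = vec(var(boot; dims = 1))       # variance across replications, per parameter
    se = sqrt.(variance)
    t = point ./ se                           # t-statistic for H0: parameter = 0
    p = 2 .* ccdf.(Normal(), abs.(t))         # two-sided p-value under normality
    ci = [[b - 1.96 * s, b + 1.96 * s] for (b, s) in zip(point, se)]
    return (; variance, std_errors = se, t_statistics = t,
              p_values = p, conf_intervals = ci)
end

boot  = [1.0 2.0; 3.0 2.5; 2.0 3.0]   # 3 replications, 2 parameters
point = [2.0, 2.5]
stats = bootstrap_inference(point, boot)
```

Each confidence interval is a 2-element vector, matching the layout described for conf_intervals in the Results struct.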

Notes

  • Currently only supports bootstrap standard errors
  • Uses bootstrap_tj_prodest to generate bootstrap samples