API Reference
This API reference documents all public functions, types, and modules in GeneExpressionProgramming.jl. It is organized by functionality so you can quickly find the components you need.
Core Types
GepRegressor
The main regressor for scalar symbolic regression tasks.
GepRegressor(number_features::Int; kwargs...)
Parameters:
number_features::Int: Number of input features
population_size::Int = 1000: Size of the population
gene_count::Int = 2: Number of genes per chromosome
head_len::Int = 7: Head length of each gene
max_arity::Int = 2: Maximum arity of functions
function_set::Vector{Symbol} = [:+, :-, :*, :/]: Available functions
number_of_objectives::Int = 1: Number of objectives (1 for single-objective)
considered_dimensions::Dict{Symbol,Vector{Float16}} = Dict(): Physical dimensions
max_permutations_lib::Int = 1000: Maximum permutations for dimensional analysis
rounds::Int = 5: Tree depth for dimensional checking
Fields:
best_models_::Vector: Best evolved models
fitness_history_: Training history (if available)
Example:
regressor = GepRegressor(3;
population_size=500,
gene_count=3,
head_len=5,
function_set=[:+, :-, :*, :/, :sin, :cos])
GepTensorRegressor
Specialized regressor for tensor (vector/matrix) symbolic regression.
GepTensorRegressor(number_features::Int, gene_count::Int, head_len::Int; kwargs...)
Parameters:
number_features::Int: Number of input features
gene_count::Int: Number of genes per chromosome
head_len::Int: Head length of each gene
feature_names::Vector{String} = []: Names for features (for interpretability)
Example:
regressor = GepTensorRegressor(5, 2, 3;
feature_names=["x1", "x2", "U1", "U2", "U3"])
Core Functions
fit!
Train the GEP regressor on data.
fit!(regressor, epochs::Int, population_size::Int, x_data, y_data; kwargs...)
fit!(regressor, epochs::Int, population_size::Int, loss_function)
Parameters:
regressor: GepRegressor or GepTensorRegressor instance
epochs::Int: Number of generations to evolve
population_size::Int: Population size for evolution
x_data: Input features (features as rows, samples as columns)
y_data: Target values
loss_function: Custom loss function (for tensor regression or multi-objective optimization)
Keyword Arguments:
x_test = nothing: Test features for validation
y_test = nothing: Test targets for validation
loss_fun::String = "mse": Loss function ("mse", "mae", "rmse")
target_dimension = nothing: Target physical dimension
Examples:
# Basic regression
fit!(regressor, 1000, 1000, x_train', y_train; loss_fun="mse")
# With validation data
fit!(regressor, 1000, 1000, x_train', y_train;
x_test=x_test', y_test=y_test, loss_fun="rmse")
# With physical dimensions
fit!(regressor, 1000, 1000, x_train', y_train;
target_dimension=target_dim)
# Tensor regression with custom loss
fit!(regressor, 100, 500, custom_loss_function)
Prediction
Make predictions using trained regressor.
(regressor::GepRegressor)(x_data)
(regressor::GepTensorRegressor)(input_data)
Parameters:
x_data: Input features (features as rows, samples as columns)
input_data: Input data tuple for tensor regression
Returns:
- Predictions as vector (scalar regression) or vector of tensors (tensor regression)
Examples:
# Scalar predictions
predictions = regressor(x_test')
# Tensor predictions
tensor_predictions = tensor_regressor(input_tuple)
Utility Functions
Data Utilities
train_test_split
train_test_split(X, y; test_ratio=0.2, random_state=42)
Split data into training and testing sets.
Parameters:
X: Feature matrix
y: Target vector
test_ratio::Float64 = 0.2: Proportion of data for testing
random_state::Int = 42: Random seed
Returns:
(X_train, X_test, y_train, y_test): Split data
Example:
X_train, X_test, y_train, y_test = train_test_split(X, y; test_ratio=0.3)
Expression Utilities
print_karva_strings
print_karva_strings(solution)
Print the Karva notation representation of an evolved solution.
Parameters:
solution: Evolved solution from best_models_
Example:
best_solution = regressor.best_models_[1]
print_karva_strings(best_solution)
Loss Functions
Built-in Loss Functions
The package provides several built-in loss functions accessible via string names:
"mse" - Mean Squared Error
mse(y_true, y_pred) = mean((y_true .- y_pred).^2)
"mae" - Mean Absolute Error
mae(y_true, y_pred) = mean(abs.(y_true .- y_pred))
"rmse" - Root Mean Squared Error
rmse(y_true, y_pred) = sqrt(mean((y_true .- y_pred).^2))
Custom Loss Functions
For advanced applications, you can define custom loss functions:
Single-Objective Custom Loss
function custom_loss(y_true, y_pred)
# Your custom loss calculation
return loss_value::Float64
end
# Use with fit!
fit!(regressor, epochs, population_size, x_data', y_data; loss_fun=custom_loss)
Multi-Objective Custom Loss
@inline function multi_objective_loss(elem, validate::Bool)
if isnan(mean(elem.fitness)) || validate
model = elem.compiled_function
try
y_pred = model(x_data')
# Objective 1: Accuracy
mse = mean((y_true .- y_pred).^2)
# Objective 2: Complexity
complexity = expression_complexity(model)
elem.fitness = (mse, complexity)
catch
elem.fitness = (typemax(Float64), typemax(Float64))
end
end
end
# Use with multi-objective regressor
regressor = GepRegressor(n_features; number_of_objectives=2)
fit!(regressor, epochs, population_size, multi_objective_loss)
Tensor Custom Loss
@inline function tensor_loss(elem, validate::Bool)
if isnan(mean(elem.fitness)) || validate
model = elem.compiled_function
try
predictions = model(input_data)
# Calculate tensor-specific loss
total_error = 0.0
for i in 1:length(target_tensors)
error = norm(predictions[i] - target_tensors[i])^2
total_error += error
end
elem.fitness = (total_error / length(target_tensors),)
catch
elem.fitness = (typemax(Float64),)
end
end
end
Selection Methods
Tournament Selection
Default selection method: for each selection, a tournament group is drawn at random from the population and the individual with the best fitness among the contenders wins.
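The mechanics can be sketched in plain Julia (a simplified illustration only, not the package's internal implementation; tournament_select and its arguments are made-up names for this sketch):

```julia
# Return the index of the best (lowest-fitness) individual among k
# contenders drawn at random, with replacement, from the population.
function tournament_select(fitnesses::Vector{Float64}, k::Int)
    contenders = rand(1:length(fitnesses), k)
    return contenders[argmin(fitnesses[contenders])]
end

fitnesses = [0.9, 0.1, 0.5, 0.3]          # lower = better, as for error losses
winner = tournament_select(fitnesses, 3)  # index of a comparatively fit individual
```

Larger tournament sizes increase selection pressure, since weak individuals are less likely to win a tournament.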
Configuration:
regressor = GepRegressor(n_features)
NSGA-II Selection
Multi-objective selection using Non-dominated Sorting Genetic Algorithm II.
Configuration:
regressor = GepRegressor(n_features;
number_of_objectives=2)
Genetic Operators
Genetic Operators
The package implements several genetic operators. Their probabilities can be adjusted in advance via the dictionary GENE_COMMON_PROBS, which is available after loading GeneExpressionProgramming.jl.
- Point Mutation: Random symbol replacement
- Inversion: Sequence reversal
- IS Transposition: Insertion sequence transposition
- RIS Transposition: Root insertion sequence transposition
Configuration:
using GeneExpressionProgramming
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["mutation_prob"] = 1.0 # Probability of a chromosome undergoing mutation
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["mutation_rate"] = 0.1 # Proportion of the gene being changed
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["inversion_prob"] = 0.1 # Probability of the inversion operation taking place
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["reverse_insertion_tail"] = 0.1 # Setting IS
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["reverse_insertion"] = 0.1 # Setting RIS
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["gene_transposition"] = 0.0 # Setting Transposition
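The difference between mutation_prob and mutation_rate can be illustrated with a small stand-alone sketch (a hypothetical chromosome representation, not the package's internals): mutation_prob decides whether a chromosome is mutated at all, while mutation_rate controls what fraction of its symbols are changed.

```julia
# Toy point mutation: with probability mutation_prob, rewrite roughly
# mutation_rate of the chromosome's symbols with random alphabet entries.
function mutate(chromosome::Vector{Symbol}, mutation_prob, mutation_rate, alphabet)
    rand() > mutation_prob && return copy(chromosome)  # chromosome left untouched
    out = copy(chromosome)
    for i in eachindex(out)
        if rand() < mutation_rate
            out[i] = rand(alphabet)                    # point mutation at symbol i
        end
    end
    return out
end

chrom = [:+, :x1, :x2, :*, :x1, :x1]
mutated = mutate(chrom, 1.0, 0.1, [:+, :-, :*, :/, :x1, :x2])
```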
Crossover Operators
Available crossover operators (their probabilities are adjusted via GENE_COMMON_PROBS in the same way as the gene operators above):
- One-Point Crossover: Single crossover point
- Two-Point Crossover: Two crossover points
Configuration:
using GeneExpressionProgramming
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["one_point_cross_over_prob"] = 0.5 # Setting the one-point crossover
GeneExpressionProgramming.RegressionWrapper.GENE_COMMON_PROBS["two_point_cross_over_prob"] = 0.3 # Setting the two-point crossover
Function Sets
Basic Arithmetic
basic_functions = [:+, :-, :*, :/]
Extended Mathematical Functions
extended_functions = [:+, :-, :*, :/, :sin, :cos, :tan, :exp, :log, :sqrt, :abs]
Power Functions
power_functions = [:^, :sqrt]
Trigonometric Functions
trig_functions = [:sin, :cos, :tan, :asin, :acos, :atan, :sinh, :cosh, :tanh]
Physical Dimensionality
Dimension Representation
Physical dimensions are represented as 7-element vectors corresponding to SI base units:
# [Mass, Length, Time, Temperature, Current, Amount, Luminosity]
velocity_dim = Float16[0, 1, -1, 0, 0, 0, 0] # [L T⁻¹]
force_dim = Float16[1, 1, -2, 0, 0, 0, 0] # [M L T⁻²]
energy_dim = Float16[1, 2, -2, 0, 0, 0, 0] # [M L² T⁻²]
Dimensional Constraints
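Dimension vectors combine by simple exponent arithmetic: multiplying two quantities adds their vectors, dividing subtracts them, and raising to a power scales them. A quick stand-alone sanity check (plain Julia, not part of the package API):

```julia
# SI exponent vectors: [Mass, Length, Time, Temperature, Current, Amount, Luminosity]
mass_dim   = Float16[1, 0, 0, 0, 0, 0, 0]
length_dim = Float16[0, 1, 0, 0, 0, 0, 0]
time_dim   = Float16[0, 0, 1, 0, 0, 0, 0]

# force = mass * length / time^2 → add exponents for *, subtract for /
force_dim = mass_dim .+ length_dim .- 2 .* time_dim
# → Float16[1.0, 1.0, -2.0, 0.0, 0.0, 0.0, 0.0], i.e. [M L T⁻²]
```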
feature_dims = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[1, 0, 0, 0, 0, 0, 0], # Mass
:x2 => Float16[0, 1, 0, 0, 0, 0, 0], # Length
:x3 => Float16[0, 0, 1, 0, 0, 0, 0], # Time
)
target_dim = Float16[0, 1, -1, 0, 0, 0, 0] # Velocity
regressor = GepRegressor(3;
considered_dimensions=feature_dims,
max_permutations_lib=10000)
fit!(regressor, epochs, population_size, x_data', y_data;
target_dimension=target_dim)
Tensor Operations (under construction)
Supported Tensor Types
The tensor regression module supports various tensor types through Tensors.jl:
using Tensors
# Vectors (rank-1 tensors)
vector_3d = rand(Tensor{1,3})
# Matrices (rank-2 tensors)
matrix_2x2 = rand(Tensor{2,2})
# Higher-order tensors
tensor_3x3x3 = rand(Tensor{3,3})
Tensor Operations
Available tensor operations include:
- Element-wise operations: +, -, *, /
- Tensor products: ⊗ (outer product)
- Contractions: ⋅ (dot product), ⊡ (double contraction)
- Norms: norm(), tr() (trace)
- Decompositions: eigen(), svd()
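A short sketch of these operations (assuming the standard Tensors.jl API; check that package's documentation for the definitive signatures):

```julia
using Tensors, LinearAlgebra

a = rand(Vec{3})        # rank-1 tensor (3-component vector)
b = rand(Vec{3})
A = rand(Tensor{2,3})   # rank-2 tensor (3×3)

outer = a ⊗ b           # outer product → rank-2 tensor
s     = a ⋅ b           # dot product → scalar
d     = A ⊡ A           # double contraction → scalar
n     = norm(A)         # Frobenius norm
t     = tr(A)           # trace
```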
Error Handling
Common Error
ArgumentError: collection must be non-empty
Thrown when the argument vector for the selection process is empty. This happens when the loss returns Inf for every individual, so no valid fitness values remain.
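A common guard is to clamp non-finite losses to a large finite penalty inside a custom loss function, so that selection always sees comparable fitness values (a defensive sketch; safe_mse and BIG are made-up names for this example):

```julia
# Replace NaN/Inf losses with a large finite penalty so the selection
# step never ends up with an empty set of valid candidates.
const BIG = 1e12

function safe_mse(y_true, y_pred)
    loss = sum(abs2, y_true .- y_pred) / length(y_true)
    return isfinite(loss) ? loss : BIG
end

safe_mse([1.0, 2.0], [1.0, 2.0])  # 0.0
safe_mse([1.0, 2.0], [Inf, 2.0])  # clamped to BIG
```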
Performance Tuning
Memory Management
# Monitor memory usage
using Profile
@profile fit!(regressor, epochs, population_size, x_data', y_data)
Profile.print()
# Force garbage collection
GC.gc()
Configuration Examples
Basic Configuration
regressor = GepRegressor(3)
fit!(regressor, 1000, 1000, x_data', y_data)
Advanced Configuration
regressor = GepRegressor(
5; # 5 input features
population_size = 2000, # Large population
gene_count = 3, # 3 genes per chromosome
head_len = 8, # Longer expressions
function_set = [:+, :-, :*, :/, :sin, :cos, :exp]
)
fit!(regressor, 1500, 2000, x_train', y_train;
x_test = x_test',
y_test = y_test,
loss_fun = "rmse")
Multi-Objective Configuration
regressor = GepRegressor(
3;
number_of_objectives = 2,
population_size = 1500,
gene_count = 2,
head_len = 6
)
fit!(regressor, 1000, 1500, multi_objective_loss)
Physical Dimensionality Configuration
feature_dims = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[1, 0, 0, 0, 0, 0, 0], # Mass
:x2 => Float16[0, 1, 0, 0, 0, 0, 0], # Length
:x3 => Float16[0, 0, 1, 0, 0, 0, 0], # Time
)
regressor = GepRegressor(
3;
considered_dimensions = feature_dims,
max_permutations_lib = 15000,
rounds = 8
)
target_dim = Float16[1, 1, -2, 0, 0, 0, 0] # Force
fit!(regressor, 1200, 1200, x_data', y_data;
target_dimension = target_dim)
Tensor Regression Configuration
regressor = GepTensorRegressor(
5, # 5 features
3, # 3 genes
4; # Head length 4
feature_names = ["scalar1", "scalar2", "vector1", "vector2", "matrix1"]
)
fit!(regressor, 150, 800, tensor_loss_function)
Version Information
# Get package version
using Pkg
Pkg.status("GeneExpressionProgramming")
# Check for updates
Pkg.update("GeneExpressionProgramming")
Debugging and Diagnostics
Verbose Output
# Enable verbose output during training
fit!(regressor, epochs, population_size, x_data', y_data; verbose=true)
Fitness History
# Access fitness evolution
if hasfield(typeof(regressor), :fitness_history_)
history = regressor.fitness_history_
plot(history.train_loss)
end
Expression Analysis
# Analyze best expressions
for (i, model) in enumerate(regressor.best_models_)
println("Model $i: $(model.compiled_function)")
println("Fitness: $(model.fitness)")
println("Complexity: $(expression_complexity(model))")
end
This API reference provides comprehensive coverage of all public interfaces in GeneExpressionProgramming.jl. For additional examples and use cases, refer to the Examples.
For the most up-to-date API documentation, always refer to the package source code and docstrings.