Genetic Algorithm for Protein Allocation Model Parameter Estimation

This subdirectory contains the framework for using a genetic algorithm to optimize the parametrization of Protein Allocation Models (PAMs). For more information about the terminology and principles of genetic algorithms for metabolic optimization refer to this paper.
The DEAP toolbox (DEAP documentation) is used for basic genetic algorithm functionalities. This framework in based on the Genetic-algorithm-suite for optimization problems in metabolic models, with as initial usecase the computation of a minimal set of metabolic genes preserving certain core metabolic functionalities.

Software Structure

The genetic algorithm in this module is build up out of 3 main parts:

  1. core_parametrizer: the object which connects all other parts of the genetic algorithm.

  2. ga_params: the actual genetic algorithm functionality

  3. Evaluation/Fitfun_params: all the functions to handle a specific use cases of the genetic algorithm. It allows for a tailored fitness function, mutation function, individual format and storage of results

When the core_parametrizer is called, it will initialize both a genetic algorithm instance and the fitness function. The core object will initialize the DEAP toolbox and configure the individual. When the start function is called, the core will initialize the genetic algorithm with a population, optionally in a multithreaded manner. The core will also start the genetic algorithm function in the genetic algorithm object and parse the results.

The ga_params file contains the Genetic_Algorithm (genetic algorithm for parameter optimization) object. The GAPO’s main function is called by the core. In the main function all the steps which are involved in the execution of the genetic algorithm are listed. This includes selection of the elite, crossover, mutation and evaluation of the individuals in a population.

In Evaluation/Fitfun_params the manually FitnessEvaluation class can be found. These class contains all the functions which are required to ‘personalize’ the genetic algorithm for a specific usecase. Here there are functions for initialization of several attributes, parsing of the properties which needs to be written to a file, mutating a kcat value, and importantly, to calculate the fitness of an individual.

Genetic Algorithm workflow

The main function in the Genetic_Algorithm object follows the following steps (in pseudocode)

while (g < self.number_generations) and ((time()-start_time) < self.time_limit):

  1. Elitism:

  • Select the best individual in a population

  • Make a copy of the elite to retain the best solution

  1. Selection:

  • Randomly draw 3 individuals from the population

  • Select best individual of the 3 individuals

  • Replace 2 worse individual by copies of the best individual

  1. Crossover:

  • Cross the lists of kcat values of two individuals with predefined crossover probability

  1. Mutation:

  • Randomly draw a new kcat value from a distribution based on a mutation probability

  • If the model is likely to become infeasible, consider to decrease the mutation probability

  1. Fitness calculation:

  • Calculate fitness for each individual which underwent some change

  • Replace old population with new population

Fitness Evaluation adjustment

The path to a FitnessEvaluation object is one of the inputs for running the genetic algorithm. This allows the user to put in a personalized fitness function, kcat mutation method and to change the attributes which are saved. This repository already contains two: one FitnessEvaluation which mutates the kcat values using gaussian sampling given a specific standard deviation and one which uses uniform sampling, sampling from 0 to 2*kcat, whith the diffusion limit (1e6) providing an upper bound for the maximal kcat value. Please note that the genetic algorithm modifies the kcat<\sub> by changing the corresponding coefficient in the constraint relating the enzyme concentration to the reaction flux, without using the utilities from PAModelpy. The kcat values are thus saved in the genetic algorithm as model coefficients (1/kcat, with kcat in 1/h).

By default the genetic algorithm uses uniform sampling, as this has been proven to yield the best results.