How to avoid calculating the fitness multiple times? #160
Thanks @davidoskky. We can check the issue, but it would be more helpful if you could share a code sample that replicates the problem.
Sure. My calculations are quite complex and rely on both external software and external data. I have kept the input as similar as possible; however, I use an initial population in my calculations.
@davidoskky It seems that each generation the fitness function will run (100 - 10) times, i.e. `sol_per_pop` minus `keep_elitism`. I think you can increase `keep_elitism`.
@whubaichuan I'm not sure I understand what you mean. I don't have a problem with how many times the fitness function runs each generation. I just don't want to perform redundant calculations: I want the fitness for each solution to be stored so that it is only computed once, no matter how many times the solution shows up in the population.
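As a general workaround (an editorial sketch, not something proposed in this thread), the fitness function can be memoized on the caller's side, so PyGAD may call it as often as it likes while the expensive computation runs only once per unique solution. The sketch assumes the two-argument fitness signature used elsewhere in this thread; `expensive_evaluation` is a hypothetical placeholder for the real computation.

```python
import pygad

fitness_cache = {}

def expensive_evaluation(solution):
    # Hypothetical placeholder for the slow, external computation.
    return float(sum(solution))

def fitness_func(solution, solution_idx):
    key = tuple(solution)  # genes as a hashable cache key
    if key not in fitness_cache:
        fitness_cache[key] = expensive_evaluation(solution)
    return fitness_cache[key]

ga_instance = pygad.GA(num_generations=50,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=6,
                       gene_type=int,
                       gene_space=range(2000),
                       fitness_func=fitness_func)
ga_instance.run()
```

With integer genes and a bounded `gene_space`, the cache stays small; for float genes, the genes could be rounded before building the key.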
Just to clarify things: there was an issue in the code that caused the fitness function to be called even if the solution was already explored and its fitness already calculated. The issue is solved and the fix will be published in the new release soon. Here are some more details:

First, thanks for the example. I just found an issue with your code in using the `numpy.unique()` function. According to the NumPy documentation, when the `axis` parameter is not specified, the input array is flattened before the unique values are found.

In our case, we do not want to flatten the array, because each row is a solution. So, the `axis=0` parameter should be passed to count unique solutions. This is the modified code. I added a new variable called `num_fitness_calls` to count how many times the fitness function is called:

```python
import pygad
import numpy as np
num_fitness_calls = 0
def fitness_function(solution, solution_idx):
    global num_fitness_calls
    num_fitness_calls = num_fitness_calls + 1
    return 1/abs(2500-solution[0] + solution[1]**2 - solution[2]**3 + solution[3]**4)
ga_instance = pygad.GA(num_generations=200,
                       num_parents_mating=50,
                       fitness_func=fitness_function,
                       sol_per_pop=100,
                       mutation_num_genes=1,
                       num_genes=6,
                       gene_type=int,
                       keep_elitism=10,
                       parallel_processing=["thread", 100],
                       save_solutions=True,
                       random_seed=1,
                       allow_duplicate_genes=True,
                       gene_space=list(range(2000)))
ga_instance.run()
print(len(ga_instance.solutions) == len(np.unique(ga_instance.solutions)))
print("Total number of solutions", len(ga_instance.solutions))
print("Number of unique solutions", len(np.unique(ga_instance.solutions, axis=0)))
print("Number of calls to fitness_func", num_fitness_calls) These are the outputs of the
Note that the reason there is a large number of unique solutions is that you selected a wide range for the `gene_space` parameter (`list(range(2000))`).
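To make the `axis` point concrete, here is a small standalone illustration (an editorial addition, not part of the comment above): without `axis=0`, `numpy.unique()` flattens the 2-D solutions array and counts unique gene values rather than unique solutions.

```python
import numpy as np

solutions = np.array([[1, 2, 3],
                      [1, 2, 3],
                      [3, 2, 1]])

print(len(np.unique(solutions)))          # 3: unique scalar values after flattening
print(len(np.unique(solutions, axis=0)))  # 2: unique rows, i.e. unique solutions
```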
Thank you @ahmedfgad, and sorry about the `np.unique()` mistake. I'll be waiting for the new version release. Thank you very much for your great work!
@davidoskky Hi, I see. I am also confused here. I think calculating the fitness multiple times will influence the best fitness across generations, because I am using a stochastic fitness function. Hope to see the corrected code soon.
@ahmedfgad Hi, can you share any further information about …
A new release will be published soon with the solution.
PyGAD 2.19.0 Release Notes

1. A new `summary()` method is supported to return a Keras-like summary of the PyGAD lifecycle.
2. A new optional parameter called `fitness_batch_size` is supported to calculate the fitness function in batches. If it is assigned the value `1` or `None` (default), the normal flow is used where the fitness function is called for each individual solution. If the `fitness_batch_size` parameter is assigned a value satisfying `1 < fitness_batch_size <= sol_per_pop`, the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed (a sketch follows this list). #136
3. The `cloudpickle` library (https://github.com./cloudpipe/cloudpickle) is used instead of the `pickle` library to pickle the `pygad.GA` objects. This solves the issue of having to redefine the functions (e.g. the fitness function). The `cloudpickle` library is added as a dependency in the `requirements.txt` file. #159
4. Support of assigning methods to these parameters: `fitness_func`, `crossover_type`, `mutation_type`, `parent_selection_type`, `on_start`, `on_fitness`, `on_parents`, `on_crossover`, `on_mutation`, `on_generation`, and `on_stop`. #92 #138
5. Validating the output of the parent selection, crossover, and mutation functions.
6. The built-in parent selection operators return the parents' indices as a NumPy array.
7. The outputs of the parent selection, crossover, and mutation operators must be NumPy arrays.
8. Fix an issue when `allow_duplicate_genes=True`. #39
9. Fix an issue creating scatter plots of the solutions' fitness.
10. Sampling from a `set()` is no longer supported in Python 3.11. Instead, sampling happens from a `list()`. Thanks `Marco Brenna` for pointing to this issue.
11. The lifecycle is updated to reflect that the new population's fitness is calculated at the end of the lifecycle, not at the beginning. #154 (comment)
12. Fixed an issue when `save_solutions=True` that caused the fitness function to be called for solutions that were already explored and had their fitness pre-calculated. #160
13. A new instance attribute named `last_generation_elitism_indices` was added to hold the indices of the selected elitism. This attribute helps to re-use the fitness of the elitism instead of calling the fitness function.
14. Fewer calls to the `best_solution()` method, which in turn saves some calls to the fitness function.
15. Some updates in the documentation to give more details about the `cal_pop_fitness()` method. #79 (comment)
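As a rough illustration of item 2 above (an editorial sketch, not taken from the release notes; the parameter names `solutions` and `solutions_indices` are assumptions): the fitness function is called once per batch and must return one fitness value per solution in that batch.

```python
import numpy
import pygad

def batch_fitness_func(solutions, solutions_indices):
    # Called once per batch of solutions; returns one fitness per solution.
    return [1.0 / (abs(numpy.sum(solution) - 50) + 1) for solution in solutions]

ga_instance = pygad.GA(num_generations=20,
                       num_parents_mating=4,
                       sol_per_pop=20,
                       num_genes=5,
                       gene_space=range(100),
                       fitness_func=batch_fitness_func,
                       fitness_batch_size=10)
ga_instance.run()
```

Batching mainly pays off when many solutions can be evaluated together (for example, one call to an external program or one GPU batch) more cheaply than one call per solution.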
Apologies if I've jumped the gun here. I've just installed the latest from PyPI. In this version, I believe I'm still seeing redundant calls to the fitness function when `save_solutions=True`. Example code:

```python
import time
from random import random
import pygad
from pygad import torchga
from torch import nn
# Dummy slow fitness function
def fitness_func(solution, solution_idx):
print("fitness_func()")
time.sleep(1)
return random()
def on_generation(ga_instance):
print("on_generation()")
print("Printing Generation...")
print(
"Generation = {generation}".format(
generation=ga_instance.generations_completed
)
)
print("Printing Best Fitness. The next line should NOT be fitness_func()")
print(
"Best Fitness = {fitness}".format(
fitness=ga_instance.best_solution()[1]
)
)
def on_start(ga_instance):
    print("on_start()")

def on_fitness(ga_instance, population_fitness):
    print("on_fitness()")

def on_parents(ga_instance, selected_parents):
    print("on_parents()")

def on_crossover(ga_instance, offspring_crossover):
    print("on_crossover()")

def on_mutation(ga_instance, offspring_mutation):
    print("on_mutation()")

def on_stop(ga_instance, last_population_fitness):
    print("on_stop()")
if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(24, 10),
        nn.ReLU(),
        nn.Linear(10, 3),
        nn.Softmax(0),
    )
    torch_ga = torchga.TorchGA(model=model, num_solutions=5)
    initial_population = torch_ga.population_weights
    ga_instance = pygad.GA(
        num_generations=3,
        num_parents_mating=2,
        fitness_func=fitness_func,
        on_generation=on_generation,
        initial_population=initial_population,
        on_start=on_start,
        on_fitness=on_fitness,
        on_parents=on_parents,
        on_crossover=on_crossover,
        on_mutation=on_mutation,
        on_stop=on_stop,
        save_solutions=True,
        suppress_warnings=True,
    )
    ga_instance.run()
```

Output:
In this example, each call to `best_solution()` still triggers new calls to `fitness_func()`, so the value returned by subsequent calls differs (the fitness here is random).
In the `on_generation()` callback, you are calling the `best_solution()` method without passing the fitness values of the last generation, which forces the fitness function to be called again:

```python
def on_generation(ga_instance):
    ...
    fitness=ga_instance.best_solution()[1]
    ...
```

Please use this:

```python
def on_generation(ga_instance):
    ...
    fitness=ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]
    ...
```

This way the fitness values already calculated for the last generation are re-used instead of calling the fitness function again.
@ahmedfgad thanks for the tip! As I understand it, though, this only works if the best-fitting solution persists over generations? For example, if I pass:

```python
ga_instance = pygad.GA(
    ...
    save_solutions=True,
    suppress_warnings=True,
    parent_selection_type="random",
    keep_elitism=0,
    keep_parents=0,
)
```

I get something like:
I assume this is because the best solution is not retained in later populations when elitism and parent-keeping are disabled.
In addition to the previous tip, you can set `save_best_solutions=True` so that the best solution of each generation is saved in the `best_solutions` attribute; the corresponding fitness values are in the `best_solutions_fitness` attribute. You can then find the best solution in the `best_solutions` attribute:

```python
import numpy

best_solution_idx = numpy.where(ga_instance.best_solutions_fitness == numpy.max(ga_instance.best_solutions_fitness))[0][0]
best_solution = ga_instance.best_solutions[best_solution_idx]
```

Does this solve your issue?
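For reference (an editorial addition, not part of the reply above), `numpy.argmax` yields the same index a bit more directly, assuming `save_best_solutions=True` so that both attributes are populated:

```python
import numpy

# Generation index with the highest recorded best fitness.
best_solution_idx = numpy.argmax(ga_instance.best_solutions_fitness)
best_solution = ga_instance.best_solutions[best_solution_idx]
```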
Thanks! I was able to find the best fitness value by setting `save_best_solutions=True`:

```python
def on_generation(ga_instance):
print("on_generation()")
print("Printing Generation...")
print(f"Generation = {ga_instance.generations_completed}")
print("Printing Best Fitness. The next line should NOT be fitness_func()")
print(f"Best Fitness = {max(ga_instance.best_solutions_fitness)}") |
Awesome!
@ahmedfgad @hodgesmr Hi. Actually, in your code the fitness function is stochastic (it returns `random()`), so recomputing the fitness of the same solution gives a different value. Similar problem here and here. In the official documents, it says:
In the above example, the fitness value is calculated randomly using `random()`. In the next generation and for the same solution, the previously saved fitness value is re-used instead of calling the fitness function again. In the examples you mentioned here and here, there is no complete example to run on my end to replicate the issue. I would appreciate it if you could share a complete example.
@ahmedfgad Hi, I think I found the reason. For … Thus, as @hodgesmr said below, … If we use …
|
@ahmedfgad Hi, I find that if I use multiprocessing, e.g. via the `parallel_processing` parameter, the elitism solutions' fitness values are still recalculated each generation.
|
PyGAD 2.19.2 Release Notes

1. Fixed an issue where the elitism solutions' fitness values were not re-used when parallel processing was used. #160 (comment)
Thanks! There was a bug, and a new release (2.19.2) has been published that solves the issue. Please update PyGAD to get the fix.
BTW, PyGAD 2.19.0 has a parameter called …
My computations for the fitness value require considerably more time than reading it from disk. I'd like the algorithm to remember the fitness values in order not to compute them multiple times.
I have enabled the `save_solutions` option, but that doesn't appear to prevent computing the fitness several times for the same combination.
Is there any way to enforce retrieving previously computed fitness values?
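Since reading a stored fitness from disk is described above as much cheaper than recomputing it, here is a hedged sketch of a possible workaround (not an official PyGAD feature): a fitness function backed by an on-disk cache using the standard-library `shelve` module. `run_expensive_model` and the cache file name are hypothetical placeholders.

```python
import shelve

def run_expensive_model(solution):
    # Hypothetical placeholder for the slow external computation.
    return float(sum(solution))

def fitness_func(solution, solution_idx):
    key = repr(tuple(solution))  # shelve keys must be strings
    with shelve.open("fitness_cache.db") as cache:
        if key not in cache:
            cache[key] = run_expensive_model(solution)
        return cache[key]
```

Because the cache lives on disk, the stored fitness values also survive across separate runs of the script.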