
How to avoid calculating the fitness multiple times? #160


Closed
davidoskky opened this issue Feb 9, 2023 · 21 comments
Labels
question Further information is requested

Comments

@davidoskky

My computations for the fitness value take considerably more time than reading a value from disk, so I'd like the algorithm to remember fitness values rather than compute them multiple times.
I have enabled the save_solutions option, but that does not appear to prevent the fitness from being computed several times for the same combination.
Is there any way to enforce retrieving previously computed fitness values?
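
A minimal sketch of the kind of memoization I have in mind (expensive_fitness() here is just a stand-in for my real computation):

fitness_cache = {}  # maps a tuple of gene values to its computed fitness

def cached_fitness(solution, solution_idx):
    # gene vectors (arrays/lists) are not hashable, so convert to a tuple
    key = tuple(solution)
    if key not in fitness_cache:
        # expensive_fitness() is a placeholder for the real computation;
        # it runs only once per unique solution
        fitness_cache[key] = expensive_fitness(solution)
    return fitness_cache[key]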

@ahmedfgad
Owner

Thanks @davidoskky. We can look into the issue, but it would be more helpful if you could share a code sample that replicates the problem.

@davidoskky
Author

Sure. My real calculations are quite complex and rely on both external software and external data, so I have quickly thrown together a basic fitness function; I realize it is probably too simple.
Nevertheless, this does compute the same solution several times. I verified this by adding a print(solution) call in the fitness function and checking whether all solutions were unique.

I have kept the input as similar as possible to mine, although in my calculations I use an initial population.

import pygad
import numpy as np

def on_generation(ga_instance):
    pass  # placeholder; my real callback is omitted here

def fitness_function(solution, solution_idx):
    return 1/abs(2500 - solution[0] + solution[1]**2 - solution[2]**3 + solution[3]**4)

ga_instance = pygad.GA(num_generations=200,
                       num_parents_mating=50,
                       fitness_func=fitness_function,
                       on_generation=on_generation,
                       sol_per_pop=100,
                       mutation_num_genes=1,
                       num_genes=6,
                       gene_type=int,
                       keep_elitism=10,
                       parallel_processing=["thread", 100],
                       save_solutions=True,
                       allow_duplicate_genes=True,
                       gene_space=list(range(2000)))
ga_instance.run()

print(len(ga_instance.solutions) == len(np.unique(ga_instance.solutions)))

@whubaichuan

@davidoskky it seems that in each generation the fitness_function runs (100 - 10) times. I think you can increase keep_elitism.

@davidoskky
Author

@whubaichuan I'm not sure I understand what you mean. I don't have a problem with how many times the fitness function runs in each generation; I just don't want redundant calculations. I want the fitness of each solution to be stored so that it is computed only once, no matter how many times the solution shows up in the population.

@ahmedfgad
Owner

ahmedfgad commented Feb 22, 2023

@davidoskky,

Just to clarify things: there was an issue in the code that caused the fitness function to be called even if a solution had already been explored and its fitness already calculated. The issue is solved and the fix will be published in the new release soon.

Here are some more details:

First, thanks for the example. I just found an issue in your code in the use of the np.unique() function.

According to the numpy.unique() documentation, the axis parameter has a default value of None. When it is None, the array is flattened before the unique values are calculated.

If None, ar will be flattened. If an integer, the subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array with the dimension of the given axis

In our case, we do not want to flatten the array, so the axis parameter should be given the value 0 to get the correct number of unique solutions (unique rows).
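
For example, a quick illustration of the difference:

import numpy as np

solutions = np.array([[1, 2], [1, 2], [3, 4]])
print(len(np.unique(solutions)))          # 4: flattens to [1, 2, 1, 2, 3, 4] first
print(len(np.unique(solutions, axis=0)))  # 2: counts unique rows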

This is the modified code. I added a new variable called num_fitness_calls to count the number of calls to the fitness function.

import pygad
import numpy as np

num_fitness_calls = 0
def fitness_function(solution, solution_idx):
    global num_fitness_calls
    num_fitness_calls = num_fitness_calls + 1
    return 1/abs(2500-solution[0] + solution[1]**2 - solution[2]**3 + solution[3]**4)

ga_instance = pygad.GA(num_generations=200,
                       num_parents_mating=50,
                       fitness_func=fitness_function,
                       sol_per_pop=100,
                       mutation_num_genes=1,
                       num_genes=6,
                       gene_type=int,
                       keep_elitism=10,
                       parallel_processing=["thread", 100],
                       save_solutions=True,
                       random_seed=1,
                       allow_duplicate_genes=True,
                       gene_space=list(range(2000)))
ga_instance.run()

print(len(ga_instance.solutions) == len(np.unique(ga_instance.solutions)))
print("Total number of solutions", len(ga_instance.solutions))
print("Number of unique solutions", len(np.unique(ga_instance.solutions, axis=0)))
print("Number of calls to fitness_func", num_fitness_calls)

These are the outputs of the print() statements. After fixing the issue, the number of unique solutions is 18,093 and the fitness function is called exactly that many times.

False
Total number of solutions: 20100
Number of unique solutions: 18093
Number of calls to fitness_func: 18093

Note that the reason there is such a large number of unique solutions is that you selected a wide range for the gene_space parameter. When I narrow the range down to just 10 numbers (gene_space=list(range(10))) instead of 2,000, these are the print() outputs: there are only 6,879 unique solutions and the fitness function is called only 6,950 times.

False
Total number of solutions: 20100
Number of unique solutions: 6879
Number of calls to fitness_func: 6950

@davidoskky
Author

Thank you @ahmedfgad, and sorry about the np.unique() mistake: I actually tested this in a different way, and when I quickly put the example together it slipped my mind to add the axis argument.

I'll be waiting for the new version release. Thank you very much for your great work!

@whubaichuan

@davidoskky Hi, I see. I am also confused here. I think the repeated fitness calculations will influence the best fitness across generations, because I am using a stochastic fitness function. Hoping to see the corrected code soon.

@whubaichuan

there was an issue in the code that caused the fitness function to be called even if the solution was already explored and its fitness was already calculated.

@ahmedfgad hi, can you share any further information about this issue? I just want to know why the fitness function would be called again if the solution was already explored and its fitness already calculated. Thanks a lot.

@ahmedfgad
Copy link
Owner

@whubaichuan,

A new release will be published soon with the solution.

ahmedfgad added a commit that referenced this issue Feb 22, 2023
PyGAD 2.19.0 Release Notes
1. A new `summary()` method is supported to return a Keras-like summary of the PyGAD lifecycle.
2. A new optional parameter called `fitness_batch_size` is supported to calculate the fitness function in batches. If it is assigned the value `1` or `None` (default), then the normal flow is used where the fitness function is called for each individual solution. If the `fitness_batch_size` parameter is assigned a value satisfying this condition `1 < fitness_batch_size <= sol_per_pop`, then the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed. #136.
3. The `cloudpickle` library (https://github.com/cloudpipe/cloudpickle) is used instead of the `pickle` library to pickle the `pygad.GA` objects. This solves the issue of having to redefine the functions (e.g. fitness function). The `cloudpickle` library is added as a dependency in the `requirements.txt` file. #159
4. Support of assigning methods to these parameters: `fitness_func`, `crossover_type`, `mutation_type`, `parent_selection_type`, `on_start`, `on_fitness`, `on_parents`, `on_crossover`, `on_mutation`, `on_generation`, and `on_stop`. #92 #138
5. Validating the output of the parent selection, crossover, and mutation functions.
6. The built-in parent selection operators return the parent's indices as a NumPy array.
7. The outputs of the parent selection, crossover, and mutation operators must be NumPy arrays.
8. Fix an issue when `allow_duplicate_genes=True`. #39
9. Fix an issue creating scatter plots of the solutions' fitness.
10. Sampling from a `set()` is no longer supported in Python 3.11. Instead, sampling happens from a `list()`. Thanks to `Marco Brenna` for pointing out this issue.
11. The lifecycle is updated to reflect that the new population's fitness is calculated at the end of the lifecycle, not at the beginning. #154 (comment)
12. There was an issue when `save_solutions=True` that caused the fitness function to be called for solutions that were already explored and had their fitness pre-calculated. #160
13. A new instance attribute named `last_generation_elitism_indices` is added to hold the indices of the selected elitism solutions. This attribute helps to re-use the fitness of the elitism instead of calling the fitness function.
14. Fewer calls to the `best_solution()` method, which in turn saves some calls to the fitness function.
15. Some updates in the documentation to give more details about the `cal_pop_fitness()` method. #79 (comment)
@hodgesmr

Apologies if I've jumped the gun here. I've just installed the latest from PyPI (pygad==2.19.1), which looks to include the attempted fix based on the GitHub release notes.

In this version, I believe I'm still seeing redundant calls to the fitness function when save_solutions=True. Below is an example that uses a slow dummy fitness function. Based on the output below the code block, it seems that calls to ga_instance.best_solution() are still repeating calls to fitness_func.

Example code:

import time
from random import random

import pygad
from pygad import torchga
from torch import nn


# Dummy slow fitness function
def fitness_func(solution, solution_idx):
    print("fitness_func()")
    time.sleep(1)
    return random()


def on_generation(ga_instance):
    print("on_generation()")
    print("Printing Generation...")
    print(
        "Generation                = {generation}".format(
            generation=ga_instance.generations_completed
        )
    )
    print("Printing Best Fitness. The next line should NOT be fitness_func()")
    print(
        "Best Fitness              = {fitness}".format(
            fitness=ga_instance.best_solution()[1]
        )
    )


def on_start(ga_instance):
    print("on_start()")


def on_fitness(ga_instance, population_fitness):
    print("on_fitness()")


def on_parents(ga_instance, selected_parents):
    print("on_parents()")


def on_crossover(ga_instance, offspring_crossover):
    print("on_crossover()")


def on_mutation(ga_instance, offspring_mutation):
    print("on_mutation()")


def on_stop(ga_instance, last_population_fitness):
    print("on_stop()")


if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(24, 10),
        nn.ReLU(),
        nn.Linear(10, 3),
        nn.Softmax(0),
    )

    torch_ga = torchga.TorchGA(model=model, num_solutions=5)
    initial_population = torch_ga.population_weights

    ga_instance = pygad.GA(
        num_generations=3,
        num_parents_mating=2,
        fitness_func=fitness_func,
        on_generation=on_generation,
        initial_population=initial_population,
        on_start=on_start,
        on_fitness=on_fitness,
        on_parents=on_parents,
        on_crossover=on_crossover,
        on_mutation=on_mutation,
        on_stop=on_stop,
        save_solutions=True,
        suppress_warnings=True,
    )

    ga_instance.run()

Output:

on_start()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 1
Printing Best Fitness. The next line should NOT be fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
Best Fitness              = 0.9961562151256017
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 2
Printing Best Fitness. The next line should NOT be fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
Best Fitness              = 0.7943525195803095
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 3
Printing Best Fitness. The next line should NOT be fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
Best Fitness              = 0.7300229556336595
on_stop()

In this example, the value returned by subsequent calls to ga_instance.best_solution()[1] is incorrect.

@ahmedfgad
Owner

@hodgesmr,

In the on_generation() function, instead of using this:

def on_generation(ga_instance):
    ...
            fitness=ga_instance.best_solution()[1]
    ...

Please use this:

def on_generation(ga_instance):
    ...
            fitness=ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]
    ...

This way the best_solution() method will not call the fitness function.

@hodgesmr

hodgesmr commented Feb 22, 2023

@ahmedfgad thanks for the tip! As I understand it, though, this only works if the best-fitting solution persists over generations? For example, if I pass:

ga_instance = pygad.GA(
    ...
    save_solutions=True,
    suppress_warnings=True,
    parent_selection_type="random",
    keep_elitism=0,
    keep_parents=0,
)

I get something like:

on_start()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 1
Printing Best Fitness. The next line should NOT be fitness_func()
Best Fitness              = 0.8947813560282615
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 2
Printing Best Fitness. The next line should NOT be fitness_func()
Best Fitness              = 0.795799384679602
on_fitness()
on_parents()
on_crossover()
on_mutation()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
fitness_func()
on_generation()
Printing Generation...
Generation                = 3
Printing Best Fitness. The next line should NOT be fitness_func()
Best Fitness              = 0.6983227260812709
on_stop()

I assume this is because ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness) only finds the max from the last generation, which may not be the max across all generations. Is there a way for best_solution() to find the max across all generations based on the saved solutions?

@ahmedfgad
Owner

ahmedfgad commented Feb 22, 2023

@hodgesmr,

In addition to the save_solutions parameter, there is another parameter named save_best_solutions. If you set it to True, the best solution of each generation is saved in the best_solutions attribute and its fitness value in the best_solutions_fitness attribute.

You can then find the best solution in the on_generation() callback function:

import numpy

best_solution_idx = numpy.where(ga_instance.best_solutions_fitness == numpy.max(ga_instance.best_solutions_fitness))[0][0]
best_solution = ga_instance.best_solutions[best_solution_idx]

Does this solve your issue?

@hodgesmr

hodgesmr commented Feb 22, 2023

Thanks! I was able to find the best fitness value by setting save_best_solutions=True and updating on_generation():

def on_generation(ga_instance):
    print("on_generation()")
    print("Printing Generation...")
    print(f"Generation      = {ga_instance.generations_completed}")
    
    print("Printing Best Fitness. The next line should NOT be fitness_func()")
    print(f"Best Fitness   = {max(ga_instance.best_solutions_fitness)}")

@ahmedfgad
Owner

[Quotes @hodgesmr's previous comment in full.]

Awesome!

@whubaichuan

whubaichuan commented Feb 22, 2023

@ahmedfgad @hodgesmr hi, actually, in your code keep_elitism=1 by default, and the fitness_function is called 4 times per generation after the initial one (corresponding to num_solutions=5 - keep_elitism=1). But why is the highest value 0.996 not passed to the second and third generations?

Similar problem here and here.

In the official documentation, it says:

keep_elitism=1. It defaults to 1 which means only the best solution in the current generation is kept in the next generation

[Quotes @hodgesmr's earlier comment in full: the pygad==2.19.1 example code and its output.]

@ahmedfgad
Copy link
Owner

[Quotes @whubaichuan's comment above in full.]

@whubaichuan,

In the above example, the fitness value is calculated randomly using random(). At generation X, the best solution might have fitness 0.996; this is just a random value returned by the random() function.

In the next generation, calling random() again for the same solution will generate a fitness value different from 0.996. This is why the best solution fitness changed.
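
If you want a stochastic fitness function to return the same value every time it sees the same solution, one option is to derive the random seed from the genes. A minimal sketch (not a PyGAD feature, just plain Python):

import random

def fitness_func(solution, solution_idx):
    # seed a local generator from the gene values so the same solution
    # always receives the same (pseudo-random) fitness value
    rng = random.Random(hash(tuple(solution)))
    return rng.random()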

In the examples you mentioned here and here, there is no complete example I can run on my end to replicate the issue. I would appreciate it if you could share a complete example.

@whubaichuan

whubaichuan commented Feb 22, 2023

@ahmedfgad hi, I think I found the reason. With fitness=ga_instance.best_solution()[1] and keep_elitism > 0, the possibly higher fitness produced by the repeated fitness_function calls is not passed to the next generation. If we use fitness=ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1] with keep_elitism > 0, the best fitness is carried across all the generations.

Thus, as @hodgesmr said below, if we use fitness=ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1] with keep_elitism > 0, we only need to monitor the last generation's best fitness, which should be the maximal fitness across all the generations.

[Quotes @hodgesmr's earlier comment (the keep_elitism=0 example and its output) in full.]

@whubaichuan

@ahmedfgad hi, I find that if I use multiprocessing, e.g. parallel_processing=['process', 2], then keep_elitism becomes invalid even with keep_elitism=5, which means no solution is kept for the next generation. Below is example code you can test. Can you help solve this?

import time
from random import random

import pygad
from pygad import torchga
from torch import nn


# Dummy slow fitness function
def fitness_func(solution, solution_idx):
    print("fitness_func()")
    time.sleep(1)
    a = random()
    print(a)
    return a


def on_generation(ga_instance):
    print("on_generation()")
    print("Printing Generation...")
    print(
        "Generation                = {generation}".format(
            generation=ga_instance.generations_completed
        )
    )
    print("Printing Best Fitness. The next line should NOT be fitness_func()")
    print(
        "Best Fitness              = {fitness}".format(
            fitness=ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]
        )
    )
    print(ga_instance.last_generation_elitism_indices)
    print(ga_instance.best_solutions_fitness)


def on_start(ga_instance):
    print("on_start()")


def on_fitness(ga_instance, population_fitness):
    print("on_fitness()")
    print(population_fitness)


def on_parents(ga_instance, selected_parents):
    print("on_parents()")


def on_crossover(ga_instance, offspring_crossover):
    print("on_crossover()")


def on_mutation(ga_instance, offspring_mutation):
    print("on_mutation()")


def on_stop(ga_instance, last_population_fitness):
    print("on_stop()")


if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(24, 10),
        nn.ReLU(),
        nn.Linear(10, 3),
        nn.Softmax(0),
    )

    torch_ga = torchga.TorchGA(model=model, num_solutions=10)
    initial_population = torch_ga.population_weights

    ga_instance = pygad.GA(
        num_generations=10,
        num_parents_mating=2,
        fitness_func=fitness_func,
        on_generation=on_generation,
        initial_population=initial_population,
        on_start=on_start,
        on_fitness=on_fitness,
        on_parents=on_parents,
        on_crossover=on_crossover,
        on_mutation=on_mutation,
        on_stop=on_stop,
        save_solutions=False,
        suppress_warnings=True,
        parallel_processing=['process', 2],
        keep_elitism=5
    )


    ga_instance.run()
    print(ga_instance.best_solutions_fitness)

ahmedfgad added a commit that referenced this issue Feb 23, 2023
PyGAD 2.19.2 Release Notes
1. Fix an issue when parallel processing was used where the elitism solutions' fitness values were not re-used. #160 (comment)
@ahmedfgad
Owner

[Quotes @whubaichuan's multiprocessing example above in full.]

@whubaichuan,

Thanks! There was a bug, and a new release (2.19.2) has been published that solves the issue.

Please update PyGAD: pip3 install --upgrade pygad

ahmedfgad added the question (Further information is requested) label Feb 25, 2023
@ahmedfgad
Owner

BTW, PyGAD 2.19.0 has a parameter called fitness_batch_size to call the fitness function once for a batch of solutions. Thanks to this issue #136.
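
Following the 2.19.0 release notes, the fitness function is then called once per batch and must return one fitness value per solution in the batch. A minimal sketch (the toy formula is only a placeholder):

import pygad

def batch_fitness(solutions, solutions_indices):
    # called once per batch; returns one fitness value per solution
    return [1.0 / (1.0 + abs(float(sum(solution)))) for solution in solutions]

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=4,
                       sol_per_pop=8,
                       num_genes=6,
                       fitness_func=batch_fitness,
                       fitness_batch_size=4)  # 1 < fitness_batch_size <= sol_per_pop
ga_instance.run()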
