[FEATURE] Add Multiprocess Capabilities! :) #78

Closed
windowshopr opened this issue Dec 25, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@windowshopr

windowshopr commented Dec 25, 2021

I know the documentation, or an article I read (can't remember which), said that PyGAD didn't perform well enough with multiprocessing to warrant adding it as a feature. However, I have a GREAT need for it with a lot of the fitness functions I write for PyGAD. It would be awesome to see it implemented as an option that can be set before running a GA search.

I envision something like adding the parameters use_multiprocessing = True and num_workers = multiprocessing.cpu_count(). If those are enabled, a process pool is started for the current population and each chromosome is submitted to it, so every population item is evaluated by a worker. When the generation is done, the pool is closed, and when the next generation starts, the pool fires up again for the new population. Pseudo-code would look something like:

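import concurrent.futures

if use_multiprocessing:
    # One pool per generation; each solution is submitted as its own task.
    with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
        # Map each future back to its solution index, since results arrive out of order.
        futures = {executor.submit(fitness_func, solution, solution_idx): solution_idx
                   for solution_idx, solution in enumerate(current_population)}
        for f in concurrent.futures.as_completed(futures):
            solution_idx = futures[f]
            ind_solution_result = f.result()
            # Logic for what to do with the individual solution's result goes here
    # Leaving the `with` block shuts the pool down and waits for the workers.
else:
    pass  # ...the rest of the default PyGAD behaviour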

...I recognize this COULD be a big undertaking, but doing it this way would let each generation's population of chromosomes be evaluated much quicker than waiting on one-at-a-time evaluation when more CPU cores are available.

Yes, you COULD also create several ga_instance objects to run simultaneously, but I think being able to get through the generations themselves quicker is a better idea.

Would love to see this get implemented as I love PyGAD and don't really want to switch to DEAP as PyGAD is much easier to control/use IMO.

@Stoops-ML

You can parallelise the solutions in each generation as described in PyGAD's documentation here

@windowshopr
Author

That’s a great tutorial and I’ve read it before, but that’s pretty specific for PyTorch models/assuming each solution is a new set of model weights, which doesn’t apply at all to what I’m using it for. Would be cool to see that kind of behaviour implemented in the base PyGAD class so it’s a little more extensible?

@windowshopr
Author

I took a stab at creating what I needed, untested as of now, but will be checking on it in the next week or so. If it's working, I'll create a PR

@windowshopr
Author

See #80

@Stoops-ML

> That’s a great tutorial and I’ve read it before, but that’s pretty specific for PyTorch models/assuming each solution is a new set of model weights, which doesn’t apply at all to what I’m using it for. Would be cool to see that kind of behaviour implemented in the base PyGAD class so it’s a little more extensible?

The tutorial is not PyTorch specific and can be implemented for PyGAD using one set of model weights. In the tutorial the author overrides the cal_pop_fitness() method so that all solutions within a generation are run in parallel using multiprocessing.Pool.map().
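
A minimal sketch of that override pattern, using only the standard library so it runs on its own. `SerialGA` is a hypothetical stand-in for `pygad.GA` and models only the `cal_pop_fitness()` method; `PooledGA` and the toy `fitness_func` are likewise assumptions made for illustration. The names `fitness_wrapper` and `cal_pop_fitness` match the ones that appear in the traceback later in this thread.

```python
import multiprocessing


def fitness_func(solution, solution_idx):
    # Toy fitness (an assumption for illustration): best when every gene is zero.
    return -sum(gene * gene for gene in solution)


def fitness_wrapper(solution):
    # Pool.map() passes a single argument per item, so the index is fixed to 0.
    return fitness_func(solution, 0)


class SerialGA:
    """Hypothetical stand-in for pygad.GA; only the overridden method is modelled."""

    def __init__(self, population):
        self.population = population

    def cal_pop_fitness(self):
        # Default behaviour: evaluate the solutions one after another.
        return [fitness_func(s, i) for i, s in enumerate(self.population)]


class PooledGA(SerialGA):
    def __init__(self, population, pool):
        super().__init__(population)
        self.pool = pool

    def cal_pop_fitness(self):
        # Override: evaluate the whole generation in parallel.
        # map() returns the results in the same order as the population.
        return self.pool.map(fitness_wrapper, self.population)


if __name__ == "__main__":
    # The guard is required on Windows, where worker processes re-import this module.
    population = [[1, 2], [0, 0], [3, -1]]
    with multiprocessing.Pool(processes=2) as pool:
        print(PooledGA(population, pool).cal_pop_fitness())
```

Both functions sent to the pool are defined at module level so that they can be pickled and found again inside the worker processes.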

@windowshopr
Author

So I've read over the article again and see what you're saying; however, it isn't working on my Windows machine.

I get the freeze_support() error message because the code isn't wrapped in an if __name__ == "__main__": guard, so I add that, but then get the error:

Traceback (most recent call last):
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\chalu\OneDrive\Desktop\Python_Scripts\Stock_RL_2021\stablebaselines_pygad.py", line 350, in fitness_wrapper
    return fitness_func(solution, 0)
  File "C:\Users\chalu\OneDrive\Desktop\Python_Scripts\Stock_RL_2021\stablebaselines_pygad.py", line 293, in fitness_func
    env = SubprocVecEnv([make_env(env, i) for i in range(num_cpu)])
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 106, in __init__
    process.start()
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\multiprocessing\process.py", line 110, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "stablebaselines_pygad.py", line 403, in <module>
    ga_instance.run()
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\site-packages\pygad\pygad.py", line 1251, in run
    self.last_generation_fitness = self.cal_pop_fitness()
  File "stablebaselines_pygad.py", line 358, in cal_pop_fitness
    pop_fitness = pool.map(fitness_wrapper, self.population)
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\chalu\AppData\Roaming\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
AssertionError: daemonic processes are not allowed to have children

I'm assuming this is because I'm using Stable-Baselines3's SubprocVecEnv to create a subprocessed environment inside the fitness function, even though I'm only setting the number of CPUs to 1 in that section anyway. The pool's worker processes are daemonic, so they can't start child processes of their own. I will keep tweaking/remove that part of the Stable-Baselines3 code and see how I make out. Thanks!
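
The cause can be checked with the standard library alone. The sketch below is an illustration, not code from the script in this thread: `can_start_children` and `report` are hypothetical helper names. It asks the current process whether it is daemonic, first in the main process and then inside a `multiprocessing.Pool` worker, where `multiprocessing` refuses to start children.

```python
import multiprocessing


def can_start_children():
    # multiprocessing refuses to start a child process from a daemonic process.
    return not multiprocessing.current_process().daemon


def report(_):
    # Runs inside a Pool worker and reports whether it could start children.
    return can_start_children()


if __name__ == "__main__":
    print("main process can start children:", can_start_children())
    with multiprocessing.Pool(processes=1) as pool:
        print("Pool worker can start children:", pool.map(report, [None])[0])
```

One way around the error, assuming the fitness function does not need a subprocessed environment, is to build the environment inside the worker process itself, for example with Stable-Baselines3's `DummyVecEnv` in place of `SubprocVecEnv`.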

@ahmedfgad
Owner

@windowshopr, parallel processing is indeed a very good feature to support internally in PyGAD!

As @Stoops-ML said, the tutorial might be helpful.

Because the bottleneck is usually the fitness function (mutation is not worth parallelizing), parallel fitness evaluation could be supported internally.

Thanks for your suggestions!

ahmedfgad added a commit that referenced this issue Jul 8, 2022
## PyGAD 2.17.0

Release Date: 8 July 2022

1. An issue is solved when the `gene_space` parameter is given a fixed value. e.g. gene_space=[range(5), 4]. The second gene's value is static (4) which causes an exception.
2. Fixed the issue where the `allow_duplicate_genes` parameter did not work when mutation is disabled (i.e. `mutation_type=None`). This is by checking for duplicates after crossover directly. #39
3. Solve an issue in the `tournament_selection()` method as the indices of the selected parents were incorrect. #89
4. Reuse the fitness values of the previously explored solutions rather than recalculating them. This feature only works if `save_solutions=True`.
5. Parallel processing is supported. This is by the introduction of a new parameter named `parallel_processing` in the constructor of the `pygad.GA` class. Thanks to [@windowshopr](https://github.com./windowshopr) for opening the issue [#78](#78) at GitHub. Check the [Parallel Processing in PyGAD](https://pygad.readthedocs.io/en/latest/README_pygad_ReadTheDocs.html#parallel-processing-in-pygad) section for more information and examples.
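
A hedged sketch of how the new `parallel_processing` parameter might be used. The `["process", n]` list form is taken from the linked documentation section as best recalled and should be checked there. The toy `fitness_func`, the helper `build_ga`, and all numeric settings are assumptions for illustration. The two-argument fitness signature follows the one used in this thread; later PyGAD releases expect an extra leading `ga_instance` argument.

```python
def fitness_func(solution, solution_idx):
    # Toy fitness (an assumption): reward solutions whose genes sum close to 10.
    return 1.0 / (1.0 + abs(10.0 - sum(solution)))


def build_ga(num_processes=4):
    # Hypothetical helper; pygad is third-party and is imported only when needed.
    import pygad

    return pygad.GA(
        num_generations=20,
        num_parents_mating=4,
        fitness_func=fitness_func,
        sol_per_pop=16,
        num_genes=5,
        # ["process", n] requests n worker processes; ["thread", n] uses threads.
        parallel_processing=["process", num_processes],
    )
```

On Windows, a call such as `build_ga().run()` would need to sit under an `if __name__ == "__main__":` guard, and the fitness function has to stay at module level so the worker processes can import it.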
ahmedfgad added the enhancement label Feb 25, 2023