Fixed Dirichlet.random returning output of wrong shapes #4416
The single failing test seems to be more of a broadcasting issue than a pre-pending one. Basically a case in which initially the |
I am on current master 1769258 and
>>> import numpy as np
>>> import pymc3 as pm
>>> pm.Dirichlet.dist(a=np.ones((2))).random(size=(2)).shape # expected (2, 2)
(2, 1)
>>> pm.Dirichlet.dist(a=np.ones((2, 2))).random(size=(2, 100)).shape # expected (2, 100, 2, 2)
(2, 2)
>>> pm.Dirichlet.dist(a=np.ones((3, 4))).random(size=(3, 4)).shape # expected (3, 4, 3, 4)
(3, 4, 1, 1)
>>> pm.Dirichlet.dist(a=np.ones((3, 4))).random(size=(3, 4, 100)).shape # expected (3, 4, 100, 3, 4)
(3, 4)
>>> pm.Dirichlet.dist(a=np.ones((3, 4))).random(size=(100)).shape # expected (100, 3, 4)
(3, 4)
>>> pm.Dirichlet.dist(a=np.ones((3, 4))).random(size=1).shape # expected (1, 3, 4)
(3, 4)
I suspect shape handling in One possible solution I see, is to broadcast the parameter |
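The expectations written in the comments of the session above follow one rule: the requested size dimensions come first, followed by the shape of the parameter `a`. A minimal sketch of that rule is below; `expected_random_shape` is a hypothetical helper written for illustration and is not part of pymc3.

```python
def expected_random_shape(size, param_shape):
    """Return the shape the session above expects from .random(size=...).

    Hypothetical helper (not a pymc3 function): the size dimensions
    are followed by the dimensions of the parameter array.
    """
    if size is None:
        size_tup = ()
    elif isinstance(size, int):
        # Note that size=(2) is just the integer 2 in Python, not a tuple.
        size_tup = (size,)
    else:
        size_tup = tuple(size)
    return size_tup + tuple(param_shape)


# The cases from the session above, with their expected shapes.
print(expected_random_shape(2, (2,)))               # (2, 2)
print(expected_random_shape((2, 100), (2, 2)))      # (2, 100, 2, 2)
print(expected_random_shape((3, 4), (3, 4)))        # (3, 4, 3, 4)
print(expected_random_shape((3, 4, 100), (3, 4)))   # (3, 4, 100, 3, 4)
print(expected_random_shape(100, (3, 4)))           # (100, 3, 4)
print(expected_random_shape(1, (3, 4)))             # (1, 3, 4)
```

A helper like this could serve as the reference value when comparing against the shape actually returned by the distribution.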
Thanks for the deep dive, guys! I'm pinging @michaelosthege and @ricardoV94 because they recently worked on shapes or the Multinomial distribution, so they might have good ideas 😉 |
Ah, thanks for the pointer. I'll have a look over there too. |
Actually, I think @bsmith89 might be more informed than me. |
@Sayam753 You were correct. The problem was in Should any tests be added along with this fix ? |
Yes. You can add tests in test_distributions_random.py, similar to how the Normal distribution is tested here. |
Maybe we can do more efficient sampling by a vectorised implementation. Have a look at this stackoverflow discussion - https://stackoverflow.com/a/15917312/10275861 |
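For readers who want a concrete picture of what a vectorised implementation could look like: one well-known approach is to draw independent Gamma(a_i, 1) variates and normalise them along the last axis. The sketch below uses only NumPy and assumes that approach; it is not claimed to be the code from the linked answer or the code merged in this PR, and `dirichlet_via_gamma` is a hypothetical name.

```python
import numpy as np


def dirichlet_via_gamma(a, size=None, rng=None):
    """Draw Dirichlet samples of shape size + a.shape without Python loops.

    Hypothetical sketch: normalised Gamma(a_i, 1) draws along the last
    axis are Dirichlet(a) distributed.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    if size is None:
        size_tup = ()
    elif isinstance(size, (int, np.integer)):
        size_tup = (int(size),)
    else:
        size_tup = tuple(size)
    # `a` broadcasts against the leading size dimensions.
    gamma_draws = rng.standard_gamma(a, size=size_tup + a.shape)
    # Normalise so that each vector along the last axis sums to one.
    return gamma_draws / gamma_draws.sum(axis=-1, keepdims=True)


samples = dirichlet_via_gamma(np.ones((3, 4)), size=(3, 4))
print(samples.shape)  # (3, 4, 3, 4)
```

Because the size dimensions are passed explicitly to the gamma sampler, the output shape is always the size tuple followed by the shape of `a`, including the case where the two tuples happen to be equal.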
Codecov Report
@@           Coverage Diff           @@
##           master   #4416   +/-   ##
=======================================
  Coverage   88.00%  88.01%
=======================================
  Files          88      88
  Lines       14342   14343     +1
=======================================
+ Hits        12622   12624     +2
+ Misses       1720    1719     -1
The failing tests seem to be the same as those reported in #4323 (hence closing and reopening to restart the tests). Also, is this the right way to add these tests? (I'm not familiar with the classes implemented over there.) |
Hi @kc611
Sorry for the delay in reviewing the PR. Have you looked at the vectorized solution described in https://stackoverflow.com/a/15917312/10275861 to see whether we can integrate it?
Do you mean to replace the |
That's a good idea. Even we can avoid using heavy
All the loops and complex shape handling with |
I am more inclined to directly implement this in
|
That would be even better. Go ahead with this approach. |
@Sayam753 I think this is what you meant to do, right? |
Yes. But, did using |
No, it didn't. This #4416 (comment) is actually a working piece of code. But you mentioned over here #4416 (comment) the possibility of getting rid of |
The new code definitely looks cleaner and more readable 🤩. Minor nitpicks below:
To finish up this PR, there needs to be a mention in RELEASE-NOTES.md.
Looks good! And the test failure is some installation problem - totally unrelated to these changes.
That sounds great. Thanks for tuning in @michaelosthege |
Link to Issue: #4060
A simple fix to the shape broadcasting logic in generate_samples(). The issue was mostly fixed in #4061, leaving out a particular case when the shape tuple equals the size tuple. The issue in this case was here: https://github.com./pymc-devs/pymc3/blob/1769258e459e8f40aa8a56e0ac911aa99e7f67de/pymc3/distributions/distribution.py#L1058-L1066
Since both tuples were equal, this particular piece of logic assumed that the shape was prepended to the size tuple. Adding a simple check that requires the length of dist_shape to be greater than the length of size_tup fixes the issue. I think this fix, along with #4061, completely resolves #4060.
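The check described above can be illustrated with plain tuples. The sketch below is a simplified stand-in and not the actual code in distribution.py: the function names are hypothetical, and only dist_shape and size_tup are taken from the description. It contrasts a prefix comparison on its own with the same comparison guarded by the length check.

```python
def looks_prepended_naive(dist_shape, size_tup):
    # Prefix comparison only: also true when the two tuples are identical.
    return dist_shape[: len(size_tup)] == size_tup


def looks_prepended_checked(dist_shape, size_tup):
    # Hypothetical version with the extra guard: dist_shape must be strictly
    # longer than size_tup before the prefix match counts as "prepended".
    return (
        len(dist_shape) > len(size_tup)
        and dist_shape[: len(size_tup)] == size_tup
    )


# Case from the issue: parameter shape (3, 4) sampled with size=(3, 4).
print(looks_prepended_naive((3, 4), (3, 4)))          # True (misclassified)
print(looks_prepended_checked((3, 4), (3, 4)))        # False
# A shape that really does start with the size dimensions still matches.
print(looks_prepended_checked((3, 4, 3, 4), (3, 4)))  # True
```

With the guard in place, equal tuples are no longer mistaken for a shape that already includes the size dimensions.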