-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 #21957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 #21957
Conversation
def test_count_with_only_nans_in_first_group(self): | ||
# GH21956 | ||
df = DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C': [1, 2]}) | ||
result = df.groupby(['A', 'B']).C.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u do an assert_produces_warning()
here (should not show a warning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not produce any warnings for me.
What should look for in travis? Grep "SeriesGroupBy.count"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just look at the warnings in the 3.6 log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will look in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you assert that this produces no warning (e.g. it IS producing a warning now in numpy < 1.13)
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -471,7 +471,7 @@ Groupby/Resample/Rolling | |||
|
|||
- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` with ``as_index=False`` leading to the loss of timezone information (:issue:`15884`) | |||
- Bug in :meth:`DatetimeIndex.resample` when downsampling across a DST boundary (:issue:`8531`) | |||
- | |||
- Bug in :func:`pandas.core.groupby.GroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a note is not necessary this is not user visible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bug shown in #21956 is user visible...
pandas/core/groupby/generic.py
Outdated
@@ -1207,7 +1208,10 @@ def count(self): | |||
|
|||
mask = (ids != -1) & ~isna(val) | |||
ids = ensure_platform_int(ids) | |||
out = np.bincount(ids[mask], minlength=ngroups or 0) | |||
minlength = ngroups or 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u simplify this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the " or 0" part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this a one-liner in the newest update.
2d4ab97
to
d65615b
Compare
Codecov Report
@@ Coverage Diff @@
## master #21957 +/- ##
=========================================
Coverage ? 92.02%
=========================================
Files ? 170
Lines ? 50710
Branches ? 0
=========================================
Hits ? 46664
Misses ? 4046
Partials ? 0
Continue to review full report at Codecov.
|
91343df
to
97ead75
Compare
pandas/core/groupby/generic.py
Outdated
@@ -1207,7 +1208,8 @@ def count(self): | |||
|
|||
mask = (ids != -1) & ~isna(val) | |||
ids = ensure_platform_int(ids) | |||
out = np.bincount(ids[mask], minlength=ngroups or 0) | |||
minlength = ngroups or (None if _np_version_under1p13 else 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I meant is can you just pass None always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, th doc string says "minlength : int, optional" and it accepts None for numpy >1.13. I'll change it.
doc/source/whatsnew/v0.23.4.txt
Outdated
@@ -31,6 +31,7 @@ Bug Fixes | |||
**Groupby/Resample/Rolling** | |||
|
|||
- Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) | |||
- Bug in :func:`pandas.core.groupby.SeriesGroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there anything user visible atm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as the per the example in #21956:
>>> df = pd.DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C':[1,2]})
>>> df.groupby(['A', 'B']).C.count()
ValueError: minlength must be positive # when numpy <1.13
2b202e0
to
b250bce
Compare
doc/source/whatsnew/v0.23.4.txt
Outdated
@@ -32,6 +32,7 @@ Bug Fixes | |||
|
|||
- Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) | |||
- Bug in ``roll_quantile`` caused a memory leak when calling ``.rolling(...).quantile(q)`` with ``q`` in (0,1) (:issue:`21965`) | |||
- Bug in :func:`pandas.core.groupby.SeriesGroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, is this actually a user facing change? (not against the note), just this is not understandable to user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, new explanation uploaded.
b250bce
to
eea3c76
Compare
def test_count_with_only_nans_in_first_group(self): | ||
# GH21956 | ||
df = DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C': [1, 2]}) | ||
result = df.groupby(['A', 'B']).C.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you assert that this produces no warning (e.g. it IS producing a warning now in numpy < 1.13)
@jreback I got a straight up ValueError in master, so no warning. That ValueError is not in this PR, obviously I've got a feeling I don't understand you. Could you give an example where a warning is emitted? |
eea3c76
to
4e59555
Compare
4e59555
to
5c978e2
Compare
doc/source/whatsnew/v0.23.4.txt
Outdated
@@ -32,6 +32,8 @@ Bug Fixes | |||
|
|||
- Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to 0.24.0, as this wont' backport cleanly.
5c978e2
to
49f30c2
Compare
Hello @topper-123! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 26, 2018 at 12:58 Hours UTC |
49f30c2
to
17ad763
Compare
17ad763
to
32a5fcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green.
Something's wrong with travis, else green:
|
thanks @topper-123 |
* master: BENCH: asv csv reading benchmarks no longer read StringIO objects off the end (pandas-dev#21807) BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 (pandas-dev#21224) BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 (pandas-dev#21957) CLN: Vbench to asv conversion script (pandas-dev#22089) consistent docstring (pandas-dev#22066) TST: skip pytables test with not-updated pytables conda package (pandas-dev#22099) CLN: Remove Legacy MultiIndex Index Compatibility (pandas-dev#21740) DOC: Reword doc for filepath_or_buffer in read_csv (pandas-dev#22058) BUG: rolling with MSVC 2017 build (pandas-dev#21813)
so these are the warnings: https://travis-ci.org/pandas-dev/pandas/jobs/410445444
can you PR to fix this. I guess we have to switch on the version and actually pass 0 rather than None for > 113 |
…ust be None for np<1.13 (pandas-dev#21957)
git diff upstream/master -u -- "*.py" | flake8 --diff
See #21956 for details.