
index.levels not being updated by groupby #2655


Closed
michaelaye opened this issue Jan 8, 2013 · 6 comments
Labels
API Design, Groupby, Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@michaelaye
Contributor

Summary:

Input:

df.D.loc['c1', 'd1']
t1    0
t2    0
t3    1
t4    1
t5    1
Name: D

Operation:

grouped = df.groupby('D')
for i, j in grouped:
    print('D:', i)
    print('Actual index[2]:', j.index[0][2])
    print('First element of levels[2]:', j.index.levels[2][0])

Output:

D: 0.0
Actual index[2]: t1
First element of levels[2]: t1
D: 1.0
Actual index[2]: t3
First element of levels[2]: t1

Details:

http://nbviewer.ipython.org/4482106/
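A minimal, self-contained sketch of the report, assuming a hypothetical frame: the original df is only shown through the df.D output above, so the index names c, d, t and the single (c1, d1) block below are illustrative assumptions. It targets Python 3 and current pandas, where each group yielded by the groupby still carries the full levels[2] of the parent index.

```python
import pandas as pd

# Hypothetical stand-in for the reporter's frame: a three-level index
# (c, d, t) and a float column D, mirroring the df.D.loc['c1', 'd1'] output.
index = pd.MultiIndex.from_product(
    [["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]],
    names=["c", "d", "t"],
)
df = pd.DataFrame({"D": [0.0, 0.0, 1.0, 1.0, 1.0]}, index=index)

first_levels = {}
for key, group in df.groupby("D"):
    actual = group.index[0][2]             # 't' label of the group's first row
    level_head = group.index.levels[2][0]  # first entry of the inherited 't' level
    first_levels[key] = (actual, level_head)
    print("D:", key)
    print("Actual index[2]:", actual)
    print("First element of levels[2]:", level_head)
```

For the D == 1.0 group the first row is t3, but levels[2] still starts at t1, because the group's index reuses the parent's levels and only subsets the per-row codes.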

@michaelaye
Contributor Author

The workaround is index.get_level_values(2).unique() (I think?), so maybe index.levels is an obsolete API?
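A sketch of that workaround, using a plain slice of a hypothetical index as a stand-in for one groupby chunk (the names and values are assumptions, not from the original notebook):

```python
import pandas as pd

# Hypothetical three-level index (c, d, t); keep only the last three rows.
mi = pd.MultiIndex.from_product(
    [["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]], names=["c", "d", "t"]
)
subset = mi[2:]

# .levels still lists every label the parent index knew about...
all_labels = subset.levels[2].tolist()
# ...while get_level_values() reflects only the rows actually present.
observed = subset.get_level_values(2).unique().tolist()

print(all_labels)  # ['t1', 't2', 't3', 't4', 't5']
print(observed)    # ['t3', 't4', 't5']
```

In other words, levels describes the label vocabulary the index was built with, while get_level_values() describes the rows themselves.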

@wesm
Member

wesm commented Jan 19, 2013

I'm not convinced that discarding the other reference values of the "categorical variable" by default is correct. I'm kicking this can down the road; for now, I would use the workaround whenever you need to know the level values actually observed in a chunk of the data.

@michaelaye
Contributor Author

So, doesn't the groupby problem in #2770 show that there is a lurking bug? Deleted rows should no longer appear. Does that mean the groupby algorithm uses index.levels instead of working with the index itself or with index.get_level_values?

@michaelaye
Contributor Author

Ten months down the road: has the can been kicked far enough?

@jtratner
Contributor

Right now, I don't consider this a bug. Can you help me understand why an end user needs to care about what is actually in the levels?

To be clear, if we don't update them, we can share the level indexes among all views and copies of this MultiIndex instead of allocating new ndarrays (and hash tables?) for each one.
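A small sketch of the storage layout being described, assuming current pandas, where the per-row integer array is exposed as MultiIndex.codes (older releases called it labels). The two-level index is hypothetical:

```python
import pandas as pd

# Hypothetical two-level index used only to illustrate the layout.
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
mask = mi.get_level_values("outer") == "b"
subset = mi[mask]

# The subset keeps the same level values as the parent...
same_levels = all(s.equals(o) for s, o in zip(subset.levels, mi.levels))
# ...and only the integer codes (one entry per row) are subset.
print(same_levels)                # True
print(subset.levels[0].tolist())  # ['a', 'b']
print(subset.codes[0].tolist())   # [1, 1]
```

Since a row subset only slices the codes arrays, the levels never need to be re-factorized, which is the trade-off against reporting "stale" level entries.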

I could see adding a method to allow consolidation of a MultiIndex, but you can get the same thing now by doing:

new_index = MultiIndex.from_tuples(index.values)
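Applied to a hypothetical chunk, that one-liner recomputes the levels from the values actually present. This is a sketch against current pandas, where from_tuples accepts the object array of tuples returned by .values; passing names= is an addition here so the level names survive the rebuild:

```python
import pandas as pd

# Hypothetical index; the slice stands in for one groupby chunk.
mi = pd.MultiIndex.from_product(
    [["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]], names=["c", "d", "t"]
)
subset = mi[2:]

# Rebuild from the row tuples: levels are re-derived from observed values.
rebuilt = pd.MultiIndex.from_tuples(subset.values, names=subset.names)

print(subset.levels[2].tolist())   # ['t1', 't2', 't3', 't4', 't5']
print(rebuilt.levels[2].tolist())  # ['t3', 't4', 't5']
print(rebuilt.equals(subset))      # True: same rows, smaller levels
```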

@ghost

ghost commented Nov 30, 2013

See #2770 (comment). That's not what levels is for.
Not a bug. Closing.

Edit: It should be:

new_index = MultiIndex.from_tuples(mi.tolist())

which makes an extra copy (or two).
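Newer pandas releases (0.20 and later, as far as I know) also provide MultiIndex.remove_unused_levels(), which drops unused level entries and remaps the codes directly instead of round-tripping through tuples. A sketch comparing it with the from_tuples route on a hypothetical index:

```python
import pandas as pd

# Hypothetical index; the slice stands in for one groupby chunk.
mi = pd.MultiIndex.from_product(
    [["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]], names=["c", "d", "t"]
)
subset = mi[2:]

# Route from the thread: materialize tuples, then rebuild the index.
via_tuples = pd.MultiIndex.from_tuples(subset.tolist(), names=subset.names)
# Built-in alternative: prune levels and remap codes in place of a rebuild.
pruned = subset.remove_unused_levels()

print(pruned.levels[2].tolist())      # ['t3', 't4', 't5']
print(via_tuples.levels[2].tolist())  # ['t3', 't4', 't5']
print(pruned.equals(via_tuples))      # True
```

Both give the same consolidated index; remove_unused_levels() simply avoids building the intermediate list of tuples.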

This issue was closed.

3 participants