index.levels not being updated by groupby #2655

michaelaye · 2013-01-08T08:15:50Z

Summary:

Input:

df.D.loc['c1', 'd1']
t1    0
t2    0
t3    1
t4    1
t5    1
Name: D

Operation:

grouped = df.groupby('D')
for i, j in grouped:
    print('D:', i)
    print('Actual index[2]:', j.index[0][2])
    print('First element of levels[2]:', j.index.levels[2][0])

Output:

D: 0.0
Actual index[2]: t1
First element of levels[2]: t1
D: 1.0
Actual index[2]: t3
First element of levels[2]: t1

Details:

http://nbviewer.ipython.org/4482106/
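A minimal, self-contained reproduction with current pandas (Python 3). The frame below is hypothetical: it rebuilds only the `c1`/`d1` slice shown above, and the level names `a`, `b`, `c` are assumptions. Each group's first row carries a different label, but `levels[2]` still starts at `t1` in both groups, because the levels are inherited from the parent index.

```python
import pandas as pd

# Hypothetical reconstruction of the reported data: a 3-level row
# MultiIndex (level names "a", "b", "c" are assumptions) and a column D.
idx = pd.MultiIndex.from_product(
    [["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]], names=["a", "b", "c"]
)
df = pd.DataFrame({"D": [0.0, 0.0, 1.0, 1.0, 1.0]}, index=idx)

rows = []
for key, chunk in df.groupby("D"):
    first_row_label = chunk.index[0][2]      # label on the chunk's first row
    first_level = chunk.index.levels[2][0]   # first entry of the level "vocabulary"
    rows.append((key, first_row_label, first_level))
    print("D:", key, "| first row label:", first_row_label,
          "| levels[2][0]:", first_level)
```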

michaelaye · 2013-01-08T10:05:01Z

The workaround is index.get_level_values(2).unique() (I think?), so maybe index.levels is an obsolete API?
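A small sketch of the difference, assuming current pandas and a hypothetical index: `levels[2]` keeps every label the parent index defined, while `get_level_values(2)` returns the labels the remaining rows actually use.

```python
import pandas as pd

# Hypothetical 3-level index; keep only the rows labelled t3..t5,
# as in the D == 1.0 group of the report.
mi = pd.MultiIndex.from_product([["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]])
sub = mi[2:]  # positional slice; levels are carried over from mi

# levels[2] still lists every label defined on the parent index ...
all_labels = list(sub.levels[2])
# ... while get_level_values reports what the rows actually contain.
observed = list(sub.get_level_values(2).unique())
print(all_labels, observed)
```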

wesm · 2013-01-19T23:58:13Z

I'm not convinced that discarding the other reference values for the "categorical variable" by default is correct. Kicking this can down the road; for now, I would use the workaround whenever you need to know the level values actually observed in a chunk of the data.

michaelaye · 2013-02-01T03:03:31Z

So the groupby problem in #2770 shows that there is a lurking bug, no? Deleted rows should not appear anymore. Does that mean that the groupby algorithm uses index.levels instead of working with the index or index.get_level_values?
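The underlying behavior is easy to see without groupby. This sketch (current pandas, hypothetical data) removes rows with a boolean mask: the removed key is gone from the rows but still present in `levels`. It illustrates the mechanism behind the question and does not reproduce #2770 itself.

```python
import pandas as pd

# Hypothetical frame: two "key" values with two rows each.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key", "n"]),
)

# "Delete" every row whose key is "a" by filtering with a boolean mask.
kept = df[df.index.get_level_values("key") != "a"]

still_in_levels = "a" in kept.index.levels[0]               # level vocabulary keeps "a"
still_in_rows = "a" in kept.index.get_level_values("key")   # no remaining row uses it
print(still_in_levels, still_in_rows)
```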

michaelaye · 2013-11-27T00:27:18Z

Is 10 months down the road enough can-kicking?

jtratner · 2013-11-27T20:47:15Z

Right now, I don't consider this a bug. Can you help me understand why an end user needs to care about what is actually in the levels?

To be clear, if we don't update them, we can share the levels indexes between all the views and copies of this MI, instead of allocating new ndarrays (and hash tables?) for each.
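A sketch of the storage idea, assuming current pandas: a MultiIndex stores each level once plus integer `codes` that point into it, so a slice only needs new codes. Whether pandas reuses the very same level objects in memory is an implementation detail not checked here; the sketch compares level contents only.

```python
import pandas as pd

# Hypothetical 3-level index and a positional slice of it.
mi = pd.MultiIndex.from_product([["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]])
view = mi[2:]

# The slice keeps the parent's levels unchanged ...
same_levels = all(a.equals(b) for a, b in zip(mi.levels, view.levels))
# ... and only the integer codes differ: positions into levels[2].
codes2 = [int(c) for c in view.codes[2]]
print(same_levels, codes2)
```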

I could see adding a method to allow consolidation of a MultiIndex, but you can get the same thing now by doing:

new_index = MultiIndex.from_tuples(index.values)

ghost · 2013-11-30T12:08:45Z

See #2770 (comment). That's not what levels is for.
Not a bug. Closing.

Edit: It should be

new_index = MultiIndex.from_tuples(mi.tolist())

That makes an extra copy (or two).
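A sketch of the consolidation suggested above, assuming current pandas and a hypothetical index. Rebuilding from tuples drops unused level values; `MultiIndex.remove_unused_levels()`, added in pandas 0.20 (after this thread), gives the same levels without the round trip through Python tuples.

```python
import pandas as pd

# Hypothetical index whose slice no longer uses t1 and t2.
mi = pd.MultiIndex.from_product([["c1"], ["d1"], ["t1", "t2", "t3", "t4", "t5"]])
sub = mi[2:]

rebuilt = pd.MultiIndex.from_tuples(sub.tolist())  # round-trips through Python tuples
pruned = sub.remove_unused_levels()                # pandas >= 0.20

print(list(rebuilt.levels[2]), list(pruned.levels[2]))
```

Both results compare equal to `sub` row by row; only the stored levels shrink.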

ghost mentioned this issue on Jan 29, 2013: BUG: Indexes still include values that have been deleted #2770 (closed)

ghost closed this as completed Nov 30, 2013

ghost mentioned this issue on Nov 30, 2013: dataframe.drop(col,axis=1) does not drop column from column.levels in multiindex dataframe #3686 (closed)

