
dataframe.drop(col,axis=1) does not drop column from column.levels in multiindex dataframe #3686


Closed
amol-desai opened this issue May 22, 2013 · 8 comments
Labels: Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

Comments

@amol-desai

I have a MultiIndex DataFrame from which I am dropping columns using df.drop(col, axis=1). I then loop over df.columns.levels[0] and perform some operations on each column. However, this fails because pandas looks up the removed column, which is still listed in columns.levels. Is this a bug? Is there a WAR?

Here is the df I am working with:

           ^GSPC         PF
        Adj Close  Adj Close
Date                             
2013-04-22    1562.50    4023.45
2013-04-23    1578.78    4099.20
2013-04-24    1578.79    4094.70
2013-04-25    1585.16    4124.25
2013-04-26    1582.24    4211.65
2013-04-29    1593.61    4340.75
2013-04-30    1597.57    4467.55
2013-05-01    1582.70    4432.25
2013-05-02    1597.59    4494.95
2013-05-03    1614.42    4539.55
2013-05-06    1617.50    4645.95
2013-05-07    1625.96    4624.65
2013-05-08    1632.69    4677.40
2013-05-09    1626.67    4637.25
2013-05-10    1633.70    4602.40
2013-05-13    1633.77    4618.60
2013-05-14    1650.34    4510.85
2013-05-15    1658.78    4362.00
2013-05-16    1650.47    4418.95
2013-05-17    1667.47    4406.95
2013-05-20    1666.29    4503.50
2013-05-21    1669.16    4471.20

Here is how I am dropping the columns:

    data = data.drop(stock.ticker,axis=1,level=0)

Here is where the issue is:

    print data.columns
    MultiIndex
    [(^GSPC, Adj Close), (PF, Adj Close)]

    print data.columns.labels
    [array([2, 3]), array([0, 0])]

    print data.columns.levels
    [Index([nvda, aapl, ^GSPC, PF], dtype=object), Index([Adj Close], dtype=object)]
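
The behaviour reported above can be reproduced with a small frame. This is only a sketch: the tickers and values are made up to mirror the report, and it assumes a recent pandas, where the attribute shown above as `labels` is called `codes` (renamed in pandas 0.24).

```python
import numpy as np
import pandas as pd

# Hypothetical data: four tickers with one price field each, mirroring the report.
columns = pd.MultiIndex.from_product([["nvda", "aapl", "^GSPC", "PF"], ["Adj Close"]])
data = pd.DataFrame(np.ones((3, 4)), columns=columns)

# Drop two tickers the same way the report does.
for ticker in ["nvda", "aapl"]:
    data = data.drop(ticker, axis=1, level=0)

remaining = list(data.columns)          # column tuples that are actually present
level0 = list(data.columns.levels[0])   # level values, including the dropped tickers
codes0 = list(data.columns.codes[0])    # integer positions into level0, one per column

print(remaining)
print(level0)
print(codes0)
```

The columns themselves are gone, but `levels[0]` still lists every ticker the index was built with, so iterating over it will hit keys that no longer exist.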

Method used to generate the DataFrame, as requested in the comment:
tickers is a list of stock ticker strings.
stock is an object that has a ticker property, among others.
portfolio is an object that is a collection of stocks.

    data = getdata.get_history(tickers,dt.today()-relativedelta(months=months))
    data = data.drop(['Open','High','Low','Close','Volume'],axis=1)
    data = data.unstack(0).swaplevel(0,1,axis=1).sortlevel(0,axis=1)
    data['PF','Adj Close'] = np.zeros(len(data))
    for stock in portfolio.getStocksInPortfolio():
        data['PF','Adj Close'] += data[stock.ticker,'Adj Close'] * stock.getSharesOwned()
        data = data.drop(stock.ticker,axis=1,level=0)

@cpcloud
Member

cpcloud commented May 22, 2013

not sure what you mean by WAR here.

@cpcloud
Member

cpcloud commented May 22, 2013

I can confirm something similar. This might have something to do with rebinding data inside the loop.

    import numpy as np
    from pandas import DataFrame, MultiIndex

    randint = np.random.randint
    columns = MultiIndex.from_tuples([('gspc', 'adj_close'),
                                      ('pf', 'adj_close'),
                                      ('aapl', 'adj_close'),
                                      ('nvda', 'adj_close')])
    n = 50
    # Synthetic price series, one per ticker.
    x = np.random.randn(n) * 31.563 + 1621.127
    y = np.random.randn(n) * 181.441 + 4442.121
    z = np.random.randn(n) * 31.563 + 1621.127
    w = np.random.randn(n) * 181.441 + 4442.121
    data = DataFrame(np.column_stack((x, y, z, w)), columns=columns)
    # Random share counts for every ticker except the portfolio column.
    stocks = {'nvda': randint(100, 1000), 'gspc': randint(100, 1000), 'aapl': randint(100, 1000)}
    data['pf', 'adj_close'] = np.zeros(n)
    for ticker, nshares in stocks.items():
        # Accumulate the position value, then drop the ticker (this rebinds data).
        data['pf', 'adj_close'] += data[ticker, 'adj_close'] * nshares
        data = data.drop(ticker, axis=1, level=0)

@amol-desai
Author

WAR = workaround. Unconsciously threw that in. Sorry.

@michaelaye
Contributor

That's a long-standing issue: #2770
I still don't understand precisely why Wes does not consider it a bug. Something about levels not reflecting what is observed...
I think the 'levels' member should either serve a clear purpose or be removed; as it stands, it is just confusing.

@cpcloud
Member

cpcloud commented May 23, 2013

It seems to me that if you are explicitly dropping levels, then the index should be recomputed, since that's what is being requested. As a corner case, keeping a huge MultiIndex while storing only a single column would be inefficient.
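
For readers who want the index recomputed after a drop, later pandas releases (0.20 and up) provide `MultiIndex.remove_unused_levels()`. A minimal sketch, assuming such a version is installed; the frame and the names `frame`, `trimmed`, and `compact_level0` are illustrative and not from the thread:

```python
import numpy as np
import pandas as pd

# Illustrative frame using the tickers from the example above.
columns = pd.MultiIndex.from_tuples(
    [("gspc", "adj_close"), ("pf", "adj_close"), ("aapl", "adj_close"), ("nvda", "adj_close")]
)
frame = pd.DataFrame(np.zeros((2, 4)), columns=columns)

# Dropping leaves 'aapl' and 'nvda' behind in columns.levels[0].
trimmed = frame.drop(["aapl", "nvda"], axis=1, level=0)
stale_level0 = list(trimmed.columns.levels[0])

# remove_unused_levels() returns a new MultiIndex, so it has to be assigned back.
trimmed.columns = trimmed.columns.remove_unused_levels()
compact_level0 = list(trimmed.columns.levels[0])

print(stale_level0)
print(compact_level0)
```

The call is explicit rather than automatic, which fits the maintainers' position in this thread that stale levels are expected behaviour.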

@ghost

ghost commented Nov 30, 2013

Re #2770 (comment): this is not a bug (removed label).
That's not what MultiIndex levels are for.

The comment about wasting memory is relevant, but since the levels should be shared with the
original frame, that's less of a concern. IPython, which most people use, is awful about keeping objects
in memory anyway. If you need a consolidate method, open an issue or use jtranter's suggestion.

@amol-desai, using that, you can reconstruct the frame to "squeeze" out unused level factors.
You should be looking through labels rather than levels, though (depending on what you're trying to do).
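
Both suggestions can be sketched as follows. This is an illustration under assumptions, not the code from #2770: the frame is made up, `get_level_values` stands in for reading the labels directly, and rebuilding the index from its own tuples is one possible way to "squeeze" out unused level values.

```python
import numpy as np
import pandas as pd

# Made-up frame: three tickers, one price field each.
columns = pd.MultiIndex.from_product([["aapl", "gspc", "pf"], ["adj_close"]])
frame = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=columns)
frame = frame.drop("aapl", axis=1, level=0)

# Option 1: iterate over the values that are actually present, not over levels[0].
present = list(frame.columns.get_level_values(0).unique())
totals = {ticker: float(frame[ticker, "adj_close"].sum()) for ticker in present}

# Option 2: rebuild the column index from its tuples so the levels are recomputed.
frame.columns = pd.MultiIndex.from_tuples(frame.columns.tolist(), names=frame.columns.names)
rebuilt_level0 = list(frame.columns.levels[0])

print(present)
print(totals)
print(rebuilt_level0)
```

Option 1 leaves the index untouched and only changes how it is read; option 2 pays for a new index once so that later code can rely on `levels` again.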

@jreback
Contributor

jreback commented Feb 26, 2014

closing as not a bug

@jreback jreback closed this as completed Feb 26, 2014
astafan8 added a commit to Dominik-Vogel/Qcodes that referenced this issue May 9, 2019
... in Dataset context manager example notebook.

Based on this pandas-dev/pandas#3686.