Phantom Index when NaN in MultiIndex #5286
Comments
Please post the creation code when you report a problem so that it can be copy-pasted to reproduce it. That makes it much easier to see whether there is actually a bug. |
Please post it in a format that is easy to copy and paste. In other words, put the data in a string and use StringIO to read it in (via CSV). |
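A minimal sketch of the StringIO approach suggested above. The column names u1, u3 and value and the sample rows are assumptions made for illustration (only u2 and d_nan are named in this thread); they are not the reporter's actual data.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; blank fields are parsed as NaN by read_csv.
csv_text = """u1,u2,u3,value
A,B,C,1
A,B,,2
A,,,3
"""

# Read the pasted text exactly as if it were a CSV file on disk.
df = pd.read_csv(StringIO(csv_text))

# Move the hierarchy columns into a MultiIndex that contains NaN labels.
d_nan = df.set_index(["u1", "u2", "u3"])
print(d_nan)
```

Anyone reading the thread can paste this straight into IPython and get the same frame.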
Great. Please also edit the question so that it is a simple copy-paste within IPython; you will get much faster responses this way. It is best to make things as easy as possible for readers of this thread. No one wants to wade through many answers and responses just to see what the problem is. |
I had spent some time trying to get this to work (previously, not related to this issue). The basic problem is that the presence of more than one NaN makes the index non-unique and quite hard to deal with. This should probably throw an error (not when actually constructing it, but when selecting from it). It IS useful to be able to have the NaNs there during reshaping operations; e.g., imagine that you have NaNs in a column and you set_index, then reset: you'd expect it to work. So I'll mark this as a bug for 0.14, but probably just to raise an error. That said, there are some specific cases that DO work, e.g.
This raises an error (in theory I can get this to work, but once you add another NaN it's very problematic)
|
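The set_index / reset_index round trip described in the comment above can be sketched as follows. This is an illustration with made-up column names (key1, key2, value), not the commenter's original example, and it only shows the reshaping case, not the selection cases that were reported as problematic.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a NaN in one of the future index columns.
df = pd.DataFrame(
    {"key1": ["A", "A", "B"], "key2": ["x", np.nan, "y"], "value": [1, 2, 3]}
)

# Move both key columns into a MultiIndex; the NaN becomes a missing label.
indexed = df.set_index(["key1", "key2"])

# Moving the index levels back to columns should restore the NaN in place.
restored = indexed.reset_index()
print(restored)
```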
The problem that I am trying to deal with is data that has a ragged hierarchy. By that I mean it is hierarchical data, but not all hierarchies have the same depth. An example can be seen looking at an income statement. |
I would just always fill the index labels with a string to avoid ambiguity. Indexes having NaN are by definition pretty ambiguous. |
It might be easier to work with if you put the data in the Index as columns |
IMHO you are also better served by NOT using a single frame and trying to shove everything into it. Yes, it can work, but most often you need specific behavior. I find that creating an object that holds a DataFrame (a has-a relationship) is a very nice idiom for holding data. Just because data comes from a single source does not mean that the storage is the right medium. I often serialize in a very flat way because it's more efficient/easier, but that does not mean that it is the most 'natural' way. |
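One way to read the has-a suggestion above is a small wrapper class that owns a flat DataFrame and exposes the behavior you need. The class name IncomeStatement, its total method, and the column names are all hypothetical, chosen to match the income-statement example mentioned earlier in the thread.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class IncomeStatement:
    """Hypothetical container that owns a flat DataFrame (has-a, not is-a)."""

    lines: pd.DataFrame  # assumed columns: u1, u2, u3, value

    def total(self, u1: str) -> float:
        """Sum all values under one top-level category, whatever its depth."""
        mask = self.lines["u1"] == u1
        return float(self.lines.loc[mask, "value"].sum())


# Ragged hierarchy kept as plain columns; missing deeper levels stay as None.
stmt = IncomeStatement(
    pd.DataFrame(
        {
            "u1": ["Revenue", "Revenue", "Expenses"],
            "u2": ["Product", "Services", "Salaries"],
            "u3": ["Hardware", None, None],
            "value": [10.0, 20.0, 5.0],
        }
    )
)
print(stmt.total("Revenue"))
```

Keeping the hierarchy in columns means no NaN ever ends up in an index, and the ragged depth is handled inside the object's methods.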
Which raises another question. Let's say I do have the table called d_nan above. How do I replace the NaNs in the index with another value (in this case I would imagine some monotonic int or string would be ideal). |
|
Okay, that works. I think I will modify it to use the value in u2 rather than a fixed string (i.e. just copy it for missing values). |
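The two variants discussed above (a fixed placeholder string, and copying the value from u2) can be sketched like this. The answer's original code is not preserved in this thread, so this is only one possible approach; the level names u1 and u3, the placeholder "missing", and the data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the d_nan table mentioned in the thread.
df = pd.DataFrame(
    {
        "u1": ["A", "A", "A"],
        "u2": ["B", "B", "D"],
        "u3": ["C", np.nan, np.nan],
        "value": [1, 2, 3],
    }
)
d_nan = df.set_index(["u1", "u2", "u3"])

# Work on the levels as ordinary columns, then rebuild the index.
flat = d_nan.reset_index()
levels = ["u1", "u2", "u3"]

# Variant 1: replace missing labels with a fixed string.
fixed = flat.assign(u3=flat["u3"].fillna("missing")).set_index(levels)

# Variant 2: copy the parent label from u2 wherever u3 is missing.
copied = flat.assign(u3=flat["u3"].fillna(flat["u2"])).set_index(levels)

print(fixed)
print(copied)
```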
As of 0.19.1, this original issue seems to have been fixed on master:
Should an error still be thrown in this case, or should a test be added to confirm this behavior? |
I think this just needs a test. |
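A regression test along these lines might look like the sketch below. It uses a made-up two-level index rather than the reporter's frame, so it illustrates the shape of such a test and is not the test that was eventually added to pandas. It assumes a recent pandas version in which selecting on the first level works when a deeper level contains NaN.

```python
import numpy as np
import pandas as pd


def test_loc_with_nan_in_multiindex():
    # Hypothetical data: the second level has a missing label under "A".
    index = pd.MultiIndex.from_arrays(
        [["A", "A", "B"], ["x", np.nan, "y"]], names=["u1", "u2"]
    )
    df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

    result = df.loc["A"]

    # Both rows under "A" come back, in order.
    assert result["value"].tolist() == [1, 2]
    # The only real label is "x"; the other stays missing, with no phantom label.
    assert result.index.dropna().tolist() == ["x"]
    assert result.index.isna().sum() == 1


test_loc_with_nan_in_multiindex()
```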
Using the following:
If I have the following DataFrame:
and then I try to use loc on the index:
When I try to loc the last level, a mysterious D appears:
If d instead looks like (no NaNs):
then I have no problems: