pm.Data transforms its contents into floats inappropriately #3493
Comments
I'm assigning this to myself, because I think this is fixed per @ferrine's suggestion, but I need to check. |
Can I give it a try? I think if we want to support different types of variables for |
@jmloyola, yes, feel free to give it a try. As you said, we only need to look at |
That sounds fine, but I think it would be best if |
I never meant to convert integers to floats. That's exactly the issue raised in this Discourse thread. What we would want is to cast any float-like to floatX and any int-like to intX. |
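The casting rule described here can be sketched with plain NumPy. This is only an illustration, not PyMC3's actual code: smart_cast, FLOAT_X, and INT_X are hypothetical names, and the two constants merely stand in for whatever floatX and intX resolve to in a given Theano configuration.

```python
import numpy as np

# Assumed stand-ins for the configured float and int widths (hypothetical).
FLOAT_X = "float64"
INT_X = "int32"


def smart_cast(values, float_x=FLOAT_X, int_x=INT_X):
    """Cast float-like data to float_x and int-like data to int_x."""
    arr = np.asarray(values)
    if np.issubdtype(arr.dtype, np.floating):
        return arr.astype(float_x)
    if np.issubdtype(arr.dtype, np.integer):
        return arr.astype(int_x)
    # Anything else (bool, object, ...) is returned untouched.
    return arr


print(smart_cast([0, 1, 2]).dtype)   # int32
print(smart_cast([0.5, 1.5]).dtype)  # float64
```

The point of the sketch is that the dtype family is decided by the input, and only the width is decided by configuration.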
The user does not use the function

shared_object = theano.shared(pm.model.pandas_to_array(value), name)

to

shared_object = theano.shared(pm.model.pandas_to_array(value, force_cast=False), name)

for example. Same situation with this line. The already existent calls to the function

Does this make sense? |
I thought I had swapped the translation to use
I think that may not be a change, but it also may not be a change from The Wrong Thing. Having integers turn into floats seems like a Bad Thing, even if it is the current behavior. I'd prefer to see that particular current behavior change. I can't think of a case where any working code that works with Actually, maybe that didn't change -- I see a bunch of places where I constructed |
In the code base,
I agree with you on this. I will change the return of the function to If you think of a case that this might fail or a model you have run that didn't work previously (or that you had to |
I don't have a model that's easy to pull out of context (it's in the middle of a big pipeline). But my model's inspiration came from one by @junpenglao : Code 10 Schizophrenic case study. This one has variables like |
I understood. I think the model is similar to this one. I checked it and the proposal we have (changing the return to

func = pm.model.pandas_to_array
...
generator_output = func(square_generator)
wrapped = generator_output.owner.inputs[0]  # IndexError: list index out of range

This is related to the fact that this line in the function pandas_to_array returns a variable of type

The same error can happen with the current code of PyMC if in the test we change this line to:

# The only difference is the dtype
square_generator = (np.array([i*2.0], dtype=float) for i in range(100))

This happens because the function

To summarize, the function pandas_to_array sometimes returns a variable of type

I will start a pull request with these changes so more people can chip in. Maybe I am overthinking this 😅. |
Closing as this was taken care of by #3925 |
I just ran into this problem as well. For folks wanting to use a temporary fix, this class will work to BUILD the model and sample, though I assume

import numpy as np
import pymc3 as pm
import theano


class IntData(pm.Data):
    def __new__(self, name, value):
        if isinstance(value, list):
            value = np.array(value)

        # Add data container to the named variables of the model.
        try:
            model = pm.Model.get_context()
        except TypeError:
            raise TypeError(
                "No model on context stack, which is needed to instantiate a data container. "
                "Add variable inside a 'with model:' block."
            )
        name = model.name_for(name)

        # `pm.model.pandas_to_array` takes care of parameter `value` and
        # transforms it to something digestible for pymc3
        _val = pm.model.pandas_to_array(value)
        # Workaround: force the converted data back to an integer dtype.
        _val = _val.astype('int64')
        shared_object = theano.shared(_val, name)

        # To draw the node for this variable in the graphviz Digraph we need
        # its shape.
        shared_object.dshape = tuple(shared_object.shape.eval())

        model.add_random_variable(shared_object)
        return shared_object

If you want to do something with |
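One caveat with the astype('int64') step in the workaround: NumPy truncates fractional values silently when casting floats to integers. A small guard, sketched below with a hypothetical helper as_int64_checked, refuses data that is not integer-valued before casting. It is an illustration only and not part of PyMC3.

```python
import numpy as np


def as_int64_checked(values):
    """Cast to int64 only if no information would be lost (hypothetical helper)."""
    arr = np.asarray(values)
    if np.issubdtype(arr.dtype, np.integer):
        return arr.astype("int64")
    as_float = arr.astype("float64")
    # Reject NaN/inf and anything with a fractional part.
    if not np.all(np.isfinite(as_float)) or not np.all(as_float == np.round(as_float)):
        raise ValueError("data contains non-integer values; refusing to cast to int64")
    return as_float.astype("int64")


print(as_int64_checked([1.0, 2.0, 3.0]))  # [1 2 3]
```

Something like this could be applied to the value before it is handed to the workaround class, so that a float column passed by mistake fails loudly instead of being rounded toward zero.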
Hi Anatoly, |
Yes, sorry if I didn't make it clear - I'm running 3.8 (can't use a master version), but my temp fix is for folks who need to run "in production" RIGHT NOW and can't use "unstable versions" (or wait until 3.9 release 😉). Thanks for your work! |
Ow ok, thanks for the clarification! |
Whatever the data type that it receives, the pm.Data constructor turns it into a float, using pm.model.pandas_to_array which, in turn, calls pm.floatX. @ferrine says there's a substitute function, pm.theanof.smartfloatX, that would be appropriate to substitute.

I need to clear a couple of existing PRs before I can do anything with this. If anyone else would like to grab it in the meantime, that would be great.

Note that we need to check pm.set_data() to see if it does the same thing. I don't understand the other functions in pm.data.py that deal with DataFrames, etc., so it's possible something lurks in there, as well.
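To make the symptom concrete, here is a NumPy-only sketch of why an unconditional float cast hurts integer data such as group indices. force_float is a hypothetical stand-in for the unconditional cast described in this issue, not the PyMC3 function itself, and "float64" is an assumed value for floatX.

```python
import numpy as np


def force_float(values, float_x="float64"):
    """Hypothetical stand-in: cast everything to float, whatever it was."""
    return np.asarray(values).astype(float_x)


group_idx = np.array([0, 2, 1])         # integer index data
effects = np.array([10.0, 20.0, 30.0])  # one value per group

as_float = force_float(group_idx)
print(as_float.dtype)  # float64: the integer dtype is gone

try:
    effects[as_float]
    indexing_worked = True
except IndexError:
    # NumPy refuses float arrays as indices.
    indexing_worked = False

print(indexing_worked)     # False
print(effects[group_idx])  # indexing with the original integers is fine
```

The same concern applies in spirit to data passed through pm.Data: values meant to be used as indices or counts stop being usable as such once they have been turned into floats.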