test_quotechar_unicode on Debian jessie (stable) #14699

Closed
yarikoptic opened this issue Nov 20, 2016 · 7 comments
Labels
IO CSV read_csv, to_csv Unicode Unicode strings
Milestone

Comments

@yarikoptic
Contributor

Seems to happen only with python3 (passes on python2)

======================================================================
ERROR: test_quotechar_unicode (pandas.io.tests.parser.test_parsers.TestCParserHighMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/quoting.py", line 152, in test_quotechar_unicode
    result = self.read_csv(StringIO(data), quotechar=u('\u0394'))
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/test_parsers.py", line 59, in read_csv
    return read_csv(*args, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 411, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4130)
  File "pandas/parser.pyx", line 588, in pandas.parser.TextReader._set_quoting (pandas/parser.c:6307)
OverflowError: value too large to convert to char

======================================================================
ERROR: test_quotechar_unicode (pandas.io.tests.parser.test_parsers.TestCParserLowMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/quoting.py", line 152, in test_quotechar_unicode
    result = self.read_csv(StringIO(data), quotechar=u('\u0394'))
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/test_parsers.py", line 77, in read_csv
    return read_csv(*args, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 411, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4130)
  File "pandas/parser.pyx", line 588, in pandas.parser.TextReader._set_quoting (pandas/parser.c:6307)
OverflowError: value too large to convert to char

FWIW, this also happens on Ubuntu 15.04. It passes on later releases.
Advice on where/how to dig would be appreciated.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-rc2+
machine: x86_64
processor:
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.4
pip: None
setuptools: 20.10.1
Cython: 0.21.1
numpy: 1.8.2
scipy: 0.14.0
statsmodels: None
xarray: None
IPython: None
sphinx: 1.2.3
patsy: None
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.2.1
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
pandas_datareader: None

@gfyoung
Member

gfyoung commented Nov 21, 2016

@yarikoptic : Here are your relevant files:

  1. parser.pyx: this file is where we set the quoting for the CParser
  2. quoting.py: this file is where you are getting the failing test

I suspect it is a compatibility issue: the character we chose for the test (which is valid, FYI) cannot be converted to char for some reason. I think the patch is just to change the Unicode value chosen here to one that doesn't fail on your machine (most likely a lower code point).
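To illustrate why a lower code point might behave differently, here is a minimal sketch. It assumes a signed 8-bit C `char`, uses the standard-library `struct` module as a stand-in for the conversion, and is not the actual pandas or Cython code path; `fits_in_c_char` is a hypothetical helper introduced only for this example.

```python
import struct


def fits_in_c_char(ch):
    """Return True if the code point of `ch` fits in a signed 8-bit C char.

    Assumption: char is signed and 8 bits wide (range -128..127).
    """
    try:
        # The "b" format packs a signed char and raises struct.error
        # when the value is out of range.
        struct.pack("b", ord(ch))
        return True
    except struct.error:
        return False


print(ord("\u0394"))             # 916
print(fits_in_c_char("\u0394"))  # False
print(fits_in_c_char('"'))       # True
```

The quote character used in the failing test, U+0394, has code point 916, which is outside that range, while an ASCII quote character is well inside it.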

@jreback added IO CSV read_csv, to_csv Unicode Unicode strings labels Nov 21, 2016
@gfyoung
Member

gfyoung commented Dec 8, 2016

@yarikoptic : Do you have any follow-up regarding this? At this point, it's a little difficult for us to patch without confirmation since I can't reproduce myself. @jreback ?

@yarikoptic
Contributor Author

I will check tomorrow... Already stepped away from the keyboard. I suspect my locale settings while running the tests.

@yarikoptic
Contributor Author

Reproduced... and I think it is a Cython issue that requires a higher minimum version. On jessie the stock Cython is 0.21.1. When I forced the build to require at least 0.23 (so it uses the files pregenerated by Cython 0.23.4), the test seems to pass! I will bump the Cython version requirement on my end and see how it changes the situation overall ;)
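For reference, the kind of minimum-version check described above could be sketched roughly as follows. This is an illustration only, not the check pandas' build actually performs; `parse_version` and `meets_minimum` are hypothetical helpers, and the sketch assumes plain dotted numeric versions (no alpha/beta suffixes).

```python
def parse_version(text):
    """Turn a dotted release string such as '0.21.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    # Tuples compare element by element, so (0, 21, 1) < (0, 23).
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.21.1", "0.23"))  # False: the stock jessie Cython
print(meets_minimum("0.23.4", "0.23"))  # True: the pregenerated files
```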

@gfyoung
Member

gfyoung commented Dec 8, 2016

Awesome! I think we can close this issue then, so long as everything is working for you.

@yarikoptic
Contributor Author

or we could "fix" it ;) #14831

@gfyoung
Member

gfyoung commented Dec 8, 2016

@yarikoptic : Ah, I see. I didn't realize our minimum Cython version was 0.19.1 😄

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 9, 2016
@jreback added this to the 0.20.0 milestone Dec 9, 2016
yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 10, 2016
yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 11, 2016
ischurov pushed a commit to ischurov/pandas that referenced this issue Dec 19, 2016
3 participants