test_quotechar_unicode on Debian jessie (stable) #14699

yarikoptic · 2016-11-20T03:48:06Z

Seems to happen only with python3 (passes on python2)

======================================================================
ERROR: test_quotechar_unicode (pandas.io.tests.parser.test_parsers.TestCParserHighMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/quoting.py", line 152, in test_quotechar_unicode
    result = self.read_csv(StringIO(data), quotechar=u('\u0394'))
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/test_parsers.py", line 59, in read_csv
    return read_csv(*args, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 411, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4130)
  File "pandas/parser.pyx", line 588, in pandas.parser.TextReader._set_quoting (pandas/parser.c:6307)
OverflowError: value too large to convert to char

======================================================================
ERROR: test_quotechar_unicode (pandas.io.tests.parser.test_parsers.TestCParserLowMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/quoting.py", line 152, in test_quotechar_unicode
    result = self.read_csv(StringIO(data), quotechar=u('\u0394'))
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/parser/test_parsers.py", line 77, in read_csv
    return read_csv(*args, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/build/pandas-0.19.1/debian/tmp/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 411, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4130)
  File "pandas/parser.pyx", line 588, in pandas.parser.TextReader._set_quoting (pandas/parser.c:6307)
OverflowError: value too large to convert to char

FWIW, this also happens on Ubuntu 15.04. It passes on later releases.
Advice on where/how to dig would be appreciated.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-rc2+
machine: x86_64
processor:
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.4
pip: None
setuptools: 20.10.1
Cython: 0.21.1
numpy: 1.8.2
scipy: 0.14.0
statsmodels: None
xarray: None
IPython: None
sphinx: 1.2.3
patsy: None
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.2.1
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
pandas_datareader: None

gfyoung · 2016-11-21T02:34:42Z

@yarikoptic : Here are your relevant files:

parser.pyx: this file is where we set the quoting for the CParser
quoting.py: this file is where you are getting the failing test

I suspect it is a compatibility issue: the character we chose for the test (which is valid, FYI) cannot be converted to char on your machine for some reason or another. I think the patch is just to change the Unicode value chosen here (most likely to a lower code point) so that it doesn't fail on your machine.
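For reference, a small standard-library sketch of why this particular character can trip a conversion to a C `char`. It only illustrates the size of the code point, consistent with the `OverflowError` message above; it does not reproduce what `parser.pyx` or Cython actually do internally. The helper `fits_in_c_char` and the two constants are hypothetical names introduced for this example.

```python
# The quote character passed by test_quotechar_unicode.
quotechar = "\u0394"  # GREEK CAPITAL LETTER DELTA

code_point = ord(quotechar)
utf8_bytes = quotechar.encode("utf-8")

# Assumed ranges for an 8-bit C char (hypothetical constants for illustration).
SIGNED_CHAR_MAX = 127
UNSIGNED_CHAR_MAX = 255


def fits_in_c_char(ch):
    """Return True if a one-character string's code point fits in a signed 8-bit char."""
    return ord(ch) <= SIGNED_CHAR_MAX


print(code_point)                 # 916
print(len(utf8_bytes))            # 2
print(fits_in_c_char('"'))        # True
print(fits_in_c_char(quotechar))  # False
```

The code point 916 exceeds both the signed and the unsigned 8-bit range, and the character needs two bytes in UTF-8, so any code path that tries to store it in a single `char` has to either reject it or handle it specially.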

gfyoung · 2016-12-08T04:36:50Z

@yarikoptic : Do you have any follow-up regarding this? At this point, it's a little difficult for us to patch without confirmation, since I can't reproduce it myself. @jreback ?

yarikoptic · 2016-12-08T04:43:05Z

I will check tomorrow... I've already stepped away from the keyboard. I suspect the locale setting in effect while the tests were running.

yarikoptic · 2016-12-08T16:10:08Z

Reproduced... and I think it is a Cython issue that requires a higher minimum version. On jessie the stock Cython is 0.21.1. When I forced the build to require at least 0.23 (so that the files pregenerated by Cython 0.23.4 are used), the test seems to pass! I will boost the Cython version requirement on my end and see how it changes the situation overall ;)
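A minimal sketch of the kind of version gate described above, assuming the decision is simply "is the installed Cython at least 0.23, or should the pre-cythoned sources be used instead". The names `MIN_CYTHON`, `parse_version`, and `cython_is_new_enough` are hypothetical and are not taken from the pandas build scripts or the Debian packaging.

```python
import re

# Minimum Cython version suggested in this thread (assumption for this sketch).
MIN_CYTHON = (0, 23)


def parse_version(text):
    """Turn '0.23.4' into (0, 23, 4), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def cython_is_new_enough(installed, minimum=MIN_CYTHON):
    """Return True if the installed version string meets the minimum."""
    return parse_version(installed) >= minimum


print(cython_is_new_enough("0.21.1"))  # False: jessie's stock Cython
print(cython_is_new_enough("0.23.4"))  # True: version used for the pregenerated files
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "0.9" would sort after "0.23".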

gfyoung · 2016-12-08T17:38:59Z

Awesome! I think we can close this issue then, so long as everything is working for you.

yarikoptic · 2016-12-08T17:57:00Z

or we could "fix" it ;) #14831

gfyoung · 2016-12-08T18:10:50Z

@yarikoptic : Ah, I see. I didn't realize our minimum Cython version was 0.19.1 😄

jreback added the IO CSV (read_csv, to_csv) and Unicode (Unicode strings) labels Nov 21, 2016

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 8, 2016

BF: boost min cython to 0.23 (Closes pandas-dev#14699)

86f438a

yarikoptic mentioned this issue Dec 8, 2016

BF: boost min cython to 0.23 #14831

Closed

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 9, 2016

BF: boost min cython to 0.23 (Closes pandas-dev#14699)

1889b83

jreback added this to the 0.20.0 milestone Dec 9, 2016

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 10, 2016

Require cython >= 0.23 or otherwise use pre-cythoned sources (should r…

1821854

…esolve pandas-dev#14699 on jessie)

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 11, 2016

BF: boost min cython to 0.23 (Closes pandas-dev#14699)

b92b878

jreback closed this as completed in 14e4815 Dec 12, 2016

ischurov pushed a commit to ischurov/pandas that referenced this issue Dec 19, 2016

BF: boost min cython to 0.23

4dedd42

closes pandas-dev#14699 closes pandas-dev#14831 closes pandas-dev#14508
