Commit fcb0263

Lucas Kushner authored and gfyoung committed
DOC, TST: Clarify whitespace behavior in read_fwf documentation (#16950)
Closes gh-16772
1 parent 7b9a57f commit fcb0263

3 files changed, +41 -7 lines changed

Diff for: doc/source/io.rst

+5 -1
@@ -1258,7 +1258,8 @@ Files with Fixed Width Columns

 While ``read_csv`` reads delimited data, the :func:`read_fwf` function works
 with data files that have known and fixed column widths. The function parameters
-to ``read_fwf`` are largely the same as `read_csv` with two extra parameters:
+to ``read_fwf`` are largely the same as `read_csv` with two extra parameters, and
+a different usage of the ``delimiter`` parameter:

 - ``colspecs``: A list of pairs (tuples) giving the extents of the
   fixed-width fields of each line as half-open intervals (i.e., [from, to[ ).
@@ -1267,6 +1268,9 @@ to ``read_fwf`` are largely the same as `read_csv` with two extra parameters:
   behaviour, if not specified, is to infer.
 - ``widths``: A list of field widths which can be used instead of 'colspecs'
   if the intervals are contiguous.
+- ``delimiter``: Characters to consider as filler characters in the fixed-width file.
+  Can be used to specify the filler character of the fields
+  if it is not spaces (e.g., '~').

 .. ipython:: python
    :suppress:
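
The ``delimiter`` behaviour documented in this hunk can be illustrated with a minimal sketch. The sample data and the variable names are made up for illustration, and the snippet assumes a pandas version in which ``read_fwf`` accepts ``widths``, ``header`` and ``delimiter`` as shown in the diff:

```python
from io import StringIO

import pandas as pd

# Hypothetical fixed-width data: a 4-character field and a 6-character
# field, padded with '~' instead of spaces.
data = "1~~~ann~~~\n22~~bob~~~\n"

# delimiter="~" asks read_fwf to treat '~' as the filler character,
# so it is stripped from both ends of every field.
df = pd.read_fwf(StringIO(data), widths=[4, 6], header=None, delimiter="~")

ids = df[0].tolist()
names = df[1].tolist()
print(ids, names)
```

Without ``delimiter="~"`` the padding would be kept as part of each value, since only whitespace is treated as filler by default.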

Diff for: pandas/io/parsers.py

+7 -6
@@ -63,8 +63,6 @@
     file. For file URLs, a host is expected. For instance, a local file could
     be file ://localhost/path/to/table.csv
 %s
-delimiter : str, default ``None``
-    Alternative argument name for sep.
 delim_whitespace : boolean, default False
     Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be
     used as the sep. Equivalent to setting ``sep='\s+'``. If this option
@@ -316,7 +314,9 @@
     be used automatically. In addition, separators longer than 1 character and
     different from ``'\s+'`` will be interpreted as regular expressions and
     will also force the use of the Python parsing engine. Note that regex
-    delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``"""
+    delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``
+delimiter : str, default ``None``
+    Alternative argument name for sep."""

 _read_csv_doc = """
 Read CSV (comma-separated) file into DataFrame
@@ -341,15 +341,16 @@
 widths : list of ints. optional
     A list of field widths which can be used instead of 'colspecs' if
     the intervals are contiguous.
+delimiter : str, default ``'\t' + ' '``
+    Characters to consider as filler characters in the fixed-width file.
+    Can be used to specify the filler character of the fields
+    if it is not spaces (e.g., '~').
 """

 _read_fwf_doc = """
 Read a table of fixed-width formatted lines into DataFrame

 %s
-
-Also, 'delimiter' is used to specify the filler character of the
-fields if it is not spaces (e.g., '~').
 """ % (_parser_params % (_fwf_widths, ''))



Diff for: pandas/tests/io/parser/test_read_fwf.py

+29 -0
@@ -405,3 +405,32 @@ def test_skiprows_inference_empty(self):

         with pytest.raises(EmptyDataError):
             read_fwf(StringIO(test), skiprows=3)
+
+    def test_whitespace_preservation(self):
+        # Addresses Issue #16772
+        data_expected = """
+ a ,bbb
+ cc,dd """
+        expected = read_csv(StringIO(data_expected), header=None)
+
+        test_data = """
+ a bbb
+ ccdd """
+        result = read_fwf(StringIO(test_data), widths=[3, 3],
+                          header=None, skiprows=[0], delimiter="\n\t")
+
+        tm.assert_frame_equal(result, expected)
+
+    def test_default_delimiter(self):
+        data_expected = """
+a,bbb
+cc,dd"""
+        expected = read_csv(StringIO(data_expected), header=None)
+
+        test_data = """
+a \tbbb
+cc\tdd """
+        result = read_fwf(StringIO(test_data), widths=[3, 3],
+                          header=None, skiprows=[0])
+
+        tm.assert_frame_equal(result, expected)
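
The contrast the two tests exercise can be sketched outside the test suite. This is an illustrative example rather than part of the commit: the data is a simplified version of the test input, and the variable names are hypothetical.

```python
from io import StringIO

import pandas as pd

# Two rows, each made of two 3-character fields with space padding.
data = " a bbb\n ccdd \n"

# Default behaviour: spaces and tabs count as filler and are stripped.
stripped = pd.read_fwf(StringIO(data), widths=[3, 3], header=None)

# Passing a delimiter that does not contain a space keeps the spaces
# as part of each field value.
kept = pd.read_fwf(StringIO(data), widths=[3, 3], header=None,
                   delimiter="\n\t")

stripped_rows = stripped.values.tolist()
kept_rows = kept.values.tolist()
print(stripped_rows)
print(kept_rows)
```

Choosing a ``delimiter`` value without a space is what lets a caller opt out of the default whitespace stripping, which is the behaviour ``test_whitespace_preservation`` pins down.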
