Skip to content

Add quoted information to CSV::FieldInfo #252 #253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 8, 2022

Conversation

heronshoes
Copy link
Contributor

@heronshoes heronshoes commented Jul 4, 2022

This is a proposal to fix the issue #252;

  • Add a new member :quoted? to FieldInfo .
  • Introduce an array of booleans @quoted_fields in Parser to collect original column value was quoted?.
  • Check in each of parse_no_quote, parse_quotable_loose and parse_quotable_robust.
    • Use local variable quoted_fields to collect booleans in each method and pass it through #emit_row.
  • Add test of custom converter for preserving quoted value.

@heronshoes heronshoes force-pushed the feat/issue-252 branch 2 times, most recently from 55d6e87 to 344894d Compare July 11, 2022 05:35
@heronshoes
Copy link
Contributor Author

I rebased feat/issue-252 to the latest HEAD.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
(If you want me to write code instead of commenting, please let me know.)

field.to_sym
end
expected = [["serial", :value], ["109", "12"], ["10A", "13"]]
row = CSV.parse(@str, headers: true, header_converters: quoted_header_converter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add CSV.parse(..., headers: '"serial",value', ...) (specifying headers as String) case?

@heronshoes
Copy link
Contributor Author

Thanks! I will check.
(I think this style is better because I can check one by one.)

@heronshoes
Copy link
Contributor Author

I will add test case in:

  • "quoted",unquoted,"quoted-again"
  • Put expected @str in assertion locally
  • CSV.parse(..., headers: '"serial",value', ...) (specifying headers as String) case

Please hold approving.

@heronshoes
Copy link
Contributor Author

I added and fixed except 3rd one:

  • CSV.parse(..., headers: '"serial",value', ...) (specifying headers as String) case

This one requires fix in prepare_header, parse_headers of parser.rb, but I can't fix yet.
It may also require Symbol#encode fix.

Current CSV has a bug in converting when specified header has a Symbol.

CSV.parse("1,2", headers: ["string", :symbol], header_converters: :symbol)

CSV.parse(..., headers: '"serial",value', ...) case requires prior parsing, so same issue will happen.

@kou
Copy link
Member

kou commented Jul 27, 2022

How about this?

diff --git a/lib/csv/parser.rb b/lib/csv/parser.rb
index c3c1eff..cd5a45f 100644
--- a/lib/csv/parser.rb
+++ b/lib/csv/parser.rb
@@ -758,9 +758,10 @@ class CSV
       case headers
       when Array
         @raw_headers = headers
+        quoted_fields = [false] * @raw_headers.size
         @use_headers = true
       when String
-        @raw_headers = parse_headers(headers)
+        @raw_headers, quoted_fields = parse_headers(headers)
         @use_headers = true
       when nil, false
         @raw_headers = nil
@@ -770,20 +771,24 @@ class CSV
         @use_headers = true
       end
       if @raw_headers
-        # Set 'quoted?' info to false to avoid re-converting
-        @headers = adjust_headers(@raw_headers, [true] * @raw_headers.size)
+        @headers = adjust_headers(@raw_headers, quoted_fields)
       else
         @headers = nil
       end
     end
 
     def parse_headers(row)
-      converters = @return_headers ? [] : @header_fields_converter.to_a
-      CSV.parse_line(row,
-                     col_sep:    @column_separator,
-                     row_sep:    @row_separator,
-                     quote_char: @quote_character,
-                     converters: converters)
+      quoted_fields = []
+      converter = lambda do |field, info|
+        quoted_fields << info.quoted?
+        field
+      end
+      headers = CSV.parse_line(row,
+                               col_sep:    @column_separator,
+                               row_sep:    @row_separator,
+                               quote_char: @quote_character,
+                               converters: [converter])
+      [headers, quoted_fields]
     end
 
     def adjust_headers(headers, quoted_fields)

@heronshoes
Copy link
Contributor Author

Thank you!! I reflected this code!

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
It looks almost good! Could you confirm my comments?


sub_test_case("converter with #encode for Symbols") do
def test_encode_symbol
assert_nothing_raised { CSV.parse("1,2", headers: ["string", :symbol] , header_converters: :symbol) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this is not related to this pull request.
Could you remove this from this pull request? If this is needed, please open another pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I removed last one and create modified commit.

I will consider more for another pull request about converting when specified header has a Symbol.

@kou kou merged commit 7454669 into ruby:master Aug 8, 2022
@kou
Copy link
Member

kou commented Aug 8, 2022

Thanks!

@heronshoes heronshoes deleted the feat/issue-252 branch August 8, 2022 03:33
peterzhu2118 pushed a commit to Shopify/ruby that referenced this pull request Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants