Collapse space when implying value from textContents #51
Comments
Implementing this change causes several failures in the microformats test suite (https://github.com/microformats/tests/tree/master/tests/microformats-v2). These tests also fail using the PHP parser. I would like to use the microformats test suite as a "source of truth" for how a parser should work. I don't think this is a bug, but a behaviour change. It could be implemented behind an experimental toggle. Changing this parsing behaviour without an experimental toggle would require a change to either the parsing specification or the test suite.
The PHP parser has an open PR to use the test suite. I would be interested in what changes, if any, are being made to the way they parse these test scenarios.
I didn't realize that was the case. That's interesting considering that this behavior seems to be somewhat common.
Definitely makes sense.
I've gone ahead and created a quick overview of how it's implemented across some projects. I think the pattern there is quite interesting.
Given that it's the default behavior in some parsers and it's arguably useful, I think we should implement it (behind a feature flag, for now). Also, I think there's definitely a discussion to be had about how this fits in with the official specification.
Background: we found that user expectations did not really match the parsing spec in all cases (e.g. when it comes to consecutive whitespace). This is being discussed as a spec issue. PHP and Python both implement a version of an algorithm I wrote out. I am saying “a version of” as I am not actually completely sure on the details anymore and would not want to claim they match completely. (For even more complexity, there has been an attempt to find out what is needed to match browser specs more closely. Again in PHP and Python.)
I am cheating. When running the tests from the test suite I default to the text logic that we had before the new whitespace patch landed. See commit microformats/php-mf2@4d46586. This basically reverts a commit made in March 2018, but only for the purpose of running the tests. I hope that clears up some questions!
@njkleiner thank you for the comprehensive comparison of parsers for this issue 🙂 it's very helpful! I have opened a draft pull request (#52) to add an experimental option to enable this. At present, it only collapses whitespace in properties and values (it does not apply to rels, but I haven't thought about whether it should handle these yet), and does not do any of the whitespace algorithm described by @Zegnat, although I think this would be the way to go with this experimental option.
@njkleiner with v1.4.0 there's now support for the I am considering how we can enable some of these experimental options, perhaps by default at some point.
Describe the bug
When implying the value property for a nested microformat (e.g., h-adr inside h-entry) from the HTML textContents, multiple successive whitespace characters should be collapsed to a single space character.
To Reproduce
HTML input:
Expected behavior
Correct JSON output:
Actual JSON output:
Note the difference: Berlin, Berlin, DE vs. Berlin,\n Berlin,\n DE.
Additional context
From what I can tell, this is not actually part of the specification. It seems to be commonly accepted, though, as both the PHP parser and the Python parser do this.
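For illustration, the collapsing requested here can be sketched in a few lines of Python. This is only a rough sketch, not how any of the parsers mentioned in this thread implement it. The names collapse_whitespace and imply_value and the collapse flag are hypothetical. It also assumes that "whitespace" means ASCII whitespace and that the implied value is trimmed at both ends.

```python
import re

# Assumption: "whitespace" means ASCII whitespace (space, tab, LF, FF, CR).
_WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip(" ")


def imply_value(text_content: str, collapse: bool = False) -> str:
    """Hypothetical helper: derive the implied value from an element's text.

    `collapse` stands in for an opt-in experimental toggle; when it is off,
    the text is only trimmed, so inner newlines and indentation survive.
    """
    if collapse:
        return collapse_whitespace(text_content)
    return text_content.strip()


raw = "\n    Berlin,\n    Berlin,\n    DE\n  "
print(repr(imply_value(raw)))                 # inner newlines are kept
print(repr(imply_value(raw, collapse=True)))  # 'Berlin, Berlin, DE'
```

Keeping the collapsing behind a flag that is off by default leaves existing output unchanged for anyone who relies on the current test suite results.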