Use RFC3986 instead of manual string parsing #434
Closed
This PR moves from using `urllib2`/`urlparse` to `rfc3986` to perform URI handling. Parsed URI objects are now handled internally (and exposed from the public `RefResolver` attributes), but strings may be passed in to the `RefResolver` public API as before. Some tests needed to be changed to reflect the stricter handling of URIs (such as the requirement that the base URI be absolute, or a URI reference with an empty fragment). Furthermore, an empty-string URI is not valid by the above rule, so a null URN is used. On reflection, this should probably be a locatable URI (e.g. file) rather than a URN.

Originally it was discussed (#346) that
`hyperlink` would be the candidate library for better handling of URIs. After implementing support for `hyperlink`, I found that it slows the test suite by approximately 8-9x (from 1-2s to 17-18s). I then chose to use the `rfc3986` library, and implemented a new branch which replaces the `rfc3986` API with the largely similar one of `hyperlink`, just so you can actually run the test suite with both `hyperlink` and `rfc3986`. It may well be that I've missed something related to caching that I'm not aware of, which would explain why the `hyperlink` implementation is so slow.

The minor differences between the
`rfc3986_patch` and the `rfc_to_hyper` branches, besides the different APIs, are partly that `hyperlink` doesn't support resolving against rootless URIs, so a rooted default URI has to be used. Also, the `rfc3986` parsed URI object defines a method for string comparison, whereas `hyperlink` does not.

Of the libraries investigated:

- `hyperlink`: slow, immutable
- `furl`: slow, mutable
- `rfc3987`: immutable, not tested (doesn't implement normalisation yet)
- `rfc3986`: fastest and immutable objects
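
To make the stricter base-URI rule from the description concrete, here is a minimal sketch using only the Python 3 standard library (`urllib.parse`), not the `rfc3986` API this PR actually uses. The helper names `is_absolute_base` and `resolve` are hypothetical, and the reading that an empty fragment on the base is acceptable is an assumption based on the wording above.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_base(uri):
    """Hypothetical check: the base needs a scheme and no non-empty fragment.

    Assumption: a trailing empty fragment ("...schema#") is tolerated,
    which is one reading of the PR description.
    """
    parts = urlsplit(uri)
    return bool(parts.scheme) and not parts.fragment


def resolve(base, ref):
    """Resolve ref against base, rejecting bases that are not absolute."""
    if not is_absolute_base(base):
        raise ValueError("base URI must be absolute: %r" % (base,))
    return urljoin(base, ref)


# The plain stdlib join accepts an empty base and hands back the
# relative reference unchanged; the strict wrapper refuses instead.
lenient = urljoin("", "item.json")
strict = resolve("http://example.com/schemas/root.json",
                 "item.json#/definitions/a")
```

With an empty-string base, `urljoin` silently returns the relative reference, whereas `resolve` raises `ValueError`. That difference is the kind of behaviour change that would require a non-empty default base URI and adjustments to tests.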