Skip to content

Use RFC3986 instead of manual string parsing #434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

agoose77
Copy link
Contributor

@agoose77 agoose77 commented Jul 8, 2018

This PR moves from using urllib2/urlparse to rfc3986 to perform URI handling. Parsed URI objects are now handled internally (and exposed from the public RefResolver attributes, but strings may be passed in to the RefResolver public API as before. Some tests needed to be changed to reflect the more strict of URIs (such as the base URI must be absolute, or a URI reference with an empty fragment). Furthermore, an empty-string URI is not valid by the above rule, so a null URN is used. This probably should be a locatable URI (e.g file) rather than a URN upon reflection.

Originally it was discussed (#346) that hyperlink would be the candidate library for better handling of URIs. After implementing support for hyperlink, it slows the test-suite by approximately 8-9 X (from 1-2s to 17-18s). I then chose to use the rfc3986 library, and then implemented a new branch which replaces the rfc3986 api with the largely similar one of hyperlink, just so you can actually run the test suite with both hyperlink and rfc3986. It may well be that I've missed something related to caching that I'm not aware of, which explains why the hyperlink implementation is so slow.

The minor differences between the rfc3986_patch and the rfc_to_hyper branches, besides the different APIs, are partly that hyperlink doesn't support resolving against rootless URIs, so a rooted default URI has to be used. Also, the rfc3986 parsed URI object defines a method for string comparison, whereas hyperlink does not.

Of the libraries investigated.

  • hyperlink slow, immutable
  • furl slow, muteable
  • rfc3987 immutable, not tested (doesn't implement normalisation yet)
  • rfc3896 fastest and immutable objects

@Julian
Copy link
Member

Julian commented Oct 21, 2018

I didn't forget about this ,sorry :/ just wanted to get the new draft support in before anything else, but I know I owe you feedback here, assuming you're still interested in pushing this forward.

I will say that definitely want one of the immutable ones, and that if hyperlink is slow, I'd lean towards filing an upstream ticket telling them that, but we may need to reduce this to something measurable/helpful to be able to do that.

@Julian Julian closed this Oct 20, 2020
@Julian Julian deleted the branch python-jsonschema:master October 20, 2020 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants