Performance Problems #158

Closed
wizzat opened this issue Apr 14, 2014 · 5 comments

wizzat commented Apr 14, 2014

Hey,

So I've been using jsonschema for basic JSON validation of events before processing. However, one thing I've noticed is that urlparse and urljoin are consistently among the hot spots in my code base. After tracing it down, I've found that a lot of it comes from RefResolver.resolving and RefResolver.in_scope.

It seems like a couple of optimizations could be really handy:

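@contextlib.contextmanager
def resolving(self, ref):
    # Fast path: a bare "#" points at the root of the base document,
    # so no URL joining or defragmenting is needed.
    if ref == '#':
        yield self.store[self.base_uri]
    else:
        ...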

I'm not entirely sure how to go about handling in_scope, but in my application it seems like resolution scope always becomes the raw scope.

@contextlib.contextmanager
def in_scope(self, scope):
    old_scope = self.resolution_scope
    self.resolution_scope = scope
    try:
        yield
    finally:
        self.resolution_scope = old_scope
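
A complementary idea, sketched here under assumptions rather than taken from the library: since the same scope and ref pairs tend to be joined over and over, the URL work can be memoized. The helper names `cached_urljoin` and `cached_urldefrag` are hypothetical, and the imports are the Python 3 spellings.

```python
from functools import lru_cache
from urllib.parse import urldefrag, urljoin

# Hypothetical helpers, not part of jsonschema's API: memoize the pure
# URL operations so each distinct (base, ref) pair is computed only once.
@lru_cache(maxsize=1024)
def cached_urljoin(base, ref):
    return urljoin(base, ref)

@lru_cache(maxsize=1024)
def cached_urldefrag(url):
    # Returns a plain (url_without_fragment, fragment) tuple.
    return tuple(urldefrag(url))

base = "http://example.com/schemas/root.json"
for _ in range(1000):
    full = cached_urljoin(base, "#/definitions/item")
    url, fragment = cached_urldefrag(full)

print(url, fragment)
print(cached_urljoin.cache_info())
```

Only the first iteration pays for the join; the remaining calls are cache hits. This only helps when the set of distinct scopes and refs is small compared to the number of validations.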

The next spot that seems to take up a bunch of time is in iter_errors. There aren't any validation errors, but this seems like it's just a matter of applying the schema to the object and detecting any errors. However, in several places it seems like it's wrapped in a list, which means you lose the ability to grab the first error and bail out.
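
To illustrate the point about lists, here is a minimal sketch with a hypothetical `iter_errors` generator (a stand-in, not the library's implementation). Pulling one item with `next` stops after the first failing check, while `list()` forces every check to run.

```python
calls = []  # records which checks actually ran

def iter_errors(instance, checks):
    """Hypothetical stand-in for a validator's lazy error generator."""
    for name, check in checks:
        calls.append(name)
        if not check(instance):
            yield "%r failed %s" % (instance, name)

checks = [
    ("is_int", lambda x: isinstance(x, int)),
    ("is_positive", lambda x: isinstance(x, int) and x > 0),
    ("is_even", lambda x: isinstance(x, int) and x % 2 == 0),
]

# Lazy: take the first error and bail out; later checks never run.
first = next(iter_errors("abc", checks), None)
lazy_calls = list(calls)

# Eager: list() drains the generator, so every check runs.
del calls[:]
all_errors = list(iter_errors("abc", checks))
eager_calls = list(calls)

print(first, lazy_calls, eager_calls)
```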

ankostis added a commit to ankostis/jsonschema that referenced this issue Sep 24, 2014
…ization-branches.

* FIX 2 forgotten test-case on resolver-URIs from split_scopes.

x 1.8 faster in big referenced model.
ankostis added a commit to ankostis/jsonschema that referenced this issue Sep 24, 2014
…s_stack empty when

iteration breaks (no detectable performance penalty).

* Replace non-python-2.6 DefragResult with named-tuple.
* Add test-case checking scopes_stack empty.
ankostis added a commit to ankostis/jsonschema that referenced this issue Sep 24, 2014
@ankostis
Contributor

Hi,

I have also observed the same behavior with my schema: too many calls into urllib.
But I found an additional cause of the delay: the repeated stacking of function frames for every visited schema node, regardless of whether it defines an id property or not.

To have a common reference, I created the benchmarks that Julian requested, using the META_SCHEMAs along with a moderately large sample schema from my project.
I then added two optimizations, their combination, and a new test case measuring the validator's performance, and recorded the timings of each case in the docstrings of the test-case methods (see also pull request #182).
Both optimizations (hopefully) preserve the semantics of the scopes (to avoid problems like #159).

The new test case in my pull request, which checks that the scopes stack is left in a clean state, passes on every Python version EXCEPT the 'pypy' environment. It seems that the finally after the yield does not run there.

If that is the case, then this new test case (slightly modified) should also fail with the old unoptimized code in master.
Does anybody have any idea why this happens?
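
One possible explanation, offered as an assumption and not as a diagnosis of the pull request: when a loop breaks out of a generator, the generator stays suspended at its yield, and its finally block runs only when the generator is closed or finalized. CPython's reference counting usually finalizes it right away, while interpreters without reference counting (PyPy is one) may defer that to a later garbage collection. The sketch below uses a hypothetical `descend` generator and `scopes_stack` list to show how an explicit close makes the cleanup deterministic.

```python
import contextlib

scopes_stack = []  # hypothetical stand-in for the resolver's scope stack

def descend(scope, children):
    """Push a scope, yield each child, and pop the scope when done."""
    scopes_stack.append(scope)
    try:
        for child in children:
            yield child
    finally:
        scopes_stack.pop()

# contextlib.closing calls nodes.close() on exit, which raises
# GeneratorExit at the suspended yield and runs the finally block,
# regardless of when the garbage collector would have finalized it.
with contextlib.closing(descend("root", ["a", "b", "c"])) as nodes:
    for node in nodes:
        if node == "b":
            break

print(scopes_stack)
```

With the explicit close, the stack is empty as soon as the with block exits, on any implementation.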

@ankostis
Contributor

Forgot the most important part: for complicated schemas, the combined optimizations improve performance by almost a factor of 2.

ankostis added a commit to ankostis/jsonschema that referenced this issue Sep 24, 2014
…_unroll_scopes'

Although break_loop was not faster by itself, combined with the previous
optimizations dropped even further the time:
MAX FASTER x2!
ankostis added a commit to ankostis/jsonschema that referenced this issue Sep 24, 2014
…_unroll_scopes'

* Mark BreakLoopException as private.
Although break_loop was not faster by itself, combined with the previous
optimizations dropped even further the time:
MAX FASTER x2!
@Bachmann1234

Is there any update on this? Currently jsonschema is a major performance bottleneck for a project I am working on.

What's the status of those PRs? Anything I can do to help? I hope to look into the library in detail myself next week.

@Julian
Member

Julian commented Dec 19, 2014

Unfortunately I just haven't had the chance to really sit down and look at them :( -- at least one of them has changes that break public APIs, so those at least have to change. IIRC they also lack automated benchmarks. Beyond that I haven't gotten a chance to vet the changes there.

Certainly including profile output if you're having performance issues would also help.
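
For anyone wanting to attach such output, a minimal sketch using the standard library's cProfile follows. The `validate_events` function is a hypothetical workload standing in for a real validation loop; replace it with the actual call being measured.

```python
import cProfile
import io
import pstats
from urllib.parse import urljoin

def validate_events(events):
    """Hypothetical workload standing in for the real validation loop."""
    base = "http://example.com/schema.json"
    return [urljoin(base, "#/definitions/%d" % (e % 10)) for e in events]

profiler = cProfile.Profile()
profiler.enable()
validate_events(range(5000))
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The resulting text report lists call counts and cumulative times per function, which is usually enough to show whether the URL handling dominates.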

@Bachmann1234

@Julian alrighty. Sadly, the project is closed, as it's a work thing. I suspect the other people looking into this have the right idea. If I figure anything out, I'll make a PR or comment here. Perhaps I can generate a call graph using a similar but different schema.

Bachmann1234 pushed a commit to Bachmann1234/jsonschema that referenced this issue Dec 20, 2014
Bachmann1234 pushed a commit to Bachmann1234/jsonschema that referenced this issue Dec 20, 2014
* Improve statistics print-outs.
* Engrave timming results in benchmark docstrings.
Bachmann1234 pushed a commit to Bachmann1234/jsonschema that referenced this issue Dec 20, 2014
…ing into a list whe

not null
(instead of using a context-manager each time)

Roughly x 1.5 faster
Bachmann1234 pushed a commit to Bachmann1234/jsonschema that referenced this issue Dec 20, 2014
…g by keeping

fragments separated from URL (and avoid redunant frag/defrag).
dnephin pushed a commit to dnephin/jsonschema that referenced this issue Feb 28, 2015
…g by keeping

fragments separated from URL (and avoid redunant frag/defrag).
Conflicts:
	jsonschema/tests/test_benchmarks.py

issue python-jsonschema#158: Use try-finally to ensure resolver scopes_stack empty when
iteration breaks (no detectable performance penalty).

* Replace non-python-2.6 DefragResult with named-tuple.
* Add test-case checking scopes_stack empty.
Conflicts:
	jsonschema/tests/test_validators.py
	jsonschema/validators.py
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
* Improve statistics print-outs.
* Engrave timming results in benchmark docstrings.
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
…ing into a list whe

not null
(instead of using a context-manager each time)

Roughly x 1.5 faster
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
…g by keeping

fragments separated from URL (and avoid redunant frag/defrag).
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
…s_stack empty when

iteration breaks (no detectable performance penalty).

* Replace non-python-2.6 DefragResult with named-tuple.
* Add test-case checking scopes_stack empty.
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
ankostis added a commit to ankostis/jsonschema that referenced this issue Mar 1, 2015
Julian added a commit that referenced this issue Mar 15, 2015
* dnephin/perf_cache_resolving:
  Use lru_cache
  Remove DefragResult.
  Remove context manager from ref() validation.
  Perf improvements by using a cache.
  Add benchmark script.
  Fix test failures
  issue #158: TRY to speed-up scope & $ref url-handling by keeping fragments separated from URL (and avoid redunant frag/defrag). Conflicts: 	jsonschema/tests/test_benchmarks.py
Julian added a commit that referenced this issue Apr 6, 2015
* perf_cache_resolving:
  Squashed 'json/' changes from 9208016..0b657e8
  Need to preserve backwards compat for RefResolvers without the new methods.
  Pass in caches instead of arguments.
  I give up.
  Not deprecating these for now, just not used internally.
  Fix base_uri backwards compatibility.
  Er, green doesn't work on 2.6, and make running right out of a checkout easier.
  Wrong docstring.
  Add back assertions for backwards compat.
  Wait wat. Remove insanity.
  Probably should combine these at some point, but for now move them.
  Really run on the installed package.
  Begone py.test.
  Remove 3.3, use pip for installs, use green here too.
  lxml-cffi is giving obscure errors again.
  Fix a non-type in the docs.
  Switch to vcversioner, use repoze.lru only on 2.6, and add extras_require for format.
  Run tests on the installed package.
  Newer tox is slightly saner.
  It's hard to be enthusiastic about tox anymore.
  Use lru_cache
  Remove DefragResult.
  Remove context manager from ref() validation.
  Perf improvements by using a cache.
  Add benchmark script.
  Fix test failures
  issue #158: TRY to speed-up scope & $ref url-handling by keeping fragments separated from URL (and avoid redunant frag/defrag). Conflicts: 	jsonschema/tests/test_benchmarks.py
@Julian Julian closed this as completed Jun 12, 2015
dlax added a commit to dlax/jsonschema that referenced this issue Jun 8, 2017
73a0593 Merge pull request python-jsonschema#169 from epoberezkin/draft6-tests-ajv
e2e06d7 update ajv version, test
bd14545 Merge branch 'master' into draft6-tests-ajv
8758156 Merge pull request python-jsonschema#172 from epoberezkin/bignum
19a0b46 Merge pull request python-jsonschema#171 from epoberezkin/id
b8165b7 draft-06: option/bignum tests exclusiveMaximum/Minimum updated
f04ed0e Merge branch 'master' into draft6-tests-ajv
0931c60 test: run draft-04/06 tests with ajv
da8b14e Merge pull request python-jsonschema#170 from epoberezkin/boolean
20d706c Merge pull request python-jsonschema#163 from epoberezkin/boundary-point
ae72865 Merge pull request python-jsonschema#168 from korzio/patch-1
f809e51 draft-06: $id keyword
04d7e06 Add djv validator to readme
e86adb2 draft-06: $ref to boolean schemas
d62b754 draft-06: allOf, anyOf, oneOf keywords with boolean schemas
b714a18 draft-06: not keyword with boolean schemas
b6b00f9 draft-06: contains keyword with boolean schemas
9c05d18 draft-06: propertyNames keyword with boolean schemas
fa99135 draft-06: patternProperties keyword with boolean schemas
5a888a6 draft-06: dependencies keyword with boolean subschemas
53858ff draft-06: items keyword with boolean schemas
afd6fab Merge pull request python-jsonschema#167 from json-schema-org/revert-123-relative-ref-id
3b9b688 Revert "Test relative reference resolution when ID is not present"
77e0411 draft-06: properties keyword with boolean schemas
cff24c1 Merge pull request python-jsonschema#123 from yuloh/relative-ref-id
e7c1f4e Merge pull request python-jsonschema#161 from epoberezkin/items
43bfc6b Merge pull request python-jsonschema#165 from epoberezkin/test-schema
6956f20 draft-06: boolean root schema
e1139a3 update test-schema to draft-04
042fae9 Merge pull request python-jsonschema#164 from epoberezkin/zero-terminated
78de3a6 draft-06: zero-terminated float is a valid integer
9d998bb draft-04: added maximum/minimum tests for boundary point
b8f51ab Merge pull request python-jsonschema#151 from epoberezkin/exclusive-limits
951bd41 Merge pull request python-jsonschema#154 from epoberezkin/property-names
b974907 Merge pull request python-jsonschema#159 from epoberezkin/empty-property-list
9b1364e Merge pull request python-jsonschema#158 from epoberezkin/contains
95b4648 Merge pull request python-jsonschema#157 from epoberezkin/const
85efafb draft-06: required and dependencies with empty property arrays
7e40f2e draft-06: exclusiveMaximum and exclusiveMinimum validation
0058644 draft-06: propertyNames validation
ba6c582 draft-06: const validation
9867b96 draft-06: contains keyword validation
96fed74 draft-04/06: updated descriptions for items/additionalItems test cases
0799212 Make the URI tests consistent with themselves.
65c18d5 draft-04/06: items/additionalItems tests
12ab007 Update to the new organization.
a599e1e Flip protocol-relative URI Reference (not a URI)
ed3391c Merge pull request python-jsonschema#150 from handrews/draft6
555dfd3 Initialize draft6 tests from draft4.
5363098 Merge remote-tracking branch 'pboettch/master'
2ba2657 Merge remote-tracking branch 'agebhar1/feature/java+everit-org'
f7cec5a Merge remote-tracking branch 'agebhar1/feature/java+networknt'
16a8c1b Merge pull request python-jsonschema#145 from agebhar1/feature/java+json-schema-validator
6d29eb2 add `networknt/json-schema-validator` to test suite users
b38690a add `everit-org/json-schema` to test suite users
5637345 update Java's `json-schema-validator` repository
5a2c708 Added 'Modern C++ JSON schema validator'
371977c Merge branch 'develop'
e2618a0 Merge pull request python-jsonschema#142 from pipobscure/refOnly
b24fae2 When $ref is present other keywords should be ignored
cb25a4e Merge pull request python-jsonschema#134 from zloster/develop
1e9db5e Add optional check for non-ECMA 262 regular expressions
efb3c89 Merge pull request python-jsonschema#127 from iainbeeston/extra-items-tests
27692ac Merge pull request python-jsonschema#133 from gavinwahl/develop
fc7584e add postgres-json-schema
654cc26 Added tests for items when data contains more or less types than the schema
5fb3d9f Merge pull request python-jsonschema#125 from yuloh/ref-property
833a70c Merge pull request python-jsonschema#126 from gregsdennis/develop
302da1e Added Manatee.Json as .Net implementor.
dc001c2 Add test for properties named $ref
6d366dc Test relative reference resolution
9355965 Merge pull request python-jsonschema#120 from seagreen/extract-test-schema
42ba0f9 Extract the schema for tests into a JSON file.
2f0d6e3 Merge pull request python-jsonschema#118 from yuloh/ignores-non-objects-required
cc52997 Test required ignores non objects
27ed651 Merge pull request python-jsonschema#117 from tatut/develop
c9f5dcd Add Clojure json-schema to users
8549899 Merge pull request python-jsonschema#116 from yuloh/patch-1
de70a75 Add league/json-guard implementation for PHP
9723c8f Merge pull request python-jsonschema#111 from Relequestual/patch-1
9d5909a Also check string "1" is not a number in draft 4
e4ab5bd Check that a string of "1" is not a number
5dc71b8 Merge pull request python-jsonschema#108 from duksis/patch-1
6a054cd Adds ex_json_schema validator for Elixir
f3d5aeb Merge branch 'develop'
5bcf11a Port to draft4.
3f7ed81 Merge pull request python-jsonschema#103 from Relequestual/patch-1
928d3b0 Fixed incorrect negative description of a sub test
62414e4 Minor shuffling.
a7305a6 Merge remote-tracking branch 'legoktm/tox' into develop
5cc622c Merge pull request python-jsonschema#100 from atomiqio/develop
2f51b2e updated node.js info in README.md
8f29757 Add the README note on develop.
338eef3 Merge pull request python-jsonschema#97 from atomiqio/develop
146e291 JavaScript should written in upper camel case
7511038 Merge pull request python-jsonschema#95 from epoberezkin/develop
ca28bbb added ajv validator to readme
504f776 Merge pull request python-jsonschema#93 from gelraen/readme
59207fd Add another Go implementation
d14cf96 Use tox to run tests

git-subtree-dir: json
git-subtree-split: 73a05935f5418f4b59fb8084b4ffa6edf8a4eea0
Julian added a commit that referenced this issue Jun 11, 2017
27f8c84 Merge pull request #184 from remexre/master
8fc497e Changes draft06 tests to use draft06 metaschema.
583ecf9 Add name.json to the CLI tool's remotes.
67c7b4d Merge pull request #180 from bismark/patch-1
784198b Update erlang URL
05fdba4 Merge pull request #173 from epoberezkin/format
e8ef6fd draft-06: additional format tests
d545553 Merge branch 'master' into format
bc4de6c Merge pull request #174 from epoberezkin/zero-term
f455ecc Merge branch 'master' into zero-term
5f6abf7 draft-06: zero-terminated float test description
c1b12bf Merge pull request #160 from epoberezkin/ref-tests
25836f7 update draft-06 tests: "id" -> "$id"
671bad6 fix: JavaScript test
44f6447 Merge branch 'master' into ref-tests
73a0593 Merge pull request #169 from epoberezkin/draft6-tests-ajv
e2e06d7 update ajv version, test
bd14545 Merge branch 'master' into draft6-tests-ajv
cbe0e5b Merge branch 'master' into ref-tests
8758156 Merge pull request #172 from epoberezkin/bignum
19a0b46 Merge pull request #171 from epoberezkin/id
e1e1eec draft-06: format "json-pointer" tests
1e2834c draft-06: format "uri-template" tests
21b776e draft-06: format "uri-reference" tests
b8165b7 draft-06: option/bignum tests exclusiveMaximum/Minimum updated
f04ed0e Merge branch 'master' into draft6-tests-ajv
0931c60 test: run draft-04/06 tests with ajv
da8b14e Merge pull request #170 from epoberezkin/boolean
20d706c Merge pull request #163 from epoberezkin/boundary-point
ae72865 Merge pull request #168 from korzio/patch-1
f809e51 draft-06: $id keyword
04d7e06 Add djv validator to readme
e86adb2 draft-06: $ref to boolean schemas
d62b754 draft-06: allOf, anyOf, oneOf keywords with boolean schemas
b714a18 draft-06: not keyword with boolean schemas
b6b00f9 draft-06: contains keyword with boolean schemas
9c05d18 draft-06: propertyNames keyword with boolean schemas
fa99135 draft-06: patternProperties keyword with boolean schemas
5a888a6 draft-06: dependencies keyword with boolean subschemas
53858ff draft-06: items keyword with boolean schemas
afd6fab Merge pull request #167 from json-schema-org/revert-123-relative-ref-id
3b9b688 Revert "Test relative reference resolution when ID is not present"
77e0411 draft-06: properties keyword with boolean schemas
cff24c1 Merge pull request #123 from yuloh/relative-ref-id
e7c1f4e Merge pull request #161 from epoberezkin/items
43bfc6b Merge pull request #165 from epoberezkin/test-schema
6956f20 draft-06: boolean root schema
e1139a3 update test-schema to draft-04
042fae9 Merge pull request #164 from epoberezkin/zero-terminated
78de3a6 draft-06: zero-terminated float is a valid integer
9d998bb draft-04: added maximum/minimum tests for boundary point
b8f51ab Merge pull request #151 from epoberezkin/exclusive-limits
951bd41 Merge pull request #154 from epoberezkin/property-names
b974907 Merge pull request #159 from epoberezkin/empty-property-list
9b1364e Merge pull request #158 from epoberezkin/contains
95b4648 Merge pull request #157 from epoberezkin/const
85efafb draft-06: required and dependencies with empty property arrays
7e40f2e draft-06: exclusiveMaximum and exclusiveMinimum validation
0058644 draft-06: propertyNames validation
ba6c582 draft-06: const validation
9867b96 draft-06: contains keyword validation
96fed74 draft-04/06: updated descriptions for items/additionalItems test cases
0799212 Make the URI tests consistent with themselves.
65c18d5 draft-04/06: items/additionalItems tests
3186761 draft-04/06: root ref "#" in remote ref
0178c74 draft-04/06: recursive refs test
a884acc draft-04/06: base URI change tests
12ab007 Update to the new organization.
a599e1e Flip protocol-relative URI Reference (not a URI)
ed3391c Merge pull request #150 from handrews/draft6
555dfd3 Initialize draft6 tests from draft4.
5363098 Merge remote-tracking branch 'pboettch/master'
2ba2657 Merge remote-tracking branch 'agebhar1/feature/java+everit-org'
f7cec5a Merge remote-tracking branch 'agebhar1/feature/java+networknt'
16a8c1b Merge pull request #145 from agebhar1/feature/java+json-schema-validator
6d29eb2 add `networknt/json-schema-validator` to test suite users
b38690a add `everit-org/json-schema` to test suite users
5637345 update Java's `json-schema-validator` repository
5a2c708 Added 'Modern C++ JSON schema validator'
371977c Merge branch 'develop'
e2618a0 Merge pull request #142 from pipobscure/refOnly
b24fae2 When $ref is present other keywords should be ignored
cb25a4e Merge pull request #134 from zloster/develop
1e9db5e Add optional check for non-ECMA 262 regular expressions
efb3c89 Merge pull request #127 from iainbeeston/extra-items-tests
27692ac Merge pull request #133 from gavinwahl/develop
fc7584e add postgres-json-schema
654cc26 Added tests for items when data contains more or less types than the schema
5fb3d9f Merge pull request #125 from yuloh/ref-property
833a70c Merge pull request #126 from gregsdennis/develop
302da1e Added Manatee.Json as .Net implementor.
dc001c2 Add test for properties named $ref
6d366dc Test relative reference resolution
9355965 Merge pull request #120 from seagreen/extract-test-schema
42ba0f9 Extract the schema for tests into a JSON file.
2f0d6e3 Merge pull request #118 from yuloh/ignores-non-objects-required
cc52997 Test required ignores non objects
27ed651 Merge pull request #117 from tatut/develop
c9f5dcd Add Clojure json-schema to users
8549899 Merge pull request #116 from yuloh/patch-1
de70a75 Add league/json-guard implementation for PHP
9723c8f Merge pull request #111 from Relequestual/patch-1
9d5909a Also check string "1" is not a number in draft 4
e4ab5bd Check that a string of "1" is not a number
5dc71b8 Merge pull request #108 from duksis/patch-1
6a054cd Adds ex_json_schema validator for Elixir
f3d5aeb Merge branch 'develop'
5bcf11a Port to draft4.
3f7ed81 Merge pull request #103 from Relequestual/patch-1
928d3b0 Fixed incorrect negative description of a sub test
62414e4 Minor shuffling.
a7305a6 Merge remote-tracking branch 'legoktm/tox' into develop
5cc622c Merge pull request #100 from atomiqio/develop
2f51b2e updated node.js info in README.md
8f29757 Add the README note on develop.
338eef3 Merge pull request #97 from atomiqio/develop
146e291 JavaScript should written in upper camel case
7511038 Merge pull request #95 from epoberezkin/develop
ca28bbb added ajv validator to readme
504f776 Merge pull request #93 from gelraen/readme
59207fd Add another Go implementation
d14cf96 Use tox to run tests

git-subtree-dir: json
git-subtree-split: 27f8c840acdb3700de7173981e95ed90ef28de2e