-
-
Notifications
You must be signed in to change notification settings - Fork 233
fix(BREAKING CHANGE): dereference caching to prevent infinite loops on circular schemas #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(BREAKING CHANGE): dereference caching to prevent infinite loops on circular schemas #380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snapshot changes in this look odd but because we are now always using the cache for circular schemas, and because a -> b -> a
means that a -> a
, these dereferenced $ref
pointers are now a
because they were stored and retrieved from the cache as such.
Pull Request Test Coverage Report for Build 14412904745Details
💛 - Coveralls |
This is great! This is actually a bug I've been thinking about for a while. In general I think the cacheing mechanism needs to be fully rewritten, it's confusing and written improperly. Maybe I'll get Claude to take a stab at it in a while. This PR looks good. Unfortunately the change in how we process / return circular references makes me think that this is a breaking change. I think we need to label this as a BREAKING CHANGE pull request and do a major version bump, just in case. |
re: using unique keys for the Set cache, we could define something like import { hashKey } from "@tanstack/react-query"; it's a somewhat heavy operation (at least way heavier than a native object hash in javascript), but it would lead to stable keys. I'm not sure what it does wrt circular references though. |
Agreed that this definitely feels like a breaking change. I'll update the PR title. |
🎉 This PR is included in version 12.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Thanks so much @jonluca, you're amazing. |
…fs (#973) ## 🧰 Changes We uncovered a problem with the dereferencing functionality within our OpenAPI parser on schemas with an extensive amount of circular references where under these circumstances it could take upwards to an hour to dereference a schema. I was able to trace it down to being a problem within [@apidevtools/json-schema-ref-parser](https://npm.im/@apidevtools/json-schema-ref-parser) where because it was not always using its internal cache it would get into situations where, when parsing a circular reference, it would often attempt to re-parse that same reference multiple times. I submitted a fix for this over in APIDevTools/json-schema-ref-parser#380, and this pulls that in along with builds a unit test around one of these problematic schemas.
## 🧰 Changes This upgrades our OpenAPI tooling in order to pull in a fix for a serious performance issue we uncovered when dereferencing API definitions that make extensive use of circular references. Check out APIDevTools/json-schema-ref-parser#380 for details.
We've come across an interesting case with one of our customers OpenAPI definitions where due to the amount of circular references they are using it ends up sending
$RefParser.dereference()
into an infinite loop that has unfortunately been causing us some pain.The following is a video of running the
dereference.circular=ignore
test in this PR without my eventual fix:uncached.mov
I've killed the process after 20 seconds but we've seen cases where it can run for almost an hour trying to dereference this circular API definition. In investigating this we've identified a couple issues:
options.timeoutMs
check and exception from ever getting hit.In investigating the cache issues we noticed a couple more issues:
Core issues with
Set
cachesThe
Set
objects that are used forparents
andprocessedObjects
don't appear to be fully utilized because these are often setting objects, andSet
does not do object deuping:I'm not convinced that any of the
.has()
checks being executed on these stores are currently working and I made an attempt at pulling in flatted1 to create consistent and unique keys off of these values however a couple unit tests broken in unexpected ways and ended up moving on. I would love to spend some more time investigating this because I think in extreme cases like ours we could really improve memory usage during dereferencing.The dereference cache is only being used for non-circular objects
After crawling to dereferencing an object and either updating the original schema or resetting it to the original
$ref
ifdereference.circular=ignore
is set that resulting object is saved into the dereferenced cacheMap
. This map is referenced at the beginning of thedereference$Ref
function however the cache is only utilized if the found object is not circular.I was unable to uncover a reason why this was the case but without this cache being utilized this dereferencer would continuously, even despite
dereference.circular=ignore
being configured, crawl and re-dereference objects it had already seen. Changing this logic to always return the cache if we found something brought dereferencing our use case down from ∞ to less than a second.cached.mov
Ignore the logs in this video, it was recorded while I was still working on this fix.
BREAKING CHANGE: dereference caching to prevent infinite loops on circular schemas
Footnotes
Because
JSON.stringify()
cannot serialize circular objects. ↩