Skip to content

fix(BREAKING CHANGE): dereference caching to prevent infinite loops on circular schemas #380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 11, 2025

Conversation

erunion
Copy link
Contributor

@erunion erunion commented Apr 11, 2025

We've come across an interesting case with one of our customers OpenAPI definitions where due to the amount of circular references they are using it ends up sending $RefParser.dereference() into an infinite loop that has unfortunately been causing us some pain.

The following is a video of running the dereference.circular=ignore test in this PR without my eventual fix:

uncached.mov

I've killed the process after 20 seconds but we've seen cases where it can run for almost an hour trying to dereference this circular API definition. In investigating this we've identified a couple issues:

  1. When dereferencing this specific schema the event loop gets blocked in the process preventing the options.timeoutMs check and exception from ever getting hit.
    • We were able to resolve this by adding a supplementary timeout check when the dereference crawler processes each property within an object.
  2. The dereference cache isn't being fully taken advantage of.

In investigating the cache issues we noticed a couple more issues:

Core issues with Set caches

The Set objects that are used for parents and processedObjects don't appear to be fully utilized because these are often setting objects, and Set does not do object deuping:

const set = new Set();
set.add({ type: 'string' });
set.add({ type: 'string' });

console.log({ set: Array.from(set), has: set.has({ type: 'string'} )})

> {set: Array(2), has: false}

I'm not convinced that any of the .has() checks being executed on these stores are currently working and I made an attempt at pulling in flatted1 to create consistent and unique keys off of these values however a couple unit tests broken in unexpected ways and ended up moving on. I would love to spend some more time investigating this because I think in extreme cases like ours we could really improve memory usage during dereferencing.

The dereference cache is only being used for non-circular objects

After crawling to dereferencing an object and either updating the original schema or resetting it to the original $ref if dereference.circular=ignore is set that resulting object is saved into the dereferenced cache Map. This map is referenced at the beginning of the dereference$Ref function however the cache is only utilized if the found object is not circular.

I was unable to uncover a reason why this was the case but without this cache being utilized this dereferencer would continuously, even despite dereference.circular=ignore being configured, crawl and re-dereference objects it had already seen. Changing this logic to always return the cache if we found something brought dereferencing our use case down from ∞ to less than a second.

cached.mov

Ignore the logs in this video, it was recorded while I was still working on this fix.

BREAKING CHANGE: dereference caching to prevent infinite loops on circular schemas

Footnotes

  1. Because JSON.stringify() cannot serialize circular objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot changes in this look odd but because we are now always using the cache for circular schemas, and because a -> b -> a means that a -> a, these dereferenced $ref pointers are now a because they were stored and retrieved from the cache as such.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 14412904745

Details

  • 16 of 16 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 92.926%

Totals Coverage Status
Change from base Build 13576959097: 0.3%
Covered Lines: 1644
Relevant Lines: 1744

💛 - Coveralls

@jonluca
Copy link
Collaborator

jonluca commented Apr 11, 2025

This is great! This is actually a bug I've been thinking about for a while. In general I think the cacheing mechanism needs to be fully rewritten, it's confusing and written improperly. Maybe I'll get Claude to take a stab at it in a while.

This PR looks good.

Unfortunately the change in how we process / return circular references makes me think that this is a breaking change. I think we need to label this as a BREAKING CHANGE pull request and do a major version bump, just in case.

@jonluca
Copy link
Collaborator

jonluca commented Apr 11, 2025

re: using unique keys for the Set cache, we could define something like hashKey from @tanstack/react-query

import { hashKey } from "@tanstack/react-query";

it's a somewhat heavy operation (at least way heavier than a native object hash in javascript), but it would lead to stable keys. I'm not sure what it does wrt circular references though.

@erunion
Copy link
Contributor Author

erunion commented Apr 11, 2025

Unfortunately the change in how we process / return circular references makes me think that this is a breaking change. I think we need to label this as a BREAKING CHANGE pull request and do a major version bump, just in case.

Agreed that this definitely feels like a breaking change. I'll update the PR title.

@erunion erunion changed the title fix(dereference): utilizing caching to prevent infinite loops on circular schemas breaking change(fix): dereference caching to prevent infinite loops on circular schemas Apr 11, 2025
@erunion erunion changed the title breaking change(fix): dereference caching to prevent infinite loops on circular schemas BREAKING CHANGE: fix dereference caching to prevent infinite loops on circular schemas Apr 11, 2025
@erunion erunion changed the title BREAKING CHANGE: fix dereference caching to prevent infinite loops on circular schemas fix(BREAKING CHANGE): dereference caching to prevent infinite loops on circular schemas Apr 11, 2025
@jonluca jonluca merged commit 50414c3 into APIDevTools:main Apr 11, 2025
14 checks passed
Copy link

🎉 This PR is included in version 12.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@erunion
Copy link
Contributor Author

erunion commented Apr 11, 2025

Thanks so much @jonluca, you're amazing.

erunion added a commit to readmeio/oas that referenced this pull request Apr 14, 2025
…fs (#973)

## 🧰 Changes

We uncovered a problem with the dereferencing functionality within our
OpenAPI parser on schemas with an extensive amount of circular
references where under these circumstances it could take upwards to an
hour to dereference a schema. I was able to trace it down to being a
problem within
[@apidevtools/json-schema-ref-parser](https://npm.im/@apidevtools/json-schema-ref-parser)
where because it was not always using its internal cache it would get
into situations where, when parsing a circular reference, it would often
attempt to re-parse that same reference multiple times.

I submitted a fix for this over in
APIDevTools/json-schema-ref-parser#380, and this
pulls that in along with builds a unit test around one of these
problematic schemas.
erunion added a commit to readmeio/rdme that referenced this pull request Apr 14, 2025
## 🧰 Changes

This upgrades our OpenAPI tooling in order to pull in a fix for a
serious performance issue we uncovered when dereferencing API
definitions that make extensive use of circular references. Check out
APIDevTools/json-schema-ref-parser#380 for
details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants