Fail mirroring more gracefully #34002

Merged
merged 5 commits into from
Mar 26, 2025

Conversation

rremer
Contributor

@rremer rremer commented Mar 24, 2025

  • reuse recoverable error checks across mirror_pull
  • add new cases for 'cannot lock ref/not our ref' (race condition in fetch) and 'Unable to create/lock'
  • move LFS sync right after commit graph write, and before other maintenance which may fail
  • try a prune for 'broken reference' as well as 'not our ref'
  • always sync LFS right after commit graph write, and before other maintenance which may fail

This handles a few cases where our very large and very active repositories could serve mirrored git refs but be missing LFS files:

Case 1 (multiple variants): Race condition in git fetch

There was already a check for 'unable to resolve reference' on a failed git fetch, after which a git prune and then a second fetch are performed. This works around a race condition in which the git remote tells Gitea about a ref for the HEAD of some branch, and the fetch then fails a few seconds later because the remote branch was deleted or the ref was updated (force push).

There are two more variants of the error message you can get for the same kind of race condition. These may be related to the git binary version Gitea has access to (in my case, it was 2.48.1).

Case 2: githttp.go can serve updated git refs before it has synced LFS OIDs

There is probably a more aggressive refactor we could do here: have the cat-file loop use FETCH_HEAD instead of relying on the commit graphs being committed locally (and thus servable to clients of Gitea). A simpler change that reduced the occurrences of this for me was to move the LFS sync block immediately after the commit-graph write and before any other time-consuming (or potentially erroring/exiting) blocks.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 24, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Mar 24, 2025
@rremer rremer force-pushed the missing-lfs-fixes branch from 39f376f to c6337ab Compare March 24, 2025 21:43
@lunny lunny added the type/enhancement An improvement of existing functionality label Mar 24, 2025
@lunny lunny added this to the 1.24.0 milestone Mar 24, 2025
@rremer rremer force-pushed the missing-lfs-fixes branch from c6337ab to 62a85f3 Compare March 25, 2025 17:23
@rremer rremer force-pushed the missing-lfs-fixes branch from 62a85f3 to 1871276 Compare March 25, 2025 20:19
Contributor

@wxiaoguang wxiaoguang left a comment

Made some changes to the test code so that it runs in a loop.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 26, 2025
@wxiaoguang wxiaoguang changed the title Fail mirroring more gracefully: Fail mirroring more gracefully Mar 26, 2025
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Mar 26, 2025
@wxiaoguang wxiaoguang enabled auto-merge (squash) March 26, 2025 15:56
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Mar 26, 2025
@wxiaoguang wxiaoguang merged commit e0ad72e into go-gitea:main Mar 26, 2025
26 checks passed
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Mar 26, 2025
zjjhot added a commit to zjjhot/gitea that referenced this pull request Mar 27, 2025
* giteaofficial/main:
  [skip ci] Updated translations via Crowdin
  Download actions job logs from API (go-gitea#33858)
  Fail mirroring more gracefully (go-gitea#34002)
  Fix dropdown module accessing (go-gitea#34026)
  Polyfill WeakRef (go-gitea#34025)
  Fix dropdown delegating and some UI problems (go-gitea#34014)
project-mirrors-bot-tu bot pushed a commit to project-mirrors/forgejo-as-gitea-fork that referenced this pull request Apr 1, 2025
Co-authored-by: wxiaoguang <[email protected]>
(cherry picked from commit e0ad72e)
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/go Pull requests that update Go code type/enhancement An improvement of existing functionality
4 participants