fix(snapshot/x86_64): make sure TSC_DEADLINE MSR is non-zero #4618
Merged
ShadowCurse
merged 3 commits into
firecracker-microvm:main
from
kalyazin:fix_tsc_deadline
May 31, 2024
180c57f
to
021f189
Compare
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4618      +/-   ##
==========================================
+ Coverage   82.08%   82.10%   +0.01%
==========================================
  Files         255      255
  Lines       31258    31280      +22
==========================================
+ Hits        25659    25681      +22
  Misses       5599     5599
40938cb
to
d31bcc2
Compare
d31bcc2
to
2adf6c4
Compare
zulinx86
reviewed
May 18, 2024
2adf6c4
to
79452f4
Compare
d7efcff
to
58e73cf
Compare
pb8o
reviewed
May 28, 2024
58e73cf
to
15b0650
Compare
pb8o
previously approved these changes
May 28, 2024
15b0650
to
ee8a644
Compare
ShadowCurse
reviewed
May 30, 2024
1cbc9d4
to
be15d75
Compare
ShadowCurse
reviewed
May 30, 2024
be15d75
to
d99a2f2
Compare
On x86_64, we observed that when restoring from a snapshot, one of the vCPUs had MSR_IA32_TSC_DEADLINE cleared and never received TSC interrupts until the MSR was updated externally (e.g. by setting the system time).
We believe this happens because the TSC interrupt is lost while the snapshot is being taken: the MSR is cleared, but the interrupt is not delivered to the guest, so the guest does not rearm the timer. A visible effect is a failure to connect to a restored VM via SSH.
This commit introduces a workaround. If, when taking a snapshot, we see a zero MSR_IA32_TSC_DEADLINE, we replace its value with the MSR_IA32_TSC value from the same vCPU to make sure that the vCPU will continue to receive TSC interrupts.
Signed-off-by: Nikita Kalyazin <[email protected]>
The TSC_DEADLINE MSR value is volatile, as it is updated by the guest kernel based on the current TSC value. Signed-off-by: Nikita Kalyazin <[email protected]>
The TSC_DEADLINE MSR value is volatile, as it is updated by the guest kernel based on the current TSC value. Signed-off-by: Nikita Kalyazin <[email protected]>
d99a2f2
to
b222c18
Compare
ShadowCurse
approved these changes
May 31, 2024
pb8o
approved these changes
May 31, 2024
Changes
This change introduces a workaround. If, when taking a snapshot, we see a zero MSR_IA32_TSC_DEADLINE, we replace its value with the MSR_IA32_TSC value from the same vCPU to make sure the vCPU will continue to receive TSC interrupts.
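The workaround can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `MsrEntry` and `fix_zero_tsc_deadline` are hypothetical names standing in for whatever structures hold the saved vCPU MSRs, and the sketch assumes both MSRs are present in the same saved list. The MSR indices are the architectural ones (0x10 for IA32_TSC, 0x6e0 for IA32_TSC_DEADLINE).

```rust
/// Architectural MSR indices (Intel SDM).
const MSR_IA32_TSC: u32 = 0x10;
const MSR_IA32_TSC_DEADLINE: u32 = 0x6e0;

/// Hypothetical stand-in for one saved MSR (index/value pair) of a vCPU.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsrEntry {
    pub index: u32,
    pub data: u64,
}

/// If the saved TSC_DEADLINE MSR is zero, overwrite it with the saved TSC
/// value of the same vCPU. Returns `true` if an entry was modified.
///
/// Assumption: `msrs` holds the MSRs saved for a single vCPU. If no TSC
/// entry is present, nothing is changed.
pub fn fix_zero_tsc_deadline(msrs: &mut [MsrEntry]) -> bool {
    // Look up the TSC value saved for this vCPU.
    let tsc = match msrs.iter().find(|e| e.index == MSR_IA32_TSC) {
        Some(entry) => entry.data,
        None => return false,
    };

    let mut modified = false;
    for entry in msrs.iter_mut() {
        // A zero deadline means the timer is disarmed; replace it so the
        // restored vCPU has an armed deadline again.
        if entry.index == MSR_IA32_TSC_DEADLINE && entry.data == 0 {
            entry.data = tsc;
            modified = true;
        }
    }
    modified
}
```

Calling the helper on a saved MSR list with a zero deadline rewrites that entry to the TSC value, while a list whose deadline is already non-zero is left untouched.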
Reason
On x86_64, we observed that when restoring from a snapshot, one of the vCPUs had MSR_IA32_TSC_DEADLINE cleared and never received TSC interrupts until the MSR was updated externally (e.g. by setting the system time).
We believe this happens because the TSC interrupt is lost while the snapshot is being taken: the MSR is cleared, but the interrupt is not delivered to the guest, so the guest does not rearm the timer.
A visible effect is a failure to connect to a restored VM via SSH, similar to https://buildkite.com/firecracker/firecracker-pr-nightly/builds/1403#018f83db-5395-4656-8d9c-83b6fcfcfd54/50-1994 .
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
[ ] If a specific issue led to this PR, this PR closes the issue.
[ ] API changes follow the Runbook for Firecracker API changes.
[ ] New TODOs link to an issue.