Commit c4aa727

examples: Have UFFD handler kill Firecracker should it die
If the UFFD handler exits abnormally for some reason, have it take down Firecracker as well by SIGKILL-ing it from a panic hook. For this, reintroduce the "get peer creds" logic.

We have to use SIGKILL because Firecracker could be inside the handler for a KVM-originated page fault that is not marked as interruptible, in which case all signals but SIGKILL are ignored. This happens, for example, during KVM_SET_MSRS when it triggers the initialization of a gfn_to_pfn_cache for the kvm-clock page, which uses GUP without FOLL_INTERRUPTIBLE.

While we're at it, add a hint to the generic "process not found" error message to indicate that Firecracker may have died, and that the cause could be the UFFD handler crashing. For example, in firecracker-microvm#4601 the cause of the mystery hang was the UFFD handler crashing, but we were stumped by what was going on for over half a year. Let's avoid that going forward.

We can't enable this by default because it interferes with unit tests and with the "malicious_handler", so expose a function on `Runtime` to enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
1 parent 0b9e00c commit c4aa727

File tree

4 files changed

+42
-0
lines changed

src/firecracker/examples/uffd/fault_all_handler.rs

+1
@@ -24,6 +24,7 @@ fn main() {
     let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");

     let mut runtime = Runtime::new(stream, file);
+    runtime.install_panic_hook();
     runtime.run(|uffd_handler: &mut UffdHandler| {
         // Read an event from the userfaultfd.
         let event = uffd_handler

src/firecracker/examples/uffd/uffd_utils.rs

+37
@@ -208,6 +208,43 @@ impl Runtime {
         }
     }

+    fn peer_process_credentials(&self) -> libc::ucred {
+        let mut creds: libc::ucred = libc::ucred {
+            pid: 0,
+            gid: 0,
+            uid: 0,
+        };
+        let mut creds_size = size_of::<libc::ucred>() as u32;
+        let ret = unsafe {
+            libc::getsockopt(
+                self.stream.as_raw_fd(),
+                libc::SOL_SOCKET,
+                libc::SO_PEERCRED,
+                &mut creds as *mut _ as *mut _,
+                &mut creds_size as *mut libc::socklen_t,
+            )
+        };
+        if ret != 0 {
+            panic!("Failed to get peer process credentials");
+        }
+        creds
+    }
+
+    pub fn install_panic_hook(&self) {
+        let peer_creds = self.peer_process_credentials();
+
+        let default_panic_hook = std::panic::take_hook();
+        std::panic::set_hook(Box::new(move |panic_info| {
+            let r = unsafe { libc::kill(peer_creds.pid, libc::SIGKILL) };
+
+            if r != 0 {
+                eprintln!("Failed to kill Firecracker process from panic hook");
+            }
+
+            default_panic_hook(panic_info);
+        }));
+    }
+
     /// Polls the `UnixStream` and UFFD fds in a loop.
     /// When stream is polled, new uffd is retrieved.
     /// When uffd is polled, page fault is handled by
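
The `SO_PEERCRED` lookup in `peer_process_credentials` can be tried in isolation. The following is a minimal Python sketch of the same socket option, not part of this commit: the helper name `peer_credentials` is hypothetical, and a `socketpair` stands in for the Firecracker-to-handler UDS connection, so the "peer" is simply the current process. It assumes Linux, where `struct ucred` is laid out as `pid`, `uid`, `gid`.

```python
import os
import socket
import struct

# Linux `struct ucred`: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of a Unix socket.

    Hypothetical helper for illustration; mirrors the getsockopt(SO_PEERCRED)
    call made by the Rust code in the hunk above.
    """
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


# A socketpair is created by this process, so the peer credentials are our own.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
peer_pid, peer_uid, peer_gid = peer_credentials(left)
left.close()
right.close()
```

In the handler itself the socket comes from `listener.accept()`, so the returned pid is that of the connecting Firecracker process rather than the handler's own.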

src/firecracker/examples/uffd/valid_handler.rs

+1
@@ -24,6 +24,7 @@ fn main() {
     let (stream, _) = listener.accept().expect("Cannot listen on UDS socket");

     let mut runtime = Runtime::new(stream, file);
+    runtime.install_panic_hook();
     runtime.run(|uffd_handler: &mut UffdHandler| {
         // Read an event from the userfaultfd.
         let event = uffd_handler
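
The pattern that `install_panic_hook` sets up (save the existing hook, kill the peer first, then delegate to the saved hook) can be illustrated outside Rust. The sketch below is a rough Python analogue under stated assumptions: `sys.excepthook` stands in for `std::panic::set_hook`, a sleeping child process stands in for Firecracker, and `install_kill_hook` is a hypothetical name. It is not the commit's implementation.

```python
import os
import signal
import subprocess
import sys


def install_kill_hook(peer_pid):
    """Chain a hook in front of the current sys.excepthook that SIGKILLs peer_pid.

    Returns the previously installed hook so the caller can restore it.
    """
    previous_hook = sys.excepthook

    def hook(exc_type, exc, tb):
        # Kill the peer first, so it goes down even if reporting itself fails.
        try:
            os.kill(peer_pid, signal.SIGKILL)
        except OSError:
            print("Failed to kill peer process from exception hook", file=sys.stderr)
        # Then fall through to the original reporting behaviour.
        previous_hook(exc_type, exc, tb)

    sys.excepthook = hook
    return previous_hook


# Stand-in for the Firecracker process: a child that would otherwise sleep.
peer = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
previous = install_kill_hook(peer.pid)

# Simulate an unhandled exception by invoking the installed hook directly.
error = RuntimeError("handler crashed")
sys.excepthook(RuntimeError, error, None)

sys.excepthook = previous  # restore the original hook
peer_status = peer.wait()  # negative value means "terminated by that signal"
```

Killing before delegating matters for the same reason as in the Rust hook: the default hook only reports the failure, and the peer should be taken down regardless of what that reporting does.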

tests/framework/microvm.py

+3
@@ -310,6 +310,9 @@ def kill(self):
             if self.screen_pid:
                 os.kill(self.screen_pid, signal.SIGKILL)
         except:
+            LOG.error(
+                "Failed to kill Firecracker Process. Did it already die (or did the UFFD handler process die and take it down)?"
+            )
             LOG.error(self.log_data)
             raise
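
The situation this hint targets, sending `SIGKILL` to a pid that no longer exists, can be reproduced with a short standalone sketch. This is an illustration only, not the test framework's code: `kill_with_hint` and the logger name are hypothetical, and a sleeping child stands in for Firecracker. Unlike the bare `except:` in the hunk above, it catches `ProcessLookupError` specifically, which is what `os.kill` raises when the process is already gone.

```python
import logging
import os
import signal
import subprocess
import sys

LOG = logging.getLogger("microvm_sketch")  # hypothetical logger name


def kill_with_hint(pid):
    """SIGKILL `pid`; return False and log a hint if the process is already gone."""
    try:
        os.kill(pid, signal.SIGKILL)
        return True
    except ProcessLookupError:
        LOG.error(
            "Failed to kill process %d. Did it already die "
            "(or did the UFFD handler process die and take it down)?",
            pid,
        )
        return False


# Stand-in for Firecracker: a child that would otherwise sleep.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
first_attempt = kill_with_hint(child.pid)   # process exists, signal is delivered
child.wait()                                # reap it so the pid is released
second_attempt = kill_with_hint(child.pid)  # pid is gone, hint is logged
```

The second call models what the test framework sees after the UFFD handler's panic hook has already killed Firecracker: the kill fails with "process not found", and the log line points at the likely cause.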
