x86: vmm: Mark the vmtf as partial when popping

The GPC is loaded when we attempt a VMENTER (launch/resume). If the VMENTER fails, we reflect the context. However, we also need to unload the GPC to restore things to sanity; specifically, MSRs like MSR_STAR need to be set back to their normal values. Normally, this is done when we finalize a context (during reflect_current_ctx(), via copy_current_ctx_to()).

Previously, once we attempted to launch the context, it wasn't necessarily marked as partial. It would have been only if it was already partial when we launched it; there was no guarantee. Since we loaded the GPC, we need to track the context as partial, so that the GPC gets unloaded when the TF is reflected.

This was a nasty find. If you ran a VM as an SCP, the entire machine would appear to lock up. In reality, core 0 was just spinning in an uninterruptible mess; I was able to poke it from core 1 ($ m monitor 1; vmrunkernel). I bisected the problem to the lazy VMCS unloading and saw it was dying on the first pop. I had an unrelated VMENTER (launch/resume) error, which was triggering the failed pop. I somewhat suspected a bad VMENTER, since the VMCS was involved and I recalled having issues in that area. printks and while(1) loops eventually pointed me to the sysret of the reflected SW TF, which suggested the MSRs weren't reset (meaning no GPC unload).

Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
diff --git a/kern/arch/x86/process64.c b/kern/arch/x86/process64.c
index 9ad7ba5..6d3c677 100644
--- a/kern/arch/x86/process64.c
+++ b/kern/arch/x86/process64.c
@@ -209,6 +209,7 @@
 	assert(read_flags() & FL_ZF);
 	tf->tf_exit_reason = EXIT_REASON_VMENTER_FAILED;
 	tf->tf_exit_qual = vmcs_read(VM_INSTRUCTION_ERROR);
+	tf->tf_flags |= VMCTX_FL_PARTIAL;
 	handle_bad_vm_tf(tf);
 }
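The interaction described in the commit message can be modeled with a small, self-contained C sketch. Everything in it is hypothetical: toy_vm_tf, toy_cpu, toy_load_gpc(), toy_unload_gpc(), toy_pop_vm_tf(), toy_finalize_ctx(), the MSR constants, and the numeric value given to VMCTX_FL_PARTIAL are stand-ins, not the real Akaros definitions. What it illustrates is the invariant the patch relies on: finalizing a context only unloads the GPC (restoring MSR_STAR) when the context is flagged as partial, so a failed VMENTER that has already loaded the GPC must set that flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real VMCTX_FL_PARTIAL lives in Akaros. */
#define VMCTX_FL_PARTIAL 0x1u

/* Hypothetical stand-ins for the host and guest MSR_STAR values. */
#define HOST_MSR_STAR  0x1111ULL
#define GUEST_MSR_STAR 0x2222ULL

/* Toy model of a VM trapframe: only the flags matter here. */
struct toy_vm_tf {
	uint32_t tf_flags;
};

/* Toy model of per-core state touched by loading/unloading the GPC. */
struct toy_cpu {
	bool gpc_loaded;   /* is guest pcore state loaded on this core? */
	uint64_t msr_star; /* models the live MSR_STAR value */
};

/* Loading the GPC swaps in guest MSR state. */
static void toy_load_gpc(struct toy_cpu *cpu)
{
	cpu->gpc_loaded = true;
	cpu->msr_star = GUEST_MSR_STAR;
}

/* Unloading the GPC restores host MSR state. */
static void toy_unload_gpc(struct toy_cpu *cpu)
{
	assert(cpu->gpc_loaded);
	cpu->gpc_loaded = false;
	cpu->msr_star = HOST_MSR_STAR;
}

/* Attempt a VMENTER. The GPC is loaded before the attempt; on failure,
 * apply_fix models the patch: mark the context partial before reflecting. */
static void toy_pop_vm_tf(struct toy_cpu *cpu, struct toy_vm_tf *tf,
                          bool vmenter_ok, bool apply_fix)
{
	toy_load_gpc(cpu);
	if (vmenter_ok)
		return;
	if (apply_fix)
		tf->tf_flags |= VMCTX_FL_PARTIAL;
	/* The real code would now hand the TF to the reflect path. */
}

/* Finalizing only unloads the GPC if the context is marked partial. */
static void toy_finalize_ctx(struct toy_cpu *cpu, struct toy_vm_tf *tf)
{
	if (!(tf->tf_flags & VMCTX_FL_PARTIAL))
		return;
	toy_unload_gpc(cpu);
	tf->tf_flags &= ~VMCTX_FL_PARTIAL;
}
```

In this model, a failed VMENTER without the flag leaves the guest MSR_STAR in place after finalization, which is the state a later sysret would trip over; with the flag set, finalization restores the host value and clears the partial bit.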