)]}'
{
  "commit": "e11b3eb75680500cc0bb67130bbf435fc04b769f",
  "tree": "e0be0f4f6be27f73378b0e61f25edaf7fb0d05c9",
  "parents": [
    "5d17f489b91e07c541dfe1cae2501a75b60b6cf3"
  ],
  "author": {
    "name": "Barret Rhoden",
    "email": "brho@cs.berkeley.edu",
    "time": "Fri Jan 13 10:28:37 2017 -0500"
  },
  "committer": {
    "name": "Barret Rhoden",
    "email": "brho@cs.berkeley.edu",
    "time": "Wed Jan 18 10:00:02 2017 -0500"
  },
  "message": "x86: vmm: Flush the VMCS when changing owning_proc\n\nThe GPCs unload when we finalize the context, which happens before the\nprocess leaves the core at all.  The VMCS lingers, and my intent was to\nremove it from the core when the process leaves the core.\n\nHowever, abandon_core() was the wrong spot for that.  abandon_core() is\nmore of a \"make sure we get out of whatever process context was running on\nthis core\".  There\u0027s no guarantee that a process is even loaded at that\ntime.  What I really wanted was to drop the VMCS (if it was there at all)\nwhen the core is no longer designated as belonging to the proc.\n\nThe distinction between these two is related to the difference between\ncur_proc and owning_proc.  owning_proc is the process that should be running\non that core, specifically running owning_vcoreid with cur_ctx.  In\ncontrast, cur_proc (a.k.a. \u0027current\u0027) is the process whose address space is\nloaded, for whatever reason.  If the process is running in userspace,\ncur_proc \u003d\u003d owning_proc.  However, kthreads that work on syscalls for a\nprocess can run on cores that aren\u0027t owned by the proc.  Likewise, the\nkernel can temporarily enter a process\u0027s address space, which usually\ninvolves setting cur_proc.  e.g. send_event(), switch_to(), etc.\n\nGiven all this, here was the bug that caused this.  VMs would occasionally\ndie with an Invalid Opcode trap in the kernel.  I noticed this when they\nwere killed from ssh (ctrl-c), especially when the VM/VMM was busy.  I\ncould deterministically recreate it by having the guest spin and sending a\nctrl-c (kill from bash also worked, but kill -9 did not).\n\nThe invalid opcode happened during __invept().  From looking at the hexdump\nof the arguments, the eptp was 0.  That only gets cleared in __proc_free(),\nwhich made me suspect a refcnt problem.  The ref was indeed 0, so I thought\nthere was a problem with someone dropping a ref or not upping enough.\n\nIt turns out that all refs were accounted for, but the problem was\ntriggered by kthreads restarting and decreffing the proc before we called\nabandon_core().  Specifically:\n- kill sends a POSIX signal, the process calls sys_proc_destroy()\n- proc_destroy() wakes the parent and sends itself __death, all via KMSGs\n  to core_id().\n- By the time we get to __death, we\u0027re down to about two references:\n  owning_proc and cur_proc.  (I\u0027m ignoring the FDs here - there were a\nbunch of alarm FDs and syscalls that had refs).\n- __death clears owning_proc, dropping the owning_proc ref.\n- Normally, you\u0027d think we\u0027d call abandon_core soon, but first we have the\n  __launch_kthread() KMSG, which is restarting our parent\u0027s wait syscall.\n- When restarting that kthread, we switch cur_proc from the VMM (the one\n  that is dying, and has only one ref) to the parent.  In doing so, we drop\nthe final ref for the VMM, which triggers __proc_free() and clears the\neptp.\n- Once the parent\u0027s syscall is done, we prepare to idle and abandon_core().\n  This abandon call isn\u0027t clearing the VMM\u0027s context, it\u0027s clearing the\nparent\u0027s.  abandon() doesn\u0027t really care what is running there.\n- At this point, __abandon_core() tries to clear the VMCS.  It had never\n  been cleared, but its process (including the GPC!) had been freed.\nYikes!\n- The reason the GPC-\u003eproc refcnt was zero wasn\u0027t because someone messed up\n  the refcnts, it\u0027s because we didn\u0027t clear the GPC before dropping all the\nrefs.  That GPC-\u003eproc ref is internal (aka weak, uncounted).  The ref that\nkeeps the GPC alive is owning_proc (at least, now it is, after the fix).\n\nSigned-off-by: Barret Rhoden \u003cbrho@cs.berkeley.edu\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "7b72d2e01d46859da8fe0b92e1f28174f1583027",
      "old_mode": 33188,
      "old_path": "kern/arch/riscv/process.c",
      "new_id": "47d0f5ddfdddc354d36a7df91f9e8e9b7291566e",
      "new_mode": 33188,
      "new_path": "kern/arch/riscv/process.c"
    },
    {
      "type": "modify",
      "old_id": "810d453bf6804075b31d29dc7c3382661c0d4061",
      "old_mode": 33188,
      "old_path": "kern/arch/x86/process64.c",
      "new_id": "9ad7ba5cde8d81efaca5455a8b5a79b196f5b254",
      "new_mode": 33188,
      "new_path": "kern/arch/x86/process64.c"
    },
    {
      "type": "modify",
      "old_id": "1ccab3b5600f4c13e65104a6c72c275acd4e2028",
      "old_mode": 33188,
      "old_path": "kern/include/process.h",
      "new_id": "b2d7bf14562f43ad71cdf915621bbf1fc60d5325",
      "new_mode": 33188,
      "new_path": "kern/include/process.h"
    },
    {
      "type": "modify",
      "old_id": "518f37ab40958cf1a82ffcc926612e60f51f8672",
      "old_mode": 33188,
      "old_path": "kern/src/kthread.c",
      "new_id": "59db13c66c78aacece379ef3d44665be7767ed7a",
      "new_mode": 33188,
      "new_path": "kern/src/kthread.c"
    },
    {
      "type": "modify",
      "old_id": "43a18c0199ebaa85f16e9c2d30932efe70fa8f1c",
      "old_mode": 33188,
      "old_path": "kern/src/process.c",
      "new_id": "5ef5ee3a5640b25db5980617ba1f8997008c0fd6",
      "new_mode": 33188,
      "new_path": "kern/src/process.c"
    }
  ]
}
