commit	7a1e025a7fa589ddb1c72f1f8e54cb0bb7ce8391
author	Barret Rhoden <brho@cs.berkeley.edu>	Wed Oct 23 18:56:05 2019 -0400
committer	Barret Rhoden <brho@cs.berkeley.edu>	Wed Oct 23 19:06:29 2019 -0400
tree	a1f0a10458cc58d9a981e5a6cad55b3538ec815b
parent	2aa94575a5a7b809c593f73487f25ec4e2e90098

vmm: reimplement the x86 instruction decoder

Our old decoder had a bunch of issues.  Whenever we get a new version of
Linux, we tend to have new instructions that need decoding.  The old
decoder was hard to extend, and it was also hiding a bunch of bugs.

Here are some of the problems the old decoder had:
- It assumed every operation was a load or store, including cmp (which
does not change registers/memory) and add (which does change them, but
only after adding).
- It did not set rflags.
- It did not zero-extend 32-bit wide register results.
- *word was busted.  At one point, we did word++ when we meant to
advance by a byte.
- etc.
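
As a rough illustration of two of the items above (register write-back
width and rflags), here is a minimal sketch.  It is not the code from
this commit: reg_write() and cmp_flags32() are hypothetical names, the
high-byte registers (AH and friends) are ignored, and only CF, ZF, SF,
and OF are computed (PF and AF are left out).

```c
#include <assert.h>
#include <stdint.h>

/* Architectural rflags bit positions. */
#define FL_CF (1u << 0)
#define FL_ZF (1u << 6)
#define FL_SF (1u << 7)
#define FL_OF (1u << 11)

/* Hypothetical helper: merge a 'size'-byte result into a 64-bit register.
 * 8- and 16-bit writes preserve the upper bits; 32-bit writes zero them. */
static uint64_t reg_write(uint64_t old, uint64_t val, int size)
{
	assert(size == 1 || size == 2 || size == 4 || size == 8);
	switch (size) {
	case 1:
		return (old & ~0xffULL) | (val & 0xffULL);
	case 2:
		return (old & ~0xffffULL) | (val & 0xffffULL);
	case 4:
		/* Zero-extend: the upper 32 bits of 'old' are discarded. */
		return val & 0xffffffffULL;
	default:
		return val;
	}
}

/* Hypothetical helper: flags produced by a 32-bit 'cmp a, b', which
 * computes a - b, throws the result away, and keeps only the flags. */
static uint32_t cmp_flags32(uint32_t a, uint32_t b)
{
	uint32_t res = a - b;
	uint32_t fl = 0;

	if (a < b)
		fl |= FL_CF;	/* unsigned borrow */
	if (res == 0)
		fl |= FL_ZF;
	if (res & 0x80000000u)
		fl |= FL_SF;
	/* Signed overflow: operands differ in sign and the result's sign
	 * differs from a's. */
	if (((a ^ b) & (a ^ res)) & 0x80000000u)
		fl |= FL_OF;
	return fl;
}
```

For the cmp in this commit message, a would be the value loaded from
memory and b the immediate; nothing is stored back.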

The code was pretty 'swirly' too, where you'd have similar processing
repeated all over the place, like the REX checks.

To fix that, I added a 'decode' struct, to pass along values that we
determined, such as address size, operand size, rex bits, etc.
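
To give a feel for that idea, here is a minimal sketch of such a struct
and a prefix pass that fills it in once.  The struct layout, field
names, and parse_prefixes() are assumptions made for illustration, not
the definitions from this commit.  It assumes 64-bit mode and only looks
at the 0x66 and 0x67 prefixes plus REX; segment, lock, and rep prefixes
are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 'decode' struct described above. */
struct decode_sketch {
	int address_bytes;	/* 8 by default in 64-bit mode */
	int operand_bytes;	/* 4 by default in 64-bit mode */
	uint8_t rex_w, rex_r, rex_x, rex_b;
	int prefix_len;		/* bytes consumed before the opcode */
};

/* Fills in *d from the prefix bytes.  Returns 0 on success, or -1 if we
 * run out of bytes before reaching an opcode. */
static int parse_prefixes(const uint8_t *insn, size_t len,
                          struct decode_sketch *d)
{
	size_t i;
	int opsize_override = 0;

	assert(insn && d);
	memset(d, 0, sizeof(*d));
	d->address_bytes = 8;
	d->operand_bytes = 4;

	for (i = 0; i < len; i++) {
		if (insn[i] == 0x66)
			opsize_override = 1;
		else if (insn[i] == 0x67)
			d->address_bytes = 4;
		else
			break;
	}
	/* REX is 0100WRXB and must come right before the opcode. */
	if (i < len && (insn[i] & 0xf0) == 0x40) {
		d->rex_w = (insn[i] >> 3) & 1;
		d->rex_r = (insn[i] >> 2) & 1;
		d->rex_x = (insn[i] >> 1) & 1;
		d->rex_b = insn[i] & 1;
		i++;
	}
	if (i == len)
		return -1;
	/* REX.W wins over the 0x66 operand-size override. */
	if (d->rex_w)
		d->operand_bytes = 8;
	else if (opsize_override)
		d->operand_bytes = 2;
	d->prefix_len = (int)i;
	return 0;
}
```

With something like this, the REX checks happen in one place, and the
per-opcode handlers only read the fields.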

Best of all, we weren't computing the size correctly, since we didn't
really do the modrm handling right.  Here's the case:

	81 7d 00 5f 4d 50 5f    cmp    DWORD PTR [rbp+0x0],0x5f504d5f

That was being treated like it is only 4 bytes long, instead of 7.
Whoops!
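
The arithmetic for that instruction is opcode (1) + modrm (1) + disp8
(1) + imm32 (4) = 7.  Here is a minimal sketch of that length
computation for opcode 0x81 only, assuming 64-bit addressing, a 32-bit
operand size, and no prefixes.  modrm_tail_len() and insn_len_0x81() are
hypothetical names, not functions from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes following the modrm byte that belong to the address
 * (SIB plus displacement).  'p' points at the modrm byte.  Returns -1 if
 * the buffer is too short to tell. */
static int modrm_tail_len(const uint8_t *p, size_t len)
{
	uint8_t mod, rm;
	int n = 0;

	assert(p);
	if (len < 1)
		return -1;
	mod = p[0] >> 6;
	rm = p[0] & 7;
	if (mod == 3)
		return 0;		/* register operand, no memory */
	if (rm == 4) {
		if (len < 2)
			return -1;
		n += 1;			/* SIB byte */
		if (mod == 0 && (p[1] & 7) == 5)
			n += 4;		/* SIB with no base: disp32 */
	} else if (mod == 0 && rm == 5) {
		n += 4;			/* RIP-relative: disp32 */
	}
	if (mod == 1)
		n += 1;			/* disp8 */
	else if (mod == 2)
		n += 4;			/* disp32 */
	return n;
}

/* Total length of an '81 /r imm32' instruction, or -1 on error. */
static int insn_len_0x81(const uint8_t *insn, size_t len)
{
	int tail, total;

	assert(insn);
	if (len < 2 || insn[0] != 0x81)
		return -1;
	tail = modrm_tail_len(insn + 1, len - 1);
	if (tail < 0)
		return -1;
	total = 1 + 1 + tail + 4;	/* opcode + modrm + tail + imm32 */
	return (size_t)total <= len ? total : -1;
}
```

For the bytes above, modrm 0x7d is mod=1, reg=7 (the /7 that selects
cmp), rm=5 (rbp), so there is one disp8 byte and the total is 7.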

However, it didn't crash, even though we set RIP to be part way (4
bytes) into the instruction!  Why?  Well, the last three bytes, which
are just arbitrary numbers in the immediate32 part of the instruction
(and which we end up running), decode as instructions too!

	0:  4d 50                   rex.WRB push r8
	2:  5f                      pop    rdi

It pushes and pops, essentially clobbering rdi.  The Linux guest ends up
resetting rdi later, so no one noticed.

Had it been another value for the immed, we'd execute that too.  It
might blow up, and we'd notice.  But this one silently executed and
silently trashed a register.

To fix that, I needed better mod/rm+sib handling.  We still get away
with using GPA instead of decoding modrm+sib and translating through the
guest's page tables.  Ron's comment still applies.  =)

To handle the emulation of instructions, I had our callers pass us the
'access()' function, so we can handle read-modify-write instructions,
like add.  The callers didn't need to change too much, though I yanked
out destreg, which was just debug clutter.
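
As a sketch of why passing in the accessor helps: a read-modify-write
instruction becomes a load through the callback, the arithmetic, and a
store through the same callback.  The access_fn signature, emulate_add(),
and the toy device below are all assumptions for illustration; the real
'access()' signature in the tree may differ, and flag updates are left
out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: loads into *val or stores from *val at 'gpa'.
 * Returns 0 on success. */
typedef int (*access_fn)(void *ctx, uint64_t gpa, uint64_t *val, int size,
                         bool store);

/* Emulates 'add [gpa], src' for a 'size'-byte memory operand. */
static int emulate_add(access_fn access, void *ctx, uint64_t gpa,
                       uint64_t src, int size)
{
	uint64_t mask, val;

	assert(size == 1 || size == 2 || size == 4 || size == 8);
	mask = size == 8 ? ~0ULL : (1ULL << (size * 8)) - 1;
	if (access(ctx, gpa, &val, size, false))
		return -1;
	val = (val + src) & mask;	/* wrap at the operand width */
	return access(ctx, gpa, &val, size, true);
}

/* Toy backing device with a single register, for demonstration only. */
struct toy_dev {
	uint64_t reg;
	int loads;
	int stores;
};

static int toy_access(void *ctx, uint64_t gpa, uint64_t *val, int size,
                      bool store)
{
	struct toy_dev *dev = ctx;

	(void)gpa;
	(void)size;
	if (store) {
		dev->reg = *val;
		dev->stores++;
	} else {
		*val = dev->reg;
		dev->loads++;
	}
	return 0;
}
```

A cmp would use only the load half of this, and a plain mov only one of
the two calls, so the same callback covers all three shapes.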

I could have broken the commit up a little bit, but there wasn't a lot
of value in it, since the whole thing needed to be overhauled.

Note that the APIC_ACCESS and WRITE exits never happen.  That might have
been the case ever since we started using the x2APIC for the guest.

Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>

6 files changed

README.md

About Akaros

Akaros is an open source, GPL-licensed operating system for manycore architectures. Its goal is to provide better support for parallel and high-performance applications in the datacenter. Unlike traditional OSs, which limit access to certain resources (such as cores), Akaros provides native support for application-directed resource management and 100% isolation from other jobs running on the system.

Although not yet integrated as such, it is designed to operate as a low-level node OS with a higher-level Cluster OS, such as Mesos, governing how resources are shared amongst applications running on each node. Its system call API and “Many Core Process” abstraction better match the requirements of a Cluster OS, eliminating many of the obstacles faced by other systems when trying to isolate simultaneously running processes. Moreover, Akaros’s resource provisioning interfaces allow for node-local decisions to be made that enforce the resource allocations set up by a Cluster OS. This can be used to simplify global allocation decisions, reduce network communication, and ultimately promote more efficient sharing of resources. There is limited support for such functionality on existing operating systems.

Akaros is still very young, but preliminary results show that processes running on Akaros have an order of magnitude less noise than on Linux, as well as fewer periodic signals, resulting in better CPU isolation. Additionally, its non-traditional threading model has been shown to outperform the Linux NPTL across a number of representative application workloads. This includes a 3.4x faster thread context switch time, competitive performance for the NAS parallel benchmark suite, and a 6% increase in throughput over nginx for a simple thread-based webserver we wrote. We are actively working on expanding Akaros's capabilities even further.

Visit us at akaros.org

Installation

Instructions on installation and getting started with Akaros can be found in GETTING_STARTED.md

Documentation

Our current documentation is very lacking, but it is slowly getting better over time. Most documentation is typically available in the Documentation/ directory. However, many of these documents are outdated, and some general cleanup is definitely in order.

Mailing Lists

Want to join the developers mailing list?

Send an email to akaros+subscribe@googlegroups.com.

Or visit our Google Group and click “Join Group”.

Want to report a bug?

Create a new issue here.

Want to chat on IRC?

brho hangs out (usually alone) in #akaros on irc.freenode.net. The other devs may pop in every now and then.

Contributing

Instructions on contributing can be found in Documentation/Contributing.md.

License

The Akaros repository contains a mix of code from different projects across a few top-level directories. The kernel is in kern/, userspace libraries are in user/, and a variety of tools can be found in tools/, including the toolchain.

The Akaros kernel is licensed under the GNU General Public License, version 2. Our kernel is made up of code from a number of other systems. Anything written for the Akaros kernel is licensed “GPLv2 or later”. However, other code, such as that from Linux and Plan 9, is licensed GPLv2, without the “or later” clause. There is also code from BSD, Xen, JOS, and Plan 9 derivatives. As a whole, the kernel is licensed GPLv2.

Note that the Plan 9 code that is a part of Akaros is also licensed under the Lucent Public License. The University of California, Berkeley, has been authorised by Alcatel-Lucent to release all Plan 9 software previously governed by the Lucent Public License, Version 1.02 under the GNU General Public License, Version 2. Akaros derives its Plan 9 code from this UCB release. For more information, see LICENSE-plan9 or here.

Our user code is likewise from a mix of sources. All code written for Akaros, such as user/parlib/, is licensed under the GNU LGPLv2.1, or later. Plan 9 libraries, including user/iplib and user/ndblib, are licensed under the LGPLv2.1, but without the “or later”. See each library for details.

Likewise, tools/ is a collection of various code. All of our contributions to existing code bases, such as GCC, glibc, and busybox, are licensed under their respective projects' licenses.