x86: Ensure boot_pgdir's user entries are unmapped

For sanity reasons, we should never have anything in boot_pgdir below ULIM.
Technically, we could make it work, but not without some thought.  The
issue is that PML4 entries are pointers, pointing to common PML3s.  Any
entry in boot_pgdir is shared with every process, from the PML3 on down.
The kernel expects to manage and synchronize global access to the kernel
mappings (above ULIM).  But memory below ULIM is managed per-process.
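
The invariant above can be sketched as a small check over the top-level
table.  This is only an illustration, not the code from this commit: the
constants (including EXAMPLE_ULIM and the present bit) and the helper name
count_user_entries() are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All constants below are assumptions for illustration, not the kernel's. */
#define PML4_SHIFT    39                     /* each PML4 entry spans 2^39 bytes */
#define PML4_ENTRIES  512
#define PTE_PRESENT   0x1ULL
#define EXAMPLE_ULIM  0x0000800000000000ULL  /* assumed user/kernel boundary */

/* Returns how many PML4 entries that map addresses below ulim are present.
 * For a boot page directory, the expected answer is 0. */
static size_t count_user_entries(const uint64_t *pml4, uint64_t ulim)
{
	size_t nr_user_slots, nr_present = 0;

	/* Keep the sketch simple: require ulim to sit on a PML4 boundary. */
	assert((ulim & ((1ULL << PML4_SHIFT) - 1)) == 0);
	nr_user_slots = ulim >> PML4_SHIFT;
	if (nr_user_slots > PML4_ENTRIES)
		nr_user_slots = PML4_ENTRIES;
	for (size_t i = 0; i < nr_user_slots; i++) {
		if (pml4[i] & PTE_PRESENT)
			nr_present++;
	}
	return nr_present;
}
```

A caller would assert that the count is zero for the boot table before
handing it out as the template for new address spaces.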

Backstory: I was trying to debug a null function pointer by mapping
something at page 0.  The kernel panicked after decreffing a page too many
times.

What happened was that inserting one page at virtual addr 0 created a PML3,
PML2, and PML1 that were shared between every process - not just the page
mapped at 0.  The reach of that PML4 entry was 512 GB, including the program
binary (which was in the page cache).
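
To make the 512 GB figure concrete, here is a rough sketch of the reach
arithmetic for 4-level x86-64 paging with 4 KB pages.  The helper names are
hypothetical, and the 0x400000 address in the usage note is just an assumed,
typical load address for a binary's text, not something taken from this
commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_PAGE_SHIFT  12  /* 4 KB pages */
#define BITS_PER_LEVEL   9   /* 512 entries per table */

/* Bytes of virtual address space covered by one entry at the given level,
 * where level 1 is a PML1 entry (a PTE) and level 4 is a PML4 entry. */
static uint64_t entry_reach(int level)
{
	assert(level >= 1 && level <= 4);
	return 1ULL << (BASE_PAGE_SHIFT + BITS_PER_LEVEL * (level - 1));
}

/* True if both addresses are translated through the same PML4 entry, and
 * therefore through the same PML3 on down. */
static bool shares_pml4_entry(uint64_t va_a, uint64_t va_b)
{
	int shift = BASE_PAGE_SHIFT + BITS_PER_LEVEL * 3;

	return ((va_a >> shift) & 0x1ff) == ((va_b >> shift) & 0x1ff);
}
```

With these helpers, entry_reach(4) is 2^39 bytes (512 GB), and
shares_pml4_entry(0, 0x400000) is true: a mapping at page 0 and a binary
loaded low in the address space go through the same PML4 slot.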

The first process was fine.  However, when we forked, the pages for e.g.
busybox's text segment already had PTEs in the new process's address space.
Technically, they were the same PTEs (and PML3, 2, and 1) as the parent
process, since they were shared data structures.

Anyway, map_page_at_addr() saw that the PTE had a mapping, so it decreffed
the page before inserting a new page.  It just so happened that the new
page was the same as the old one, since it was a fork (duplicate_vmrs,
etc).  That page had a single refcnt, since it was managed by the page
cache, causing it to be freed.

Now the page is freed, but it is in the page tables still, since
map_page_at_addr() wrote the PTE with the value of the page.  Basically, it
was a "replace the PTE with its own value", but it triggered a decref.
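
The self-replacement can be modeled with a toy refcount.  This is a
simplified sketch of the behavior described above, not the real
map_page_at_addr(): every toy_* name is hypothetical, and the model assumes
the only reference is the one held by the page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the kernel's page and PTE types are different. */
struct toy_page {
	int refcnt;
	bool on_free_list;
};

struct toy_pte {
	struct toy_page *page;
};

/* Drops one reference and frees the page when the count hits zero.
 * Returns false if the count was already zero, which is the condition a
 * refcount assert would catch. */
static bool toy_decref(struct toy_page *pg)
{
	if (pg->refcnt == 0)
		return false;
	if (--pg->refcnt == 0)
		pg->on_free_list = true;
	return true;
}

/* Models the insert path: drop the old mapping's reference, then write the
 * new page into the PTE.  It does not notice when old and new are the same. */
static void toy_map_page(struct toy_pte *pte, struct toy_page *pg)
{
	if (pte->page)
		toy_decref(pte->page);
	pte->page = pg;
}
```

Mapping a page with refcnt 1 into an empty PTE leaves the count at 1.
Mapping the same page into the same PTE again drops the count to 0 and frees
the page, while the PTE still points at it.  Any later toy_decref() on that
page reports the double decref.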

Next up was busybox's MAP_PRIVATE VMR.  Since it's a private
mapping, it gets its own page.  upage_alloc() gives us the most recently
freed page, which was the one that was still in the address space, right at
the top of the text segment.

Eventually the process page faults, probably because of the madness of its
address space.  When it does, the kernel puts (decrefs) every page in the
VMRs.  At least one page (the one we discussed) was in there twice.  The
second decref triggers the kref assert, since we decreffed something that
was already 0.

Good times.

Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
2 files changed
tree: 054e7fa2a4c62a955c4a23298f5aa7847c2ac6f8
README.md

About Akaros

Akaros is an open source, GPL-licensed operating system for manycore architectures. Its goal is to provide better support for parallel and high-performance applications in the datacenter. Unlike traditional OSs, which limit access to certain resources (such as cores), Akaros provides native support for application-directed resource management and 100% isolation from other jobs running on the system.

Although not yet integrated as such, it is designed to operate as a low-level node OS with a higher-level Cluster OS, such as Mesos, governing how resources are shared amongst applications running on each node. Its system call API and “Many Core Process” abstraction better match the requirements of a Cluster OS, eliminating many of the obstacles faced by other systems when trying to isolate simultaneously running processes. Moreover, Akaros’s resource provisioning interfaces allow for node-local decisions to be made that enforce the resource allocations set up by a Cluster OS. This can be used to simplify global allocation decisions, reduce network communication, and ultimately promote more efficient sharing of resources. There is limited support for such functionality on existing operating systems.

Akaros is still very young, but preliminary results show that processes running on Akaros have an order of magnitude less noise than on Linux, as well as fewer periodic signals, resulting in better CPU isolation. Additionally, its non-traditional threading model has been shown to outperform the Linux NPTL across a number of representative application workloads. This includes a 3.4x faster thread context switch time, competitive performance for the NAS parallel benchmark suite, and a 6% increase in throughput over nginx for a simple thread-based webserver we wrote. We are actively working on expanding Akaros's capabilities even further.

Visit us at akaros.org

Installation

Instructions on installation and getting started with Akaros can be found in GETTING_STARTED.md

Documentation

Our current documentation is very lacking, but it is slowly getting better over time. Most documentation is typically available in the Documentation/ directory. However, many of these documents are outdated, and some general cleanup is definitely in order.

Mailing Lists

Want to join the developers mailing list?

Send an email to akaros+subscribe@googlegroups.com.

Or visit our Google group and click “Join Group”.

Want to report a bug?

Create a new issue here.

Want to chat on IRC?

brho hangs out (usually alone) in #akaros on irc.freenode.net. The other devs may pop in every now and then.

Contributing

Instructions on contributing can be found in Documentation/Contributing.md.

License

The Akaros repository contains a mix of code from different projects across a few top-level directories. The kernel is in kern/, userspace libraries are in user/, and a variety of tools can be found in tools/, including the toolchain.

The Akaros kernel is licensed under the GNU General Public License, version 2. Our kernel is made up of code from a number of other systems. Anything written for the Akaros kernel is licensed “GPLv2 or later”. However, other code, such as that from Linux and Plan 9, is licensed GPLv2, without the “or later” clause. There is also code from BSD, Xen, JOS, and Plan 9 derivatives. As a whole, the kernel is licensed GPLv2.

Note that the Plan 9 code that is a part of Akaros is also licensed under the Lucent Public License. The University of California, Berkeley, has been authorised by Alcatel-Lucent to release all Plan 9 software previously governed by the Lucent Public License, Version 1.02 under the GNU General Public License, Version 2. Akaros derives its Plan 9 code from this UCB release. For more information, see LICENSE-plan9 or here.

Our user code is likewise from a mix of sources. All code written for Akaros, such as user/parlib/, is licensed under the GNU LGPLv2.1, or later. Plan 9 libraries, including user/iplib and user/ndblib, are licensed under the LGPLv2.1, but without the “or later”. See each library for details.

Likewise, tools/ is a collection of various code. All of our contributions to existing code bases, such as GCC, glibc, and busybox, are licensed under their respective projects' licenses.