| kernel_messages.txt |
| Barret Rhoden |
| 2010-03-19 |
| Updated 2012-11-14 |
| |
| This document explains the basic ideas behind our "kernel messages" (KMSGs) and |
| some of the arcane bits behind the implementation. These were formerly called |
| active messages, since they were an implementation of the low-level hardware |
| messaging. |
| |
| Overview: |
| -------------------------------- |
| Our kernel messages are just work that is shipped remotely, delayed in time, or |
| both. They currently consist of a function pointer and a few arguments. Kernel |
| messages of a given type will be executed in order, with guaranteed delivery. |
| |
| Initially, they were meant to be a way to immediately execute code on another |
| core (once interrupts are enabled), in the order in which the messages were |
| sent. This is insufficient (and wasn't what we wanted for the task, |
| incidentally). We simply want to do work on another core, but not necessarily |
| instantly. And not necessarily on another core. |
| |
| Currently, there are two types, distinguished by which list they are sent to per |
| core: immediate and routine. Routine messages are often referred to as RKMs. |
| Immediate messages will get executed as soon as possible (once interrupts are |
| enabled). Routine messages will be executed at convenient points in the kernel. |
| This includes when the kernel is about to pop back to userspace |
| (proc_restartcore()), or smp_idle()ing. Routine messages are necessary when |
| their function does not return, such as __launch_kthread. They should also be |
| used if the work is not worth fully interrupting the kernel. (An IPI will still |
| be sent, but the work will be delayed.) Finally, they should be used if their |
| work could affect currently executing kernel code (like a syscall). |
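| |
| The two-list arrangement described above can be sketched in user-space C. |
| This is only an illustration under stated assumptions, not the kernel's actual |
| code: every name prefixed with sketch_ is hypothetical, malloc() stands in for |
| the kernel's allocator, and locking and the IPI are left out. |

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NR_CORES 4	/* hypothetical core count */
#define SKETCH_LOG_MAX 8

/* The message type doubles as the index of the per-core list. */
enum sketch_kmsg_type { SKETCH_IMMEDIATE = 0, SKETCH_ROUTINE = 1 };

/* A message is a function pointer plus a few arguments. */
typedef void (*sketch_handler_t)(uint32_t srcid, long a0, long a1, long a2);

struct sketch_kmsg {
	struct sketch_kmsg *next;
	sketch_handler_t fn;
	uint32_t srcid;
	long a0, a1, a2;
};

struct sketch_queue {
	struct sketch_kmsg *head, *tail;
};

/* Two FIFO lists per core: one for immediate, one for routine messages. */
struct sketch_queue sketch_queues[SKETCH_NR_CORES][2];

/* Allocate a message and append it to the chosen list of core 'dst'.
 * A real kernel would take a lock here and then send an IPI to 'dst'. */
void sketch_send(uint32_t src, uint32_t dst, sketch_handler_t fn,
                 long a0, long a1, long a2, enum sketch_kmsg_type type)
{
	struct sketch_queue *q = &sketch_queues[dst][type];
	struct sketch_kmsg *m = malloc(sizeof(*m));

	assert(m != NULL);
	m->next = NULL;
	m->fn = fn;
	m->srcid = src;
	m->a0 = a0;
	m->a1 = a1;
	m->a2 = a2;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/* Run every message on one list of 'core' in the order it was sent.
 * Each message is copied and freed before its handler runs, so nothing
 * leaks if a (routine) handler never returns. Returns the number run. */
int sketch_drain(uint32_t core, enum sketch_kmsg_type type)
{
	struct sketch_queue *q = &sketch_queues[core][type];
	struct sketch_kmsg *m;
	int ran = 0;

	while ((m = q->head) != NULL) {
		struct sketch_kmsg copy = *m;

		q->head = m->next;
		if (q->head == NULL)
			q->tail = NULL;
		free(m);
		copy.fn(copy.srcid, copy.a0, copy.a1, copy.a2);
		ran++;
	}
	return ran;
}

/* Example handler (hypothetical): records its first argument. */
long sketch_log[SKETCH_LOG_MAX];
int sketch_log_len;

void sketch_record(uint32_t srcid, long a0, long a1, long a2)
{
	(void)srcid;
	(void)a1;
	(void)a2;
	if (sketch_log_len < SKETCH_LOG_MAX)
		sketch_log[sketch_log_len++] = a0;
}
```

| In this sketch, sending 10 (routine), 20 (immediate), and 30 (routine) to core 1 |
| and then draining the immediate list before the routine list records 20, 10, 30: |
| ordering is kept within a type, not across types. |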
| |
| For example, some older KMSGs such as __startcore used to not return and would |
| pop directly into user space. This complicated the KMSG code quite a bit. While |
| these functions now return, they still can't be immediate messages. Proc |
| management KMSGs change the cur_ctx out from under a syscall, which can lead to |
| a bunch of issues. |
| |
| Immediate kernel messages are executed in interrupt context, with interrupts |
| disabled. Routine messages are only executed from places in the code where the |
| kernel doesn't care if the functions don't return or otherwise cause trouble. |
| This means RKMs aren't run in interrupt context in the kernel (or if the kernel |
| code itself traps). We don't have a 'process context' like Linux does; instead, |
| it's more of a 'default context'. That's where RKMs run, and they run with IRQs |
| disabled. |
| |
| RKMs can enable IRQs, or otherwise cause IRQs to be enabled. __launch_kthread |
| is a good example: it runs a kthread, which may have had IRQs enabled. |
| |
| With RKMs, there are no concerns about the kernel holding locks or otherwise |
| "interrupting" its own execution. Routine messages are a little different than |
| just trapping into the kernel, since the functions don't have to return and may |
| result in clobbering the kernel stack. Also note that this behavior is |
| dependent on where we call process_routine_kmsg(). Don't call it somewhere you |
| need to return to. |
| |
| An example of an immediate message would be a TLB shootdown: check current, |
| flush if applicable, and return. It doesn't harm the kernel at all. Another |
| example would be certain debug routines. |
| |
| History: |
| -------------------------------- |
| KMSGs have a long history tied to process management code. The main issues were |
| related to which KMSG functions return and which ones mess with local state (like |
| clobbering cur_ctx or the owning_proc). Returning was a big deal because you |
| can't just arbitrarily abandon a kernel context (locks or refcnts could be held, |
| etc). This is why immediates must return. Likewise, there are certain |
| invariants about what a core is doing that shouldn't be changed by an IRQ |
| handler (which is what an immed message really is). See all the old proc |
| management commits if you want more info (check for changes to __startcore). |
| |
| Other Uses: |
| -------------------------------- |
| Kernel messages will also be the basis for the alarm system, since an alarm is |
| just a way of expressing work that needs to be done. That being said, the k_msg |
| struct will probably receive a timestamp field, among other things. Routine |
| messages will also replace the old workqueue, which hasn't really been used in |
| 40 months or so. |
| |
| To Return or Not: |
| -------------------------------- |
| Routine k_msgs do not have to return. Immediate messages must. The distinction |
| is in how they are sent (send_kernel_message() will take a flag), so be careful. |
| |
| To retain some sort of sanity, the functions that do not return must adhere to |
| some rules. At some point they need to end in a place where they check routine |
| messages or enable interrupts. Simply calling smp_idle() will do this. The |
| idea behind this is that routine messages will get processed once the kernel is |
| able to (at a convenient place). |
| |
| Missing Routine Messages: |
| -------------------------------- |
| It's important that the kernel always checks for routine messages before leaving |
| the kernel, either to halt the core or to pop into userspace. There is a race |
| involved with messages getting posted after we check the list, but before we |
| pop/halt. To cover that window, the sender also sends an IPI. This IPI will |
| force us back into the kernel at some point in the code before |
| process_routine_kmsg(), thus keeping us from missing the RKM. |
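| |
| The check-then-halt race can be modeled with a small user-space simulation. |
| This is a toy model under stated assumptions, not kernel code: struct sim_core |
| and the sim_ functions are hypothetical, and the ipi_pending flag stands in for |
| an IPI latched by the interrupt hardware while the core has IRQs disabled. |

```c
#include <assert.h>
#include <stdbool.h>

struct sim_core {
	int rkm_queued;		/* routine messages waiting on this core */
	int rkm_handled;	/* routine messages already run */
	bool ipi_pending;	/* an IPI has arrived and not yet been taken */
};

/* Sender side: post the message first, then raise the IPI. */
void sim_send_rkm(struct sim_core *c)
{
	c->rkm_queued++;
	c->ipi_pending = true;
}

/* Receiver side: run everything currently on the routine list. */
int sim_process_routine(struct sim_core *c)
{
	int n = c->rkm_queued;

	c->rkm_handled += n;
	c->rkm_queued = 0;
	return n;
}

/* One pass of an idle loop: check the list, then try to halt.
 * If 'racing_sender' is set, a message is posted after the check but
 * before the halt. Returns true if the core really went to sleep, or
 * false if a pending IPI pulled it straight back into the kernel, in
 * which case the caller must loop and check the list again. */
bool sim_idle_once(struct sim_core *c, bool racing_sender)
{
	sim_process_routine(c);
	if (racing_sender)
		sim_send_rkm(c);	/* lands inside the race window */
	if (c->ipi_pending) {
		c->ipi_pending = false;	/* IPI taken: back in the kernel */
		return false;
	}
	return true;			/* nothing pending: safe to halt */
}
```

| In this model, a message posted inside the window leaves the IPI pending, so |
| the first pass refuses to halt and the second pass finds and runs the RKM. |
| Without the IPI, the core would halt with the message still on its list. |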
| |
| In the future, if we know the kernel code on a particular core is not attempting |
| to halt/pop, then we could avoid sending this IPI. This is the essence of the |
| optimization in send_kernel_message() where we don't IPI ourselves. A more |
| formal/thorough way to do this would be useful, both to avoid bugs and to |
| improve cross-core KMSG performance. |
| |
| IRQ Trickiness: |
| -------------------------------- |
| You cannot enable interrupts in the handle_kmsg_ipi() handler, either in the |
| handler's own code or in any immediate kmsg. Since we send the EOI before |
| running the handler |
| (on x86), another IPI could cause us to reenter the handler, which would spin on |
| the lock the previous context is holding (nested IRQ stacks). Using irqsave |
| locks is not sufficient, since they assume IRQs are not turned on in the middle |
| of their operation (such as in the body of an immediate kmsg). |
| |
| Other Notes: |
| -------------------------------- |
| Unproven hunch, but the main performance bottleneck with multiple senders and |
| receivers of k_msgs will be the slab allocator. We use the slab so we can |
| dynamically create the k_msgs: we can pass them around easily, delay them |
| easily (alarms), and, most importantly, we can't deadlock by running out of |
| room in a static buffer. |
| |
| Architecture Dependence: |
| -------------------------------- |
| Some details will differ, based on architectural support. For instance, |
| immediate messages can be implemented with true active messages. Other systems |
| with maskable IPI vectors can use a different IPI for routine messages, and that |
| interrupt can get masked whenever we enter the kernel (note, that means making |
| every trap gate an interrupt gate), and we unmask that interrupt when we want to |
| process routine messages. |
| |
| However, given the main part of kmsgs is arch-independent, I've consolidated all |
| of it in one location until we need to have separate parts of the implementation. |