| kernel_messages.txt | 
 | Barret Rhoden | 
 | 2010-03-19 | 
 | Updated 2012-11-14 | 
 |  | 
 | This document explains the basic ideas behind our "kernel messages" (KMSGs) and | 
 | some of the arcane bits behind the implementation.  These were formerly called | 
 | active messages, since they were an implementation of the low-level hardware | 
 | messaging. | 
 |  | 
 | Overview: | 
 | -------------------------------- | 
 | Our kernel messages are just work that is shipped remotely, delayed in time, or | 
 | both.  They currently consist of a function pointer and a few arguments.  Kernel | 
 | messages of a given type will be executed in order, with guaranteed delivery. | 
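
As a rough illustration of "a function pointer and a few arguments, executed in
order", here is a minimal user-space sketch.  Every name in it (struct
sketch_kmsg, sketch_enqueue(), sketch_run_all(), sketch_record()) is
hypothetical and is not the kernel's actual definition, malloc() merely stands
in for the kernel's allocator, and there is no locking since the sketch is
single threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler type: a function taking a few word-sized arguments. */
typedef void (*sketch_kmsg_func_t)(long arg0, long arg1, long arg2);

/* Hypothetical message: the function pointer, its arguments, and a link. */
struct sketch_kmsg {
    struct sketch_kmsg *next;
    sketch_kmsg_func_t func;
    long arg0, arg1, arg2;
};

/* FIFO list: appending at the tail and running from the head keeps order. */
struct sketch_kmsg_queue {
    struct sketch_kmsg *head;
    struct sketch_kmsg *tail;
};

/* Appends a message.  Returns 0 on success, -1 if allocation fails. */
int sketch_enqueue(struct sketch_kmsg_queue *q, sketch_kmsg_func_t func,
                   long arg0, long arg1, long arg2)
{
    struct sketch_kmsg *m = malloc(sizeof(*m));

    if (!m)
        return -1;
    m->next = NULL;
    m->func = func;
    m->arg0 = arg0;
    m->arg1 = arg1;
    m->arg2 = arg2;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    return 0;
}

/* Runs every queued message in the order it was sent.  Returns the count. */
int sketch_run_all(struct sketch_kmsg_queue *q)
{
    struct sketch_kmsg *m;
    int nr_run = 0;

    while ((m = q->head)) {
        struct sketch_kmsg copy = *m;

        q->head = m->next;
        if (!q->head)
            q->tail = NULL;
        /* Free before calling, so nothing leaks if func never returns. */
        free(m);
        copy.func(copy.arg0, copy.arg1, copy.arg2);
        nr_run++;
    }
    return nr_run;
}

/* Example handler for demonstration: records arg0 in a small log. */
long sketch_log[8];
int sketch_log_len;

void sketch_record(long arg0, long arg1, long arg2)
{
    (void)arg1;
    (void)arg2;
    assert(sketch_log_len < 8);
    sketch_log[sketch_log_len++] = arg0;
}
```

Copying the message out and freeing it before the call is a design choice of
the sketch; it matters only for handlers that may not return.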
 |  | 
 | Initially, they were meant to be a way to immediately execute code on another | 
 | core (once interrupts are enabled), in the order in which the messages were | 
 | sent.  This is insufficient (and wasn't what we wanted for the task, | 
 | incidentally).  We simply want to do work on another core, but not necessarily | 
 | instantly.  And not necessarily on another core. | 
 |  | 
 | Currently, there are two types, distinguished by which list they are sent to per | 
 | core: immediate and routine.   Routine messages are often referred to as RKMs. | 
 | Immediate messages will get executed as soon as possible (once interrupts are | 
 | enabled).  Routine messages will be executed at convenient points in the kernel. | 
 | This includes when the kernel is about to pop back to userspace | 
 | (proc_restartcore()), or smp_idle()ing.  Routine messages are necessary when | 
 | their function does not return, such as a __launch_kthread.  They should also be | 
 | used if the work is not worth fully interrupting the kernel.  (An IPI will still | 
 | be sent, but the work will be delayed).  Finally, they should be used if their | 
 | work could affect currently executing kernel code (like a syscall). | 
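
The split between the two per-core lists can be sketched as follows.  This is a
single-threaded user-space model under stated assumptions, not the kernel's
code: the names (sketch_send_kmsg(), sketch_handle_ipi(),
sketch_process_routine(), the SKETCH_KMSG_* flags) are hypothetical, the "IPI"
is just a flag, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CORES 4

/* Hypothetical type flag: selects which per-core list receives the message. */
enum sketch_kmsg_type { SKETCH_KMSG_IMMEDIATE, SKETCH_KMSG_ROUTINE };

typedef void (*sketch_func_t)(long arg0);

struct sketch_msg {
    struct sketch_msg *next;
    sketch_func_t func;
    long arg0;
};

struct sketch_list {
    struct sketch_msg *head;
    struct sketch_msg *tail;
};

/* Per-core state: one list per message type, plus a modeled IPI latch. */
struct sketch_core {
    struct sketch_list immediate;
    struct sketch_list routine;
    int ipi_pending;
};

struct sketch_core sketch_cores[SKETCH_NR_CORES];

/* Queues the message on the chosen list of core dst and raises the "IPI". */
int sketch_send_kmsg(int dst, sketch_func_t func, long arg0,
                     enum sketch_kmsg_type type)
{
    struct sketch_core *c = &sketch_cores[dst];
    struct sketch_list *l = (type == SKETCH_KMSG_IMMEDIATE) ? &c->immediate
                                                            : &c->routine;
    struct sketch_msg *m = malloc(sizeof(*m));

    if (!m)
        return -1;
    m->next = NULL;
    m->func = func;
    m->arg0 = arg0;
    if (l->tail)
        l->tail->next = m;
    else
        l->head = m;
    l->tail = m;
    c->ipi_pending = 1;     /* an IPI is sent for both types */
    return 0;
}

/* Drains one list in FIFO order and returns how many messages ran. */
int sketch_run_list(struct sketch_list *l)
{
    struct sketch_msg *m;
    int nr_run = 0;

    while ((m = l->head)) {
        struct sketch_msg copy = *m;

        l->head = m->next;
        if (!l->head)
            l->tail = NULL;
        free(m);
        copy.func(copy.arg0);
        nr_run++;
    }
    return nr_run;
}

/* Models the IPI handler: only immediate messages run here. */
int sketch_handle_ipi(int core)
{
    sketch_cores[core].ipi_pending = 0;
    return sketch_run_list(&sketch_cores[core].immediate);
}

/* Models a convenient point (before popping to userspace or idling). */
int sketch_process_routine(int core)
{
    return sketch_run_list(&sketch_cores[core].routine);
}

/* Example handler for demonstration: records the order of execution. */
long sketch_trace[8];
int sketch_trace_len;

void sketch_trace_arg(long arg0)
{
    assert(sketch_trace_len < 8);
    sketch_trace[sketch_trace_len++] = arg0;
}
```

In this model a routine message sent first still runs after a later immediate
message, because the routine list is only drained at the convenient point.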
 |  | 
 | For example, some older KMSGs such as __startcore used to not return and would | 
 | pop directly into user space.  This complicated the KMSG code quite a bit.  While | 
 | these functions now return, they still can't be immediate messages.  Proc | 
 | management KMSGs change the cur_ctx out from under a syscall, which can lead to | 
 | a bunch of issues. | 
 |  | 
 | Immediate kernel messages are executed in interrupt context, with interrupts | 
 | disabled.  Routine messages are only executed from places in the code where the | 
 | kernel doesn't care if the functions don't return or otherwise cause trouble. | 
 | This means RKMs aren't run in interrupt context in the kernel (or if the kernel | 
 | code itself traps).  We don't have a 'process context' like Linux does; instead, | 
 | it's more of a 'default context'.  That's where RKMs run, and they run with IRQs | 
 | disabled. | 
 |  | 
 | RKMs can enable IRQs, or otherwise cause IRQs to be enabled.  __launch_kthread | 
 | is a good example: it runs a kthread, which may have had IRQs enabled. | 
 |  | 
 | With RKMs, there are no concerns about the kernel holding locks or otherwise | 
 | "interrupting" its own execution.  Routine messages are a little different than | 
 | just trapping into the kernel, since the functions don't have to return and may | 
 | result in clobbering the kernel stack.  Also note that this behavior is | 
 | dependent on where we call process_routine_kmsg().  Don't call it somewhere you | 
 | need to return to. | 
 |  | 
 | An example of an immediate message would be a TLB_shootdown.  Check current, | 
 | flush if applicable, and return.  It doesn't harm the kernel at all.  Another | 
 | example would be certain debug routines. | 
 |  | 
 | History: | 
 | -------------------------------- | 
 | KMSGs have a long history tied to process management code.  The main issues were | 
 | related to which KMSG functions return and which ones mess with local state (like | 
 | clobbering cur_ctx or the owning_proc).  Returning was a big deal because you | 
 | can't just arbitrarily abandon a kernel context (locks or refcnts could be held, | 
 | etc).  This is why immediates must return.  Likewise, there are certain | 
 | invariants about what a core is doing that shouldn't be changed by an IRQ | 
 | handler (which is what an immed message really is).  See all the old proc | 
 | management commits if you want more info (check for changes to __startcore). | 
 |  | 
 | Other Uses: | 
 | -------------------------------- | 
 | Kernel messages will also be the basis for the alarm system, since an alarm | 
 | just expresses work that needs to be done.  That being said, the k_msg struct will | 
 | probably receive a timestamp field, among other things.  Routine messages also | 
 | will replace the old workqueue, which hasn't really been used in 40 months or | 
 | so. | 
 |  | 
 | To Return or Not: | 
 | -------------------------------- | 
 | Routine k_msgs do not have to return.  Immediate messages must.  The distinction | 
 | is in how they are sent (send_kernel_message() will take a flag), so be careful. | 
 |  | 
 | To retain some sort of sanity, the functions that do not return must adhere to | 
 | some rules.  At some point they need to end in a place where they check routine | 
 | messages or enable interrupts.  Simply calling smp_idle() will do this.  The | 
 | idea behind this is that routine messages will get processed once the kernel is | 
 | able to (at a convenient place).  | 
 |  | 
 | Missing Routine Messages: | 
 | -------------------------------- | 
 | It's important that the kernel always checks for routine messages before leaving | 
 | the kernel, either to halt the core or to pop into userspace.  There is a race | 
 | involved with messages getting posted after we check the list, but before we | 
 | pop/halt.  The sender covers this window by sending an IPI, which will force us | 
 | back into the kernel at some point in the code before process_routine_kmsg(), | 
 | thus keeping us from missing the RKM. | 
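
The race and the role of the IPI can be modeled in a few lines.  This is a
single-threaded simulation under loud assumptions: the names
(sketch_post_rkm(), sketch_try_halt(), sketch_idle_once()) are hypothetical,
the IPI is a latched flag, and "halting" just reports whether the core would
have gone to sleep.  The racing sender is injected as a callback that fires in
the window between the check and the halt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

int sketch_rkm_pending;     /* routine messages posted but not yet run */
int sketch_rkm_handled;     /* routine messages that have been run */
bool sketch_ipi_latched;    /* an IPI arrived and has not been taken yet */

/* Sender side: post a routine message and send the IPI with it. */
void sketch_post_rkm(void)
{
    sketch_rkm_pending++;
    sketch_ipi_latched = true;
}

/* Receiver side: run everything on the routine list.  Returns the count. */
int sketch_process_routine(void)
{
    int nr_run = sketch_rkm_pending;

    sketch_rkm_pending = 0;
    sketch_rkm_handled += nr_run;
    return nr_run;
}

/*
 * Models halting with IRQs enabled.  A latched IPI fires instead of letting
 * the core sleep.  Returns true if the core really halted.
 */
bool sketch_try_halt(void)
{
    if (sketch_ipi_latched) {
        sketch_ipi_latched = false;
        return false;   /* forced back into the kernel */
    }
    return true;
}

/*
 * One pass of a modeled idle loop: check routine messages, then halt.
 * racing_sender (may be NULL) runs in the window between the two steps.
 * Returns true if the core halted.
 */
bool sketch_idle_once(void (*racing_sender)(void))
{
    sketch_process_routine();
    if (racing_sender)
        racing_sender();    /* message posted after the check */
    return sketch_try_halt();
}
```

Without the latched IPI, the first pass would halt with a message still on the
list; with it, the core loops around and finds the message on the next check.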
 |  | 
 | In the future, if we know the kernel code on a particular core is not attempting | 
 | to halt/pop, then we could avoid sending this IPI.  This is the essence of the | 
 | optimization in send_kernel_message() where we don't IPI ourselves.  A more | 
 | formal/thorough way to do this would be useful, both to avoid bugs and to | 
 | improve cross-core KMSG performance. | 
 |  | 
 | IRQ Trickiness: | 
 | -------------------------------- | 
 | You cannot enable interrupts in the handle_kmsg_ipi() handler, either in the | 
 | code or in any immediate kmsg.  Since we send the EOI before running the handler | 
 | (on x86), another IPI could cause us to reenter the handler, which would spin on | 
 | the lock the previous context is holding (nested IRQ stacks).  Using irqsave | 
 | locks is not sufficient, since they assume IRQs are not turned on in the middle | 
 | of their operation (such as in the body of an immediate kmsg). | 
 |  | 
 | Other Notes: | 
 | -------------------------------- | 
 | Unproven hunch, but the main performance bottleneck with multiple senders and | 
 | receivers of k_msgs will be the slab allocator.  We use the slab so we can | 
 | dynamically create the k_msgs (can pass them around easily, delay with them | 
 | easily (alarms), and most importantly we can't deadlock by running out of room | 
 | in a static buffer). | 
 |  | 
 | Architecture Dependence: | 
 | -------------------------------- | 
 | Some details will differ, based on architectural support.  For instance, | 
 | immediate messages can be implemented with true active messages.  Other systems | 
 | with maskable IPI vectors can use a different IPI for routine messages, and that | 
 | interrupt can get masked whenever we enter the kernel (note, that means making | 
 | every trap gate an interrupt gate), and we unmask that interrupt when we want to | 
 | process routine messages. | 
 |  | 
 | However, given the main part of kmsgs is arch-independent, I've consolidated all | 
 | of it in one location until we need to have separate parts of the implementation. |