|  | kernel_messages.txt | 
|  | Barret Rhoden | 
|  | 2010-03-19 | 
|  | Updated 2012-11-14 | 
|  |  | 
|  | This document explains the basic ideas behind our "kernel messages" (KMSGs) and | 
|  | some of the arcane bits behind the implementation.  These were formerly called | 
|  | active messages, since they were an implementation of the low-level hardware | 
|  | messaging. | 
|  |  | 
|  | Overview: | 
|  | -------------------------------- | 
|  | Our kernel messages are just work that is shipped remotely, delayed in time, or | 
|  | both.  They currently consist of a function pointer and a few arguments.  Kernel | 
|  | messages of a given type will be executed in order, with guaranteed delivery. | 
|  |  | 
|  | Initially, they were meant to be a way to immediately execute code on another | 
|  | core (once interrupts are enabled), in the order in which the messages were | 
|  | sent.  This is insufficient (and wasn't what we wanted for the task, | 
|  | incidentally).  We simply want to do work on another core, but not necessarily | 
|  | instantly.  And not necessarily on another core. | 
|  |  | 
|  | Currently, there are two types, distinguished by which list they are sent to per | 
|  | core: immediate and routine.   Routine messages are often referred to as RKMs. | 
|  | Immediate messages will get executed as soon as possible (once interrupts are | 
|  | enabled).  Routine messages will be executed at convenient points in the kernel. | 
|  | This includes when the kernel is about to pop back to userspace | 
|  | (proc_restartcore()), or smp_idle()ing.  Routine messages are necessary when | 
|  | their function does not return, such as a __launch_kthread.  They should also be | 
|  | used if the work is not worth fully interrupting the kernel.  (An IPI will still | 
|  | be sent, but the work will be delayed).  Finally, they should be used if their | 
|  | work could affect currently executing kernel code (like a syscall). | 
|  |  | 
|  | For example, some older KMSGs such as __startcore used to not return and would | 
|  | pop directly into user space.  This complicted the KMSG code quite a bit.  While | 
|  | these functions now return, they still can't be immediate messages.  Proc | 
|  | management KMSGs change the cur_ctx out from under a syscall, which can lead to | 
|  | a bunch of issues. | 
|  |  | 
|  | Immediate kernel messages are executed in interrupt context, with interrupts | 
|  | disabled.  Routine messages are only executed from places in the code where the | 
|  | kernel doesn't care if the functions don't return or otherwise cause trouble. | 
|  | This means RKMs aren't run in interrupt context in the kernel (or if the kernel | 
|  | code itself traps).  We don't have a 'process context' like Linux does, instead | 
|  | its more of a 'default context'.  That's where RKMs run, and they run with IRQs | 
|  | disabled. | 
|  |  | 
|  | RKMs can enable IRQs, or otherwise cause IRQs to be enabled.  __launch_kthread | 
|  | is a good example: it runs a kthread, which may have had IRQs enabled. | 
|  |  | 
|  | With RKMs, there are no concerns about the kernel holding locks or otherwise | 
|  | "interrupting" its own execution.  Routine messages are a little different than | 
|  | just trapping into the kernel, since the functions don't have to return and may | 
|  | result in clobbering the kernel stack.  Also note that this behavior is | 
|  | dependent on where we call process_routine_kmsg().  Don't call it somewhere you | 
|  | need to return to. | 
|  |  | 
|  | An example of an immediate message would be a TLB_shootdown.  Check current, | 
|  | flush if applicable, and return.  It doesn't harm the kernel at all.  Another | 
|  | example would be certain debug routines. | 
|  |  | 
|  | History: | 
|  | -------------------------------- | 
|  | KMSGs have a long history tied to process management code.  The main issues were | 
|  | related to which KMSG functions return and which ones mess with local state (like | 
|  | clobbering cur_ctx or the owning_proc).  Returning was a big deal because you | 
|  | can't just arbitrarily abandon a kernel context (locks or refcnts could be held, | 
|  | etc).  This is why immediates must return.  Likewise, there are certain | 
|  | invariants about what a core is doing that shouldn't be changed by an IRQ | 
|  | handler (which is what an immed message really is).  See all the old proc | 
|  | management commits if you want more info (check for changes to __startcore). | 
|  |  | 
|  | Other Uses: | 
|  | -------------------------------- | 
|  | Kernel messages will also be the basis for the alarm system.  All it is is | 
|  | expressing work that needs to be done.  That being said, the k_msg struct will | 
|  | probably receive a timestamp field, among other things.  Routine messages also | 
|  | will replace the old workqueue, which hasn't really been used in 40 months or | 
|  | so. | 
|  |  | 
|  | Blocking: | 
|  | -------------------------------- | 
|  | If a routine kernel message blocks (or has a chance to block), it must | 
|  | smp_idle() at the end.  If it were to return to PRKM(), it could be on a new | 
|  | core, due to kthread migration.  We might have a way to enforce this later. | 
|  |  | 
|  | To Return or Not: | 
|  | -------------------------------- | 
|  | Routine k_msgs do not have to return.  Immediate messages must.  The distinction | 
|  | is in how they are sent (send_kernel_message() will take a flag), so be careful. | 
|  |  | 
|  | To retain some sort of sanity, the functions that do not return must adhere to | 
|  | some rules.  At some point they need to end in a place where they check routine | 
|  | messages or enable interrupts.  Simply calling smp_idle() will do this.  The | 
|  | idea behind this is that routine messages will get processed once the kernel is | 
|  | able to (at a convenient place). | 
|  |  | 
|  | Missing Routine Messages: | 
|  | -------------------------------- | 
|  | It's important that the kernel always checks for routine messages before leaving | 
|  | the kernel, either to halt the core or to pop into userspace.  There is a race | 
|  | involved with messages getting posted after we check the list, but before we | 
|  | pop/halt.  In that time, we send an IPI.  This IPI will force us back into the | 
|  | kernel at some point in the code before process_routine_kmsg(), thus keeping us | 
|  | from missing the RKM. | 
|  |  | 
|  | In the future, if we know the kernel code on a particular core is not attempting | 
|  | to halt/pop, then we could avoid sending this IPI.  This is the essence of the | 
|  | optimization in send_kernel_message() where we don't IPI ourselves.  A more | 
|  | formal/thorough way to do this would be useful, both to avoid bugs and to | 
|  | improve cross-core KMSG performance. | 
|  |  | 
|  | IRQ Trickiness: | 
|  | -------------------------------- | 
|  | You cannot enable interrupts in the handle_kmsg_ipi() handler, either in the | 
|  | code or in any immediate kmsg.  Since we send the EOI before running the handler | 
|  | (on x86), another IPI could cause us to reenter the handler, which would spin on | 
|  | the lock the previous context is holding (nested IRQ stacks).  Using irqsave | 
|  | locks is not sufficient, since they assume IRQs are not turned on in the middle | 
|  | of their operation (such as in the body of an immediate kmsg). | 
|  |  | 
|  | Other Notes: | 
|  | -------------------------------- | 
|  | Unproven hunch, but the main performance bottleneck with multiple senders and | 
|  | receivers of k_msgs will be the slab allocator.  We use the slab so we can | 
|  | dynamically create the k_msgs (can pass them around easily, delay with them | 
|  | easily (alarms), and most importantly we can't deadlock by running out of room | 
|  | in a static buffer). | 
|  |  | 
|  | Architecture Dependence: | 
|  | -------------------------------- | 
|  | Some details will differ, based on architectural support.  For instance, | 
|  | immediate messages can be implemented with true active messages.  Other systems | 
|  | with maskable IPI vectors can use a different IPI for routine messages, and that | 
|  | interrupt can get masked whenever we enter the kernel (note, that means making | 
|  | every trap gate an interrupt gate), and we unmask that interrupt when we want to | 
|  | process routine messages. | 
|  |  | 
|  | However, given the main part of kmsgs is arch-independent, I've consolidated all | 
|  | of it in one location until we need to have separate parts of the implementation. |