We're adding the Plan 9 file system to Akaros. We're bringing in the
name space interface, including the ability to do binds, union mounts,
and the like. We will extend it to support things we might need,
in particular mmap. We will use it to add things to Akaros we
need, such as virtio drivers, ethernet drivers, and TCP.
By bringing this model in, we can make the Akaros interface more powerful,
yet simpler. We can remove a number of system calls right away
and yet still have the functions they provide, for example.
This is not a from-scratch effort but a code translation. The Plan 9 code
deals with very subtle situations and has been hardened over time. There is
no need to relearn from scratch what its authors already learned.
Currently we have a lot of the code in and are testing a first device --
the regression device from NxM.
Issues.
The biggest issue so far is implementing the Plan 9 error handling.
In Plan 9, errors are managed via a longjmp-like mechanism. For example,
in a call chain such as:
    a()
      b()
        c()
          d()
It is possible for 'd' to invoke an error that returns to 'a' directly.
This model has many advantages over the standard model, which looks like
this:
    a{ b();}
    b{ if c() return OK; else return -1;}
    c{ if d() return OK; else return -1;}
    d{ if something return OK; else return -1;}
In Plan 9 it can look like this:
    a{ b();}
    b{ c(); something else();}
    c{ d(); other thing();}
    d{ if something return OK; else error("bad");}
Note that the intermediate functions can be written as though nothing
went wrong, since anything that goes wrong gets bounced to the first level
error handler.
The handling is implemented via something called waserror().
In a() we do this:
    a(){
        if (waserror()) { handle error; poperror(); return;}
        b();
        poperror();
    }
and in d we might have this:
    d(){
        do something;
        if (bad) error("bad");
        return 0;
    }
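To make the mechanism concrete, here is a minimal userspace sketch of the
a() -> d() chain above, built on setjmp/longjmp. It is an illustration, not
the Plan 9 or Akaros implementation: MAXERR, err_stack, err_depth and
err_push() are hypothetical names, a single global stack stands in for the
real bookkeeping, and the handler calls poperror() itself, as in the examples
in this document.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

#define MAXERR 8                    /* hypothetical depth limit */

static jmp_buf err_stack[MAXERR];   /* saved contexts, innermost last */
int err_depth;                      /* number of live waserror() entries */
char errstr[64];                    /* last string passed to error() */

/* Reserve the next entry; the assert stands in for a kernel panic. */
static jmp_buf *err_push(void)
{
    assert(err_depth < MAXERR);
    return &err_stack[err_depth++];
}

/* Must be a macro: setjmp() has to run in the frame of the function that
 * will handle the error, and that frame must still be live at longjmp(). */
#define waserror() setjmp(*err_push())

static void poperror(void)
{
    assert(err_depth > 0);
    err_depth--;
}

/* Record the string and resume the innermost handler. Does not return. */
static void error(const char *msg)
{
    assert(err_depth > 0);
    strncpy(errstr, msg, sizeof(errstr) - 1);
    errstr[sizeof(errstr) - 1] = '\0';
    longjmp(err_stack[err_depth - 1], 1);
}

static int d(int bad)
{
    if (bad)
        error("bad");
    return 0;
}

/* The intermediate functions carry no error handling at all. */
static void c(int bad) { d(bad); }
static void b(int bad) { c(bad); }

/* Returns 0 on success, -1 if anything below called error(). */
int a(int bad)
{
    if (waserror()) {
        /* handle error */
        poperror();
        return -1;
    }
    b(bad);
    poperror();
    return 0;
}
```

Under these assumptions a(0) returns 0, and a(1) returns -1 with errstr set
to "bad". Neither b() nor c() had to check a return value.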
What if there's more than one error? There can be multiple invocations
of waserror in one function:
    a(){
        if (waserror()){ printf("b failed"); poperror(); return -2;}
        b();
        if (waserror()) { printf("z failed"); nexterror(); }
        z();
        poperror();
        poperror();
    }
Every waserror needs a matching poperror or nexterror in the same function as
the waserror, covering every exit from the function. nexterror is like
calling poperror() then error("str"), but without resetting errstr.
Note that the error could have been anywhere in the call chain;
we don't care. From the point of view of a(), something failed, and we only
know about b() or z(), so we blame them. This example also shows
nexterror(), which pops back to the next error handler in the error stack;
that handler might be in this function or somewhere up the call chain.
How do we find out the ultimate blame? Recall that error takes a string,
and that can be anything. We can tell from that.
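As a rough illustration of how the string survives a chain of handlers, the
following userspace sketch lets an inner handler clean up and call
nexterror(), after which the outer handler still sees the original string.
All names other than waserror, poperror, nexterror and error (MAXERR,
err_stack, err_depth, err_push, cleanups, top, middle) are hypothetical, and
the global stack is a simplification rather than the real implementation.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

#define MAXERR 8                    /* hypothetical depth limit */

static jmp_buf err_stack[MAXERR];
int err_depth;
char errstr[64];
int cleanups;                       /* counts the handlers that ran */

static jmp_buf *err_push(void)
{
    assert(err_depth < MAXERR);
    return &err_stack[err_depth++];
}

#define waserror() setjmp(*err_push())

static void poperror(void)
{
    assert(err_depth > 0);
    err_depth--;
}

/* Drop our own entry, then resume the next handler out.
 * errstr is left untouched, so the original blame is preserved. */
static void nexterror(void)
{
    poperror();
    assert(err_depth > 0);
    longjmp(err_stack[err_depth - 1], 1);
}

static void error(const char *msg)
{
    assert(err_depth > 0);
    strncpy(errstr, msg, sizeof(errstr) - 1);
    errstr[sizeof(errstr) - 1] = '\0';
    longjmp(err_stack[err_depth - 1], 1);
}

static void z(void)
{
    error("z: out of buffers");
}

static void middle(void)
{
    if (waserror()) {
        cleanups++;                 /* release whatever middle() held */
        nexterror();                /* pass the same error up the chain */
    }
    z();
    poperror();
}

int top(void)
{
    if (waserror()) {
        cleanups++;
        poperror();
        return -2;
    }
    middle();
    poperror();
    return 0;
}
```

In this sketch top() returns -2, both handlers run, and errstr still holds
the string that z() passed to error(), which is how the outermost handler
can tell where the failure really came from.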
Where does the string in error() go?
In Plan 9, it goes into the proc struct; in Akaros,
it's copied to a user-mode buffer via set_errstr().
waserror()/error()/nexterror() manipulate an error call stack, similar to
the function call stack. In Plan 9, this stack is maintained in the proc
struct. This is cheap in Plan 9, since its compiler is caller-save, and
hence each stack entry is just a stack pointer and a PC. In a callee-save world, the
stack entries are much, much larger; so large that maintaining the stack
in the proc struct is impractical.
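The size difference can be seen with a small hosted-C comparison. This is
only a sketch: struct label is a hypothetical two-word entry of the kind
described above, and the C library's jmp_buf is a stand-in for a callee-save
context, not the Akaros errbuf. On some C libraries jmp_buf also reserves
room for a signal mask, so the hosted number overstates what a kernel needs.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caller-save entry: resuming needs only the stack pointer
 * and the PC, because no registers are live across a call. */
struct label {
    uintptr_t sp;
    uintptr_t pc;
};

/* Bytes per error-stack entry in the caller-save model. */
size_t label_bytes(void)
{
    return sizeof(struct label);
}

/* Bytes per entry when every callee-saved register must be kept too. */
size_t jmpbuf_bytes(void)
{
    return sizeof(jmp_buf);
}
```

Multiplying the second number by the maximum error depth gives a feel for
what a fixed per-process array of such entries would cost.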
Hence, we've had to make changes that add a bit of inconvenience but
leave the code mostly intact. The error code in Akaros is tested in every
circumstance at this point due to all the bugs we had in our port
of the Plan 9 file system code.
So, we'll go from the easiest case to the hardest.
Case 1: You're a leaf function that does not use error(). No change.
Case 2: You're a leaf function that needs error(). Just call error().
Case 3: You're an intermediate function that calls functions that use error(),
even though you do not. No change.
Those are in some sense the easier cases. Now it starts to get a
bit harder.
Case 4: you're a leaf or intermediate function that uses waserror(). You need
to use a macro which creates an array of errbuf structs as automatics. The
waserror() usage does not change. The macro is ERRSTACK(x), where x is the
number of calls to waserror() in your function. See kern/sys/chan.c. Every call
to waserror needs to have a matching poperror. You cannot call nexterror or
poperror unless you are in the same scope as an ERRSTACK that had a waserror
call.
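The idea behind case 4 can be sketched in userspace as follows. This is not
the contents of kern/include/plan9.h: SKETCH_ERRSTACK, ERRSTACK_LEN,
cur_errbuf, errpush and errpop are hypothetical names chosen for the
illustration, and the asserts stand in for kernel panics. The point is that
the entries live in the frame of the function that calls waserror(), and a
single pointer tracks the innermost live entry.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

struct errbuf {
    jmp_buf jmpbuf;
};

struct errbuf *cur_errbuf;          /* innermost live entry; NULL if none */
char errstr[64];

/* Hypothetical stand-in for ERRSTACK(x): x entries as automatics.
 * curindex is volatile because it changes between setjmp() and longjmp(). */
#define SKETCH_ERRSTACK(x)                      \
    struct errbuf *prev_errbuf = cur_errbuf;    \
    volatile int curindex = 0;                  \
    struct errbuf errstack[(x)]

static struct errbuf *errpush(struct errbuf *stack, int n, volatile int *idx)
{
    assert(*idx < n);       /* more waserror() calls than entries reserved */
    cur_errbuf = &stack[*idx];
    *idx = *idx + 1;
    return cur_errbuf;
}

static void errpop(struct errbuf *stack, volatile int *idx,
                   struct errbuf *prev)
{
    assert(*idx > 0);       /* poperror() without a matching waserror() */
    *idx = *idx - 1;
    cur_errbuf = *idx ? &stack[*idx - 1] : prev;
}

#define ERRSTACK_LEN ((int)(sizeof(errstack) / sizeof(errstack[0])))
#define waserror() setjmp(errpush(errstack, ERRSTACK_LEN, &curindex)->jmpbuf)
#define poperror() errpop(errstack, &curindex, prev_errbuf)

static void error(const char *msg)
{
    assert(cur_errbuf != NULL);     /* no handler was ever set up */
    strncpy(errstr, msg, sizeof(errstr) - 1);
    errstr[sizeof(errstr) - 1] = '\0';
    longjmp(cur_errbuf->jmpbuf, 1);
}

static void leaf(int bad)
{
    if (bad)
        error("leaf failed");
}

/* One waserror() below, so one entry is reserved in this frame. */
int walk(int bad)
{
    SKETCH_ERRSTACK(1);

    if (waserror()) {
        poperror();
        return -1;
    }
    leaf(bad);
    poperror();
    return 0;
}
```

Because waserror() and poperror() expand to code that names errstack,
curindex and prev_errbuf, they only compile inside a function that declared
the stack, which mirrors the scoping rule stated above.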
Case 5: you're a root node, i.e. you're the start of a chain of calls
via syscall that must do the "root" errbuf setup, so that all error
calls eventually return to you. In this case, you need to start the error
stack. This uses the same macro as case 4 (ERRSTACK(x)), for now.
Finally, if, in a waserror, you are finished and want to pop out to the
next error in the chain, either in the same function or up the call stack,
just call nexterror().
This can be handy for debugging: in any function that supports error(), i.e.
called from a function that called waserror(), simply call error and it bails
you out of wherever you are, doing appropriate cleanup on the way out. No need
to add return value processing to intermediate functions; if you try this out
you will likely find it is extremely handy. I really miss it on Linux. It made
much of the debugging during the port to Akaros a lot easier.
There is error checking in error()/waserror()/nexterror().
If you overflow or underflow the error stack the kernel will panic.
The implementation is in kern/src/error.c; the macros are in
kern/include/plan9.h.
Giant warning: if you access any automatic (local) variables inside the waserror
block that may have been modified after you started the waserror, those
variables need to be volatile. waserror() is implemented with setjmp and
according to one of the man pages:
    automatic variables are unspecified after a call to longjmp() if
    they meet all the following criteria:
    - they are local to the function that made the corresponding setjmp(3)
      call;
    - their values are changed between the calls to setjmp(3) and
      longjmp(); and
    - they are not declared as volatile.
We could ask the compiler to tell us which variables are potentially clobbered
with -Wclobbered; however, it is a noisy warning. It will warn even if the
variables are not used in the error case. That may be because the compiler has
a hard time deciding whether a variable is used or not, since we often longjmp
from within an error handler, though on gcc 4.9.2, even if we return
immediately, we still get a warning. On a related note, it's not always
possible to tell which case is the error handling case - consider the "discard"
pattern for waserror.
No amount of "returns_twice" or register clobbering or "memory" clobbering is
enough. Think about what happens when the variable is changed after the setjmp
call, i.e. farther down in the function. It may be in a register, then we
call some other function, and that longjmps. That register value is gone (it
might be somewhere else on the stack). The compiler needs to know when it makes
that write that it needs to go onto the stack storage of the automatic variable.
That's 'volatile.'
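A small hosted-C sketch of the safe pattern, using setjmp() directly in place
of waserror(). The names env, fail and steps_done are hypothetical. The local
is written after setjmp() and read in the error path, so it is declared
volatile; without the qualifier its value in the handler would be
indeterminate.

```c
#include <setjmp.h>

static jmp_buf env;

/* Stands in for any callee, however deep, that ends up calling error(). */
static void fail(void)
{
    longjmp(env, 1);
}

/* Returns how many steps completed before the failure. */
int steps_done(void)
{
    volatile int done = 0;  /* changed after setjmp(), read in the handler */

    if (setjmp(env))
        return done;        /* well defined only because done is volatile */

    done = 1;               /* each write must reach the stack slot */
    done = 2;
    fail();
    return -1;              /* not reached */
}
```

With the qualifier in place steps_done() returns 2. Dropping volatile may
appear to work at low optimization levels, which is part of what makes this
class of bug hard to spot.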
The best we can hope for is for the compiler to know what variables could be written
from one side of the setjmp and used on the other. Perhaps that will show up,
and then we can turn on -Wclobbered. Until then, we have to be vigilant, or use
different patterns for waserror. Note there are a bunch of bugs with
-Wclobbered detection, e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65041.
For info on the format of Ms (which are part of the kernel interface):
http://9p.cat-v.org/documentation/rfc/