http://savannah.gnu.org/download/adeos/doc/porting.txt
8/19/03 -- Philippe Gerum

Porting Adeos (beneath the Linux kernel)
========================================

0. Introduction
---------------

This file contains practical information you might need when porting
Adeos to other target architectures. It is not intended to be
exhaustive, but to give you the basic knowledge of the current Adeos
implementation that you should need to do the port (almost) easily.
This document will be updated from time to time, as readers send their
feedback.

1. What's Adeos?
----------------

The purpose of Adeos is to provide a flexible environment for sharing
hardware resources among multiple operating systems, or among multiple
instances of a single OS. To this end, Adeos enables multiple kernel
components called domains to exist simultaneously on the same
hardware. None of these domains necessarily see each other, but all of
them see Adeos. A domain could be a complete OS, but no assumption is
made regarding the sophistication of what's in a domain. In its
current development stage, Adeos allows you to share hardware
interrupts and system-originated events like traps and faults with the
Linux kernel.

2. Why would you need Adeos beneath the Linux kernel?
-----------------------------------------------------

o Because you might need to control the flow of hardware interrupts
deterministically using a software layer, before the Linux kernel has
had a chance to process them. Such control might include intercepting,
masking and/or prioritizing those interrupts.

o Because you might need to monitor the Linux system calls, adding a
custom prologue and/or an epilogue to them, without altering the code
of the system calls themselves.

o Because you might want a unified mechanism to monitor internal
events occurring in the Linux kernel such as task switches, process
creation, task signaling.
You could also trigger your own notifications which would then get
dispatched to any code monitoring them using the same mechanism.

o Because you might need a unified API to access all or some of the
previous features, so that your code works the same way using the same
services regardless of the target Linux kernel version and/or
processor architecture.

Generally speaking, you might want to adapt Adeos to some version of
the Linux kernel because you need to force Linux to share some
critical resources/information with some of your code, written as a
module or statically linked into the kernel, according to a priority
scheme you defined (e.g. your code needs to handle processor
interrupts and be notified of kernel syscalls issued by any
application before the Linux kernel code handles them).

For this reason, Adeos is a pretty useful kernel layer on top of which
real-time systems can be seamlessly built into Linux, like RTAI
(http://www.aero.polimi.it/~rtai/). But you could also use it as a
basic component for enabling a kernel debugger, or for running
multiple instances of the Linux kernel on the same hardware.

3. Pipeline and domains
-----------------------

In order to allow interrupts and system events to be delivered to
multiple kernel components, Adeos defines a special abstraction called
a "domain". A domain is a (Linux) kernel-based software component
which can ask the Adeos layer to be notified of:

o every incoming hardware interrupt,
o every system call issued by Linux applications,
o every system event triggered by the Linux kernel code,
o every additional event triggered by a custom notification you added.

A domain can be available as a dynamic kernel module, or be a static
part of the Linux kernel image; this makes no difference to Adeos's
behaviour.

Adeos also ensures that events are dispatched in an orderly manner to
the various client domains, so it is possible to provide for
determinism. This is achieved by assigning each domain a static
priority.
This priority value strictly defines the delivery order of events to
the domains. All active domains are queued according to their
respective priority, forming the "pipeline" abstraction used by Adeos
to make the events flow, from the highest-priority to the
lowest-priority domain. Incoming events are pushed to the head of the
pipeline (i.e. to the highest-priority domain) and progress down to
its tail (i.e. to the lowest-priority domain).

The Linux kernel code as a whole is a special pre-defined domain which
is created by Adeos during the early stages of the Linux boot
procedure. This initial domain is often referred to as the "root"
domain, since the infrastructure it provides is needed to load other
domains dynamically (e.g. the kernel module loader, the fs layer and
so on).

Event notifications also go through the pipeline, but are always
produced by the domains themselves. Unlike IRQs, which may be
propagated to each and every domain queued to the pipeline,
notifications of system events (e.g. system calls) triggered at a
given stage of the pipeline are only delivered to the domains
preceding the notifying domain (i.e. those with a higher priority),
and finally to the notifying domain's own event handler.

The stage of the pipeline occupied by any given domain can be
"stalled", which means that the next incoming hardware interrupts (and
_not_ the events) will not be delivered to the domain's handler(s),
and will also be prevented from flowing down to the lower-priority
domain(s). While a stage is stalled, interrupts accumulate in the
domain logs, and eventually get played when the stage is unstalled.
Note that only interrupts can be blocked by a stalled stage; traps,
exceptions and other system notifications are not.

Please refer to Karim Yaghmour's paper for a detailed description of
the Adeos scheme:
http://savannah.gnu.org/download/adeos/doc/adeos.pdf.

4. How does this work?
----------------------

First of all, Adeos takes over all interrupts, CPU traps and
exceptions at the hardware level by reprogramming their software
handlers in __adeos_enable_pipeline() (for x86, look at
drivers/adeos/x86.c). In fact, Adeos replaces the previous handlers
set up by the Linux kernel during the boot procedure with its own
handlers. The original handlers then become the root domain's handlers
for the corresponding interrupts/events.

On x86, it's simply a matter of reprogramming the Interrupt Descriptor
Table appropriately. On ARM-nommu, an Adeos hook is directly inserted
into the Linux interrupt service code instead (see
arch/armnommu/kernel/entry-armv.S). The best way to perform this
takeover depends on your target architecture and the Linux
implementation for this target.

In order to defer the dispatching of interrupts so that each domain
has its own interrupt log which eventually gets played in a timely
manner, Adeos implements the "Optimistic interrupt protection" scheme
as described by Stodolsky, Chen, and Bershad
(http://citeseer.nj.nec.com/stodolsky93fast.html).

At any point in time, and for any given CPU of an Adeos-enabled
system, any given domain is either:

o running;

o preempted by a higher-priority domain for processing an incoming
interrupt/event while it was itself processing its pending event(s);

o voluntarily self-suspended, after all the interrupts/events directed
to this domain have been processed.

When a domain has finished processing the pending IRQs/events it has
received, it only needs to call a special Adeos service named
adeos_suspend_domain() which yields the CPU to the next domain down
the pipeline, so the latter can process in turn the pending events it
has been notified of, and this cycle continues down to the
lowest-priority domain of the pipeline.

The last domain in the pipeline is also expected to provide some idle
loop waiting for hardware interrupts to occur. The Linux idle loop
(e.g.
arch/i386/kernel/process.c) is modified by the Adeos patch to call
adeos_suspend_domain() accordingly.

It should be noted that Adeos resumes a domain only if there are some
IRQs/events it must process, or if it happens to have been preempted
by a higher-priority domain receiving some IRQ/event to process. In
other words, no domain is switched in unless some work is pending for
it. Each time a domain is switched back in, its pending interrupts log
is played and event handlers are fired as needed.

When an interrupt occurs on an idle system, Adeos dispatches it to the
highest-priority domain interested, and the pipelined processing cycle
starts again.

Given the domains A, B and C, where A is the highest-priority domain
in the pipeline, you can be sure that:

o If A runs, then B and C are either preempted or suspended;
o If B runs, then A has suspended itself and C is either preempted or
  suspended;
o If C runs, then A and B must have suspended themselves.

5. How is a domain implemented?
-------------------------------

The interface between a domain and Adeos is composed of:

o a descriptor (adomain_t) which is a data structure describing the
domain properties at any point in time.

o a domain entry routine, which Adeos calls to actually start a
domain, just like the start_routine() argument of a POSIX
pthread_create() call. This routine is expected to initialize the
domain, e.g. by registering a set of event/IRQ handlers, then enter
the domain's idle loop.

In its simplest form, a domain contained in a kernel module looks like
this:

---
adomain_t domain_desc;

void domain_entry (int iflag)
{
    if (iflag) { /* init CPU? (only varies on SMP, always true otherwise) */
        /* Ask Adeos to have our handlers called for a given set of
           IRQs/events as they flow down the pipeline. */
        adeos_virtualize_irq(...);
        adeos_virtualize_irq(...);
        adeos_virtualize_irq(...);
        adeos_catch_event(...);
    }

    /* This is our idle loop for the current CPU.
       When Adeos switches us in, the pending IRQ/event handlers are
       fired upon awakening right before adeos_suspend_domain()
       returns. Since we have nothing else to do but process
       interrupts and system events through our custom handlers, we
       only need to call adeos_suspend_domain() again immediately. */
    for (;;)
        adeos_suspend_domain();
}

int init_module (void)
{
    adattr_t attr;

    adeos_init_attr(&attr);
    attr.name = "MyDomain";
    attr.domid = 0x01010101; /* 0 is reserved, anything else is valid */
    attr.entry = &domain_entry;
    attr.priority = ADEOS_ROOT_PRI + 100; /* Precede Linux in the pipeline */

    return adeos_register_domain(&domain_desc,&attr);
}

void cleanup_module (void)
{
    /* cleanup_module() returns void, so the result is not returned. */
    adeos_unregister_domain(&domain_desc);
}
---

On a global scale, given the A, B and C domains, we would have:

A:adeos_suspend_domain() => 1) yields to B
                            2) runs A-defined IRQ/event handlers
B:adeos_suspend_domain() => 1) yields to C
                            2) runs B-defined IRQ/event handlers
C:adeos_suspend_domain() => immediately returns.

Upon interrupt or event, Adeos preempts the current domain, then
resumes the highest-priority domain among A, B and C that has trapped
(i.e. virtualized) that interrupt or event. If the current domain is
selected and accepts interrupts, the interrupt handler is immediately
fired on behalf of the Adeos layer. Events are always dispatched to
the current domain regardless of the current interrupt masking state.

6. How is the Adeos nanokernel code organized?
----------------------------------------------

We are now about to describe the organization of the Adeos code for
the 2.4.20/x86 platform, using the r9c1 patch as a reference point for
a 2.4 kernel. This platform will probably be among the most complex
implementations of Adeos, due to special considerations introduced by
the SMP support and various interrupt-related quirks brought by this
architecture.
However, one should notice that the resulting implementation remains
fairly simple, so porting Adeos to another architecture should almost
be a no-brainer, provided the basics of such architecture (e.g.
interrupt management) are well understood in the first place.

Here is a list of the files modified in or added to a vanilla 2.4.20
kernel, along with the reasons for such changes/additions:

./arch/i386/config.in
[ Add the Adeos configuration toggles. Defines CONFIG_ADEOS_CORE,
CONFIG_ADEOS and CONFIG_ADEOS_MODULE. ]

./arch/i386/kernel/apic.c
[ Nullify the APIC acknowledgement code while Adeos controls the
interrupt flow. In such a case, APIC interrupts will be acknowledged
at the hardware level by __adeos_handle_irq(). ]

./arch/i386/kernel/entry.S
[ Replace the paired cli/sti insns by calls to the corresponding Adeos
pipeline stall/unstall routines, so that Linux conforms to the
pipeline scheme when masking/unmasking interrupts. Also add some event
notification points (e.g. syscall prologue/epilogue). It might not be
worth doing this for all implementations of Adeos; it really depends
on the underlying architecture. For x86, reducing the processor-level
interrupt-free sections that this file defines for the Linux domain is
worth the cost of calling the Adeos replacements for cli/sti. ]

./arch/i386/kernel/i386_ksyms.c
[ Export arch-dependent Adeos routines. ]

./arch/i386/kernel/i8259.c
[ Make mask_and_ack_8259A() Adeos-aware, so that Adeos can call this
routine safely to acknowledge ISA interrupts from
__adeos_handle_irq(). ]

./arch/i386/kernel/io_apic.c
[ Probably the most twisted part of the Adeos code due to the IO-APIC
management nightmare. The original Linux code is changed in order to
guarantee that low priority IRQs won't be delayed waiting for a high
priority interrupt handler to call end_level_ioapic_irq(). ]

./arch/i386/kernel/irq.c
[ Nullify the PIC acknowledgement code for ISA interrupts while Adeos
controls the interrupt flow.
In such a case, ISA interrupts will be acknowledged from
__adeos_handle_irq(). See arch/i386/kernel/i8259.c. ]

./arch/i386/kernel/Makefile
[ Compile in the arch-dependent Adeos core support. ]

./arch/i386/kernel/process.c
[ Tweak the Linux idle loop so that adeos_suspend_domain() is called
as required. ]

./arch/i386/kernel/smp.c
[ Nullify the acknowledgement code for SMP-related interrupts while
Adeos controls the interrupt flow. In such a case, those interrupts
will be acknowledged by __adeos_handle_irq(). ]

./arch/i386/kernel/time.c
[ Convert the (now) pipelined spin_lock_irqsave/restore to a version
still using CPU-based masking. This is needed to grab the i8259A_lock
safely since this spinlock is also locked on behalf of
__adeos_handle_irq() => __adeos_ack_common_irq(). ]

./include/asm-i386/smp.h
[ Make sure smp_processor_id() can be used on behalf of any domain. In
other words, do not rely on "current" anymore, since there is no
underlying Linux task except in the root (i.e. Linux) domain. ]

./arch/i386/mm/ioremap.c
[ Make sure that all ioremap'ed areas are immediately visible to any
running domain. IOW, do not rely anymore on the on-demand mapping done
by Linux in its page fault handler.
WARNING: Changes to this file, and all other ioremap-related changes
in other files, are _not_ mandatory for Adeos to run properly; they
only help when developing Adeos domains which would require them. They
might not even apply at all if on-demand mapping of ioremap'ed areas
is a no-brainer for your architecture (specifically if there is no MMU
on board in the first place), so don't get stuck on this during the
first-order port. ]

./arch/i386/mm/fault.c
[ Eagerly re-enable interrupts at _CPU level_ in do_page_fault(). ]

./include/asm-i386/pgalloc.h
[ Add helper code for use in ioremap.c. ]

./include/asm-i386/system.h
[ Use pipeline-based IRQ control instead of CPU-based control for
controlling the interrupt flow.
Changing this file appropriately and accurately is absolutely
_crucial_ for Adeos to work properly. ]

./include/asm-i386/hw_irq.h
[ Reserve a few IDT vectors for Adeos, so that non-Linux domains can
use them safely for their own purpose. ]

./include/linux/sched.h
[ Insert the Adeos-specific per-thread data slots into the Linux
task_struct. This is needed to provide the adeos_ptd_set()/get()
services. ]

./init/main.c
[ Bootstrap the Adeos core support. Beware of the sequence between the
Adeos takeover, and the SMP and driver init code. You need to make
sure that Adeos can take a complete snapshot of the system's stable
state (number of online CPUs, initial IRQ routing etc.) before taking
over the box. ]

./kernel/ksyms.c
[ Export arch-independent Adeos routines. ]

./kernel/Makefile
[ Compile in the arch-independent Adeos core support. ]

./kernel/printk.c
[ Make printk() Adeos-aware, so that any code can call this routine
safely regardless of the domain it belongs to. However, expect time
jitter; Adeos works no miracles... ]

./kernel/sysctl.c
[ Provide for /proc/adeos. ]

./kernel/fork.c
[ Adeos-specific hardening of some Linux MM-related parts, and
initialization of the PTD keys in do_fork(). ]

./kernel/sched.c
[ Put Adeos notification points for the scheduler-related events. ]

./kernel/signal.c
[ Put Adeos notification points for the signal-related events. ]

./kernel/exit.c
[ Put Adeos notification points for the (Linux task) exit-related
events. ]

./drivers/Makefile
[ Handle the compilation of the Adeos driver. ]

./mm/vmalloc.c
[ Companion code for changes in arch/i386/mm/ioremap.c. ]

7. Porting Adeos in a few steps
-------------------------------

You should preferably use any Adeos release >= r9c1 as your baseline
for a 2.4 kernel port (or r2c1 for 2.6.0-testX), and in any case,
nothing older than r8 (or r2 for 2.6.0-testX).
There are only a few important changes between r8 and r9, but these
are mixed with a bunch of cosmetic ones, so using r9 directly could be
better in the long run if you plan to track the latest Adeos releases
for a while.

o Adeos core

Install the Adeos core implementation at the right places in the Linux
tree, updating the Linux Makefiles accordingly. The arch-dependent
files will need to be ported to your target architecture.

- arch//adeos.c
- kernel/adeos.c
- drivers/adeos/generic.c
- drivers/adeos/.c
- include/linux/adeos.h
- include/asm-/adeos.h

o Linux-specific changes

Wherever appropriate, apply all/some of the changes found in the
standard kernel files listed in section 6 for x86, adapting the
arch-dependent files to your target CPU. Some files and
implementations may be quite different between x86 and your target
architecture (as they are between x86 and ARM, for instance), but for
each x86 change you should usually be able to find an applicable
counterpart for your target CPU.

If your target architecture does not support SMP, you can even get rid
of most of the SMP stuff, but I'd strongly suggest keeping the
SMP-adaptive constructs (like adeos_declare_cpuid, adp_cpu_current[]
etc.) used in the code shared between uniprocessor/SMP. The
SMP-sensitive code has been crafted so that it is optimized away in
uniprocessor configurations, so there is no cost in keeping it, and it
is worth keeping the Adeos baseline as common as possible across CPU
architectures.

o Meaning of the Linux configuration switches

Adeos adds 3 configuration switches:

- CONFIG_ADEOS_CORE: When defined, the Adeos core support is compiled
in, whether the Adeos driver is modular or built-in.

- CONFIG_ADEOS: When defined, the Adeos driver is built-in, along with
the Adeos core support. This implies CONFIG_ADEOS_CORE. The pipeline
is always active from the kernel boot onwards.

- CONFIG_ADEOS_MODULE: When defined, the Adeos driver is modular,
whilst the Adeos core support is still built-in (can't be otherwise
anyway). This implies CONFIG_ADEOS_CORE. The pipeline is only active
when the Adeos driver is loaded.

Each modification to the Linux code base is bracketed by a conditional
section depending on at least one of these macros. This makes it
easier to locate changes at a glance in a patched Linux tree and
allows the Adeos support to be removed completely at compile-time, so
I'd suggest you keep them for your port too.

Until the Adeos core is working properly on your target, you should
keep building the Adeos driver statically into the Linux kernel.
Checking for correctness of the modular mode can be done much later,
and since you would not benefit from the modularity until the Adeos
core is up and running (i.e. any bug in the pipeline core would
probably throw your box out of the window, so no rmmod/fix/insmod
cycle...), there is no point in worrying about it at this stage.

o Bootstrap code

Your Adeos-enabled kernel must call __adeos_init() very early in the
Linux bootstrap process to declare Linux's root domain, usually right
after the IRQ support layer is initialized by a call to init_IRQ(). In
a vanilla Linux tree, the bootstrap code of interest for hooking Adeos
is implemented in the start_kernel() routine, in init/main.c.

If the Adeos driver (i.e. drivers/adeos) is built statically into the
kernel, you must also take over the interrupt control from Linux
immediately by an explicit call to __adeos_takeover().

WARNING: this takeover must be done once Linux has had a chance to
initialize the hw interrupt routing and handlers normally, since
__adeos_takeover() peeks at this information to interpose its own
handlers. This is the reason why we call __adeos_takeover() on return
from smp_init() on x86, since we know that the APIC configuration must
have been completed at this point.

If the Adeos driver is modular, it will call __adeos_takeover() when
insmod'ed, as part of its initialization chores.

o Interrupts

First, you need to virtualize the interrupt control for Linux. As a
starting point, go to include/asm-/system.h and try to locate the
macros used by Linux to mask/unmask the interrupts. These are
local_irq_enable/disable, cli/sti and friends. Have a look at
include/asm-i386/system.h from an Adeos-patched x86 kernel to get some
inspiration. On some architectures, the actual interrupt control
macros are implemented in CPU-dependent files pulled in by system.h,
such as include/asm-arm/proc-*/ for the ARM targets. Just apply your
changes there instead.

========> Now you should try booting your modified kernel for the
first time and fix what's needed.

o Events

When you are confident enough with your virtualized interrupt
sub-system, you should try catching some Linux kernel events using
adeos_catch_event() in a simple test module, which does _not_ register
any additional domain. Use the root one: since your box has booted
correctly with an Adeos core, it is known to work.

This step should validate the __adeos_handle_event() routine, and also
the arch-dependent code which routes processor traps and exceptions to
Adeos in the first place. For instance, this code is available from
drivers/adeos/x86.c (BUILD_TRAP_xxx) for ia32 boxen. Some events are
notified from /kernel/*entry.S, others are fired from other locations,
like the Linux scheduler.

o Multi-domain

The last part of the job is to ensure that the pipeline is fully
functional with more than a single domain. To this end, modify your
test module to register a new domain with adeos_register_domain().
This domain should have a higher priority than Linux; you can use
ADEOS_ROOT_PRI + 1 as its priority value to achieve this. Then, catch
the system timer interrupt using adeos_virtualize_irq() in the domain
entry routine, and simply forward each tick to the root (i.e.
Linux) domain using adeos_propagate_irq(). This step should help
validate the event propagation through adeos_suspend_domain().

In any case, all of the above code is fairly generic and has proved to
work fine on x86 and ARM-nommu, so you should have no problem porting
it to your new architecture, aside from a few small assembly-written
parts which pass the events to Adeos (e.g. syscall interposition in
/kernel/*entry.S, common trap handling block in drivers/adeos/.c).

8. Miscellanea
--------------

8.1 Patch combinations for the 2.4/x86 series

Adeos can be combined with LTT, KPreempt and/or Low Latency patches.
However, the amount of work needed to make such combo patches may vary
depending on the combination:

- Adeos + LTT (in this order) is simple to obtain. You will have a
single serious conflict only, in arch/i386/kernel/entry.S, between the
LTT syscall tracing code and Adeos's event handling code. Solving this
by hand is rather straightforward if you are assembly-literate.

- Adeos + KPreempt (in this order) is rather more complex but does
work perfectly. To sum up, you will have a few conflicts to solve by
hand in kernel/sched.c and maybe in arch/i386/kernel/entry.S. Knowing
how KPreempt works helps in solving them appropriately. In any case,
you will also need to hack linux/spinlock.h manually to make the
preempt_XXX() macros Adeos-aware. Use an existing combo patch to learn
how to do this.
*BEWARE*: it may happen that no conflict arises with entry.S at all,
but the resulting file is _not functional_, so you have to hack it
too. Again, look at an existing combo patch, and carefully follow the
use of the "unstall_and_restore_all" label.

- Adeos + Low Latency is a no-brainer. There is no conflict between
those patches. You should even be able to apply the lolat patch
directly on top of an Adeos + KPreempt combo tree.

Pre-built combo patches will be available in the future on the Adeos
website for selected revisions of the nanokernel.
A combo patch including Adeos + KPreempt + LowLat + LTT already exists
for R&D purposes and works fine. Some cleanup remains to be done, but
you are welcome to ask for it in advance if interested.

8.2 Support for CONFIG_PREEMPT in 2.6

This support has been complete since Adeos r2c4 for the 2.6 series,
thanks to the experience previously gathered in this area for 2.4.
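
As a closing illustration of the stall/unstall behaviour described in
section 3, here is a small user-space toy model written in C. It is
only a sketch, not Adeos code: the toy_* names, the fixed three-stage
pipeline and the per-IRQ hit counters are all hypothetical, and it
assumes that every stage handler always forwards the interrupt to the
next stage. It shows a stalled stage logging incoming interrupts and
holding them back from the stages below, then playing its log once
unstalled.

```c
#include <assert.h>
#include <string.h>

#define TOY_NR_STAGES 3   /* stage 0 is the head of the pipeline */
#define TOY_NR_IRQS   16

/* Hypothetical per-stage state; this is not the real adomain_t layout. */
struct toy_stage {
    int stalled;                     /* nonzero while the stage is stalled */
    unsigned pending[TOY_NR_IRQS];   /* interrupt log: hits not yet played */
    unsigned handled[TOY_NR_IRQS];   /* hits delivered to the stage handler */
};

struct toy_stage toy_pipeline[TOY_NR_STAGES];

/* Forget all logged and delivered interrupts, and unstall every stage. */
void toy_reset(void)
{
    memset(toy_pipeline, 0, sizeof(toy_pipeline));
}

/* Push one IRQ hit down the pipeline, starting at stage 'from'.
   A stalled stage logs the hit and stops the flow right there. */
static void toy_walk(unsigned irq, int from)
{
    int i;

    for (i = from; i < TOY_NR_STAGES; i++) {
        struct toy_stage *s = &toy_pipeline[i];

        if (s->stalled) {
            s->pending[irq]++;
            return;
        }
        s->handled[irq]++;   /* assumed: the handler always propagates */
    }
}

/* A hardware interrupt always enters at the head of the pipeline. */
void toy_inject_irq(unsigned irq)
{
    assert(irq < TOY_NR_IRQS);
    toy_walk(irq, 0);
}

/* Stall a stage: later hits are logged there instead of delivered. */
void toy_stall(int stage)
{
    assert(stage >= 0 && stage < TOY_NR_STAGES);
    toy_pipeline[stage].stalled = 1;
}

/* Unstall a stage and play its log, letting each hit flow further down. */
void toy_unstall(int stage)
{
    unsigned irq;

    assert(stage >= 0 && stage < TOY_NR_STAGES);
    toy_pipeline[stage].stalled = 0;

    for (irq = 0; irq < TOY_NR_IRQS; irq++) {
        while (toy_pipeline[stage].pending[irq] > 0) {
            toy_pipeline[stage].pending[irq]--;
            toy_walk(irq, stage);
        }
    }
}
```

For instance, after toy_reset(), stalling stage 1 and injecting IRQ 5
twice delivers both hits to stage 0 right away, while stages 1 and 2
only see them once toy_unstall(1) has been called.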