# [Kos-cvs] kos-doc dynamic_stack.tex,1.3,1.4 kos_spec.txt,1.9,1.10

thomas at kos.enix.org thomas at kos.enix.org
Sun Jan 9 13:58:49 CET 2005

Update of /home/the-doors/kos/cvs/kos-doc
In directory the-doors:/tmp/cvs-serv6513

Modified Files:
dynamic_stack.tex kos_spec.txt
Log Message:
kos_spec.txt: Je signale que le document n'est plus à jour.

dynamic_stack.tex: Gros rewrap sur le document + mise à jour pour dire que finalement ça ne fonctionne pas (cf dernier paragraphe).

Index: dynamic_stack.tex
===================================================================
RCS file: /home/the-doors/kos/cvs/kos-doc/dynamic_stack.tex,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
--- dynamic_stack.tex	7 Nov 2001 13:25:27 -0000	1.3
+++ dynamic_stack.tex	9 Jan 2005 12:58:46 -0000	1.4
@@ -22,80 +22,172 @@
\maketitle

\begin{abstract}
-This document describes the mechanisms of the x86 (IA32) architecture that are used to allow the kernel threads stacks to be dynamically expandable. The idea comes from M. Christophe Avoinne, and David Decotigny and I tested it practically. The experimentations have been made to be integrated into the Kid Operating System (\url{http://kos.enix.org}).
+
+This document describes the mechanisms of the x86 (IA32) architecture
+we tried to use to allow the kernel threads stacks to be dynamically
+expandable. The idea comes from M. Christophe Avoinne, and David
+Decotigny and I tested it practically. The experimentations have been
+made to be integrated into the Kid Operating System
+(\url{http://kos.enix.org}).
+
\end{abstract}

\tableofcontents

\section{Why dynamic kernel stack expansion ?}

-We wanted to have dynamic kernel stack expansion in our operating system, because resorting to a fixed (static) size for the stacks of the kernel threads didn't satisfy us. Linux, for instance, uses fixed size stacks, which turns out not to be always comfortable for kernel programming: limited number of (small) local variables, and limited function nesting. On the x86 architecture, Linux reserves 8 KB for each kernel thread's stack. This space is also used for storing some information about the thread, so if this thread uses too much kernel stack, it will overwrite these informations.
+We wanted to have dynamic kernel stack expansion in our operating
+system, because resorting to a fixed (static) size for the stacks of
+the kernel threads didn't satisfy us. Linux, for instance, uses fixed
+size stacks, which turns out not to be always comfortable for kernel
+programming: limited number of (small) local variables, and limited
+function nesting. On the x86 architecture, Linux reserves 8 KB for
+each kernel thread's stack. This space is also used for storing some
+stack, it will overwrite these informations.

-In order to gain more liberty, we could have said: let's put a lot of memory for each kernel stack, for instance 128 KB''. But it would not be a good decision since that much memory would have to be physically mapped, and it would unfortunately appear not to be used most of the time.
+In order to gain more liberty, we could have said: let's put a lot
+of memory for each kernel stack, for instance 128 KB''. But it would
+not be a good decision since that much memory would have to be
+physically mapped, and it would unfortunately appear not to be used
+most of the time.

-That was no a good solution for us: we really wanted something dynamic, which just fit the needs, something similar to demand paging'' for the kernel stacks.
+That was no a good solution for us: we really wanted something
+dynamic, which just fit the needs, something similar to demand
+paging'' for the kernel stacks.

\section{What's the problem with dynamic stack expansion ?}

-Imagine a process has used all of its physically-mapped kernel stack (during a syscall for instance). Next time it needs to push something on the stack, this just comes down to writing to an unmapped page: the page that lies just below the current physically-mapped stack bottom.
+Imagine a process has used all of its physically-mapped kernel stack
+(during a syscall for instance). Next time it needs to push something
+on the stack, this just comes down to writing to an unmapped page: the
+page that lies just below the current physically-mapped stack bottom.

-Because this page is unmapped, a \emph{Page Fault} occurs. But, in most operating systems (as in KOS), a \emph{Page Fault} is handled through an \emph{Interrupt Gate} or a \emph{Trap Gate}: the interrupted context is pushed on the stack. Unfortunately, we cannot save the context, because the stack is already full ! So an other \emph{Page Fault} occurs, generating a \emph{Double Fault} and then a \emph{Triple Fault}.
+most operating systems (as in KOS), a \emph{Page Fault} is handled
+through an \emph{Interrupt Gate} or a \emph{Trap Gate}: the
+interrupted context is pushed on the stack. Unfortunately, we cannot
+save the context, because the stack is already full ! So an other
+\emph{Page Fault} occurs, generating a \emph{Double Fault} and then a
+\emph{Triple Fault}.

-In Linux, the \emph{Double Fault} is handled so that it can show what was the problem, but it always needs a reboot.
+In Linux, the \emph{Double Fault} is handled so that it can show what
+was the problem, but it always needs a reboot.

\section{Why not using a \emph{Task Gate} for \emph{Page Fault} ?}

-As explained above, the problem is that we cannot save the context when \emph{Page Fault} is handled through an \emph{Interrupt Gate} or a \emph{Trap Gate}.
+As explained above, the problem is that we cannot save the context
+when \emph{Page Fault} is handled through an \emph{Interrupt Gate} or
+a \emph{Trap Gate}.

-Of course, we could handle the \emph{Page Fault} with a \emph{Task Gate}, so that it enables the \emph{Page Fault} to have a different stack, and have the interrupted context stored into a \emph{TSS}.
+Of course, we could handle the \emph{Page Fault} with a \emph{Task
+Gate}, so that it enables the \emph{Page Fault} to have a different
+stack, and have the interrupted context stored into a \emph{TSS}.

-Of course it would work, but \emph{Task Gate\/}s are much slower than \emph{Interrupt Gate\/}s\footnote{It is not as slow as many people say: we made some experimentations which resulted in a difference of between 20\% and 30\% on a P6.}. Actually, \emph{Page Fault} is very often called during the execution of an operating system, and most of the time it is not for a kernel stack expansion. Using \emph{Task Gate} would undermine all the other normal'' \emph{Page Fault}s, which would result in slower performances.
+Of course it would work, but \emph{Task Gate\/}s are much slower than
+\emph{Interrupt Gate\/}s\footnote{It is not as slow as many people
+say: we made some experimentations which resulted in a difference of
+between 20\% and 30\% on a P6.}. Actually, \emph{Page Fault} is very
+often called during the execution of an operating system, and most of
+the time it is not for a kernel stack expansion. Using \emph{Task
+Gate} would undermine all the other normal'' \emph{Page Fault}s,
+which would result in slower performances.

\section{Why not using \emph{Stack Fault} ?}

-An other possibility came to our mind, when reading the Intel Documentation\footnote{\url{http://developer.intel.com}}: the usage of the \emph{Stack Fault}.
+An other possible solution came to our mind, when reading the Intel
+Documentation\footnote{\url{http://developer.intel.com}}: the usage of
+the \emph{Stack Fault}.

-The \emph{Stack Fault} is exception number 12, and is a \emph{fault}, which means that returning from the fault handler is always possible. This fault occurs when ESP contains an address which is not included into a {\bf segment}. So, we could imagine to have segments which have just the size of the currently physically-mapped stack, and to increase these segments each time a \emph{Stack Fault} occurs. Theoretically, this would be possible. But we are working with \emph{gcc}, a C compiler which does not allow segmentation other than flat model.
+The \emph{Stack Fault} is exception number 12, and is a \emph{fault},
+which means that returning from the fault handler is always
+possible. This fault occurs when ESP contains an address which is not
+included into a {\bf segment}. So, we could imagine to have segments
+which have just the size of the currently physically-mapped stack, and
+to increase these segments each time a \emph{Stack Fault}
+occurs. Theoretically, this would be possible. But we are working with
+\emph{gcc}, a C compiler which does not allow segmentation other than
+flat model.

Thus, using the \emph{Stack Fault} proved to be impossible.

\section{Double Fault, the solution}

-When a process tries to access below the last page of its kernel stack, a \emph{Page Fault} is generated. Because this \emph{Page Fault} is most of the time handled through an \emph{Interrupt (or Trap) Gate}, a \emph{Double Fault} follows.
+When a process tries to access below the last page of its kernel
+stack, a \emph{Page Fault} is generated. Because this \emph{Page
+Fault} is most of the time handled through an \emph{Interrupt (or
+Trap) Gate}, a \emph{Double Fault} follows.

-The Intel Documentation says that \emph{Double Fault} is an \emph{abort}, which means that returning from the handler cannot be guaranteed. But M. Christophe Avoinne said that he tested it, and that it proved possible to return correctly from such a \emph{Double Fault}. We were very interested in this idea. We implemented and tested it in our operating system.
+The Intel Documentation says that \emph{Double Fault} is an
+\emph{abort}, which means that returning from the handler cannot be
+guaranteed. But M. Christophe Avoinne said that he tested it, and that
+it proved possible to return correctly from such a \emph{Double
+Fault}. We were very interested in this idea. We implemented and
+tested it in our operating system.

\section{The implementation}

\subsection{Two \emph{TSS}}

-\emph{Double Fault} must be handled through a \emph{Task Gate}, in order to save the interrupted context into a TSS, and to have a stack other than the one of the interrupted process. Executing some piece of code in the \emph{Double Fault} handler is therefore possible even when no more kernel stack is available to the interrupted process.
+\emph{Double Fault} must be handled through a \emph{Task Gate}, in
+order to save the interrupted context into a TSS, and to have a stack
+other than the one of the interrupted process. Executing some piece of
+code in the \emph{Double Fault} handler is therefore possible even
+when no more kernel stack is available to the interrupted process.

-We have two \emph{TSS}: one for the system, in which the context in saved when a \emph{Double Fault} occurs; and one for the \emph{Double Fault} handler, in order to make the associated \emph{Task Gate} working correctly.
+We have two \emph{TSS}: one for the system, in which the context in
+saved when a \emph{Double Fault} occurs; and one for the \emph{Double
+Fault} handler, in order to make the associated \emph{Task Gate}
+working correctly.

These two \emph{TSS} are, of course registered into the \emph{GDT}.

\subsection{The \emph{Task Gate} into the \emph{IDT}}

-All interruptions are handled through \emph{Interrupt (or Trap) Gates}, except for the \emph{Double Fault}, which is a {\em TSS} as mentioned above.  This \emph{Task Gate} points to the \emph{Double Fault TSS} so that, when the exception occurs, the \emph{double fault} handler is executed.
+All interruptions are handled through \emph{Interrupt (or Trap)
+Gates}, except for the \emph{Double Fault}, which is a {\em TSS} as
+mentioned above.  This \emph{Task Gate} points to the \emph{Double
+Fault TSS} so that, when the exception occurs, the \emph{double fault}
+handler is executed.

\begin{itemize}
-\item When using \emph{Task Gate}, the next execution of the associated handler restarts where it has been stopped last time. So, you need a \tt jmp \rm at the end of the handler, so that the execution will start again at the beginning.
-\item A special physical (and virtual) page is used for containing the stack for the \emph{Double Fault} Handler.
-\item The EFLAGS register in the \emph{Double Fault TSS} is initialized with bit \emph{NT} set. It means that, when returning with the \emph{IRET} at the end of the handler, the execution of the interrupted process starts again, by using the \emph{Previous link field} of the TSS.
-\end{itemize}

+\item When using \emph{Task Gate}, the next execution of the
+  associated handler restarts where it has been stopped last time. So,
+  you need a \tt jmp \rm at the end of the handler, so that the
+  execution will start again at the beginning.

-\section{Conclusion}
+\item A special physical (and virtual) page is used for containing the
+  stack for the \emph{Double Fault} Handler.

-This solution is quite undocumented, and strange. But we have tested it on Intel Processors (from old Pentium to recent Pentium III), on AMD Processors (K6, K6-2, Athlon and Duron) and on Cyrix processors. It also works under Bochs (version 1.2-pre1 required, or older version with a patch).
+\item The EFLAGS register in the \emph{Double Fault TSS} is
+  initialized with bit \emph{NT} set. It means that, when returning
+  with the \emph{IRET} at the end of the handler, the execution of the
+  interrupted process starts again, by using the \emph{Previous link
+    field} of the TSS.

-The \emph{Double Fault} perfectly allows dynamic stack expansion, and \emph{KOS} uses it with no problem !
+\end{itemize}

-If you want to get the sources of \emph{KOS}, with the implementation of dynamic stack expansion, go to \url{http://kos.enix.org} where you will find daily snapshots of our CVS.
+\section{Problems and conclusion}
+
+This solution has been implemented in
+KOS\footnote{\url{http://kos.enix.org}} for a while, but some problems
+have raised since the writing of this document.
+
+More specifically, we had troubles handling Double Fault that occured
+during the implicit {\em pushes} made by the processor when an
+interrupt is generated. During these early {\em pushes}, the bitmap of
+the interrupts is not yet updated, so the Double Fault handler doesn't
+know that an interrupt has been raised, and that the cause of the
+stack overflow is this interrupt. So, the Double Fault handler is not
+able to restart the interrupt : we missed one interrupt, which is
+unacceptable.
+
+We do not have any solution to this problem yet, and have given up the
+idea of dynamic kernel stack expansion.

\end{document}

Index: kos_spec.txt
===================================================================
RCS file: /home/the-doors/kos/cvs/kos-doc/kos_spec.txt,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -d -r1.9 -r1.10
--- kos_spec.txt	19 Jun 2000 13:29:11 -0000	1.9
+++ kos_spec.txt	9 Jan 2005 12:58:47 -0000	1.10
@@ -1,3 +1,7 @@
+Ce document est un très ancien document de réflexion sur KOS, qui date
+d'avant août 2000, c'est à dire d'avant la version actuelle de KOS. Il
+est conservé uniquement pour l'historique.
+
Projet KOS : Document de specification
--------------------------------------