Linux Kernel Architecture
Module 20: More on LINUX
Linux Kernel Architecture
The Big Picture:
It is a good idea to look at the Linux kernel within the overall system’s overall context..
Applications and OS services:
These are the user application running on the Linux system. These applications are not fixed but typically include applications like email clients, text processors etc. OS services include utilities and services that are traditionally considered part of an OS like the windowing system, shells, programming interface to the kernel, the libraries and compilers etc.
Linux Kernel:
Kernel abstracts the hardware to the upper layers. The kernel presents the same view of the hardware even if the underlying hardware is ifferent. It mediates and controls access to system resources.
Hardware:
This layer consists of the physical resources of the system that finally do the actual work. This includes the CPU, the hard disk, the parallel port controllers, the system RAM etc. The Linux Kernel:
After looking at the big picture we should zoom into the Linux kernel to get a closer look.
Purpose of the Kernel:
The Linux kernel presents a virtual machine interface to user processes. Processes are written without needing any knowledge (most of the time) of the type of the physical hardware that constitutes the computer. The Linux kernel abstracts all hardware into a consistent interface.
In addition, Linux Kernel supports multi-tasking in a manner that is transparent to user processes: each process can act as though it is the only process on the computer, with exclusive use of main memory and other hardware resources. The kernel actually runs several processes concurrently, and mediates access to hardware resources so that each process has fair access while inter-process security is maintained.
The kernel code executes in privileged mode called kernel mode. Any code that does not need to run in privileged mode is put in the system library. The interesting thing about Linux kernel is that it has a modular architecture – even with binary codes: Linux kernel can load (and unload) modules dynamically (at run time) just as it can load or unload the system library modules.
Here we shall explore the conceptual view of the kernel without really bothering about the implementation issues (which keep on constantly changing any way). Kernel code provides for arbitrations and for protected access to HW resources. Kernel supports services for the applications through the system libraries. System calls within applications (may be written in C) may also use system library. For instance, the buffered file handling is operated and managed by Linux kernel through system libraries. Programs like utilities that are needed to initialize the system and configure network devices are classed as user mode programs and do not run with kernel privileges (unlike in Unix). Programs like those that handle login requests are run as system utilities and also do not require kernel privileges (unlike in Unix).
The Linux Kernel Structure Overview:
The “loadable” kernel modules execute in the privileged kernel mode – and therefore have the capabilities to communicate with all of HW.
Linux kernel source code is free. People may develop their own kernel modules. However, this requires recompiling, linking and loading. Such a code can be distributed under GPL. More often the modality is:
Start with the standard minimal basic kernel module. Then enrich the environment by the addition of customized drivers.
This is the route presently most people in the embedded system area are adopting world- wide.
The commonly loaded Linux system kernel can be thought of comprising of the following main components:
Process Management: User process as also the kernel processes seek the cpu and other services. Usually a fork system call results in creating a new process. System call execve results in execution of a newly forked process. Processes, have an id (PID) and also have a user id (UID) like in Unix. Linux additionally has a personality associated with a process. Personality of a process is used by emulation libraries to be able to cater to a range of implementations. Usually a forked process inherits parent’s environment.
In Linux Two vectors define a process: these are argument vector and environment vector. The environment vector essentially has a (name, value) value list wherein different environment variable values are specified. The argument vector has the command line arguments used by the process. Usually the environment is inherited however, upon execution of execve the process body may be redefined with a new set of environment variables. This helps in the customization of a process’s operational environment. Usually a process also has some indication on its scheduling context. Typically a process context includes information on scheduling, accounting, file tables, capability on signal handling and virtual memory context.
In Linux, internally, both processes and threads have the same kind of representation. Linux processes and threads are POSIX compliant and are supported by a threads library package which provides for two kinds of threads: user and kernel. User-controlled scheduling can be used for user threads. The kernel threads are scheduled by the kernel. While in a single processor environment there can be only one kernel thread scheduled. In a multiprocessor environment one can use the kernel supported library and clone system call to have multiple kernel threads created and scheduled.
Scheduler:
Schedulers control the access to CPU by implementing some policy such that the CPU is shared in a way that is fair and also the system stability is maintained. In Linux scheduling is required for the user processes and the kernel tasks. Kernel tasks may be internal tasks on behalf of the drivers or initiated by user processes requiring specific OS services. Examples are: a page fault (induced by a user process) or because some device driver raises an interrupt. In Linux, normally, the kernel mode of operation can not be pre-empted. Kernel code runs to completion - unless it results in a page fault, or an interrupt of some kind or kernel code it self calls the scheduler. Linux is a time sharing system. So a timer interrupt happens and rescheduling may be initiated at that time. Linux
uses a credit based scheduling algorithm. The process with the highest credits gets scheduled. The credits are revised after every run. If all run-able processes exhaust all the credits a priority based fresh credit allocation takes place. The crediting system usually gives higher credits to interactive or IO bound processes – as these require immediate responses from a user. Linux also implements Unix like nice process characterization.
The Memory Manager:
Memory manager manages the allocation and de-allocation of system memory amongst the processes that may be executing concurrently at any time on the system. The memory manager ensures that these processes do not end up corrupting each other’s memory area. Also, this module is responsible for implementing virtual memory and the paging mechanism within it. The loadable kernel modules are managed in two stages:
First the loader seeks memory allocation from the kernel. Next the kernel returns the address of the area for loading the new module.
➢ The linking for symbols is handled by the compiler because whenever a new module is loaded recompilation is imperative.
The Virtual File System (VFS):
Presents a consistent file system interface to the kernel. This allows the kernel code to be independent of the various file systems that may be supported (details on virtual file system VFS follow under the files system).
The Network Interface:
Provides kernel access to various network hardware and protocols.
Inter Process Communication (IPC):
The IPC primitives for processes also reside on the same system. With the explanation above we should think of the typical loadable kernel module in Linux to have three main components:
➢ Module management,
➢ Driver registration and
➢ Conflict resolution mechanism.
Module Management:
For new modules this is done at two levels – the management of kernel referenced symbols and the management of the code in kernel memory. The Linux kernel maintains a symbol table and symbols defined here can be exported (that is these definitions can be used elsewhere) explicitly. The new module must seek these symbols. In fact this is like having an external definition in C and then getting the definition at the kernel compile time. The module management system also defines all the required communications interfaces for this newly inserted module. With this done, processes can request the services (may be of a device driver) from this module. Driver registration:
The kernel maintains a dynamic table which gets modified once a new module is added – some times one may wish to delete also. In writing these modules care is taken to ensure that initializations and cleaning up operations are defined for the driver. A module may register one or more drivers of one or more types of drivers. Usually the registration of drivers is maintained in a registration table of the module. The registration of drives entails the following:
1. Driver context identification: as a character or bulk device or a network driver.
2. File system context: essentially the routines employed to store files in Linux virtual file system or network file system like NFS.
3. Network protocols and packet filtering rules.
4. File formats for executable and other files.
Conflict Resolution:
The PC hardware configuration is supported by a large number of chip set configurations and with a large range of drivers for SCSI devices, video display devices and adapters, network cards. This results in the situation where we have
module device drivers which vary over a very wide range of capabilities and options. This necessitates a conflict resolution mechanism to resolve accesses in a variety of conflicting concurrent accesses. The conflict resolution mechanisms help in preventing modules from having an access conflict to the HW – for example an access to a printer. Modules usually identify the HW resources it needs at the time of loading and the kernel makes these available by using a reservation table. The kernel usually maintains information on the address to be used for accessing HW - be it DMA channel or an interrupt line. The drivers avail kernel services to access HW resources.
System Calls:
Let us explore how system calls are handled. A user space process enters the kernel. From this point the mechanism is some what CPU architecture dependent. Most common examples of system calls are: - open, close, read, write, exit, fork, exec, kill, socket calls etc.
The Linux Kernel 2.4 is non preemptable. Implying once a system call is executing it will run till it is finished or it relinquishes control of the CPU. However, Linux kernel
2.6 has been made partly preemptable. This has improved the responsiveness considerably and the system behavior is less ‘jerky’.
Systems Call Interface in Linux:
System call is the interface with which a program in user space program accesses kernel functionality. At a very high level it can be thought of as a user process calling a function in the Linux Kernel. Even though this would seem like a normal C function call, it is in fact handled differently. The user process does not issue a system call directly - in stead, it is internally invoked by the C library.
Linux has a fixed number of system calls that are reconciled at compile time. A user process can access only these finite set of services via the system call interface. Each system call has a unique identifying number. The exact mechanism of a system call implementation is platform dependent. Below we discuss how it is done in the x86 architecture.
To invoke a system call in x86 architecture, the following needs to be done. First, a system call number is put into the EAX hardware register. Arguments to the system call are put into other hardware registers. Then the int0x80 software interrupt is
issued which then invokes the kernel service.
Adding one’s own system call is a pretty straight forward (almost) in Linux. Let us try to implement our own simple system call which we will call ‘simple’ and whose source we will put in simple.c.
/* simple.c */
/* this code was never actually compiled and tested */
#include<linux/simple.h> asmlinkage int sys_simple(void)
{
return 99;
}
As can be seen that this a very dumb system call that does nothing but return 99. But that is enough for our purpose of understanding the basics.
This file now has to be added to the Linux source tree for compilation by executing:
/usr/src/linux.*.*/simple.c
Those who are not familiar with kernel programming might wonder what “asmlinkage” stands for in the system call. ‘C’ language does not allow access hardware directly. So, some assembly code is required to access the EAX register etc. The asmlinkage macro does the dirty work fortunately.
The asmlinkage macro is defined in XXXX/linkage.h. It initiates another macro_syscall in XXXXX/unistd.h. The header file for a typical system call will contain the following.
After defining the system call we need to assign a system call number. This can be done by adding a line to the file unistd.h . unistd.h has a series of #defines of the form:
#define _NR_sys_exit 1
Now if the last system call number is 223 then we enter the following line at the bottom
#define _NR_sys_simple 224
After assigning a number to the system call it is entered into system call table. The system call number is the index into a table that contains a pointer to the actual routine. This table is defined in the kernel file ‘entry.S’ .We add the following line to the file :
* this code was never actually compiled and tested
*/.long SYSMBOL_NAME(sys_simple)
Finally, we need to modify the makefile so that our system call is added to the kernel when it is compiled. If we look at the file /usr/src/linux.*.*/kernel/Makefile we get a line of the following format.
obj_y= sched.o + dn.o …….etc we add: obj_y += simple.o
Now we need to recompile the kernel. Note that there is no need to change the config file. With the source code of the Linux freely available, it is possible for users to make their own versions of the kernel. A user can take the source code select only the parts of the kernel that are relevant to him and leave out the rest. It is possible to get a working Linux kernel in single 1.44 MB floppy disk. A user can modify the source for the kernel so that the kernel suits a targeted application better. This is one of the reasons why Linux is the successful (and preferred) platform for developing embedded systems In fact, Linux has reopened the world of system programming.
The Memory Management Issues
The two major components in Linux memory management are:
- The page management
- The virtual memory management
1. The page management: Pages are usually of a size which is a power of 2. Given the main memory Linux allocates a group of pages using a buddy system. The allocation is the responsibility of a software called “page allocator”. Page allocator software is responsible for both allocation, as well as, freeing the memory. The basic memory allocator uses a buddy heap which allocates a contiguous area of size 2n > the required memory with minimum n obtained by successive generation of “buddies” of equal size. We explain the buddy allocation using an example. An Example: Suppose we need memory of size 1556 words. Starting with a memory size 16K we would proceed as follows:
1. First create 2 buddies of size 8k from the given memory size ie. 16K
2. From one of the 8K buddy create two buddies of size 4K each
3. From one of the 4k buddy create two buddies of size 2K each.
4. Use one of the most recently generated buddies to accommodate the 1556 size memory requirement.
Note that for a requirement of 1556 words, memory chunk of size 2K words satisfies the property of being the smallest chunk larger than the required size.
Possibly some more concepts on page replacement, page aging, page flushing and the changes done in Linux 2.4 and 2.6 in these areas.
2. Virtual memory Management: The basic idea of a virtual memory system is to expose address space to a process. A process should have the entire address space exposed to it to make an allocation or deallocation. Linux makes a conscious effort to allocate logically, “page aligned” contiguous address space. Such page aligned logical spaces are called regions in the memory.
Linux organizes these regions to form a binary tree structure for fast access. In addition to the above logical view the Linux kernel maintains the physical view ie maps the hardware page table entries that determine the location of the logical page in the exact location on a disk. The process address space may have private or shared pages. Changes made to a page require that locality is preserved for a process by maintaining a copy-on-write when the pages are private to the process where as these have to be visible when they are shared.
A process, when first created following a fork system call, finds its allocation with a new entry in the page table – with inherited entries from the parent. For any page which is shared amongst the processes (like parent and child), a reference count is maintained. Linux has a far more efficient page swapping algorithm than Unix – it uses a second chance algorithm dependent on the usage pattern. The manner it manifests it self is that a page gets a few chances of survival before it is considered to be no longer useful. Frequently used pages get a higher age value and a reduction in usage brings the age closer to zero – finally leading to its exit.
The Kernel Virtual Memory: Kernel also maintains for each process a certain amount of “kernel virtual memory” – the page table entries for these are marked ”protected”. The kernel virtual memory is split into two regions. First there is a static region which has the core of the kernel and page table references for all the normally allocated pages that can not be modified. The second region is dynamic - page table entries created here may point anywhere and can be modified.
Loading, Linking and Execution: For a process the execution mode is entered following an exec system call. This may result in completely rewriting the previous execution context – this, however, requires that the calling process is entitled an access to the called code. Once the check is through the loading of the code is initiated. Older versions of Linux used to load binary files in the a.out format. The current version also loads binary files in ELF format. The ELF format is flexible as it permits adding additional information for debugging etc. A process can be executed when all the needed library routines have also been linked to form an executable module. Linux supports dynamic linking. The dynamic linking is achieved in two stages:
1. First the linking process downloads a very small statically linked function – whose task is to read the list of library functions which are to be dynamically linked.
2. Next the dynamic linking follows - resolving all symbolic references to get a loadable executable.
Comments
Post a Comment