· KLDP.org · KLDP.net · KLDP Wiki · KLDP BBS ·
Interrupt And Exception

  • When happen interrupt,
    • it does this by saving the current value of the program counter (i.e., the content of the eip and cs registers) in the Kernel Mode stack and by placing an address related to the interrupt type into the program counter.

  • There are some things in this chapter that will remind you of the context switch described in the previous chapter, carried out when a kernel substitutes one process for another. But there is a key difference between interrupt handling and process switching: the code executed by an interrupt or by an exception handler is not a process. Rather, it is a kernel control path that runs at the expense of the same process that was running when the interrupt occurred. As a kernel control path, the interrupt handler is lighter than a process (it has less context and requires less time to set up or tear down).

  • Kernel must satisfy the following constraints

    1. The kernel's goal is therefore to get the interrupt out of the way as soon as possible and defer as much processing as it can
    2. Because interrupts can come anytime, the kernel might be handling one of them while another one (of a different type) occurs.
    3. Although the kernel may accept a new interrupt signal while handling a previous one, some critical regions exist inside the kernel code where interrupts must be disabled. Such critical regions must be limited as much as possible

  • The role of PIC(ProgrammableInterruptController)

    1. Monitors the IRQ lines, checking for raised signals. If two or more IRQ lines are raised, selects the one having the lower pin number.
    2. If a raised signal occurs on an IRQ line
      1. Converts the raised signal received into a corresponding vector
      2. Stores the vector in an Interrupt Controller I/O port, thus allowing the CPU to read it via the data bus
      3. Sends a raised signal to the processor INTR pinthat is, issues an interrupt.
      4. Waits until the CPU acknowledges the interrupt signal by writing into one of the Programmable Interrupt Controllers (PIC) I/O ports; when this occurs, clears the INTR line.
    3. Goes back to step 1.

  • Each IRQ line can be selectively disabled. Thus, the PIC can be programmed to disable IRQs. Disabled interrupts are not lost; the PIC sends them to the CPU as soon as they are enabled again. This feature is used by most interrupt handlers, because it allows them to process IRQs of the same type serially.

  • Selective enabling/disabling of IRQs is not the same as global masking/unmasking of maskable interrupts. When the IF flag of the eflags register is clear, each maskable interrupt issued by the PIC is temporarily ignored by the CPU. The cli and sti assembly language instructions, respectively, clear and set that flag.

  • The Advanced Programmable Interrupt Controller (APIC)
    • The previous description refers to PICs designed for uniprocessor systems. If the system includes a single CPU, the output line of the master PIC can be connected in a straightforward way to the INTR pin the CPU. However, if the system includes two or more CPUs, this approach is no longer valid and more sophisticated PICs are needed.

    • all current 80 x 86 microprocessors include a local APIC.Each local APIC has 32-bit registers, an internal clock; a local timer device; and two additional IRQ lines, LINT 0 and LINT 1, reserved for local APIC interrupts.All local APICs are connected to an external I/O APIC, giving rise to a multi-APIC system.

  • apic.jpg
    [JPG image (10.69 KB)]

    • The I/O APIC consists of a set of 24 IRQ lines, a 24-entry Interrupt Redirection Table, programmable registers, and a message unit for sending and receiving APIC messages over the APIC bus. Unlike IRQ pins of the 8259A, interrupt priority is not related to pin number.
    • each entry in the Redirection Table can be individually programmed to indicate the interrupt vector and priority, the destination processor, and how the processor is selected. The information in the Redirection Table is used to translate each external IRQ signal into a message to one or more local APIC units via the APIC bus.

      • Static distribution
        • The IRQ signal is delivered to the local APICs listed in the corresponding Redirection Table entry. The interrupt is delivered to one specific CPU, to a subset of CPUs, or to all CPUs at once (broadcast mode).
      • Dynamic distribution
        • The IRQ signal is delivered to the local APIC of the processor that is executing the process with the lowest priority.
        • Every local APIC has a programmable task priority register (TPR), which is used to compute the priority of the currently running process. Intel expects this register to be modified in an operating system kernel by each process switch.

      • Besides distributing interrupts among processors, the multi-APIC system allows CPUs to generate interprocessor interrupts . When a CPU wishes to send an interrupt to another CPU, it stores the interrupt vector and the identifier of the target's local APIC in the Interrupt Command Register (ICR) of its own local APIC. A message is then sent via the APIC bus to the target's local APIC, which therefore issues a corresponding interrupt to its own CPU.

      • Many of the current uniprocessor systems include an I/O APIC chip, which may be configured in two distinct ways:

        • As a standard 8259A-style external PIC connected to the CPU. The local APIC is disabled and the two LINT 0 and LINT 1 local IRQ lines are configured, respectively, as the INTR and NMI pins.

        • As a standard external I/O APIC. The local APIC is enabled, and all external interrupts are received through the I/O APIC.

  • Hardware Handling of Interrupts and Exceptions
    1. Determines the vector i (0 i 255) associated with the interrupt or the exception
    2. Reads the i th entry of the IDT referred by the idtr register
    3. Gets the base address of the GDT from the gdtr register and looks in the GDT to read the Segment Descriptor identified by the selector in the IDT entry. This descriptor specifies the base address of the segment that includes the interrupt or exception handler.
    4. Makes sure the interrupt was issued by an authorized source. First, it compares the Current Privilege Level (CPL), which is stored in the two least significant bits of the cs register, with the Descriptor Privilege Level (DPL ) of the Segment Descriptor included in the GDT. Raises a "General protection " exception if the CPL is lower than the DPL, because the interrupt handler cannot have a lower privilege than the program that caused the interrupt
    5. Checks whether a change of privilege level is taking place that is, if CPL is different from the selected Segment Descriptor's DPL. If so, the control unit must start using the stack that is associated with the new privilege level. It does this by performing the following steps
      1. Reads the tr register to access the TSS segment of the running process.
      2. Loads the ss and esp registers with the proper values for the stack segment and stack pointer associated with the new privilege level. These values are found in the TSS
      3. In the new stack, it saves the previous values of ss and esp, which define the logical address of the stack associated with the old privilege level.
    6. If a fault has occurred, it loads cs and eip with the logical address of the instruction that caused the exception so that it can be executed again.
    7. Saves the contents of eflags , cs, and eip in the stack.
    8. If the exception carries a hardware error code, it saves it on the stack
    9. Loads cs and eip, respectively, with the Segment Selector and the Offset fields of the Gate Descriptor stored in the i th entry of the IDT. These values define the logical address of the first instruction of the interrupt or exception handler.

  • After the interrupt or exception is processed, the corresponding handler must relinquish control to the interrupted process by issuing the iret instruction, which forces the control unit to:

    1. Load the cs, eip, and eflags registers with the values saved on the stack. If a hardware error code has been pushed in the stack on top of the eip contents, it must be popped before executing iret.

    2. Check whether the CPL of the handler is equal to the value contained in the two least significant bits of cs (this means the interrupted process was running at the same privilege level as the handler). If so, iret concludes execution; otherwise, go to the next step.

    3. Load the ss and esp registers from the stack and return to the stack associated with the old privilege level.

    4. Examine the contents of the ds, es, fs, and gs segment registers; if any of them contains a selector that refers to a Segment Descriptor whose DPL value is lower than CPL, clear the corresponding segment register. The control unit does this to forbid User Mode programs that run with a CPL equal to 3 from using segment registers previously used by kernel routines (with a DPL equal to 0). If these registers were not cleared, malicious User Mode programs could exploit them in order to access the kernel address space.

Nested Execution of Exception and Interrupt Handlers

  • The price to pay for allowing nested kernel control paths is that an interrupt handler must never block, that is, no process switch can take place until an interrupt handler is running. In fact, all the data needed to resume a nested kernel control path is stored in the Kernel Mode stack, which is tightly bound to the current process.

  • An interrupt handler may preempt both other interrupt handlers and exception handlers. Conversely, an exception handler never preempts an interrupt handler. The only exception that can be triggered in Kernel Mode is "Page Fault," which we just described. But interrupt handlers never perform operations that can induce page faults, and thus, potentially, a process switch.

  • On multiprocessor systems, several kernel control paths may execute concurrently. Moreover, a kernel control path associated with an exception may start executing on a CPU and, due to a process switch, migrate to another CPU. ??

Initializing the Interrupt Descriptor Table

  • The int instruction allows a User Mode process to issue an interrupt signal that has an arbitrary vector ranging from 0 to 255. Therefore, initialization of the IDT must be done carefully, to block illegal interrupts and exceptions simulated by User Mode processes via int instructions. This can be achieved by setting the DPL field of the particular Interrupt or Trap Gate Descriptor to 0. If the process attempts to issue one of these interrupt signals, the control unit checks the CPL value against the DPL field and issues a "General protection " exception.In a few cases, however, a User Mode process must be able to issue a programmed exception. To allow this, it is sufficient to set the DPL field of the corresponding Interrupt or Trap Gate Descriptors to 3 that is, as high as possible

  • Interrupt gate
    • An Intel interrupt gate that cannot be accessed by a User Mode process (the gate's DPL field is equal to 0). All Linux interrupt handlers are activated by means of interrupt gates , and all are restricted to Kernel Mode.
  • System gate
    • An Intel trap gate that can be accessed by a User Mode process (the gate's DPL field is equal to 3). The three Linux exception handlers associated with the vectors 4, 5, and 128 are activated by means of system gates , so the three assembly language instructions into , bound , and int $0x80 can be issued in User Mode.
  • System interrupt gate
    • An Intel interrupt gate that can be accessed by a User Mode process (the gate's DPL field is equal to 3). The exception handler associated with the vector 3 is activated by means of a system interrupt gate, so the assembly language instruction int3 can be issued in User Mode.
  • Trap gate
    • An Intel trap gate that cannot be accessed by a User Mode process (the gate's DPL field is equal to 0). Most Linux exception handlers are activated by means of trap gates .
  • Task gate
    • An Intel task gate that cannot be accessed by a User Mode process (the gate's DPL field is equal to 0). The Linux handler for the "Double fault " exception is activated by means of a task gate.

  • set_intr_gate(n,addr)
    • Inserts an interrupt gate in the n th IDT entry. The Segment Selector inside the gate is set to the kernel code's Segment Selector. The Offset field is set to addr, which is the address of the interrupt handler. The DPL field is set to 0.
  • set_system_gate(n,addr)
    • Inserts a trap gate in the n th IDT entry. The Segment Selector inside the gate is set to the kernel code's Segment Selector. The Offset field is set to addr, which is the address of the exception handler. The DPL field is set to 3.
  • set_system_intr_gate(n,addr)
    • Inserts an interrupt gate in the n th IDT entry. The Segment Selector inside the gate is set to the kernel code's Segment Selector. The Offset field is set to addr, which is the address of the exception handler. The DPL field is set to 3.
  • set_trap_gate(n,addr)
    • Similar to the previous function, except the DPL field is set to 0.
  • set_task_gate(n,gdt)
    • Inserts a task gate in the n th IDT entry. The Segment Selector inside the gate stores the index in the GDT of the TSS containing the function to be activated. The Offset field is set to 0, while the DPL field is set to 3.

Preliminary Initialization of the IDT

During kernel initialization, the setup_idt( ) assembly language function starts by filling all 256 entries of idt_table with the same interrupt gate, which refers to the ignore_int( ) interrupt handler:
        lea ignore_int, %edx
        movl $(_ _KERNEL_CS << 16), %eax
        movw %dx, %ax       /* selector = 0x0010 = cs */
        movw $0x8e00, %dx   /* interrupt gate, dpl=0, present */
        lea idt_table, %edi
        mov $256, %ecx
        movl %eax, (%edi)
        movl %edx, 4(%edi)
        addl $8, %edi
        dec %ecx
        jne rp_sidt

Exception Handling

  • Exception handlers have a standard structure consisting of three steps:
    • Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language).
    • Handle the exception by means of a high-level C function.
    • Exit from the handler by means of the ret_from_exception( ) function.

Interrupt Handling

Interrupt handler flexibility is achieved in two distinct ways, as discussed in the following list.

  • IRQ sharing

    The interrupt handler executes several interrupt service routines (ISRs). Each ISR is a function related to a single device sharing the IRQ line. Because it is not possible to know in advance which particular device issued the IRQ, each ISR is executed to verify whether its device needs attention; if so, the ISR performs all the operations that need to be executed when the device raises an interrupt.

  • IRQ dynamic allocation

    An IRQ line is associated with a device driver at the last possible moment; for instance, the IRQ line of the floppy device is allocated only when a user accesses the floppy disk device. In this way, the same IRQ vector may be used by several hardware devices even if they cannot share the IRQ line; of course, the hardware devices cannot be used at the same time. (See the discussion at the end of this section.)

Linux divides the actions to be performed following an interrupt into three classes:

  • Critical

    Actions such as acknowledging an interrupt to the PIC, reprogramming the PIC or the device controller, or updating data structures accessed by both the device and the processor. These can be executed quickly and are critical, because they must be performed as soon as possible. Critical actions are executed within the interrupt handler immediately, with maskable interrupts disabled.

  • Noncritical

    Actions such as updating data structures that are accessed only by the processor (for instance, reading the scan code after a keyboard key has been pushed). These actions can also finish quickly, so they are executed by the interrupt handler immediately, with the interrupts enabled.

  • Noncritical deferrable

    Actions such as copying a buffer's contents into the address space of a process (for instance, sending the keyboard line buffer to the terminal handler process). These may be delayed for a long time interval without affecting the kernel operations; the interested process will just keep waiting for the data. Noncritical deferrable actions are performed by means of separate functions that are discussed in the later section "Softirqs and Tasklets."

Regardless of the kind of circuit that caused the interrupt, all I/O interrupt handlers perform the same four basic actions

  • Save the IRQ value and the register's contents on the Kernel Mode stack.
  • Send an acknowledgment to the PIC that is servicing the IRQ line, thus allowing it to issue further interrupts.
  • Execute the interrupt service routines (ISRs) associated with all the devices that share the IRQ.
  • Terminate by jumping to the ret_from_intr( ) address.

Table 4-5. Flags describing the IRQ line status

Flag name Description
IRQ_INPROGRESS A handler for the IRQ is being executed.
IRQ_DISABLED The IRQ line has been deliberately disabled by a device driver.
IRQ_PENDING An IRQ has occurred on the line; its occurrence has been acknowledged to the PIC, but it has not yet been serviced by the kernel.
IRQ_REPLAY The IRQ line has been disabled but the previous IRQ occurrence has not yet been acknowledged to the PIC.
IRQ_AUTODETECT The kernel is using the IRQ line while performing a hardware device probe.
IRQ_WAITING The kernel is using the IRQ line while performing a hardware device probe; moreover, the corresponding interrupt has not been raised.

During system initialization, the init_IRQ( ) function sets the status field of each IRQ main descriptor to IRQ _DISABLED. Moreover, init_IRQ( ) updates the IDT by replacing the interrupt gates set up by setup_idt( ) with new ones. This is accomplished through the following statements:
for (i = 0; i < NR_IRQS; i++)
       if (i+32 != 128)

Notice that the interrupt gate corresponding to vector 128 is left untouched, because it is used for the system call's programmed exception Linux uses a PIC object, consisting of the PIC name and seven PIC standard methods. The data structure that defines a PIC object is called hw_interrupt_type (also called hw_irq_controller).

    struct hw_interrupt_type i8259A_irq_type = {
        .typename     = "XT-PIC",
        .startup      = startup_8259A_irq,
        .shutdown     = shutdown_8259A_irq,
        .enable       = enable_8259A_irq,
        .disable      = disable_8259A_irq,
        .ack          = mask_and_ack_8259A,
        .end          = end_8259A_irq,
        .set_affinity = NULL

As described earlier, multiple devices can share a single IRQ. Therefore, the kernel maintains irqaction descriptors (see Figure 4-5 earlier in this chapter), each of which refers to a specific hardware device and a specific interrupt. The fields included in such descriptor are shown in Table 4-6, and the flags are shown in Table 4-7.

Table 4-6. Fields of the irqaction descriptor

Field name Description
handler Points to the interrupt service routine for an I/O device. This is the key field that allows many devices to share the same IRQ.
flags This field includes a few fields that describe the relationships between the IRQ line and the I/O device (see Table 4-7).
mask Not used.
name The name of the I/O device (shown when listing the serviced IRQs by reading the /proc/interrupts file).
dev_id A private field for the I/O device. Typically, it identifies the I/O device itself (for instance, it could be equal to its major and minor numbers; see the section "Device Files" in Chapter 13), or it points to the device driver's data.
next Points to the next element of a list of irqaction descriptors. The elements in the list refer to hardware devices that share the same IRQ.
irq IRQ line.
dir Points to the descriptor of the /proc/irq/n directory associated with the IRQn.
Table 4-7. Flags of the irqaction descriptor

Flag name Description
SA_INTERRUPT The handler must execute with interrupts disabled.
SA_SHIRQ The device permits its IRQ line to be shared with other devices.
SA_SAMPLE_RANDOM The device may be considered a source of events that occurs randomly; it can thus be used by the kernel random number generator. (Users can access this feature by taking random numbers from the /dev/random and /dev/urandom device files.)

Sin has many tools, but a lie is the handle which fits them all.

sponsored by andamiro
sponsored by cdnetworks
sponsored by HP

Valid XHTML 1.0! Valid CSS! powered by MoniWiki
last modified 2006-05-28 22:44:24
Processing time 0.0131 sec