Hardware and Software Concepts: Hardware Components

Hardware Components

A computer’s hardware consists of its physical devices—processor(s), main memory and input/output devices. The following subsections describe hardware components that an operating system manages to meet its users’ computing needs.

Mainboards

Computers rely on interactions between many hardware devices to satisfy the requirements of the system. To enable communication among independent devices, computers are equipped with one or more printed circuit boards (PCBs). A PCB is a hardware component that provides electrical connections between devices at various locations on the board.

The mainboard (also called the motherboard), the central PCB in a system, can be thought of as the backbone of a computer. The mainboard provides slots into which other components—such as the processor, main memory and other hardware devices—are inserted. These slots provide access to the electrical connections between the various hardware components and enable users to customize their computers’ hardware configuration by adding devices to, and removing them from, the slots. The mainboard is one of four hardware components required to execute instructions in a general-purpose computer. The other three are the processor (Section 2.3.2, Processors), main memory (Section 2.3.5, Main Memory) and secondary storage (Section 2.3.6, Secondary Storage).

Traditional metal wires are too wide to establish the large number of electrical connections between components in today’s systems. Thus, mainboards typically consist of several extremely thin layers, each containing microscopic electrical connections called traces that serve as communication channels and provide connectivity on the board. A large set of traces forms a high-speed communication channel known as a bus.

Most mainboards include several computer chips to perform low-level operations. For example, mainboards typically contain a basic input/output system (BIOS) chip that stores instructions for basic hardware initialization and management. The BIOS is also responsible for loading the initial portion of the operating system into memory, a process called bootstrapping (see Section 2.4.3, Bootstrapping). After the operating system has been loaded, it can use the BIOS to communicate with a system’s hardware to perform low-level (i.e., basic) I/O operations. Mainboards also contain chips called controllers that manage data transfer on the board’s buses. A mainboard’s chipset is the collection of controllers, coprocessors, buses and other hardware integrated onto the mainboard that determines the system’s hardware capabilities (e.g., which types of processors and memory are supported).

A recent trend in mainboard design is to integrate powerful hardware components onto the PCB. Traditionally, many of these were inserted into slots as add-on cards. Many of today’s mainboards include chips that perform graphics processing, networking and RAID (Redundant Array of Independent Disks) operations. These on-board devices reduce the overall system cost and have contributed significantly to the continuing sharp decline in computer prices. A disadvantage is that they are permanently attached to the mainboard and cannot be replaced easily.

Self Review

1. What is the primary function of the mainboard?

2. Why is the BIOS crucial to computer systems?

Ans: 1) The mainboard serves as the backbone for communication between hardware components, allowing them to communicate via the electrical connections on the board. 2) The BIOS performs basic hardware initialization and management and loads the initial component of the operating system into memory. The BIOS also provides instructions that enable the operating system to communicate with system hardware.

Processors

A processor is a hardware component that executes a stream of machine-language instructions. Processors can take many forms in computers, such as a central processing unit (CPU), a graphics coprocessor or a digital signal processor (DSP). A CPU is a processor that executes the instructions of a program; a coprocessor, such as a graphics or digital signal processor, is designed to efficiently execute a limited set of special-purpose instructions (such as 3D transformations). In embedded systems, processors might perform specific tasks, such as converting a digital signal to an analog audio signal in a cell phone—an example of a DSP. As the primary processor in the system, a CPU executes the bulk of the instructions, but might increase efficiency by sending computationally intensive tasks to a coprocessor specifically designed to handle them. Throughout the rest of this book, we use the term “processor” or “general-purpose processor” when referring to a CPU.

The instructions a processor can execute are defined by its instruction set. The size of each instruction, or the instruction length, might differ among architectures and within each architecture—some processors support multiple instruction sizes. The processor architecture also determines the amount of data that can be operated on at once. For instance, a 32-bit processor manipulates data in discrete units of 32 bits.

Modern processors perform many resource management operations in hardware to boost performance. Such features include support for virtual memory and hardware interrupts—two important concepts discussed later in this book.

Despite the variety of processor architectures, several components are present in almost all contemporary processors. Such components include the instruction fetch unit, branch predictor, execution unit, registers, caches and a bus interface (Fig. 2.2). The instruction fetch unit loads instructions into high-speed memory called instruction registers so that the processor can execute them quickly. The instruction decode unit interprets each instruction and passes the corresponding inputs to the execution unit, which performs the instruction. The main portion of the execution unit is the arithmetic and logic unit (ALU), which performs basic arithmetic and logical operations, such as addition, multiplication and logical comparisons (note that the “V” shape of the ALU is common in architecture diagrams).
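
To make the fetch-decode-execute cycle concrete, the following C sketch simulates it for a tiny invented instruction set. The opcodes, the accumulator register and the 2-byte instruction format are illustrative assumptions, not a description of any real processor.

```c
#include <stdio.h>
#include <stdint.h>

enum { OP_HALT, OP_LOAD, OP_ADD };   /* invented 2-byte instructions */

int main(void) {
    /* "Main memory" holding a tiny program: load 5, add 7, add 7, halt. */
    uint8_t memory[] = { OP_LOAD, 5, OP_ADD, 7, OP_ADD, 7, OP_HALT, 0 };
    uint16_t pc  = 0;    /* program counter: address of next instruction */
    int32_t  acc = 0;    /* accumulator register operated on by the ALU  */

    for (;;) {
        /* Fetch: copy the instruction at pc into "instruction registers". */
        uint8_t opcode  = memory[pc];
        uint8_t operand = memory[pc + 1];
        pc += 2;
        /* Decode and execute: route the operand to the execution unit. */
        switch (opcode) {
        case OP_LOAD: acc = operand;  break;
        case OP_ADD:  acc += operand; break;   /* the ALU's job */
        case OP_HALT: printf("result: %d\n", acc); return 0;
        }
    }
}
```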

[Figure 2.2 | Processor components.]

The bus interface allows the processor to interact with memory and other devices in the system. Because processors typically operate at much higher speeds than main memory, they contain high-speed memory called cache that stores copies of data in main memory. Caches increase processor efficiency by enabling fast access to data and instructions. Because high-speed caches are significantly more expensive than main memory, they tend to be relatively small. The caches are classified in levels—Level 1 (L1) is the fastest and most expensive cache and is located on the processor; the Level 2 (L2) cache, which is larger and slower than the L1 cache, is often located on the mainboard, but is increasingly being integrated onto the processor to improve performance.7

Registers are high-speed memories located on a processor that hold data for immediate use by the processor. Before a processor can operate on data, the data must be placed in registers. Storing processor instructions in any other, slower type of memory would be inefficient, because the processor would idle while waiting for data access. Registers are hard-wired to the processor circuitry and physically located near the execution units, making access to registers faster than access to the L1 cache. The size of the registers is determined by the number of bits the processor can operate on at once. For example, a 32-bit processor can store 32 bits of data in each register. The majority of processors in personal computers today are 32-bit processors; 64-bit processors are becoming increasingly popular.8 Each processor architecture provides a different number of registers, and each register serves a particular purpose. For example, the Intel Pentium 4 processor provides 16 program execution registers. Typically, half of these registers are reserved for use by applications for quick access to data values and pointers during execution. Such registers are called general-purpose registers. IBM’s PowerPC 970 processor (used in Apple’s G5 computers) contains 32 general-purpose registers. The other registers (often called control registers) store system-specific information, such as the program counter, which the processor uses to determine the next instruction to execute.9

Self Review

1. Differentiate between a CPU and a coprocessor. How might a system benefit from multiple CPUs? How might a system benefit from multiple coprocessors?

2. What aspects of a system does a processor architecture specify?

3. Why is access to register memory faster than access to any other type of memory, including L1 cache?

Ans: 1) A CPU executes machine-language instructions; a coprocessor is optimized to perform special-purpose instructions. Multiple CPUs would allow a system to execute more than one program at once; multiple coprocessors could improve performance by performing processing in parallel with a CPU. 2) A CPU’s architecture specifies the computer’s instruction set, virtual memory support and interrupt structure. 3) Registers are hard-wired to the processor circuitry and physically located near the execution units.

Clocks

Computer time is often measured in cycles, also called clock ticks. The term cycle refers to one complete oscillation of an electrical signal provided by the system clock generator. The clock generator sets the cadence for a computer system, much like the conductor of an orchestra. Specifically, the clock generator determines the frequency at which buses transfer data, typically measured in cycles per second, or hertz (Hz). For example, the frontside bus (FSB), which connects processors to memory modules, typically operates at several hundred megahertz (MHz; one megahertz is one million hertz).

Most modern desktop processors execute at top speeds of hundreds of megahertz (MHz) or even several billion hertz, or gigahertz (GHz), which is often faster than the frontside bus. Processors and other devices generate derived speeds by multiplying or dividing the speed of the frontside bus.10 For example, a 2GHz processor with a 200MHz frontside bus uses a multiplier of 10 to generate its cycles; a 66MHz sound card uses a divider of 3 to generate its cycles.
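
The derived-speed arithmetic can be checked directly. This brief C sketch reproduces the example values above (a 200MHz frontside bus with a multiplier of 10 for the processor and a divider of 3 for the sound card).

```c
#include <stdio.h>

int main(void) {
    double fsb_mhz = 200.0;                       /* frontside bus speed */
    double cpu_mhz = fsb_mhz * 10;                /* multiplier of 10    */
    double snd_mhz = fsb_mhz / 3;                 /* divider of 3        */
    printf("processor:  %.0f MHz (%.1f GHz)\n", cpu_mhz, cpu_mhz / 1000);
    printf("sound card: %.1f MHz\n", snd_mhz);    /* approximately 66MHz */
    return 0;
}
```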

Self Review

1. (T/F) All components of a system operate at the same clock speed.

2. What problems might arise if one component on a bus has an extremely high multiplier and another component on the same bus has an extremely high divider?

Ans: 1) False. Devices usually use a multiplier or a divider that defines the device’s speed relative to the speed of the frontside bus. 2) Bottlenecks could occur, because a component with a high divider will operate at a much slower speed than a device with a high multiplier. A high-multiplier device that relies on information from a high-divider device will be made to wait.

Memory Hierarchy

The size and the speed of memory are limited by the laws of physics and economics. Almost all electronic devices transfer data using electrons passing through traces on PCBs. There is a limit to the speed at which electrons can travel; the longer the wire between two terminals, the longer the transfer will take. Further, it is prohibitively expensive to equip processors with large amounts of memory that can respond to requests for data at (or near) processor speeds.

The cost/performance trade-off characterizes the memory hierarchy (Fig. 2.3). The fastest and most expensive memory is at the top and typically has a small capacity. The slowest and least expensive memory is at the bottom and typically has a large capacity. Note that the size of each block represents how capacity increases for slower memories, but the figure is not drawn to scale.

Registers are the fastest and most expensive memory on a system—they operate at the same speed as processors. Cache memory speeds are measured according to their latency—the time required to transfer data. Latencies are typically measured in nanoseconds or processor cycles. For example, the L1 cache for an Intel Pentium 4 processor operates at a latency of two processor cycles.11 Its L2 cache operates with a latency of approximately 10 cycles. In many of today’s processors, the L1 and L2 cache are integrated onto the processor so that they can exploit the processor’s high-speed interconnections. L1 caches typically store tens of kilobytes of data while L2 caches typically store hundreds of kilobytes or several megabytes. High-end processors might contain a third level of processor cache (called the L3 cache) that is slower than the L2 cache but is faster than main memory.
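
The practical effect of the cache levels can be demonstrated with a short experiment. The C sketch below (illustrative only; exact timings vary by machine) traverses the same matrix twice: once in the order it is laid out in memory, so each cache line fetched is fully used, and once column by column, so nearly every access misses the caches and pays main-memory latency.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096   /* 4096 x 4096 doubles = 128 MB, larger than any cache */

int main(void) {
    double *m = malloc((size_t)N * N * sizeof *m);
    if (!m) return 1;
    for (size_t i = 0; i < (size_t)N * N; i++) m[i] = 1.0;

    double s = 0.0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)          /* row-major: walks memory     */
        for (int j = 0; j < N; j++)      /* sequentially, so each cache */
            s += m[(size_t)i * N + j];   /* line fetched is fully used  */
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)          /* column-major: jumps 32 KB   */
        for (int i = 0; i < N; i++)      /* between accesses, so almost */
            s += m[(size_t)i * N + j];   /* every access misses cache   */
    clock_t t2 = clock();

    printf("row-major:    %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("column-major: %.2fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    printf("(checksum %g)\n", s);        /* keeps the compiler from     */
    free(m);                             /* optimizing the loops away   */
    return 0;
}
```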

[Figure 2.3 | Memory hierarchy.]

Next in the hierarchy is main memory—also called real memory or physical memory. Main memory introduces additional latency because data must pass through the frontside bus, which typically operates at a fraction of processor speeds. Main memory in today’s architectures exhibits latencies of tens or hundreds of processor cycles.12 Current general-purpose main memory sizes range from hundreds of megabytes (PCs) to tens or hundreds of gigabytes (high-end servers). Main memory is discussed in Section 2.3.5, Main Memory, and in Chapter 9, Real Memory Organization and Management. Registers, caches and main memory are typically volatile media, so their data vanishes when they lose power.

The hard disk and other storage devices such as CDs, DVDs and tapes are among the least expensive and slowest data storage units in a computer system. Disk storage device latencies are typically measured in milliseconds, typically a million times slower than processor cache latencies. Rather than allow a processor to idle while a process waits for data from secondary storage, the operating system typically executes another process to improve processor utilization. A primary advantage of secondary storage devices such as hard disks is that they have large capacities, often hundreds of gigabytes. Another advantage of secondary storage is that data is stored on a persistent medium, so data is preserved when power is removed from the device. Systems designers must balance the cost and the performance of various storage devices to meet the needs of users (see the Operating Systems Thinking feature, Caching).

Self Review

1. What is the difference between persistent and volatile storage media?

2. Why does the memory hierarchy assume a pyramidal shape?

Ans: 1) Volatile media lose their data when the computer is turned off, whereas persistent media retain the data. In general, volatile storage is faster and more expensive than persistent storage. 2) If a storage medium is less expensive, users can afford to buy more of it; thus, storage space increases.

Main Memory

Main memory consists of volatile random access memory (RAM), “random” in the sense that processes can access data locations in any order. In contrast, data locations on a sequential storage medium (e.g., tape) must be read sequentially. Unlike tapes and hard disks, memory latencies for each main memory address are essentially equal.

The most common form of RAM is dynamic RAM (DRAM), which requires that a refresh circuit periodically (a few times every millisecond) read the contents or the data will be lost. This differs from static RAM (SRAM), which does not need to be refreshed to maintain the data it stores. SRAM, which is commonly employed in processor caches, is typically faster and more expensive than DRAM.

An important goal for DRAM manufacturers is to narrow the gap between processor speed and memory-transfer speed. Memory modules are designed to minimize data access latency within the module and maximize the number of times data is transferred per second. These techniques reduce overall latency and increase bandwidth—the amount of data that can be transferred per unit of time. As manufacturers develop new memory technologies, the memory speed and capacity tend to increase and the cost per unit of storage tends to decrease, in accordance with Moore’s law.

Operating Systems Thinking

Caching

We all use caching in our everyday lives. Generally speaking, a cache is a place for storing provisions that can be accessed quickly. Squirrels stashing acorns as they prepare for the winter is a form of caching. We keep pencils, pens, staples, tape and paper clips in our desk drawers so that we can access them quickly when we need them (rather than having to walk down the hall to the supply closet). Operating systems employ many caching techniques, such as caching a process’s data and instructions for rapid access in high-speed cache memories and caching data from disk in main memory for rapid access as a program runs.

Operating systems designers must be cautious when using caching because in computer systems, cached data is a copy of the data whose original is being maintained at a higher level in the memory hierarchy. The cached copy is usually the one to which changes are made first, so it can quickly become out of sync with the original data, causing inconsistency. If a system were to fail when the cache contains updated data and the original does not, then the modified data could be lost. So operating systems frequently copy the cached data to the original—this process is called flushing the cache. Distributed file systems often place cache on both the server and the client, which makes it even more complex to keep the cache consistent.
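
The write-back behavior described in the feature above can be sketched in a few lines of C. This is a minimal illustration, not any particular hardware or OS design; the structure and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64

struct cache_line {
    unsigned long tag;           /* which memory block this line holds     */
    bool valid, dirty;           /* dirty: cache differs from the original */
    unsigned char data[LINE_SIZE];
};

/* Modify only the cached copy and mark it dirty; the cache is now
   out of sync with the original, as the feature above describes. */
void cache_write(struct cache_line *line, int offset, unsigned char byte) {
    line->data[offset] = byte;
    line->dirty = true;
}

/* Flush: copy the modified line back to the original so the two
   copies are consistent again. 'memory' stands in for main memory. */
void cache_flush(struct cache_line *line, unsigned char *memory) {
    if (line->valid && line->dirty) {
        memcpy(memory + line->tag * LINE_SIZE, line->data, LINE_SIZE);
        line->dirty = false;     /* the copies now match */
    }
}

int main(void) {
    unsigned char memory[4 * LINE_SIZE] = { 0 };   /* stand-in for RAM */
    struct cache_line line = { .tag = 2, .valid = true };
    cache_write(&line, 0, 42);    /* cached copy is newer than memory  */
    cache_flush(&line, memory);   /* until the flush resynchronizes    */
    printf("memory[%d] = %d\n", 2 * LINE_SIZE, memory[2 * LINE_SIZE]);
    return 0;
}
```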

Self Review

1. Compare main memory to disk in terms of access time, capacity and volatility.

2. Why is main memory called random access memory?

Ans: 1) Access times for main memory are much smaller than those for disk. Disks typically have a larger capacity than main memory, because the cost per unit storage for disks is less than for main memory. Main memory is typically volatile, whereas disks store data persistently. 2) Processes can access main memory locations in any order and at about the same speed, regardless of location.

Secondary Storage

Due to its limited capacity and volatility, main memory is unsuitable for storing data in large amounts or data that must persist after a power loss. To permanently store large quantities of data, such as data files and applications software, computers use secondary storage (also called persistent or auxiliary storage) that maintains its data after the computer’s power is turned off. Most computers use hard disks for secondary storage.

Although hard disk drives store more and cost less than RAM, they are not practical as a primary memory store because access to hard disk drives is much slower than access to main memory. Accessing data stored on a hard disk requires mechanical movement of the read/write head, rotational latency as the data spins to the head, and transfer time as the data passes by the head. This mechanical movement is much slower than the speed of electrical signals between main memory and a processor. Also, data must be loaded from the disk into main memory before it can be accessed by a processor.13 A hard disk is an example of a block device, because it transmits data in fixed-size blocks of bytes (normally hundreds of bytes to tens of kilobytes).
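
A worked example makes these delay components concrete. The figures below (8ms average seek, 7200 RPM, 100MB per second sustained transfer) are assumed for illustration and are not taken from the text.

```c
#include <stdio.h>

int main(void) {
    double seek_ms       = 8.0;      /* assumed average seek time       */
    double rpm           = 7200.0;   /* assumed rotational speed        */
    double transfer_mb_s = 100.0;    /* assumed sustained transfer rate */
    double block_kb      = 4.0;      /* read one 4KB block              */

    /* On average, the requested sector is half a rotation away. */
    double rotation_ms = (60000.0 / rpm) / 2.0;                /* ~4.17 */
    double transfer_ms = block_kb / 1024.0 / transfer_mb_s * 1000.0;
    double total_ms    = seek_ms + rotation_ms + transfer_ms;

    printf("average access time: %.2f ms\n", total_ms);       /* ~12.2 */
    /* That is millions of processor cycles at GHz clock speeds, which
       is why the OS runs another process rather than let the
       processor idle. */
    return 0;
}
```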

Some secondary storage devices record data on lower-capacity media that can be removed from the computer, facilitating data backup and data transfer between computers. However, this type of secondary storage typically exhibits higher latency than other devices such as hard disks. A popular storage device is the compact disk (CD), which can store up to 700MB per side. Data on CDs is encoded in digital form and “burned” onto the CD as a series of pits on an otherwise flat surface that represent ones and zeroes. Write-once, read-many (WORM) disks, such as write-once compact disks (CD-R) and write-once digital versatile disks (DVD-R), are removable. Other types of persistent storage include Zip disks, floppy disks, Flash memory cards and tapes.

Data recorded on a CD-RW (rewritable CD) is stored in metallic material inside the plastic disk. Laser light changes the reflective property of the recording medium, creating two states representing one and zero. CD-Rs consist of a dye between plastic layers that cannot be altered once it has been burned by the laser; CD-ROM data is pressed into the disk at manufacture and likewise cannot be changed.

Recently, digital versatile disk (DVD; also called digital video disk) technology, which was originally intended to record movies, has become an affordable data storage medium. DVDs are the same size as CDs, but store data in thinner tracks on up to two layers per side and can store up to 4.7GB of data per layer.

Some systems contain levels of memory beyond secondary storage. For example, large data-processing systems often have tape libraries that are accessed by a robotic arm. Such storage systems, often classified as tertiary storage, are characterized by larger capacity and slower access times than secondary storage.

Self Review

1. Why is accessing data stored on disk slower than accessing data in main memory?

2. Compare and contrast CDs and DVDs.

Ans: 1) Main memory can be accessed by electrical signals alone, but disks require mechanical movements to move the read/write head, rotational latency as the disk spins to move the requested data to the head and transfer time as the data passes by the head. 2) CDs and DVDs are the same size and are accessed by laser light, but DVDs store data in multiple layers using thinner tracks and thus have a higher capacity.

Buses

A bus is a collection of traces (or other electrical connections) that transport information between hardware devices. Devices send electrical signals over the bus to communicate with other devices. Most buses consist of a data bus, which transports data, and an address bus, which determines the recipient or source of that data.14 A port is a bus that connects exactly two devices. A bus that several devices share to perform I/O operations is also called an I/O channel.15

Access to main memory is a point of contention for channels and processors. Typically, only one access to a particular memory module may occur at any given time; however, the I/O channels and the processor may attempt to access main memory simultaneously. To prevent the two signals from colliding on the bus, the memory accesses are prioritized by a hardware device called a controller, and channels are typically given priority over processors. This is called cycle stealing, because the I/O channel effectively steals cycles from the processor. I/O channels consume a small fraction of total processor cycles, which is typically offset by the enhanced I/O device utilization.
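
A minimal sketch of the priority rule just described, with invented types: when the processor and an I/O channel request the memory bus in the same cycle, the controller grants the channel first.

```c
#include <stdio.h>

enum requester { NONE, PROCESSOR, CHANNEL };

/* The controller's priority rule: when both request the memory bus in
   the same cycle, the I/O channel wins (the cycle is "stolen"). */
enum requester arbitrate(int processor_wants_bus, int channel_wants_bus) {
    if (channel_wants_bus)   return CHANNEL;
    if (processor_wants_bus) return PROCESSOR;
    return NONE;
}

int main(void) {
    /* Simultaneous requests: the channel is granted the bus first. */
    printf("granted: %s\n",
           arbitrate(1, 1) == CHANNEL ? "channel" : "processor");
    return 0;
}
```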

Recall that the frontside bus (FSB) connects a processor to main memory. As the FSB speed increases, the amount of data transferred between main memory and a processor increases, which tends to increase performance. Bus speeds are measured in MHz (e.g., 133MHz and 200MHz). Some chipsets implement an FSB of 200MHz but effectively operate at 400MHz, because they perform two memory transfers per clock cycle. This feature, which must be supported by both the chipset and the RAM, is called double data rate (DDR). Another implementation, called quad pumping, allows up to four data transfers per cycle, effectively quadrupling the system’s memory bandwidth.

The Peripheral Component Interconnect (PCI) bus connects peripheral devices, such as sound cards and network cards, to the rest of the system. The first version of the PCI specification required that the PCI bus operate at 33MHz and be 32 bits wide, which considerably limited the speed with which data was transferred to and from peripheral devices. PCI Express is a recent standard that provides for variable-width buses. With PCI Express, each device is connected to the system by up to 32 lanes, each of which can transfer 250MB per second in each direction—a total of up to 16GB per second of bandwidth per link.16

The Accelerated Graphics Port (AGP) is primarily used with graphics cards, which typically require tens or hundreds of megabytes of RAM to perform 3D graphics manipulations in real time. The original AGP specification called for a 32-bit 66MHz bus, which provided approximately 260MB per second of bandwidth. Manufacturers have increased the speed of this bus from its original specification—denoting an increase in speed by a factor of 2 as 2x, by a factor of 4 as 4x, and so on. Current specifications allow for 2x, 4x and 8x versions of this protocol, permitting up to 2GB per second of bandwidth.
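
The bandwidth figures quoted in the last three paragraphs follow from simple arithmetic; the short C program below reproduces them, using the bus widths and per-lane rates as stated in the text.

```c
#include <stdio.h>

int main(void) {
    /* AGP 1x: a 32-bit (4-byte) bus at 66MHz, one transfer per cycle. */
    double agp_1x_mb = 4.0 * 66e6 / 1e6;        /* ~264MB/s (~260MB/s) */
    double agp_8x_gb = agp_1x_mb * 8 / 1000.0;  /* ~2.1GB/s (~2GB/s)   */

    /* PCI Express: 250MB/s per lane, per direction, up to 32 lanes.   */
    double pcie_gb = 250.0 * 32 * 2 / 1000.0;   /* 16GB/s per link     */

    /* DDR: a 200MHz FSB making two transfers per cycle behaves like a */
    /* 400MHz bus; quad pumping makes four transfers per cycle.        */
    double ddr_mt  = 200.0 * 2;                 /* 400 MT/s            */
    double quad_mt = 200.0 * 4;                 /* 800 MT/s            */

    printf("AGP 1x: %.0fMB/s, AGP 8x: %.2fGB/s\n", agp_1x_mb, agp_8x_gb);
    printf("PCI Express x32: %.0fGB/s\n", pcie_gb);
    printf("DDR: %.0f MT/s, quad-pumped: %.0f MT/s\n", ddr_mt, quad_mt);
    return 0;
}
```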

Self Review

1. How does FSB speed affect system performance?

2. How do controllers simplify access to shared buses?

Ans: 1) The FSB determines how much data can be transferred between processors and main memory per cycle. If a processor generates requests for more data than can be transferred per cycle, system performance will decrease, because that processor may need to wait until its requested transfers complete. 2) Controllers prioritize multiple simultaneous requests to access a bus so that devices do not interfere with one another.

Direct Memory Access (DMA)

Most I/O operations transfer data between main memory and an I/O device. In early computers, this was accomplished using programmed I/O (PIO), in which the processor specifies a byte or word to be transferred between main memory and an I/O device, then waits idly for the operation to complete, wasting a significant number of processor cycles. Designers later implemented interrupt-driven I/O, which enabled a processor to issue an I/O request and immediately continue to execute software instructions; the I/O device notifies the processor when the operation is complete by generating an interrupt.17

Direct memory access (DMA) improves upon these techniques by enabling devices and controllers to transfer blocks of data to and from main memory directly, which frees the processor to execute software instructions (Fig. 2.4). A DMA channel uses an I/O controller to manage data transfer between I/O devices and main memory. To notify the processor, the I/O controller generates an interrupt when the operation is complete. DMA improves performance significantly in systems that perform large numbers of I/O operations (e.g., mainframes and servers).18

[Figure 2.4 | Direct memory access (DMA).]

DMA is compatible with several bus architectures. On legacy architectures (i.e., architectures that are still in use but are no longer actively produced), such as the Industry Standard Architecture (ISA), extended ISA (EISA) or Micro Channel Architecture (MCA) buses, a DMA controller (also called a “third-party device”) manages transfers between main memory and I/O devices (see the Operating Systems Thinking feature, Legacy Hardware and Software). PCI buses employ “first-party” DMA using bus mastering—a PCI device takes control of the bus to perform the operation. In general, first-party DMA transfer is more efficient than third-party transfer and has been implemented by most modern bus architectures.19
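
The contrast between PIO and DMA can be sketched in C. The device registers below (DEV_*, DMA_*) are invented for illustration and simulated as plain variables so the sketch is self-contained; real register layouts are device- and platform-specific.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device registers. On real hardware these would be
   memory-mapped at platform-specific addresses. */
static uint8_t   dev_status = 1;     /* bit 0: a byte is ready        */
static uint8_t   dev_data   = 0x5A;  /* the device's data register    */
static uintptr_t dma_addr;           /* destination in main memory    */
static uint32_t  dma_count;          /* bytes to transfer             */
static uint32_t  dma_start;          /* write 1 to begin the transfer */

static volatile uint8_t   *DEV_STATUS = &dev_status;
static volatile uint8_t   *DEV_DATA   = &dev_data;
static volatile uintptr_t *DMA_ADDR   = &dma_addr;
static volatile uint32_t  *DMA_COUNT  = &dma_count;
static volatile uint32_t  *DMA_START  = &dma_start;

/* Programmed I/O: the processor moves every byte itself, spinning
   ("waiting idly") on the status register before each one. */
static void read_pio(uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        while ((*DEV_STATUS & 1) == 0)
            ;                            /* wasted processor cycles   */
        buf[i] = *DEV_DATA;
    }
}

/* DMA: the processor merely programs the controller, then returns to
   executing other instructions; the controller moves the whole block
   and raises an interrupt (handler not shown) when it finishes. */
static void read_dma(uint8_t *buf, size_t n) {
    *DMA_ADDR  = (uintptr_t)buf;         /* where to put the data     */
    *DMA_COUNT = (uint32_t)n;            /* how much to move          */
    *DMA_START = 1;                      /* transfer now proceeds in  */
}                                        /* parallel with the CPU     */

int main(void) {
    uint8_t buf[4];
    read_pio(buf, sizeof buf);           /* processor does all work   */
    read_dma(buf, sizeof buf);           /* processor is free at once */
    printf("PIO read: 0x%02X\n", buf[0]);
    return 0;
}
```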

Self Review

1. Why is DMA more efficient than PIO?

2. How does first-party DMA differ from third-party DMA?

Ans: 1) In a system that uses PIO, a processor waits idly for each memory transfer to complete. DMA frees processors from performing the work necessary to transfer information between main memory and I/O devices, which enables the processor to execute instructions instead. 2) Third-party DMA requires a controller to manage access to the bus. First-party DMA enables devices to take control of the bus without additional hardware.

Peripheral Devices

A peripheral device is any hardware device that is not required for a computer to execute software instructions. Peripheral devices include many types of I/O devices (e.g., printers, scanners and mice), network devices (e.g., network interface cards and modems) and storage devices (e.g., CD, DVD and disk drives). Devices such as the processor, mainboard and main memory are not considered peripheral devices. Internal peripheral devices (i.e., those that are located inside the computer case) are often referred to as integrated peripheral devices; these include modems, sound cards and internal CD-ROM drives. Perhaps the most common peripheral device is a hard disk. Figure 2.5 lists several peripheral devices.20 Keyboards and mice are examples of character devices—ones that transfer data one character at a time.

Peripheral devices can be attached to computers via ports and other buses.21 Serial ports transfer data one bit at a time, typically connecting devices such as keyboards and mice; parallel ports transfer data several bits at a time, typically connecting printers.22 Universal Serial Bus (USB) and IEEE 1394 ports are popular high-speed serial interfaces. The small computer systems interface (SCSI) is a popular parallel interface.

USB ports transfer data to and from devices such as external disk drives, digital cameras and printers, and also supply power to them. USB devices can be attached to, recognized by and removed from the computer while the computer is on without damaging the system’s hardware (a technique called “hot swapping”). USB 1.1 allows data transfer at speeds of 1.5Mbit (megabits, or 1 million bits; 8 bits = 1 byte) per second and 12Mbit per second. Because computers required fast access to large quantities of data on USB devices such as disk drives, USB 2.0 was developed to provide data transfers at speeds up to 480Mbit per second.23

Operating Systems Thinking

Legacy Hardware and Software

The latest versions of operating systems are designed to support the latest available hardware and software functionality. However, the vast majority of hardware and software that is “out there” is often older equipment and applications that individuals and organizations have invested in and want to keep using, even when a new operating system is installed. The older items are called legacy hardware and legacy software. An enormous challenge for OS designers is to provide support for such legacy systems, one that real-world operating systems must meet.

[Figure 2.5 | Peripheral devices.]

The IEEE 1394 standard, branded as “i.LINK” by Sony and “FireWire” by Apple, is commonly found in digital video cameras and mass storage devices (e.g., disk drives). FireWire can transfer data at speeds up to 800Mbit per second; future revisions are expected to scale to up to 2Gbit (gigabits, or 1 billion bits) per second. Similar to USB, FireWire allows devices to be “hot swappable” and can provide power to devices. Further, the FireWire specification allows multiple devices to communicate without being attached to a computer.24 For example, a user can directly connect two FireWire hard disks to copy the contents of one to the other.

Other interfaces used for connecting peripheral devices to the system include the small computer systems interface (SCSI) and the Advanced Technology Attachment (ATA), which implements the Integrated Drive Electronics (IDE) interface. These interfaces transfer data from a device such as a hard drive or a DVD drive to a mainboard controller, where it can be routed to the appropriate bus.25 More recent interfaces include Serial ATA (SATA), which permits higher transfer rates than ATA, and several wireless interfaces including Bluetooth (for short-range wireless connections) and IEEE 802.11g (for medium-range, high-speed wireless connections).

SCSI (pronounced “scuh-zee”) was developed in the early 1980s as a high-speed connection for mass storage devices. It is primarily used in high-performance environments with many large-bandwidth devices.26 The original SCSI specification allowed a maximum data transfer rate of 5MB per second and supported eight devices on an 8-bit bus. Current specifications, such as Ultra320 SCSI, permit data transfer at up to 320MB per second for 16 devices on a 16-bit bus.27

Self Review

1. What is the main difference between a peripheral device, such as a printer, and a device such as a processor?

2. Compare and contrast USB and FireWire.

Ans: 1) Peripheral devices are not required for a computer to execute software instructions. By contrast, all computers need at least one processor to run. 2) Both USB and FireWire provide large bandwidths and powered connections to devices. FireWire has a greater capacity than USB and enables devices to communicate without being attached to a computer.
