Memory organization and access of 80x86 processors
1) The different types of memory
Picture from http://www.faculty.iu-bremen.de/course/FundCS2/Lectures/OSN-02%20Hardware.e.pdf
A couple of general remarks about memory:
o There are several memory levels, each of which serves a different purpose and has a different access time.
o The fastest access is to the register set located directly on the CPU. Unfortunately, the space available there is very limited.
o Because of this, the introduction of cache memory was necessary. Cache memory is fast-access SRAM, which is more expensive than main memory and is therefore also used in limited amounts. The cache keeps frequently used data, to which it provides faster access than the main RAM. Starting with the 80486, there are two levels of cache: one inside the CPU (Level 1) and one that was originally external to the CPU (Level 2).
o The main memory (RAM) is slower to access but has a much larger capacity.
2) Segmentation: a characteristic feature of 80x86 processors
Picture from http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/pdf/ch04.pdf
o This feature is characteristic of, and was introduced along with, the 80x86 processors.
o The usual mental model of memory is a linear array of bytes addressed by a single index (this is called flat or linear addressing).
o In contrast, segmented addressing specifies an offset within a given segment, so an address has two parts, written segment:offset. The segment value is 16 bits wide on all 80x86 processors. This resembles a two-dimensional array.
o This gives the programmer the ability to attach blocks of variables (a segment) to a particular piece of code.
o It also allows programs to be partitioned into modules which operate independently, as the segments are ideally independent of each other. Object oriented programming is a good example of this kind of modular design.
o Moreover, segmentation extends the addressability of the processor (from 64kB to 1MB on the 8086 processor).
o Segmented addressing does not imply that the physical memory must also be non-linear. 80x86 memory is still linear, and a function converts a segment:offset address into a physical location. In processors using only real addressing mode, this function is very simple: the CPU multiplies the segment value by sixteen (10h), which is the same as shifting it left by four bits, and adds the offset portion. With the 80286, Intel changed this function by introducing the protected addressing mode and thereby extended the memory available for addressing.
3) Real and protected mode addressing
o Real mode addressing in 80x86 refers to converting one address value (or two in the case of segmentation) directly into a physical location in RAM. With 16 bit address values, this limits the addressable memory to 64kB without segmentation and to 1MB with segmentation.
o
Segmentation introduces an additional problem: several different segment:offset addresses refer to the same physical address. For example, 11F0:0, 1100:F00, and even 1080:1700 all correspond to physical address 11F00h. The problem is solved by introducing a normalized address, which limits the offset portion to values between 0 and 0Fh.
o In 80286 and later processors, a different addressing mode, the protected addressing mode, was introduced. Rather than using a function to determine the physical address, protected mode processors use a lookup table. Segment registers simply point to data structures that contain the information needed to access a location. Every protected mode program must include a table of "descriptors", which are 8 byte data structures that define the start and end of a segment. Depending on the type of segment, a descriptor may hold other information such as access rights. A typical descriptor contains the following information, packed into an 8 byte record:
· Segment start: absolute 32 bit address (24 bits on the 80286)
· Segment limit: maximum offset this segment can reference
· Segment status: privilege level, segment present, segment available, segment type, etc.
(contents of descriptor according to http://www.ganssle.com/articles/aprot1.htm)
o Given the information about the starting point and the length of the segment, the CPU computes the physical address as follows:
Picture from http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/pdf/ch04.pdf
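The two translation schemes described in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function and class names are hypothetical, the descriptor keeps just the fields listed above, the table is indexed directly rather than through a real selector format, and the 20-bit wrap-around models the 8086 only.

```python
from dataclasses import dataclass
from typing import List, Tuple


def real_mode_physical(segment: int, offset: int) -> int:
    """Real mode: physical = segment * 10h + offset (wraps at 1MB as on the 8086)."""
    return ((segment << 4) + offset) & 0xFFFFF


def normalize(segment: int, offset: int) -> Tuple[int, int]:
    """Return the normalized segment:offset pair, whose offset lies in 0..0Fh."""
    physical = real_mode_physical(segment, offset)
    return physical >> 4, physical & 0xF


@dataclass
class Descriptor:
    """Simplified descriptor (hypothetical layout, not the real 8 byte record)."""
    base: int      # segment start (absolute address)
    limit: int     # highest valid offset within the segment
    present: bool  # one of the status bits


def protected_mode_physical(table: List[Descriptor], index: int, offset: int) -> int:
    """Protected mode: look the segment up in a table, check it, add the offset."""
    descriptor = table[index]
    if not descriptor.present:
        raise LookupError("segment not present")
    if offset > descriptor.limit:
        raise MemoryError("offset exceeds segment limit (protection fault)")
    return descriptor.base + offset


if __name__ == "__main__":
    # The three aliases from the text all map to the same physical address.
    for seg, off in [(0x11F0, 0x0), (0x1100, 0xF00), (0x1080, 0x1700)]:
        print(f"{seg:04X}:{off:04X} -> {real_mode_physical(seg, off):05X}h")
```

The limit check in `protected_mode_physical` is the part that real mode lacks: an out-of-range offset is rejected instead of silently landing in another segment.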
o As one can imagine, the protected mode is particularly important because it provides the segment limit and status, which prevent a program from overwriting machine instructions or data outside its own segments during execution. In real mode, it is possible for a program to go on writing data in memory after its segment has ended, which can alter data in the following segment.
4) Addressing modes
80x86 processors provide several ways of addressing operands. Among these, the most common ones are:
For more information see http://www3.wittenberg.edu/bshelburne/Comp255S/AddressingModes.htm
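As a rough illustration of what an addressing mode computes, the sketch below models the offset (effective address) of a memory operand as base + index * scale + displacement. The function name and parameters are hypothetical, and the mode names in the comments are standard x86 terminology assumed here rather than taken from the text above. Scaling factors other than 1 exist only with 32 bit addressing (80386 and later).

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0, width: int = 16) -> int:
    """Offset of a memory operand, wrapped to the address width in bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if width == 16 and scale != 1:
        raise ValueError("scaled indexing requires 32 bit addressing")
    return (base + index * scale + displacement) & ((1 << width) - 1)


if __name__ == "__main__":
    bx, si = 0x2000, 0x0010  # example register contents (made up)
    print(hex(effective_address(displacement=0x1234)))               # direct: [1234h]
    print(hex(effective_address(base=bx)))                           # register indirect: [BX]
    print(hex(effective_address(base=bx, displacement=4)))           # based: [BX+4]
    print(hex(effective_address(base=bx, index=si, displacement=4))) # based indexed: [BX+SI+4]
    print(hex(effective_address(base=0x1000, index=3, scale=4,
                                displacement=8, width=32)))          # scaled: [EBX+ESI*4+8]
```

The resulting offset is then combined with a segment, as described in section 3, to obtain the physical address.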
5) Cache organization and the segment descriptor cache
Picture from http://www.faculty.iu-bremen.de/course/FundCS2/Lectures/OSN-02%20Hardware.e.pdf
Ø The idea behind cache memory is to keep frequently used blocks of memory in more expensive but faster memory. The cache is always checked first to determine whether the data is there. Cache performance relies on both spatial and temporal locality, meaning that
o memory cells near the cell currently being used are likely to be used soon as well, and should be stored in the cache (spatial locality);
o memory cells which have been accessed recently are likely to be used again soon, and should be stored in the cache (temporal locality). (See http://www.faculty.iu-bremen.de/course/FundCS2/Lectures/OSN-02%20Hardware.e.pdf)
Ø Studies show that there is a certain cache size (around 128kB) beyond which performance hardly improves any more.
Ø Cache memory is organized in slots of fixed size (e.g. 16 bytes), such that an entire block from main memory can be mapped to a slot in the cache. The question now arises how to identify which blocks of main memory are held in the cache. This information is encoded in the main memory address, which is divided into at least two fields: the tag, used to identify which block is in the cache, and the offset, which selects a byte within the block (compare the offset in section 2). Optionally there is a third field, the slot/set, used to determine which slots in the cache a block can be mapped to.
Ø
There are several ways to map blocks of main memory to cache slots (cache lines).
o The fully associative cache, in which the caching controller can place a block of bytes in any one of the cache lines present in the cache memory. This is very flexible, but the extra circuitry needed to achieve full associativity is expensive and can slow down the memory subsystem.
o The direct mapped cache (also known as the one-way set associative cache), in which a block of main memory is always loaded into the same cache line. One of the problems with this is that it may not make effective use of all the cache memory: two frequently used blocks that map to the same cache line keep evicting each other, even while other lines sit unused.
o The n-way set associative cache is a mixture of direct and fully associative mapping. The cache is divided into sets of cache lines. The CPU selects a particular set using some subset of the address bits, just as for direct mapping. Within each set there are n cache lines. The caching controller uses a fully associative mapping algorithm to select one of the n cache lines within the set. (According to http://webster.cs.ucr.edu/Page_AoAWin/HTML/MemoryArchitecturea2.html)
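The address split and the three mapping schemes can be tried out with a small simulation. The sketch below is an illustration only: the class and helper names are hypothetical, and the least-recently-used replacement inside a set is an assumption, since the text does not say which line is evicted. Setting ways to 1 gives a direct mapped cache, and setting ways equal to the number of lines gives a fully associative one.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set associative cache that only tracks which blocks are present."""

    def __init__(self, num_lines: int, ways: int, block_size: int = 16):
        if num_lines % ways != 0:
            raise ValueError("num_lines must be a multiple of ways")
        self.block_size = block_size
        self.ways = ways
        self.num_sets = num_lines // ways
        # One ordered dict per set: keys are tags, order is least to most recent.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, address: int):
        """Divide an address into (tag, set index, offset within the block)."""
        offset = address % self.block_size
        block = address // self.block_size
        return block // self.num_sets, block % self.num_sets, offset

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block (evicting if needed)."""
        tag, set_index, _ = self.split(address)
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # assumed policy: evict least recently used
        lines[tag] = True
        return False


def count_hits(ways: int, addresses, num_lines: int = 4):
    """Run an access sequence and return (hits, misses)."""
    cache = SetAssociativeCache(num_lines, ways)
    for address in addresses:
        cache.access(address)
    return cache.hits, cache.misses


if __name__ == "__main__":
    pattern = [0x00, 0x40, 0x00, 0x40]  # two blocks that collide when direct mapped
    for ways in (1, 2, 4):
        print(ways, "way(s):", count_hits(ways, pattern))
```

With four lines of 16 bytes, addresses 00h and 40h fall into the same line of the direct mapped cache and evict each other on every access, whereas the two-way and fully associative variants can hold both blocks at once.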
Ø Most cache designs are direct-mapped, two-way set associative, or four-way set associative. The 80x86 family CPUs use all three (depending on the CPU and cache). The cache of the 80486 is unified, meaning that data and instructions are held in the same cache; the Pentium and later processors split the Level 1 cache into separate instruction and data caches.
Ø
One aspect of caching is essential to main memory access, and that is the segment descriptor cache (directly related to the descriptor table discussed in section 3), which was implemented beginning with the 80286 processor. This cache is updated each time a segment register is loaded and is essential to protected mode operation, because without it, determining the segment base, limit, and access rights would require extra reads of the descriptor table. Conceptually, the microprocessor needs the descriptor information for every memory access. Because the descriptor tables are large, they are kept in slower main memory. Therefore, without an internal segment descriptor cache, each memory access would require several additional accesses to memory.
Ø Although the layout of the segment descriptor cache differs between almost all processors, a typical structure (showing also the relationship between the descriptor table and the segment descriptor cache) is shown below, for 32-bit descriptor table entries.
Figure 1: Combining fields from the descriptor table into the segment-descriptor cache.
Picture from http://www.x86.org/ddj/aug98/aug98.htm
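The benefit of the segment descriptor cache can be illustrated with a small model that counts how often the descriptor table has to be read. This is a sketch under stated assumptions, not the layout of any real processor: the class name, the (base, limit) tuple format and the register names used as dictionary keys are all hypothetical.

```python
class SegmentUnit:
    """Toy model: descriptors are fetched on segment register load, not per access."""

    def __init__(self, descriptor_table):
        self.descriptor_table = descriptor_table  # list of (base, limit), in main memory
        self.table_reads = 0                      # number of slow descriptor table reads
        self.cached = {}                          # segment register -> cached (base, limit)

    def load_segment_register(self, register: str, index: int) -> None:
        """Loading a segment register reads the table once and refreshes the cache."""
        self.table_reads += 1
        self.cached[register] = self.descriptor_table[index]

    def translate(self, register: str, offset: int) -> int:
        """Each memory access uses only the cached copy of the descriptor."""
        base, limit = self.cached[register]
        if offset > limit:
            raise MemoryError("offset exceeds segment limit")
        return base + offset


if __name__ == "__main__":
    unit = SegmentUnit([(0x100000, 0xFFFF)])
    unit.load_segment_register("DS", 0)
    addresses = [unit.translate("DS", offset) for offset in range(0, 64, 4)]
    print(len(addresses), "accesses,", unit.table_reads, "descriptor table read(s)")
```

Sixteen accesses through the same segment register cost a single descriptor table read in this model; without the cached copy, every one of them would need its own lookup.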
6) Main memory and programming
o In practice, programs run under an operating system. How does the operating system organize the main memory?
Picture from http://www.briceg.com/linux/aoal/MemoryAccessandOrg.html
o An operating system like Linux or Windows tends to put different types of data into different sections (or segments) of main memory.
o The lowest memory addresses are reserved by the operating system and are usually inaccessible to user programs (attempts to access this part of memory result in a protection fault).
o The remaining six areas in memory are reserved for different types of data: the stack section, the heap section, the code section, the READONLY section, the STATIC section, and the STORAGE section. The properties specified for each of these areas (via descriptors) are one more check that prevents the programmer from misusing the data. For example, the user is not allowed to write data into the code section.
References:
http://www.briceg.com/linux/aoal/MemoryAccessandOrg.html
http://www.ganssle.com/articles/aprot1.htm
http://www3.wittenberg.edu/bshelburne/Comp255S/AddressingModes.htm
http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/pdf/ch04.pdf
http://www.faculty.iu-bremen.de/course/FundCS2/Lectures/OSN-02%20Hardware.e.pdf
http://www.x86.org/articles/pmbasics/
http://wgc.chem.pu.ru/OS2/concepts/conc2.html
http://condor.depaul.edu/~jourada/97summer/CacheMemory/sld005.htm
http://webster.cs.ucr.edu/Page_AoAWin/HTML/MemoryArchitecturea2.html
http://www.x86.org/ddj/aug98/aug98.htm