64-bit architecture

Introduction:
To get a first idea of how the 64-bit architecture works and how it differs from a 32-bit implementation, it is useful to consider one definition first: "A 64-bit processor is a microprocessor with a word size of 64 bits, a requirement for memory and data intensive applications such as computer-aided design (CAD) applications, database management systems, technical and scientific applications, and high-performance servers. 64-bit computer architecture provides higher performance than 32-bit architecture by handling twice as many bits of information in the same clock cycle." (search390.com)

The key parts of this definition give a rough idea that a single word can now take not only 2^32 = 4294967296 distinct values, but 2^64 = 18446744073709551616. The numbers are quite impressive and show that the architecture level has to be updated accordingly.

Several companies have actually implemented 64-bit processors, but the two main ones are AMD and Intel. Other enterprises certainly have their place in the development of 64-bit processors, too, but the mainstream market is going to face the products by AMD and Intel. It is therefore reasonable to explain how those two companies designed their 64-bit processors; translating the two specific layouts and implementations to the general concept then only requires attention to details. The two companies differ considerably in how they make 32-bit programs work on the 64-bit architecture. Those differences will be outlined in the 32-bit part of this document; the following part outlines the structure of a "pure" 64-bit architectural level.
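The two numbers quoted above are easy to check. The following is a minimal Python sketch; the function name `distinct_values` is chosen only for this illustration and does not come from any of the cited sources.

```python
def distinct_values(bits: int) -> int:
    """Number of different bit patterns a word of the given width can hold."""
    return 1 << bits  # identical to 2 ** bits


if __name__ == "__main__":
    for width in (32, 64):
        print(f"{width}-bit word: {distinct_values(width)} patterns")
    # Factor by which the 64-bit range exceeds the 32-bit range
    print(distinct_values(64) // distinct_values(32))
```

The factor between the two ranges is again 2^32: doubling the word size squares the number of representable values rather than doubling it.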
Not much public information is available about the physical structure of current 64-bit processors, because neither AMD nor Intel wants to give crucial information to its rival on the processor market. It is therefore useful to focus on the instruction set architecture (ISA) and the general differences between a 32-bit processor and the new 64-bit one.

AMD - layout & features:
So the question is: what exactly happens inside the processor core, and where does the 64-bit component come into play? First of all, a distinction has to be made between the width of the data bus, which is already 64 bits for many 32-bit processors, and the architecture of the central processing unit (CPU). The difference between, e.g., the AMD Athlon and the new AMD Hammer or Opteron is that the complete architecture is now based on 64 bits. AMD calls it x86-64, and it is activated by one special bit called Long Mode Active (LMA). If LMA is set, the 64-bit features become available and the CPU leaves the legacy mode in which it behaves like an older 32-bit processor. The following main concepts become viable when LMA is activated:
But those three conceptual changes for the "pure" 64-bit mode are built upon the main structures that are continuously present (whether used or not) in the 64-bit architecture. These most basic principles are:
The following part draws on the source that is seemingly consulted by anybody who publishes something about the AMD 64-bit architecture: the AMD x86-64 Programmer's Manual. It comes in several volumes and describes the implications and applications of the instruction set architecture for AMD's 64-bit processing technology. As mentioned above, one main change - if not the main change - from 32 bits to 64 bits is the size of the register files, in particular of the GPRs (general purpose registers). The following picture shows how AMD partitioned its system of register files:
This shows again that AMD's processor design can be viewed as an extension of the old 32-bit approach: all the general register concepts displayed here already existed in a 32-bit design, so GPRs, media registers, an instruction pointer and a flags register are not new, but rather enhanced by the new approach. What is even more important becomes visible when one looks closely at the ISA.

AMD - Instruction Set Architecture:
The most basic units of organization for the instructions are specified in the following way (see the AMD manual again, pages 38/39):
When LMA is activated, the full 64-bit capabilities of the processor are enabled; this is usually done by the operating system. This is the stage we would like to call "pure" 64-bit mode, and such a mode can be recognized in both architectures, the one from AMD described here and the Intel IA-64 described later on this page. For the following part of the analysis we assume that LMA is activated and the processor is in "pure" 64-bit mode, which is not to be confused with legacy mode or long mode compatibility mode; these are features that support the transition from 32-bit machines and software to the new architecture. They are not considered here, but in the 32-bit section.
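The relationship between the modes named above can be summarised in a few lines of Python. This is only a sketch under an assumption that goes beyond the text: besides LMA, it uses the "long" flag of the current code-segment descriptor (called L in AMD's manuals) to tell 64-bit mode and compatibility mode apart. The function and parameter names are invented for this illustration.

```python
def operating_mode(lma: bool, cs_long: bool) -> str:
    """Classify the processor mode from two flags.

    lma     -- Long Mode Active bit (set up by the operating system)
    cs_long -- assumed L flag of the current code segment; only
               meaningful while LMA is set
    """
    if not lma:
        return "legacy mode"  # behaves like a 32-bit x86 processor
    # Long mode is active: the code segment selects the sub-mode.
    return "64-bit mode" if cs_long else "compatibility mode"


if __name__ == "__main__":
    for lma in (False, True):
        for cs_long in (False, True):
            print(lma, cs_long, "->", operating_mode(lma, cs_long))
```

Only the combination in which both flags are set corresponds to what this text calls "pure" 64-bit mode.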
There exist five addressing modes:
And again one realizes that there are no real differences in structure compared to non-64-bit ISAs. The PC, the stack and absolute addressing just carry over with more bits. The RIP (the 64-bit instruction pointer / program counter) keeps its function, but in 64-bit mode it can also serve as the base for relative addressing, which provides a more efficient way to access code and data near the current instruction. This is one reason why there is a significant increase in speed for the AMD 64-bit architecture - direct access to program code. Absolute addressing gets even easier due to the common segment base 0. The same holds for pointers in general. As the segment registers are largely ignored in 64-bit mode, the concept of far pointers, which store a segment selector in addition to the usual address, is no longer needed: memory is just one linear chunk. Near pointers are enough, and for 64-bit applications on the AMD architecture one can return to the general term "pointer", as it is obvious that it can only point into one flat address space. Immediates and displacements remain 32 bits in size but are sign-extended to 64 bits where needed.

This finishes the broad outline of the instruction set architecture for AMD based on the document mentioned above. AMD's philosophy of keeping it simple and easy becomes apparent, but this holds only for AMD, not for 64-bit processors in general; others might demand more sophisticated instruction sets and might not build upon established concepts. One has to know certain further technical details, which are not emphasized here: the new registers must be taken into account, so the number of possible combinations for addressing and declaring correctly rises, but the level of complexity does not rise significantly for AMD. Outlining the new instructions for every new register would be tedious and would only be valid for AMD's ISA anyway, so we move on to the comparison with Intel's implementation.
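Before turning to Intel, the remark on 32-bit displacements and addressing relative to the instruction pointer can be made concrete. The Python sketch below is an illustration, not AMD's specification: it assumes that the displacement is sign-extended from 32 to 64 bits and added to the address of the following instruction, with wrap-around at 2^64. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep results inside a 64-bit address space


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def rip_relative_address(next_rip: int, disp32: int) -> int:
    """Effective address = address of the next instruction + signed disp32."""
    return (next_rip + sign_extend(disp32, 32)) & MASK64


if __name__ == "__main__":
    # A displacement of 0xFFFFFFFC encodes -4 as a 32-bit value.
    print(hex(rip_relative_address(0x401000, 0xFFFFFFFC)))
    print(hex(rip_relative_address(0x401000, 0x10)))
```

Because the displacement is signed, a 32-bit field reaches roughly 2 GB in either direction from the current instruction, which is why it does not need to grow to 64 bits.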
Intel - layout & features:
Intel takes a quite different approach in its 64-bit architecture, called Itanium. The two main catchwords for the ideas used are IA-64 and, even more important, VLIW (very long instruction word). Intel aims at even more parallel computing power and a more involved approach to exploiting the possibilities of 64 bits. In this context one might even say 128-bit, as the instruction word for IA-64 is 128 bits wide, which gives an impressive number of possible bit patterns: 2^128, approximately 3.4028236692093846346337460743177*10^38. Although this might at first sound superior to the approach taken by AMD and other companies (which also have 128-bit registers available), it entails a lot of problems, especially compatibility issues - discussed in the next part - and complexity in the structure of registers and instructions. One instruction word encodes three basic instructions and contains a pointer (the template field):
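As a rough illustration of such a 128-bit word, the following Python sketch packs three 41-bit instruction slots and a 5-bit template into one integer and unpacks them again. The bit positions (template in the lowest five bits, followed by slots 0 to 2) are an assumption based on Intel's Itanium documentation; the function names and the slot values are invented and are not real instruction encodings.

```python
SLOT_BITS = 41       # width of one instruction slot
TEMPLATE_BITS = 5    # width of the template field
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 5 + 3 * 41 = 128

SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots: tuple) -> int:
    """Combine a template and three slot values into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit into 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit into 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple:
    """Split a 128-bit integer back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return template, slots


if __name__ == "__main__":
    word = pack_bundle(0x10, (1, 2, 3))  # arbitrary example values
    print(word.bit_length() <= BUNDLE_BITS, unpack_bundle(word))
```

The arithmetic shows why the word is 128 bits wide: three slots of 41 bits leave exactly five bits for the template.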
EPIC was briefly explained above, and the concept of speculation is quite intuitive: the compiler tries to schedule data accesses and operations before the time they would normally be needed or executed. This should prevent slow operations from halting the whole process. Predication is more complex: a conditional branch, which might not even be needed at all for the whole operation, is prepared for execution beforehand to guarantee maximum speed. Predication is not to be confused with prediction. For predication, a certain number of parallel operations are prepared by marking the conditional branches that might be taken in the next parallel instruction bundle (bundles will be explained later on). But these branches are actually only computed when they are really needed and the condition evaluates to true. The number of different types of registers in the Intel IA-64 architecture is greater than in AMD's, and the general concept is also more involved, as the most basic overview below shows (data & pictures used for this part: hardwaresecrets):
Intel - New Concepts:
The following part of this analysis takes up only some important points, as Intel's approach cannot be described properly and in full breadth in a reasonable amount of space. One primary resource is therefore a brief presentation given by Gautam Doshi back in 1999, which can be found here. Again we focus on the "pure" 64-bit mode of the processor. First of all, one statement from the first part of the IA-64 architecture carries directly over into this part: It is all about parallelism! This entails asking what structure the registers and the instructions need in order to enable this feature and use it extensively. Obviously some operations are independent of each other or only partly dependent, and the sequential program pattern holds all these operations back. One simple and striking example is the following:
Certainly the second instruction can be performed without any knowledge of the first one, and doing so would actually help to prepare for the third operation, so Intel makes an interesting point here: parallelism can lead to superior performance. Increasing the size of the instruction word is therefore a natural step. The other step is to remove the stops ";;" to implement the following code:
But how can all this be managed with just registers and simple instructions? One concept of importance for Intel's architecture is the RSE (register stack engine). It automatically saves and restores the register stack, a feature that is crucially needed to perform background operations, to cope properly with the results of speculation and to reduce overflow and underflow. There are several other concepts, like the advanced load address table, the translation lookaside buffer or the hardware page walker, which help to achieve the primary goal of parallelism (reference so far).

Intel - Architecture:
The basic structural unit of the Itanium looks like the picture to the right. According to Intel, the data bus can cope with a data rate of 2.1 GB/s. The Itanium processor contains 4 integer ALUs, 4 multimedia ALUs, 2 AGUs, 3 branch units and 4 FPUs for floating-point arithmetic. Theoretically the processor can perform 20 operations in one clock cycle by loading 16 operands and evaluating 4 ALU operations. This should not be confused with the number of instructions possible within one clock cycle, namely six. The instructions are retrieved from memory and are bundled by a process called bundle rotation; this prepares the execution of parallel instructions on the hardware level. The instructions are fetched from the cache speculatively. All this is implemented with the help of 128 floating-point registers, 128 integer registers and 8 branch registers, all of which explicitly support 64 bits (reference). The different types of registers extend this basic structure even further, and there is perhaps no other point at which the distinction between the approaches of AMD and Intel shows more directly. Common components in both architectures are the GPRs (64-bit), floating-point registers, the instruction pointer and branch registers. Nevertheless, there is a reason why the Itanium is sometimes called "The King of Registers". Besides the sheer number, additional types are also specified:
If one views this scheme next to the register distribution of the AMD x86-64 above, a contrast becomes apparent, and it continues into virtually every part of the ISA. Memory, for example, is addressed with 64-bit pointers and accessed in sizes of 1, 2, 4, 8, 10 and 16 bytes. Data and instructions may be stored in memory either in little endian or in big endian order, and this is controlled by another special purpose register file (reference). Some features of these constructs are certainly quietly implemented in the AMD architecture, too, but the real difference is made by instruction bundling in IA-64. Bundling refers to the 3 instructions of 41 bits each mentioned at the beginning of this paragraph; one instruction word is also called a bundle in this case and needs special management. This management of the bundle has only been touched on the surface in the first part of the IA-64 architecture; treating it properly would need at least as much space as has been used so far for the brief overview of IA-64 and x86-64. Just as a short remark to remind the reader: these are only two companies, two approaches and two quite different goals. AMD and Intel are not alone, and other enterprises are working on 64-bit processors, too.

Perspectives:
At this point one has only an overview on which to judge the two approaches presented. But one thing should be emphasized clearly: these two cases make visible how fundamentally views of processing technology can differ - the struggle of power against stability/compatibility. Certainly Intel's emphasis on parallel computing offers more power and extends computing to a new level, but whether this is the way to go can definitely not be decided yet, because we have to ask: what about our old software? Everything developed so far in 32 bits should work in the new age, too. The next section is going to address this topic in more depth.
And last but not least, a look at how marketing tries to explain this complex architecture and instruction set. Some specialists of the AMD marketing department have done quite a nice job of explaining the whole construction within one minute, introducing some simplifications for the explanation to the public. It is interesting to see how two series of manuals - containing several thousand pages each - boil down to this view.

References for this part are basically placed in the appropriate positions - this list gives an overview:
- Search390.com: http://search390.techtarget.com/sDefinition/0,,sid10_gci498697,00.html
- Hammer Review A1-Electronics: http://www.a1-electronics.co.uk/AMD_Section/CPUs/Hammer_Review_pg2.shtml
- Article X86-64 Hardwaresite: http://www.hardwaresite.net/x86-64.html
- AMD Developer's Manual X86-64: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf
- Article IA-64 Hardwaresite: http://www.hardwaresite.net/ia64.html
- Presentation IA-64: http://www.eg.bucknell.edu/~bsprunt/comp_arch/intel/ia64_tutorial.pdf
- Software Developer's Manual Itanium: http://developer.intel.com/design/itanium/manuals/245317.pdf
- Hardware Developer's Manual Itanium: http://developer.intel.com/design/itanium/downloads/248701.htm
- AMD Opteron video: http://www.amd.com/us-en/assets/content_type/DigitalMedia/AMD_Opteron.wmv
- Article 64-bit computing: c't 12/99, page 28
- Basic notations, definitions and concepts are taken from "Computer Organization and Design", Patterson and Hennessy