Central processing unit

Early CPUs were custom-designed parts of a larger, usually one-of-a-kind, computer. Custom CPUs are rarely used today; standardized general-purpose processors are used instead. This trend toward standardization began in the era of transistorized [[mainframe]]s and minicomputers and accelerated rapidly once the [[integrated circuit]] was introduced.
 
== History ==
[[File:Edvac.jpg|thumb|250px|left|[[EDVAC]], one of the first electronic stored-program computers.]]
 
CPUs are [[digital]] devices that operate only on discrete states and therefore require switching elements to distinguish between those states. Before the commercial acceptance of the transistor, [[relay]]s and [[vacuum tube]]s were commonly used as switching elements. Although these had definite advantages over earlier, purely mechanical designs, they were unreliable for a variety of reasons, and the early electronic computers were generally less reliable than electromechanical computers, even though they were far faster. Tube computers like [[EDVAC]] tended to run for only about eight hours between failures, after which they had to be repaired, whereas electromechanical machines like the [[Harvard Mark I]] failed very rarely. In the end, tube-based computers prevailed because their speed advantages far outweighed their reliability problems. The [[clock rate]] of these early synchronous CPUs was very low compared with modern designs: clock frequencies ranging from 100 [[Hertz|kHz]] to 4&nbsp;MHz were common at the time, limited mainly by the switching speed of the components from which the CPUs were built.
 
=== Discrete transistor and integrated circuit CPUs ===
[[File:PDP-8i cpu.jpg|thumb|350px|CPU, [[magnetic-core memory|core memory]], and [[computer bus|external bus]] interface of an MSI [[PDP-8]]/I.]]
 
Transistor-based computers had several significant advantages over their predecessors. Besides offering greater reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of a transistor's short switching time compared with a tube or relay. Thanks to the increased reliability and the dramatically faster switching elements, CPU clock rates of tens of megahertz were achieved during this period. As discrete transistor and IC CPUs became commonplace, new high-performance designs such as [[SIMD]] [[vector processor]]s also began to appear. These early experimental designs later gave rise to the era of specialized [[supercomputer]]s such as those made by [[Cray Inc.]]
 
=== Microprocessors ===
[[File:80486dx2-large.jpg|right|thumb|300px|The [[Intel 80486DX2]] microprocessor (actual size: 12&times;6.75 mm) in its packaging.]]
 
As Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like [[electromigration]] and [[subthreshold leakage]] to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the [[quantum computer]], as well as to expand the usage of [[Parallel computing|parallelism]] and other methods that extend the usefulness of the classical Von Neumann model.
 
== CPU operation ==
The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. Discussed here are devices that conform to the common [[Von Neumann architecture]]. The program is represented by a series of numbers that are kept in some kind of [[Memory (computers)|computer memory]]. There are four steps that nearly all Von Neumann CPUs use in their operation: '''fetch''', '''decode''', '''execute''', and '''writeback'''.
 
After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "[[Classic RISC pipeline]]," which in fact is quite common among the simple CPUs used in many electronic devices (often called [[microcontroller]]s). <ref>This description is, in fact, a simplified view even of the [[Classic RISC pipeline]]. It largely ignores the important role of [[CPU cache]], and therefore the '''access''' stage of the pipeline. See the respective articles for more details.</ref>
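The four-step cycle can be made concrete with a small sketch. The toy machine below is invented purely for illustration (its instruction encoding, opcodes, and register file correspond to no real ISA), but it runs the same fetch, decode, execute, and writeback loop described above, including the program counter update on a jump:

<syntaxhighlight lang="python">
# Toy Von Neumann machine: a minimal fetch-decode-execute-writeback loop.
# The instruction encoding is invented for illustration only.
# Each instruction is a tuple: (opcode, dest, src1, src2_or_target).

memory = [
    ("LOADI", 0, 5, None),       # r0 <- 5
    ("LOADI", 1, 7, None),       # r1 <- 7
    ("ADD",   2, 0, 1),          # r2 <- r0 + r1
    ("JMPNZ", None, 2, 5),       # if r2 != 0, jump to address 5
    ("LOADI", 3, 0, None),       # skipped when the jump is taken
    ("HALT",  None, None, None),
]
regs = [0] * 4
pc = 0                           # program counter

while True:
    instr = memory[pc]           # fetch: read the instruction at pc
    opcode, dest, a, b = instr   # decode: split it into fields
    next_pc = pc + 1             # default: next-in-sequence instruction
    if opcode == "LOADI":        # execute + writeback
        regs[dest] = a
    elif opcode == "ADD":
        regs[dest] = regs[a] + regs[b]
    elif opcode == "JMPNZ":      # jump: overwrite the program counter
        if regs[a] != 0:
            next_pc = b
    elif opcode == "HALT":
        break
    pc = next_pc

print(regs)  # [5, 7, 12, 0]
</syntaxhighlight>

A real CPU performs each of these steps in hardware, and pipelined designs overlap them across several instructions, as discussed under Parallelism below.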
 
== Design and implementation ==
{{Prerequisites header}}
{{main|CPU design}}
{{Prerequisites footer}}
 
=== Integer range ===
The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common [[decimal]] (base ten) [[numeral system]] to represent numbers internally. A few other computers have used more exotic numeral systems like [[ternary logic|ternary]] (base three). Nearly all modern CPUs represent numbers in [[Binary numeral system|binary]] form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" [[volt]]age. <ref>The physical concept of [[voltage]] is an analog one by its nature, practically having an infinite range of possible values. For the purpose of physical representation of binary numbers, set ranges of voltages are defined as one or zero. These ranges are usually influenced by the operational parameters of the switching elements used to create the CPU, such as a [[transistor]]'s threshold level.</ref>
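As the note above suggests, the mapping from a continuous voltage to a binary digit is a matter of defined threshold ranges. The sketch below makes that concrete; the two threshold values are arbitrary illustrative choices, not the input levels of any real logic family:

<syntaxhighlight lang="python">
# Quantizing an analog voltage into a binary digit, as a digital input does.
# The thresholds are arbitrary illustrative values; real logic families
# (TTL, CMOS, ...) each define their own input-level ranges.

V_LOW_MAX = 0.8    # anything at or below this reads as logic 0
V_HIGH_MIN = 2.0   # anything at or above this reads as logic 1

def to_bit(voltage: float) -> int:
    """Map a voltage onto 0 or 1, or fail in the undefined region."""
    if voltage <= V_LOW_MAX:
        return 0
    if voltage >= V_HIGH_MIN:
        return 1
    raise ValueError(f"{voltage} V is in the undefined region")

print(to_bit(0.3), to_bit(3.1))  # 0 1
</syntaxhighlight>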
 
Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and generally expense. It is not at all uncommon, therefore, to see 4- or 8-bit [[microcontroller]]s used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM [[System/370]] used a CPU that was primarily 32 bit, but it used 128-bit precision inside its [[floating point]] units to facilitate greater accuracy and range in floating point numbers {{Ref harvard|Amdahl1964|Amdahl et al. 1964|b}}. Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.
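The cost of extra integer range follows directly from the arithmetic of bit widths. A short sketch, assuming two's complement for signed values (the representation nearly all modern CPUs use):

<syntaxhighlight lang="python">
# Representable ranges for common CPU word sizes.
# Unsigned: 0 .. 2**n - 1; signed (two's complement): -2**(n-1) .. 2**(n-1) - 1.

for n in (4, 8, 16, 32, 64):
    unsigned_max = 2**n - 1
    signed_min, signed_max = -(2**(n - 1)), 2**(n - 1) - 1
    print(f"{n:2d}-bit  unsigned: 0..{unsigned_max}  "
          f"signed: {signed_min}..{signed_max}")
</syntaxhighlight>

Each doubling of the word size squares the number of representable values, which is why the jump from 8- to 16-bit addressing, for instance, matters far more than the extra wiring alone suggests.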
 
=== Clock rate ===
[[Image:1615a_logic_analyzer.jpg|thumb|250px|right|[[Logic analyzer]] showing the timing and state of a synchronous digital system.]]
{{main|Clock rate}}
One method of dealing with the switching of unneeded components is called [[clock gating]], which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs.<ref>One notable recent CPU design that uses extensive clock gating is that of the IBM [[PowerPC]]-based [[Xbox 360]], which uses it to reduce the power requirements of the videogame console. {{Ref harvard|Brown2005|Brown 2005|a}}</ref> Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire CPUs have been built without utilizing a global clock signal. Two notable examples of this are the [[ARM architecture|ARM]]-compliant [[AMULET microprocessor|AMULET]] and the [[MIPS architecture|MIPS]] R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous [[Arithmetic logic unit|ALUs]] in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for [[embedded computer]]s {{Ref harvard|Garside1999|Garside et al. 1999|a}}.
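As a rough intuition for why clock gating saves power: dynamic power in CMOS scales with how often nodes toggle, so a block that receives no clock edges contributes almost nothing. The toy model below is only a counting argument, with an invented set of components and a static activity pattern:

<syntaxhighlight lang="python">
# Toy model of clock gating: dynamic power is proportional to the number
# of clock edges a component sees, so gating idle blocks saves their share.
# The component names and activity pattern are invented for illustration.

components = {"alu": True, "fpu": False, "decoder": True}  # True = in use
edges_seen = {name: 0 for name in components}

CYCLES = 1000
for _ in range(CYCLES):
    for name, in_use in components.items():
        if in_use:               # gating: idle blocks receive no clock edges
            edges_seen[name] += 1

total = sum(edges_seen.values())
ungated = CYCLES * len(components)
print(f"edges with gating: {total}, without: {ungated} "
      f"({100 * (1 - total / ungated):.0f}% fewer toggles)")
</syntaxhighlight>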
 
=== Parallelism ===
[[Image:Nopipeline.png|thumb|300px|right|Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions.]]
{{main|Parallel computing}}
Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. [[Instruction level parallelism]] (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), while [[thread level parallelism]] (TLP) aims to increase the number of [[Thread (computer science)|threads]] (effectively individual programs) that a CPU can execute simultaneously. The two methodologies differ both in how they are implemented and in how effectively they increase a CPU's performance for a given application.<ref>Neither [[Instruction level parallelism|ILP]] nor [[Thread level parallelism|TLP]] is inherently superior to the other; they are simply different means of increasing CPU parallelism. As such, they both have advantages and disadvantages, which are often determined by the type of software that the processor is intended to run. High-TLP CPUs are often used in applications that lend themselves well to being split up into numerous smaller applications, so-called "[[embarrassingly parallel]] problems." Frequently, a computational problem that can be solved quickly with high-TLP design strategies like SMP takes significantly more time on high-ILP devices like superscalar CPUs, and vice versa.</ref>
 
==== ILP: Instruction pipelining and superscalar architecture ====
[[Image:Fivestagespipeline.png|thumb|300px|left|Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle.]]
{{main articles|[[Instruction pipelining]], [[Superscalar]]}}
Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle ('''IPC''').<ref>Best-case (or peak) IPC rates in highly superscalar architectures are difficult to maintain, since it is impossible to keep the instruction pipeline filled all the time. Therefore, in highly superscalar CPUs, average sustained IPC is often discussed rather than peak IPC.</ref> Most modern CPU designs are at least somewhat superscalar, and nearly all general-purpose CPUs designed in the last decade are superscalar. In recent years, some of the emphasis in designing high-ILP computers has moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the [[very long instruction word]] (VLIW) makes some ILP implicit in the software itself, reducing the work the CPU must perform to boost ILP and thereby reducing the design's complexity.
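The cycle counts in the figures above can be reproduced with a best-case timing model; note that it deliberately ignores stalls, hazards, and branch mispredictions:

<syntaxhighlight lang="python">
# Cycle counts for a simple in-order pipeline versus a subscalar CPU that
# runs each instruction to completion before starting the next.

def subscalar_cycles(n_instructions: int, stages: int = 5) -> int:
    return n_instructions * stages        # no overlap between instructions

def pipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    # The first instruction fills the pipeline; after that, one
    # instruction retires per cycle in the best case (no stalls).
    return stages + (n_instructions - 1)

print(subscalar_cycles(3))            # 15 cycles, as in the subscalar figure
print(pipelined_cycles(3))            # 7 cycles with the five-stage pipeline
print(1000 / pipelined_cycles(1000))  # sustained IPC approaches 1.0
</syntaxhighlight>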
 
==== TLP: Simultaneous thread execution ====
Another strategy commonly used to increase the parallelism of CPUs is to include the ability to run multiple [[thread (computer science)|threads]] (programs) at the same time. In general, high-TLP CPUs have been in use much longer than high-ILP ones. Many of the designs pioneered by [[Cray]] during the late 1970s and 1980s concentrated on TLP as their primary method of enabling enormous (for the time) computing capability. In fact, TLP in the form of multiple thread execution improvements has been in use since as early as the 1950s {{Ref harvard|Smotherman2005|Smotherman 2005|a}}. In the context of single processor design, the two main methodologies used to accomplish TLP are [[chip-level multiprocessing]] (CMP) and [[simultaneous multithreading]] (SMT). On a higher level, it is very common to build computers with multiple totally independent CPUs in arrangements like [[symmetric multiprocessing]] (SMP) and [[non-uniform memory access]] (NUMA). <ref>Even though SMP and NUMA are both referred to as "systems level" TLP strategies, both methods must still be supported by the CPU's design and implementation.</ref> While using very different means, all of these techniques accomplish the same goal: increasing the number of threads that the CPU(s) can run in parallel.
 
The CMP and SMP methods of parallelism are similar to one another and the most straightforward. These involve little more conceptually than the use of two or more complete and independent CPUs. In the case of CMP, multiple processor "cores" are included in the same package, sometimes on the very same [[integrated circuit]].<ref>While TLP methods have generally been in use longer than ILP methods, chip-level multiprocessing is more or less seen only in later [[Integrated circuit|IC]]-based microprocessors. This is largely because the term itself is inapplicable to earlier discrete-component devices and has only come into use recently.<br/>For several years during the late 1990s and early 2000s, the focus in designing high-performance general-purpose CPUs was largely on highly superscalar IPC designs, such as the Intel [[Pentium 4]]. However, this trend seems to be reversing somewhat as major general-purpose CPU designers switch back to less deeply pipelined high-TLP designs. This is evidenced by the proliferation of dual- and multi-core CMP designs and, notably, Intel's newer designs resembling its less superscalar [[P6]] architecture. Recent designs in several processor families exhibit CMP, including the [[x86-64]] [[Opteron]] and [[Athlon 64 X2]], the [[SPARC]] [[UltraSPARC T1]], IBM [[POWER4]] and [[POWER5]], as well as several [[video game console]] CPUs like the [[Xbox 360]]'s triple-core PowerPC design.</ref> SMP, on the other hand, includes multiple independent packages. NUMA is somewhat similar to SMP but uses a nonuniform memory access model. This is important for computers with many CPUs because under SMP's shared memory model the memory bus quickly saturates, causing significant delays while CPUs wait for memory. NUMA is therefore considered a much more scalable model, successfully allowing many more CPUs to be used in one computer than SMP can feasibly support. SMT differs somewhat from other TLP improvements in that it attempts to duplicate as few portions of the CPU as possible. While considered a TLP strategy, its implementation actually more closely resembles superscalar design, and indeed it is often used in superscalar microprocessors (such as IBM's [[POWER5]]). Rather than duplicating the entire CPU, SMT designs duplicate only the parts needed for instruction fetching, decoding, and dispatch, as well as things like general-purpose registers. This allows an SMT CPU to keep its execution units busy more often by providing them instructions from two different software threads. Again, this is very similar to the ILP superscalar method, but it simultaneously executes instructions from ''multiple threads'' rather than executing multiple instructions from the ''same thread'' concurrently.
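At the software level, TLP simply means offering the machine several independent threads at once; the operating system can then schedule them onto separate cores (CMP/SMP) or hardware threads (SMT). A minimal sketch in Python follows. Note that CPython's global interpreter lock serializes pure-Python bytecode, so this illustrates the programming model rather than a measured speedup:

<syntaxhighlight lang="python">
# Several independent threads offered to the CPU(s) at once. On a
# CMP/SMP/SMT machine the OS can run these on separate cores or
# hardware threads.
from concurrent.futures import ThreadPoolExecutor

def work(task_id: int) -> str:
    # Placeholder workload; TLP gains come from tasks being independent.
    total = sum(range(100_000))
    return f"task {task_id} done (sum={total})"

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(work, range(4)):
        print(result)
</syntaxhighlight>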
 
=== Vector processors and SIMD ===
{{main articles|[[Vector processor]], [[SIMD]]}}
 
Most early vector CPUs, such as the [[Cray-1]], were associated almost exclusively with scientific research and [[cryptography]] applications. However, as multimedia largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs became significant. Shortly after [[Floating point unit|floating-point execution units]] became commonplace in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications, like Intel's [[MMX]], were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with [[floating point]] numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's [[Streaming SIMD Extensions|SSE]] and the PowerPC-related [[AltiVec]] (also known as VMX).<ref>Although SSE/SSE2/SSE3 have superseded MMX in Intel's general-purpose CPUs, later [[IA-32]] designs still support MMX. This is usually accomplished by providing most of the MMX functionality with the same hardware that supports the much more expansive SSE instruction sets.</ref>
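The essence of SIMD, one operation applied to many data elements, can be sketched with NumPy, whose elementwise array operations are commonly compiled down to SSE/AltiVec-class vector instructions on hardware that provides them. Whether a given build actually emits vector instructions is an implementation detail, so treat this as an analogy rather than a guarantee:

<syntaxhighlight lang="python">
# Scalar (one-element-at-a-time) versus vector (whole-array) addition.
import numpy as np

a = np.arange(8, dtype=np.float32)
b = np.full(8, 2.0, dtype=np.float32)

# Scalar (SISD) view: one add per element, one at a time.
scalar = np.empty_like(a)
for i in range(len(a)):
    scalar[i] = a[i] + b[i]

# Vector (SIMD) view: one elementwise operation over the whole array.
vector = a + b

assert np.array_equal(scalar, vector)
print(vector)  # [2. 3. 4. 5. 6. 7. 8. 9.]
</syntaxhighlight>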
 
== See also ==
* [[Addressing mode]]
* [[CISC]]
* [[Wait state]]
 
== Notes ==
<div class="references-small">
<references />
</div>
 
== References ==
<div class="references-small">
* {{note label|Amdahl1964|Amdahl et al. 1964|a}} {{note label|Amdahl1964|Amdahl et al. 1964|b}} {{cite paper
 | author = Amdahl, G. M.; Blaauw, G. A.; Brooks, F. P., Jr.
 | year = 1964
 | title = Architecture of the IBM System/360
 | publisher = IBM Journal of Research and Development
 }}
</div>
 
== External links ==
;Microprocessor designers
*[http://www.amd.com/ Advanced Micro Devices] - [[Advanced Micro Devices]], a designer of primarily [[x86]]-compatible personal computer CPUs.