Are RISC processors more expensive than CISC?

CISC and RISC - the two opposing philosophies of computer architecture

What do the abbreviations mean?

RISC and CISC stand for Reduced Instruction Set Computer and Complex Instruction Set Computer, respectively. These are two fundamental philosophies of computer design, which I want to introduce in the following. Today's computers are mostly neither pure RISC nor pure CISC machines, but have adopted elements of both.

CISC - the natural evolution

The terms CISC and RISC describe how a computer's CPU works. In the CISC model, the CPU usually has only a few registers (storage locations inside the CPU), some of which have special tasks. It offers many instructions, including some very powerful ones that, for example, process several registers in a loop. The instructions usually have different lengths: the most common ones take only one byte, the less common ones two or three bytes. This keeps the code compact.

These instructions are usually implemented with microcode, a breakdown of the instructions into smaller steps - a kind of "interpreter" on the CPU. In modern CPUs this microcode can even be changed, which lets the manufacturer ship bug fixes.

When a CPU is developed further, it often drifts more and more towards CISC. The reason is fairly simple: as time progresses, the technology allows more and more transistors to be integrated on a chip. The first 8086 CPU had 29,000 transistors, a Pentium 4 has 42 million. This headroom is almost always used to add new instructions that, for example, make programming easier. However, this also makes the entire design more complex - and that is exactly what CISC means.

Other typical features are powerful instructions: the 8086, for example, has string instructions that copy an entire character string with a single instruction, as well as the option of using only parts of a register - a 32-bit processor can also work with 8-bit or 16-bit numbers, and a register can be split into two halves, as on the 8086, where the 16-bit AX register splits into the two 8-bit registers AH and AL.
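What this register splitting means can be illustrated with a few lines of C (just an analogy, not the processor's actual hardware): the high and low byte of a 16-bit value are addressed separately, exactly as AH and AL address the two halves of AX.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t ax = 0x1234;           /* the full 16-bit register */
        uint8_t  ah = ax >> 8;          /* high byte, as in AH */
        uint8_t  al = ax & 0xFF;        /* low byte, as in AL */

        printf("AX=%04X AH=%02X AL=%02X\n", ax, ah, al);

        /* writing one half leaves the other untouched, just as on the 8086 */
        al = 0x99;
        ax = (uint16_t)(ah << 8) | al;  /* AX is now 0x1299 */
        printf("AX=%04X\n", ax);
        return 0;
    }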

The CISC architecture was also predominant in the mainframe computers of earlier days: it was believed that the more "powerful" a CPU's instructions, the faster they are and the less memory the code requires (memory has always been in short supply). A fundamental disadvantage of CISC is that it is usually not possible to provide many registers. Registers are storage locations in the CPU and the fastest memory a CPU can address. Their small number is due to the fact that most instructions also encode the register they operate on: the more registers there are, the more bits they take up in the instruction and the fewer instructions can be encoded. Very often there were compromises, such as instructions that only worked with certain registers.

RISC - Reduced to speed

RISC takes a different approach: you limit yourself to the instructions that are really necessary. To compensate, there are considerably more registers (up to 256) on the chip, so that fast register-to-register operations occur much more often than slow memory-to-register operations. The small instruction set simplifies the design, and the processor can be made more cheaply.

Instead of microcode, the instructions are wired directly into the decoder, which makes execution faster than with CISC. At the same time the instruction format is standardized. With CISC, an instruction can be one byte or three bytes long, plus a few bytes of data; in short, it takes time to decode such an instruction, and that time is lost. With RISC, all instructions have a uniform length, so decoding is faster - but the code inevitably becomes larger.
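The difference can be sketched in a few lines of C (a simplified illustration with invented instruction lengths, not any real processor's encoding): with a variable-length format the decoder must examine the opcode before it even knows where the next instruction begins, with a fixed-length format it does not.

    #include <stdint.h>

    /* Hypothetical instruction lengths of a variable-length (CISC-style)
       encoding, indexed by the first opcode byte. The values are invented. */
    static const uint8_t insn_length[256] = {
        1, 3, 2, 1, 5, 2, 1, 4   /* ... one entry per opcode, rest omitted here */
    };

    /* CISC style: the next instruction can only be located after the current
       opcode byte has been fetched and looked up. */
    uint32_t next_pc_variable(const uint8_t *mem, uint32_t pc) {
        return pc + insn_length[mem[pc]];
    }

    /* RISC style: every instruction is, say, 4 bytes long, so the address of
       the next instruction is known before decoding even starts. */
    uint32_t next_pc_fixed(uint32_t pc) {
        return pc + 4;
    }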

The roots of RISC

There has always been an effort to make machines more powerful, so the historical development, even among the mainframes, pointed towards CISC for a long time. But one machine showed early on what RISC could mean:

It was the CDC 6600, the first supercomputer from the legendary computer builder Seymour Cray. The CDC 6600 was designed to be fast - when it was launched in 1965 it already managed 9 megaflops. It was a computer with a word width of 60 bits; at that time a byte was only 6 bits. The address space was 18 bits (256 K addresses). (More address space would not have been an advantage, because magnetic core memories, the common technology at the time, were not small chips but 5 × 5 cm modules with a maximum of 1024 small iron rings - memory therefore took up a lot of space.) There were 16 registers for addressing (18 bits wide) and 8 data registers of 60 bits each. With two parallel computation units, the CDC 6600 anticipated what the Pentium only achieved 30 years later: the pipelining of two computation units - but in contrast to the Pentium, these were independent of each other. For all this, the CDC 6600 had only 64 machine instructions - exactly as many as fit into a 6-bit byte. By comparison, the 8086 had 90 machine instructions, and many more have been added since then.

In order to achieve maximum performance, all instructions had a uniform format with a length of 15 or 30 bits - a quarter or half of a data word - and they were packed into 60-bit words. The reward was speed: IBM's standard model, the IBM 360, reached only 330 kiloflops at the same time, so the CDC 6600 was almost 30 times faster than a "normal" mainframe computer.

Although the CDC 6600 was very successful and became the basis for Seymour Cray's later supercomputers, it did not inspire the other computers of its time: with the usual mainframes, manufacturers preferred to stick with CISC.

CISC or RISC - The battle of home computers

Ten years later the first 8-bit processors came onto the market, and one might assume that, with the limited number of transistors that could be accommodated at the time, the path to RISC was obvious. But it was not like that. The historic duel between the 8080 and the 6502 began in 1975 and would last over ten years.

1975, at the WESCON electronics show on the American West Coast: there is a rush around the stand of the small company MOS Technology. The company offers its new 6502 microprocessor in a transparent jar - nothing new, it seems, for the 8080 from Intel, the 6800 from Motorola and the RCA 1802 already exist. But one thing is new: the price. The new processor is offered for 25 dollars, while the 8080 costs 175 dollars. Everyone buys a processor, and by the end of the first day the jar is empty.

A number of computers that made history were later built on the 6502 - the Apple II, the C64 and the Atari VCS 2600 game console. But many computers were also built on the 8080's successor, the Z80 - the Schneider CPC, the Tandy TRS-80, the Sinclair ZX-81 and the Spectrum. Both architectures succeeded, although they could not have been more different. The 8080 was already a typical CISC representative, but the Z80, built by a team of former Intel developers, easily topped it.

The 8080 had an accumulator (A) and six 8-bit registers (B, C, D, E, H, L), which could be combined into 16-bit registers (BC, DE, HL). In addition there were the 16-bit stack pointer and the program counter. On the 8080, memory operations only went via the HL register, which, however, also offered 16-bit operations such as addition and subtraction. The Z80 expanded the 8080's design: it introduced a second register set, plus two new registers IX and IY for indexed operations. The BC and DE registers became just as suitable as HL for memory operations and 16-bit arithmetic, and there was a wealth of new instructions - for block operations, in/out, and bit-wise rotating and shifting. Out of the 57 instructions of the 8080, the Z80 made over 134! But that cost time in decoding: an instruction could be between 1 and 5 bytes long, and execution took between 4 and 28 clock cycles.

How simple, by contrast, was the design of the 6502: only three 8-bit registers, and even just an 8-bit stack pointer. The stack was fixed between addresses 256 and 511, and addresses 0 to 255 could be reached via the index registers (as could, above that, the rest of the RAM). This 256-byte area of RAM could, however, also be viewed as 256 additional registers, because when the 6502 appeared, RAM was faster than microprocessors. In addition, the instructions were decoded quickly - in 1-2 cycles - since the 6502 used both the rising and falling edges of the clock signal (which meant the clock frequency could be half that of an 8080 or Z80).

As different as the designs were, in practice both were successful. The MOS 6502 was faster than a Z80 at the same clock frequency (it transferred one byte from memory per clock edge instead of once per clock), but that was balanced out by the fact that it usually ran at 1-2 MHz, while the Z80 ran at 3-4 MHz. From 1983 even faster Z80 versions appeared, going up to 10 MHz. This development was not possible with the 6502 because it relied on fast RAM. On the other hand, a 6502 was always a bit cheaper because it was easier to manufacture.

Probably the most radical RISC approach of the 8-bit era was the relatively rarely used RCA 1802. The processor got by with only 16 instructions and ran at 6 MHz as early as 1975, while the 6502 was introduced at 1 MHz and the 8080 at 2.5 MHz. It played no role in the home computer market, but it did in numerous space probes, because it was technically very robust and not very susceptible to static electricity.

Intel and the IBM PC

The Intel 8086, used in the IBM PC, is also a typical CISC representative. The computing power of the x86 architecture was considered the measure of all things from 1986 at the latest, when the Intel 386 caught up with the competition. The reason for the CISC design was that the 8080 design had simply been inflated in order to save development costs and time. As a result, the x86 had some quirks that made its architecture complex. The most important was that it could only address 3 segments (code, data and stack) of 64 K each; these windows were shifted within an address space of 1 megabyte. When the architecture was later expanded to a full 16 MB address space with the 286, and to 4 GB and 32 bits with the 386, the old "real mode" was nevertheless retained as an additional mode. It was also not possible to simply introduce new registers, although the instruction set of the x86 series was expanded over time.

The competition had to watch as x86 computers captured ever larger market shares. With this widespread use, Intel held two trump cards: it earned more from the larger number of units, and the development costs were spread across more chips. But the other manufacturers had the chance to break new ground - they were not tied to slavish backward compatibility.

The renaissance of RISC

By the mid-eighties, however, something had changed in programming. Up to the 8086, with its limited computing power and small memory, fast programs had to be written in assembler. With the introduction of the roughly four times faster 286 and more memory, programs developed in high-level languages such as C became more and more important, and assembler was limited to small, system-level routines. Studies now showed that compilers exploit a processor's instruction set far less than a human programmer does: 80% of compiled code used only 20% of the instructions, and certain instructions were not used at all. This is because compilers follow very rigid patterns. Even more: many of the advantages of CISC - powerful instructions that only work with a particular register - were often not used at all. This, too, is due to the rigid approach of a compiler: it does not keep several code alternatives on hand (one if register X can be used, another for register Y), but just one version, and that version ideally uses the instructions that work with all registers.

This was the renaissance of RISC. At the end of the 1980s almost all manufacturers built RISC processors. The common features were few instructions, a uniform instruction format and fast decoding; the individual concepts, however, were quite different. RISC was also a way to achieve very high computing power with relatively few transistors. This was important for manufacturers who were losing more and more of the processor market to Intel. Motorola created the PowerPC chip together with Apple and IBM. The chip therefore had to meet very different requirements: Apple and Motorola wanted a successor to the aging 68000 series, one that could emulate its instructions, while IBM wanted a fast processor for minicomputers and even mainframes. The PowerPC is very successful and, since its introduction in 1993, has reached its fourth generation. Only with this generation did even a passive heat sink become necessary, at computing power otherwise similar to its Pentium counterpart (though always at a slightly lower clock rate).

The ARM processor played a sad role. Most of the new RISC processors were designed for workstations, but Acorn - a well-known company in England - released a PC with this processor in 1988 that could rival a 386 at a fraction of the cost. However, this device, the Archimedes, lacked recognition. ARM recognized the signs of the times and made the ARM processor successful as a microcontroller. ARM processors have a market share of 70% in the 32-bit microcontroller sector and are manufactured under license by numerous companies - even Intel. Most PDAs running Windows CE use this processor because it is fast and energy-efficient.

Sun was the first company to herald the change, with the Sun 4 and the SPARC processor in 1987. The SPARC had only 55,000 transistors in its first version, yet was faster than a 386 with its 275,000 transistors. Above all, special features made high-level language programs run fast on this processor. Other workstation manufacturers soon followed, such as MIPS, or DEC with the Alpha, which still holds the speed crown today. A specialty were the transputer processors from INMOS: several of them could be interconnected to work on a problem in parallel. A picture of the Intel 80386 and the INMOS T400, which appeared at the same time, shows the difference between CISC and RISC on the die: large uniform areas on the T400, with many internal registers and 1 KByte of cache on board, and "rough"-looking areas on the 80386, with logic functions that require considerably more transistors. The INMOS T400 was faster than the 80386 despite its lower integration density.

What Makes RISC Fast?

To understand this, you have to know how an instruction is executed by a microprocessor. Every processor has to perform at least four elementary steps per instruction:
  • Fetch the instruction from memory
  • Decode it, i.e. determine which instruction is meant
  • Execute the instruction
  • Write the result back to the registers or to memory.
A processor needs at least one clock cycle for each stage, but usually more. Two things matter here (a minimal sketch of this fetch-decode-execute cycle follows after the list below):
  • First, it takes time to fetch data from memory; memory chips are slow compared to a processor.
  • Second, the more instructions a processor has, and the more complex they are, the longer decoding takes.
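Here is a minimal sketch in C of these four stages, for a toy processor with a fixed 32-bit instruction word (the opcodes and field layout are invented purely for illustration):

    #include <stdint.h>

    #define NREGS 32

    static uint32_t regs[NREGS];   /* register file */
    static uint8_t  mem[65536];    /* main memory, holds the program */
    static uint32_t pc;            /* program counter */

    enum { OP_HALT = 0, OP_ADD = 1, OP_SUB = 2 };  /* invented opcodes */

    void run(void) {
        for (;;) {
            /* 1. Fetch: read the fixed-length 32-bit instruction word at pc. */
            uint32_t insn = (uint32_t)mem[pc]
                          | (uint32_t)mem[pc + 1] << 8
                          | (uint32_t)mem[pc + 2] << 16
                          | (uint32_t)mem[pc + 3] << 24;
            pc += 4;   /* fixed length: the next instruction's address is already known */

            /* 2. Decode: the fields always sit in the same bit positions
                  (8-bit opcode, three 5-bit register numbers; layout invented). */
            uint8_t op  = insn & 0xFF;
            uint8_t rd  = (insn >> 8)  & 0x1F;
            uint8_t rs1 = (insn >> 13) & 0x1F;
            uint8_t rs2 = (insn >> 18) & 0x1F;

            /* 3. Execute and 4. Write back to the register file. */
            switch (op) {
            case OP_ADD:  regs[rd] = regs[rs1] + regs[rs2]; break;
            case OP_SUB:  regs[rd] = regs[rs1] - regs[rs2]; break;
            case OP_HALT: return;
            default:      return;   /* unknown opcode */
            }
        }
    }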

CISC processors follow this strategy: decoding takes longer because the code has no uniform format, but the code is very short (fewer memory accesses). In return, the instructions do more, so fewer of them have to be executed. Many registers are not possible because of the requirement for short code. Inside the chip, the complex instructions are broken down into simpler steps; for this there is microcode - a ROM on the chip. The individual functional units are also very powerful, for example there are hardware multipliers, which are fast but also require many transistors.

RISC tries to minimize memory access in two ways:

First - many registers

The data can be kept in these registers, so memory addresses are omitted from most instructions. Ideally, only the load and store instructions communicate directly with memory, and they always go through a register. All other instructions work on registers and are therefore fast; a further advantage is that no additional cycles are needed for memory accesses. Depending on the type, RISC processors have between 32 and 256 universally usable registers. CISC types such as the Motorola 680x0 or the Intel 80x86 have 16 or 12 registers.
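What this load/store discipline looks like can be sketched with a few invented toy instructions in C (not any real processor's instruction set): only LOAD and STORE touch memory, every arithmetic instruction works purely on the register file.

    #include <stdint.h>

    static uint32_t reg[32];        /* large, general-purpose register file */
    static uint32_t ram[1024];      /* main memory (word-addressed here for simplicity) */

    /* The only two instructions that touch memory. */
    void LOAD (int rd, uint32_t addr) { reg[rd] = ram[addr]; }
    void STORE(int rs, uint32_t addr) { ram[addr] = reg[rs]; }

    /* Arithmetic works exclusively on registers. */
    void ADD(int rd, int rs1, int rs2) { reg[rd] = reg[rs1] + reg[rs2]; }

    /* RISC style: ram[2] = ram[0] + ram[1] takes an explicit
       load/load/add/store sequence. A CISC machine might offer a single
       "add memory to memory" instruction that does all of this internally. */
    void example(void) {
        LOAD (1, 0);
        LOAD (2, 1);
        ADD  (3, 1, 2);
        STORE(3, 2);
    }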

Second - a uniform instruction format

A processor has different kinds of instructions: some need data, others do not. On a CISC processor like the 80386, an instruction can look like this:
  • A "common command" - encoded in the first byte
  • A "rare command" - more information is in the second byte
  • Data can be attached:
  • An 8 bit, 16 or 32 bit value or an offset. - accommodated in the following byte, word or double word.

That makes decoding complicated: you cannot simply decode every byte, because it could be data or the second part of an instruction. In principle the problem also exists with RISC, but there one tries to minimize it by designing the code so that at least the end of each instruction is predictable. I would like to explain this using a hypothetical example:

  • Each instruction is 6 bytes long, in one of two formats:
  • Opcode (1 byte), register number (1 byte), data (always 32 bits) - instructions that read from or write to memory.
  • or
  • Opcode (1 byte), register numbers 1, 2, 3 (1 byte each), 2 unused bytes - instructions that work on registers. At most 3 registers are needed, for operations such as Reg1 = Reg2 + Reg3; there are also instructions that take only one register or none at all.

Each instruction is thus 6 bytes long, which at first glance looks like a waste of space: for 16-bit values, for example, you do not need 32 bits of data, and with the register instructions at least 2 bytes are always unused.

But the format also has its advantages, because two things are now certain. First: the first byte is always an opcode and the second is always a register number, so they can be decoded automatically - you do not have to wonder, as with CISC, what might be in the second byte. And once the opcode is decoded, you immediately know what is in the remaining bytes.
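A minimal decoder for this hypothetical 6-byte format could look like this in C (the format is the invented one from above, not a real instruction set):

    #include <stdint.h>

    /* One decoded instruction of the hypothetical 6-byte format. */
    struct decoded {
        uint8_t  opcode;     /* byte 0: always the opcode              */
        uint8_t  reg1;       /* byte 1: always a register number       */
        uint8_t  reg2, reg3; /* bytes 2-3 (register form only)         */
        uint32_t data;       /* bytes 2-5 (memory form only)           */
    };

    /* Every field sits at a fixed offset, so decoding is mechanical,
       and the next instruction is guaranteed to start at pc + 6. */
    struct decoded decode(const uint8_t *mem, uint32_t pc) {
        struct decoded d;
        d.opcode = mem[pc];
        d.reg1   = mem[pc + 1];
        d.reg2   = mem[pc + 2];
        d.reg3   = mem[pc + 3];
        d.data   = (uint32_t)mem[pc + 2]
                 | (uint32_t)mem[pc + 3] << 8
                 | (uint32_t)mem[pc + 4] << 16
                 | (uint32_t)mem[pc + 5] << 24;
        return d;
    }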

But the more important advantage is that the next instruction is guaranteed to start 6 bytes further on. Why does this matter? Around the same time as RISC, pipelines came into use in microprocessors. A pipeline fetches instructions and begins decoding them in advance; its length is given in stages. A 5-stage pipeline, like that of the 486, has for example 5 instructions "in flight" in different stages of processing. If the execution time of the simplest instruction is 5 cycles, such a pipeline can deliver one decoded instruction per cycle and thus increase performance.

CISC processors have to invest a great deal to decode all the possible combinations in a pipeline: their pipelines are long, because there are so many cases and more stages are needed. RISC processors, on the other hand, know exactly where the next instruction begins and can automate more, so the pipeline is shorter. That is an advantage whenever the program jumps, because then everything in the pipeline is no longer valid.

The demand for simple instructions also follows from this: fast decoding on the "assembly line" only pays off if the instructions are also executed quickly, otherwise the speed advantage is lost. Therefore there are only elementary instructions, ideally executed in one cycle.
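A rough back-of-the-envelope model in C (a strong simplification - real pipelines have stalls, caches and branch prediction) shows why a short pipeline hurts less when the program jumps:

    #include <stdio.h>

    /* Very rough model: a pipeline of 'depth' stages needs 'depth' cycles to
       fill, then retires one instruction per cycle; every taken jump flushes
       it, costing roughly depth - 1 extra cycles. */
    unsigned long cycles(unsigned depth, unsigned long insns, unsigned long jumps) {
        return depth + (insns - 1) + jumps * (depth - 1);
    }

    int main(void) {
        /* 1,000,000 instructions, 10% of them taken jumps */
        printf("5-stage : %lu cycles\n", cycles(5, 1000000, 100000));
        printf("20-stage: %lu cycles\n", cycles(20, 1000000, 100000));
        return 0;
    }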

Intel Victory - A CISC Victory?

Even Intel brought out a RISC processor in 1989, the i860. But Intel's customers, too, were not ready to break away from the established PC standard. By the mid-nineties Intel had improved its x86 series to the point where it penetrated the RISC-dominated workstation market with multiprocessor systems; thanks to the use of PC hardware, these were significantly cheaper to manufacture. Now it took its revenge that manufacturers like Sun, Silicon Graphics or DEC had never paid attention to the PC market. When, around the same time, more and more networks found their way into companies, customers who already had PCs stayed with what they knew: servers with x86 hardware and Windows NT instead of workstations and UNIX.

The end consumer is interested in "what comes out at the end", to quote a German Federal Chancellor. What use is it if a PowerPC at 450 MHz with a passive heat sink is just as fast as a Pentium III at 700 MHz with an elaborate cooler? As long as a PC is cheaper than a Mac - because it is manufactured in far larger numbers - this works out in favor of the PC. And Intel's processors can keep up: instead of changing the processor's instruction set, the clock frequency was raised and the L1 and L2 caches were integrated into the core to increase speed. In return Intel, together with HP's PA series, holds the crown in transistors per chip: nobody expends so much effort to achieve a given level of computing power.

So CISC won, right?

No! For one thing, RISC processors considerably outnumber CISC processors. They are used wherever IBM compatibility is not required - as printer controllers, in PDAs (because of their low power consumption), in game consoles and in countless electronic components in cars (a mid-range car contains between 13 and 30 microprocessors, most of them embedded controllers).

For another, the strict separation of CISC and RISC no longer exists today. Intel's first 64-bit processor, the Itanium, borrows from RISC. RISC processors, in turn, have become more complex as they developed and are approaching CISC. And in the end the CISC architecture of the x86 processors no longer really exists either ...

A lot has happened at Intel since the 486. The 486 came out in 1989 and showed that the x86 architecture was slowly reaching its limits. Thanks to a 5-stage pipeline, 80% of the instructions could now be executed in one clock cycle - faster was not possible within the x86 architecture - but the 486 was only about 60% faster than a 386 and for a long time too expensive. It was not until 1993 that its sales surpassed those of the 386, which probably annoyed AMD, which had previously done big business with its 40 MHz version of the 386 (Intel only had a 33 MHz version).

In short: the Pentium, the 486's successor, had to be compatible yet new. What Intel did was this: after decoding, the x86 instructions are translated into simpler RISC-like operations, regrouped, and fed to two arithmetic units that are pure RISC machines. This has been the case since the Pentium Pro, and a Pentium 4 is in principle a good x86 emulator ... An x86 instruction can translate into just one, but also into 3 or 4, simple RISC operations. These are relatively long: 118 bits on the Pentium III, on the Pentium 4 probably only about half as much.
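The principle can be illustrated in C (the micro-operation names, fields and register numbers are invented; Intel's real internal format is not publicly documented): a single x86-style read-modify-write instruction becomes a short sequence of simple, RISC-like operations.

    /* Illustration only: how a complex x86-style instruction such as
         add [ebx], eax        ; memory += register
       can be expressed as a sequence of simple, RISC-like micro-operations. */
    enum uop_kind { UOP_LOAD, UOP_ADD, UOP_STORE };

    struct uop {
        enum uop_kind kind;
        int dst, src1, src2;   /* register numbers, including temporaries */
    };

    /* add [ebx], eax  ->  three micro-ops using a temporary register t0 */
    static const struct uop add_mem_reg[] = {
        { UOP_LOAD,  /*t0*/ 8, /*ebx*/ 3, 0 },         /* t0 <- memory[ebx]  */
        { UOP_ADD,   /*t0*/ 8, /*t0*/ 8, /*eax*/ 0 },  /* t0 <- t0 + eax     */
        { UOP_STORE, 0,         /*ebx*/ 3, /*t0*/ 8 }, /* memory[ebx] <- t0  */
    };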

In the translation of instructions into RISC operations, however, errors crept in now and then, which caused quite a stir. All competitors (AMD, Cyrix, IDT) do it similarly, though of course not identically, and some clock speed is inevitably given away because each of them uses a different RISC engine underneath.

With the new 64-bit Itanium processor, Intel now also has a RISC core with 128 registers. Its new 64-bit mode is a genuine RISC architecture with a constant instruction width of 41 bits, whereby 3 instructions are combined into a 128-bit double word - and the compiler, not the processor, is supposed to store information about the dependencies between the instructions in the remaining 5 bits. This is intended to simplify the complex speculative execution and reordering of the x86 series.
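The arithmetic behind this is easy to check (a trivial sketch; the exact field layout beyond 3 × 41 + 5 bits is not covered here):

    #include <stdio.h>

    int main(void) {
        const int instr_bits    = 41;   /* one Itanium instruction          */
        const int per_bundle    = 3;    /* instructions per 128-bit word    */
        const int bundle_bits   = 128;
        const int template_bits = bundle_bits - per_bundle * instr_bits;

        /* 128 - 3*41 = 5 bits remain for the compiler's dependency information */
        printf("bits left for dependency information: %d\n", template_bits);
        return 0;
    }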

However, classic RISC processors such as the Alpha or the UltraSPARC are still leading in terms of performance, as the following table shows:

Processor         Clock rate (MHz)   SPECfp_base 2000   per MHz
Pentium III       1080               1824               1.689
Athlon            1333               2672               2.004
Pentium 4         1700               3456               2.033
UltraSPARC III     900               3760               4.178
Alpha 21264A       833               4121               4.947

The Pentium can only come close to keeping up thanks to its roughly twice as high clock rate. However, since that clock rate has risen very quickly, it is foreseeable that Intel's processors will also take the absolute speed crown within the next few years. It must also be said that this benchmark only measures computing speed. Most of the processors listed here work as server processors, where the point is shovelling data around quickly; here, processors with wide data buses closely coupled to the CPU have the advantage. IBM's POWER4 processors achieve very good values here, while a Pentium 4 looks rather old with its 16-bit RAMBUS interface. It is not for nothing that Intel builds separate server processors and chipsets, which then cost more than two normal PCs (for just the processor and mainboard ...).

I have also published a book on the subject of computers. "Computer history(s)" contains what the title promises: individual episodes from the early days of the PC. They are episodes from the lives of Ed Roberts, Bill Gates, Steve Jobs, Stephen Wozniak, Gary Kildall, Adam Osborne, Jack Tramiel and Chuck Peddle, and how they created the PC.

The book is rounded off by a brief explanation of the computer technology before the PC, as well as a summary of what happened afterwards, once the claims had been staked. I have tried to write a book that sets itself apart from other books in that it not only tells the history but also explains why certain products were successful - that is, it also deals with the technology.

The second edition, published in 2014, has been updated and slightly expanded. The most extensive change is a 60-page chapter on Seymour Cray and the supercomputers he designed. Due to price reductions for new editions, at 19.90 euros it is 5 euros cheaper than the first edition, despite the increased volume. It has also been published as an e-book for 10.99 euros.

More about the book on its own page.

Here is a complete overview of my books with direct links to the BOD bookshop. The books can also be ordered directly through bookstores (since I write about very specialized topics, you will hardly find them on the shelves), and they are of course available on the popular online platforms such as Amazon, Libri and Buecher.de.


© of the text: Bernd Leitenberger. Any publication of this text in whole or in part may only take place with the consent of the author.