The One Board CPU

What is it?

This document describes the ins and outs of how to build your own processor on one board. I'm busy building three different boards and this is the middle one. The others:
- My TTL CPU, the smallest one
- Big One Board CPU, the biggest one
The idea is that it should replace the processor of a host system.

Is it possible?

Building your own processor, is that possible? Yes, it is. First of all, forget that processors are just one single IC like the 6502, Z80 or 80486. Computers were already being built long before the IC was invented. Intel's 4004 is considered to be the first one-chip CPU in the world and its production started in 1971.
In the very early days computers were built using simple components like relays, tubes, transistors and small ICs. And if you don't believe me, please have a look at this site, a ring of self built processors. Very interesting indeed! It will show you, for example, processors using relays as logic gates.

A bit of history

Modern CPUs have a circuit called Instruction Decoder. Older computers worked in a different way. If you don't know how, have a look at My TTL CPU so you get an idea. These first processors needed quite a lot of instructions to perform certain tasks. Then some engineers noticed that those tasks quite often had a lot of instructions in common. So they decided to combine groups of instructions into new mega-instructions, nowadays known as opcodes.
This little processor does not have much hardware. If you compare it with My TTL CPU, it isn't that much bigger. If you compare the schematics you will see that its bottom-left part is bigger, but that is because it supports interrupts. OTOH the Program Counter is missing; that job is done by the Instruction Decoder as well.

What processor is it going to simulate?

You can see I use the word simulate, not emulate. This design is about the smallest CPU with an Instruction Decoder I could think of that still has some extras. I simply hope that in the end it can execute 6502 instructions, although I already know it will be very, very slow. The Big One Board CPU is meant to emulate (at least) a 6502 at 1 MHz or better.
I hope that I can run the opcodes of the Z80, 6800, 6809 and some other CPUs as well. But programming the FlashRAMs and testing them will also take quite some time. So I prefer having a processor that runs 6502 code very well.

Picture of the board.

Schematic of the board.

General description

The hardware is kept as simple as possible. The whole can be divided in five parts:
- the interface to the outside world
- the opcode circuit and the Instruction Decoder
- the registers
- the ALU
- the branch part

Please notice: no Stack Pointer circuit, no Program Counter, no special circuit for the Flag Register and no temporary address registers. All this is taken care of by the internal RAM, Instruction Decoder and the ALU.

The processor port: the interface to the outside world

This female AC DIN 64-pin connector is the interface to the outside world. As said before, the idea is that you connect an interface board to it with a corresponding male connector. This board should contain all the hardware needed to connect the processor to the target system. This enables you to use it in a 6502 system, a Z80 system and maybe even in an 8088 system (see later). The hardware can be as minimal as a single connector to connect it to a 6502 system, or a connector and some extra AND gates to connect it to a Z80 system.

An explanation of the various pins:
- A0..A23: 24 address lines enable the processor to address up to 16 MB of RAM, ROM or I/O. If the processor is going to replace a real 6502 one day, the extra eight address lines won't be in the way, just leave them unconnected.
- D0..D7: the eight data lines
- RDY, NMI, IRQ, RESET, SO, AEC and SYNC are the well known 6502/6510 pins. Some of them can get a completely different function when the processor is used in a different system. For example, the SO input can be used as the FIRQ input when emulating a 6809 processor. AEC is not supported on the board, you have to tri-state the various busses on the interface board.
- PHI0, PHI1 and PHI2: the well known 6502 clock signals.
- IORD, IOWR, MEMR, MEMW: those who are familiar with 80x86 systems will recognise these lines; they are used to control memory and I/O operations. The 6502 doesn't need all four signals, so why are they here? One of my intentions is to use the processor on an IBM PC-XT board. If I want to use the processor in a Z80 system I need four signals as well: IORQ, MREQ, RD and WR. For a 6502 system I would only need MEMW, which would then act as the well known R/W pin. But for a Z80 system these four lines won't work as they are. Reprogramming the micro code won't help: I can only activate one line at a time but a Z80 needs two at once (for example MREQ together with RD). In this case four AND gates on the interface board do the trick to create the needed signals.
IC08A, a 2-to-4 demultiplexer, generates these four signals. But just like with an 8088, sometimes you don't want any of the signals to be active. That is where control line I3 comes in: it can disable all lines, if needed.
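The behaviour of IC08A can be sketched in a few lines of Python. This is only a minimal model, assuming the usual 74LS139 behaviour (active-low enable, active-low outputs). The order of the names in SIGNALS and the function name demux_139 are hypothetical; the real assignment of outputs to MEMR, MEMW, IORD and IOWR follows from the schematic.

```python
# Sketch of one half of a 74LS139 2-to-4 demultiplexer (IC08A).
# ASSUMPTION: the order of SIGNALS is made up for illustration only.
SIGNALS = ("MEMR", "MEMW", "IORD", "IOWR")

def demux_139(a, b, enable_n):
    """Return the four active-low outputs as a dict of 0/1 levels."""
    selected = (b << 1) | a
    outputs = {}
    for index, name in enumerate(SIGNALS):
        # An output goes (L) only when the chip is enabled and selected.
        active = (enable_n == 0) and (index == selected)
        outputs[name] = 0 if active else 1
    return outputs
```

With the enable input held (H), all four outputs stay (H), which is the "no signal active" state that the disable line is meant to provide.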

The opcode circuit and the Instruction Decoder

The Instruction Decoder (ID) is the heart of the processor because it is the ID that decides how the processor behaves. The heart of the ID is in this case a number of FlashRAMs and those will contain the so called micro code.

What inputs do the FlashRAMs need?
- the opcode
- a counter
- a signal that tells there is a reset, interrupt or other important signal
- a branch signal

If you have a look at the schematic, you will notice the weird order in which the various signals have been connected to the FlashRAMs. The reason is quite simple: it made designing the PCB much easier for me. I'm working on a program to fill the FlashRAMs with data, and it was much easier to program a small subroutine that de-scrambles the various signals than to connect all signals on the PCB to the FlashRAMs in such a way that the schematic looked good.
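Such a de-scrambling subroutine could look roughly like the sketch below. It is not the actual program: PIN_MAP is a made-up four-bit example and the function names are hypothetical. The real table has one entry per FlashRAM address line and follows from the PCB layout.

```python
# ASSUMPTION: PIN_MAP is an invented example, not the real PCB wiring.
# PIN_MAP[logical_bit] = physical address pin that carries that signal.
PIN_MAP = [3, 0, 2, 1]

def scramble_address(logical, pin_map=PIN_MAP):
    """Translate a logical ID address into the physical FlashRAM address."""
    physical = 0
    for logical_bit, pin in enumerate(pin_map):
        if (logical >> logical_bit) & 1:
            physical |= 1 << pin
    return physical

def build_image(logical_image, pin_map=PIN_MAP):
    """Reorder a micro code image so each byte lands at its physical address."""
    image = [0] * len(logical_image)
    for logical, value in enumerate(logical_image):
        image[scramble_address(logical, pin_map)] = value
    return image
```

The micro code can then be written in a readable, logical order and only shuffled at the very last moment before programming the FlashRAMs.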

The counter circuit

Executing an opcode means executing a collection of several micro instructions. Some of these micro instructions can be executed in parallel, some have to be executed in a certain order and at the right step. The counter tells the ID what step has to be executed.
The base signal for the counter is the clock signal provided by the processor port, in this case PHI0. Because PHI0 must serve as input for many other gates in my design and not being sure if this will stress the original system, I decided to buffer it first with IC25D, a left over AND gate. The buffered PHI0 is called PHI0'.
The 6502 output PHI1, nothing more than an inverted PHI0, is created by IC24F. The PHI2 needed for this card and the motherboard is generated by inverter IC24B plus two AND gates (see later).

The actual counter is IC57, a 393 dual 4-bit binary counter with clear. We only use five of the eight outputs. Together with CLK0 (= PHI2 after the OR gate) we now have a 6-bit counter where CLK0 is the Least Significant Bit (LSB). These six bits, good for 32 clock cycles/64 steps, are fed directly to the FlashRAMs. I know this is an excessive number of steps but better safe than sorry later on.
The outputs of the 393, CLK1..5, are also fed to a 5-input NOR gate. This gate is built from a left over 139 demultiplexer (IC08B), which now functions as a 3-input OR gate, and a real 3-input NOR gate (IC11C).
At the end of an instruction the 393 is cleared and all its outputs become (L). At that moment the output of this 5-input NOR gate becomes (H). This output is ANDed with PHI0' using AND gate IC25B. The output of IC25B is used as trigger to latch some data into two 573s, IC58 and IC12 (see later).
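The way the step number and the latch strobe are formed can be sketched as follows. This is a simplified logic model, not a timing simulation; the function names are hypothetical and count_393 stands for the five used outputs CLK1..5 of the 393.

```python
def step_number(clk0, count_393):
    """Combine CLK0 (LSB) with CLK1..5 into the 6-bit step number."""
    return ((count_393 & 0x1F) << 1) | (clk0 & 1)

def first_cycle(count_393):
    """Model of the 5-input NOR gate: (H) only while CLK1..5 are all (L)."""
    return 1 if (count_393 & 0x1F) == 0 else 0

def latch_strobe(phi0_buffered, count_393):
    """Model of AND gate IC25B: the NOR output ANDed with PHI0'."""
    return first_cycle(count_393) & phi0_buffered
```

In this model the strobe can only be (H) during the first clock cycle of an instruction, when the 393 has just been cleared.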

Resetting the counter
The idea is that two things have to be done at the last cycle of an instruction:
- prepare the board to read the opcode of the next instruction.
- reset the counter.
The latter is done by setting ID output R1 (L). Its inverted signal (NAND gate IC10D) will activate the 393's CLR input. But the Reset signal must be able to reset the counter as well. Therefore the Reset signal is fed to the other input of the already mentioned NAND gate, IC10D. The effect is that an active Reset keeps the 393 in reset mode; it will output 00000 as long as the reset signal is active.
Once the counter has been reset by R1, the clear signal must be disabled again. The ID pulls R1 (L) at the last even step of an instruction. This way we can be sure that resetting the 393s results in all CLKx signals being (L) = step zero. All we have to do now is to program the FlashRAM so that R1 is (H) at step zero and the 393s can resume counting again.

Latching the opcode: the Instruction Register

IC58, a 573 8-bit latch, is the so called Instruction Register (IR): it latches the opcode, which is only present at step 00001. The Instruction Register makes sure that the opcode is available for the other steps as well. The signal to clock the data into the 573 comes from the AND gate IC25B, as already mentioned above.

Latching SO, Reset, NMI and IRQ

My very first idea was to feed these signals directly to the inputs of the FlashRAMs. But when I started to design TTL6502, FlashRAMs were very expensive and I had to use EPROMs. But the reasonably priced EPROMs had too few inputs, so another idea was needed.
Then it occurred to me that during a reset or interrupt, the opcode part of the ID wasn't used. So I decided to feed the FlashRAMs with these signals instead of the opcode. Now I only needed one signal to tell the ID whether it was dealing with an opcode or a reset, interrupt or equivalent signal: RSNI. IC12, a 573, latches Reset, SO, IRQ and NMI and feeds them to the ID when needed.

NMI
NMI, a negative edge triggered interrupt, is inverted first (IC24D) and fed to the positive edge triggered CLK input of a 74 D-flipflop (IC55A). Why not tie NMI directly to PRE and save a gate? At the end of the process the ID has to reset the flipflop. Using PRE as input would immediately set the flipflop again as long as NMI is still (L), forcing the decoder to repeat the whole process.
After handling NMI, the flipflop is reset by R2.

IRQ
IRQ is a level triggered interrupt and it is only checked at the end of PHI0. IRQ can also be disabled. NOR gate IC11A serves all these demands in one go. If IRQ is active (L) at the end of PHI0 and the "disable interrupt" flag is inactive, read: (L), as well, then the moment PHI0' becomes (L) the rising edge of the output of the NOR gate will trigger the D-flipflop IC14A.
After handling IRQ, the flipflop is reset by R3.

SO
SO is a pin that sets the Overflow flag when activated. Our Flag Register is nothing more than a register inside a static RAM and directly setting a bit is out of the question. The nearest I could think of is treating it like an interrupt and giving it the highest priority.
I can be very short about the circuit: SO is treated the same way NMI is. IC24E is the inverter, IC36B the used flipflop and R0 the reset signal.

Reset
Instead of a 74 D-flipflop I use a flipflop created out of two NAND gates, IC10A and IC10B. After handling Reset, the flipflop is reset by R0. Yes, this is the same signal as for resetting the SO flipflop. The idea behind it is simple: when resetting the processor, the status of any other signal doesn't matter at all. So why not use an existing signal? I could just as well have used the one for NMI or IRQ.

The further processing of SO, Reset, IRQ and NMI
The four outputs of the flipflops are latched by a 573 (IC12) at step 00001 and NORed by IC53B and IC11B. If one or more of these signals are set = (H), IC11B's output becomes (L). The output of AND gate IC25B, which is used to clock the data into IC12 and IC58, is also used to clock D-flipflop IC14B. The flipflop latches the output of the NOR gate.
The outputs of this flipflop now either enable the outputs of IC58 (= opcode) or those of IC12. The Q output of the flipflop, RSNI, is the signal that tells the ID whether a special signal is detected or not.

Why is D-flipflop IC14B needed? The output of IC11A can change in the middle of an instruction and therefore has to be preserved as well. We cannot use a free pin of IC12 because this output of the latch wouldn't be available all the time so we have to use a separate latch; IC14B in this case.
The flipflop does not have to be reset: the moment all special signals have been handled, the D input becomes (H) and at step 000x, the latch for the opcodes is selected.
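The selection between the opcode latch and the latch with the special signals can be summarised in a small sketch. It only models the logic described above: RSNI is (L) when at least one special signal is pending and (H) otherwise. The function name and the bit order of the four special signals are assumptions; the real order depends on how IC12 is wired.

```python
def id_opcode_inputs(opcode, reset, so, irq, nmi):
    """Return (rsni, byte) as presented to the Instruction Decoder."""
    # ASSUMPTION: hypothetical bit order of the special signals in IC12.
    special = (reset << 3) | (so << 2) | (irq << 1) | nmi
    if special:
        # NOR output (L) is latched: IC12 drives the opcode inputs.
        return 0, special
    # No special signal pending: IC58 (the opcode) drives the inputs.
    return 1, opcode & 0xFF
```

So the same eight ID inputs carry either an opcode or a set of pending special signals, and RSNI tells the micro code which of the two it is looking at.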

SYNC
SYNC is a 6502 signal that tells the outside world the first cycle of an opcode is being processed. Just by coincidence the output of IC11D is exactly what we need.

ReaDY signal
RDY is a signal to tell the 6502 to halt as long as RDY is (L). The outputs of the 393 are increased at every falling edge of PHI0'. The basic idea is that RDY prevents PHI0' from reaching the counter. Before PHI0' is fed to the 393, it goes through an OR gate, IC53A. This gate ORs PHI2 with the \Q output of IC55B, a 74 D-flipflop. This D-flipflop represents the state of RDY at the end of PHI0 and does this by saving the state of RDY at the rising edge of PHI1. If RDY is (L), output \Q becomes (H) and will block all pulses from PHI2 towards the 393 and ID by keeping CLK0 (H).
Maybe you noticed these AND gates, IC25A and IC25C, that seem to have no function. IMHO there was a risk that the blocking signal coming from D-flipflop IC55B would arrive too late and the 393 would still count up. I solved this, at least I hoped so, by delaying the PHI2 signal towards the OR gate using these AND gates.
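The blocking effect of the OR gate can be illustrated with a small sketch. It ignores all gate delays (so it says nothing about the timing risk mentioned above) and simply counts the falling edges that would reach the 393. The function names are hypothetical, and rdy_latched stands for the Q state of IC55B.

```python
def clk0(phi2, rdy_latched):
    """Model of OR gate IC53A: PHI2 ORed with the \\Q output of IC55B."""
    not_q = 0 if rdy_latched else 1
    return phi2 | not_q

def count_falling_edges(phi2_samples, rdy_samples):
    """Count the falling edges of CLK0, the edges the 393 counts on."""
    count = 0
    previous = 1
    for phi2, rdy in zip(phi2_samples, rdy_samples):
        level = clk0(phi2, rdy)
        if previous == 1 and level == 0:
            count += 1
        previous = level
    return count
```

As long as the latched RDY is (L), CLK0 stays (H) and no falling edge reaches the counter, so the processor stays at the same step.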

The address bus buffers

Three 573 latches, IC18, IC19 and IC20, buffer the address lines A0..A23 toward the processor port. No special explanation needed, IMHO. But I found out that this processor design may have a flaw that could cause trouble: the address is output in two steps. An example: a program is running at address $121F and needs to read data from address $C00F. In the original design the address bus was changed in one single step because the final buffer was fed by another buffer reserved for this task. Now assume there is a 6522 at address $C00x and one at $C01x. Updating the high byte of the address first would mean that the address $C01F appears on the bus first. That would cause register 15 to be read, unintentionally resetting some other registers. Updating the low byte first would mean that the address $120F is read and, assuming that this is part of the program, I don't see any harm done. But, to be honest, I'm not really convinced that this solution works out fine. Time will tell.
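The intermediate address that appears on the bus during such a two-step update is easy to compute. The sketch below only looks at the lower 16 address lines; the function name is hypothetical.

```python
def intermediate_address(current, target, low_first):
    """Address visible on the bus after the first of the two update steps."""
    if low_first:
        # Low byte already new, high byte still old.
        return (current & 0xFF00) | (target & 0x00FF)
    # High byte already new, low byte still old.
    return (target & 0xFF00) | (current & 0x00FF)

# Going from $121F to $C00F:
low_first = intermediate_address(0x121F, 0xC00F, True)    # $120F
high_first = intermediate_address(0x121F, 0xC00F, False)  # $C01F
```

In this example only the low-byte-first order keeps the intermediate address inside the page the program is running in.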

The data bus buffers

Two 573s, IC16 and IC17, take care of the data bus by latching and buffering the data coming from and going to the internal data bus.
- IC17 takes care of reading and latching the data coming from outside. Why is latching needed? That is to make sure that the data is also available to the processor after PHI0 has become (L). The clocking is done by PHI0', the buffered PHI0 signal.
- IC16 takes care of latching the internal data and presenting it to the outside world. Both the latching and the enabling are done by the ID.

Registers

Every processor has several registers. The Program Counter of a 6502 and Z80 is 16 bits wide, as well as the Stack Pointer of the Z80. The Stack Pointer of the 6502 is only 8 bits wide. The 6502 has only six registers, the Z80 twelve 16-bit and four 8-bit ones. Using 74ALS573s to create the registers would blow my mini design out of proportion. As already mentioned, it was more practical to use a standard memory IC. The smallest 8-bit one I know of is the 6116, IC21, a 2 KB SRAM. To make sure that there is enough room for simulating a 6502, I reserved 64 bytes of RAM.
Using one RAM has one disadvantage: internal transfers like TAX will take at least two steps: one step to copy the byte from the location inside the RAM dedicated to register A into one of the ALU buffers, and one step to copy it from the ALU into the location inside the RAM dedicated to register X.
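As an illustration, a TAX on such a register file could be modelled like this. The locations REG_A and REG_X are hypothetical; the real layout of the 64 reserved bytes is decided by the micro code.

```python
# ASSUMPTION: invented locations of registers A and X inside the SRAM.
REG_A, REG_X = 0, 1

def tax(ram):
    """Copy register A to register X in two steps via an ALU buffer."""
    alu_buffer = ram[REG_A]   # step 1: SRAM location of A -> ALU buffer
    ram[REG_X] = alu_buffer   # step 2: ALU output -> SRAM location of X
    return ram

registers = bytearray(64)     # the 64 bytes reserved for registers
registers[REG_A] = 0x42
tax(registers)
```

A design with separate register latches could do this in one step; here the single data port of the SRAM forces the detour through the ALU.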

The ALU

The ALU, short for Arithmetic Logic Unit, is the calculator of the processor. But, in this case, one with extended functions. The 74181 is a real ALU IC and I use it in My TTL CPU. The problem is that the ALU of the 6502 also has a so called BCD mode (Binary Coded Decimal): it is able to calculate in decimal mode. And so far I haven't seen any ALU IC capable of doing that. I have thought about using 181s plus an extra circuit for the BCD mode but that would expand the design too much. So I decided to use FlashRAMs instead. The idea is just to program every possible situation into the FlashRAMs.

The needed inputs for the ALU FlashRAMs
As you can see I use two cascaded FlashRAMs, IC42 and IC43, each handling four data bits. Handling eight bits would mean I would need a FlashRAM with at least 24 inputs but I don't know of one that can work with +5 Volt. Cascading two FlashRAMs and each handling just four bits has the same result.

The FlashRAMs need to be able to handle at least the following commands:
- ADC
- AND
- BIT
- ASL
- CMP
- DEC
- EOR
- INC
- LSR
- ORA
- ROL
- ROR
- SBC
Four selection bits will cover the above 13 commands.

An extra bit is needed to deal with the decimal mode. It has to be an extra bit because it is an external input coming from the Flag Register.
After every operation the processor may want to know if the result was zero. In this case the second FlashRAM first has to know whether the result of the first FlashRAM was zero or not.
In case of an addition, subtraction, rotation or a shift to the left, the second FlashRAM needs the Carry of the first FlashRAM and the first one needs the Carry of the Flag Register. In case of a rotation or shift to the right, the first FlashRAM needs the Carry of the second FlashRAM and the second one needs the Carry of the Flag Register. This means two inputs for two different Carries for both FlashRAMs.
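The principle of two cascaded lookup tables can be shown with a small sketch for a binary ADC. This is only an illustration of the idea, not the real FlashRAM contents: decimal mode and all other commands are left out, and the table layout and function names are hypothetical.

```python
def build_adc_table():
    """One 4-bit slice: (carry_in, a, b) -> (sum nibble, carry_out)."""
    table = {}
    for carry_in in (0, 1):
        for a in range(16):
            for b in range(16):
                total = a + b + carry_in
                table[(carry_in, a, b)] = (total & 0xF, total >> 4)
    return table

ADC = build_adc_table()

def adc8(a, b, carry_flag):
    """8-bit binary ADC built from two table lookups, low nibble first."""
    # First FlashRAM: low nibbles plus the Carry from the Flag Register.
    low, carry_mid = ADC[(carry_flag, a & 0xF, b & 0xF)]
    # Second FlashRAM: high nibbles plus the Carry of the first one.
    high, carry_out = ADC[(carry_mid, a >> 4, b >> 4)]
    # The Zero flag needs to know that BOTH nibbles are zero.
    zero = 1 if (low == 0 and high == 0) else 0
    return (high << 4) | low, carry_out, zero
```

For a shift or rotation to the right the chain would run the other way: the high nibble is looked up first and hands its Carry to the low nibble.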

The result (for the moment):
- Zero flag
- Carry flag (2*)
- Decimal mode flag
- 4 bits 1st operand
- 4 bits 2nd operand
- 4 command bits

A 27512 EPROM would be sufficient but I decided to use AM29F020s anyway for three reasons:
- they are faster than EPROMs
- they are much faster to reprogram than EPROMs
- because of the two extra inputs I have room for extra functions
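The number of address lines needed follows directly from the inputs listed above. The sketch below just adds them up; the only outside facts used are that a 27512 (64 KB x 8) has 16 address lines and an AM29F020 (256 KB x 8) has 18. The variable names are hypothetical.

```python
# Inputs needed per ALU FlashRAM, as listed above.
INPUTS = {
    "zero flag": 1,
    "carry flags": 2,
    "decimal mode flag": 1,
    "first operand": 4,
    "second operand": 4,
    "command": 4,
}

needed_lines = sum(INPUTS.values())        # address lines required
EPROM_27512_LINES = 16                     # 2**16 = 64 KB
FLASH_29F020_LINES = 18                    # 2**18 = 256 KB
spare_lines = FLASH_29F020_LINES - needed_lines
```

The required lines exactly fill a 27512, while the 29F020 leaves two address lines free for extra functions.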

Static addresses

After a reset, in case of a 6502 the address bus has to output the addresses $FFFC and $FFFD. And when serving an interrupt or accessing the stack, other static addresses have to be output. What circuit is going to take care of all that? I decided to let the ALU perform this function as well. And now the two extra inputs of the 29F020s are more than welcome!

Increasing an address

In this simplified design the ALU does ALL the calculations. If the Program Counter needs to be incremented, the ALU will do that. Most of the time only the low byte of the address needs to be increased. But if a page boundary is crossed, that is the low byte goes from $FF to $00, the ALU will generate a Carry. This will be a sign for the ID to increment the high byte of the address as well. We have to notify the ID of that one way or another, but we cannot use the Flag Register as this isn't a regular Carry. A 74LS74 D-flipflop, IC36, stores the state of this Carry and a 151 multiplexer, IC31, sends it on request to the Instruction Decoder.
The ALU does also all the calculations in case of a branch. But in this case a branch can be either positive or negative. In case of a negative branch, the high byte of the Program Counter may have to be decreased. Again the ID has to be informed of this. The 151 multiplexer does this by selecting the MSB of the second operand as BRAD signal.
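Both cases can be sketched with plain 8-bit arithmetic. This is a behavioural model of what the micro code has to achieve, not of the actual micro steps; the function names are hypothetical and offset is the branch offset as an unsigned byte in two's complement.

```python
def increment_pc(pc_low, pc_high):
    """Increment the low byte; bump the high byte only on a Carry."""
    total = pc_low + 1
    carry = total >> 8                 # the Carry stored in the D-flipflop
    if carry:
        pc_high = (pc_high + 1) & 0xFF
    return total & 0xFF, pc_high

def branch(pc_low, pc_high, offset):
    """Add a signed 8-bit offset to a 16-bit Program Counter."""
    total = pc_low + offset
    carry = total >> 8
    negative = offset >> 7             # MSB of the second operand
    if negative and not carry:
        pc_high = (pc_high - 1) & 0xFF # backward branch crossed a page
    elif carry and not negative:
        pc_high = (pc_high + 1) & 0xFF # forward branch crossed a page
    return total & 0xFF, pc_high
```

The combination of the Carry and the sign of the offset is all the ID needs to decide whether the high byte stays the same, goes up or goes down.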

The Flag Register

In case of a conditional branch, the processor needs the data of the Flag Register to decide whether it has to make a jump or not. But how is the Instruction Decoder notified of the state of each flag? When needed, the contents of the Flag Register inside the SRAM are copied to a 74ALS573 data latch, IC23. The processor needs to know the state of only one flag at a time. The 74LS151 8-to-1 multiplexer, IC31, takes care of that by selecting the needed flag. It outputs the state of this flag as the signal BRAD (BRAnch Data) towards the Instruction Decoder.

The 74ALS573 is not only needed for branching but it also makes sure that the state of the Interrupt Disable bit is outputted all the time.
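Selecting one flag with an 8-to-1 multiplexer can be sketched as follows. The bit positions are those of the 6502 status register; the assumption that the select code simply equals the bit number is mine. On the real board some multiplexer inputs are used for other signals, such as the address Carry, so the actual mapping differs.

```python
# Bit positions of the flags in the 6502 status register.
FLAG_BITS = {"C": 0, "Z": 1, "I": 2, "D": 3, "B": 4, "V": 6, "N": 7}

def brad(flag_latch, select):
    """Model of an 8-to-1 multiplexer: output the selected input bit."""
    # ASSUMPTION: select code == bit number in the flag latch.
    return (flag_latch >> (select & 7)) & 1

# Example: the branch condition for BEQ looks at the Zero flag.
flags = 0b00000011            # Carry and Zero set
take_beq = brad(flags, FLAG_BITS["Z"]) == 1
```

The micro code for each conditional branch only has to output the right three select bits and then test BRAD.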

Testing the card

One question I asked myself was: "How do I test the processor?". I just happened to have some PC ISA cards with four 8255s on them. These four 8255s are good for 96 inputs and outputs, presented to the outside world by two 50-pin headers. In my case I only need one of these headers:

In my design the two 8255s behind CN1 are used to simulate the host system. This is done by connecting them to the processor connector. As you can see, one 8255 is used to read the first 16 address lines and to handle the data bus. The second 8255 handles the last eight address lines and all the control lines. The reason I teamed the data bus up with the first 16 address lines is that the 8255 has a disturbing feature, one I cannot call a bug because it is documented:

The moment you change the data direction of a port, any other port that has been defined as output will be reset.

The two ports handling the first 16 address lines are input ports and therefore are not affected by the above behavior when changing the direction of the port handling the data bus.
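Changing the direction of the data bus port means writing a new mode-set control word to the 8255. The sketch below builds such control words for mode 0. The bit layout is the documented one of the 8255; which port carries what (address lines on ports A and B, data bus on port C) is an assumption for this example, and the function and constant names are hypothetical.

```python
def control_word(a_in, b_in, c_upper_in, c_lower_in):
    """Build an 8255 mode-set control word for mode 0 (basic I/O)."""
    word = 0x80                 # bit 7 = 1: mode set
    if a_in:
        word |= 0x10            # bit 4: port A is input
    if c_upper_in:
        word |= 0x08            # bit 3: port C upper half is input
    if b_in:
        word |= 0x02            # bit 1: port B is input
    if c_lower_in:
        word |= 0x01            # bit 0: port C lower half is input
    return word

# ASSUMPTION: A and B read the address lines, C handles the data bus.
DATA_BUS_READ = control_word(True, True, True, True)
DATA_BUS_WRITE = control_word(True, True, False, False)
```

Because ports A and B stay inputs in both control words, switching between them only affects port C, the one port whose output value has to be rewritten anyway.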

The outputs of the Instruction Decoder

I0..1: IC08A / 74LS139, generates MEMR, MEMW, IORD, IOWR
I2: output data towards data bus, 573 / IC16
I3: enable IC08 / 74LS139
I4: C-input 74LS138 / IC04
I5..7: select reason for branch, 151 / IC31
I8: A-input 74LS138 / IC04
I9: B-input 74LS138 / IC04
I10..15: select function for ALU, 29F020 / IC32 and IC33
I16..21: select register in SRAM, 6116 / IC21
I22: write enable SRAM, 6116 / IC21
I23: select SRAM, 6116 / IC21
I24..26: select Cx from IC13 / 138
I27: enable IC13 / 138
I28: clock output ALU into IC05 (inverted) PLUS clock flags ALU into IC09 (inverted)
I29: enable output data towards data bus, 573 / IC16
I30: free
I31: free




inverted: !!!
C0: clock data into address buffer A0..7, 573 / IC18
C1: clock data into data bus buffer, 573 / IC16
C2: clock data into hardware flag register, 573 / IC23
C3: clock data into address buffer A8..15, 573 / IC19
C4: clock Carry into D-flipflop, 74 / IC36A
C5: clock data into address buffer A16..23, 573 / IC20
C6: clock data into ALU buffer A, 573 / IC27
C7: clock data into ALU buffer B, 573 / IC28




Outputs of 74LS138 / IC04
R0: reset Reset (IC10A and IC10B) and SO flipflop, 74 / IC36B
R1: reset 393 counters
R2: reset NMI flipflop, 74 / IC55A
R3: reset IRQ flipflop, 74 / IC14A
R4: read flags from ALU, 573 / IC05
R5: read result from ALU, 573 / IC05
R6: read data from data bus, 573 / IC17

Remarks

I already mentioned that I wanted to use the processor in combination with a PC-XT motherboard. IMHO it could be done by replacing the 8088 itself with the processor. But the interface board that plugs into the motherboard would need some additional hardware. One reason the extra hardware is needed: on the 8088 the first 8 bits of the address bus and the data bus are multiplexed. My idea is to remove various ICs like the 8288 and various latches and let the processor control the lines directly, for example the MEMx and IOxx lines.

Afterword

I think that it is clear that the ALU does most of the work. Not just the calculations like those of a calculator but also things like increasing the Program Counter and increasing or decreasing the Stack Pointer.
Did I oversimplify things too much? No, I didn't. It is in fact the way the first computers worked. I read somewhere that some Americans managed to create a working computer in the early 50's using 'only' 3000 radio tubes. That was only possible by using circuits over and over again within the same design.
Then what about speed? The only 'computers' that were around in those days were humans with mechanical calculators. The first American computers were used to calculate artillery trajectories. It took some humans several hours to calculate one trajectory, something the ENIAC did in 20 seconds. And compared to this computer, the processor is far better (I think).

Do you have questions or comments? Do you want more information?
You can email me here.