The Big One Board CPU

Frozen

What is it?

This document describes the ins and outs of how to build your own processor on one board: the Big One Board CPU (BOBC). I'm busy building three different boards and this is the biggest one. The others:
- My TTL CPU, the smallest one
- One Board CPU, the middle one

Frozen for the moment....

Is it possible?

Building your own processor, is that possible? Yes, it is. First of all, forget that processors are just one single IC like the 6502, Z80 or 80486. Computers were being built long before the IC was invented. Intel's 4004 is considered to be the first one-chip CPU in the world and its production started in 1971.
In the very early days computers were built using simple components like relays, tubes, transistors and small ICs. And if you don't believe me, please have a look at this site, a ring of self-built processors. Very interesting indeed! It will show you, for example, processors using relays as logic gates.

A bit of history

Once I wanted to build My Own 6502. A nice goal, but with eight cards in the end, a steep one. So I decided to design a CPU that could run 6502 code with as few ICs as possible: the One Board CPU.
Then I learned that producing PCBs wasn't as expensive anymore as it used to be, so I decided to have another go at a TTL CPU that would be able to replace a real 6502. Plus some extras: I wanted to implement eight extra address lines and some extra control lines that would enable me to use this CPU on IBM (compatible) PC-XT boards and maybe even on 80286 boards. But the original design was still too big. So I decreased the size by:
- omitting the single registers and only using the SRAM
- omitting the address adder
- omitting the 573 latches after the FlashRAMs of the Instruction Decoder
An explanation for the last: I wondered why some designs dared to leave out these latches until I found one where the designer explained things. Glitches could only happen when an output was held (L). So I designed things so that any output at rest is held (H).

What processor is it going to emulate?

As you could read above, the whole idea started with the goal to emulate a 6502. Various factors determine how this self-built CPU will behave. The most important one is the micro code. Change the micro code for a certain opcode and the CPU will behave differently.
My idea is to start with the 6502 as base. But the 6502 has about a hundred unused opcodes and one of my goals is to implement my own opcodes there. I already mentioned the IBM boards above, which means I need at least the equivalent of the IN and OUT instructions of the Intel 8088 to be able to use the I/O on these boards, but I also must be able to handle 1 MB of ROM and RAM.
But then what about 6502 compatibility? I already know it won't be cycle compatible: by scrapping some of the hardware mentioned above, I'm sure that some instructions will need more cycles than the original ones. So I can forget 100% real emulation. IMHO it will still be fast enough that the difference will hardly be noticeable.
As said, my intention is to use it on a PC-XT system but still using 6502 code plus some extensions to handle the extra address lines and I/O part. But maybe it is even possible to write micro code in the future that emulates the original 8086 code.

General description

The hardware is kept as simple as possible. The whole can be divided into five parts:
- the Instruction Decoder
- the interface to the outside world
- the registers
- the ALU
- the branch part

Instruction Decoder

The Instruction Decoder (ID) is the heart of the processor because it is the ID that decides how the processor behaves. The heart of the ID is a number of FlashRAMs that will contain the so-called microcode.

What inputs do the FlashRAMs need?
- the opcode
- a counter
- a signal that tells there is a reset, interrupt or other important signal
- a branch signal

I already mentioned the use of a GAL to simplify the hardware, but using a GAL also has a disadvantage: what is the best way to describe what is going on inside it? The idea of using the last design that didn't use the GAL won't work: I made some changes in the meantime. So the next solution: take this last design and add the changes:

The counter circuit

Executing an opcode means executing a collection of several micro instructions. Some of these micro instructions can be executed in parallel, some have to be executed in a certain order and at the right step. The counter tells the ID what step has to be executed.
The base signal for the counter is the clock signal provided by the processor port, in this case PHI0. Because PHI0 must serve as input for many other gates in my design, and not being sure whether this would stress the original system, I decided to buffer it first: with IC13C, a left over AND gate, for the original design and with IC51C, an inverter, for the GAL version. The fact that the inverter inverts the signal is no problem; the GAL takes care of it.
The 6502 outputs PHI1, which is nothing more than an inverted PHI0; it is created by inverting PHI0 using IC30A. The PHI2 needed for this card and the motherboard is generated by inverter IC30B.
It should be quite obvious that the GAL takes care of these operations. I won't mention this anymore in the rest of this document.

The actual counter is IC57A, a 393 4-bit binary counter with clear. Together with PHI2 we now have a five-bit counter where PHI0' is the Least Significant Bit (LSB), now called CLK0. These five bits, good for 15 clock cycles/31 steps, are directly fed to the FlashRAMs.
The outputs of the 393 are also fed to a 4 input NOR gate, a gate created out of an OR gate (IC53B) and a 3 input NOR gate (IC11B). At the end of an instruction the 393 is cleared and all its outputs become (L). At that moment the output of this 4 input NOR gate becomes (H). This output is ANDed with PHI0' using AND gate IC13A. The output of IC13A is used as trigger to latch the data into two 573s, IC58 and IC12 (see later).
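The relation between the 393, CLK0 and the latch strobe can be sketched in Python. This is only an illustrative model, not taken from the schematic: the function names are hypothetical, and the bit order (the four 393 outputs as the upper bits, CLK0 as bit 0) is an assumption based on the description above.

```python
def step_number(count393: int, clk0: int) -> int:
    """Combine the 4-bit 393 count (assumed upper bits) with CLK0 (LSB) into a 5-bit step."""
    return ((count393 & 0x0F) << 1) | (clk0 & 1)


def latch_strobe(count393: int, phi0_buffered: int) -> int:
    """4-input NOR of the 393 outputs, ANDed with PHI0' (the role of IC13A in the text)."""
    all_low = 1 if (count393 & 0x0F) == 0 else 0  # NOR is (H) only when all outputs are (L)
    return all_low & (phi0_buffered & 1)


if __name__ == "__main__":
    # With the counter cleared and the clock bit high we are at step 00001,
    # the step at which the latches are strobed.
    print(format(step_number(0, 1), "05b"), latch_strobe(0, 1))
```

In this model the strobe can only fire while the 393 outputs 0000, which matches the idea that the opcode and the special signals are latched once per instruction.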

Resetting the counter
The idea is that two things have to be done at the last cycle of an instruction:
- prepare the BOBC to read the opcode of the next instruction
- reset the counter
The last is done by setting ID output R1 (H). But first some more processing has to be done.
First, the Reset signal must be able to reset the counter as well. So before the signal reaches the 393, it has to be ORed (IC53C) with the inverted (IC30H) Reset signal. The effect is that an active Reset keeps the 393 in reset mode: it will output 0000 as long as the Reset signal is active.
Second, once the counter has been reset by R1, the clear signal must be disabled again. Having no 573 in the Instruction Decoder anymore, that is a piece of cake now: one of the things step 0000x does is resetting R1. So the moment the clear signal actually makes the 393 output 0000, R1 is reset. Mission accomplished!
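The clear logic described above boils down to one OR gate and one inverter. A minimal sketch, assuming that the Reset input is active low (as on a real 6502) and that the clear input of the 393 is active high; the function name is hypothetical:

```python
def counter_clear(r1: int, reset_n: int) -> int:
    """Clear input of the 393: ID output R1 ORed with the inverted (active-low) Reset."""
    inverted_reset = 1 - (reset_n & 1)   # the inverter in front of the OR gate
    return (r1 & 1) | inverted_reset     # the OR gate feeding the 393


if __name__ == "__main__":
    for r1 in (0, 1):
        for reset_n in (0, 1):
            print(f"R1={r1} /Reset={reset_n} -> clear={counter_clear(r1, reset_n)}")
```

Only the combination R1 = (L) with Reset inactive lets the counter run.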

Latching the opcode: the Instruction Register

IC58, a 573 8-bit latch, is the so-called Instruction Register (IR): it latches the opcode, which is only present at step 00001. The Instruction Register makes sure that the opcode is available for the other steps as well. The signal to clock the data comes from the AND gate IC13A as already mentioned above.

Latching FIRQ, Reset, NMI and IRQ

My very first idea was to feed these signals directly to the inputs of the FlashRAMs. But when I started to design TTL6502, FlashRAMs were very expensive and I had to use EPROMs. But the reasonably priced EPROMs had too few inputs, so another idea was needed.
Then it occurred to me that during a reset or interrupt, the opcode part of the ID wasn't used. So I decided to feed the FlashRAMs with these signals instead of the opcode. Now I only needed one signal to tell the ID whether it was dealing with an opcode or a reset, interrupt or equivalent signal: RIND. IC12, a 573, latches Reset, SO, IRQ and NMI and feeds them to the ID when needed.

NMI
NMI, a negative edge triggered interrupt, is inverted first (IC54C) and fed to the CLK input of a 74 D-flipflop (IC55A). Why not tie NMI directly to PRE and save a gate? At the end of the process the ID has to reset the flipflop. Using PRE as input would immediately set the flipflop again and thus force the decoder to repeat the whole process.
After handling NMI, the flipflop is reset by R0.

IRQ
IRQ is a level triggered interrupt and it is only checked at the end of PHI0. IRQ can also be disabled. NOR gate (IC11C) serves all these demands in one go. If the IRQ is active (L) at the end of PHI0 and the "disable interrupt" flag is inactive (read: (L)) as well, then the moment PHI0 becomes (L) the rising edge of the output of the NOR gate will trigger the D-flipflop U14A.
After handling IRQ, the flipflop is reset by R3.
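The behaviour of the 3 input NOR gate can be checked with a small model. This is a sketch under the assumptions stated in the text (IRQ active low, the "disable interrupt" flag active high, PHI0 as third input); the function names are hypothetical.

```python
def irq_nor(irq_n: int, disable_flag: int, phi0: int) -> int:
    """3-input NOR: (H) only when IRQ is active (L), interrupts are enabled and PHI0 is (L)."""
    return 0 if (irq_n or disable_flag or phi0) else 1


def irq_flipflop_triggered(irq_n: int, disable_flag: int) -> bool:
    """True when the falling edge of PHI0 produces a rising edge at the NOR output."""
    before = irq_nor(irq_n, disable_flag, phi0=1)  # PHI0 still (H)
    after = irq_nor(irq_n, disable_flag, phi0=0)   # PHI0 has become (L)
    return before == 0 and after == 1


if __name__ == "__main__":
    print(irq_flipflop_triggered(irq_n=0, disable_flag=0))  # pending and enabled
    print(irq_flipflop_triggered(irq_n=0, disable_flag=1))  # pending but disabled
```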

FIRQ
FIRQ is an input that is in fact typical for the Motorola 680x series. So why is it here? I copied the circuit around the GAL from the One Board CPU to the BOBC without realising at that moment that the Flag Register of the BOBC had its own SO input. When connecting the flipflop to the SO pin of the CPU port, I noticed the mistake. I could omit this part from the whole circuit but it would not save me any parts. One idea was to connect this board to a Eurocom-1, a 6802 equivalent of the KIM or Micro-Professor, and that triggered the idea to use this part for the FIRQ interrupt signal.
To be very short about the circuit: FIRQ is treated the same way NMI is. IC24E is the inverter, IC52B the flipflop used and R0 the reset signal.
Please notice that GAL-OnePCB.png mentions "SO" where you should read "FIRQ" because it is also used for the OBC.

Reset
Reset doesn't suffer the above problems because the counter cannot start until the Reset signal is inactive again. Therefore it can be connected directly to the PRE input.
After handling Reset, the flipflop is reset by R0. Yes, this is the same signal as for resetting the FIRQ flipflop. The idea behind it is simple: when resetting the BOBC the status of any other signal doesn't matter at all. So why not use an existing signal? I could just as well have used the one for NMI or IRQ.
Remark: for the GAL version the flipflop for the Reset is placed inside the GAL. This saved another IC.

The further processing of FIRQ, Reset, IRQ and NMI
The four Q outputs of the flipflops are latched by a 573 (IC12) at step 00001 and NORed by IC53A and IC11A. If one or more of these signals are set = (H), IC11A's output becomes (L). The output of AND gate IC13A, which is used to clock the data into IC12 and IC58, is also used to clock D-flipflop IC14B. The flipflop latches the output of the NOR gate.
The outputs of this flipflop now either enable the outputs of IC58 (= opcode) or those of IC12. The Q output of the flipflop, RIND, is the signal that tells the ID whether a special signal is detected or not.

Why is D-flipflop IC14B needed? The output of IC11A can change in the middle of an instruction and therefore has to be preserved as well. We cannot use a free pin of IC12 because this output of the latch wouldn't be available all the time so we have to use a separate latch; IC14B in this case.
The flipflop does not need to be reset: the moment all special signals have been handled, the D input becomes (H) and at step 0000x, the latch for the opcodes is selected.
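The selection between the opcode and the special signals can be summarised in a few lines of Python. This is a sketch only: the polarity of RIND follows the text (the NOR output is (H) when nothing is pending), but the order in which the four signals appear on the shared ID inputs is an assumption, as is the function name.

```python
def id_inputs(opcode: int, firq: int, reset: int, irq: int, nmi: int):
    """Return (RIND, byte fed to the ID) for the latched opcode and flipflop outputs."""
    # Assumed bit order on the shared ID inputs: FIRQ, Reset, IRQ, NMI.
    pending = (firq << 3) | (reset << 2) | (irq << 1) | nmi
    rind = 0 if pending else 1                 # latched NOR output (IC14B)
    selected = (opcode & 0xFF) if rind else pending
    return rind, selected


if __name__ == "__main__":
    print(id_inputs(0xA9, 0, 0, 0, 0))  # normal instruction: the opcode is passed on
    print(id_inputs(0xA9, 0, 0, 1, 0))  # IRQ pending: the signals replace the opcode
```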

SYNC
SYNC is a 6502 signal that tells the outside world the first cycle of an opcode is being processed. Just by coincidence the output of IC11D is exactly what we need.

ReaDY signal
RDY is a signal to tell the 6502 to halt as long as RDY is (L). The outputs of the 393 are increased at every falling edge of PHI0'. The basic idea is that RDY prevents PHI0' from reaching the counter. Before PHI0' is fed to the 393, it goes through an OR gate, IC1B/IC30C (originally a 2 input OR gate). This gate ORs PHI2 with the \Q output of IC55B, a 74 D-flipflop. This D-flipflop represents the state of RDY at the end of PHI0 and does this by saving the state of RDY at the rising edge of PHI1. If RDY is (L), output \Q becomes (H) and will block all pulses from PHI2 towards the 393 and ID by keeping CLK0 (H).
In the original GALless design there was a risk that the blocking signal coming from the flipflop arrived too late and the 393 would still count up. I solved this, at least I hoped so, by delaying the PHI2 signal towards the OR gate. But I had no idea how to implement a delay in a GAL. So I came up with another solution: inside the GAL I NORed RDY with PHI1 using IC2A and fed the result to the 3 input OR gate, IC1B/IC30C, already mentioned above. The idea: if RDY becomes active (L) when PHI2 is (H), read: PHI1 is (L), then the OR gate already starts blocking PHI2. If RDY becomes (H) again before the falling edge of PHI2, no harm is done and the 393 continues counting. If RDY is still (L) at the falling edge of PHI2, the flipflop makes sure PHI2 remains blocked.
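The gating of the clock by RDY can be tried out with a half-cycle simulation. This is a simplified sketch of the GAL version as described above, ignoring all propagation delays; PHI1 is taken as the exact inverse of PHI2 and all function names are hypothetical.

```python
def simulate_clk0(samples):
    """samples: (phi2, rdy) pairs, one per half cycle. Returns CLK0 for each sample."""
    q = 1            # flipflop: RDY as seen at the last rising edge of PHI1
    prev_phi1 = 1
    clk0 = []
    for phi2, rdy in samples:
        phi1 = 1 - phi2
        if prev_phi1 == 0 and phi1 == 1:     # rising edge of PHI1: sample RDY
            q = rdy
        early_block = 1 if (rdy == 0 and phi1 == 0) else 0   # NOR(RDY, PHI1)
        clk0.append(phi2 | (1 - q) | early_block)            # 3-input OR gate
        prev_phi1 = phi1
    return clk0


def falling_edges(signal):
    """Count the (H) to (L) transitions, the edges at which the 393 counts."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 1 and b == 0)


if __name__ == "__main__":
    running = [(1, 1), (0, 1)] * 4
    halted = [(1, 1), (0, 1), (1, 0), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]
    print(falling_edges(simulate_clk0(running)), falling_edges(simulate_clk0(halted)))
```

In the second sequence RDY is (L) for two cycles, and the counter misses exactly those two falling edges.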

The processor port: the interface to the outside world

This female AC DIN 64-pin connector is the interface to the outside world. As said before, the idea is that you connect a board to it with a corresponding male connector. This board should contain all the hardware needed to connect the BOBC to the target system. This enables you to use the BOBC in a 6502, 680x or Z80 system and even in an 8088 system (see later). The hardware can be as minimal as a single connector to connect it to a 6502 system or a connector and some extra gates to connect it to a Z80 system.

An explanation of the various pins:
- A0..A23: 24 address lines enable the BOBC to address up to 16 MB of RAM, ROM or I/O. If the BOBC is going to replace a real 6502 one day, the extra eight address lines won't be in the way, just leave them unconnected.
- D0..D7: the eight data lines
- RDY, NMI, IRQ, RESET, SO, AEC and SYNC are the well known 6502/6510 pins. Some of them can get a completely different function when the BOBC is used in a different system. For example, the SO input can be used as the FIRQ input when emulating a 6809 processor.
- PHI0, PHI1 and PHI2: the well known 6502 clock signals.
- FIRQ: a signal needed for connecting to a 680x system.
- IORD, IOWR, MEMR, MEMW: those who are familiar with 80x86 systems will recognise these lines; they are used to control memory and I/O operations. The 6502 doesn't need these four signals, so why are they here? One of my intentions is to use the BOBC on an IBM PC-XT board. But if I want to use the BOBC on a Z80 I need four signals as well: IORQ, MREQ, RD and WR. For a 6502 system I would only need MEMW, which would then act as the well known R/W pin. But for a Z80 system this is not good enough. Reprogramming the micro code won't work: I can only set one line at a time but I need to be able to handle two. In this case just two AND gates on the interface board would do the trick to create the two needed signals.

The Stack Pointer

The Stack Pointer (SP) takes care of providing the address when pushing data to or pulling data from the stack. The heart of the SP is made out of two pairs of cascaded 191s (IC9..12), preloadable 4-bit binary up/down counters. Because the counters don't have any tristate capabilities, two 541 8-bit buffers (IC3, IC4) have been added to provide this function.
Why two pairs of 191s, which makes 16 bits? Indeed the 6502 has only an 8-bit SP but the 8088, Z80 and 6809 have a 16-bit one. To avoid problems when used purely in the 6502 mode, jumper J1 makes sure that page overflows due to programming errors won't reach the hi-byte of the SP.
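The effect of jumper J1 can be illustrated with a small model of the 16-bit counter. How J1 is wired is not described here, so this sketch simply assumes that in 6502 mode the carry/borrow from the low byte is not passed on to the high byte; the function name is hypothetical.

```python
def sp_step(sp: int, up: bool, mode_6502: bool) -> int:
    """Count the 16-bit stack pointer one step up or down."""
    lo = ((sp & 0xFF) + (1 if up else -1)) & 0xFF
    hi = (sp >> 8) & 0xFF
    if not mode_6502:
        # 16-bit mode: the carry or borrow of the low byte ripples into the high byte.
        if up and lo == 0x00:
            hi = (hi + 1) & 0xFF
        elif not up and lo == 0xFF:
            hi = (hi - 1) & 0xFF
    return (hi << 8) | lo


if __name__ == "__main__":
    # One push too many with the SP at $0100:
    print(hex(sp_step(0x0100, up=False, mode_6502=True)))   # stays inside page 1
    print(hex(sp_step(0x0100, up=False, mode_6502=False)))  # leaves page 1
```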

The Program Counter

The Program Counter (PC) takes care of the address lines during normal operations. The heart of the PC is made out of four cascaded 191s (IC1, IC2, IC7, IC8). In contrast to the original TTL6502 design, they are only used in the "count up" mode.
Again, because the counters don't have any tristate capabilities, two 541 8-bit buffers (IC15, IC16) have been added to provide this function.
The two 573's (IC13, IC14) are needed when the PC has to be fed with a new address. For example, in case of a JMP or JSR instruction the new address has to be stored temporarily because, during loading the two bytes, the original address is still needed. The ID takes care of copying their contents to the 191s at the end of the instruction.

If an interrupt has to be served, the CPU must be able to read the output of the PC somehow. Two 541s, IC17 and IC18, take care of that: they output the current address of the PC to the data bus when needed.

Temporary address

During the execution of the instruction "LDA $1000" the CPU needs to output the address $1000 to be able to read the data that is to be found there. We could use the PC or SP for this purpose but that would mean we had to store the original address first and restore it after the action. It is better to have a separate buffer. In the TTL6502 design the "Address adder" took care of this, but I removed that part and replaced it with just two 573 latches, IC56 and IC58.

The final address buffers

Two 573 latches take care of outputting either the temporary address, PC or SP towards the processor port. I have been thinking about omitting these two buffers and connecting the temporary address, PC and SP directly to the processor port. But that would make it more difficult to save the PC to the stack, for example. Without the extra buffers the PC has to output the address, it has to be saved somewhere, then the SP has to be activated and the saved value has to be written to the data bus. With the two extra buffers the address can be read directly using IC16/IC18 and output to the data bus in one go without any delay.

The address lines A16..23

I use two 573 latches and one 541 buffer to generate the address lines A16..23 directly towards the processor port. One 573 is used for accessing the program, one can consider it as an extension of the PC, the other for accessing data, a kind of extension of the temporary address.
One idea was emulating the 65816 more or less. Its stack is always found in the first 64 KB of memory. So one idea I copied from the TTL6502 is to use a 541 to output $00 at the address lines A16..23.
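How the three sources for A16..23 combine with a 16-bit address can be sketched as follows. The names "pc", "temp" and "sp" and the function itself are hypothetical; the only facts taken from the text are that program and data accesses each have their own 573 and that stack accesses get $00 from the 541.

```python
def full_address(source: str, addr16: int, program_bank: int, data_bank: int) -> int:
    """Build the 24-bit address for a PC, temporary-address or SP access."""
    bank = {
        "pc": program_bank,   # 573 acting as extension of the PC
        "temp": data_bank,    # 573 acting as extension of the temporary address
        "sp": 0x00,           # 541 forcing the stack into the first 64 KB
    }[source]
    return ((bank & 0xFF) << 16) | (addr16 & 0xFFFF)


if __name__ == "__main__":
    print(hex(full_address("pc", 0x1234, program_bank=0x02, data_bank=0x05)))
    print(hex(full_address("sp", 0x01FF, program_bank=0x02, data_bank=0x05)))
```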

The data bus buffers

Two 573s, IC25 and IC26, take care of the data bus by latching and buffering the data coming from and going to the processor port.
- IC25 takes care of reading and latching the data coming from the processor port. Why is latching needed? That is to make sure that the data is also available to the BOBC after PHI0 has become (L). The clocking is done by PHI0', a signal generated by the GAL.
- IC26 takes care of latching the internal data and presenting it to the processor port. Both the latching and the enabling are done by the ID. A 139 used as OR gate, IC59BC, makes sure that IC26 is disabled as well when the AEC input is activated.

Registers

Every processor has several registers. Using 74ALS573s, 8-bit latches, to create the various registers would blow this mini design out of proportion. It was more practical to use a standard memory IC. The smallest 8-bit one I know of is the 6116, a 2 KB SRAM. But I chose a 62256 (IC38), a 32 KB RAM, for the simple reason that it is smaller in size.

The ALU

The ALU, short for Arithmetic Logic Unit, is the calculator of the processor. But, in this case, one with extended functions. The 74181 is a real ALU IC and is used by me in My TTL CPU. The problem is that the ALU of the 6502 also has a so-called BCD mode (Binary Coded Decimal): it is able to calculate in decimal mode. And so far I haven't seen any ALU IC capable of doing that. I have thought about using 181s plus an extra circuit for the BCD mode but that would expand the design too much. So I decided to use FlashRAMs here as well. The idea is just to program every possible situation into the FlashRAMs.

The needed inputs for the ALU FlashRAMs
As you can see I use two cascaded FlashRAMs, IC42 and IC43, each handling four data bits. Handling eight bits would mean I would need a FlashRAM with at least 24 inputs but I don't know of one that can work with +5 Volt. Cascading two FlashRAMs and each handling just four bits has the same result.

The FlashRAMs need to be able to handle at least the next commands:
- ADC
- AND
- BIT
- ASL
- CMP
- DEC
- EOR
- INC
- LSR
- ORA
- ROL
- ROR
- SBC
Four selection bits will cover the above 13 commands.

An extra bit is needed to deal with the decimal mode. It has to be an extra bit because it is an external input coming from the Flag Register.
After every operation the processor may want to know if the result was zero. In this case the second FlashRAM first has to know whether the result of the first FlashRAM was zero or not.
In case of an addition, subtraction, rotation or a shift to the left, the second FlashRAM needs the Carry of the first FlashRAM and the first one needs the Carry of the Flag Register. In case of a rotation or shift to the right, the first FlashRAM needs the Carry of the second FlashRAM and the second one needs the Carry of the Flag Register. This means two inputs for two different Carrys for both the FlashRAMs.
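The idea of programming every possible situation into two cascaded 4-bit slices can be tried out in Python. The sketch below covers ADC only and uses a dictionary as a stand-in for the FlashRAM contents; the table layout and all names are assumptions, and the flag behaviour of a real 6502 in decimal mode is more subtle than this model. The Zero flag is derived here from the combined result, whereas in the hardware it is passed from one FlashRAM to the other as an extra input.

```python
def build_adc_slice():
    """Precompute a 4-bit ADC slice: (a, b, carry_in, decimal) -> (result, carry_out)."""
    table = {}
    for a in range(16):
        for b in range(16):
            for carry_in in (0, 1):
                for decimal in (0, 1):
                    total = a + b + carry_in
                    if decimal and total > 9:
                        total += 6          # BCD correction of one digit
                    table[(a, b, carry_in, decimal)] = (total & 0x0F, 1 if total > 15 else 0)
    return table


ADC_SLICE = build_adc_slice()


def adc8(a: int, b: int, carry_in: int = 0, decimal: int = 0):
    """8-bit ADC built from two table lookups; returns (result, carry, zero)."""
    lo, carry_mid = ADC_SLICE[(a & 0x0F, b & 0x0F, carry_in, decimal)]
    hi, carry_out = ADC_SLICE[((a >> 4) & 0x0F, (b >> 4) & 0x0F, carry_mid, decimal)]
    result = (hi << 4) | lo
    return result, carry_out, int(result == 0)


if __name__ == "__main__":
    result, carry, zero = adc8(0x19, 0x28, decimal=1)
    print(hex(result), carry, zero)
```

The low slice gets its Carry from the Flag Register and hands its own Carry to the high slice, exactly the chain described above for additions.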

The result (for the moment):
- Zero flag
- Carry flag (2*)
- Decimal mode flag
- 4 bits 1st operand
- 4 bits 2nd operand
- 4 command bits

A 27512 EPROM would be sufficient but I decided to use AM29F020s anyway for three reasons:
- they are faster than EPROMs
- they are much faster to reprogram than EPROMs
- because of the two extra inputs I have room for extra functions
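The input count behind this choice is easy to verify. A small sketch, using the list of inputs given above and the usual organisation of both chips (a 27512 is 64K x 8 with 16 address lines, an AM29F020 is 256K x 8 with 18 address lines); the dictionary and function names are hypothetical.

```python
ALU_INPUTS = {
    "zero flag": 1,
    "carry flags": 2,
    "decimal mode flag": 1,
    "first operand": 4,
    "second operand": 4,
    "command bits": 4,
}


def lines_needed(inputs: dict) -> int:
    """Total number of address lines one ALU FlashRAM needs."""
    return sum(inputs.values())


def spare_lines(chip_address_lines: int, inputs: dict) -> int:
    """Address lines left over on a given memory chip."""
    return chip_address_lines - lines_needed(inputs)


if __name__ == "__main__":
    print(lines_needed(ALU_INPUTS))      # inputs per FlashRAM
    print(spare_lines(16, ALU_INPUTS))   # 27512
    print(spare_lines(18, ALU_INPUTS))   # AM29F020
```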

Static addresses

After a reset, the address bus has to output the addresses $FFFC and $FFFD. And when serving an interrupt or accessing the stack, other static addresses have to be output. What circuit is going to take care of all that? I decided to let the ALU perform this function as well. And now the two extra inputs of the 29F020s are more than welcome!

The Flag Register

This design is based on the one of My Own 6502. If you see that then quite possibly your first impression will be: this is a huge design! Yes, it is, and for a reason. This design enables the TTL6502 to update one or more registers in just one step. The 6502 and the M02 mentioned above need more. I kept this design because it fitted on the board I found. It certainly did after the only alteration: I replaced the ten needed OR gates with just one GAL.

The core of the Flag register (FR) is made out of ten 74 D-flipflops. Seven of them are part of the official FR, the other three are needed for other functions.

In most cases these flipflops can be set or reset either using the Preset or Clear input or using the data input. The Preset or Clear input will be used when there is a specific need for it. For example the opcode SEC (= Set Carry) demands that the Carry Flag is set which means that the Preset input has to be asserted.
After an addition done by the opcode ADC, the resulting Carry has to be stored. As we have no idea what the value of this Carry is, we need to use the data and clock inputs. Another and more obvious use of the data input is when the content of the FR needs to be restored during the execution of the PLP or RTI instruction.

How is the data clocked into the individual flipflops? Ten outputs of the ID plus ten OR gates are used to tell a flipflop if the value at the data input has to be clocked or not. Each ID output goes to its own OR gate. The other input of the OR gate is the inverted PHI0 signal, /PHI0. At the falling edge of PHI0 = the rising edge of /PHI0 the data is clocked into the flipflop. That is, if the ID output at the other pin of the OR gate is (L). If it is (H), nothing happens at all.
As you probably noticed, there are no OR gates present in the schematic. As already mentioned before, the ten needed OR gates have been implemented inside the GAL.
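The gated clock can be modelled with an OR function and a rising-edge triggered flipflop. This is a sketch only, ignoring gate delays; the class and function names are hypothetical.

```python
def flag_clock(id_output: int, phi0: int) -> int:
    """OR gate in the GAL: ID output ORed with /PHI0."""
    return (id_output & 1) | (1 - (phi0 & 1))


class DFlipFlop:
    """Minimal model of one half of a 74: data is taken over at a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._clk = 1

    def update(self, d: int, clk: int) -> None:
        if self._clk == 0 and clk == 1:
            self.q = d
        self._clk = clk


def clock_one_cycle(flipflop: DFlipFlop, d: int, id_output: int) -> int:
    """Apply PHI0 = (H) followed by PHI0 = (L) and return the resulting Q."""
    flipflop.update(d, flag_clock(id_output, phi0=1))
    flipflop.update(d, flag_clock(id_output, phi0=0))
    return flipflop.q


if __name__ == "__main__":
    print(clock_one_cycle(DFlipFlop(), d=1, id_output=0))  # ID output (L): data is clocked
    print(clock_one_cycle(DFlipFlop(), d=1, id_output=1))  # ID output (H): nothing happens
```

With the ID output held (H) the OR gate output never goes (L), so the flipflop never sees a rising edge.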
Using /PHI0 as the clock signal has one disadvantage: it means that data can only be clocked at the start of an even step. It is just something we have to live with. Why did I choose /PHI0? Mainly for two reasons:
- The data is clocked before any other change in the circuit happens. /PHI0 will end up at the 393 counter after having gone through the GAL and that will lead to a count-up, which in turn will change the output of the ID FlashRAMs. Whatever change in our processor will happen now, we can be sure that the data is already safe inside the flipflops.
- In case of PLP, data read from the computer's memory is stored in the FR at the end of the same step. That is also the case if this data has to be stored in a register and we need to save the Zero and Negative flag. And if the ALU has to do something with the data first, in most cases flags and/or data can be saved at the end of the same step again.

As said before, either data or flags can be stored in the FR. That means we need a mechanism to distinguish between these two sources. This is where IC22, a 157 quadruple 2-to-1 multiplexer comes in. It enables us to choose between:
- Data bit 0 or the Carry flag coming from the ALU flag part
- Data bit 1 or the Zero flag coming from the ALU flag part
- Data bit 6 or the Overflow flag coming from the ALU flag part
- Data bit 7 or the Negative flag coming from the ALU data part
The Break, Decimal and Disable Interrupt flags will only need data input. (bit 5: see later)
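The choice made by the 157 can be written out as a small function. A sketch with hypothetical names; which level of the select input picks which source is an assumption.

```python
def flag_register_inputs(select_alu: bool, data: int,
                         alu_carry: int, alu_zero: int,
                         alu_overflow: int, alu_negative: int) -> dict:
    """Model of the quadruple 2-to-1 multiplexer in front of the C, Z, V and N flipflops."""
    if select_alu:
        return {"C": alu_carry, "Z": alu_zero, "V": alu_overflow, "N": alu_negative}
    # Otherwise the flags come from the data bus, as needed for PLP and RTI.
    return {
        "C": data & 1,
        "Z": (data >> 1) & 1,
        "V": (data >> 6) & 1,
        "N": (data >> 7) & 1,
    }


if __name__ == "__main__":
    print(flag_register_inputs(False, 0b11000011, 0, 0, 0, 0))
    print(flag_register_inputs(True, 0xFF, 1, 0, 0, 1))
```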

In case of a branch the ID needs the state of a particular flag of the FR to decide whether the branch should be taken or not. IC72, a 151 8-to-1 multiplexer, enables the decoder to select the flag needed for a particular branch. The result, BRAD, is fed to the ID.
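The 151 simply picks one bit of the Flag Register. In the sketch below the multiplexer inputs are assumed to be wired in the order of the 6502 status bits, which the schematic may or may not do; the names are hypothetical.

```python
# Standard 6502 status register bit positions.
FLAG_BIT = {"C": 0, "Z": 1, "I": 2, "D": 3, "B": 4, "V": 6, "N": 7}


def brad(status: int, select: int) -> int:
    """8-to-1 multiplexer: return the flag chosen by the three select lines."""
    return (status >> (select & 0x07)) & 1


if __name__ == "__main__":
    status = 0b10000010                    # N and Z set
    print(brad(status, FLAG_BIT["Z"]))     # flag a BEQ/BNE would look at
    print(brad(status, FLAG_BIT["C"]))     # flag a BCS/BCC would look at
```

Whether a branch is taken on a set or a cleared flag is then decided by the micro code.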

IC21, a 245 buffer, enables BOBC to read the contents of the Flag Register.

What about bit 5 of the FR? In a real 6502 bit 5 isn't used, in a 65816 it is. In the original TTL6502 I just happened to have a left over flipflop, so why not use it? The original idea was that a free input at the 151 multiplexer enabled this bit to use BRAD to tell the ID if an 8-bit or 16-bit operation had to be performed. IMHO it was too good an idea to omit just to save half of a 74LS74 IC.
Remark: combining this half and the one needed for the FIRQ I could save an IC. But again: having the room I decided to keep these features.

Testing the BOBC

One question to myself was: "How do I test the BOBC?". I just happened to have some PC ISA cards with four 8255s on them. These four 8255s are good for 96 inputs and outputs, presented to the outside world by two 50-pin headers. In my case I only need one of these headers:

In my design the two 8255s behind CN1 are used to simulate the host system. This is done by connecting it to the processor connector. As you can see one 8255 is used to read the first 16 address lines and to handle the data bus. The second 8255 handles the last eight address lines and all the control lines. The reason I teamed the data bus up with the first 16 address lines is that the 8255 has a disturbing feature, one you cannot call a bug because it is documented:

The moment you change the data direction of a port, any other port that has been defined as output will be reset.

The two ports handling the first 16 address lines are input ports and therefore are not affected by the above behaviour when changing the direction of the port handling the data bus.
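This behaviour follows from the way the 8255 is programmed: directions are set by writing a complete control word, and every such write clears the output latches. A minimal model in Python, using the documented mode 0 control word layout; the class is a simplification and the assignment of ports in the example is an assumption, not the wiring of the test setup.

```python
def control_word(a_in: bool, b_in: bool, c_upper_in: bool, c_lower_in: bool) -> int:
    """Mode 0 control word: bit 7 set = mode set, a 1 in a direction bit means input."""
    return (0x80 | (int(a_in) << 4) | (int(c_upper_in) << 3)
            | (int(b_in) << 1) | int(c_lower_in))


class Ppi8255:
    """Very small model of the 8255 output latches."""

    def __init__(self):
        self.control = 0x9B                       # after reset: all ports are inputs
        self.latches = {"A": 0, "B": 0, "C": 0}

    def write_control(self, value: int) -> None:
        if value & 0x80:                          # mode set command
            self.control = value
            self.latches = {"A": 0, "B": 0, "C": 0}   # all output latches are cleared

    def write_port(self, port: str, value: int) -> None:
        self.latches[port] = value & 0xFF


if __name__ == "__main__":
    ppi = Ppi8255()
    ppi.write_control(control_word(False, False, False, False))  # everything output
    ppi.write_port("B", 0xAA)
    ppi.write_control(control_word(True, False, False, False))   # only port A changes
    print(hex(ppi.latches["B"]))                                  # port B lost its value
```

Keeping input-only ports next to the port whose direction changes, as described above, avoids this side effect.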

The outputs of the Instruction Decoder

Remarks

I already mentioned that I wanted to use the BOBC in combination with a PC-XT motherboard. IMHO it could be done by replacing the 8088 itself by the BOBC. But it would need some additional hardware on the board to be placed on the motherboard. One reason the extra hardware is needed: the first 8 bits of the address bus and the data bus have been multiplexed. My idea is to remove various ICs like the 8288 and various latches and let the BOBC control the lines directly, for example the MEMx and IOxx lines.

Do you have questions or comments? Do you want more information?
You can email me here.