Ruud's Commodore Site: The Mini6502

The Mini6502




Updated: 8 April 2022


What is it?

This document describes the ins and outs of how to build your own TTL CPU on one board. I'm busy building three different boards and this is the middle one. The others:
- My TTL CPU, the smallest one
- Big6502, the biggest one
The idea is that it should replace the processor of a host system.

Remark: this page describes version 2. The differences with version 1 are minimal. I added the Z80 signals M1 and HALT to the processor port and used the already available hardware to generate these signals.


Is it possible?

Building your own processor, is that possible? Yes, it is. First of all, forget that processors are just single ICs like the 6502, Z80 or 80486. Computers were constructed long before the IC was invented. Intel's 4004 is considered to be the first one-chip CPU in the world and its production started in 1971.
In the very early days computers were built using simple components like relays, tubes, transistors and small ICs. And if you don't believe me, please have a look at this site, a ring of self built processors. Very interesting indeed! It will show you, for example, processors using relays as logic gates.


A bit of history

Modern CPUs have a circuit called the "Instruction Decoder". Older computers worked in a different way. If you don't know how, have a look at My TTL CPU so you get an idea. These first processors needed quite a lot of instructions to perform certain tasks. Then some engineers noticed that all those tasks quite often had a lot of instructions in common. So they decided to combine groups of instructions into new mega-instructions, nowadays known as opcodes.
This little processor does not have much hardware. If you compare it with the right part of My TTL CPU, it isn't that much bigger. If you compare the schematics you will see that its left-bottom part is bigger but that is because it supports interrupts. OTOH the Program Counter is missing, that is done by the Instruction Decoder as well.


What processor is it going to simulate?

You can see I use the word simulate, not emulate. This design is about the smallest CPU with an Instruction Decoder I could think of and I simply hope that in the end it can execute 6502 instructions. But I already know that it will be much slower than the original 6502. The Big6502 is meant to emulate (at least) a 6502 at 1 MHz or better.


Picture of the board.

Schematic of the board.



The idea behind this design

It all started with the TTL6502, build your own 6502. But it was too big, certainly 15 years ago. Two years ago I started with My TTL CPU and Big6502. Then I had this idea: what about a minimal TTL-CPU but with an Instruction Decoder?

The basic idea to minimize things is: avoid hardware if it can be calculated by an Instruction Decoder. A Program Counter and a Stack Pointer are things that can be calculated IMHO so why should I build them in hardware? Of course there is a penalty: speed.

What are the things (I think) that cannot be calculated?
- To calculate things at all we need an ALU in the first place.
- To simulate a 6502 or other modern CPU, we need registers.
- To decide whether a branch should be taken or not, we need a kind of decision maker. I have thought about using the ALU here as well but I needed extra parts anyway, for example to remember the decision, so I stuck with my first solution (discussed later).


General description

The hardware is kept as simple as possible. The whole can be divided in five parts:
- the interface to the outside world
- the opcode circuit and the Instruction Decoder
- the registers
- the ALU
- the branch part

Please notice: no Stack Pointer circuit, no Program Counter, no special circuit for the Flag Register and no temporary address registers. All this is taken care of by the internal RAM, Instruction Decoder and the ALU.


The processor port: the interface to the outside world

The female AC DIN 64-pins connector is the interface to the outside world. As said before, the idea is that you connect an interface board to it with a corresponding male connector. This board should contain all the hardware needed to connect it to the target system in place of its processor. This enables you to use it in a 6502 system, Z80 system and maybe even in an 8088 system (see later). The hardware can be as minimal as a single connector to connect it to a 6502 system or a Z80 system.

An explanation of the various pins:
- A0..A23: 24 address lines enable the processor to address up to 16 MB of RAM, ROM or I/O. If the processor is going to replace a real 6502 one day, the extra eight address lines won't be in the way, just leave them unconnected.
- D0..D7: the eight data lines
- RDY, NMI, IRQ, RESET, SO, AEC and SYNC are the well known 6502/6510 pins. Some of them can get a completely different function when the processor is used in a different system. For example, the SO input can be used as the FIRQ input when emulating a 6809 processor. AEC is not supported on this board; if really needed, you have to tri-state the various busses and signals on the interface board.
- HALT, M1: the well known Z80 signals.
- PHI0, PHI1 and PHI2: the well known 6502 clock signals.
- IORD, IOWR, MEMR, MEMW: those who are familiar with 80x86 systems will recognise these lines; they are used to control memory and I/O operations. The 6502 doesn't need all these four signals, so why are they here then? One of my intentions is to use the processor on an IBM PC-XT board. But if I want to use the processor on a Z80 I need four signals as well: IORQ, MREQ, RD and WR. Just a matter of renaming these signals. For a 6502 system I only would need MEMW which could act then as the well known R/W pin.


The opcode circuit and the Instruction Decoder

The Instruction Decoder (ID) is the heart of the processor because it is the ID that decides how the processor behaves. The heart of the ID is in this case a number of FlashRAMs and those will contain the so called micro code.

What inputs do the FlashRAMs need?
- a counter
- the opcode
- a signal that tells that a reset, interrupt or other important signal has been activated
- a branch signal

If you have a look at the schematic, you will notice the weird order in which the various signals have been connected to the FlashRAMs. The reason is quite simple: it made it much easier for me to design the PCB. I'm working on a program to fill the FlashRAMs with data and it was much easier to program a small subroutine that de-scrambles the various signals than to connect all signals on the PCB to the FlashRAMs in such a way that the schematic looked good.
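The de-scrambling idea boils down to a fixed bit permutation applied when the FlashRAM image is generated. A minimal sketch in Python, assuming a made-up pin mapping for eight address bits (the real order follows from the schematic and is not reproduced here; the names scramble and build_image are hypothetical):

```python
# Hypothetical mapping: index = logical address bit, value = physical FlashRAM
# address pin it is wired to. These numbers are placeholders, not the real PCB.
LOGICAL_TO_PHYSICAL = [3, 0, 7, 1, 5, 2, 6, 4]

def scramble(logical_addr, mapping=LOGICAL_TO_PHYSICAL):
    """Return the physical address at which a logical microcode word is stored."""
    physical = 0
    for logical_bit, physical_bit in enumerate(mapping):
        if logical_addr & (1 << logical_bit):
            physical |= 1 << physical_bit
    return physical

def build_image(logical_image, mapping=LOGICAL_TO_PHYSICAL):
    """Reorder a microcode image so it can be programmed into the chip as-is."""
    image = bytearray(len(logical_image))
    for logical_addr, value in enumerate(logical_image):
        image[scramble(logical_addr, mapping)] = value
    return bytes(image)
```

The microcode can then be written in a readable logical order while the programmer file matches the wiring of the board.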


The counter circuit

Executing the opcode is executing a collection of several micro instructions. Some of these micro instructions can be executed in parallel, some have to be executed in a certain order and at the right step. The counter tells the ID what step has to be executed.
The base signal for the counter is the clock signal provided by the processor port, in this case PHI0. Because PHI0 must serve as input for many other gates in my design and not being sure if this will stress the original system, I decided to buffer it first with IC25D, a left over AND gate. The buffered PHI0 is called PHI0'.
The 6502 output PHI1, nothing more than an inverted PHI0, is created by IC24F. The PHI2 needed for this card and the motherboard is generated by inverter IC24B plus two AND gates (see later).

The actual counter is IC57, a 393 dual 4-bits binary counter with clear. We only use five of the eight outputs. So together with CLK0 (= PHI2 after the OR gate), we now have a 6-bits counter where CLK0 is the Least Significant Bit (LSB). These six bits, good for 32 clock cycles/64 steps (numbered 0..63), are directly fed to the FlashRAMs. I know this is an excessive number of steps but better safe than sorry. (And the parts were available, so why not use them?)
The outputs of the 393, CLK1..5, are also fed to a 5-input NOR gate, a gate created out of two 32 OR gates, IC53D and IC53C, and a 27 3-input NOR gate (IC11C).
At the end of an instruction the 393 is cleared and all its outputs become (L). This causes the output of this 5-input NOR gate to become (H). This signal is ANDed with PHI0' using AND gate IC25B. Its output is used as trigger to latch some data into two 573s, IC58 and IC12 (see later).
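The logic of the counter circuit can be modelled in a few lines. This is only a sketch of the behaviour described above, not of the timing; the function names are made up for the illustration:

```python
def step_number(clk0, count393):
    """Combine CLK0 (LSB) with the five used 393 outputs CLK1..5 into a 6-bit step."""
    return ((count393 & 0x1F) << 1) | (clk0 & 1)

def counter_is_zero(count393):
    """5-input NOR over CLK1..5: (H) only when all five outputs are (L)."""
    return (count393 & 0x1F) == 0

def latch_trigger(phi0_buffered, count393):
    """AND of PHI0' and the NOR output: the signal that clocks the two 573 latches."""
    return bool(phi0_buffered) and counter_is_zero(count393)
```

With CLK0 as lowest bit, the step values run from 0 up to and including 63, and the latch trigger can only fire while the 393 outputs are all (L).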

Resetting the counter
The idea is that two things have to be done at the last cycle of an instruction:
- prepare the board to read the opcode of the next instruction.
- reset the counter.
The last is done by setting ID output R1 (L). Its inverted (NAND gate IC10D) signal will activate the 393's CLR input. But the Reset signal must be able to reset the counter as well. Therefore the Reset signal is fed to the other input of the already mentioned NAND gate, IC10D. The effect is that an active Reset keeps the 393 in reset mode; it will output 00000 as long as the reset signal is active.
Once the counter has been reset by R1, the clear signal must be disabled again. The ID pulls R1 (L) at the last even step of an instruction. Then we can be sure that resetting the 393 results in all CLKx signals being (L) = step zero. All we have to do now is to program the FlashRAM so that R1 becomes (H) at step zero and the 393 can resume counting again.


Latching the opcode: the Instruction Register

IC58, a 573 8 bits latch, is the so called Instruction Register (IR): it latches the opcode which is present at step 00001. The Instruction Register makes sure that the opcode is available for the other steps as well. The signal to clock the data into the 573 comes from the AND gate IC25B, as already mentioned above.
Two 7-segment LED displays, DIS3 and DIS4, show the current opcode.


Latching SO, Reset, NMI and IRQ

My very first idea was to feed these signals directly to the inputs of the FlashRAMs. But when I started to design TTL6502, FlashRAMs were very expensive and I had to use EPROMs. But the reasonably priced EPROMs had too few inputs, thus another idea was needed.
Then it occurred to me that during a reset or interrupt, the opcode part of the ID wasn't used. So I decided to feed the FlashRAMs with these signals instead of the opcode. Now I only needed one signal to tell the ID whether it was dealing with an opcode or a reset, interrupt or equivalent signal: RSNI. IC12, a 573, latches Reset, SO, IRQ and NMI and feeds them to the ID when needed.

NMI
NMI, a negative edge triggered interrupt, is inverted first (IC24D) and fed to the positive edge triggered CLK input of a 74 D-flipflop (IC55A). Why not tie NMI directly to PRE and save a gate? At the end of the process the ID has to reset the flipflop. Using PRE as input would immediately set the flipflop again, forcing the decoder to repeat the whole process.
After handling NMI, the flipflop is reset by R2.

IRQ
IRQ is a level triggered interrupt and it is only checked at the end of PHI0. IRQ can also be disabled. NOR gate (IC11A) serves all these demands in one go. If the IRQ is active (L) at the end of PHI0 and the "disable interrupt" flag is inactive, read: (L), as well, then the moment PHI0' becomes (L) the rising edge of the output of the NOR gate will trigger the D-flipflop IC14A.
After handling IRQ, the flipflop is reset by R3.
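The gating done by the 3-input NOR gate can be written down as a small truth function. A sketch, with 0 standing for (L) and 1 for (H); the function name is hypothetical:

```python
def irq_nor(irq_line, interrupt_disable, phi0_buffered):
    """3-input NOR: (H) only when IRQ is active (L), the I flag is (L) and PHI0' is (L)."""
    return int(not (irq_line or interrupt_disable or phi0_buffered))
```

The flipflop is clocked by the 0 to 1 transition of this output, which can only happen when PHI0' falls while IRQ is active and interrupts are not disabled.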

SO
SO is a pin that sets the Overflow flag when activated. Our Flag Register is nothing more than a register inside a static RAM and directly setting a bit is out of the question. The nearest I could think of is treating it like an interrupt and giving it the highest priority.
I can be very short about the circuit: SO is treated the same way NMI is. IC24E is the inverter, IC36B the used flipflop and R0 the reset signal.

Reset
Instead of a 74 D-flipflop I use a flipflop created out of two NAND gates, IC10A and IC10B. After handling Reset, the flipflop is reset by R0. Yes, this is the same signal as for resetting the SO flipflop. The idea behind it is simple: when resetting the processor the status of any other signal doesn't matter at all, so why not use an existing signal? For the same money I could also have used the one for NMI or IRQ.

The further processing of SO, Reset, IRQ and NMI
The four outputs of the flipflops are latched by a 573 (IC12) at step 00001 and NORed by IC53B and IC11B. If one or more of these signals are set = (H), IC11B's output becomes (L). The output of AND gate IC25B that is used to clock the data into IC12 and IC58, is also used to clock D-flipflop IC14B. The flipflop latches the output of the NOR gate.
The outputs of this flipflop now either enable the outputs of IC58 (= opcode) or those of IC12. The Q output of the flipflop, RSNI, is the signal that tells the ID whether a special signal is detected or not.

Why is D-flipflop IC14B needed? The output of IC11B can change in the middle of an instruction and therefore has to be preserved as well. We cannot use a free pin of IC12 because this output of the latch wouldn't be available all the time so we have to use a separate latch; IC14B in this case.
The flipflop does not have to be reset: the moment all special signals have been handled, the D input becomes (H) at step 000x and the latch for the opcodes is selected.
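The selection between opcode and special signals can be summarised as follows. This is a sketch only: the bit positions inside IC12 are an assumption made for the example, and the function name is hypothetical.

```python
# Assumed bit positions of the four latched signals inside IC12.
SO_BIT, RESET_BIT, IRQ_BIT, NMI_BIT = 0, 1, 2, 3

def id_opcode_input(opcode, so=False, reset=False, irq=False, nmi=False):
    """Return (rsni, byte) as presented to the FlashRAMs for the coming instruction."""
    special = (so << SO_BIT) | (reset << RESET_BIT) | (irq << IRQ_BIT) | (nmi << NMI_BIT)
    nor_output = special == 0   # (H) when no special signal is pending
    rsni = nor_output           # value latched by the D-flipflop, Q output
    # RSNI (H): the opcode latch drives the ID; RSNI (L): the special-signal latch does.
    return rsni, (opcode if rsni else special)
```

So with an IRQ pending the ID does not see the fetched opcode at all; it sees the pattern of pending signals plus RSNI (L), and the micro code for that combination takes over.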


SYNC
SYNC is a 6502 signal that tells the outside world the first cycle of an opcode is being processed. Just by coincidence the output of IC11D is exactly what we need.


M1 signal
M1 is the Z80 equivalent of SYNC. But contrary to SYNC, M1 is active (L). M1 is generated by inverting SYNC, using inverter IC24A.


ReaDY signal
RDY is a signal to tell the 6502 to halt as long as RDY is (L). The outputs of the 393 are increased at every falling edge of PHI0'. The basic idea is that RDY prevents PHI0' from reaching the counter. Before PHI0' is fed to the 393, it goes through an OR gate, IC53A. This gate ORs PHI2 with the \Q output of IC55B, a 74 D-flipflop. This D-flipflop represents the state of RDY at the end of PHI0 and does this by saving the state of RDY at the rising edge of PHI1. If RDY is (L), output \Q becomes (H) and will block all pulses from PHI2 towards the 393 and ID by keeping CLK0 (H).
Maybe you noticed these AND gates, IC25A and IC25C, that seem to have no function. IMHO there was a risk that the blocking signal coming from D-flipflop IC55B would arrive too late and the 393 would still count up. I solved this, at least I hoped so, by delaying the PHI2 signal towards the OR gate a bit by using these AND gates.
As you can see, the CLR input of the D-flipflop has been connected to control line I30. The moment I30 is negated, output \Q is pulled (H). In combination with OR gate IC53A this will block PHI2 towards the 393 counter and Program Counter and thus stop them. See it as an equivalent of the 80x86 instruction HLT (= HaLT) or even better, the 65816 instruction STP (= SToP).
FYI: I only added this feature for the simple reason that I30 was left over. But in contrast to an 80x86, which can be woken up again by an interrupt, this processor cannot, for the simple reason that it doesn't have the hardware to release the flipflop again.


HALT signal
Z80's HALT signal tells the system that the HALT opcode has been executed and that the processor has been halted. In this case the Q-output of the RDY flipflop, IC55B, generates HALT.


The address bus buffers

Three 573 latches, IC18, IC19 and IC20, buffer the address lines A0..A23 toward the processor port. No special explanation needed, IMHO. But I found out that this processor design may have a flaw that could cause trouble: the address is output in two steps. An example: a program is running at address $121F and needs to read data from address $C00F. In the original design the address bus was changed in one single step because the final buffer was fed by another buffer reserved for this task. Now assume there is a 6522 at address $C00x and one at $C01x. Updating the high byte of the address first would mean that the address $C01F appears on the bus first. That would cause register 15 of the second 6522 to be read, unintentionally resetting some other registers. Updating the low byte first would mean that the address $120F is read and, assuming that this is part of the program, I don't see any harm done. But, to be honest, I'm not really convinced that this solution works out fine. Time will tell.
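Which address shows up in between follows directly from which latch is written first. A small Python sketch of the example with $121F and $C00F (the function name is made up):

```python
def intermediate_address(current, target, high_byte_first):
    """Address visible on A0..A15 after only the first of the two latch updates."""
    if high_byte_first:
        return (target & 0xFF00) | (current & 0x00FF)
    return (current & 0xFF00) | (target & 0x00FF)

print(hex(intermediate_address(0x121F, 0xC00F, high_byte_first=True)))   # 0xc01f
print(hex(intermediate_address(0x121F, 0xC00F, high_byte_first=False)))  # 0x120f
```

Writing the low byte latch first keeps the intermediate address inside the page the program is running from, which is the harmless case in this example.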


Afterthought regarding address lines A16..A23

The idea was simple: reserve a 573 latch for the address lines A16..A23. Problem: the moment you set a 64 KB segment, for example $12xxxx, the next instruction is fetched from this segment and not from the original one. Bummer :(

Is it a waste now? No, not really. It can be used like a kind of I/O port, just like the 6510 has. "Like" because these are only outputs. Any ideas about a good use for it are welcome!


The data bus buffers

Two 573s, IC16 and IC17, take care of the data bus by latching and buffering the data coming from and going to the internal data bus.
- IC17 takes care of reading and latching the data coming from outside. Why is latching needed? That is to make sure that the data is also available to the processor after PHI0 has become (L). The clocking is done by PHI0', the buffered PHI0 signal.
- IC16 takes care of latching the internal data and presenting it to the outside world. Both the latching and enabling are done by the ID.
Two 7-segment LED displays, DIS1 and DIS2, show the content of the data bus at all times.


Registers

Every processor has several registers. The Program Counter of a 6502 and Z80 is 16 bits wide, as well as the Stack Pointer of the Z80. The one of the 6502 is only 8 bits wide. The 6502 has only six registers, the Z80 twelve 16-bits and four 8-bits ones. So the use of 74ALS573s to create the registers would blow my mini design out of proportion. And as already mentioned, more practical was to use a standard memory IC. The smallest 8-bits one I know of is the already mentioned 6116, IC21, a 2 KB SRAM. To make sure that there is enough room for simulating a Z80, I reserved 64 bytes of RAM in the first place. It was only after producing my first PCBs that I realized that I could use two still free outputs, I30 and I31, for having even more internal registers: 256 bytes.
Using a single RAM IC has one disadvantage: internal memory transfers like TAX will take at least two steps: one step to copy the byte from the location inside the RAM dedicated to register A into one of the ALU buffers and one step to copy it from the ALU into the location inside the RAM dedicated to register X. But you will find out that this delay is peanuts compared to having no Program Counter.
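To make the two steps of TAX concrete, here is a sketch in Python. The register locations inside the RAM and the class name are assumptions for the example, and updating the N and Z flags is left out.

```python
# Hypothetical register locations inside the internal SRAM.
REG_A, REG_X, REG_Y, REG_SP, REG_P, REG_PCL, REG_PCH = range(7)

class InternalRam:
    def __init__(self, size=256):
        self.cells = bytearray(size)
        self.alu_buffer_a = 0   # stands in for one of the ALU input latches

    def tax(self):
        # Step 1: copy register A from the RAM into an ALU buffer.
        self.alu_buffer_a = self.cells[REG_A]
        # Step 2: pass it through the ALU unchanged and write it to the location of X.
        self.cells[REG_X] = self.alu_buffer_a
        return self.cells[REG_X]
```

Because there is only one RAM with one data port, reading A and writing X can never happen in the same step; that is the whole reason for the extra step.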


The ALU

The ALU, short for Arithmetic Logic Unit, is the calculator of the processor. But, in this case, one with extended functions. The 74181 is a real ALU IC and is used by me in My TTL CPU. The problem is that the ALU of the 6502 also has a so called BCD mode (Binary Coded Decimal): it is able to calculate in decimal mode. And so far I haven't seen any ALU IC capable of doing that. I have thought about using 181s plus an extra circuit for the BCD mode but that would expand the design too much. So I decided to use FlashRAMs instead. The idea is just to program every possible situation into the FlashRAMs.

The needed inputs for the ALU FlashRAMs
As you can see I use two cascaded FlashRAMs, IC42 and IC43, each handling four data bits. Handling eight bits would mean I would need a FlashRAM with at least 24 inputs but I don't know of one that can work with +5 Volt. Cascading two FlashRAMs and each handling just four bits has the same result.

The FlashRAMs need to be able to handle at least the following commands:
- ADC
- AND
- BIT
- ASL
- CMP
- DEC
- EOR
- INC
- LSR
- ORA
- ROL
- ROR
- SBC
Four selection bits will cover the above 13 commands.

An extra bit is needed to deal with the decimal mode. It has to be an extra bit because it is an external input coming from the Flag Register.
After every operation the processor may want to know if the result was zero. In this case the second FlashRAM first has to know whether the result of the first FlashRAM was zero or not.
In case of an addition, subtraction, rotation or a shift to the left, the second FlashRAM needs the Carry of the first FlashRAM and the first one needs the Carry of the Flag Register. In case of a rotation or shift to the right, the first FlashRAM needs the Carry of the second FlashRAM and the second one needs the Carry of the Flag Register. This means two inputs for two different Carries for both the FlashRAMs.

The result (for the moment):
- Zero flag
- Carry flag (2*)
- Decimal mode flag
- 4 bits 1st operand
- 4 bits 2nd operand
- 4 command bits
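How "programming every possible situation" works out can be shown for one command. The sketch below builds a lookup table for a 4-bit ADC slice, including decimal mode, and cascades two lookups to add two bytes. The order of the input bits in table_index and all names are assumptions; the command bits and the Zero input are left out to keep it short.

```python
def nibble_adc(a, b, carry_in, decimal):
    """One 4-bit slice of ADC; returns (result nibble, carry out)."""
    total = a + b + carry_in
    if decimal and total > 9:
        total += 6                      # BCD correction
    return total & 0x0F, int(total > 0x0F)

def table_index(a, b, carry_in, decimal):
    # Assumed input order: 4 bits A, 4 bits B, carry, decimal flag.
    return a | (b << 4) | (carry_in << 8) | (decimal << 9)

# Precompute every input combination, the way the FlashRAM would hold it.
TABLE = [None] * (1 << 10)
for d in (0, 1):
    for c in (0, 1):
        for b in range(16):
            for a in range(16):
                TABLE[table_index(a, b, c, d)] = nibble_adc(a, b, c, d)

def adc8(a, b, carry_in=0, decimal=0):
    """Cascade two table lookups: low nibble first, its carry feeds the high nibble."""
    low, c = TABLE[table_index(a & 0x0F, b & 0x0F, carry_in, decimal)]
    high, c = TABLE[table_index(a >> 4, b >> 4, c, decimal)]
    return (high << 4) | low, c
```

For example, adc8(0x19, 0x28, decimal=1) gives 0x47 without carry, and adc8(0xFF, 0x01) gives 0x00 with carry. At run time no calculation is done at all; the inputs simply form an address and the answer is read from the table.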

A 27512 EPROM would be sufficient but I decided to use AM29F020s anyway for three reasons:
- they are faster than EPROMs
- they are much faster to reprogram than EPROMs
- because of the two extra inputs I have room for extra functions


Static addresses

After a reset, in case of a 6502 the address bus has to output the addresses $FFFC and $FFFD. And when serving an interrupt or accessing the stack, other static addresses have to be outputted. What circuit is going to take care of that all? I decided to let the ALU perform this function as well. And now the two extra inputs of the 29F020s are more than welcome!


Increasing an address

As mentioned before, this design does not have a Program Counter. In this design the ALU does ALL the calculations. If the Program Counter needs to be incremented, the ALU will do that. Most of the time only the low byte of the address needs to be increased. But if a page boundary is crossed, that is the low byte goes from $FF to $00, the ALU will generate a Carry. This will be a sign for the ID to increment the high byte of the address as well. We have to notify the ID of that one way or another but we cannot use the Flag Register as this isn't a regular Carry. A 74LS74 D-flipflop, IC36, stores the state of this Carry and a 151 multiplexer, IC31, sends it on request to the Instruction Decoder.
The ALU does also all the calculations in case of a branch. But in this case a branch can be either positive or negative. In case of a negative branch, the high byte of the Program Counter may have to be decreased. Again the ID has to be informed of this. The 151 multiplexer does this by selecting the MSB of the second operand as BRAD signal.
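The address arithmetic described here can be sketched with nothing but 8-bit additions. The function names are made up; the page carry and the sign of the offset play the role of the two signals the ID tests through the multiplexer.

```python
def alu_add8(a, b, carry_in=0):
    """8-bit addition; returns (result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def increment_pc(pcl, pch):
    """Increment a 16-bit Program Counter kept as two separate bytes."""
    pcl, page_carry = alu_add8(pcl, 1)     # first the low byte
    if page_carry:                         # remembered in the D-flipflop, tested by the ID
        pch, _ = alu_add8(pch, 1)          # extra steps: the high byte
    return pcl, pch

def branch(pcl, pch, offset):
    """Add a signed 8-bit branch offset to the Program Counter."""
    negative = offset >> 7                 # MSB of the second operand
    pcl, carry = alu_add8(pcl, offset)
    if negative and not carry:
        pch = (pch - 1) & 0xFF             # backward branch crossed a page boundary
    elif not negative and carry:
        pch = (pch + 1) & 0xFF             # forward branch crossed a page boundary
    return pcl, pch
```

In the common case, no page crossing, the high byte is not touched at all, which saves steps.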


The Flag Register

In case of a conditional branch, the processor needs the data of the Flag Register to decide whether it has to make a jump, or not. But how is the Instruction Decoder notified of the state of each flag? When needed, the contents of the Flag Register inside the SRAM is copied to a 74ALS573 data latch, IC23. The processor needs to know the state of only one flag at a time. The 74LS151 8-to-1 multiplexer, IC31, takes care of that by selecting the needed flag. It outputs the state of this flag as the signal BRAD (BRAnch Data) towards the Instruction Decoder.

The 74ALS573 is not only needed for branching but it also makes sure that the state of the Interrupt Disable bit is outputted all the time.
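Selecting one flag with the multiplexer can be sketched as below. The bit positions are those of the 6502 status register; that the multiplexer inputs are wired in exactly this order is an assumption (on the board some inputs carry other signals, such as the page Carry), and the function names are made up.

```python
# 6502 status register bit positions (N V - B D I Z C, bit 7 down to bit 0).
FLAG_C, FLAG_Z, FLAG_I, FLAG_D, FLAG_B, FLAG_V, FLAG_N = 0, 1, 2, 3, 4, 6, 7

def brad(flag_latch, select):
    """8-to-1 multiplexer: output the one bit chosen by the three select lines."""
    return (flag_latch >> (select & 7)) & 1

def branch_taken(flag_latch, select, wanted):
    """For example BEQ: select=FLAG_Z, wanted=1. BNE: select=FLAG_Z, wanted=0."""
    return brad(flag_latch, select) == wanted
```

The micro code for BEQ and BNE can then be identical apart from what the ID does with BRAD (H) and BRAD (L).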


Testing the card

One question to myself was: "How do I test the processor?". I just happened to have some PC ISA cards with four 8255s on them. These four 8255s are good for 96 inputs and outputs, presented to the outside world by two 50-pins headers. In my case I only need one of these headers:


But I wanted other people to be able to test the card as well, so another solution was needed: an Arduino ATmega2560. This quite inexpensive little computer only needs a shield, some wires and a 64-pins connector. And a program of course. But that is still in development.


The outputs of the Instruction Decoder

     8088    6502    Z80
I0:  IORD     -      IORQ
I1:  IOWR     -      MREQ
I2:  MEMR     -      RD
I3:  MEMW    R/W     WR
I4: C-input 74LS138 / IC04
I5: \
I6:  - select reason for branch, 151 / IC31
I7: /

I8: A-input 74LS138 / IC04
I9: B-input 74LS138 / IC04
I10..15: select function for ALU, 29F020 / IC32 and IC33

I16..21: select register in SRAM, 6116 / IC21
I22: write enable SRAM, 6116 / IC21
I23: select SRAM, 6116 / IC21

I24..26: select Cx from IC13 / 138
I27: enable IC13 / 138
I28: clock output ALU into IC05 (inverted) PLUS
     clock flags ALU into IC09 (inverted)
I29: enable output data towards data bus, 573 / IC16
I30..31: select register in SRAM, 6116 / IC21


inverted: !!!
C0: clock data into address buffer A0..7 573 / IC18
C1: clock data into data bus buffer, 573 / IC16
C2: clock data into hardware flag register 573 / IC23
C3: clock data into address buffer A8..15 573 / IC19
C4: clock Carry into D-flipflop, 74 / IC36A
C5: clock data into address buffer A16..23 573 / IC20
C6: clock data into ALU buffer A, 573 / IC27
C7: clock data into ALU buffer B, 573 / IC28


Outputs of 74LS138 / IC04
R0: reset Reset (IC10A and IC10B) and SO flipflop, 74 / IC36B
R1: reset 393 counters
R2: reset NMI flipflop, 74 / IC55A
R3: reset IRQ flipflop, 74 / IC14A
R4: read flags from ALU, 573 / IC09
R5: read result from ALU, 573 / IC05
R6: read data from data bus, 573 / IC17
R7: rest position
Notice that IC10C, a NAND gate used as an inverter, is placed between I28 and the Clock inputs of IC05 and IC09. The reason is that when an output is held (L), positive glitches can occur when the address is changed. Such a glitch can cause the 573 latches to load data unintentionally. When I28 is held (H), no such glitch occurs and the inverter makes sure that no data is loaded. When I28 is (L), a positive glitch won't affect the loading of the data.
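The lists above can be turned into a small decoder for one 32-bit micro code word. This is a sketch: the field boundaries are taken from the lists, but using I30..31 as the two upper register bits and treating I27 (H) as "enabled" are assumptions, as are all names.

```python
def bits(word, low, high):
    """Extract bits low..high (inclusive) of a 32-bit micro code word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)

def decode(word):
    # 138 / IC04 select inputs: A = I8, B = I9, C = I4.
    r_select = bits(word, 8, 8) | (bits(word, 9, 9) << 1) | (bits(word, 4, 4) << 2)
    c_enabled = bits(word, 27, 27)                # enable polarity is an assumption
    return {
        "bus_control": bits(word, 0, 3),          # I0..I3
        "branch_select": bits(word, 5, 7),        # I5..I7, 151 / IC31
        "alu_function": bits(word, 10, 15),       # I10..I15
        "register": bits(word, 16, 21) | (bits(word, 30, 31) << 6),
        "sram_write": bits(word, 22, 22),         # I22
        "sram_select": bits(word, 23, 23),        # I23
        "r_line": r_select,                       # which of R0..R7 is active
        "c_line": bits(word, 24, 26) if c_enabled else None,
    }
```

A word with I4, I8 and I9 all (H) selects R7, the rest position, so nothing is reset or read in that step.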


Remarks

I already mentioned that I wanted to use the processor in combination with a PC-XT motherboard. IMHO it could be done by replacing the 8088 itself by the processor. But it would need some additional hardware between the motherboard and the TTL6502, build your own 6502. One reason why the extra hardware is needed: the first 8 bits of the address bus and the data bus have been multiplexed. One idea is to remove various ICs like the 8288 and various latches and let the processor control the lines directly, for example the MEMx and IOxx lines.

Hmmmm, afterthought: do we need the extra hardware at all? The 8088 outputs the data and address at different clock cycles. Maybe it is possible to use the outgoing data latch, IC16, for the data as well as the address bits A0..7. Then what about the signals S0..5? We will see...


Afterword

I think that it is clear that the ALU does most of the work. Not just the calculations like those of a calculator but also things like increasing the Program Counter and in- or decreasing the Stack Pointer.
Did I oversimplify the things too much? No, I didn't. It is in fact the way the first computers worked. I read somewhere that some Americans managed to create a working computer in the early 50's using 'only' 3000 radio tubes. That was only possible using circuits over and over again within the same design.
Then what about speed? The only 'computers' that were around these days were humans with mechanical calculators. The ENIAC, the first American computer, was used to calculate artillery trajectories. It took some humans several hours to calculate one trajectory, something the ENIAC did in 20 seconds. And compared to this computer, this little processor is far better (I think).





Have questions or comments? Do you want more information?
You can email me here.