My TTL CPU

Last update: 2023-04-13

What is it?

This is a 32-bit processor built entirely from TTL ICs. I'm building three different TTL CPUs and this is the smallest one. The others:
- Mini6502, the middle one
- Big6502, the biggest one
All three projects are still under construction.

Some history: the first processors

If you are Dutch, you will be familiar with the "Draaiorgel" (barrel organ). The music of the bigger barrel organs is controlled by its "Draaiorgelboek" (book music). The holes in the paper rolls make the various instruments produce their sounds. Some barrel organs have more than 100 instruments on board.

But what do barrel organs have to do with processors? The various registers, counters, adders and whatever else you find inside a processor can be considered its instruments, and the program, more or less, its book music. ("More or less" because modern processors have an "Instruction Decoder", see later.)
The fact is that the first computers operated this way: the bits of every byte of a program directly manipulated the various registers, counters etc. The more registers, counters or whatever else such a computer had, the more bits a byte had. Zuse's Z3, the first operational Turing-complete computer in the world, was a 22-bitter. Its successor, the Z4, was a 32-bitter.

Remark: nowadays we are used to bytes being 8 bits. But in the early days of computing a byte could hold any number of bits. Due to the overwhelming presence of 8-bitters from the 1970s on, "byte" became a synonym for 8 bits. From now on, when I use the word "byte" here, I mean 8 bits as well.

But there is one big difference between the organ and this type of processor: each line of book music contains only the code that plays the instruments, whereas each line of code for the processor not only "plays" the registers etc. but also includes a number of data bits. And the designer of the computer decided how many data bits would be used.

But in time the computer guys ran into trouble. As soon as one computer was built, people already wanted a better one. This meant more data bits, more counters, more registers, etc., so the bus became wider and wider. A bad side effect was that practically any improvement made a computer incompatible with all earlier models. More importantly, software already developed could not be used on the changed models. The Z4 mentioned above was an improved version of the Z3 and ran into this problem as well: software developed for the Z3 had to be rewritten to run on the Z4.
Then some guys invented the Instruction Decoder and a lot of problems were solved. For example, it is the instruction decoder that enables an AMD 80386 to run code originally written for an Intel 8088. But that is not discussed here; see the Mini6502.

My idea

For quite some time I had wanted to build my own processor, but so far the designs were too complex, mainly due to the Instruction Decoder. I had thought about building such a "barrel organ processor" as well, but I didn't like the idea of an n-bit-wide data bus because it meant that I would need a lot of parallel ROMs and RAMs to run any program.
Then I had a brainwave: instead of reading the n bits in parallel, I would read them one byte (= 8 bits) at a time from memory, store each byte in a latch and, once all bytes had been loaded, activate the latches so that all needed functions would be performed at that moment. The big advantage: just one ROM is needed for (in this case) a 24-bit instruction code.

But now the weird part: how many bits is this processor? The actual data it can handle is only 8 bits, but its opcode is 24 bits. Then I considered this: had I designed it like the Z3, every byte would have had to hold the 24 control bits plus the 8 data bits, so I would have needed a byte of 32 bits. So a 32-bitter it is.

What I said above about changing/improving this type of processor applies to this one as well: once I have built it, I have to stick with it.

What processor will it emulate?

The answer is simple: none. To be more precise: it is a new processor with its own opcode set. By the way, do you know any processor that uses only 24-bit opcodes? Again, this has to do with the strong relation between the bytes to be loaded and the hardware of the TTL-CPU. In this case I have to load 24 bits to tell the TTL-CPU what to do, whereas many processors, like the 6502, make do with just eight bits. And even then it will probably do only part of what the 6502 can do. So, in short: this will be a brand new, one-of-a-kind processor.

Version 1 and 2

At the moment I am at version 4, which is the production version. Version 1, in fact two versions (one with and one without on-board RAM), contained errors. Version 2 is an improved version 1 without RAM and is the version primarily described here. Version 3 is version 2 plus some add-ons: RAM, ROM and I/O; it is described later. In version 4 a GAL replaces some glue logic.

Picture (of version 1)

The schematics

This schematic is not a complete leap in the dark: I reused ideas gathered while designing previous TTL-CPUs.

The processor port: the interface to the outside world
The idea is to use a female 64-pin AC DIN connector as the port to the outside world, the same one for all my TTL-CPU boards. A card with a male DIN connector is attached to the TTL-CPU. This card should contain at least a connector to connect the board to the target system; in some cases some extra hardware is needed.
Remark: as I am a 6502 fan, the comments and the design are a bit 6502-flavoured.

An explanation of the various pins:
- A0..23: 24 address lines but only 16 are used in this case. These 16 address lines enable the TTL-CPU to address up to 64 KB of RAM, ROM or I/O. I planned to use all lines but being only human, I made a mistake and had to reduce the number to 16 lines.
- D0..D7: the eight data lines
- AEC is meant for tri-stating the various busses. Not used here.
- NMI, IRQ: To keep the design simple, I don't use these signals. But to be honest, I also had no idea (yet) how to implement them in this design.
- RDY, RESET will be used, see later.
- SO, a typical 6502 signal, won't be used here either.
- SYNC is also a 6502 signal and is used here to mark the actual execution of an instruction.
- HALT, M1: the well-known Z80 signals, not used here.
- PHI0, PHI1 and PHI2: the well-known 6502 clock signals.
- IORD, IOWR, MEMR, MEMW: those familiar with 80x86 systems will recognize these names; they are used to control memory and I/O operations. But forget about these names: only the names I0..I3 are used here, and of those signals only I1..I3 are actually used.

PHI0, PHI1 and PHI2
The three clock signals as used by the 6502. I need these signals inside the circuit anyway, so I connected them to the processor port. PHI1 is generated by inverting PHI0 using IC35A and PHI2 is generated by inverting PHI1 using IC24B.

Reset
The Reset signal, active (L), is used for two tasks:
- resetting the Program Counter (four 161 4-bit counters)
- resetting a 74LS393 counter, IC25A.
The latter is done using an AND gate, IC23D, and an inverter, IC24D. The inverter creates the active (H) CLR signal the 393 needs.

SYNC
SYNC is a 6502 signal that tells the outside world that the first cycle of an opcode is being processed. I use it to tell the outside world that the instruction is being executed at that moment. And to be honest, I didn't need to create it; it just happened to be there (more or less).

ReaDY signal
RDY is a signal that tells the 6502, and in this case the TTL-CPU, to halt all activity as long as RDY is (L). The basic idea is that an active RDY prevents PHI2 from reaching the 393 counter and the Program Counter, and thus stops the TTL-CPU from doing anything.
RDY is fed to the D input of IC36B, a 74LS74 D-flipflop. This flipflop holds the state of RDY at the end of PHI0 by saving it at the rising edge of PHI1 (= the falling edge of PHI0). If RDY is (L), output /Q becomes (H). /Q is fed, through an AND gate, IC23A (see later), to OR gate IC34D together with PHI2. The output of the OR gate drives the clock input of IC25A, a 393 4-bit counter, which advances at every falling edge of PHI2. The moment the second input of the OR gate becomes (H) because of the flipflop, the OR gate output stays (H), no more falling edges of PHI2 reach the 393 counter, and the processor stops.

As you can see, the CLR input of the D-flipflop is connected to control line I16. The moment I16 is negated, output /Q is pulled (H) and this stops the 393. See it as the equivalent of the 80x86 instruction HLT (= HaLT), or even better, the 65816 instruction STP (= SToP).
FYI: I only added this feature because I16 was left over. But unlike an 80x86, which can be woken up again by an interrupt, this processor cannot, simply because it has no means to release the flipflop again. In this case halt really means HALT.

After I finished the first schematic, I got another idea. When the board is connected to my Debugger in single-step mode, my CPU stops at every odd step. So the question arose whether I could stop it only during the execution step, thus skipping the first six steps. IMHO this only needed one AND gate (IC23A), and one just happened to be left over. It is placed between the RDY flipflop and OR gate IC34D; its second input is connected to SYNC.
The function of the AND gate is simple: it blocks the /Q signal coming from the flipflop towards OR gate IC34D during all steps except steps 0110 and 0111, when SYNC is (H). If you want to see all steps, just open jumper J1 and my CPU will stop at every step again.
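The clock gating described above comes down to two small Boolean functions. This is a minimal behavioural sketch in Python, not a gate-level model. It treats the flipflop as having already sampled RDY. It also assumes that opening jumper J1 lets the /Q signal pass during every step, which matches the behaviour described but not a documented wiring.

```python
def stop_request(rdy_sampled: int, i16: int) -> int:
    """/Q output of the RDY flip-flop IC36B (74LS74); 1 means 'stop'.

    rdy_sampled: level of RDY captured at the rising edge of PHI1.
    i16: control line I16, wired to the flip-flop's CLR input (active low).
    """
    if i16 == 0:
        return 1  # CLR active: Q forced (L), so /Q is (H) and the CPU halts
    return 1 - rdy_sampled  # RDY (L) -> /Q (H)


def counter_clock(phi2: int, stop: int, sync: int, j1_closed: bool = True) -> int:
    """Clock seen by the 393 step counter.

    With J1 closed, AND gate IC23A only lets 'stop' through while SYNC is (H).
    Opening J1 is assumed here to make 'stop' pass during every step.
    """
    gated_stop = (stop & sync) if j1_closed else stop
    # OR gate IC34D: while gated_stop is 1 the output stays (H), so the
    # 393 never sees a falling edge and the step counter is frozen.
    return phi2 | gated_stop


# RDY pulled (L) while loading (SYNC low): the clock still passes.
print(counter_clock(phi2=0, stop=stop_request(0, 1), sync=0))
# The same request during an execution step: the clock is held (H).
print(counter_clock(phi2=0, stop=stop_request(0, 1), sync=1))
```

With J1 closed the first call prints 0 and the second prints 1. In single-step mode the CPU therefore only freezes once it has reached the execution steps.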

The Counter circuit and the instruction latches

The TTL-CPU has 24 inputs that need to be controlled. These controls can, for example, tell a latch to clock in the data on its inputs or enable the outputs of a buffer. 24 inputs mean three 74ALS573 8-bit latches, the so-called "code latches". They have to be loaded with data first and, once that has been done, the instruction must be executed. That means I need at least four cycles: three to load the latches and one to execute. The 74LS393 4-bit counter mentioned above, IC25A, takes care of this.
As said before, the 393 is clocked by PHI2. PHI2 and three outputs of the 393 are fed to IC21, a 154 4-to-16 demultiplexer. The outputs of the 154 represent the various steps in the process. One idea was to read a byte at every step, i.e. at every half cycle of PHI2, but then I realized that would not work because PHI2 would have to be connected to the address lines of the ROM in one way or another. So I only use every odd step, i.e. when PHI2 is (H).
At step 0001 the clock input of IC01, a 573 latch, is activated and the byte read by IC17 is stored inside IC01. At steps 0011 and 0101 the same is done for IC02 and IC06 respectively.
A 7-segment LED display with internal decoder, DIS3, is used to make the cycles visible.

During the first three cycles the three instruction 573s are tri-stated and all their outputs are kept (H) by pull-up resistors. The idea behind this is to make sure that all controls are in a neutral state. An example: assume that an instruction wrote data into one of the ALU buffers. To disable the clock signal of this latch after the instruction, a new byte has to be read and stored into the corresponding instruction latch. But as the ALU latch would still be open, this byte would overwrite the one just written into the ALU buffer as well! So by disabling all outputs we make sure that nothing can be changed, overwritten or output by accident.

An exception is the output of IC17, the 573 latch that outputs the data coming from the processor port into the internal data bus. An inverter, IC35F, inverts control signal I4 and in this way takes care of negating IC17's OC input, thus allowing the data to reach the instruction latches.
If the instruction is to read data from the outside world, bit 4 of the first opcode byte (I4) has to be set (H) so that during the actual execution IC17 keeps transferring data from the outside world into the processor.

At steps 0110 and 0111 all the outputs of the instruction latches are enabled and, in their turn, activate the needed controls. For example, this can be reading the content of the data bus and writing it into buffer A of the ALU. Notice that two steps are involved (AND gate IC23B combines them), so the actual output mimics the behavior of a 6502: for example, a 6522 VIA needs the address to be present before the rising edge of PHI2.
This combined signal is also used to create SYNC, after being inverted by IC24F.
If the program wants to read data from the bus, this data is only valid during step 0111. This data, coming from RAM, ROM or I/O, is clocked into IC17 while PHI0 is (H), and that is during step 0111. During step 0110 the third byte of the opcode is still placed on the internal bus by IC17. Can this jeopardise things? I don't think so, because if data read from the external bus needs to be stored, this can only be done at the end of step 0111, and by then the real data is present.

At step 1000, output 8 (pin 9) of the 154 becomes active. It is fed to the second input of AND gate IC23D and, after inversion by IC24D, resets the 393 counter. This sends the 393 to step 0000 at once, which in turn automatically pulls pin 9 (H) again and thus releases the reset of the 393 counter.
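The whole step sequence can be traced with a small Python simulation. It is a minimal behavioural sketch built on two assumptions. PHI2 is taken to be the least significant input of the 154, as implied by "every odd step, i.e. when PHI2 is (H)". The Program Counter is taken to advance once per clock cycle, at the same moment as the 393. The trace shows which ROM address is read at each load step and how step 1000 collapses into 0000.

```python
# Actions decoded from the 154 (IC21) outputs, per the description above.
STEP_ACTIONS = {
    0b0001: "clock IC01 (opcode byte 1)",
    0b0011: "clock IC02 (opcode byte 2)",
    0b0101: "clock IC06 (opcode byte 3)",
    0b0110: "execute, SYNC (H)",
    0b0111: "execute, SYNC (H), bus data valid",
}


def run_sequencer(half_cycles: int, pc: int = 0x0000) -> list[tuple[int, int, str]]:
    """Return (step, program_counter, action) for each half cycle of PHI2."""
    counter = 0  # three outputs of the 393, IC25A
    trace = []
    for n in range(half_cycles):
        phi2 = n % 2  # PHI2 (L) first, then (H)
        step = (counter << 1) | phi2  # 154 inputs, PHI2 as least significant bit
        if step == 0b1000:
            # Output 8 (pin 9) of the 154 goes (L); via IC23D and IC24D it
            # clears the 393, so step 1000 immediately becomes 0000.
            counter = 0
            step = 0b0000
        trace.append((step, pc, STEP_ACTIONS.get(step, "-")))
        if phi2 == 1:
            # Falling edge of PHI2: the 393 advances; the PC is assumed to
            # advance at the same moment (once per clock cycle).
            counter += 1
            pc = (pc + 1) & 0xFFFF
    return trace


for step, pc, action in run_sequencer(10):
    print(f"{step:04b}  PC=${pc:04X}  {action}")
```

The three opcode bytes come from PC, PC+1 and PC+2. The execution cycle sees PC+3, and the next instruction starts at PC+4.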

The use of the various latches

The outputs perform, or control, various functions:
- Reading data from various sources.
- Selecting the various functions of the ALU.
- Writing data to various latches.

In the first case I have only two sources that can be read: data from the ALU and data coming from the outside world. With more sources I could have used a demultiplexer like the 74LS138 or 139 to save control lines, as I can only read one source at a time anyway.

In the third case I have eight registers I can write to. The whole design would need 29 control lines, which originally meant at least four 573 latches and four clock cycles just for reading an instruction. But eight of those control lines clock a 573 latch. Using a 138 3-to-8 demultiplexer replaces these eight lines by three, bringing the total down to 24 control lines and thus exactly three 573 latches.
But the outputs of a 138 are active (L) and the clock inputs of the 573 are active (H), so inverters are needed. That's why IC07, an 8-bit inverting 540 buffer, is needed. Hey, but this means an extra 20-pin IC, so why not use an extra 573 instead?
No. Remember that while the instruction is being loaded, the 573s are disabled and their outputs are pulled (H) by resistors? So the inverters would be needed anyway. OK, I have thought about using resistors to pull the disabled outputs (L), but I don't have good experiences with this method: slow rising edges. And don't forget the extra cycle needed to load an extra 573; this would certainly slow down the system by 25%.
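The latch and cycle arithmetic above can be written out as a worked equation. The 5-cycle case is the hypothetical design with a fourth 573, with one execution cycle added to the load cycles in both cases.

```latex
\begin{aligned}
\text{without the 138:}\quad & 29 \text{ lines} \Rightarrow \lceil 29/8 \rceil = 4 \text{ latches} \Rightarrow 4 + 1 = 5 \text{ cycles per instruction}\\
\text{with the 138:}\quad & 29 - 8 + 3 = 24 \text{ lines} \Rightarrow \lceil 24/8 \rceil = 3 \text{ latches} \Rightarrow 3 + 1 = 4 \text{ cycles per instruction}\\
\text{slowdown:}\quad & 5/4 = 1.25 \Rightarrow \text{25\% more time per instruction}
\end{aligned}
```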

Only seven latches are clocked by the 138; the one that latches the output of the ALU has its own inverted control line, I5. There are two reasons for this construction:
- During the first three cycles all control lines are (H), and therefore the 138 would activate output Y7. In version 1 control line I5 disabled the 138, thus disabling output Y7 as well; in version 1, Y7 led (after inversion) to the ALU output latch.
- When working on the opcodes, I found out that to store data directly into the ALU output latch, I first had to store it in, for example, the A register, and then needed another instruction to copy the data from the A register through the ALU into the ALU output latch. Using I5 this way, I can clock register A or B (C2 or C4) and the ALU output latch (C7) at the same time.
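Here is a minimal Python sketch of how I17..I19 and I5 map to the clock lines C0..C7 in version 2. The assignment of I17, I18 and I19 to the 138's A, B and C select inputs is an assumption; the text only says these three lines select outputs C0..6. Y7 is likewise assumed to clock no latch in version 2.

```python
def clock_lines(i17: int, i18: int, i19: int, i5: int) -> set[str]:
    """Clock lines C0..C7 that go active (H) during the execution steps.

    Assumption: I17 drives the least significant select input (A) of the
    138 (IC04), I18 input B and I19 input C.
    """
    select = i17 | (i18 << 1) | (i19 << 2)
    active = set()
    # The selected 138 output goes (L); the inverting 540 (IC07) turns it
    # into the active (H) clock a 573 needs. Y7 is assumed to clock no latch.
    if select <= 6:
        active.add(f"C{select}")
    # I5 has its own inverter and clocks the ALU output latch (C7).
    if i5 == 0:
        active.add("C7")
    return active


print(sorted(clock_lines(1, 1, 1, 1)))  # idle: all lines pulled (H)
print(sorted(clock_lines(0, 1, 0, 0)))  # register A and ALU output latch together
```

The first call prints an empty list and the second prints `['C2', 'C7']`. That second case is the simultaneous write to register A and the ALU output latch described in the second bullet.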

The Program Counter

My first idea was to use the same program counter as in Build your own Mini6502: the address is kept in the SRAM, copied to the ALU, incremented, output to the address bus and saved back to the SRAM at the same time. But with this CPU, the instruction bytes needed to perform these steps have to be read from the ROM, which means that the program counter has to be increased at every step to be able to read those bytes: a kind of chicken-and-egg problem. Conclusion: a hardware counter is a must.
So I decided to use an automated counter based on the one in Build your own 6502, but with 161 counters instead of 191s. The 161 can be reset, the 191 cannot, and this enables, or rather forces, me to start at address $0000.
The 161s are clocked by the same clock as the 393 counter. The major error I made in version 1 was overlooking the fact that the 161s count at the rising edge of the clock, while the whole design expected the Program Counter to count at the falling edge. So I had to shuffle things around to free an inverter, IC35C in this case.

In case of a jump, two 573 latches, IC08 and IC14, have to be filled with the new address. By negating the /LOAD input of the four 161s the new address is copied into them. How this signal for /LOAD is generated will be explained later.

In this design the Program Counter counts up at every clock cycle, thus also at the fourth cycle. Remember, this is the cycle where the actual action takes place, like reading a byte from somewhere in RAM or setting an I/O register. So for an instruction without an operand, where this byte is not used at all, this byte can be wasted. "Can", because this fourth byte can be accessed in an indirect way, so it is not a complete loss. But that will need some creative programming, see later.
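The fact that every instruction occupies four consecutive addresses can be made concrete with a small Python sketch. It shows how a (hypothetical) assembler might lay out instructions in ROM, assuming execution starts at $0000 after reset and no jumps occur. The byte values in the example are placeholders, not real opcodes.

```python
def layout(program: list[tuple[int, int, int, int]], origin: int = 0x0000) -> dict[int, int]:
    """Place each 4-byte instruction at consecutive ROM addresses.

    Each instruction is (opcode byte 1, opcode byte 2, opcode byte 3, 4th byte).
    Because the PC advances once per clock cycle and an instruction takes
    four cycles, instruction n always starts at origin + 4 * n.
    """
    image = {}
    for n, instruction in enumerate(program):
        base = origin + 4 * n
        for offset, value in enumerate(instruction):
            image[(base + offset) & 0xFFFF] = value
    return image


def fourth_byte_address(origin: int, n: int) -> int:
    """Address the PC points to during the execution cycle of instruction n."""
    return (origin + 4 * n + 3) & 0xFFFF


# Placeholder byte values, not real opcodes.
image = layout([(0x11, 0x22, 0x33, 0x44), (0x55, 0x66, 0x77, 0x00)])
print({f"${a:04X}": f"${v:02X}" for a, v in image.items()})
```

In this sketch $0003 and $0007 hold the 4th bytes. For an instruction without an operand, such a byte is simply filler unless a program reuses it.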

The 161s cannot be tri-stated, so two 541 buffers, IC22 and IC31, take care of that. Control signal I7 enables or disables the 541s. While the code latches are being filled with data, their outputs are disabled and all control signals are (H). To make sure that the 541 buffers are enabled at that time, I7 is inverted first.

The temporary address lines A0..15

Thinking things over, I soon found out that I could not use the Program Counter for temporary accesses to memory or I/O. The reason is simple: if I set the Program Counter to read a byte, the program would continue at the address after the one I accessed, because I have no means to restore the Program Counter automatically to the original address immediately afterwards. So I need some buffers that hold the needed address for that moment: IC19 and IC20. Control line I7 is used here as well: it selects whether the temporary address buffers (not inverted) or the Program Counter (inverted) are active.

The ALU

For the ALU I decided to use two 74181s, the world's best-known ALU IC. I was tempted to use EEPROMs here as well, but then it wouldn't be an all-TTL CPU.
The data needed as inputs for the ALU will be stored in two 573 latches, IC27 and IC28, first. The advantage of this design is that the flag information from the ALU stays available when the data on the data bus has been changed by other operations. IC03, a 573 latch, takes care of storing and outputting the data created by the ALU towards the internal data bus.
If needed, these three latches can be used as temporary internal registers.

The Flags and the use of them

For now I only use three flags: Carry, Zero and Minus. The 181 outputs a zero flag, but it is only valid for four bits. OR gate IC34C combines the signals of both 181s to create a zero flag valid for a whole byte.
Both 181s also output a Carry flag. The one coming from IC26, the first 181, is fed into IC29, the second 181. The one coming from IC29 is fed into IC36A, a 74 D-flipflop, whose Q output is fed into the not-Carry input of the first 181, IC26. IC36A has two functions:
- It enables the CPU to remember earlier states of Carry. Useful for ADC (ADd with Carry) equivalent instructions.
- A program can set (I10) or reset (I8) the Carry on demand
Remark: the 181 uses an inverted Carry, for whatever reason, and I keep it that way in this CPU. So in the hardware used here, an active Carry is (L). That's why I10 sets Carry even though it resets output Q of the flipflop.
If the Carry from the ALU has to be clocked into IC36A, signal I9 has to be set (L). At step 1000 all outputs of the instruction latches are tri-stated and the pull-up resistors pull the various Ixx signals (H), including I9. This rising edge makes the D-flipflop latch the bit at its D input. But because ALL Ixx signals are pulled (H), the function inputs of the ALU ICs are pulled (H) as well, and this can change the level of the Carry at the D input of IC36A. To make sure that the Carry is clocked before such a change, I added an extra resistor, R2, to line I9 to make its rising edge steeper. I also rely a bit on the internal delays of the ALU ICs.

The minus flag is derived from bit 7 of the output of the 181s = pin 13 of IC29.

The flags can be used for conditional jumps or branches. The advantage of branches is that an executable can be relocated in memory, which is impossible with an executable using jumps. But using branches means that the new address has to be calculated at run time, something that is built into the 6502 but will cost a lot of instructions on this CPU. It is up to the programmer to decide what to use.

How is the flag needed for the condition selected? IC09, a 74LS153 4-to-1 multiplexer, makes it possible to choose one of four flags:
- bit 7 from the output of the ALU as the Minus flag
- the output of OR gate IC34C as the Zero flag
- the Carry output of ALU IC29
- the Q output of D-flipflop IC36A as the Carry flag
Controls I20 and I21 determine what flag is chosen.

The next step is to feed the signal from output Y of the 153 into OR gate IC34A and, through inverter IC24C, into OR gate IC34B. The outputs of these two OR gates are fed into an AND gate, IC23C, whose output is connected to the /LOAD inputs of the 161s mentioned above.

I22 and I23 control the behavior of the two OR gates. If both controls are (H), both outputs of the OR gates will be (H) as well. Therefore the output of the AND gate and the /LOAD inputs will also be (H). This means no active /LOAD, and the 161s just keep on counting. During the first six steps the pull-up resistors make sure these lines are (H) anyway.

When both I22 and I23 are negated, at least one of the outputs of the OR gates will be (L), whatever the flag. This causes the AND gate to output a (L), which in turn causes the 161s to load the address saved in the 573 latches IC08 and IC14. In short, when both controls are (L), the 161s perform an unconditional jump.

The last two possible situations are the ones where one control is (L) and the other (H). Let's have a look at the following table:

  I23  I22  Flg  |  /LOAD
-----------------+-------------
   0    0    0   |    0   = jump
   0    0    1   |    0   = jump
                 |
   0    1    0   |    1   = count
   0    1    1   |    0   = jump
   1    0    0   |    0   = jump
   1    0    1   |    1   = count
                 |
   1    1    0   |    1   = count
   1    1    1   |    1   = count

The first two and the last two rows have been explained already. Rows 3, 4, 5 and 6 handle the case where a conditional jump is needed. In words:
- 'I22 = (L)' and 'I23 = (H)' handle the situation when a jump is needed when the chosen flag is not set.
- 'I22 = (H)' and 'I23 = (L)' handle the situation when a jump is needed when the chosen flag is set.

The above circuit enables instructions (more or less) like the Z80's "jp z,$xxxx" or "jp nc,$yyyy".
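Here is a minimal Python sketch of the flag selection and /LOAD logic, which reproduces the table above. The pairing of I22 with IC34A and I23 with IC34B is inferred from the table, not stated in the text. The I20/I21 select order and the flag input order of the 153 are also assumptions.

```python
def select_flag(flags: tuple[int, int, int, int], i20: int, i21: int) -> int:
    """74LS153 (IC09): pick one of four flags.

    Assumption: I20 drives select input A and I21 select input B, and the
    flags are wired to inputs 0..3 in the order listed above
    (Minus, Zero, ALU Carry, flip-flop Carry).
    """
    return flags[i20 | (i21 << 1)]


def pc_load_n(flag: int, i22: int, i23: int) -> int:
    """/LOAD of the 161s: 0 = load the jump address, 1 = keep counting.

    The pairing I22 -> IC34A and I23 -> IC34B is inferred from the table.
    """
    or_a = flag | i22        # IC34A: flag OR I22
    or_b = (1 - flag) | i23  # IC24C inverts the flag, IC34B ORs it with I23
    return or_a & or_b       # AND gate IC23C drives /LOAD


# Reproduce the table: I23, I22, Flg | /LOAD
for i23 in (0, 1):
    for i22 in (0, 1):
        for flg in (0, 1):
            result = pc_load_n(flg, i22, i23)
            print(i23, i22, flg, "|", result, "= jump" if result == 0 else "= count")
```

Written as one expression, /LOAD = (Flg OR I22) AND (NOT Flg OR I23). Setting one control (L) therefore makes the jump depend on the flag alone.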

The registers

My TTL-CPU does not have internal registers like the 6502 or Z80. I did have a version with registers in the form of a 2K*8 static RAM; in fact, that was the very first version I started with. Its registers wouldn't have a specific function: the programmer would be free to dedicate a byte of the memory to a certain register. This raised the question: if the internal registers of this CPU don't have a specific function, why not use external RAM instead, and what would the advantages be?
- Two ICs fewer (the SRAM and the 573 controlling it), thus a smaller board.
- With one 573 fewer, I need one clock cycle less to fill the latches, so the CPU becomes faster.
- This in turn means smaller programs.
- Some people don't consider RAM as real TTL, so removing it will make them happy.
- As far as I could find out, the old processors, mainly Zuse's, only used "external" RAM. That is, if you can still speak of external RAM: the whole computer, including processor and memory, was mostly one big design.

Does the design indeed become faster? Removing this register RAM has one disadvantage: instead of one execution cycle to access the RAM, I need more of them. Before I can access the external RAM, I first have to load the temporary address latches IC19 and IC20 with an address, which takes two instructions of four cycles each: eight cycles.

Reminder: don't forget that the three registers of the ALU can, more or less, be used as internal registers.

The speed of my TTL CPU

I'm quite sure it will run at 1 MHz, but I cannot guarantee anything; I just have to build it and see. For all I know, I overlooked something very elementary and the whole project won't run at all.
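A quick Python calculation puts numbers on the speed. It assumes the hoped-for 1 MHz clock, the fixed four cycles per instruction and no RDY wait states. These are estimates from the clock rate alone, not measurements.

```python
CYCLES_PER_INSTRUCTION = 4  # three load cycles plus one execution cycle


def execution_time_us(instructions: int, clock_mhz: float = 1.0) -> float:
    """Time in microseconds for a straight run of instructions (no wait states)."""
    return instructions * CYCLES_PER_INSTRUCTION / clock_mhz


def instructions_per_second(clock_mhz: float = 1.0) -> float:
    """Instruction throughput at the given clock frequency."""
    return clock_mhz * 1_000_000 / CYCLES_PER_INSTRUCTION


print(instructions_per_second())
# Reading one byte from external RAM: two instructions to set the temporary
# address (IC19, IC20) plus one to read the byte.
print(execution_time_us(3))
```

This prints 250000.0 instructions per second and 12.0 µs for an external-RAM read. The 8 µs of that spent on setting the temporary address is the price of having no register RAM.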

The control lines

- I0: Not used here. 
- I1..I2: for selecting the ROM, RAM and I/O
- I3: R/W line of this processor
- I4: (H) = read data from data bus through IC17 / (L) = output data to data bus through IC17
- I5: clock the output of the ALU into a 573 buffer
- I6: read output of ALU, IC05
- I7: (H) output Program Counter address / (L) output temporary address

- I8: clear Carry, IC36a
- I9: clock Carry coming from ALU into IC36a
- I10: preset Carry, IC36a
- I11: \ 
- I12:  \
- I13:   > select function 181 ALU
- I14:  /
- I15: /

- I16: halt the CPU by blocking PHI2
- I17: \ 
- I18:  > select output 138, IC04: C0..6
- I19: / 
- I20: \
- I21: -- select condition for branch, IC09
- I22: \
- I23: -- select "jump", "branch" or "count" for the Program Counter

- C0: clock low address into temp address buffer A0..7 IC19
- C1: clock high address into temp address buffer A8..15 IC20
- C2: clock data into IC27, input buffer A for ALU
- C3: clock data into IC17 towards data bus
- C4: clock data into IC28, input buffer B for ALU
- C5: clock low-byte address into pre-load Program Counter, IC08
- C6: clock high-byte address into pre-load Program Counter, IC14
- C7: clock the output of the ALU into a 573 buffer, after inverting I5
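The control lines above can be read as one 24-bit control word. Below is a small Python sketch of decoding a few of its fields. It assumes byte 1 holds I0..I7 (the text confirms I4 is bit 4 of the first byte), byte 2 holds I8..I15 and byte 3 holds I16..I23. The field names are illustrative labels only.

```python
def control_word(byte1: int, byte2: int, byte3: int) -> int:
    """Combine the three opcode bytes into the 24-bit control word I0..I23.

    Assumption: byte 1 holds I0..I7, byte 2 holds I8..I15 and
    byte 3 holds I16..I23.
    """
    return (byte1 & 0xFF) | ((byte2 & 0xFF) << 8) | ((byte3 & 0xFF) << 16)


def bit(word: int, n: int) -> int:
    """Level of control line In (1 = (H), 0 = (L))."""
    return (word >> n) & 1


def describe(word: int) -> dict[str, object]:
    """Decode a few fields of the control word, following the list above."""
    return {
        "address source": "Program Counter" if bit(word, 7) else "temporary address",
        "IC17 direction": "read from data bus" if bit(word, 4) else "output to data bus",
        "ALU function (I11..I15)": (word >> 11) & 0b11111,
        "138 select (I17..I19)": (word >> 17) & 0b111,
        "flag select (I20..I21)": (word >> 20) & 0b11,
        "halt (I16 low)": bit(word, 16) == 0,  # only effective in the execute steps
    }


# All bits (H): the neutral state the pull-up resistors produce while loading.
print(describe(control_word(0xFF, 0xFF, 0xFF)))
```

The all-(H) word decodes as: address from the Program Counter, IC17 reading, 138 output Y7 selected, no halt. That is the neutral state described for the loading cycles.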

The software

There is no software for it yet. As it is a completely new and unique processor, I need to write at least a new assembler for it. Or, in my case, another module for my "Multi Processor" assembler.

The opcodes

When designing a CPU with an Instruction Decoder, one can ask: which opcodes do I want to support? But in this case, it is more like: which opcodes do I get? An 8-bitter can support up to 256 one-byte opcodes. This CPU is a 24-bitter and theoretically has 2^24 = 16,777,216 (16M) possible opcodes. But most of them just do nothing or have no meaning. For example: set all control bits to one, and then whatever value you give the control bits of the ALU, nothing will happen. In other words, 524,288 codes that simply do nothing.
The very first idea was that my assembler would only support the basic single opcodes, and that macros would have to be created for operations needing several opcodes. For example, to mimic the 6502 instruction "LDA $1234", first the temporary address had to be set, which by itself already needed two opcodes, plus one more for reading the data byte. But I changed my mind when I saw what was needed to execute a conditional jump. For the moment I draw the line at branches: a branch means that, knowing the offset, the real address has to be calculated first, and IMHO that takes too many instructions. This is a simple CPU, so let us keep the instructions as simple as possible as well.
The second idea was to use the instructions of the 6502 as a base, but having no internal registers simply means that an instruction like LDX makes no sense. OK, there is nothing against dedicating an address in external RAM as register X, but IMHO that idea crossed the "simple" line. On the other hand, I will support the three registers this CPU does have: A, B and R.
The Z80 is not left out either. The Z80 supports conditional jumps and so does this TTL CPU; the same goes for the HALT instruction. Conditional branches, OTOH, are not supported.

When writing the assembler, I made an interesting discovery. There is an instruction named AND. The only thing it does on this processor is AND the contents of ALU register A and ALU register B; the two 181 ICs take care of this. But once the actual function has been performed, what should I do with the result? Maybe I'm only interested in a by-product like the Zero flag and not in the actual result. In other words, do I want to store the result in the ALU output register or not? The interesting point: the codes for these two instructions differ in only one bit. So it seems that I got even more opcodes than I had in mind. The only problem is giving them a nice, understandable name. My idea: append an "R" (for "Register") to the mnemonic of the variant that stores the result, for example ANDR in the case of AND.
This discovery led to the thought that, instead of looking for what I want, I have to search systematically for the possible opcodes. I can already tell you beforehand that this led to some surprising finds.
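A tiny Python sketch of the "R" variant idea follows. It assumes the single bit that separates AND from ANDR is I5, the line that clocks the ALU result into its output latch. The text does not name the bit, and the control word in the example is a placeholder, not the real encoding of AND.

```python
I5 = 1 << 5  # control line I5: clocks the ALU output latch (C7) when (L)


def r_variant(word: int) -> int:
    """Return the 'R' variant of an ALU opcode: same function, result stored.

    Assumption: the single differing bit is I5, which is active (L).
    """
    return word & ~I5


def variants(mnemonic: str, word: int) -> dict[str, int]:
    """Both versions of an ALU operation, e.g. AND and ANDR."""
    return {mnemonic: word | I5, mnemonic + "R": r_variant(word)}


# 0xFFFFFF is a placeholder control word, not the real encoding of AND.
pair = variants("AND", 0xFFFFFF)
print({name: f"${code:06X}" for name, code in pair.items()})
```

Because the two codes differ only in I5, an assembler could generate every "R" mnemonic mechanically from its base mnemonic.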

A good candidate to start with is the ALU because it needs 5 control bits. 5 bits mean 32 functions, but IMHO only 11 of them are usable. I personally have no idea what to do with a function like "A and (not B)". I'm quite sure that occasionally there will be a need for this function, but to reserve an opcode for it? Anyway, here is the result:
A - output the data from input A
ADC - add buffer B to buffer A with Carry
ADD - add buffer B to buffer A
AND - and buffer A with buffer B
B - output the data from input B
CLR - load the ALU output register with a zero
CMP - subtract buffer B from buffer A
DEA - decrement buffer A
EOR - ExOR buffer A with buffer B
XOR - eXOR buffer A with buffer B (same as EOR)
INC - increment buffer A
NOTA - not/inverted buffer A
NOTB - not/inverted buffer B
OR - or buffer A with buffer B
SHL - shift buffer A one bit to the left
SBC - subtract buffer B from buffer A with Carry
SUB - subtract buffer B from buffer A
Remark 1: EOR and XOR are the same function, ADC and ADD use the same function, and so do SBC, SUB and CMP. Therefore only 13 functions instead of 17.
Remark 2: Not all functions end up in their own opcode.
Remark 3: many opcodes have an "R" variant, i.e. the result is stored in the ALU output register. With CMP, for example, we are only interested in the flags and therefore have no need to store the result. In fact, CMP and SUB are the same function, but in this case SUBR does make sense.

IC04, a 138 3-to-8 demultiplexer, needs three control lines to generate seven signals for clocking data into various latches. Related opcodes:

C0: Clocking data into the LB Temporary Address latch, IC19:
SAD - set address of temporary address register with immediate data
SADL - set address of temporary address register Low with immediate data
STAL - store buffer A into temporary address Low
STBL - store buffer B into temporary address Low
STRL - store buffer R into temporary address Low
C1: Clocking data into the HB Temporary Address latch, IC20:
SADH - set address of temporary address register High with immediate data
STAH - store buffer A into temporary address High
STBH - store buffer B into temporary address High
STRH - store buffer R into temporary address High
C2: Clocking data into ALU register A:
LDA - load ALU buffer A with data, immediate or from address
LDAT - load ALU buffer A with data pointed to by Temp Address Register
TBA - copy buffer B into buffer A
TRA - copy the output of the ALU into buffer A
C3: Writing data from the CPU to the system, IC16:
SIA - store buffer A into 4th byte
SIB - store buffer B into 4th byte
SIR - store buffer R into 4th byte
STA - store buffer A into address
STB - store buffer B into address
STR - store result = output of the ALU into an address
C4: Clocking data into ALU register B:
LDB - load ALU buffer B with data, immediate or from address
LDBT - load ALU buffer B with data pointed to by Temp Address Register
TAB - copy buffer A into buffer B
TRB - copy the output of the ALU into buffer B
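The '138 turns its three control lines into exactly one active-low output at a time. A minimal Python model follows; the enable inputs G1, /G2A and /G2B are those of the standard 74138, but which 3-bit code the CPU uses for C0..C4, and what the remaining outputs do, is an assumption here (output Yn is simply treated as Cn).

```python
def decode_138(select, g1=1, g2a=0, g2b=0):
    """Model of a 74138: return outputs Y0..Y7, active low (0 = selected)."""
    outputs = [1] * 8
    if g1 == 1 and g2a == 0 and g2b == 0:  # chip enabled
        outputs[select & 0b111] = 0
    return outputs


# Hypothetical mapping of the clock lines listed above (Yn treated as Cn).
CLOCK_LINES = {
    0: "C0: temp address low latch (IC10)",
    1: "C1: temp address high latch (IC20)",
    2: "C2: ALU register A",
    3: "C3: write to the system (IC16)",
    4: "C4: ALU register B",
}


def clocked_target(select):
    """Name of the destination whose line goes low, or None if unused."""
    active = decode_138(select).index(0)
    return CLOCK_LINES.get(active)
```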

CCF - Copy Carry flag from ALU into Flag register
CLC - Clear Carry flag in D-flipflop
HLT - HaLT the CPU
INA - INput from I/O port x into register A, address set by Temp Address Register
INB - INput from I/O port x into register B, address set by Temp Address Register
JCC - Jump address if Carry Clear (181 Carry)
JCS - Jump address if Carry Set (181 Carry)
JEQ - Jump address if zero / EQual
JFC - Jump address if carry Clear (74 Flipflop carry)
JFS - Jump address if carry Set (74 Flipflop carry)
JMI - Jump address if MInus
JMP - JuMP address
JNE - Jump address if not zero / Not Equal
JPL - Jump address if PLus
NOP - No OPeration
OUT - OUTput to I/O port x using byte #4 as input
OUTA - OUTput to I/O port x from register A, address set by Temp Address Register
OUTB - OUTput to I/O port x from register B, address set by Temp Address Register
SEC - SEt Carry flag in D-flipflop

Note 1: The SIx commands, which store data into the 4th byte, only work when a program is running from RAM.
Note 2: SADL #$zz + SADH #$yy + LDxT is the same as LDx $yyzz

It may look as if some instructions are superfluous. STA could be achieved with the commands TAR and STR, but that would mean two instructions instead of just one. OTOH, SAD will be converted into two instructions by the assembler. That looks like it goes against the idea of "keep it simple". But have a look at this code:

	sadl	#Label

Now compare that with:

	sad	#Label

This looks more elegant than the above.
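A sketch of what such an assembler expansion could look like, in Python: the pseudo-instruction SAD is split into SADL (low byte) and SADH (high byte). Note 2 states that SADL + SADH followed by a load through the Temporary Address Register equals a direct 16-bit load; that load is taken here to be LDAT/LDBT. The function names and the exact output text are hypothetical; this is not the author's assembler.

```python
def expand_sad(value):
    """Split the pseudo-instruction 'sad #value' into two real instructions."""
    low, high = value & 0xFF, (value >> 8) & 0xFF
    return [f"sadl\t#${low:02X}", f"sadh\t#${high:02X}"]


def expand_load(register, address):
    """Write 'ldX $yyzz' out as SADL #$zz + SADH #$yy + LDXT."""
    return expand_sad(address) + [f"ld{register}t"]
```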

In case you didn't notice, one well-known opcode is missing from this list: JSR/CALL (execute a subroutine). To be honest, it was in the above list and was going to be implemented, but when I started to write my assembler, I noticed that I had no means to read the content of the Program Counter and therefore had no idea what to push on the Stack. A bit of a bummer. Bad design? Nah, the very first computers didn't have subroutines either. It just means we have to be more creative with loops in the program.
For obvious reasons RTS, PHA/PLA and equivalent opcodes are missing as well. Theoretically I could implement the PUSH and PULL opcodes, but they involve so many memory movements that any possible gain is completely lost in the number of cycles this little CPU needs to execute them. But I have changed my mind before.....

A weird combination of opcodes

When the older Commodore computers with the IEEE bus have to send a byte over IEEE, they first invert this byte in software. At the receiver's side, it is inverted again by software. To be honest, I never understood, and still don't understand, the reason for it.
This TTL CPU could do the same like this:

	lda	#$xx
	notar
	str	IEEE

We have the instruction LDRA, which means: load register R with data that goes through register A and the ALU. In this case function "A" is used. But what if we use the function "NOTA" instead, i.e. create the instruction LDRNA? That would save us one instruction, i.e. four bytes, and the corresponding time.
But, and that is my personal opinion, it is not a logical instruction in line with the other instructions. A bit vague, I know. But if you really need it, create a macro or persuade me to change my mind :)

Creative programming

Using this barrel organ design can have a disadvantage: as said above, the fourth byte of an opcode is not always used. This is certainly the case with implied instructions, instructions that don't have an operand, like AND, ADD, CCF and CLR. Here the fourth byte is not used at all, but due to the four-byte cycle it is stored in the program anyway.
Can it be used in another way then? Yes, it can. I see no reason why the CPU could not read this individual byte, or write to it when it is in RAM.

How should an assembler deal with it? The assembler will output 4-byte instructions only. For the moment I only see one option, shown in the next example:

	lda	$2000
DATA:
	and
	.
	.
	.
	lda	DATA+3

Explanation: register A is loaded with the 4th byte of the instruction at label DATA. Bytes 1, 2 and 3 are the 24 bits of the actual instruction, byte 4 is the byte of data.
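The layout can be illustrated with a small Python model of a program image in which every instruction occupies four bytes. The opcode bytes below are invented placeholders, not the real encoding of this CPU; the point is only that the label DATA marks the first byte of the implied AND, so DATA+3 addresses its otherwise unused fourth byte. The program is assumed to start at $0000 so that image offsets equal addresses.

```python
INSTRUCTION_SIZE = 4  # every instruction occupies four bytes


def assemble(program):
    """Build a byte image from (label, opcode_bytes, fourth_byte) tuples."""
    image, labels = bytearray(), {}
    for label, opcode, fourth in program:
        if len(opcode) != INSTRUCTION_SIZE - 1:
            raise ValueError("bytes 1..3 must hold exactly three bytes")
        if label:
            labels[label] = len(image)  # address = offset (program at $0000)
        image += bytes(opcode) + bytes([fourth])
    return image, labels


# Placeholder encodings, NOT the real opcodes of this CPU.
program = [
    (None, (0x11, 0x22, 0x33), 0x00),    # lda $2000
    ("DATA", (0x44, 0x55, 0x66), 0x42),  # and  (implied, byte 4 holds $42)
]
image, labels = assemble(program)
value = image[labels["DATA"] + 3]  # the byte that 'lda DATA+3' would read
```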

Testing version 2

Even when I was only thinking about this project, I was sure that just plugging the TTL-CPU into a host system was not the way to test it. So I turned to a solution I had already been using for over 20 years: I have some ISA cards with four 8255s lying around. The total of 96 I/O pins of these four 8255s can be accessed through two 50-pin headers. So I made a little PCB that enabled me to connect the TTL-CPU to this ISA board using a SCSI cable.

But most people won't have this means of testing the TTL-CPU, and for this and various other reasons I started to look for another way, one that is accessible to almost everyone and relatively cheap. What I found was the Arduino Mega2560. It has, IMHO, enough I/O ports to test the TTL-CPU. But there is one problem: I'm not familiar with it. I can use it in two ways (I think):
- It runs a program that does all the testing and the connected PC serves only as a terminal.
- It acts, more or less, as a slave for the connected PC and only sets or reads ports on request of the PC. So the PC runs the actual test software.

The advantage of the first method: faster than the second one. The disadvantage: I really need to know thoroughly how to program the Arduino. (yes, I'm lazy)
The advantage of the second method: I already have an INO for the Arduino so it can act as a slave and I know how to program the PC. The disadvantage: probably slower than the first method. We'll see.
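A Python sketch of the second method seen from the PC side. The command format ("W port value" to set a port, "R port" to read one) is entirely made up for illustration; whatever protocol the actual INO uses will differ. A fake Arduino stands in for the serial link so the sketch runs without hardware; with a real board, the `transact` callable would write one line to the serial port and return the reply.

```python
class FakeArduino:
    """Stand-in for the Mega2560 slave: keeps port values in a dict."""

    def __init__(self):
        self.ports = {}

    def transact(self, line):
        cmd, *args = line.split()
        if cmd == "W":  # W <port> <hex value>: set an output port
            self.ports[args[0]] = int(args[1], 16)
            return "OK"
        if cmd == "R":  # R <port>: read an input port
            return f"{self.ports.get(args[0], 0):02X}"
        return "ERR"


class TtlCpuTester:
    """PC-side helper that only asks the slave to set or read ports."""

    def __init__(self, transact):
        self.transact = transact

    def write_port(self, port, value):
        if self.transact(f"W {port} {value:02X}") != "OK":
            raise IOError(f"write to port {port} failed")

    def read_port(self, port):
        return int(self.transact(f"R {port}"), 16)


board = FakeArduino()
tester = TtlCpuTester(board.transact)
tester.write_port("A", 0x55)  # e.g. drive the CPU's data bus
value = tester.read_port("A")
```

All test logic lives on the PC; the slave stays trivial, which matches the trade-off described above: simpler to program, but every port access costs a round trip over the serial link.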

Version 3

In the beginning I thought about using this TTL CPU as a replacement for other CPUs, like the 6502. But after giving it some thought, I decided to make a stand-alone computer out of it by adding RAM, ROM and I/O. The first idea was to add 32 KB of EPROM, 32 KB of RAM and some I/O. But then I realized that I could use the trick of separate I/O, as used by the Z80 and 80x86, here as well. This meant I could add I/O without having to squeeze it somewhere into the ranges meant for the RAM and ROM. And when adding I/O, why not make use of an MMU? That could solve the need for extra program ROM.
Having I/O, ROM and RAM, why would I need a connector? So I dropped it. Bad mistake.

IC18A is responsible for selecting the LCD (Y0), 6522 (Y1), MMU (Y2) and RAM/ROM (Y3). Notice that when both control lines I1 and I2 are (H), RAM or ROM is selected. That is a must, because during the first three cycles both I1 and I2 are (H) anyway. Whether ROM or RAM is selected depends on the value of address line A15; IC18B takes care of this final selection.

We have a fixed window of 32 KB of ROM. The MMU, a simple resettable 8-bit latch, enables us to have four pages of 32 KB of ROM.

We also have 32 KB of RAM but the first 4 KB are common RAM. Alongside these 4 KB we also have four pages of 28 KB of RAM.
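The version 3 map can be written as an address translation. In this Python sketch the use of the latch bits is an assumption (bits 0-1 select one of the four ROM pages, bits 2-3 one of the four RAM pages), as is the order in which the banked pages are stored in the RAM chip. Only the sizes come from the text: a 32 KB ROM window at $0000, and at $8000 the 4 KB of common RAM followed by a 28 KB banked window.

```python
ROM_PAGE = 0x8000  # 32 KB ROM window at $0000
COMMON = 0x1000    # 4 KB common RAM at $8000
BANK = 0x7000      # 28 KB banked RAM at $9000


def translate_v3(address, mmu):
    """Map a 16-bit CPU address to (chip, physical address).

    Hypothetical latch layout: bits 0-1 = ROM page, bits 2-3 = RAM page.
    """
    rom_page = mmu & 0b11
    ram_page = (mmu >> 2) & 0b11
    if address < 0x8000:
        return "ROM", rom_page * ROM_PAGE + address
    if address < 0x8000 + COMMON:
        return "RAM", address - 0x8000  # always the same 4 KB
    # Banked pages are stored one after another behind the common block.
    return "RAM", COMMON + ram_page * BANK + (address - 0x9000)
```

Four pages of 28 KB plus the common 4 KB use 116 KB of the RAM chip, so the layout fits comfortably.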

OK, this is better than 32 + 32 but I still had the feeling that it could be done even better. Wanting to test this version, I found out that I had no means to really debug it. Therefore version 4.

Version 4

The first thing I did was to add the connector again. The next step was to replace three ICs, IC18, IC37 and IC41, with just one GAL to decrease the size of the board. When starting on the PLD equations for the GAL, I noticed quite a flaw in the original design. This led from one thing to another, and I ended up with an improved design that still used the same ICs, thanks to the GAL.
Remark: For those who think that I'm cheating now by using a GAL: I don't consider the GAL part of the actual processor. Period.
FYI: the part right of the white DIN connector is the CPU. The part left of the connector is the I/O, RAM, ROM and MMU.


Not being sure about the timing of the whole thing, I added I3 (= R/W) and PHI0. The GAL enables the ROM, RAM or I/O as long as the CPU performs a read. During a write, the end of PHI0 tells the involved party to store whatever data is on the data bus.

To be able to run stand-alone at all, I added some extra hardware:
- a 1 MHz oscillator.
- the same circuit as I use for my 6502 debugger, switch S2 and the ICs IC30 and IC38, so I can single-step this TTL-CPU.
- an LED and a resistor to see if the computer was halted.
- my own reset circuit: switch S2, R6 and C26.
- And last: you can feed the board with 5V either using one of the 4-pin plugs of an AT power supply or a round jack plug (not added yet).

The RAM and ROM

As said before, the original idea was to use 32 KB of each, but I chose 128 KB of RAM and ROM because I had an MMU in mind. But the GAL also changed that; an explanation will follow in the MMU part.

The I/O

I only have three pieces of I/O:
- a 6522 VIA
- an LCD screen
- an MMU (Memory Management Unit)

The control lines I1 and I2 take care of selecting everything:

  I1  I2
  0   0    LCD screen
  0   1    MMU (Memory Management Unit)
  1   0    6522
  1   1    RAM/ROM
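A Python sketch of how the GAL could combine this table with I3 (R/W) and PHI0 as described for version 4: a read simply enables the selected device, while a write only asserts a write strobe during PHI0, so the device stores the data when PHI0 ends. The signals are modelled active high and the function name is hypothetical; the real GAL equations, which also use address lines, may differ.

```python
DEVICES = {
    (0, 0): "LCD screen",
    (0, 1): "MMU",
    (1, 0): "6522",
    (1, 1): "RAM/ROM",
}


def gal_outputs(i1, i2, read, phi0):
    """Return (selected device, output_enable, write_strobe), active high."""
    device = DEVICES[(i1, i2)]
    output_enable = bool(read)                # a read enables the outputs
    write_strobe = (not read) and bool(phi0)  # data is latched when PHI0 falls
    return device, output_enable, write_strobe
```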

6522
The 6522 provides 20 I/O pins and some counters. The I/O pins end up on a 24-pin header and can be used for hooking up a keyboard, connecting it to an LPT port of a PC, an Arduino, or whatever.

LCD screen
I wanted some visual output, and blinking LEDs are a bit too common. So, having some lying around anyway, I chose to use an LCD screen. Why not?

MMU
MMU, a fancy word for something quite simple in this case. Needing much, much more ROM than a normal CPU like the 6502 or Z80, version 1 had a 74273 on board to be used as address lines 16..23. Unfortunately, I found out that that could not work at all: for jumping to an address outside the original 64 KB range, all 24 bits have to be changed at the very same moment. And this CPU cannot handle that.
That left me in fact with an 8-bit I/O port (but only capable of output) which reminded me of the onboard I/O port of the 6510 CPU of the C64. And that I/O port is the base of the C64's memory management. I combined this with the same trick that CP/M 3 uses to address more than 64 KB of RAM: a part of RAM can be swapped with another part of RAM and a common part is always present. See later.

The base of the MMU is a 74LS273, an 8-bit resettable latch. After a reset it outputs all zeros and in the original configuration the CPU saw 32 KB of ROM starting from $0000 and 32 KB of RAM starting from $8000. This RAM is taken from the first 64 KB of RAM of the 681000.

How the I/O changed the instructions

Originally I only had the IN and OUT instructions for the I/O. But the use of I1 and I2 changed things. The 6522, LCD and MMU don't have their own address, so how am I going to address them using only I1 and I2? The first idea was using the instruction "OUT 2,1,$55" to write data to port A of the 6522. But that meant I had to change my multi-processor assembler, because it doesn't support multiple commas; that's not needed for the 6502, 6800, Z80 etc. Adding an address to the instruction means that I have to load the Temporary Address Registers with separate instructions anyway, so I can drop the address from the instruction. An immediate data byte, byte #4, is possible. So the above instruction ends up as:

	sadl	#1
	sadh	#0
	out	2, #$55

Note: it can only be immediate data, so the '#' should not be needed, but I prefer it to emphasize that it is immediate data. And I might change my mind later and allow an address here.

It is also possible that the user wants to write the content of register B to port A of the VIA:

	sadl	#1
	sadh	#0
	outb	2

Seeing the above, the idea arose to allow:

	outb	2,1

because only one comma is needed. But I will have to think about that.

How the GAL changed things

The GAL changed a lot: now I have at least four different ways of handling the ROM and RAM in mind, and the only difference between them is the way the GAL is programmed.

Original idea (more or less)
- 4 KB of common RAM ($F000 - $FFFF).
- 8 KB of ROM ($0000 - $1FFF), and I can choose from 16 blocks of 8 KB each using the 273.
- The ROM and the following 8 KB of RAM can be swapped with their counterpart of 16 KB of RAM at $0000 in the second half of the 681000.
- From $4000 on, the other three 16 KB blocks of RAM can be swapped with their counterparts in the second half of the 681000. Exception: the 4 KB of common RAM stays.
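Below is a Python sketch of this "original idea" as an address translation. The sizes and addresses come from the list above; the use of the '273 bits (low nibble = ROM block, one bit of the high nibble per 16 KB region as a swap flag) is purely an assumption, as is placing ROM block n at n × 8 KB inside the ROM chip.

```python
def translate_original(address, mmu):
    """Map a 16-bit CPU address to (chip, physical address in a 128 KB chip).

    Hypothetical latch layout: bits 0-3 = ROM block (16 x 8 KB),
    bit 4+n = region n ($0000/$4000/$8000/$C000) swapped to the second half.
    """
    if address >= 0xF000:
        return "RAM", address  # 4 KB common RAM, never swapped
    region = address >> 14  # which 16 KB region
    if mmu & (1 << (4 + region)):
        return "RAM", 0x10000 + address  # counterpart in the second half
    if address < 0x2000:
        rom_block = mmu & 0x0F
        return "ROM", rom_block * 0x2000 + address
    return "RAM", address  # first 64 KB of the RAM chip
```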

Swapping 32 KB of RAM
The idea is that the upper 32 KB of RAM can be swapped with one of the two 32 KB blocks of the second half of the 681000.

Swapping 16 KB of RAM
The idea is that the upper 16 KB of RAM can be swapped with one of the four 16 KB blocks of the second half of the 681000.

Swapping 8 KB of RAM
The idea is that the upper 8 KB of RAM can be swapped with one of the eight 8 KB blocks of the second half of the 681000. A variation on this idea: use the 8 KB of RAM just above the 8 KB of ROM.
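The three swapping variants differ only in the window size (and, for the 8 KB variation, the window position), which is exactly what reprogramming the GAL would change. A Python sketch of the RAM side, assuming the window sits at the top of the 64 KB address space unless told otherwise; the text does not say whether the 4 KB of common RAM is kept in these variants, so it is ignored here, and so is the ROM.

```python
def translate_swap(address, bank, size, window_base=None):
    """Map a 16-bit CPU address to a physical address in the 128 KB RAM.

    size:        window size, 0x8000, 0x4000 or 0x2000 (32, 16 or 8 KB).
    bank:        block of the second half of the RAM chip to show.
    window_base: start of the window; default is the top of the 64 KB space.
    """
    banks = 0x10000 // size
    if not 0 <= bank < banks:
        raise ValueError(f"bank must be 0..{banks - 1} for this window size")
    if window_base is None:
        window_base = 0x10000 - size
    if window_base <= address < window_base + size:
        # Inside the window: use one block of the second half (from $10000).
        return 0x10000 + bank * size + (address - window_base)
    return address  # outside the window: first half of the RAM chip
```

The 8 KB variation that uses the RAM just above the ROM corresponds to `window_base=0x2000`.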

Adding more I/O, RAM and ROM

Is it possible to add more I/O, RAM and ROM? From a hardware point of view, everything is possible, without question. From a software point of view, it becomes more difficult.

The first idea: why should I use only I1 and I2 to handle the I/O? The idea arose to use only I2 to mark it as I/O and to use the address lines A15 and A14 to tell which piece of I/O we want to address. This idea expanded into using I1, I2 and the four address lines A12..A15 to select whatever we want. And, very important: without changing one bit of hardware.
I1 and I2 give us four pages of 64 KB. And with an update in mind, adding line I0, we will have eight pages in the end. But keep this in mind: only the last page can be used for running a program. When the three opcode bytes are loaded, I0..I3 are always (H). So, when numbering the pages from zero to seven, only page seven can be used for running a program. That leaves me free to use page six for the I/O. Unfortunately line I0 is not used in version 4, so we must use page four. But this needs a complete rearrangement of the way the I/O is accessed:

  $00000  \
    ...    > free for anything
  $3FFFF  /
  $4000x  LCD screen
  $4100x  MMU (Memory Management Unit)
  $4200x  6522
  $43xxx  \
    ...    > free for anything
  $4Fxxx  /
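A Python sketch of how a 20-bit address in this map splits up: the top nibble is the page (set by the I lines), A15..A12 pick the device, and the remaining 12 bits go to the device's registers. The mirroring rule follows the version 4 situation described in the text: without the I0 wire, pages that differ only in bit 0 look the same. All names are hypothetical.

```python
PROGRAM_PAGE = 7  # opcode fetches always have all I lines high
IO_PAGE = 4       # works with and without the I0 wire

IO_DEVICES = {0x0: "LCD screen", 0x1: "MMU", 0x2: "6522"}  # value of A15..A12


def same_page(a, b, i0_wired):
    """Without the I0 wire, pages differing only in bit 0 are mirrors."""
    return a == b if i0_wired else (a >> 1) == (b >> 1)


def decode(address, i0_wired=False):
    """Return (page, device, offset) for a 20-bit address."""
    page = (address >> 16) & 0x7
    select = (address >> 12) & 0xF  # A15..A12
    if same_page(page, IO_PAGE, i0_wired):
        return page, IO_DEVICES.get(select, "free"), address & 0x0FFF
    if same_page(page, PROGRAM_PAGE, i0_wired):
        return page, "RAM/ROM", address & 0xFFFF
    return page, "free", address & 0xFFFF
```

For example, `decode(0x42001)` points at register 1 of the 6522, which is port A, the register used in the examples that follow.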

I deliberately used page 4 so I can use this idea with version 4 without the need to solder the extra wire. When using version 4, page 6 will be a mirror of page 7 and page 5 will be a mirror of page 4.
So, to write an immediate value to port A of the 6522, the code will look like this:

	sadl	#1
	sadh	#$20
	out	4,#$55

Writing the content of register B to port A of the VIA:

	sadl	#1
	sadh	#$20
	outb	4

These can be abbreviated to:

	out	$42001,#$55
	outb	$42001

The latter means we need to use a 20-bit address. That's no problem for my assembler.

There cannot be an "IN" equivalent of "OUT" (an immediate byte can only be sent, not received), but the "INx" equivalents of "OUTx" will look like this:

	sadl	#1
	sadh	#$20
	inb	4

and

	inb	$42001
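The abbreviated forms can be expanded mechanically: the low 16 bits go into SADL/SADH and the top nibble becomes the page operand. A Python sketch of that expansion, producing the long forms shown above; the function is hypothetical and not taken from the author's assembler, and the output spelling (tabs, two-digit hex) is a guess.

```python
def expand_io(mnemonic, address, immediate=None):
    """Expand an abbreviated I/O instruction into SADL/SADH plus the page form.

    mnemonic:  'out', 'outa', 'outb', 'ina' or 'inb'
    address:   20-bit address, e.g. 0x42001 (page 4, 6522 register 1)
    immediate: the data byte for plain OUT (byte #4), otherwise None
    """
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("address must fit in 20 bits")
    if (mnemonic == "out") != (immediate is not None):
        raise ValueError("only OUT takes an immediate byte")
    page = address >> 16
    low, high = address & 0xFF, (address >> 8) & 0xFF
    lines = [f"sadl\t#${low:02X}", f"sadh\t#${high:02X}"]
    operand = str(page) if immediate is None else f"{page},#${immediate:02X}"
    lines.append(f"{mnemonic}\t{operand}")
    return lines
```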

Having expanded our range many times over, almost any type and amount of hardware can be attached to the processor port. Just make sure to place it in the correct address range.

Version 5

So far line I0 is not used. Luckily I had one input pin free: pin 14. Using it means I have 8 pages at my disposal. Updating the schematics and the board took only minutes. For the finished hardware of version 4, it will mean adding one wire from the processor port connector to the GAL.

Do you have questions or comments? Do you want more information?
You can email me here.