My TTL CPU
Last update: 2024-07-11b
Being revised right now !!!
What is it?
This is a 32-bit processor made of only TTL ICs. I'm busy building three different TTL CPUs and this is the smallest one. The others:
- Mini6502, the middle one
- Big6502, the biggest one
All three projects are still under construction.....
Some history: the first processors
If you are Dutch, you should be familiar with a "Draaiorgel" / "Barrel organ". The music of the bigger barrel organs is directed by its Draaiorgelboek / Book music. The holes in the paper of the rolls make the various instruments produce their sounds. There are barrel organs having more than 100 instruments on board.
But what do barrel organs have to do with these first processors? The various registers, counters, adders and whatever else you find inside a processor can be considered as its instruments and the program as its book music.
The fact is, the first computers operated in this way: the bits of every byte of a program directly manipulated the various registers, counters etc. The more registers, counters or whatever such a computer had, the more bits a byte had. Zuse's Z3, the first operational Turing-complete computer in the world, was a 22 bitter. Its successor, the Z4, was 32 bits.
Remark: nowadays we are used to the fact that bytes are only 8 bits. But in the early computer days a byte could hold any number of bits. Due to the overwhelming presence of 8-bitters from the 1970s on, byte became a synonym for 8 bits. From now on, if I use the word "byte" here, I mean 8 bits as well.
But there is one big difference between the organ and this type of processor: each line of the book music only contains code that plays the instruments whereas each line of code for the processor that "plays" the registers etc., also includes a number of bits of data. And the designer of the computer decided how many data bits would be used.
But in time the computer guys got into trouble. Once one computer was built, people already wanted a better one. In general more data bits, more counters, more registers, etc., and so the bus became wider and wider. A bad side effect of improving a computer was that practically any change made it incompatible with all earlier models: already developed software could not be used on the changed models. In other words, newer models were not backward compatible. The Z4 mentioned above was an improved version of the Z3 and also ran into this problem: software developed for the Z3 had to be rewritten so it would run on the Z4.
Then one day some guys invented the Instruction Decoder and a lot of problems were solved. For example, it is the Instruction Decoder that enables an Intel Pentium (80586) to run code originally written for an Intel 8088. But that is not discussed here, see the Mini6502.
My idea
For quite some time I wanted to build my own processor but so far the designs were too complex, mainly due to the Instruction Decoder. I had thought about building such a "Barrel organ processor" as well but I didn't like the idea of having an n-bit wide data bus because it meant that I needed a lot of parallel ROMs and RAMs to be able to run any program.
Then I had a brain wave: instead of reading n bits in parallel, I would read them bytewise (= 8 bits at a time) from memory, store each byte in a latch and, once all bytes had been loaded, I would activate the latches and all needed functions would be performed at that moment. The big advantage: just one ROM or RAM needed for (in this case) a 24-bit instruction code.
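The bytewise loading can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual hardware: the names `rom` and `fetch_instruction` are made up here, and the order in which the three bytes map onto the 24 control bits is an assumption.

```python
def fetch_instruction(rom, pc):
    """Read three consecutive bytes and combine them into one 24-bit control word."""
    # One byte per memory read, each stored in its own (hypothetical) latch.
    latches = [rom[pc + i] & 0xFF for i in range(3)]
    # Assumption: byte 1 carries the lowest eight control bits, byte 3 the highest.
    return latches[0] | (latches[1] << 8) | (latches[2] << 16)

# Example: three bytes from a made-up ROM image form one instruction.
rom = [0xEF, 0xFF, 0xFE, 0x00]
word = fetch_instruction(rom, 0)
```

Only after the third byte has been latched is the whole 24-bit word available, which is why a single 8-bit wide ROM is enough.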
But now the weird fact: how many bits is this processor? The actual data it can handle is only 8 bits. But the size of its opcode is 24 bits. Then I considered this: if I had designed it like the Z4, I would have needed a byte of 32 bits (24 control bits plus 8 data bits). So, a 32-bitter it is.
What I said above about changing/improving this type of processor is also valid for this one. Once I have built it, I have to stick with it.
What processor will it emulate?
The answer is simple: none. To be more precise: it is a new processor with its own opcode set. By the way, do you know a processor only using opcodes of 24 bits? I do not. Again: this has to do with the strong relation between the bytes to be loaded and the hardware of the TTL-CPU. In this case I have to load 24 bits to tell the TTL-CPU what to do where a lot of processors, like the 6502, could do with just eight bits. And even then it will probably do only a part of what the 6502 can do. So, in short again: this will be a brand new, one of a kind, processor in this world.
The versions
At this moment I am at version 4, which is the production version. Version 1, in fact two versions: one with and one without on-board RAM, contained errors. Version 2 is an improved version 1 without RAM. Version 3 is version 2 plus some add-ons: RAM, ROM and I/O. In version 4 a GAL replaces some glue logic for this RAM, ROM and I/O and the external connector of version 1 and 2, removed in version 3, is added again. I decided to describe only version 4 as the older versions are more or less the same and the text needed to describe them would only make this page unnecessarily large.
Picture of the board of version 1
Picture of the board of version 4
The part left of the white connector is the RAM, ROM and I/O, the part right of it is the TTL-CPU. There is hardly any difference to be seen between the original CPU and the right part of version 4. The only real visible one is that the 74LS273 at the top-right part of the first picture has become the MMU of version 4 and can be found now above the SST EEPROM.
The schematics and board of version 4
The explanation of the schematic
The processor port: the interface to the outside world
The idea is to use a female 64-pin AC DIN connector as port to the outside world, one that is used for all my TTL-CPU boards. The original idea for version 1 was that a card with a male DIN connector is attached to the TTL-CPU and this card would contain at least a connector to connect the board to the target system. But then I decided to make a stand-alone system of it, to add RAM, ROM and I/O and to drop this connector (version 3). A very bad idea, I found out, because I had no means to debug the whole system, so I added it again in version 4.
An explanation of the various pins:
- A0..23: 24 address lines but only 16 are used in this case.
- D0..D7: the eight data lines
- AEC is meant for tri-stating the various busses. Not used here.
- NMI, IRQ: Not used here. And to be honest, I also had no idea (yet) how to implement them in this design.
- RDY, RESET will be used, see later.
- SO, a typical 6502 signal, won't be used here either.
- SYNC is also a 6502 signal and will be used here to mark the actual execution of an instruction.
- HALT, M1: the well-known Z80 signals, not used here.
- PHI0, PHI1 and PHI2: the well-known 6502 clock signals and used here as well.
- IORD, IOWR, MEMR, MEMW: those who are familiar with 80x86 systems will recognize the names of these lines; they are used to control memory and I/O operations. But forget about these names, only the names I0..I3 are used and of those signals only I1..I3 are used.
PHI0, PHI1 and PHI2
The three clock signals are needed and I gave them the same names as the clock signals of the 6502. PHI1 is generated by inverting PHI0 using IC35A and PHI2 is generated by inverting PHI1 using IC24B.
Reset
The Reset signal, active (L), is used for three tasks:
- resetting the Program Counter (four 161 4-bits counters)
- resetting the Memory Management Unit (MMU), IC40, a 74LS273.
- resetting a 74LS393 counter, IC25A.
The last is done using an AND gate, IC23D, and an inverter, IC24D. The inverter creates the needed active (H) CLR signal for the 393.
SYNC
SYNC is a 6502 signal that tells the outside world that the first cycle of an opcode is being processed. I use it to tell the outside world that the instruction is executed at that moment. And to be honest, I didn't need to create it, it just happened to be there (more or less).
ReaDY signal
RDY is a signal to tell the 6502, and in this case the TTL-CPU, to halt all activities as long as RDY is (L). The basic idea is that an active RDY prevents PHI2 from reaching the 393 counter and Program Counter and so will stop our TTL-CPU doing anything.
RDY is fed to the D-input of IC36B, a 74LS74 D-flipflop. This D-flipflop represents the state of RDY at the end of PHI0 and does this by saving the state of RDY at the rising edge of PHI1 = falling edge of PHI0. If RDY is (L), output /Q becomes (H). /Q is fed to OR gate IC34D (through an AND gate, IC23A, see later) together with PHI2. Its output is fed to the clock input of IC25A, a 393 4-bit counter. The outputs of the 393 are increased at every falling edge of PHI2. The moment the second input of the OR gate becomes (H) because of the flipflop, the OR gate ceases to send the pulses of PHI2 to the 393 counter and the processor stops.
As you can see, the CLR input of the D-flipflop has been connected to control line I16. The moment I16 is negated, output /Q is pulled (H) and this will stop the 393. See it as an equivalent of the 80x86 instruction HLT (= HaLT) or even better, the 65816 instruction STP (= SToP).
FYI: I only added this feature for the simple reason that I16 was left over. But contrary to an 80x86, which can be woken up again by an interrupt, this processor cannot, for the simple reason that this CPU doesn't have any means except a reset to release the flipflop again. In this case halt is really HALT.
Remark: why didn't I connect I16 to the HALT pin of the connector? Reason: I simply forgot.
After I finished the first schematic, I got another idea. When connecting the board to my Debugger, in single step mode my CPU will stop at every odd step. So the question arose whether I could stop it only during the execution step, thus skipping the first six steps. IMHO it only needed one AND gate (IC23A) and that just happened to be left over. It is placed between the RDY flipflop and IC34D, the OR gate. The second input is connected to SYNC.
The function of the AND gate is simple: it blocks the /Q signal coming from the flipflop towards OR gate IC34D during all steps except steps 0110 and 0111, when SYNC is (H). In case you want to see all steps, just open jumper J1 and my CPU will stop at every step again.
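The gating of PHI2 described above can be sketched in Python as a small truth function. The function names are made up for this illustration, the I16 halt input is left out, and the behaviour with J1 open (the AND input reading (H)) is an assumption, not something taken from the schematic.

```python
def counter_clock(phi2, not_q, sync, j1_closed=True):
    """Level at the clock input of the 393 counter (output of OR gate IC34D)."""
    # Assumption: with J1 open the second input of AND gate IC23A reads (H).
    gate_input = sync if j1_closed else 1
    hold = not_q & gate_input      # AND gate IC23A: pass /Q only while SYNC is (H)
    return phi2 | hold             # OR gate IC34D: a (H) here swallows the PHI2 pulses

def falling_edges(levels):
    """Count the (H) to (L) transitions, the edges the 393 counts on."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a == 1 and b == 0)

phi2_wave = [0, 1, 0, 1, 0]
running = [counter_clock(p, 0, 1) for p in phi2_wave]   # RDY was (H): /Q = (L)
halted = [counter_clock(p, 1, 1) for p in phi2_wave]    # RDY was (L) during execution
skipped = [counter_clock(p, 1, 0) for p in phi2_wave]   # RDY was (L) but SYNC is (L)
```

In this model the counter only stops when /Q and SYNC are both (H); during the loading steps the PHI2 pulses still get through.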
The Counter circuit and the instruction latches
The TTL-CPU has 24 inputs that need to be controlled. These controls can, for example, tell a latch to clock the data on its inputs or enable the outputs of a buffer. 24 inputs mean three 74ALS573 8-bit latches, the so-called "code latches". They have to be loaded with data first and when that has been done, the instruction must be executed. That means I need at least four cycles. The 74LS393 4-bit counter mentioned above, IC25A, takes care of this.
As said before, the 393 is clocked by PHI2. PHI2 and three outputs of the 393 are fed into IC21, a 154 4-to-16 demultiplexer. The outputs of the 154 represent the various steps in the process. One idea was to read a byte at every step, i.e. at every half cycle of PHI2, but then I realized that would not work because it would mean that PHI2 had to be connected to the address lines of the ROM in one way or another. So I only used every odd step, thus when PHI2 is (H).
At step 0001 the clock input of IC01, a 573 latch, is activated and the byte read by IC17 is stored inside IC01. At step 0011 and 0101 this is done as well for IC02 and IC06.
A 7-segment LED display with internal decoder, DIS3, is used to make the cycles visible.
During the first three cycles, the three instruction 573s are tri-stated and all outputs are kept (H) by pull-up resistors. The idea behind this is to make sure that all controls are in a neutral state. An example: assume that an instruction took care of writing data into one of the ALU buffers. To disable the clock signal for this latch after the instruction, a byte has to be read and stored into the corresponding instruction latch. But as the ALU latch is still open, this byte will overwrite the one just written into the ALU buffer as well! So by disabling all outputs we make sure that nothing can be changed, overwritten or output by accident.
An exception is the output of IC17, the 573 latch that outputs the data coming from the processor port into the internal data bus. An inverter, IC35F, inverts control signal I4 and in this way takes care of negating IC17's OC input, thus allowing the data to reach the instruction latches.
If the instruction is to read data from the outside world, bit 4 of the first opcode byte has to be set (H) so that during the actual execution, IC17 keeps on transferring data from the outside world into the processor.
At step 0110 and 0111 all the outputs of the instruction latches are enabled and, in turn, activate the needed controls. For example, this can be reading the content on the data bus and writing this content into buffer A of the ALU. Notice that two steps are involved (using AND gate IC23B to combine them) so the actual output mimics the behavior of a 6502. For example, in case of a 6522 VIA the address must be present before the rising edge of PHI2. This combined signal is also used to create SYNC, after it has been inverted by IC24F.
If the program wants to read data from the bus, this data is only valid during step 0111. Therefore the data coming from RAM, ROM or I/O, is clocked into IC17 when PHI0 is (H) during step 0111. During step 0110 the third byte of the opcode is placed on the internal bus by IC17. Can this jeopardise things? I don't think so because, if data, that is read from the external bus, needs to be stored, it can only be done at the end of step 0111. And then the real data should be present.
At step 1000 the output of pin 9 is fed to the second input of AND gate IC23D and inverted by IC24D so it can reset the 393 counter. This causes the 393 to go to step 0000 at that moment, which in turn will automatically pull pin 9 (H) and thus will release the reset of the 393 counter.
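The step sequence described in this section can be walked through with a small Python model. It is a sketch under one assumption: the 4-bit step number is formed by the three counter outputs with PHI2 as the lowest bit, which matches the remark that the odd steps are the ones where PHI2 is (H). All function names are invented for the example.

```python
def step_number(count, phi2):
    """4-bit step seen by the 154: three counter bits plus PHI2 as lowest bit (assumed wiring)."""
    return ((count & 0x7) << 1) | (phi2 & 1)

def action(step):
    """What happens at a given step, following the description in the text."""
    if step == 0b0001:
        return "latch byte 1 (IC01)"
    if step == 0b0011:
        return "latch byte 2 (IC02)"
    if step == 0b0101:
        return "latch byte 3 (IC06)"
    if step in (0b0110, 0b0111):
        return "execute"
    if step == 0b1000:
        return "reset counter"
    return "idle"

def run_one_instruction():
    """Return the list of (step, action) pairs for one complete instruction."""
    trace = []
    count = 0
    while True:
        for phi2 in (0, 1):
            step = step_number(count, phi2)
            trace.append((step, action(step)))
            if step == 0b1000:
                return trace       # the 393 is cleared and starts again at 0000
        count += 1                 # the 393 counts on the falling edge of PHI2

trace = run_one_instruction()
```

The trace shows three load cycles followed by one execution cycle, so four PHI2 cycles per instruction; step 1000 only exists for the moment it takes to clear the counter.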
The use of the various latches
The outputs perform, or control, various functions:
- Reading data from various sources.
- Controlling the selection of various functions of the ALU.
- Writing data to various latches.
In the first case I have only two sources that can be read: data coming from the ALU or data coming from the outside world. Otherwise I could have used a demultiplexer like the 74LS138 or 139 to save control lines as I can only read one source at a time anyway.
In the third case I have eight registers I can write to. The whole design would need 29 control lines, which in turn would originally have meant that I needed at least four 573 latches and four clock cycles for reading an instruction. But eight of those control lines write to a 573 latch. Using a 138 3-to-8 demultiplexer replaces those eight lines by three select lines and so reduces the number of control lines to 24, thus needing exactly three 573 latches.
But a problem is that the outputs of a 138 are active (L) and the clock inputs of the 573 are active (H) so inverters are needed. That's why IC07, an 8-bit inverting 540 buffer, is needed. Hey, but this means an extra 20-pin IC, so why not use an extra 573?
No. Remember that during the first four steps the 573s will be disabled and their outputs will be pulled (H) by resistors? So the inverters would be needed anyway. OK, I have thought about using resistors to pull the disabled outputs (L) but I don't have good experiences with this method: slow rising edges. And don't forget the extra step that is needed with an extra 573; this will certainly slow down the system by 25%.
Only seven latches are clocked by the 138, the one for latching the output of the data has its own inverted control line, I5. There are two reasons for this construction:
- During the first three steps all control lines are (H) and therefore the 138 would activate output Y7.
- When working on the opcodes, I found out that when I wanted to store data directly into the ALU output latch, I first needed to store it in either the A or B register and would then need another instruction for copying the data from the A or B register through the ALU into the ALU output latch. Using I5 in this way I can activate both the clock input of register A or B, C2 or C4, and the one of the ALU output latch, C7, at the same time.
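The first reason can be made visible with a small Python model of the 138 followed by the inverting 540. It is a sketch only: the function name is made up, the 138 is assumed to be permanently enabled, and the order of the select bits (I17 as lowest bit) is an assumption.

```python
def clock_lines(i17, i18, i19):
    """Return eight levels, one per 138 output, as seen after the inverting 540 buffer."""
    select = i17 | (i18 << 1) | (i19 << 2)              # assumed bit order: I17 is the lowest bit
    y = [0 if n == select else 1 for n in range(8)]     # 138 outputs are active (L)
    return [1 - level for level in y]                   # after the 540 the selected one is (H)

idle = clock_lines(1, 1, 1)     # all control lines pulled (H) while loading an instruction
first = clock_lines(0, 0, 0)    # select output 0
```

With all select lines pulled (H) the eighth output is the active one, so it cannot be used to clock a latch; that leaves seven usable clock lines.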
The Program Counter
My first idea was to use the same program counter as the one used in Build your own Mini6502: the address is kept in the SRAM, copied to the ALU, incremented and output to the address bus and saved back to the SRAM at the same time. But the instruction bytes needed to perform these various steps using this CPU have to be read from the ROM, which means that the program counter has to be increased every step to be able to read these bytes: a kind of chicken-and-egg problem. Conclusion: hardware is a must.
So I decided to use an automated counter based on the one of Build your own 6502. But instead of 191 counters I used 161 counters. The 161 can be reset, the 191 cannot, and this enables, or rather forces, me to start at address $0000. The 161s will be clocked by the same clock as the 393 counter but through an inverter, IC35C in this case.
In case of a jump, two pre-load 573 latches, IC08 and IC14, have to be filled with the new address MINUS ONE! By negating the /LOAD input of the four 161s the new address is copied into them. How this signal for /LOAD is generated will be explained later. Why "minus one"? The Program Counter is loaded at step 0111 but the next clock will increase the Program Counter and as it has to point to the correct address NOW, it means the loaded address had to be this address minus one. The needed calculation has to be done by the assembler.
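The calculation the assembler has to do is small. A minimal Python sketch, with a made-up function name and assuming the 16-bit counter simply wraps around from $FFFF to $0000:

```python
def preload_bytes(target):
    """Return (low byte, high byte) to store in the pre-load latches for a jump to target."""
    value = (target - 1) & 0xFFFF      # address MINUS ONE, wrapped to 16 bits (assumption)
    return value & 0xFF, value >> 8    # low byte for IC08, high byte for IC14

low, high = preload_bytes(0x1234)
```

For a jump to $1234 the latches are loaded with $33 and $12; the next clock then increases the Program Counter to the wanted address.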
In this design the Program Counter counts up at every clock cycle, thus also at the fourth cycle. Remember, this is the cycle where the actual action takes place, like reading a byte from somewhere in RAM or setting an I/O register. So if we are dealing with a non-operand instruction where this byte is not used at all, then this byte can become wasted. "Can", because this fourth byte can be accessed in an indirect way so it is not a complete loss. But it will need some creative programming, see later.
The 161s cannot be tri-stated so two 541 buffers, IC22 and IC31 take care of that. Control signal I7 takes care of en- or disabling the 541s. When the code latches are filled with data, their outputs are disabled and all control signals are (H). To make sure that these 541 buffers are enabled, I7 is inverted first.
The temporary address lines A0..15
Thinking things over I soon found out that I could not use the Program Counter for temporary accesses to the memory or I/O. The reason is simple: setting the Program Counter for reading a byte means that it will continue the program at the address after the one needed for accessing the memory or I/O for the simple reason that I have no means to restore the Program Counter automatically to the original address immediately after that action. So I have to use some latches that will contain the needed address for that moment: IC19 and IC20. Control line I7 can be used here as well and it selects whether the temporary address buffer (not inverted) or the Program Counter (inverted) is active.
The ALU
For the ALU I decided to use two 74181s, the world's most well-known ALU IC. I was tempted to use EEPROMs here as well but then it wouldn't be an all-TTL CPU.
The data needed as inputs for the ALU will be stored in two 573 latches, IC27 and IC28, first. The advantage of this design is that the flag information from the ALU stays available when the data on the data bus has been changed by other operations. IC03, a 573 latch, takes care of storing and outputting the data created by the ALU towards the internal data bus.
IC27 and IC28 will be used as temporary internal registers and are the already mentioned registers A and B.
The Flags and the use of them
So far I will only use three flags: Carry, Zero and Minus. The 181 outputs a zero flag but this one is only valid for four bits. OR gate IC34C takes care of combining the signals of both 181s to create a zero flag valid for a byte.
Both 181s also output a Carry flag. The one coming from IC26, the first 181, is fed into IC29, the second 181. The one coming from IC29 is fed into IC36A, a 74 D-flipflop. The Q output of the D-flipflop is fed into the not-Carry input of the first 181, IC26. IC36A has two functions:
- It enables the CPU to remember earlier states of Carry. Useful for ADC (ADd with Carry) and equivalent instructions.
- A program can set (I10) or reset (I8) the Carry on demand
Remark: the 181 uses an inverted Carry for whatever reason and I will keep it that way in this CPU. So in the hardware used here, an active Carry is LOW. That's why I10, although it resets output Q of the flipflop, sets the Carry.
If the Carry from the ALU has to be clocked into IC36A, signal I9 has to be set (L). At step 1000 all outputs of the instruction latches are tri-stated. The pull-up resistors pull the various Ixx signals (H), including I9. This causes the D-flipflop to latch the bit at the D input. Because ALL Ixx signals are pulled (H), this also means that the function inputs of the ALU ICs are pulled (H) and this can change the level of the Carry at the D-input of IC36A. To make sure that the Carry is clocked before a possible change, I added an extra resistor, R2, to line I9 to make sure that the rising edge is steeper. I also count a bit on the internal delays of the ALU ICs.
The minus flag is derived from bit 7 of the output of the 181s = pin 13 of IC29.
The flags can be used for conditional jumps or branches. The advantage of branches is that an executable can be relocated within the memory, which is impossible with an executable using jumps. But using branches means that the new address has to be calculated in real time, something that is built into a 6502 but will cost a lot of instructions for this CPU. I, or better my assembler, won't support branches. But any programmer is free to decide to use them.
How to select the flag needed for the condition? IC09, a 74LS153 4-to-1 multiplexer enables one to choose from four flags:
- bit 7 from the output of the ALU as the Minus flag
- the output of OR gate IC34C as the Zero flag
- the Carry output of ALU IC29
- the Q output of D-flipflop IC36A as the Carry flag
Controls I20 and I21 determine what flag is chosen.
The next step is to feed the signal of output Y of the 153 into OR gate IC34A and, through the inverter IC24C, into OR gate IC34B. The outputs of these two OR gates are fed into an AND gate, IC23C. The output of this AND gate is connected to the /LOAD inputs of the 161s mentioned above.
I22 and I23 control the behavior of the two OR gates. If both controls are (H), both outputs of the OR gates will be (H) as well. Therefore the output of the AND gate and the /LOAD inputs will also be (H). This means no active /LOAD and the 161s just keep on counting. During the first six steps the pull-up resistors make sure these lines are (H) anyway.
When negating both I22 and I23, at least one of the outputs of the OR gates will be (L) and this will cause the AND gate to output a (L), which in turn will cause the 161s to copy the address saved in the 573 latches IC08 and IC14. In short this means that when both controls are (L), the 161s will behave like a jump.
The two last possible situations are the ones where one of the controls is (L) and the other (H). Let's have a look at the following table:
I23  I22  Flg | /LOAD
--------------+------------------
 0    0    x  | 0 = always jump
              |
 0    1    0  | 1 = count
 0    1    1  | 0 = jump
 1    0    0  | 0 = jump
 1    0    1  | 1 = count
              |
 1    1    x  | 1 = always count
The first two and the last two rows have been explained already. Rows 3, 4, 5 and 6 handle the case where a conditional jump is needed. In words:
- 'I22 = (L)' and 'I23 = (H)' handle the situation when a jump is needed when the chosen flag is not set.
- 'I22 = (H)' and 'I23 = (L)' handle the situation when a jump is needed when the chosen flag is set.
The above circuit enables us to have instructions (more or less) like the Z80's "jp z,xxxx" or "jp nc, $YYYY".
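The table can be checked with a few lines of Python. This is a sketch of the logic only: the function name is made up, and which OR gate is paired with which control line is an assumption derived from the table, not read from the schematic.

```python
def load_n(i23, i22, flag):
    """Level of /LOAD for the 161s: 0 = load the new address (jump), 1 = keep counting."""
    or_inverted = i23 | (1 - flag)   # OR gate fed with the inverted flag (assumed pairing)
    or_direct = i22 | flag           # OR gate fed with the flag itself (assumed pairing)
    return or_inverted & or_direct   # AND gate IC23C

# Rebuild the table as (I23, I22, Flg) -> /LOAD.
table = {(i23, i22, flag): load_n(i23, i22, flag)
         for i23 in (0, 1) for i22 in (0, 1) for flag in (0, 1)}
```

Printing `table` gives the same eight rows as above when the two "x" rows are written out for both flag values.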
The registers
My TTL-CPU does not have internal registers like the 6502 or Z80. I did have a version that had registers in the form of a 2K*8 static RAM; in fact that was the very first version of version 1 that I started with. Here the CPU registers wouldn't have a specific function: the programmer is free to dedicate a byte of that memory to a certain register. This raised the question: if the internal registers of this CPU don't have a specific function, why can't we use external RAM instead and what are the possible advantages?
- Two ICs less, the SRAM and a 573 to control it, thus a smaller board.
- With one 573 less, I need one clock cycle less to fill it and thus the CPU becomes faster.
- This in turn means a smaller program.
- Some people don't consider RAM as real TTL, so the removal of it will make them happy.
- Whatever I found out about the old processors, mainly Zuse, they only used "external" RAM. That is, if you can still talk about external RAM: the whole computer, including processor and memory, was mostly one big design.
Does the design indeed become faster? There is one disadvantage to removing this register RAM: instead of one execution cycle to access the RAM, I need more of them. Before I can access the external RAM, I have to load the temporary address latches IC19 and IC20 with an address first. And that will cost me eight cycles (two instructions of four cycles each).
Reminder: don't forget that the three registers of the ALU can, more or less, be used as internal registers.
The speed of my TTL CPU
I'm quite sure it will run at 1 MHz but I cannot guarantee anything. I just have to build it and see. It is just as possible that I overlooked something very elementary and the whole project won't run at all.
The control lines
- I0: Not used here.
- I1..I2: for selecting the ROM, RAM and I/O
- I3: R/W line of this processor
- I4: (H) = read data from / (L) = output data to data bus through IC17
- I5: clock the output of the ALU into a 573 buffer
- I6: read output of ALU, IC05
- I7: (H) output address Program Counter / (L) Temporary address
- I8: clear Carry, IC36A
- I9: clock Carry coming from ALU into IC36A
- I10: preset Carry, IC36A
- I11..I15: select function of the 181 ALU
- I16: halt the CPU by blocking PHI2
- I17..I19: select output of the 138, IC04: C0..6
- I20..I21: select condition for branch, IC09
- I22..I23: select "jump", "branch" or "count" for the Program Counter
- C0: clock low address into temp address buffer A0..7, IC19
- C1: clock high address into temp address buffer A8..15, IC20
- C2: clock data into IC27, input buffer A for ALU
- C3: clock data into IC17 towards data bus
- C4: clock data into IC28, input buffer B for ALU
- C5: clock low-byte address into pre-load Program Counter, IC08
- C6: clock high-byte address into pre-load Program Counter, IC14
- C7: clock the output of the ALU into a 573 buffer, after inverting I5
Adding RAM, ROM and I/O
In the beginning I thought about using this TTL CPU as a replacement for other CPUs, like the 6502. But after giving it some thought I decided to make it a stand-alone computer by adding RAM, ROM and I/O. The first idea was to add 32 KB of EPROM, RAM, an LCD screen and a 6522 and to have it memory mapped. But then I realized that I could use the trick of separate I/O, as used by the Z80 and 80x86, here as well. This meant I could add I/O without the need to place it somewhere in the ranges meant for the RAM and ROM. And when adding I/O, why not make use of an MMU? That could solve the extra need for program ROM.
In version 3 three ICs took care of selecting the RAM, ROM and I/O and I decided to replace them with one GAL, mainly to decrease the size of the board. When starting with the PLD equations for the GAL, I noticed quite a flaw in the original design of version 3; this led from one thing to another and I ended up with an even more improved design.
Remark: For those persons who think that I'm cheating now by using a GAL, it is only used for selecting the RAM, ROM and I/O and therefore I don't consider it as a part of the actual processor. Period.
The RAM and ROM
As said before, the original idea was using 32 KB of each, but with the MMU I could afford to have 128 KB of RAM and ROM.
The I/O
I only have three pieces of I/O:
- a 6522 VIA
- a LCD screen
- a MMU, Memory management unit
The control lines I1 and I2 take care of selecting everything:
I1  I2  device
 0   0  LCD screen
 0   1  MMU (Memory Management Unit)
 1   0  6522
 1   1  RAM and ROM
6522
The 6522 provides 20 I/O pins and some counters. The I/O pins end up into a 24-pin header and can be used for hooking up a keyboard, connecting it to a LPT port of a PC, an Arduino, or whatever.
Because the I/O is not memory mapped, I have separate instructions for it: IN and OUT. The MMU only needs one address so something like "OUTA $01" would do to change the MMU settings. But the 6522 has 16 registers and the LCD two, so some extra information is needed. One idea is to use an extra parameter like "OUTA VIA,DDRA", the other something like "OUTA $0200" where the high nibble represents the I/O device and the low nibble the register.
LCD screen
I wanted some visual output and blinking LEDs are a bit too common. So, having them lying around anyway, I chose to use an LCD screen. Why not?
MMU
MMU, a fancy word for something quite simple in this case. Needing much, much more ROM than a normal CPU like the 6502 or Z80, version 1 had a 74273 on board to be used as address lines 16..23. Unfortunately, I found out that that could not work at all: for jumping to an address outside the original 64 KB range, all 24 bits have to be changed at the very same moment. And this CPU cannot handle that.
Remark: just FYI, I did find a trick to do it but only many months after the production of this board. Add another 273 and feed it with the output of the first 273. Set the address and the moment the 161s are set, the address is fed into the second 273 as well. If I do another version.....
That left me in fact with an 8-bit I/O port (but only capable of output) which reminded me more or less of the onboard I/O port of the 6510 CPU of the C64. And that I/O port is the base of the C64's memory management.
Four lines are directly connected to the 29F010 128 KB EEPROM. After a reset they all output a zero, no problem so far. At first sight it looks as if changing their contents would get me into the same trouble as mentioned above. It would if I ran the instructions from ROM. But when running from RAM, nothing can happen. That is, provided that the programmer did a good job.
And then the assembler changed things.....
The MMU needs only one address. The LCD needs two addresses and the 6522 needs an address range of sixteen bytes, so how to deal with that? One idea was to use a byte where the lowest nibble represented the address and two bits of the highest nibble I1 and I2. But why should I reserve I1 and I2 only for my I/O? The GAL also has some address lines attached, why not use them as well? The new idea:
I1  I2  address
 0   0  $0xxx    LCD screen
 0   0  $4xxx    MMU (Memory Management Unit)
 0   0  $8xxx    6522
 0   1           free for other use
 1   0           free for other use
 1   1           RAM/ROM
The above means we need five nibbles for the operand: one for setting I1 and I2, four for the address. There is even more, see later.
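The new selection scheme can be written down as a small Python lookup. This is a sketch, not the PLD equations of the GAL: the function name is made up and it is an assumption that the whole highest address nibble is decoded.

```python
def select_device(i1, i2, address):
    """Return the device picked by control lines I1, I2 and the 16-bit address."""
    if (i1, i2) == (1, 1):
        return "RAM/ROM"
    if (i1, i2) == (0, 0):
        top_nibble = (address >> 12) & 0xF           # assumption: full top nibble is decoded
        devices = {0x0: "LCD", 0x4: "MMU", 0x8: "6522"}
        return devices.get(top_nibble, "unassigned")
    return "free for other use"

via_register = select_device(0, 0, 0x8003)           # some register in the 6522 range
```

An operand of five nibbles then carries everything needed: one nibble for I1 and I2 and four for the address.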
Extra hardware
To be able to run stand-alone at all, I added some extra hardware:
- a 1 MHz oscillator.
- the same circuit as I use for my 6502 debugger, switch S2 and the ICs IC30 and IC38, so I can step this TTL-CPU
- I added a LED and a resistor to see if the computer was halted.
- my own Reset circuit: switch S2, R6 and C26.
- And as last: you can feed the board with 5V either using one of the 4-pin plugs of an AT power supply, a round jack plug (not added yet) or the big white connector.
How the combination MMU/GAL changed things
The combination of MMU and GAL changed a lot in the way RAM and ROM can be used. Four outputs of the MMU, marked M4 to M7, were originally not used at all and are now fed into the GAL. M4..7 can now be used for various purposes:
- The amount of used RAM and ROM.
- Within the allocated part, the amount of common RAM and the amount of swappable RAM.
- Within the swappable part, which part is to be used.
Be aware that M4..7 gives us only a limited number of choices, sixteen to be precise. But GALs are reprogrammable, which enables us to experiment with various ideas.
OOPS, an error?
Having an MMU handling the ROM looked like a good idea but, writing about it in detail, I now realise I probably made a mistake. I can tell the GAL to have 32 KB of ROM but accessing the part above the first 8 KB is the problem: address lines A13 and A14 have not been connected to the EEPROM so the first 8 KB of it is mirrored all over the next 24 KB. Big bummer :(
Can something be done about it? I'm quite sure there can, so let's turn this error into a feature.
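The mirroring can be shown with a few lines of Python. This is only a sketch of the effect described above: it assumes that the unconnected lines A13 and A14 simply have no influence, so only A0..A12 of the CPU address decide which EEPROM byte is read. The name eeprom_address is hypothetical.

```python
ROM_WINDOW = 32 * 1024   # the amount of ROM the GAL could be told to map
CONNECTED_LINES = 13     # A0..A12 reach the EEPROM; A13 and A14 do not


def eeprom_address(cpu_address):
    """Return the EEPROM offset that is really read for a CPU address."""
    if not 0 <= cpu_address < ROM_WINDOW:
        raise ValueError("address outside the 32 KB ROM window")
    # Dropping A13 and A14 folds the four 8 KB blocks onto the first one.
    return cpu_address & ((1 << CONNECTED_LINES) - 1)
```

With this model the CPU addresses $0000, $2000, $4000 and $6000 all end up at EEPROM offset $0000.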
Main idea: first 8 KB will be ROM, rest is RAM.
Two secondary ideas:
- First 24 KB after the ROM is common RAM, the upper 32 KB is swappable.
or
- First 40 KB after the ROM is common RAM, the upper 16 KB is swappable.
The first idea needs two bits to select the needed part of 32 KB of the 681000, the second idea needs three bits for selecting the needed 16 KB part.
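The bit counts follow from the size of the RAM chip. A minimal Python sketch of the arithmetic, assuming the 681000 holds 128 KB (which matches the two and three bits mentioned above); the function names are made up for illustration.

```python
RAM_CHIP_SIZE = 128 * 1024   # assumption: the 681000 is a 128 KB SRAM


def bank_select_bits(window_size):
    """Return (number of banks, select bits needed) for a swappable window."""
    banks = RAM_CHIP_SIZE // window_size
    return banks, (banks - 1).bit_length()


def physical_address(bank, offset, window_size):
    """Map a bank number and an offset inside the window to a chip address."""
    banks, _ = bank_select_bits(window_size)
    if not 0 <= bank < banks or not 0 <= offset < window_size:
        raise ValueError("bank or offset out of range")
    return bank * window_size + offset
```

bank_select_bits(32 * 1024) gives four banks and two bits, bank_select_bits(16 * 1024) gives eight banks and three bits. The sketch ignores that part of the chip is also needed for the common RAM.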
Last idea: use one bit to swap the first 8 KB of ROM with RAM.
This more or less means I'm turning the ROM into a kind of ROM disk. The idea: after a reset the program in the ROM copies a routine from ROM into the RAM, jumps to this routine and executes it.
The software
There is no software for it yet. As this is a completely new and unique processor, I need at least to write a new assembler for it. Or in my case, another module for my "Multi Processor" assembler.
The opcodes
When designing a CPU with an Instruction Decoder, one can think: which opcodes do I want to support? But in this case, it is more or less like: what opcodes are hidden inside this design? An 8-bitter can support up to 256 one-byte opcodes. Seen from an opcode point of view, this CPU is a 24-bitter and can theoretically have 16 M (16,777,216) different opcodes. But most of them just do nothing or have no meaning. For example: set all control bits to one and then, to whatever value you set the control bits of the ALU, nothing will happen. In other words, 524,288 opcodes that simply do nothing.
The very first idea was that my assembler would only support the basic one-byte opcodes and macros had to be created for supporting multi-byte opcodes. For example, to mimic the 6502 instruction "LDA $1234" first the temporary address had to be set, and that in itself already needed two one-byte opcodes, plus one byte for reading the data byte. But I changed my mind when seeing what was needed to execute a conditional jump. For the moment I will draw a line at branches: a branch means that, knowing the offset, the real address has to be calculated first and that will take IMHO too many instructions. This is a simple CPU so let us keep the instructions as simple as possible as well.
The second idea was using the instructions of the 6502 as base but having no internal registers simply means that an instruction like LDX won't make any sense. OK, there is nothing against dedicating an external address in RAM as being the register X but that idea went IMHO beyond the "simple" line. On the other hand, I will support the three registers this CPU does have: A, B and R.
The Z80 is not left out in this. The Z80 supports conditional jumps and this TTL CPU supports them as well. And the same for the HALT instruction.
When programming the assembler, I made an interesting discovery. There is an instruction named AND. The only thing it does in case of this processor is ANDing the contents of ALU register A and ALU register B. This is taken care of by the two 181 ICs. But once the actual function has been performed, what should I do with the result? There is a chance that I'm only interested in a side product like the Zero Flag and not in the actual result. In other words, do I want to store the result into the ALU output register or not? The interesting point: except for one single bit the code for these two instructions is the same. So it seems that I even got more opcodes than I had in my mind. The only problem is giving them a nice and understandable name. My idea: add an "R" for "Register" at the end of the opcode where the result should be stored. For example, ANDR in case of AND.
This discovery led to the thought that, instead of looking for what I want, I have to search systematically for the possible opcodes that more or less already exist. I can already tell you beforehand that this led to some surprising finds.
A good candidate to start with is the ALU because we need 5 bits for it. 5 bits mean 32 functions. But only 11 of them are usable IMHO. For example, I personally have no idea what to do with a function like "A and (not B)". I'm quite sure that occasionally there will be a need for this function but to reserve its own opcode for it? Anyway, here is the result:
A - output the data from input A
ADC - add buffer B to buffer A with Carry
ADD - add buffer B to buffer A
AFF - the ALU outputs $FF
AND - and buffer A with buffer B
B - output the data from input B
CLR - load the ALU output register with a zero
CMP - subtract buffer B from buffer A (but don't use the result)
DCA - decrement buffer A
XOR - eXOR buffer A with buffer B
ICA - increment buffer A
NOTA - not/invert buffer A
NOTB - not/invert buffer B
OR - or buffer A with buffer B
SHL - shl buffer A
SBC - subtract buffer B from buffer A with Carry
SUB - subtract buffer B from buffer A
Remark 1: ADC/ADD and SBC/SUB/CMP use the same functions. Therefore only 14 functions instead of 17.
Remark 2: a lot of opcodes have a "R" variant, i.e. the result is stored into the ALU output register as well, like ADD and ADDR. With CMP, for example, we are only interested in the flags and therefore have no need for storing the result. In fact, CMP and SUB are the same function but, in this case, SUBR does make sense and CMPR does not.
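The difference between a plain opcode and its "R" variant can be modelled in a few lines. This Python sketch is not the real control word of the CPU: the class Alu, the table MNEMONICS and the way the flags are derived are assumptions made for illustration. It only shows the idea that both variants run the same ALU function and differ in a single "store into R" decision.

```python
# Hypothetical model: each mnemonic maps to (ALU function, store result in R?).
FUNCTIONS = {
    "and": lambda a, b: a & b,
    "sub": lambda a, b: a - b,
}

MNEMONICS = {
    "and":  ("and", False),
    "andr": ("and", True),
    "cmp":  ("sub", False),   # same function as SUB, result is not kept
    "subr": ("sub", True),
}


class Alu:
    def __init__(self):
        self.a = 0            # ALU input register A
        self.b = 0            # ALU input register B
        self.r = 0            # ALU output register
        self.zero = False
        self.negative = False

    def run(self, mnemonic):
        name, store = MNEMONICS[mnemonic]
        result = FUNCTIONS[name](self.a, self.b) & 0xFF
        # The flags follow the result in both variants (assumed behaviour).
        self.zero = result == 0
        self.negative = bool(result & 0x80)
        if store:
            self.r = result   # the single differing bit: clock into R
        return result
```

Running "and" with A = $0F and B = $F0 sets the Zero flag but leaves R untouched, while "andr" also writes the zero into R.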
Another discovery: as you can see above, the 181 can output three fixed numbers: 0, 1 and $FF. The opcode CLR outputs a zero. The opcode CLRR does store the zero in the ALU output register. And what can we do with it now? There is nothing against storing it in register A or B. So that would lead to something like "CLRR A". This looks a bit like overkill but it isn't because the transfer can be done in the same 4-byte instruction!
Can this be done for the instructions ANDR and ADDR? Certainly not! Technically speaking I could open register A and the content of the ALU could be written into it. But as the content of register A led to the result, we run into an infinite loop.
And this led to the next thought: if the result of CLR can be written to register A, then writing it to RAM in the same 4-byte instruction should be possible as well. OK, the Temporary Address registers should have been set beforehand but that could be part of an update operation:
lda $1234
dcar T
The first instruction is made out of three 4-byte opcodes:
- LTAL #$34 = set the LB part of the Temporary Address register
- LTAH #$12 = set the HB part of the Temporary Address register
- LDA = load register with value found in the RAM
The second opcode feeds the contents of register A into the ALU where it is decreased by one, fed into the output register and from there fed into the RAM, into the same address where the original value came from.
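The effect of the two instructions above can be traced with a tiny model. This is a minimal Python sketch, not the emulator itself: the class Machine and its method names are hypothetical, and it only models the Temporary Address register, register A, the output register R and 64 KB of RAM.

```python
class Machine:
    def __init__(self):
        self.ram = bytearray(64 * 1024)
        self.ta = 0     # Temporary Address register (16 bits)
        self.a = 0      # ALU register A
        self.r = 0      # ALU output register

    def ltal(self, value):
        """Set the low byte of the Temporary Address register."""
        self.ta = (self.ta & 0xFF00) | (value & 0xFF)

    def ltah(self, value):
        """Set the high byte of the Temporary Address register."""
        self.ta = (self.ta & 0x00FF) | ((value & 0xFF) << 8)

    def lda(self):
        """Load register A from the address held in TA."""
        self.a = self.ram[self.ta]

    def dcar_t(self):
        """A - 1 goes into R and, in the same opcode, back into RAM at TA."""
        self.r = (self.a - 1) & 0xFF
        self.ram[self.ta] = self.r


m = Machine()
m.ram[0x1234] = 0x10
m.ltal(0x34)    # lda $1234, first opcode
m.ltah(0x12)    # lda $1234, second opcode
m.lda()         # lda $1234, third opcode
m.dcar_t()      # dcar T
```

Afterwards address $1234 holds $0F, while register A still holds the original $10.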
The single 4-byte opcodes I will support so far are:
adc             ; add A and B and Carry
adcr            ; and clock result into output ALU
adcr T          ; use Temporary address
add             ; add A and B, Carry ignored
addr            ; and clock result into output ALU
addr T          ; use Temporary address
aff             ; ALU outputs $FF
affr            ; and clocks result into output ALU
affr T          ; use Temporary address
and             ; A and B
andr            ; and clock result into output ALU
andr T          ; use Temporary address
ccf             ; copy Carry from ALU into flipflop
clc             ; clear Carry in flipflop
clr             ; ALU outputs a zero
clrr            ; and clocks result into output ALU
clrr T          ; use Temporary address
cmp             ; compare A and B
dca             ; A - 1
dcar            ; and clock result into output ALU
dcar T          ; use Temporary address
; Far load, load from another segment
flda @0         ; TA already set
fldar @0        ; TA already set
fldb @0         ; TA already set
fldbr @0        ; TA already set
; Far store, store into another segment
fsta @0
fstb @0
fstr @0
hlt             ; halt the CPU completely
ica             ; A + 1
icar            ; and clock result into output ALU
icar T          ; use Temporary address
; Temporary Address register has already been set
ina             ; read data from I/O into A
inar            ; read data from I/O into R through A
inb             ; read data from I/O into B
inbr            ; read data from I/O into R through B
; Program Counter register has already been set
jcc             ; jump if Carry is clear
jcs             ; jump if Carry is set
jeq             ; jump if equal, Zero flag is set
jfc             ; jump if Carry from flipflop is clear
jfs             ; jump if Carry from flipflop is set
jmi             ; jump if minus (bit 7 is set)
jmp             ; unconditional jump
jne             ; jump if not equal, Zero flag is clear
jpl             ; jump if plus (bit 7 is clear)
lda             ; load register A with data from address
lda #$12        ; load register A of ALU with 4th byte
ldar            ; load register R with data from address
                ; through register A
ldar #$56       ; load output register of ALU with 4th
                ; byte through register A
ldb             ; load register B with data from address
ldb #$34        ; load register B of ALU with 4th byte
ldbr            ; load register R with data from address
                ; through register B
ldbr #$78       ; load output register of ALU with 4th
                ; byte through register B
lpch #>($1234)  ; load HB Program Counter pre-buffer
lpch A          ; with register A
lpch B          ; with register B
lpch R          ; with register R
lpch            ; use Temporary address
lpcl #<($1234)  ; load LB Program Counter pre-buffer
lpcl A          ; with register A
lpcl B          ; with register B
lpcl R          ; with register R
lpcl            ; use Temporary address
ltah #>($1234)  ; load HB Temporary Address
ltah A          ; with register A
ltah B          ; with register B
ltah R          ; with register R
ltal #<($1234)  ; load LB Temporary Address
ltal A          ; with register A
ltal B          ; with register B
ltal R          ; with register R
nop             ; no operation
nota            ; not A
notar           ; and clock result into output ALU
notar T         ; use Temporary address
notb            ; not B
notbr           ; and clock result into output ALU
notbr T         ; use Temporary address
or              ; A or B
orr             ; and clock result into output ALU
orr T           ; use Temporary address
; Output to I/O
outa            ; TA already has been set
outb            ; TA already has been set
outr            ; TA already has been set
sbc             ; subtract B from A with Carry
sbcr            ; and clock result into output ALU
sbcr T          ; use Temporary address
sec             ; set Carry in flipflop
shl             ; A + A
shlr            ; and clock result into output ALU
shlr T          ; use Temporary address
sra             ; save register A into 4th byte
srb             ; save register B into 4th byte
srr             ; save register R into 4th byte
; The Temporary address already has been set
sta             ; store register A into address
stb             ; store register B into address
str             ; store register R into address
sub             ; subtract B from A, Carry ignored
subr            ; and clock result into output ALU
subr T          ; use Temporary address
tab             ; A -> B
tar             ; A -> output ALU
tba             ; B -> A
tbr             ; B -> output ALU
tra             ; output ALU -> A
trb             ; output ALU -> B
xor             ; exor A and B
xorr            ; and clock result into output ALU
xorr T          ; use Temporary address
Several notes:
- The SRx commands, storing data into the 4th byte only work when a program is running from RAM.
- For most commands where the ALU is involved, an 'R' can be added and the result will be stored into the ALU output register.
- In cases where the registers A and B are not involved, the content of the ALU output register can be stored into register A or B.
- The content of the ALU output register can be stored into RAM without the need of an extra instruction.
- After writing the above I realised
I already mentioned the instruction "LDA $1234" above. Which is easier to use, this multi-opcode instruction or the three single opcodes? Being used to the opcodes of the 6502, Z80 and 80x86 processors, "LDA $1234" will look more familiar and elegant than the use of the three opcodes. For that reason I created some multi 4-byte instructions:
adcr $1234      ; and save to RAM
addr $1234      ; and save to RAM
affr $1234      ; and save to RAM
andr $1234      ; and save to RAM
clrr $1234      ; and save to RAM
dcar $1234      ; and save to RAM
icar $1234      ; and save to RAM
ina $0000       ; read data from I/O into A
inb $4000       ; read data from I/O into B
inar $8000      ; read data from I/O into R through A
inbr $C000      ; read data from I/O into R through B
jcc $1234       ; jump if Carry is clear
jcs $1234       ; jump if Carry is set
jeq $1234       ; jump if equal, Zero flag is set
jfc $1234       ; jump if Carry from flipflop is clear
jfs $1234       ; jump if Carry from flipflop is set
jmi $1234       ; jump if minus (bit 7 is set)
jmp $1234       ; unconditional jump
jne $1234       ; jump if not equal, Zero flag is clear
jpl $1234       ; jump if plus (bit 7 is clear)
lda $1234       ; load register A with data from address
ldar $1234      ; load register R with data from address
                ; through register A
ldb $1234       ; load register B with data from address
ldbr $1234      ; load register R with data from address
                ; through register B
lpcw #$1234     ; load Program Counter pre-buffer
lpcw ab         ; = LB / HB
ltaw #$1234     ; load Temporary Address
ltaw ab         ; = LB / HB
notar $1234
notbr $1234
orr $1234
outa 4000       ; output register A to I/O
outb 8000       ; output register B to I/O
outr 0000       ; output register R to I/O
sbcr $1234
shlr $1234
sta $1234       ; store register A into address
stb $1234       ; store register B into address
str $1234       ; store register R into address
subr $1234
xorr $1234
fjmp @0,$1234   ; far jump, set I1 and I2
flda @0,$1234   ; load register A with data from address
fldar @0,$1234  ; load register R with data from address
                ; through register A
fldb @0,$1234   ; load register B with data from address
fldbr @0,$1234  ; load register R with data from address
                ; through register B
fsta @0,$1234
fstb @0,$1234
fstr @0,$1234
In most cases the Program Counter register or the Temporary Address register is involved, in two cases registers A and B. As you can see, these commands just simplify addressing a 16-bit address.
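How an assembler could turn such a multi 4-byte instruction into single opcodes can be sketched in Python. This is an illustration only, not the code of the "Multi Processor" assembler: the function expand and the two mnemonic sets are made up, the low-byte-first order is taken from the LDA example earlier, and for jumps the same order is simply assumed. The emulator notes mention that the pre-load latches need the new address minus one; whether the assembler applies that correction is not stated, so the sketch passes the address through unchanged.

```python
JUMPS = {"jcc", "jcs", "jeq", "jfc", "jfs", "jmi", "jmp", "jne", "jpl"}
ALU_STORES = {"adcr", "addr", "affr", "andr", "clrr", "dcar", "icar",
              "notar", "notbr", "orr", "sbcr", "shlr", "subr", "xorr"}


def expand(mnemonic, address):
    """Return the single 4-byte opcodes for '<mnemonic> $address'."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must be 16 bits")
    low, high = address & 0xFF, address >> 8
    if mnemonic in JUMPS:
        # Jumps go through the Program Counter pre-buffer.
        return [f"lpcl #${low:02X}", f"lpch #${high:02X}", mnemonic]
    # Everything else goes through the Temporary Address register.
    prefix = [f"ltal #${low:02X}", f"ltah #${high:02X}"]
    if mnemonic in ALU_STORES:
        return prefix + [f"{mnemonic} T"]   # ALU result is written to RAM
    return prefix + [mnemonic]              # plain load, store, in or out
```

For example, expand("dcar", 0x1234) yields "ltal #$34", "ltah #$12" and "dcar T".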
If you didn't notice, one well known opcode is missing from this list: JSR/CALL - execute a subroutine. The intention was there but when I started to write my assembler, I noticed that I had no means to read the content of the Program Counter and therefore had no idea what to push on the Stack. A bit of a bummer. Bad design? Neah, the very first computers didn't have subroutines either. It just means we have to be more creative with using loops in the program.
For obvious reasons RTS, PHA/PLA and equivalent opcodes are missing as well. Theoretically I could implement the PUSH and PULL instructions but they involve so many memory movements that any possible gain of these opcodes is completely lost in the number of cycles this little CPU needs to execute all these 4-byte opcodes. But I have changed my mind before.....
Using the extra I/O range for memory
As mentioned above, the free range can be used for external RAM and ROM. The IN and OUT instructions can be used for reading or writing individual bytes. But it would be more interesting to run programs from this memory. It is no problem to create a FJMP "= far jump" instruction. This instruction not only jumps to the new address but also tells the assembler to remember the setting of I1 and I2 for the next instructions. Temporary accesses to other parts of memory can be made by adding an "F" in front of the instruction and adding the fifth nibble. But as far as I can see, adding the FAR attribute will only work for instructions that work with addresses like the load and store instructions and JMP. It won't work for the conditional jumps because setting I1 and I2 will be an assembler feature. The assembler cannot know if a conditional jump will be performed or not, so it won't know if I1 and I2 must be set or not.
An added extra: I foresee that this feature can overcome my unwanted limitation of only having 8 KB of ROM. IMHO an external ROM won't have this limitation at all! We will see.
I thought about this feature later. Checking all opcodes once again, I realised that "FLDA @0,$1234" and "INA @0,$1234" are exactly the same. The same goes for their B and R versions. This is also true for "FSTA @0,$1234" and "OUTA @0,$1234" plus their equivalents. I have thought about dropping IN and OUT for this reason but dropped the idea for one simple reason: I, for myself, simply cannot associate a "far load" or "far store" instruction with I/O anymore.
Then I got another idea. Reserving one whole segment for I/O means I still have three left for normal memory. The idea is to tell the assembler which segment has been reserved for I/O; when it runs into IN or OUT, the opcode is replaced with FLDA or FSTA, and the segment reserved for I/O is filled in.
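The replacement step could look like the following Python sketch. It is only an illustration of the idea, not existing assembler code: the function rewrite_io, the table IO_ALIASES and the numbering of the segments as @0 to @3 are assumptions.

```python
# Hypothetical mapping from the I/O mnemonics to their far load/store twins.
IO_ALIASES = {
    "ina": "flda", "inb": "fldb", "inar": "fldar", "inbr": "fldbr",
    "outa": "fsta", "outb": "fstb", "outr": "fstr",
}


def rewrite_io(mnemonic, address, io_segment):
    """Rewrite IN/OUT into a far load/store aimed at the reserved I/O segment."""
    if not 0 <= io_segment <= 3:
        raise ValueError("I1/I2 only allow segments 0..3")
    far = IO_ALIASES.get(mnemonic.lower())
    if far is None:
        # Not an I/O instruction: leave it as it is.
        return f"{mnemonic} ${address:04X}"
    return f"{far} @{io_segment},${address:04X}"
```

With segment 0 reserved for I/O, "ina $4000" becomes "flda @0,$4000" while "lda $1234" is left alone.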
Different combinations of opcodes
When the older Commodore computers with the IEEE bus have to send a byte over IEEE, they first negate this byte in software. At the side of the receiver it is negated again by software. To be honest, I never did and still don't understand the reason for it.
The above could be performed by this TTL CPU like this:
lda #$xx
nota
sta IEEE
But this is another possibility:
lda #$xx
notar
str IEEE
Take your pick.
Creative programming
Using this barrel organ design can have a disadvantage: as said above, the fourth byte of an opcode is not always used. This is certainly the case with implicit instructions, instructions that don't have an operand like AND, ADD, CCF and CLR. Here the fourth byte is not used at all. But due to the four-byte cycle it is stored in the program anyway.
Can it be used in another way then? Yes, it can. I don't see any reason why the CPU should not be able to read this individual byte, or to write to it in case it is in RAM.
How should an assembler deal with it? The assembler will output 4-byte instructions only. For the moment I only see one option, shown in the next example:
        lda $2000
DATA:   and
        .
        .
        .
        lda DATA+3
Explanation: register A is loaded with the 4th byte after the label DATA. Bytes 1, 2 and 3 are the 24 bits of the actual instruction, byte 4 the byte of data.
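The address arithmetic behind "DATA+3" is simple because every opcode takes exactly four bytes. A minimal Python sketch, with made-up helper names and an arbitrary origin of $2000, that lays out a list of single 4-byte opcodes and returns where the spare fourth byte of a labelled opcode lives:

```python
INSTRUCTION_SIZE = 4   # three bytes of control bits plus the fourth byte


def layout(program, origin):
    """Return {label: address} for a list of (label or None, mnemonic) pairs."""
    labels = {}
    address = origin
    for label, _mnemonic in program:
        if label is not None:
            labels[label] = address
        address += INSTRUCTION_SIZE
    return labels


def fourth_byte(labels, name):
    """Address of the spare data byte of the opcode carrying this label."""
    return labels[name] + 3


program = [(None, "nop"), ("DATA", "and"), (None, "hlt")]
labels = layout(program, 0x2000)
spare = fourth_byte(labels, "DATA")
```

Here DATA sits at $2004, so its spare byte is at $2007.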
Testing version 2
Right from the start of this project I was thinking about how to test My-TTL-CPU. I decided to turn to a solution I have been using for over 20 years: I have some ISA cards with four 8255s on board lying around. The total of 96 I/O pins of these four 8255s can be accessed through two 50-pin headers. So I made a little PCB that enabled me to connect the TTL-CPU to such an ISA board using a SCSI cable.
But most people won't have this means of testing the My-TTL-CPU and for this and various other reasons I started to look for another solution, one that can be used by almost everyone and is relatively cheap. What I found was the Arduino Mega2560. It has, IMHO, enough I/O ports to test the TTL-CPU. But there is one problem: I'm still not familiar enough with it. I can use it in two ways (I think):
- It runs a program that does all the testing and the connected PC serves only as a terminal.
- It acts, more or less, as a slave for the connected PC and only sets or reads ports on request of the PC. So the PC runs the actual test software.
The advantage of the first method: faster than the second one. The disadvantage: I really need to know thoroughly how to program the Arduino. (yes, I'm lazy)
The advantage of the second method: I already have an INO for the Arduino so it can act as a slave and I know how to program the PC. The disadvantage: probably slower than the first method. We'll see.
An emulator
I decided to write an emulator. It is written in "Free Pascal" in text mode (for the moment). The main reason to write an emulator is to be able to test programs before running them on the real hardware. But writing an emulator means one must have a good understanding of the hardware. While writing the emulator I found out, for example, that I had to store the new address minus one into the pre-load latches IC08 and IC14 instead of the new address itself.
The future
Never say never but the chance that I will build a newer version is very small. If there is anything I want to change then it is adding one or two extra temporary addresses. Unfortunately this design has no room left for these extras. The only room I see so far is connecting output I0 with a free input of the GAL. But any other addition requires an extra 573 register which in turn will change the software etc., etc., etc. I think I will leave it as it is.
You can email me here.