Ruud's Commodore Site: Diagnostic ROM for the PC/XT

Diagnostic ROM for the PC/XT




What is it?

An EPROM that replaces the BIOS of an IBM PC/XT or compatible and enables you to test and diagnose that PC.


Last and final update

Modem7, the owner of the well-known site IBM 51xx PC Family Computers, has taken over this project. What pleases me is that he will keep my name on the project, thank you very much! You'll find it here from now on: Ruud's Diagnostic ROM.


Background

The only software available (that I know of) to test a PC is the one from Landmark, which can be found at this page. I know that the Landmark ROM for the IBM PC/XT and clones contains some small errors, but I cannot remember exactly why I decided to write my own. I know at least three reasons: 1) I prefer to use products that I can control in one way or another, 2) the very long time that Landmark needs to test the RAM and 3) just for fun.


What can it do, and what not

My ROM can be used to diagnose the hardware of an IBM PC, XT or a compatible computer for errors. But be aware, there is a limit to what it can diagnose and test. If, for example, the crystal is broken, the PC won't start up and the diagnostic ROM cannot be run at all. If, for example, the ROM says the Programmable Interrupt Controller is broken but a replacement doesn't work either and the original IC works fine in another computer, then a part of the glue logic is broken. But the ROM certainly cannot tell you which part.

My diagnostic ROM is meant to be burned into an (E)EPROM and to replace the BIOS on the motherboard. Which (E)EPROM is to be used depends completely on the type of the original ROM. Most boards accept the 2764 EPROM. But in the case of the IBM PC, the very first PC with a 5-slot motherboard, the BIOS has been burned into a 2364 ROM and you either need a special EPROM, which is very hard to get, or you need to build a 28-pin to 24-pin converter.

In what computer can it be used? Certainly in an IBM PC and XT, and various Taiwanese clones. Being a Commodore fan I'm sorry to tell you that it won't run in a PC10/20-III. I have no idea about Olivettis or Compaqs.


What minimum hardware is needed?

The minimum configuration is a motherboard with RAM and an MDA or CGA video card (or both) with a matching monitor. VGA and EGA are not supported. The reason: the EGA and VGA cards need to be initialized by their own ROMs and I'm sure that they run subroutines and therefore need RAM for the stack.
Yes, I know, the Landmark diagnostic ROM supports EGA by using its own routines but 1) IMHO that takes more effort than it is worth and 2) EGA monitors are rare nowadays. Not satisfied and you want EGA? The sources are free so be my guest and add it yourself!
This ROM expects that an MDA or a CGA video card is present. In case both cards are present, in this version only the MDA card will be used.

Remark: in version 2 I use some free video memory as stack. While writing the above, this idea popped up: why not use the video memory of a VGA/EGA card as stack? I will have a look at it.


Preferred extra hardware

Preferred is to connect a keyboard, a floppy disk controller (FDC) and a floppy drive with a formatted floppy in it as well. The diagnostic ROM can test the presence of certain registers in ICs but reading data from a floppy is the best proof that various ICs on the motherboard work well. Remark: I assume that the FDC, drive and floppy have been tested on another system and work fine.
The ROM will also test if the BASIC ROMs are present. If so, the ROM will check their checksums.


Versions

So far there are two versions. Modem7, famous for his special IBM site, discovered a bug and I fixed it. But reading my own comments in the sources, I saw one that mentioned using the unused video RAM for the stack. And I decided to turn that idea into reality. That version is version 2.
If I can get VGA to work (I don't own EGA so I cannot test that), that will be version 3.


The explanation of the source code

I had one advantage when I started to write the code: I have written code for BIOSes before and I always preferred to add as many test functions as possible. The words between brackets refer to the labels in the source code. This makes it easier to find things back.

[Coldstart]
The first main thing that is done is to send the byte zero to the data registers of LPT1, LPT2 and LPT3 and to I/O address 80h using the macro OutDebug. This address was first used by IBM in their AT computers for diagnostics. A card containing a latch, some decoding logic and two hexadecimal LED displays can show the byte sent to this address. IMHO the card can be built quite easily by yourself (I did that), but it can also be bought on eBay or AliExpress.
When using an LPT card, you only need eight inverters and LEDs to display the data (and some means to power the whole). I use a macro to output the byte to the various registers. That simplifies things a lot if I have to change this specific piece of software for whatever reason, for example when adding extra hardware like an LCD display.
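
As an illustration of why a single macro helps, here is a minimal Python sketch of the idea. It is not the ROM's code: the function name out_debug, the port_write callback and the port numbers (the conventional printer port data registers 3BCh, 378h and 278h) are assumptions made for this example.

```python
# Hypothetical model of the OutDebug idea: one call, several output ports.
# The port numbers are the conventional PC addresses and are assumed here,
# they are not taken from the ROM source.
LPT_DATA_PORTS = (0x3BC, 0x378, 0x278)  # data registers of the printer ports
POST_PORT = 0x80                        # diagnostic port introduced with the AT

DEBUG_PORTS = LPT_DATA_PORTS + (POST_PORT,)


def out_debug(code, port_write):
    """Send one code byte to every debug port via port_write(port, value)."""
    for port in DEBUG_PORTS:
        port_write(port, code & 0xFF)


# Stand-in for the OUT instruction: record what would have been written.
written = []
out_debug(0x00, lambda port, value: written.append((port, value)))
for port, value in written:
    print(f"OUT {port:03X}h, {value:02X}h")
```

Adding another display would only mean adding one entry to DEBUG_PORTS; no caller has to change.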

[TestCPU0]
Then the CPU is tested. So far in my whole life I only once ran into a CPU, a 65816, that was partly broken. But I included this test, not only because the IBM BIOS and the Landmark Diagnostic ROM do it, but also because if it doesn't help, it certainly doesn't hurt.
Is the test complete? Certainly not. Theoretically it could be that only the instruction "XOR BP,[1234]" is broken. I just checked and I don't use this instruction at all in the source code. So if everything else is fine, every diagnostic run will turn out fine and yet the user could experience problems when running programs under an operating system.
I had the idea of expanding this ROM with a part that tests every instruction of the CPU, but an ASM file that contained all possible opcodes led to a BIN of 40 KB. I unfortunately see no way to squeeze that into an 8 KB ROM.

How are we informed that the CPU is bad? The only way so far is to look at the LED displays of the diagnostic card or the one attached to the LPT port. But with a broken CPU we cannot be sure if even that little information is displayed. If the display does indeed show "80", we are more or less in luck.
But what can we do if a bad CPU is found? Then the Diagnostic ROM will stop the test right here anyway. Why continue the test with a bad processor?
Two remarks:
- A bad EPROM could cause this error as well. But when the used ROM works fine in another system, we can assume the CPU is bad.
- Remember the remark about broken glue logic I made above: a bad part can also make the CPU look bad although in fact it is not. Only exchanging the CPU with one coming from a good board can tell what is going on.
So for the rest of this document: if an error is found, keep in mind that other parts can be blamed for the error than the one that is mentioned.

[CheckSum]
As said above, a bad EPROM could result in an incorrect test, not only the CPU test, but any test. Therefore a checksum is calculated. To make sure that the EPROM itself is fine, first test it in a working system (if possible).
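
A minimal sketch of such a check, in Python. It assumes the common PC convention that all bytes of the ROM image add up to zero modulo 256; the text above does not say which checksum the ROM actually uses, and the names rom_checksum and seal_image are made up for this example.

```python
def rom_checksum(image):
    """Return the 8-bit sum of all bytes in the ROM image."""
    return sum(image) & 0xFF


def seal_image(image):
    """Return a copy whose last byte is chosen so the 8-bit sum becomes zero."""
    body = bytes(image[:-1])
    fix = (-sum(body)) & 0xFF
    return body + bytes([fix])


rom = seal_image(bytes(range(256)) * 32)   # 8 KB of dummy data
damaged = bytearray(rom)
damaged[100] ^= 0x04                        # flip a single bit

print("intact image passes: ", rom_checksum(rom) == 0)
print("damaged image passes:", rom_checksum(damaged) == 0)
```

A simple sum like this catches single bad bits, but it cannot catch every combination of errors.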

[InitIO]
This part initializes some I/O and disables the video for the moment.

[CheckVideoRAM]
In version 1 I used video memory to store several variables. The idea arose to use video memory as stack as well. But before it can be used, I want to be sure that it is all right. When everything is found in good order, the ROM continues with the next step.

[InitVideo]
This part initializes the video card that was found, clears the screen and disables the cursor.

[CounterRAM]
I need a counter for the number of passes and the number of errors found during those passes. The RAM on the motherboard cannot be used. First, any intensive test would wipe that data anyway. Second, in most cases it is DRAM, which needs to be refreshed by the hardware, and the ICs responsible for this have not been initialized yet. And during further runs the ICs will be tested again, which temporarily stops the refresh and causes the DRAMs to lose their data.
The Landmark Diagnostic ROM counts by increasing the numbers shown on the screen, i.e. increasing the numbers in the screen memory. The position of every counter is hardcoded, something that is not desirable. But I used the idea of accessing the screen memory directly in another way: I use the part of the memory that is not used for the screen. The MDA card has 4096 bytes of screen memory but only 80 * 25 * 2 = 4000 bytes are really needed, which leaves 96 bytes for our own purposes. So far I only needed 25 bytes, which eventually led to the idea of using the rest of it for the stack in version 2.
At the end of this part, all RAM is cleared, thus clearing all variables as well.
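
The arithmetic for the MDA card can be written out as a short Python sketch. The variable names are invented for this example, and the split into 25 bytes of variables followed by the stack is only an assumption about how the spare bytes could be laid out.

```python
COLUMNS, ROWS = 80, 25
BYTES_PER_CELL = 2          # character byte plus attribute byte
MDA_RAM_SIZE = 4096         # bytes of video RAM on the MDA card
VARIABLE_BYTES = 25         # bytes needed for counters and other variables

visible = COLUMNS * ROWS * BYTES_PER_CELL   # part that appears on the screen
spare = MDA_RAM_SIZE - visible              # part that is never displayed
stack_bytes = spare - VARIABLE_BYTES        # what remains for a small stack

print(f"visible screen memory: {visible} bytes")
print(f"first spare offset:    {visible:04X}h")
print(f"spare bytes:           {spare}")
print(f"left for the stack:    {stack_bytes}")
```

With only a few dozen bytes of stack, subroutine calls cannot be nested very deeply.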

[DiagLoop]
The part mentioned above is run only once; every next run will restart its diagnostic tests at this particular spot. Why is the above only run once? In the case of the counters the reason is clear: you can only initialize them once. Initializing (read: clearing) them every time makes no sense at all. Initializing the video only once makes sense as well: if it doesn't work right the first time, the user will have no screen and will know immediately that there is something wrong. The "what or why" will unfortunately remain a question because this ROM is meant to test a motherboard, not the hardware of a video card. And once the video runs, why tinker with something that runs fine?

The first thing the loop does is output the POST code and also display it in the upper right corner of the screen. This is new in version 2.
The next step is displaying the title of the ROM and its version, and giving a beep. If there is no text on the screen but you hear the beep, something is wrong. Bad video card, monitor or cable? I cannot tell you; unfortunately you are on your own.

[TestCPU2] and [CheckSumROM2]
Both the CPU and the checksum of the EPROM are tested again, but now with a corresponding text displayed on the screen. Why? ICs are known to malfunction the moment they become warmer. A good reason to have them tested every run.

[CheckTimers]
This is the start of testing the three timers of the 8253 Programmable Interval Timer. Before the test can start the DMA and speakers have to be disabled.

[Check8253]
This is the start of a loop where every individual timer is tested. Register DX determines which timer that will be. Whenever a test has passed or failed, a table is used to determine what message is output and what the next action should be.
Remark: The ROM halts the test if timer 1 fails. Without this timer, the refresh coming from the DMA IC won't happen.
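
The table-driven approach described above can be sketched in Python as follows. The table layout, the message texts and the action names are invented for this example; only the rule that a failing timer 1 halts the test comes from the text.

```python
# Hypothetical result table: one row per 8253 timer.
# Each row holds the name used in the message and the action after a failure.
TIMER_TABLE = {
    0: ("Timer 0", "continue"),
    1: ("Timer 1", "halt"),      # timer 1 is needed for the DRAM refresh
    2: ("Timer 2", "continue"),
}


def report(timer, passed):
    """Return (message, next_action) for one timer test result."""
    name, on_fail = TIMER_TABLE[timer]
    if passed:
        return f"{name} PASSED", "continue"
    return f"{name} FAILED", on_fail


for timer, passed in ((0, True), (1, False), (2, False)):
    message, action = report(timer, passed)
    print(f"{message:16} -> {action}")
```

Keeping messages and actions in one table means the test loop itself stays identical for all three timers.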

[Check8237DMA]
Testing the 8237 is a bit of a problem. The test that I perform is certainly no rocket science: just filling some registers with a value, read the value back and compare the read value with the original one. But a passed test does not mean the 8237 is fine. A nice running motor in a car is no guarantee that the car is fine if you have no means to test the gear box. That's why I mentioned the preference of having a FDC and drive with a floppy attached: if THAT test goes fine, we can be reasonably assured that the 8237 is OK.
If this particular test fails, the whole test is halted. No DMA means no refresh. No refresh means no RAM and in the real world a PC with a broken 8237 would stop as well; you first had to replace it before you could continue. Why should I limit myself? :)
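
The write, read back and compare principle can be sketched in Python with a stand-in for the chip. The class FakeRegisters, the function readback_test and the patterns 55h and AAh are assumptions for this example; the real 8237 registers are accessed differently, so this only models the principle.

```python
class FakeRegisters:
    """Stand-in for a chip with readable registers; stuck_low marks bits stuck at 0."""

    def __init__(self, count, stuck_low=0x00):
        self.cells = [0] * count
        self.stuck_low = stuck_low

    def write(self, index, value):
        self.cells[index] = value & 0xFF & ~self.stuck_low

    def read(self, index):
        return self.cells[index]


def readback_test(chip, count, patterns=(0x55, 0xAA)):
    """Write each pattern to every register and compare what comes back."""
    for pattern in patterns:
        for index in range(count):
            chip.write(index, pattern)
            if chip.read(index) != pattern:
                return False
    return True


print("good chip passes:        ", readback_test(FakeRegisters(8), 8))
print("chip with stuck bit 0:   ", readback_test(FakeRegisters(8, stuck_low=0x01), 8))
```

As the text says, a pass here only shows that the registers hold their values, not that the chip can actually perform a transfer.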

[InitRefresh]
In this part of the code all actions are taken to enable the refresh of the DRAM.

[Check8255Par]
Check if a parity error occurs even if the parity check has been disabled. Nothing more, nothing less.

[Check2KbRAM]
Now we check the first 2 KB of RAM in a very intensive way. That is done in two phases. The first phase is making sure that all parity RAMs have been filled correctly. That is done by:
- enabling the parity.
- writing the value 00 into the first 2 KB of RAM.
- reading these 2 KB. Just reading, not checking the read values.
- writing the value 7F into the first 2 KB of RAM for a different value of parity.
- reading these 2 KB again.
- clearing the parity check bits.
- enabling the parity again.
The second phase is the actual testing of the RAM:
- fill the 2 KB of RAM with one of the four test values: 55h, 0FEh, 0AAh or 01h. Using these values makes sure that both every bit and the parity are tested. Only using 55h and 0AAh as Landmark does is not enough IMHO: the parity does not change.
- wait quite a bit, if the refresh doesn't work, the next compare will fail.
- read and compare the read value with the original one. If not the same, quit the test.
- check the parity. If wrong, quit the test.
- get the next test value and loop.
If these tests fail, the whole test is halted: no RAM means no stack and interrupt routines. In this case the offending address and the bad bits are shown. This information should give the user enough information to replace the offending DRAM IC(s).
If these tests run fine, that means I have 1 KB available for the interrupt routines.

[CheckAllRAM]
This routine checks the rest of the RAM. My previous versions checked the RAM by testing every byte one by one. But checking all of the RAM in this way costs a lot of time. Using four instead of two values, like Landmark does, costs even more time. So I did an analysis and found the reason: the waiting time X needed to check if the refresh worked well. With 640 KB on board, doing this for every KB meant I had to wait 640 times X. Filling all 640 KB in one go and then waiting X meant that I only had to wait 1 times X. Now I could even afford two, three or more times X and still be much, much faster than the original routine. But testing 640 KB will still take some time and I needed a means to inform you of the progress.
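
The difference in waiting time can be put into numbers with a small Python sketch. The function total_wait is invented for this example, and it assumes that the wait is repeated once for each of the four test values.

```python
def total_wait(kilobytes, test_values, wait, per_kilobyte):
    """Return the total time spent waiting, in the same unit as wait."""
    waits_per_value = kilobytes if per_kilobyte else 1
    return waits_per_value * test_values * wait


X = 1  # one refresh wait, in arbitrary units
old = total_wait(640, 4, X, per_kilobyte=True)
new = total_wait(640, 4, X, per_kilobyte=False)
generous = total_wait(640, 4, 3 * X, per_kilobyte=False)

print(f"wait per KB:            {old} X")
print(f"wait once per value:    {new} X")
print(f"three times longer wait: {generous} X")
```

Even with a wait three times as long, the total stays far below the per-KB approach.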

The first thing the test does is check how much RAM is on the board. It starts by checking the first byte of the last possible segment (09FFF:0000h) with the same four values mentioned above, without any pause for the refresh. If this little test fails, it will subtract 0400h (16 KB) from the segment value and test the next byte in the same way. If, for whatever reason, even the test for the very first 16 KB segment (03FF:0000h) fails, the diagnostic ROM will still assume that at least 16 KB is present.
Why steps of 16 KB and the above assumption? This has to do with the very first IBM PC, which has four banks of 16 KB each on board. In this way I'm sure that I test at least one byte of each bank. If the test also fails for the first 16 KB, then it simply means that a part of the DRAM is bad.
When a working byte is found, the actual segment will be converted to a decimal number that represents the number of KBs on this board. This number will be the first possible clue to the user if there is something wrong with the RAM.
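
The search and the conversion to KB can be sketched in Python. The names find_ram_size_kb and installed are invented for this example, the probe is simulated, and the conversion (segment + 1) * 16 / 1024 is an assumption about how the number is derived from the segment value.

```python
TOP_SEGMENT = 0x9FFF      # last paragraph below 640 KB
STEP = 0x0400             # 0400h paragraphs of 16 bytes each = 16 KB
LOWEST_SEGMENT = 0x03FF   # last paragraph of the first 16 KB


def find_ram_size_kb(byte_works):
    """Step down in 16 KB blocks until byte_works(segment) reports working RAM."""
    segment = TOP_SEGMENT
    while segment > LOWEST_SEGMENT and not byte_works(segment):
        segment -= STEP
    # At LOWEST_SEGMENT the result is 16 KB, even if that probe failed as well.
    return (segment + 1) * 16 // 1024


def installed(kilobytes):
    """Build a simulated probe for a board with this much working RAM."""
    limit = kilobytes * 1024
    return lambda segment: segment * 16 < limit


for size in (640, 256, 64, 0):
    print(f"{size:3} KB installed -> {find_ram_size_kb(installed(size))} KB reported")
```

Note that a board without any working RAM is still reported as 16 KB, matching the assumption described above.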

The next step is done in the same way as the test of the 2 KB of RAM above. The only difference is that not just 2 KB is filled and tested but all the RAM that was found. As I already mentioned, I wanted to show the progress of the complete test. But it should be done in a way that costs as little time as possible.
I start with displaying the number 0, 1, 2 or 3 in the first column of the next line. The number represents the test value that is used to test the RAM. Next for every 16 KB of RAM that is filled with this value, a dot is displayed on the screen. When checking the read value, the dot is removed again for every checked 16 KB.
In case no error is found, a message will mention this. But in case an error is found, a message will mention this as well, plus the offending address and bad bits.

[CheckPIC8259]
Check the 8259 Programmable Interrupt Controller (PIC). This part only checks if the registers can be written and read correctly.

[CheckBadIRQ]
This part checks if the 8259 sees an interrupt where there isn't one. The vector table entries for interrupts 8..15 are all loaded with the same pointer to an interrupt routine. The ROM waits for some time and then checks if an interrupt has been noticed. If nothing has been noticed, things are fine.

[CheckINT0]
Now the 8259 is configured to recognize interrupts coming from timer 0 of the 8253. Again the ROM waits for some time and again it checks if an interrupt has been noticed. If it has been detected, things are fine. If not, either the 8253 or the 8259 can be blamed. Or even both ICs.

[CheckNMI]
This part checks the Non Maskable Interrupt. The only thing that can be done is disabling the memory parity and waiting. If nothing happens, then things are fine. Unfortunately there is no way to test if the NMI does recognize interrupts. On a standard PC the NMI is only used to detect parity errors. As these don't occur on a well-working PC, that is the end of the story.

[Check8087]
Check if an 8087 co-processor is present and display on the screen whether or not this is the case. Unfortunately I have no idea how to test the 8087; I don't know how to program it yet. I hope that this will change one day.

[CheckKeyboard]
Check if a keyboard is present. After enabling the keyboard, an interrupt should be generated and the keyboard should send the code "0AAh".

[CheckKeybScan]
Check for unexpected interrupts caused by the keyboard. The most logical reason for these interrupts is one or more stuck keys.

[CheckFDC]
First check if there is an FDC card present, i.e. program the 765 IC and check if it acts as it should.

[ChkReadFloppy]
Then check if the floppy can be read. What you should see at least is that the motor starts to spin and the head is moved to track zero. Tip: move the head off track zero before the PC is powered on (if possible). After the disk has been read, the motor is stopped.

[CheckExtraROMs]
Check for extra BASIC ROMs, if present. It can be that there is a ROM in the socket but that the checksum turns out to be wrong. In that case the PC doesn't consider the ROM to be present.
Remark: the diagnostic ROM does not test ROMs in the 0Cxxxxh and 0Dxxxxh segment.

[ReadSwitches]
Read the on board dip switch settings. The program uses a trick to distinguish between an IBM PC with two blocks of dip switches and the IBM XT with just one block. There is no guarantee that this piece of code will work for other non-IBM PCs.

[SwitchPC] and [SwitchXT]
Display the settings of the dip switch(es) of either the PC or the XT.

[DispTotalCount]
Display the number of loops so far.


The used subroutines

[OutBeeep]
Output data to the diagnostic ports and give a beep.

[Beeep]
Let the PC speaker beep for a moment.

[CalcSpotScreen]
Calculate the spot on the screen where to place some text.

[CheckForINT]
Check if an interrupt has occurred.

[DispCritical]
Display a critical error message on the screen.

[DisplayCorner]
Output a debug code to the last positions of the first line.

[DisplayFailed]
Display the text "FAILED" behind the message pointed to by register DX.

[DisplayFailedA]
Display the text "FAILED" behind the message pointed to by register DX and remove the arrow in front of the text.

[DisplayPassed]
Display the text "PASSED" behind the message pointed to by register DX.

[DisplayPassedA]
Display the text "PASSED" behind the message pointed to by register DX and remove the arrow in front of the text.

[DisplayValue]
Display the decimal value in the registers DH/DL/BL on the video screen.

[DispSecondMsg]
Display a status message behind a previous one.

[AL2DEC]
Convert the content of AL to a decimal value.

[AX2DEC]
Convert the content of AX to a decimal value.
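
A conversion like this is commonly done by repeated division by 10. The Python sketch below shows that technique for a 16-bit value; it is not the ROM's routine, and whether the ROM prints leading zeros is not stated here, so the fixed five digits are an assumption.

```python
def ax2dec(value):
    """Convert a 16-bit value to five ASCII decimal digits by repeated division by 10."""
    value &= 0xFFFF
    digits = []
    for _ in range(5):                      # 65535 has five digits at most
        value, remainder = divmod(value, 10)
        digits.append(chr(ord("0") + remainder))
    return "".join(reversed(digits))        # remainders come out last digit first


for number in (0, 640, 65535):
    print(number, "->", ax2dec(number))
```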

[DoChecksum]
Perform a checksum on a block of 8 KB of ROM. This test is only meant to be used for the BASIC ROM. It cannot be used for ROMs in the 0Cxxxxh and 0Dxxxxh segment (which isn't tested anyway).

[DoChecksumRes]
Run the above subroutine and display the results.

[GetErrorValue]
Retrieve the error counter value from the invisible screen memory.

[ProgramFDC]
Wait for the 765 to be ready and program diskette command/data register 0.

[ReadDataFDC]
Read data from the 765 floppy controller.







Have questions or comments? Do you want more information?
You can email me here.