The Second Book Of Machine Language

Chapter 1
How to Use This Book

The dual nature of this book (it's both a text and a program) offers you a choice. You can follow the ideas: reading through the chapters, studying the program listings, and deepening your understanding of machine language programming.
Alternatively, you can type in the LADS assembler and experiment with it: learning its features, trying out modifications, and using it to write your own machine language programs. Appendix A describes how to use the assembler and Appendix B provides instructions on typing it in. If you choose this second approach, the rest of the book can serve as a reference and a map for modifying the assembler. The tutorials can also help to clarify the structure and purpose of the various subroutines and subprograms.
LADS is nearly 5K long, and for those who prefer not to type it in, it can be purchased on a disk by calling COMPUTE! Publications toll free at 1-800-334-0868. Be sure to state whether you want the Commodore, Atari, or Apple disk. The disk contains both the LADS source and object code (these terms are defined below). To create customized versions of the assembler, you will need the source code. It, too, can be typed in (it is printed in sections at the end of Chapters 2-9). If you don't type in any of the comments, it is roughly 10K long. The Commodore disk contains the various PET/CBM (Upgrade and 4.0 BASIC), VIC, and Commodore 64 versions.

Definitions
There are several concepts and terms which will be important to your understanding of the rest of the book.
ML programming, and programming in general for that matter, is a new discipline, a new art. There are few rules yet and few definitions. Words take on new meanings and are sometimes used haphazardly. For example, the word monitor means two entirely different things in current computerese: (1) a debugging program for machine language work or (2) a special TV designed to receive video signals from a direct video source like a computer.
Since there is no established vocabulary, some programming ideas are described by an imprecise cluster of words. When applied to machine language programming, the terms pointer, variable, register, vector, flag, and constant can all refer to the same thing. There are shades of difference developing which distinguish between these words, but as yet, nothing has really solidified. All these terms refer, in ML parlance, to a byte or two which the programmer sets aside in the source code. In BASIC, all these terms would be covered by the word variable.

Loose Lingo
Purists will argue that each of these words has a distinct, definable meaning. But then purists will always argue. The fact is that computing is still a young discipline and its lingo is still loose.
Some professors of BASIC like to distinguish between variables and constants, the latter meaning unchanging definitions like SCREEN = 1024. The address of the start of screen RAM is not going to vary; it's a constant.
In BASIC, something like SCORE = 10 would be a variable. The score might change and become 20 or whatever. At any rate, the word SCORE will probably vary during the execution of the program. In ML, such a variable would be set up as a two-byte reserved space within the source code:

100 SCORE .BYTE 0 0

Then, anytime you ADC SCORE or ADC SCORE+1 and store the result back, you will add to the SCORE. That's a variable. The word pointer refers to those two-byte spaces in zero page which are used by Indirect Y addressing (like LDA (155),Y) and which serve to point to some other address in memory.
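To make the two-byte variable concrete, here is a minimal sketch in Python (not LADS code) of adding to a value stored low byte first, the way the 6502 keeps it. The names memory, SCORE, add_to_score, and read_score, and the address chosen for SCORE, are hypothetical and exist only for this illustration.

```python
# A two-byte variable stored low byte first, as the 6502 expects.
# All names and the address below are assumptions for illustration.

SCORE = 0x0400                # assumed address of the two reserved bytes
memory = bytearray(65536)     # stand-in for the computer's 64K of RAM


def add_to_score(amount):
    """Add a 0-255 amount to the 16-bit value at SCORE and SCORE+1."""
    total = memory[SCORE] + amount        # like CLC: LDA SCORE: ADC #amount
    memory[SCORE] = total & 0xFF          # like STA SCORE
    carry = total >> 8                    # 1 if the low byte overflowed
    # like LDA SCORE+1: ADC #0: STA SCORE+1
    memory[SCORE + 1] = (memory[SCORE + 1] + carry) & 0xFF


def read_score():
    """Combine the low and high bytes into one number."""
    return memory[SCORE] + 256 * memory[SCORE + 1]


add_to_score(200)
add_to_score(100)
print(read_score())                       # 300
print(memory[SCORE], memory[SCORE + 1])   # 44 1
```

The second addition overflows the low byte, so the carry moves into the high byte: 44 + 256 * 1 = 300.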
Register usually means the X or Y or Accumulator bytes within the 6502 chip itself. As generally used, the word register refers to something hard wired within the computer: a circuit which, like memory, can hold information. It can also refer to a programmer-defined, heavily used, single-byte variable within an ML program:

100 TEMP .BYTE 0

A vector is very much like a pointer. It stores a two-byte address but can also include the JMP instruction, forming a three-byte unit. If you have a series of vectors, it would be called a "jump table," and the Kernal in Commodore computers is such a table:

FFD2 JMP $F252
FFD5 JMP $A522
FFD8 JMP $11095

Thus, if you JSR $FFD2, you will bounce off the JMP into $F252, which is a subroutine ending in RTS. The RTS will send you back to your own ML code where you JSRed to the JMP table. That's because JMP leaves no return address, but JSR does.
A flag is a very limited kind of variable: It generally has only two states, on or off. In LADS, PRINTFLAG will send object code (defined below) to the printer if the flag holds any number other than zero. If the PRINTFLAG is down, or off, and holds a zero, nothing is sent to the printer. The word flag comes from the Status Register (a part of the internals of the 6502 chip). The Status Register is one byte, but most of the bits in that byte represent different conditions (the current action in an ML program resulted in a negative, a zero, a carry, an interrupt, decimal mode, or an overflow). The bits in the Status Register byte are, themselves, individual flags. ML programmers, however, usually devote an entire byte to the flags they use in their own programs. Whole bytes are easier to test.
Source code is what you type into the computer as ML instructions and their arguments:

100 * = 864
110 LDA #$0F ; THIS WILL PUT A 15 ($0F) INTO THE ACCUMULATOR
120 INY ; THIS RAISES THE Y REGISTER

After you type this in, you assemble it by turning control over to the LADS assembler after naming this as the source code. The result of the assembly is the object code. If you have the .S pseudo-op on, causing the object code to print to the screen, you will see:

110 0360 A9 0F   LDA #$0F   ; THIS WILL PUT A 15 ($0F) INTO THE ACCUMULATOR
120 0362 C8   INY    ; THIS RAISES THE Y REGISTER

Properly speaking, the object code is the numbers which, taken together, form a runnable ML program. These numbers can be executed by the computer since they are a program. In the example above, the object code is A9 0F C8. That's the computer-understandable version of LDA #$0F: INY. It's generated by the assembler. An assembler translates source code into object code.
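The translation from source code to object code can be sketched in a few lines of Python. This is only an illustration that knows two instructions; it is not how LADS itself is written, and the names OPCODES and assemble are hypothetical. The opcode values $A9 (LDA immediate) and $C8 (INY) are the standard 6502 ones.

```python
# A toy translator for exactly two 6502 instructions.
# OPCODES and assemble are hypothetical names used for illustration.

OPCODES = {("LDA", "immediate"): 0xA9, ("INY", "implied"): 0xC8}


def assemble(lines, origin):
    """Return a list of (address, object bytes) pairs, one per source line."""
    address = origin
    listing = []
    for line in lines:
        code = line.split(";")[0].split()   # drop the comment, split into words
        mnemonic = code[0]
        if len(code) > 1 and code[1].startswith("#$"):
            # immediate mode: opcode followed by a one-byte hex argument
            obj = [OPCODES[(mnemonic, "immediate")], int(code[1][2:], 16)]
        else:
            obj = [OPCODES[(mnemonic, "implied")]]
        listing.append((address, bytes(obj)))
        address += len(obj)
    return listing


source = ["LDA #$0F ; PUT 15 INTO THE ACCUMULATOR", "INY ; RAISE THE Y REGISTER"]
for address, obj in assemble(source, 864):
    print(f"{address:04X} {obj.hex(' ').upper()}")
```

Starting at address 864 ($0360), this prints 0360 A9 0F and 0362 C8, matching the addresses and object bytes in the listing above.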
A complex assembler like LADS allows the programmer to use labels instead of numbers. This has several advantages. But it does require that the assembler pass through the source code twice. (Each trip an assembler makes through the source code is called a pass.) The first time through, the assembler just gathers all the label names and assigns a numeric value to each label. Then, the second time through the source code, the assembler can fill in all the labels with the appropriate numbers. It doesn't always know, the first time through, what every label means. Here's why:

100 LDA 4222
110 BEQ NOSCORE
120 JMP SOMESCORE
130 NOSCORE INX:JMP CONTINUE
140 SOMESCORE INY
150 CONTINUE LDA 4223

As you can see, the first time the assembler goes through this source code, it will come upon several labels that it doesn't yet recognize. When the assembler is making its first pass, the labels NOSCORE, SOMESCORE, and CONTINUE have no meaning. They haven't yet been defined. They are address-type labels. That is, they stand for a location within the ML program to which JMPs or branches are directed. Sometimes those jumps and branches will be forward in the code, not yet encountered.
The assembler is keeping track of all the addresses as it works its way through the source code. But labels cannot be defined (given their numeric value) until they appear. So on the first pass through the source code, the assembler cannot fill in values for things like NOSCORE in line 110. It will do this the second time through the source code, on the second pass. The first pass has a simple purpose: The assembler must build an array of label names and their associated numeric values. Then, on the second pass, the assembler can look up each label in the array and replace label names (when they're being used as arguments like LDA NAME) with their numeric value. This transforms the words in the source code into numbers in the object code and we have a runnable ML program. Throughout this book, we'll frequently have occasion to mention pass 1 or pass 2.
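The two passes can be sketched in Python using the six-line example above. This is a simplified illustration, not the LADS implementation: the source is already split into (label, mnemonic, argument) tuples, the multiple-statement line is written as two entries, only the five instructions in the example are known, and the names SIZES, OPCODES, SOURCE, pass_one, and pass_two are hypothetical. The origin of 864 is also an assumption, borrowed from the earlier example.

```python
# Pass 1 collects label addresses; pass 2 fills them in.
# Absolute-mode LDA and JMP take 3 bytes, BEQ takes 2, INX and INY take 1.

SIZES = {"LDA": 3, "JMP": 3, "BEQ": 2, "INX": 1, "INY": 1}
OPCODES = {"LDA": 0xAD, "JMP": 0x4C, "BEQ": 0xF0, "INX": 0xE8, "INY": 0xC8}

SOURCE = [
    (None, "LDA", "4222"),
    (None, "BEQ", "NOSCORE"),
    (None, "JMP", "SOMESCORE"),
    ("NOSCORE", "INX", None),
    (None, "JMP", "CONTINUE"),
    ("SOMESCORE", "INY", None),
    ("CONTINUE", "LDA", "4223"),
]


def pass_one(source, origin):
    """Build the table of label names and their addresses."""
    labels, address = {}, origin
    for label, mnemonic, _ in source:
        if label:
            labels[label] = address
        address += SIZES[mnemonic]
    return labels


def pass_two(source, origin, labels):
    """Emit object code, replacing each label with its numeric value."""
    code, address = bytearray(), origin
    for _, mnemonic, operand in source:
        code.append(OPCODES[mnemonic])
        if operand is not None:
            value = int(operand) if operand.isdigit() else labels[operand]
            if mnemonic == "BEQ":
                # branches store a one-byte distance, not a full address
                code.append((value - (address + 2)) & 0xFF)
            else:
                code += bytes([value & 0xFF, value >> 8])   # low byte first
        address += SIZES[mnemonic]
    return bytes(code)


labels = pass_one(SOURCE, 864)
print({name: hex(value) for name, value in labels.items()})
print(pass_two(SOURCE, 864, labels).hex(" ").upper())
```

On pass 1, NOSCORE in the BEQ line is simply skipped over; by pass 2 the table already holds its address, so the branch distance can be computed.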

The Two Kinds of Labels
There are two kinds of labels in ML source code: equate and address labels. Equate labels are defined in essentially the same way that variables are defined in BASIC:

100 INCOME = 15000

This line could appear, unaltered, in LADS or in a BASIC program. (Remember this rule about labels: Define your equate labels at the start of the source code. The LADS source code shows how this is done. The first part of LADS is called Defs and it contains all the equate definitions. This is not only convenient and good programming practice; it also helps the assembler keep things straight.)
The other kind of label is not found in BASIC. It's as if you can give a name to a line. In BASIC, when you need to branch to a subroutine, you must:

10 GOSUB 500
.
.
500 (the subroutine sits here)

that is, you must refer to a line number. But in LADS, you give subroutines names:

10 JSR RAISEIT; GOSUB TO THE RAISE-THE-Y-REGISTER-SUBROUTINE
.
.
500 RAISEIT INY; THE SUBROUTINE WHICH RAISES Y
510 RTS

This type of label, which refers to an address within the ML program (and is generally the target of JSR, JMP, or a branch instruction), is called an address-type label, or sometimes a PC-type label. (PC is short for Program Counter, the variable within the 6502 chip which keeps track of where we are during execution of an ML program. In LADS, we refer to the variable SA as the Program Counter: SA keeps track, for LADS, of where it is during the act of assembling a program.)
Subprogram is a useful word. LADS source code is written like a BASIC program, with line numbers and multiple-statement lines, and it's written in a BASIC environment. The source code is saved and loaded as if it were a BASIC program. But if you are writing a large ML program, you might write several of these source code "programs," saving them to disk separately, but linking them with the .FILE and .END pseudo-ops into one big chain of source programs. This chain will be assembled by LADS into a single, large, runnable ML object program.
Each of the source programs, each link in this chain, is called a subprogram. In the source code which makes up LADS there are 13 such subprograms (from Defs to Tables) comprising the whole of LADS when assembled together. This book is largely a description of these subprograms, and some chapters are devoted to the explication of a single subprogram. To distinguish subprograms from subroutines and label names, the subprogram names (like Tables) have only their first letter capitalized. Subroutines and labels are all-caps (like PRINTFLAG).
The word integer means a number with no fraction attached. In the number 10.557, the integer is the 10 since integers have no decimal point. They are whole numbers. ML programs rarely work with anything other than integers. In fact, the integers are usually between 0 and 65535 because that's a convenient range within which the 6502 chip can operate: two bytes can represent this range of numbers. Of course, decimal fractions are not allowed. But virtually anything can be accomplished with this limitation. And if you need to work with big or fractional numbers, there are ways.
In any case, when we refer to integer in this book, we mean a number that LADS can manipulate, in a form that LADS can understand, a number which is a number and not, for example, a graphics code. For example, when you write LDA 15 as a part of your source code, the computer holds the number 15 in ASCII code form. In this printable form, 15 is held in the computer as the numbers $31 $35 which, when printed on the screen, provide the characters 1 and 5 (but not the true number 15). For the assembler to work with this 15 as the number 15, it must be transformed into a two-byte integer, an actual number. When translated, and put into two bytes (low byte first), the characters 1 5 become: $0F $00. We'll see what this means, and how the translation is accomplished, in Chapter 5 where we examine the subprogram Valdec. It's Valdec's job to turn ASCII characters into true numbers.
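The idea of turning ASCII digits into a two-byte integer can be sketched in Python. This shows one common approach (multiply the running total by 10 and add each digit); it is not a description of how Valdec itself works, which is covered in Chapter 5. The function name ascii_decimal_to_bytes is hypothetical.

```python
def ascii_decimal_to_bytes(text):
    """Turn a string of ASCII digits into a (low byte, high byte) pair."""
    value = 0
    for ch in text:
        code = ord(ch)                     # "1" is $31, "5" is $35
        if not 0x30 <= code <= 0x39:
            raise ValueError("not a decimal digit: " + repr(ch))
        value = value * 10 + (code - 0x30)
    if value > 0xFFFF:
        raise ValueError("number will not fit in two bytes")
    return value & 0xFF, value >> 8        # low byte first, as the 6502 wants


print([hex(b) for b in ascii_decimal_to_bytes("15")])     # ['0xf', '0x0']
print([hex(b) for b in ascii_decimal_to_bytes("4222")])   # ['0x7e', '0x10']
```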

The Seventh Bit (Really the Eighth)
For most of human history, we had to get along without the 0. It was a great leap forward for mankind when calculations could include the concept of nothing, zero. But now there's another mental leap to be made, a private adjustment to the way that computers use zero: They often start counting with a zero, something humans never do.
Imagine you are driving along and you've been told that your friend's new house is the third house in the next block. You don't say "house zero, house one, house two, house three." It makes no sense (to us) to say "house zero." We always count up from 1.
But the computer often starts counting from zero. In BASIC, when you DIM A(15) to dimension an array, it's easy to overlook the fact that you've really DIMed 16 items: the computer has created a zeroth item in this array.
It's sometimes important to be aware of this quirk. A number of programming errors result from forgetting that unnatural (or at least, nonhuman) zeroth item.
This situation has resulted in an unfortunate way of counting bits within bytes. It's unfortunate in two ways: Each bit is off by 1 (to our way of thinking) because there is a zeroth bit. And, to make things even tougher on us, the bits are counted from right to left. Quite a perversity, given that we read from left to right. Here's a diagram of the Status Register in the 6502 chip, each bit representing a flag:

     7 6 5 4 3 2 1 0 (bit number within the Status Register byte)
     N V - B D I Z C (flag name)

As a brief aside, let's quickly review the meanings of these flags. The flag names in the Status Register reflect various possible conditions following an ML event. For example, the LDA command always affects the N and Z flags. If you LDA #0, the Z flag will go up, showing that a zero resulted (but the N flag will go, or stay, down since the seventh bit isn't set by a zero). Here's what the individual flags mean: N (negative result), V (result overflowed), - (unused), B (BRK instruction used), D (decimal mode), I (interrupt disable), Z (result zero), C (carry occurred).
But in addition to the meanings of these flags in the Status Register, notice how bytes are divided into bits: count right to left, and start counting from the zeroth bit.
This is relevant to our discussion of LADS when we refer to bit 7. This bit has a special importance because it can signify several things in ML.
If you are using signed arithmetic (where numbers can be positive or negative), bit 7 tells you the sign of the number you're dealing with. In many character codes, a set (up) seventh bit will show that a character is shifted (that it's F instead of f). On the Atari, it means that the character is in inverse video. Whatever the convention, a set seventh bit usually signifies something.
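The bit numbering and the role of bit 7 can be sketched in Python. Bit 7 is the leftmost bit of the byte and has the value 128 ($80). The function names flags_after_load and as_signed are hypothetical; the sketch only models how a loaded value would set the N and Z flags and how the same byte reads as a signed number.

```python
def flags_after_load(value):
    """Return the N and Z flags that a load of this byte would produce."""
    value &= 0xFF
    return {"N": (value >> 7) & 1,      # N copies bit 7, the leftmost bit
            "Z": int(value == 0)}       # Z goes up only for a zero


def as_signed(value):
    """Read a byte as signed: bit 7 set means the number is negative."""
    return value - 256 if value & 0x80 else value


print(flags_after_load(0))       # {'N': 0, 'Z': 1}
print(flags_after_load(0x80))    # {'N': 1, 'Z': 0}
print(as_signed(0x7F), as_signed(0xFF))   # 127 -1
```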
One common trick is to use bit 7 to act as a delimiter, showing when one data item has ended and another begins. Since the entire alphabet can easily fit into numbers which don't require the seventh bit up (any number below 128 leaves the seventh bit down), you can set up a data table by "shifting" the first character of each data item to show where it starts. The data can later be restored to normal by "lowering" the shifted character. Such a table would look like this:

FirstwordSecondwordAnotherwordYetanother.

BASIC stores a table of all its keywords in a similar fashion, except that it shifts the final character of each word (enDstoPgotOgosuBinpuT...). Either way, shifted characters can be easily tested during a search, making this an efficient way to store data. Just be sure to remember that when we refer to the seventh bit, we're talking about the leftmost bit.
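Here is a minimal Python sketch of the delimiter trick, using the variant that marks the final character of each word. The names pack and unpack are hypothetical, and the sketch assumes plain ASCII words of at least one character; it illustrates the technique, not the actual table inside any BASIC ROM.

```python
def pack(words):
    """Join words into one table, setting bit 7 on each word's last character."""
    table = bytearray()
    for word in words:
        data = word.encode("ascii")
        table += data[:-1]
        table.append(data[-1] | 0x80)   # raise bit 7 to mark the end of the word
    return bytes(table)


def unpack(table):
    """Split the table back into words by testing bit 7 of every byte."""
    words, current = [], bytearray()
    for byte in table:
        current.append(byte & 0x7F)     # lower bit 7 to restore the character
        if byte & 0x80:                 # a set bit 7 means the word ends here
            words.append(current.decode("ascii"))
            current = bytearray()
    return words


table = pack(["END", "STOP", "GOTO"])
print(table.hex(" ").upper())
print(unpack(table))                    # ['END', 'STOP', 'GOTO']
```

No separator bytes are needed, so the table is exactly as long as the words themselves.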

Springboard
In the 6502 chip instruction set, there aren't any instructions for giant branches. Some chips allow you to branch thousands of bytes away, but our chip limits us to about 127 bytes in either direction from the location of the branch (the distance is stored as a single signed byte, -128 to +127, counted from the instruction that follows the branch). Normally, this isn't much of a problem. You JSR or JMP when you want to go far away.
But as you assemble, you'll be making tests with BNE and BEQ and their cousins in the B group. Then, later, you'll add some more pieces of programming between the branch instruction and its target. Without realizing it, you'll have moved the target too far away from the branch instruction. It will be a branch out of range.
This is pretty harmless. When you assemble it, LADS will let you know. It will print a bold error message, print the offending line so you can see where it happened, and even ring a bell in case you're not paying attention. What can you do, though, when you have branched out of range? Use a springboard.
The easiest and best way to create a giant branch is this:

100 LDA 15
110 BEQ JTARGET
.
170 JTARGET JMP TARGET; THIS IS THE SPRINGBOARD
.
.
930 TARGET INY ; HERE IS OUR REAL DESTINATION FROM LINE 110

When you get a BRANCH OUT OF RANGE ERROR message, just create a false target. In LADS, the letter J is added to the real target name to identify these springboards (see line 170 above). All a springboard does is sit somewhere near enough to the branch to be acceptable and JMP to the true target. It's like a little trampoline whose only purpose is to bounce the program to the true destination of the branch.
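The range test behind this error can be sketched in Python. On the 6502 a branch stores a one-byte signed distance measured from the instruction after the branch, so the legal range is -128 to +127. The function name branch_offset and the addresses used are hypothetical; the error text simply echoes the message mentioned above and is not taken from the LADS source.

```python
def branch_offset(branch_address, target_address):
    """Return the one-byte distance for a branch, or raise if it cannot fit."""
    # the distance is counted from the byte after the two-byte branch
    offset = target_address - (branch_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError("BRANCH OUT OF RANGE")
    return offset & 0xFF                # negative distances wrap into one byte


print(hex(branch_offset(0x0363, 0x0368)))   # 0x3, a short forward branch
print(hex(branch_offset(0x0370, 0x0360)))   # 0xee, a backward branch (-18)

try:
    branch_offset(0x0363, 0x0500)           # the real target is too far away
except ValueError as error:
    print(error)                            # BRANCH OUT OF RANGE

# A springboard placed nearby is within reach, and its JMP can go anywhere.
print(hex(branch_offset(0x0363, 0x03A0)))   # 0x3b
```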
One final note: To make it easy to locate programming explanations in the text of this book, all line numbers are in boldface. Most of the chapters in the book cover a single major subprogram. At the end of a chapter is the appropriate source code listing. It is these listings to which the boldface line numbers refer.
Now, let's plunge into the interior of the LADS assembler. We'll start with the equate labels, the definitions of special addresses within the computer.
