# Chapter Ten Assembly Language Math

As an Atari assembly language programmer, you probably won't ever have to write many, if any, ultrasophisticated, multiprecision arithmetical programs. If you ever have to write a program that includes a lot of multiprecision math, your Atari can help you. It has a pretty powerful set of arithmetical programs, called Floating Point, or FP routines, built right into its operating system. The folks at Atari have taken care to provide you with the means of using these OS routines in your own assembly language programs. They've provided instructions on how to use the Atari FP package in a number of publications, including De Re Atari, a manual published by Atari for assembly language programmers.

Even if you don't want to use the FP package built into your Atari (and there are reasons not to; the routines are slow), you can find prewritten code for most kinds of sophisticated arithmetical operations, often called multiple precision binary operations, in a number of manuals on 6502 assembly language programming. One text that's packed with multiple precision programs that you can simply type into your computer and use is 6502 Assembly Language Subroutines, written by Lance A. Leventhal and Winthrop Saville and published by Osborne/McGraw Hill.

## Then Why Bother?

You may ask why we are bothering to include a chapter on advanced 6502 arithmetic in this volume. The answer: no matter how much help is available, you still have to know the principles of advanced 6502 arithmetic if you want to become a good assembly language programmer. So even though you may never have to write an assembly language routine that will perform long division on signed numbers, accurate to 17 decimal places, chances are pretty good that you'll eventually have to use some arithmetic operations in some programs.

Most assembly language programmers occasionally have to write an addition or subtraction routine, or a routine that will multiply or divide a pair of numbers, or a program that will deal with signed or BCD (Binary Coded Decimal) numbers. Logical operations, which are extensively used in 6502 programs, also fall under the heading of assembly language math. In this chapter, therefore, we'll be reviewing 8-bit and 16-bit binary addition, subtraction and multiplication, and also saying a few words about binary long division. We'll wind up the chapter with brief introductions to signed numbers and the BCD (Binary Coded Decimal) number system.

In the addition problem that you called from BASIC in Chapter 8, you saw how the carry bit of the processor status works in 16-bit addition operations. Now we're going to review the use of the carry bit in addition problems, and we're also going to take a look at how carries work in subtraction, multiplication and division problems.

## A Close Look at the Carry Bit

The best way to get a close look at how the carry bit works is to look at it through an "electronic microscope" at the bit level. Look at these two simple 4-bit addition problems, shown in both hexadecimal and binary form, and you'll see that neither one generates a carry.

```
HEXADECIMAL     BINARY

    04            0100
  + 01          + 0001
  ----          ------
    05            0101

    08            1000
  + 03          + 0011
  ----          ------
    0B            1011
```

Now let's look at a couple of problems that use larger (8-bit) numbers. The first of these two problems doesn't generate a carry, but the second one does.

```
    8E              1000 1110
  + 23            + 0010 0011
  ----            -----------
    B1              1011 0001

    8D              1000 1101
  + FF            + 1111 1111
  ----            -----------
   18C          (1) 1000 1100
```

Note that the sum in the second problem is a 9-bit number - 1 1000 1100 in binary, or 18C in hexadecimal notation. Here's an assembly language program that will perform that very same addition problem. Type it into your computer and run it, and you'll be able to see how the carry flag in your computer works:

### 8-BIT Addition With a Carry

```
10  *=$0600
20  CLD
30  CLC
40  LDA #$8D
50  ADC #$FF ;ADD $FF TO $8D; THE NINTH BIT GOES TO THE CARRY
60  STA $CB
70  RTS
```

When you've typed this program, assemble it and then run it by activating your assembler's debugger and using the "G" command. When the program has been executed, and while your debugger is still turned on, type the command "DCB" (for "Display memory location \$CB"). You should then see this kind of line displayed on your video screen:

```
00CB 8C 00 00 00 00
```

That line shows us that memory address \$CB now holds the number \$8C, the correct sum of the numbers we added, except for the carry. So where's the carry? Well, if what you've read in this book about the carry bit is true, it must be in the carry bit of your computer's P register. As our program is written now, there's no easy way to find out whether the carry bit from our addition operation has been dumped into the carry bit of the P register. But by adding a couple of lines to the program, and running it again, we can find out. Here's how to rewrite the program so we can check the carry bit:

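```
10  *=$0600
20  CLD
30  CLC
40  LDA #$8D
50  ADC #$FF
60  PHP ;SAVE THE P REGISTER RIGHT AFTER THE ADDITION
70  STA $CB
80  PLA ;PULL THE SAVED P REGISTER INTO THE ACCUMULATOR
90  AND #01 ;KEEP ONLY BIT 0, THE CARRY FLAG
100  STA $CC
110  RTS
```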

In this rewrite of our original program, we've used one new stack manipulation instruction: PHP. PHP means "PusH Processor status (P register) on stack." We've also used the AND operator introduced in Chapter 9. In addition, we've used one stack manipulation instruction that was introduced a couple of chapters ago: PLA, which means "PulL Accumulator from stack."

The instruction PHP is used in line 60 of our rewritten program. It appears there because we want to save the contents of the P register as soon as the numbers \$8D and \$FF have been added. We can use PHP without any fear that it will do anything terrible to our program, since it changes neither the P register nor the accumulator. It does move the stack pointer, but the PLA in line 80 moves it back.

When you run this program, the first thing it will do is add the literal numbers \$8D and \$FF. Before it stores the result of this calculation anywhere, however, it pushes the contents of the status register onto the stack, using the instruction PHP. When that operation is complete, the value in the accumulator (still the sum of \$8D and \$FF, with no carry), is stored in memory address \$CB. Next, in line 80, when almost everything else in the program has been done, the value that was pushed onto the stack by the PHP instruction back in line 60 is removed from the stack. Then, since the only flag in the P register that we're interested in is the carry flag (Bit 0), we have used an AND operation to mask out every bit of the number just pulled from the stack except Bit 0. Finally, the resulting number - which should be \$01 if our operations up to now have worked - is stored in memory address \$CC.

Now, at any time we like, we can peek into memory address \$CB and see what the result of the calculation in the program was (without a carry). Then we can peer into memory address \$CC and take a look at just what the status of the carry bit was just after we added the numbers \$8D and \$FF. So let's do it! Assemble the program, execute it using your debugger's "G" command, and then use the command "DCB" to take a look at the contents of memory address \$CB and the memory locations that follow. Here's what you should see:

```
00CB 8C 01 00 00 00
```

That line tells us two things: that memory address \$CB does hold the number \$8C, the result of our calculation without a carry, and that our addition of \$8D and \$FF did indeed set the carry bit of the processor's status register.
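If the carry bit is all you're after, there's a shorter way to capture it than pushing the whole P register onto the stack. The following variation is only a sketch, not a replacement for the program above. It assumes the same two addresses (\$CB for the sum and \$CC for the carry), and it relies on the fact that neither STA nor LDA changes the carry flag, so a ROL A can rotate the carry into bit 0 of a cleared accumulator.

```
10  *=$0600
20  CLD
30  CLC
40  LDA #$8D
50  ADC #$FF ;SUM IS $18C: A=$8C, CARRY SET
60  STA $CB ;STA DOES NOT CHANGE THE CARRY
70  LDA #0 ;NEITHER DOES LDA
80  ROL A ;CARRY ROTATES INTO BIT 0 OF A
90  STA $CC ;$CC NOW HOLDS $01
100  RTS
```

Run it with "G" and display \$CB with "DCB", and the first two bytes shown should again be 8C and 01.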

We will now take a look at a program that will add two 16-bit numbers. The same principles used in this program can also be used to write programs that will add numbers having 24 bits, 32 bits, and more. Here's the program:

### A MULTIPLE PRECISION ADDITION PROGRAM

```
10 ;
20 ;THIS PROGRAM ADDS A 16-BIT NUMBER IN $B0 AND $B1
30 ;TO A 16-BIT NUMBER IN $C0 AND $C1
40 ;AND DEPOSITS THE RESULTS IN $C2 AND $C3
50 ;
60  *=$0600
65 ;
70  CLD
80  CLC
90  LDA $B0 ;LOW HALF OF 16-BIT NUMBER IN $B0 AND $B1
100  ADC $C0 ;LOW HALF OF 16-BIT NUMBER IN $C0 AND $C1
110  STA $C2
120  LDA $B1 ;HIGH HALF OF 16-BIT NUMBER IN $B0 AND $B1
130  ADC $C1 ;HIGH HALF OF 16-BIT NUMBER IN $C0 AND $C1
140  STA $C3
150  RTS
```

When you look at this program, remember that your Atari computer stores 16-bit numbers in reverse order: low byte in the first address, and high byte in the second address. Once you understand that quirk, 16-bit binary addition isn't hard to comprehend. In this program, we first clear the carry flag of the P register. Then we add the low byte of a 16-bit number in \$B0 and \$B1 to the low byte of a 16-bit number in \$C0 and \$C1. The result of this calculation is then placed in memory address \$C2. If there is a carry, the P register's carry bit will be set automatically.

In the second half of the program, the high byte of the number in \$B0 and \$B1 is added to the high byte of the number in \$C0 and \$C1. If the P register's carry bit has been set as a result of the preceding addition operation, then a carry will also be added to the high bytes of the two numbers being added. Then the result of this half of our program will be deposited into memory address \$C3. When that operation is completed, the results of our addition problem will be stored, low byte first, in memory addresses \$C2 and \$C3.
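To show how the same idea stretches to 24 bits and beyond, here is a sketch that adds two 3-byte numbers in a loop. It is an illustration, not a tested library routine. The labels NUM1, NUM2, SUM and NBYTES and the addresses they stand for are assumptions made for this example, so choose locations that are free in your own program. The carry is cleared once, before the low bytes are added, and then left alone so that it can ripple upward from each byte to the next.

```
10 NUM1=$B0 ;FIRST NUMBER, LOW BYTE FIRST ($B0-$B2)
20 NUM2=$C0 ;SECOND NUMBER, LOW BYTE FIRST ($C0-$C2)
30 SUM=$C3 ;RESULT, LOW BYTE FIRST ($C3-$C5)
40 NBYTES=3 ;3 BYTES = 24 BITS
50 ;
60  *=$0600
70 ;
80  CLD
90  CLC ;CLEAR CARRY ONCE, BEFORE THE LOW BYTES
100  LDX #0 ;X INDEXES THE CURRENT BYTE
110  LDY #NBYTES ;Y COUNTS THE BYTES LEFT TO ADD
120 LOOP LDA NUM1,X
130  ADC NUM2,X ;ADDS IN THE CARRY FROM THE PREVIOUS BYTE
140  STA SUM,X
150  INX ;INX AND DEY LEAVE THE CARRY ALONE
160  DEY
170  BNE LOOP
180  RTS
```

Changing NBYTES to 4 (and moving SUM out of the way of NUM2) would turn this into a 32-bit addition. When the loop ends, the carry flag still holds any carry out of the highest byte.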

## 16-Bit Subtraction

Here's a 16-bit subtraction program:

```
10 ;
20 ;THIS PROGRAM SUBTRACTS A 16-BIT NUMBER IN $B0 AND $B1
30 ;FROM A 16-BIT NUMBER IN $C0 AND $C1
40 ;AND DEPOSITS THE RESULTS IN $C2 AND $C3
50 ;
60  *=$0600
65 ;
70  CLD
80  SEC ;SET CARRY
90  LDA $C0 ;LOW HALF OF 16-BIT NUMBER IN $C0 AND $C1
100  SBC $B0 ;LOW HALF OF 16-BIT NUMBER IN $B0 AND $B1
110  STA $C2
120  LDA $C1 ;HIGH HALF OF 16-BIT NUMBER IN $C0 AND $C1
130  SBC $B1 ;HIGH HALF OF 16-BIT NUMBER IN $B0 AND $B1
140  STA $C3
150  RTS
```

Since subtraction is the exact opposite of addition, the carry flag is set, not cleared, before a subtraction operation is performed in 6502 binary arithmetic. In subtraction, the carry flag is treated as a borrow, not a carry, and it must therefore be set, not cleared, so that if a borrow is necessary, there'll be a value to borrow from. After the carry bit is set, a 6502 subtraction problem is quite straightforward. In our sample problem, the 16-bit number in \$B0 and \$B1 is subtracted, low byte first, from the 16-bit number in \$C0 and \$C1. The result of our subtraction problem (including a borrow from the high byte, if one was necessary) is then stored in memory addresses \$C2 and \$C3.
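To see a borrow actually happen, it helps to run the same steps with numbers chosen to force one. The following sketch is for illustration only. It subtracts \$0001 from \$0100 using immediate values instead of the memory locations used above, and it stores the answer in \$C2 and \$C3 as before.

```
10  *=$0600
20  CLD
30  SEC ;CARRY SET MEANS "NO BORROW YET"
40  LDA #$00 ;LOW BYTE OF $0100
50  SBC #$01 ;LOW BYTE OF $0001; A=$FF, CARRY CLEARED
60  STA $C2
70  LDA #$01 ;HIGH BYTE OF $0100
80  SBC #$00 ;$01-$00-BORROW = $00; CARRY SET AGAIN
90  STA $C3
100  RTS
```

When the program ends, \$C2 should hold \$FF and \$C3 should hold \$00: the 16-bit number \$00FF, or decimal 255, which is what you get when you take 1 away from 256. The cleared carry after line 50 is the borrow, and the SBC in line 80 pays it back out of the high byte.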

## Binary Multiplication

There are no 6502 assembly language instructions for multiplication or division. To multiply a pair of numbers using 6502 assembly language, you have to perform a series of addition operations. To divide numbers, you have to perform subtraction sequences. Here is an example of how two 4-bit binary numbers can be multiplied using the principles of addition:

```
      0110  ($06)
    x 0101  ($05)
    ------
      0110
     0000
    0110
   0000
   -------
   0011110  ($1E)
```

Notice what happens when you work this problem. First, 0110 is multiplied by 1. The result of this operation, also 0110, is written down.

## What Happens Next

Next, 0110 is multiplied by 0. The result of that operation, a string of zeros, is shifted one space to the left and written down. Then 0110 is multiplied by 1 again, and the result is once again shifted left and written down. Finally, another multiplication by zero results in another string of zeros, which are also shifted left and duly noted. Once that's done, all of the partial products of our problem are added up, just as they would be in a conventional multiplication problem. The result of this addition, as you can see, is the final product \$1E.

This multiplication technique works fine, but it's really quite arbitrary. Why, for example, did we shift each partial product in this problem to the left before writing it down? We could have accomplished the same result by shifting the partial product above it to the right before adding. In 6502 multiplication, that's exactly what's often done; instead of shifting each partial product to the left before storing it in memory, many 6502 multiplication algorithms shift the preceding partial product to the right before adding it to the new one.

## Multiple Precision Multiplication

We're now going to present a program that will show you how that works.

### A MULTIPLE PRECISION MULTIPLICATION PROGRAM

```
10 MPR=$C0 ;MULTIPLIER
20 MPD1=$C1 ;MULTIPLICAND
30 MPD2=$C2 ;NEW MULTIPLICAND AFTER 8 SHIFTS
40 PRODL=$C3 ;LOW BYTE OF PRODUCT
50 PRODH=$C4 ;HIGH BYTE OF PRODUCT
60 ;
70  *=$0600
80 ;
85 ;THESE ARE THE NUMBERS WE WILL MULTIPLY
87 ;
90  LDA #250
100  STA MPR
110  LDA #2
120  STA MPD1
130 ;
140 MULT CLD
150  CLC
160  LDA #0 ;CLEAR ACCUMULATOR
170  STA MPD2 ;CLEAR ADDRESS FOR SHIFTED MULTIPLICAND
180  STA PRODL ;CLEAR LOW BYTE OF PRODUCT ADDRESS
190  STA PRODH ;CLEAR HIGH BYTE OF PRODUCT ADDRESS
200  LDX #8 ;WE WILL USE THE X REGISTER AS A COUNTER
210 LOOP LSR MPR ;SHIFT MULTIPLIER RIGHT; LSB DROPS INTO CARRY BIT
220  BCC NOADD ;IF CARRY IS CLEAR, SKIP THE ADDITION
230  CLC
240  LDA PRODL ;LOAD ACCUMULATOR WITH LOW BYTE OF PRODUCT
250  ADC MPD1 ;ADD LOW BYTE OF MULTIPLICAND
260  STA PRODL ;RESULT IS NEW LOW BYTE OF PRODUCT
270  LDA PRODH ;LOAD ACCUMULATOR WITH HIGH BYTE OF PRODUCT
280  ADC MPD2 ;ADD HIGH BYTE OF MULTIPLICAND, PLUS ANY CARRY
290  STA PRODH ;RESULT IS NEW HIGH BYTE OF PRODUCT
300 NOADD ASL MPD1 ;SHIFT MULTIPLICAND LEFT; BIT 7 DROPS INTO CARRY
310  ROL MPD2 ;ROTATE CARRY BIT INTO BIT 0 OF MPD2
320  DEX ;DECREMENT CONTENTS OF X REGISTER
330  BNE LOOP ;IF RESULT ISN'T ZERO, BRANCH BACK TO LOOP
340  RTS
350  .END
```

## A Complex Procedure

As you can see, 8-bit binary multiplication isn't exactly a snap. There's a lot of left and right bit-shifting involved, and it's hard to keep track of. In the above program, the most difficult manipulation to follow is probably the one involving the multiplicand (MPD1 and MPD2). The multiplicand is only an 8-bit value, but it's treated as a 16-bit value because it keeps getting shifted to the left, and while it is moving, it takes 16 bits of storage (actually two 8-bit memory locations) to hold it.

To see for yourself how the program works, type it out on your keyboard and assemble it. Then use the "G" command of your debugger to execute it. Then, while you're still in the DEBUG mode, you can type "DC3" (for "Display \$C3") and take a look at the contents of memory addresses \$C3 and \$C4, which should hold the 16-bit product of the decimal number 2 and the decimal number 250, which the program is supposed to multiply. The value in \$C3 and \$C4 should be \$01F4, displayed low byte first, the hex equivalent of decimal 500, the correct product.

## Not the Ultimate Multiplication Program

Although the program we've just outlined works fine, there are many algorithms for binary multiplication, and some of them are shorter and more efficient than the one just presented. The following program, for example, is much shorter than our first example, and therefore more memory efficient and faster running. One of its neatest tricks is that it uses the 6502's accumulator, rather than a memory address, for temporary storage of the problem's results.

### AN IMPROVED MULTIPLICATION PROGRAM

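```
10 ;
20 PRODL=$C0
25 PRODH=$C1
30 MPR=$C2
40 MPD=$C3
50 ;
60  *=$0600
70 ;
80 VALUES LDA #10
90  STA MPR
100  LDA #10
110  STA MPD
120 ;
130  LDA #0 ;ACCUMULATOR WILL HOLD HIGH BYTE OF PRODUCT
140  STA PRODL
150  LDX #8
160 LOOP LSR MPR ;LSB OF MULTIPLIER DROPS INTO CARRY
170  BCC NOADD ;IF CARRY IS CLEAR, SKIP THE ADDITION
180  CLC
190  ADC MPD ;ADD MULTIPLICAND TO HIGH BYTE OF PRODUCT
200 NOADD ROR A ;SHIFT HIGH BYTE RIGHT, TAKING IN THE CARRY
210  ROR PRODL ;BIT SHIFTED OUT OF A MOVES INTO LOW BYTE
220  DEX
230  BNE LOOP
235  STA PRODH
240  RTS
250  .END
```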

## Another Test

If you wish, you can test out this improved multiplication program the same way you tested the previous one: by executing it using your debugger's "G" command, and then taking a look at its result using the "D" command.

## A Different Command

You should type "DC0" this time, since the product in this problem is stored in \$C0 and \$C1. The 16-bit value in \$C0 and \$C1 should be \$0064 (stored low byte first), the hexadecimal equivalent of decimal 100, and the answer to this problem.

## Feel Free to Play

You can play around with these two multiplication problems as much as you like, trying out different values and perhaps even calling the programs up from BASIC, the way we did our 16-bit addition problem a few chapters ago. The best way to become intimately familiar with how binary multiplication works, though, is to do a few problems by hand, using those two tools of our forefathers, a pencil and a piece of paper. Work enough binary multiplication problems on paper, and you'll soon begin to understand the principles of 6502 multiplication.

## Multiprecision Binary Division

It's unlikely that you'll ever have an occasion to write a multiprecision binary long division program. And even if the need should arise, you'd probably have no use for the limited program and explanation we could publish here.

## Nevertheless...

Still, this chapter would not be complete without an example of a binary long division program. So here is a simple (but tricky) program for dividing a 16-bit dividend by an 8-bit divisor. The result is an 8-bit quotient.

## A Tricky Program

This program is even more subtly designed than the multiplication program we presented a few paragraphs ago. During the execution of the program, the high part of the dividend is stored in the accumulator and the low part of the dividend is stored in a variable called DVDL. The program contains a lot of shifting, rotating, subtracting, and decrementing of the X register. When it ends, the quotient is in a variable labeled QUOT and the remainder is in the accumulator. That's true until line 380 when the remainder is moved out of the accumulator and into a variable called RMDR. Then, finally, an RTS instruction ends the program.

### A SIMPLE DIVISION PROGRAM

```
10 ;
20 ;DIVISION.SRC
30 ;
40  *=$0600
50 ;
60 DVDL=$C0 ;LOW PART OF DIVIDEND
70 DVDH=$C1 ;HIGH PART OF DIVIDEND
80 QUOT=$C2 ;QUOTIENT
90 DIVS=$C3 ;DIVISOR
100 RMDR=$C4 ;REMAINDER
110 ;
120  LDA #$1C ;JUST A SAMPLE VALUE
130  STA DVDL
140  LDA #$02 ;THE DIVIDEND IS NOW $021C
150  STA DVDH
160  LDA #$05 ;ANOTHER SAMPLE VALUE
170  STA DIVS ;WE'RE DIVIDING BY 5
180 ;
190  LDA DVDH ;ACCUMULATOR WILL HOLD DVDH
200  LDX #08 ;FOR AN 8-BIT DIVISOR
210  SEC
220  SBC DIVS
230 DLOOP PHP ;THE LOOP THAT DIVIDES
240  ROL QUOT
250  ASL DVDL
260  ROL A
270  PLP
280  BCC ADDIT ;LAST RESULT WAS NEGATIVE, SO ADD THIS TIME
290  SBC DIVS
300  JMP NEXT
310 ADDIT ADC DIVS
320 NEXT DEX
330  BNE DLOOP
340  BCS FINI
350  ADC DIVS ;RESTORE THE REMAINDER
360  CLC
370 FINI ROL QUOT
380  STA RMDR
390  RTS ;ENDIT
```

## Not the Ultimate Division Program

As complex as this program appears, it is not by any means the world's best binary long division routine. It isn't the most accurate division program you'll ever see, and it won't handle fractions, decimal points, very long numbers, or signed numbers. If a versatile, accurate multiprecision division program is what you need, you'll have to look toward the floating point package built into your Atari's operating system.

The Atari floating point package is not easy to use, but more or less complete instructions on how to use it can be found in the Atari programmer's guidebook De Re Atari. If you decide not to use your computer's FP package, you can take a look at the many division and other arithmetic routines that are included in many 6502 assembly language manuals and "cookbooks." Quite a few arithmetic routines that are yours for the asking are published in manuals such as the excellent text 6502 Assembly Language Subroutines by Leventhal and Saville (Berkeley: Osborne/McGraw-Hill, 1982).

## Signed Numbers

Before we move on to the next chapter, there are two more topics that we should briefly cover: signed numbers and BCD (Binary Coded Decimal) numbers. First we'll talk about signed numbers. Arithmetic operations cannot be performed on signed numbers using the techniques that have been described so far in this chapter. However, if some slight modifications are made in those techniques, the 6502 chip in your Atari computer is capable of adding, subtracting, multiplying and dividing signed numbers. If you want to perform arithmetic operations on signed numbers, the first thing you'll have to know is how to represent their signs. Fortunately, that isn't difficult to do. To represent a signed number in binary arithmetic, all you have to do is let the leftmost bit (bit 7) represent a positive or negative sign. In signed binary arithmetic, if bit 7 of a number is zero, the number is positive. If bit 7 is a 1, the number is negative.

Obviously, if you use one bit of an 8-bit number to represent its sign, you no longer have an 8-bit number. What you then have is a 7-bit number or, if you want to express it another way, you have a signed number that can represent values from -128 to +127 instead of from 0 to 255. It should also be obvious that it takes more than the redesignation of a bit to turn unsigned binary arithmetic operations into signed binary arithmetic operations. Consider, for example, what we would get if we tried to add the numbers +5 and -4 by doing nothing more than using bit 7 as a sign:

```
  0000 0101 (+5)
+ 1000 0100 (-4)
----------------
  1000 1001 (-9)
```

That answer is wrong. The answer should be 1. The reason we arrived at the wrong answer is that we tried to solve the problem without using a concept that is fundamental to the use of signed binary arithmetic: the concept of complements.

Complements are used in signed binary arithmetic because negative numbers are complements of positive numbers. And complements of numbers are very easy to calculate in binary arithmetic. In binary math, the complement of a 0 is a 1, and the complement of a 1 is a 0. It might be reasonable to assume, therefore, that the negative complement of a positive binary number could be arrived at by complementing each 0 in the number to a 1, and each 1 to a 0 (including bit 7, which flips from 0 to 1 and so marks the number as negative). This technique of calculating the complement of a number by flipping its bits from 0 to 1 and from 1 to 0 has a name in assembly language circles. It's called one's complement.

To see if the one's complement technique works, let's try using it to add two signed numbers, say +8 and -5.

```
  0000 1000 (+8)
+ 1111 1010 (-5) (one's complement)
----------------
  0000 0010 (+2) (plus carry)
```

Oops! That's wrong, too! The answer should be plus 3. Well, that takes us back to the drawing board. One's complement arithmetic doesn't work.

But there's another technique, which comes very close to one's complement, that does work. It's called two's complement, and it works like this: first calculate the one's complement of a positive number. Then simply add one. That will give you the two's complement, the true complement, of the number. Then you can use the conventional rules of binary math on signed numbers -- and, if you don't make any mistakes, they'll work every time. Here's how:

```
  0000 0101 (+5)
+ 1111 1000 (-8) (two's complement)
----------------
  1111 1101 (-3)
```

Here's another two's complement addition problem:

```
  1111 1011 (-5) (two's complement)
+ 0000 1000 (+8)
----------------
  0000 0011 (+3) (plus carry)
```

As we said, it works every time. Unfortunately, it's not easy to explain why. There are some lovely mathematical proofs, and if you're interested in what they are, you can find them in numerous textbooks on the theory of binary numbers. At the moment, though, the most important thing to know about two's complement arithmetic is how to use it, should the need ever arise.
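Here, as a minimal sketch, is how the "flip the bits and add one" recipe might look in 6502 assembly language. The labels NUM and NEG and their addresses are hypothetical, chosen only for this example. The bit flipping is done with EOR #\$FF, which complements every bit in the accumulator.

```
10 NUM=$CB ;SIGNED NUMBER TO NEGATE (HYPOTHETICAL ADDRESS)
20 NEG=$CC ;ITS TWO'S COMPLEMENT
30 ;
40  *=$0600
50 ;
60  LDA #5 ;SAMPLE VALUE, +5 (0000 0101)
70  STA NUM
80  CLD
90  LDA NUM
100  EOR #$FF ;FLIP EVERY BIT: ONE'S COMPLEMENT (1111 1010)
110  CLC
120  ADC #1 ;ADD ONE: TWO'S COMPLEMENT (1111 1011)
130  STA NEG ;NEG NOW HOLDS $FB, OR -5
140  RTS
```

The \$FB left in NEG is the same 1111 1011 that stood for -5 in the addition problem above. Feed the routine \$FB instead of 5 and it will hand back \$05, since the two's complement of a negative number is its positive counterpart.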

## Using the Overflow Flag

There's one more important fact to remember about signed binary arithmetic: when you add signed numbers, the carry flag can no longer tell you whether the result is too big to fit. The reason for this is as follows: the carry flag of the P register is set when there's a carry out of bit 7 of a binary number. But when the number is a signed number, bit 7 is the sign bit, not part of the number's magnitude! A carry out of bit 6 can spill into the sign bit and turn a positive result into a negative one without the carry flag ever noticing. You can detect this problem by using the overflow (V) bit of the processor status register. The overflow bit is set whenever the carry into bit 7 differs from the carry out of bit 7, which happens exactly when a signed result falls outside the range -128 to +127. In multiple precision signed arithmetic, the carry flag still carries from one byte to the next, just as it does with unsigned numbers; the overflow flag is checked only after the highest bytes have been added.
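To watch the overflow flag in action, here is a small sketch modeled on the carry-checking program earlier in this chapter. It is an illustration, not a routine from any library. It adds +80 to +80 (\$50 plus \$50), a sum that is too large for a signed 8-bit number, and then saves the P register so that the V flag can be inspected. The V flag is bit 6 of the P register, so the mask this time is \$40. The addresses \$CB and \$CC are simply the ones we used before.

```
10  *=$0600
20  CLD
30  CLC
40  LDA #$50 ;+80
50  ADC #$50 ;+80 AGAIN; A=$A0, WHICH LOOKS NEGATIVE
60  PHP ;SAVE THE FLAGS
70  STA $CB
80  PLA
90  AND #$40 ;KEEP ONLY BIT 6, THE OVERFLOW FLAG
100  STA $CC ;$40 IF V WAS SET, $00 IF NOT
110  RTS
```

After you run it, the "DCB" command should show A0 and 40 as the first two bytes. The \$A0 reads as -96 in signed arithmetic, and the \$40 in \$CC is the overflow flag warning us that the sign bit can't be trusted. In a real program you would more likely test the flag directly with a BVS or BVC branch.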

## BCD (Binary Coded Decimal) Numbers

Another variety of binary arithmetic that it might be helpful to know something about is the BCD (Binary Coded Decimal) system. In BCD notation, the digits 0 through 9 are expressed just as they are in conventional binary notation, but the hexadecimal digits A through F (1010 through 1111 in binary) are not used. Long numbers must therefore be represented differently in BCD notation than they are in conventional binary notation. The decimal number 1258, for example, would be written in BCD notation as:

```
        1         2         5         8

0000 0001 0000 0010 0000 0101 0000 1000
```

In conventional binary notation, the same number would be written as:

```
 $0   $4   $E   $A

0000 0100 1110 1010
```

This equates to \$04EA, the hexadecimal equivalent of 1258. BCD notation is often used in bookkeeping and accounting programs because BCD arithmetic, unlike straight binary arithmetic, represents every decimal digit exactly, so amounts such as dollars and cents are not subject to the rounding errors that binary fractions can introduce. BCD numbers are also sometimes used when it is desirable to print them out instantly, digit by digit, as they are being used; for example, when numbers are being used for on-screen scorekeeping in a game program.
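The 6502 can do BCD addition and subtraction directly. It has a decimal mode, switched on with SED and off with CLD, in which ADC and SBC treat each byte as two packed BCD digits. The following sketch is for illustration only. It adds decimal 58 and 46 using the same addresses as our earlier examples, and it captures the carry, which in decimal mode stands for the hundreds digit.

```
10  *=$0600
20  SED ;SET DECIMAL MODE
30  CLC
40  LDA #$58 ;PACKED BCD FOR DECIMAL 58
50  ADC #$46 ;PACKED BCD FOR DECIMAL 46
60  STA $CB ;A=$04: THE LOW TWO DIGITS OF 104
70  LDA #0
80  ROL A ;CARRY HOLDS THE HUNDREDS DIGIT
90  STA $CC ;$CC=$01
100  CLD ;BACK TO BINARY MODE
110  RTS
```

Read high byte first, \$CC and \$CB now hold 01 04, which is 104 in plain decimal digits. Note the CLD in line 100. Any routine that sets decimal mode should clear it again before returning, or the binary arithmetic that follows will produce some very strange answers.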

The main disadvantage of BCD numbers is that they tend to be difficult to work with. When you use BCD numbers, you must be extremely careful with signs, decimal points and carry operations, or chaos can result. You must also decide whether you want to use an 8-bit byte for each digit, which wastes memory, since it really only takes 4 bits to encode a BCD digit, or whether to "pack" two digits into each byte, which saves memory but consumes processing time.
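Packing is easy enough to sketch. In the following example, which is an illustration under assumed names and addresses (TENS, ONES and PACKED are hypothetical labels), two one-digit-per-byte BCD values are squeezed into a single byte. Four left shifts move the first digit into the upper 4 bits, and an ORA drops the second digit into the lower 4 bits.

```
10 TENS=$CB ;UNPACKED TENS DIGIT, 0-9 (HYPOTHETICAL ADDRESS)
20 ONES=$CC ;UNPACKED ONES DIGIT, 0-9
30 PACKED=$CD ;BOTH DIGITS IN ONE BYTE
40 ;
50  *=$0600
60 ;
70  LDA #5 ;SAMPLE DIGITS: 5 AND 8
80  STA TENS
90  LDA #8
100  STA ONES
110  LDA TENS
120  ASL A ;FOUR SHIFTS MOVE THE TENS DIGIT
130  ASL A ;INTO THE UPPER 4 BITS
140  ASL A
150  ASL A
160  ORA ONES ;DROP THE ONES DIGIT INTO THE LOWER 4 BITS
170  STA PACKED ;PACKED NOW HOLDS $58
180  RTS
```

The five extra instructions in lines 120 through 160 are the processing time that packing costs. Unpacking takes about as many again, using LSR and AND #\$0F.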

Fortunately, as we have pointed out, you'll probably never have to use most of the programming techniques described in this chapter, but an understanding of how they work will definitely make you a better Atari assembly language programmer.