W. A. Bell
By now you have probably had your Atari® Computer for a few months, and have had a chance to put in some fairly large programs and tinker with and embellish them. You may have even written some programs of that type. If so, then you have undoubtedly wished for a renumber command. In fact, if you have used BASIC on other systems, then you have probably roundly cursed those programmers who left that facility out. Or you may have wanted to change the name of a variable to make it more self-documenting, but didn't know everywhere it occurred. This article will explore, in tutorial fashion, the structure of Atari BASIC programs as they are stored in memory. It will provide you with some tools for doing more of your own exploring, and then show how you can put this type of information to use.
To begin our exploration inside BASIC, the program shown in Listing 1 is useful. It lets us peek around in memory to find things that are of interest. It will search memory from a specified starting address and tell you where it finds a string of characters or data you have specified, or it will find address pointers to a specified memory location. It will also let you dump memory in two formats, decimal or hexadecimal, and character. If your Atari is plugged in, it may help your understanding to follow along on your keyboard.
Do the following steps in direct mode:
NEW
TESTVAR1=999
TESTVAR2=123456
TESTVAR3=98765432
Now enter the memory analysis utility program in Listing 1 (you may want to save it for future investigations). As an initial objective, let's try to find the following:
- Where the BASIC statements are stored
- Where variable names are stored
- Where variable values are stored
Let's start our search by seeing if we can find where the actual lines of the program are stored in memory. To do that, we RUN the memory analysis utility program, and request that it find the character string in the first REM statement (Line 10). Specify "S" for function required and enter the character search mode by responding with a "C". Then enter the character string "MEMORY ANALYSIS UTILITY". Be sure to request the dump in decimal this time. After the appropriate pause, a match should be found at address 2264 and you should see the first lines of comment.
At this point it should be explained that the article assumes throughout that you have a system without disk. For those of you with disk systems most of the addresses will be different, and there may be some variation in some of the commands, but the fundamental concepts remain the same. If you have trouble reproducing these results with a cassette system, it probably is because of differences in the sequence in which the program was entered, or errors in variable names. To resolve this you can do a LIST "C, a NEW, enter the variables again in direct mode, and do an ENTER "C.
Examining this more carefully, you will note that there are a few bytes in between the comments of each of the REM lines. After some study, you may note that the line numbers appear to start five bytes before each comment, at addresses 2259, 2288, etc. At this point you may wish to request another search, again with a decimal dump, looking for the character string "A DUMMY LINE" as listed in Line 256. The search will find a match at address 2847, and you will find that the value at address 2842 is now zero, but the next byte now has a value of one, where it previously was always zero. In fact the line number occupies two bytes, with the first byte containing the low order eight bits and the second byte containing the high order eight bits. Thus the line number is 256 times the second byte plus the first byte, or 256*1+0=256. All binary 16-bit numbers in the Atari (and most 6502-based systems) are stored in this fashion, including addresses. You may want to study lines 650 through 700 of Listing 1 to see how this type of number is manipulated.
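The low-byte-first arithmetic can also be tried away from the Atari. The following is a minimal sketch in Python rather than Atari BASIC, with a small bytearray standing in for RAM; the helper names read_word and write_word are invented for illustration and are not taken from Listing 1.

```python
def read_word(memory, address):
    """Return the 16-bit value stored low byte first at `address`."""
    low = memory[address]
    high = memory[address + 1]
    return low + 256 * high


def write_word(memory, address, value):
    """Store `value` (0 to 65535) at `address`, low byte first."""
    memory[address] = value % 256
    memory[address + 1] = value // 256


# Stand-in for four bytes of RAM: line number 256 (bytes 0, 1)
# followed by line number 10 (bytes 10, 0).
ram = bytearray([0, 1, 10, 0])
print(read_word(ram, 0))
print(read_word(ram, 2))
```

The same two helpers work for addresses as well as line numbers, since both are stored in the same two-byte form.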
To understand a little more of how this structure is laid out, try adding the following line to Listing 1.
1 REM
Now request the dump function starting at address 2259. You will see that we now have Line Number one, followed by five bytes, and then Line Number 10. Looking at the Line one dump, we see the first two bytes represent the line number, while the next two bytes contain the value six. Byte number five contains a zero, and byte number six contains a 155, which from Appendix C of the Atari BASIC Reference Manual is a RETURN or EOL character. You will note that the rest of the REM statements follow a similar format.
In fact we can now deduce that the third byte gives the length of the line in bytes, and by adding that to the address of the present line, we can find the next line. (Let's reserve study of the fourth byte until later.) Similarly we can deduce that the fifth byte contains the equivalent of an opcode for the REM statement, while the EOL character signifies the end of the character string following the REM. This also conforms to the information in Chapter 11 of the BASIC Reference Manual under Item 2, where it states that each logical line requires six bytes of overhead.
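Stepping from one line to the next with the length byte can be sketched as follows. This is an illustration in Python, not the author's listing: the two REM lines are built by hand to match the layout just described, and the function name walk_lines is hypothetical.

```python
def walk_lines(memory, start, end):
    """Follow the length byte from line to line.

    Returns a list of (address, line_number, length) tuples.
    """
    lines = []
    address = start
    while address < end:
        number = memory[address] + 256 * memory[address + 1]
        length = memory[address + 2]
        if length == 0:
            break  # guard against looping forever on bad data
        lines.append((address, number, length))
        address += length
    return lines


# Hand-built stand-in for program storage:
#   1 REM       -> line number 1, length 6, offset 6, opcode 0, EOL 155
#   10 REM HI   -> line number 10, length 8, offset 8, opcode 0, "HI", EOL 155
program = bytearray([1, 0, 6, 6, 0, 155,
                     10, 0, 8, 8, 0, ord("H"), ord("I"), 155])

for address, number, length in walk_lines(program, 0, len(program)):
    print(address, number, length)
```

On the real machine the starting address would come from memory rather than being zero, and the walk would stop at whatever marks the end of the stored program.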
With these facts in hand, let's leave the subject of BASIC statements for a moment, and see what we can observe about the other things we want to find.
Note that the second and third items are alluded to in the BASIC Reference Manual in Chapter 11, Item 3. The statement is made that a variable takes eight bytes plus the number of characters in the variable name the first time it is used, but that each subsequent reference takes only one byte. Thus the variable name and value cannot be stored in the BASIC statement.
Let's start the search for variable names by looking for the variable TESTVAR1 that we entered before we keyed in Listing 1. After typing RUN, specify a string search for the characters "TESTVAR." With an appropriate wait for the computer to find it, it should respond with an address of 2048 (decimal), and a dump of the surrounding area.
Examining the dump received, you will see the characters TESTVAR1 starting at the indicated address. However, note that the last character is in inverse video, or more precisely, that the high bit of the last character in the name has been set to a one. Following TESTVAR1, you will see the variable names TESTVAR2 and TESTVAR3, each with the last character in inverse video. You will also see the variables used in the program displayed in the same manner, each with the last character in inverse video.
Now specify an address pointer search for the address where the variable name table was found (2048). In this case several will probably be found, but the one of interest is the one found on memory Page 0 at addresses 130 and 131. (For those of you not familiar with the 6502 architecture and the significance of Page 0, you may want to refer to one of the excellent references on this subject.) One more problem with the variable name table remains. Since it is of variable length, depending on how many variables have been defined, and the length of each variable name, how do we know where the table ends?
A little deductive reasoning is in order. Remember that variable names can only contain alphanumeric characters. Thus any non-alphanumeric character could be used as a flag for the end of the variable name table. Looking at a dump starting at 2048, sure enough after the variable BYTE0 we see the value 0 (address 2122). Now doing an address pointer search for address 2122, we find such a pointer at 132 and 133 on memory Page 0. We can also do a search for an address pointer to the beginning of the program lines by specifying a search for an address pointer to address 2259 where we found the first line of the program. Again a reference will be found on Page 0, this time at addresses 136 and 137.
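The two observations about the variable name table (the high bit marks the last character of each name, and a zero byte marks the end of the table) are enough to read the table back. The sketch below is in Python and works on a hand-built table holding only the three test variables; encode_names and read_variable_names are hypothetical helper names.

```python
def encode_names(names):
    """Build a stand-in name table: last character of each name has bit 7 set."""
    table = bytearray()
    for name in names:
        raw = name.encode("ascii")
        table += raw[:-1]
        table.append(raw[-1] | 0x80)
    table.append(0)  # end-of-table flag, as observed in the dump
    return table


def read_variable_names(memory, start):
    """Return (list of names, address of the terminating zero byte)."""
    names = []
    current = ""
    address = start
    while memory[address] != 0:
        byte = memory[address]
        current += chr(byte & 0x7F)   # strip the inverse-video bit
        if byte & 0x80:               # high bit set: this name is complete
            names.append(current)
            current = ""
        address += 1
    return names, address


table = encode_names(["TESTVAR1", "TESTVAR2", "TESTVAR3"])
names, end_address = read_variable_names(table, 0)
print(names, end_address)
```

The position of a name in the returned list (counting from zero) is the natural candidate for a variable number.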
Let's review what we found so far. We have a variable name table stored from address 2048 to 2122, with a pointer to the beginning of the table stored at addresses 130 and 131, and a pointer to the end of the table at 132 and 133. We also have the program lines stored beginning at address 2259, and an address pointer at 136 and 137. So what do you suppose is stored in between the end of the variable name table and the beginning of the program lines?
To find out, let's do a dump starting with the byte after the end of the variable name table, or address 2123, in decimal. After doing so, nothing much jumps out at you - right? So let's try a dump in hex starting at the same address. This time, with some study you will find in order the hex characters 09 99, 12 34 56, and 98 76 54 32 interspersed with other data. Looks like we may have found the variable value table, doesn't it?
Let's study this dump a little closer. Looking at the other bytes, and remembering what Chapter 11 said about 8 bytes per variable, study the value of TESTVAR1. What you should see is:
00 00 41 09 99 00 00 00
Similarly for TESTVAR2 and TESTVAR3 we see:
00 01 42 12 34 56 00 00 and
00 02 43 98 76 54 32 00
Thus the structure of the variable value is such that it is stored in binary coded decimal (BCD) as a floating point number. The digits are stored left-justified in bytes four through eight of the 8-byte block, with the exponent stored in byte three. The exponent is defined such that for numbers greater than one, the exponent is from hex 40 to hex 7F, while for numbers less than one it will have a value from 00 to 3F. For negative numbers the high order bit will be set to one, or the exponent will range from 80 to FF. At this point you may want to end the dump program, change line 50 to assign a different set of values to the three variables, and then run a dump of this same area to see the changes.
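The three dumps are consistent with reading the exponent byte as a power of 100, offset by hex 40, applied to the first pair of digits. The Python sketch below decodes a six-byte value on that assumption; the function name decode_bcd_float is invented here, and the treatment of an all-zero mantissa as zero is also an assumption.

```python
def decode_bcd_float(six_bytes):
    """Decode one exponent byte plus five BCD mantissa bytes."""
    exponent_byte = six_bytes[0]
    mantissa = six_bytes[1:6]
    if not any(mantissa):
        return 0.0
    sign = -1.0 if exponent_byte & 0x80 else 1.0
    # Power of 100 that applies to the first mantissa byte.
    power = (exponent_byte & 0x7F) - 0x40
    value = 0.0
    for index, byte in enumerate(mantissa):
        pair = (byte >> 4) * 10 + (byte & 0x0F)   # two decimal digits per byte
        value += pair * 100.0 ** (power - index)
    return sign * value


# The three 8-byte blocks from the dump; bytes three through eight hold the value.
entries = [
    bytes.fromhex("0000410999000000"),
    bytes.fromhex("0001421234560000"),
    bytes.fromhex("0002439876543200"),
]
for entry in entries:
    print(entry[1], decode_bcd_float(entry[2:]))
```

Setting the high bit of the exponent byte (hex C1 instead of 41, for instance) gives the same magnitude with a negative sign under this reading.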
Now that you have convinced yourself of the way numbers are stored, we still have a mystery or two to solve. What about byte two? Suppose that might be the variable number? Remember the statement in Chapter 11 about how additional references of a variable only take one byte. Seems that the only way to do that would be to assign a variable number. Also note that you are allowed a maximum of 128 different variables in a given BASIC program (see Chapter 1 of the Reference Manual). So the deduction that byte two of the 8-byte block is the variable number seems logical. Furthermore it gives a method of finding the variable name for such purposes as listing the program or operating in the direct mode.
Let's leave the use of the high order bit of byte two and the use of byte one of the 8-byte block to your investigation, with a couple of hints. Try examining the variables A$, B$ and HEX$. You may also want to define a numeric array in the direct mode and assign a set of values to it, and then dump its 8-byte block. One final step in this investigation is to try to find an address pointer to the variable value table. Specify a pointer to the address 2123, and we find that such an address pointer exists at 134 and 135 on Page 0 of memory.
Let's stop and summarize what we have learned at this point. FIGURE 1 is a visual depiction of the layout in memory of the address pointers on memory Page 0, the variable name table, the variable value table, and the program storage area.
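The four Page 0 pointers found so far can be collected in one place. The following Python sketch fills a stand-in Page 0 with the addresses observed on the author's cassette system and reads them back low byte first; the dictionary labels are descriptive names chosen here, not Atari's own.

```python
# Page 0 locations found in the text, each holding a two-byte pointer.
POINTERS = {
    "variable name table start": 130,
    "variable name table end": 132,
    "variable value table": 134,
    "program lines": 136,
}

# Stand-in for memory Page 0, loaded with the values observed in the article.
page_zero = bytearray(256)
observed = ((130, 2048), (132, 2122), (134, 2123), (136, 2259))
for location, target in observed:
    page_zero[location] = target % 256        # low order byte first
    page_zero[location + 1] = target // 256   # then high order byte

layout = {
    label: page_zero[location] + 256 * page_zero[location + 1]
    for label, location in POINTERS.items()
}
for label, address in layout.items():
    print(label, address)
```

On a disk system the four target addresses will differ, but the Page 0 locations that hold them stay the same, which is why a utility should read the pointers rather than assume the addresses.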
At this point let's set our objective to create a full featured renumber utility. We have the fundamental information regarding memory layout and usage. The only additional data needed is to determine how line numbers are used in a program line. To investigate this, LISTING 2 has been developed. You can enter it at this point, either clearing the old program out, or leaving it at your option (if you have adequate memory).
The program in Listing 2 has been designed to let us dump a specific BASIC line. It will give us a decimal, hex, and character dump of any line we want. To digress for a moment, what we will get is a picture of the tokenized version of the BASIC line. This is the form used to store a program in the save mode. The list mode on the other hand stores the program just as you see it when you do a list to the screen or printer. Also note that a save operation will save the variable name table and the variable value table as well.
The intention is to decipher the internal structure of a BASIC line; since we want to generate a renumber utility, more specifically we want to see what those lines with line number references look like. Let's start with one of the most common line referencing statements, the GOTO. When the program in Listing 2 has been entered, add the line
10 GOTO 10
Then in direct mode type
GOTO 20000
Now request that the program find and dump Line 10. What you will see as a dump is:
Now change Line 10 to read
10 GOTO 123456
and with another GOTO 20000, the dump will read:
From the change that takes place, it is obvious that the referenced line number is stored in bytes seven through 12 of the line. Not only that, but also it is stored in exactly the same format as variable values are stored. You may want to try a few other values for the referenced line number to convince yourself.
We can also speculate that the opcode for the GOTO must be either byte five or byte six, or a combination of the two. Now let's see how BASIC lines with multiple statements are formatted. Again modify Line 10 as follows:
10 GOTO 999:GOTO 999:GOTO 999
and doing a GOTO 20000 we get the following dump:
From this we can conclude that bytes four, 14 and 24 are used to describe the length of a given statement in the line. More precisely, they are used to give the offset from the address of the line number to the next statement, and the last of these in a multi-statement line will always be the same as byte three of the line.
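Following the statement offsets through a line can be sketched in Python as below. The three-statement line is built by hand; the values 20 (statement separator) and 22 (end of line) are assumed token values that the article does not state, and split_statements is a hypothetical name.

```python
GOTO = 10        # statement opcode for GOTO
CONSTANT = 14    # marks a six-byte BCD constant
SEPARATOR = 20   # assumed token for the ":" between statements
END_OF_LINE = 22 # assumed token that ends a non-REM line

BCD_999 = [0x41, 0x09, 0x99, 0x00, 0x00, 0x00]
goto_999 = [GOTO, CONSTANT] + BCD_999

# 10 GOTO 999:GOTO 999:GOTO 999
line = bytes(
    [10, 0, 33]                       # line number (two bytes), line length
    + [13] + goto_999 + [SEPARATOR]   # offset to 2nd statement, 1st statement
    + [23] + goto_999 + [SEPARATOR]   # offset to 3rd statement, 2nd statement
    + [33] + goto_999 + [END_OF_LINE] # offset equals line length, 3rd statement
)


def split_statements(line):
    """Return the token bytes of each statement, following the offset bytes."""
    length = line[2]
    statements = []
    position = 3                      # first offset byte follows the length byte
    while position < length:
        next_position = line[position]
        if next_position <= position:
            raise ValueError("statement offset does not advance")
        statements.append(line[position + 1:next_position])
        position = next_position
    return statements


for statement in split_statements(line):
    print(list(statement))
```

Each statement returned starts with its opcode and ends with its terminator, which is the unit a renumber utility needs to examine.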
At this point we need to establish what statements use line number references. After studying the BASIC Reference Manual, the following types of statements can have a line number reference:
GOTO
GOSUB
ON () GOTO
ON () GOSUB
TRAP
LIST
RESTORE
IF () THEN
IF () THEN GOTO
IF () THEN GOSUB
Taking each of these statements in order (entering the line number as shown, and then dumping it) we get the following results:
|3||ON Z GOTO 997, 998, 999|
|4||ON Z GOSUB 997, 998, 999|
|8||IF Z THEN 999|
|9||IF Z THEN GOTO 999|
|10||IF Z THEN GOSUB 999|
From these dumps we now deduce that all line number references are preceded by a byte having the decimal value 14. Furthermore, the byte preceding the byte with a value of 14 will have one of the following values if a line number reference follows:
OPCODE STATEMENT
4 LIST
10 GOTO
12 GOSUB
13 TRAP
18 ON () 2nd, 3rd, etc. line references
23 ON () GOTO 1st line reference
24 ON () GOSUB 1st line reference
27 IF () THEN
35 RESTORE
In fact, it appears that the actual usage of the value 14 in a BASIC statement is to indicate that a BCD floating point constant follows. To see this, you may want to reload the program in Listing 1 and search for the decimal value 14. You should find that any occurrences in the program storage area, aside from line or statement lengths, precede a numeric constant.
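Putting the value 14 and the opcode table together, a scan for line number references within one statement might look like the Python sketch below. Two refinements here are assumptions rather than findings of the article: every constant is skipped as a seven-byte unit so its digits are never mistaken for tokens, and a comma (18) counts only when it directly follows another line reference, so that something like POKE 752,1 is not misread. The token values 30 (ON), 31 (POKE), 22 (end of line) and 128 (first variable) are likewise assumed, and string literals are not handled.

```python
CONSTANT = 14
COMMA = 18
# Opcodes from the table that introduce a first line number reference.
FIRST_REFERENCE = {4, 10, 12, 13, 23, 24, 27, 35}


def find_line_references(statement):
    """Return the offsets of the six-byte BCD fields that hold line numbers.

    `statement` starts at the statement opcode and ends with its terminator.
    """
    found = []
    after_reference = False
    i = 1
    while i < len(statement):
        if statement[i] == CONSTANT:
            previous = statement[i - 1]
            is_reference = previous in FIRST_REFERENCE or (
                previous == COMMA and after_reference
            )
            if is_reference:
                found.append(i + 1)
            after_reference = is_reference
            i += 7                      # skip the marker and the six BCD bytes
        else:
            if statement[i] != COMMA:
                after_reference = False
            i += 1
    return found


def bcd(hex_digits):
    return list(bytes.fromhex(hex_digits))


# ON Z GOTO 997,998,999 (assumed tokens: ON = 30, Z = variable 128)
on_goto = bytes(
    [30, 128, 23, CONSTANT] + bcd("410997000000")
    + [COMMA, CONSTANT] + bcd("410998000000")
    + [COMMA, CONSTANT] + bcd("410999000000")
    + [22]
)
# POKE 752,1 (assumed token: POKE = 31) contains constants but no references
poke = bytes(
    [31, CONSTANT] + bcd("410752000000")
    + [COMMA, CONSTANT] + bcd("400100000000")
    + [22]
)

print(find_line_references(on_goto))
print(find_line_references(poke))
```

A utility built this way still cannot see references computed from variables, which matches the limitation of any scan that looks only for constants.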
With this information in hand, we now know enough to construct a Renumber utility. The basic algorithm is as follows:
1 - Find each line number reference
2 - Find the line that is referenced, and count the number of lines from the beginning
3 - Compute what the new line number will be
4 - Store that value as the new referenced line number
5 - When all line references have been set to their new value then do the actual renumbering of lines.
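The five steps can be sketched on a simplified model before worrying about bytes in memory. In the Python illustration below, each program line is reduced to its line number and the list of line numbers it refers to; the function name renumber, the default start and step of 10, and the choice to leave references to nonexistent lines unchanged are all assumptions made for this sketch.

```python
def renumber(lines, start=10, step=10):
    """Renumber a program given as [(line_number, [referenced_numbers]), ...].

    Lines are assumed to be in ascending order, as they are in memory.
    """
    # Steps 2 and 3: a line's position in the program fixes its new number.
    new_number = {
        old: start + step * index
        for index, (old, _) in enumerate(lines)
    }
    result = []
    for number, references in lines:
        # Steps 1 and 4: replace each reference with the new number of its
        # target; a reference to a missing line is left as it was.
        new_references = [new_number.get(ref, ref) for ref in references]
        # Step 5: only now give the line itself its new number.
        result.append((new_number[number], new_references))
    return result


program = [(5, [20]), (7, []), (20, [5, 7]), (25, [999])]
print(renumber(program))
```

Computing the whole old-to-new mapping first is what makes it safe to change the references and the line numbers in either order in this model; working in place in memory, the references must be updated before the line numbers, as step 5 says.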
There remains a sticky implementation problem, since line numbers are stored as floating point numbers. (Why this approach was chosen by Atari remains a mystery - a binary format would have required two bytes instead of six, and no internal conversion.) Listing 3 demonstrates one technique for solving this problem, using the variable value table we found earlier. In this case, the location of the value for a specific variable (REFLINE) is established. That variable is used to store the new referenced line number when it is computed. Then that value is POKEd into the location for the line number reference.
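For comparison, the six bytes for a whole number can also be computed directly instead of being borrowed from a variable as Listing 3 does. The Python sketch below is that alternative, not the author's technique; it assumes the exponent byte is hex 40 plus the number of digit pairs to the left of the first pair, as the earlier dumps suggest, and the name encode_whole_number is invented.

```python
def encode_whole_number(number):
    """Return the six-byte BCD floating point form of a non-negative integer."""
    if number < 0 or number > 9999999999:
        raise ValueError("number must have at most ten digits")
    if number == 0:
        return bytes(6)
    pairs = []
    remaining = number
    while remaining:
        pairs.append(remaining % 100)   # peel off two decimal digits at a time
        remaining //= 100
    pairs.reverse()                     # most significant pair first
    exponent = 0x40 + len(pairs) - 1
    mantissa = [((pair // 10) << 4) | (pair % 10) for pair in pairs]
    mantissa += [0] * (5 - len(mantissa))   # left-justify, pad with zeros
    return bytes([exponent] + mantissa)


for value in (10, 999, 123456):
    print(value, encode_whole_number(value).hex())
```

The result for 999 and 123456 can be checked against bytes three through eight of the variable value table entries shown earlier.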
Other more elegant solutions, requiring fewer statements, are possible, but they generally require some additional exploration of the structure of BASIC. At this point you will probably want to study Listing 3 along with its comments, and then enter it into your Atari. You should also note that this implementation of a renumber utility is not capable of renumbering itself. One other limitation is that the program will not deal with situations where variables are used as the line number reference. In such cases, you will have to follow the computational routines used to set the value of the line number reference, and either alter them appropriately, or else restore those line numbers to their original value after renumber has done its thing.
So how is such a program used? After the program has been entered, ready the tape recorder and, in the direct mode, type:
This will store the renumber utility on tape in a form so that it can be merged with other programs already in memory. (A CSAVE would be advisable, just for backup purposes.) First CLOAD a program you want to test the utility on. When that has finished, position the tape at the location where you started the LIST "C, and type:
When the renumber utility has been loaded, a list command will show that it has been merged in at the end of the program previously loaded.
Now type GOTO 32000 and watch the results. One more step, of course, is saving the program once it has been renumbered. If you simply do a CSAVE, you will also store the renumber utility with your original program. To avoid doing that (gobbling up all that precious memory, not to mention space on your tape), do the following:
Rewind the tape to where the list started and
You now have just the original program in its renumbered form, and it can be CSAVEd in the conventional manner.
We have been able to develop a utility to renumber BASIC programs using the information we have uncovered. We have also found several techniques for conserving memory, such as not using the IF THEN GOTO statement, as it uses two more bytes than IF THEN. Using a variable will also save over using a constant if it is used more than twice. And, of course, every statement put into a multiple statement line saves three bytes. There are several other functions that could be implemented, such as changing variable names, finding all references to a given variable, deleting blocks of lines, and renumbering selected lines of a program. Some of these ideas require additional digging to find all of the data necessary; others can be implemented with the things we know at this point.
Two problems exist at this point. The first is that utilities such as that in Listing 3 require a good deal of memory - a precious commodity for most of us. The second is that, for programs of any significant size, the use of such a utility will take a considerable period of time. A future article will take what has been developed to date and convert some of the more complex functions to machine language subroutines. These subroutines will be general purpose in nature, so that they can also be used in implementing some of the functions in the previous paragraph. Happy PEEKing!
Figure 1: Memory Layout for Atari BASIC Tables
Program 1: Memory analysis utility
Comments for Program 1
General: The underscore (_) is used to indicate that characters are to be entered in inverse video
|60||Required since a RUN command resets all variables to zero|
|90-190||Determine the function to be performed|
|210-610||Search memory for specified data|
|210-260||Determine if data input as character or decimal|
|270-350||Input of decimal data|
|360-380||Input of character data|
|410||Required to prevent match on BASIC input buffer|
|420-590||Actual search of memory|
|490-540||Match was found, dump memory at that point|
|630-750||Search for an address pointer|
|650-660||Convert to internal address format|
|680-730||Conduct the search, noting that addresses are stored low order byte, then high order byte|
|770-890||Dump specified area of memory|
|810-830||Dump a full screen of memory|
|920-1150||Subroutine to dump memory|
|950-1050||Dump one line (10 bytes) in hex or decimal|
|980-1000||Hex dump after converting to hex|
|1020-1040||Decimal dump with appropriate spacing|
|1050-1130||One line of character dump for same memory|
|1100-1110||Check for cursor control characters and substitute inverse video space|
|1170-1180||Subroutine to print patience message|
|1200-1250||Subroutine to determine if dump is in hex or decimal|
|1270-1280||Subroutine for input error|
Program 2: BASIC Line Dump Utility
Comments for Program 2
General: The underscore (_) is used to indicate that characters are to be entered in inverse video
|20400||Constants used in hex conversion|
|20500-21100||Find the line the dump was requested for|
|20500||Find starting address of first line|
|20700||Compute line number of current line|
|21000||Compute address of next line|
|21300-21400||Set up to dump line|
|21500-23500||Dump one screen of memory|
|21500||Z is how many bytes to dump on this line; Y is vertical position on screen; MAXADR is start of next line|
|21700-23300||Dump Z bytes of memory|
|22700-22800||Dump byte in decimal|
|22900-23000||Dump byte in hex|
|23100-23200||Print character representation of byte - using POSITION avoids most of the problems with cursor movement except clear screen (Q=125)|
|23500||Test for full screen of dump|
|23600-23800||For lines that exceed a full screen|
|23900||Check for end of line|
Program 3: Renumber Utility
Comments for Program 3
General: The underscore (_) is used to indicate that characters are to be entered in inverse video
The program requires 2319 bytes of memory in this form. To conserve memory, a number of lines could be deleted, eliminating some displays and error checking. These lines should be considered: 32095, 32180 through 32195, 32220, 32225, 32240, 32245, 32340 through 32350, and 32510. Smaller gains can also be made by converting the computation of line addresses and line numbers to subroutines, and by using shorter variable names.
|32025-32110||Find the address of the variable REFLINE, used to store the referenced line number|
|32030||Beginning of the variable name table|
|32045-32055||Is this the correct variable?|
|32070-32080||Yes, compute the address in the variable value table|
|32090-32110||No, search for the end of this variable (inverse video) and increment the variable number|
|32120-32165||Initialize other variables|
|32120-32145||Set up the array of opcodes which use line numbers|
|32170-32225||Count the number of lines and check to make sure they are in ascending order|
|32235-32245||Input the renumber parameters and check to see if they will exceed the first line number of this program|
|32260-32460||Find each line number reference, and replace with the new line number|
|32260-32280||Compute address of line, line number, address of end of line, start of statement and end of statement|
|32285-32430||Process each BASIC statement in the line|
|32290||Test for a BCD constant|
|32300-32310||Check for line referencing opcode|
|32325-32335||Store referenced line number in variable REFLINE|
|32345||Check for nonsense line numbers (just in case)|
|32355-32385||Scan program to locate referenced line|
|32410-32425||Referenced line found so compute what the new line number will be and store in line|
|32435-32460||Check for end of line and update address pointers accordingly|
|32470-32505||Now compute the new line number for each line and store in the first two bytes of the line|