A65 6502 Cross-Assembler

Copyright (c) 1986 William C. Colley, III

https://git.sr.ht/~ndiddy/a65

Legal Note

This package may be used for any commercial or non-commercial purpose. It may be copied and distributed freely provided that any fee charged by the distributor of the copy does not exceed the sum of:

  1. the cost of the media the copy is written on,
  2. any required costs of shipping the copy, and
  3. a nominal handling fee.

Any other distribution requires the written permission of the author. Also, the author's copyright notices shall not be removed from the program source, the program object, or the program documentation.

How to Use the Cross-Assembler Package

First, the question, "What does a cross-assembler do?" needs to be addressed as there is considerable confusion on this point. A cross-assembler is just like any other assembler except that it runs on some CPU other than the one for which it assembles code. For example, this package assembles 6502 source code into 6502 object code, but it runs on an 8080, a Z-80, an 8088, or whatever other CPU you happen to have a C compiler for. The reason that cross-assemblers are useful is that you probably already have a CPU with memory, disk drives, a text editor, an operating system, and all sorts of hard-to-build or expensive facilities on hand. A cross-assembler allows you to use these facilities to develop code for a 6502.

This program requires one input file (your 6502 source code) and zero to three output files (the listing, the object, and the exports). The input file MUST be specified, or the assembler will bomb on a fatal error. The listing and object files are optional. The export file must be specified if the source code contains any EXP pseudo-ops. If no listing file is specified, no listing is generated, and if no object file is specified, no object is generated. If the object file is specified, the object is written to this file in absolute binary format.

The command line for the 6502 cross-assembler looks like this:

a65 source_file { -l list_file } { -o object_file } { -e export_file }

where the { } indicates that the specified item is optional.

The order in which the source, listing, object, and export files are specified does not matter. Note that no default file name extensions are supplied by the assembler as this gives rise to portability problems.

Format of Cross-Assembler Source Lines

The source file that the cross-assembler processes into a listing and an object is an ASCII text file that you can prepare with whatever editor you have at hand. The most-significant (parity) bit of each character is cleared as the character is read from disk by the cross-assembler, so editors that set this bit (such as WordStar's document mode) should not bother this program. All printing characters, the ASCII TAB character ($09), and newline character(s) are processed by the assembler. All other characters are passed through to the listing file, but are otherwise ignored.

The source file is divided into lines by newline character(s). The internal buffers of the cross-assembler will accommodate lines of up to 255 characters which should be more than ample for almost any job. If you must use longer lines, change the constant MAXLINE in file a65.h and recompile the cross-assembler. Otherwise, you will overflow the buffers, and the program will mysteriously crash.

Each source line is made up of three fields: the label field, the opcode field, and the argument field. The label field is optional, but if it is present, it must begin in column 1. It may optionally be terminated with a colon character. The opcode field is optional, but if it is present, it must not begin in column 1. If both a label and an opcode are present, one or more spaces and/or TAB characters must separate the two. If the opcode requires arguments, they are placed in the argument field which is separated from the opcode field by one or more spaces and/or TAB characters. Finally, an optional comment can be added to the end of the line. This comment must begin with a semicolon which signals the assembler to pass the rest of the line to the listing and otherwise ignore it. Thus, the source line looks like this:

{label}{ opcode{ arguments}}{;commentary}

where the { } indicates that the specified item is optional. Some examples are in order:

column 1
   |
   v
   GRONK   LDA   OFFSET, X       ; This line has everything.
           STA   MAILBOX         ; This line has no label.
   BEEP:                         ; This line has no opcode.
   ; This line has no label and no opcode.

   ; The previous line has nothing at all.
           END                   ; This line has no argument.
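
The field rules above can be sketched in a few lines of Python. This is only an illustrative model, not the assembler's actual parser: the function name split_source_line is hypothetical, and the handling of semicolons inside quoted strings is an assumption rather than documented behavior.

```python
def split_source_line(line):
    """Split one source line into (label, opcode, arguments, comment)."""
    # Assumption: a semicolon inside a quoted string does not start a comment.
    quote = None
    body, comment = line, ""
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            body, comment = line[:i], line[i:]
            break

    # A label, if present, must begin in column 1; a trailing colon is optional.
    label = ""
    if body and not body[0].isspace():
        parts = body.split(None, 1)
        label = parts[0].rstrip(":")
        body = parts[1] if len(parts) > 1 else ""

    # Whatever follows is the opcode, then the argument field.
    parts = body.split(None, 1)
    opcode = parts[0] if parts else ""
    arguments = parts[1].strip() if len(parts) > 1 else ""
    return label, opcode, arguments, comment
```

Fed the first example line, this sketch returns the label GRONK, the opcode LDA, the arguments "OFFSET, X", and the comment text starting at the semicolon.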

Labels

A label is any sequence of alphabetic or numeric characters starting with an alphabetic. The legal alphabetics are:

& , ? [ \ ] ^ _  ` { | }  ~  A-Z  a-z

The numeric characters are the digits 0-9. Note that "A" is not the same as "a" in a label. This can explain mysterious U (undefined label) errors occurring when a label appears to be defined.

A label is permitted on any line except a line where the opcode is IF, ELSE, or ENDI. The label is assigned the value of the assembly program counter before any of the rest of the line is processed except when the opcode is EQU, ORG, or SET.

Labels can have the same name as opcodes, but they cannot have the same name as operators or registers. The reserved (operator and register) names are:

A         AND       EQ        GE        GT        HIGH
LE        LT        LOW       MOD       NE        NOT
OR        SHL       SHR       X         XOR       Y

If a label is used in an expression before it is assigned a value, the label is said to be "forward-referenced." For example:

L1   EQU  L2 + 1   ; L2 is forward-referenced here.
L2
L3   EQU  L2 + 1   ; L2 is not forward-referenced here.

Starting a label with a period will concatenate it in the symbol table with the last label that did not start with a period. For example, in this code:

WriteMem:
      ldx      #10
.loop:
      lda      source,x
      sta      dest,x
      dex
      bpl      .loop
      rts

".loop" will be added to the symbol table as "WriteMem.loop". As you can see in the example, the label is accessible inside the same "scope" as ".loop", but outside the "scope" (after the next label that doesn't start with a period), you'll have to refer to it as "WriteMem.loop".

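The way period-prefixed labels are concatenated with the enclosing label can be modeled as follows. This is a minimal Python sketch under stated assumptions, not the assembler's symbol table code: the class name SymbolTable and its methods are hypothetical, and it assumes every label that does not start with a period opens a new scope.

```python
class SymbolTable:
    """Toy model of scoped (period-prefixed) labels."""

    def __init__(self):
        self.symbols = {}
        self.scope = ""  # last label defined that did not start with a period

    def _full_name(self, name):
        # ".loop" inside scope "WriteMem" is stored as "WriteMem.loop".
        return self.scope + name if name.startswith(".") else name

    def define(self, name, value):
        if not name.startswith("."):
            self.scope = name
        self.symbols[self._full_name(name)] = value

    def lookup(self, name):
        return self.symbols[self._full_name(name)]
```

In this model, ".loop" resolves only while WriteMem is the current scope; afterwards the full name "WriteMem.loop" is needed.
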
Numeric Constants

Numeric constants can be formed in two ways: the Intel convention or the Motorola convention. The cross-assembler supports both.

An Intel-type numeric constant starts with a numeric character (0-9), continues with zero or more digits (0-9, A-F), and ends with an optional base designator. The base designators are H for hexadecimal, none or D for decimal, O or Q for octal, and B for binary. The hex digits a-f are converted to upper case by the assembler. Note that an Intel-type numeric constant cannot begin with A-F as it would be indistinguishable from a label. Thus, all of the following evaluate to 255 (decimal):

0ffH   255   255D   377O   377Q   11111111B

A Motorola-type numeric constant starts with a base designator and continues with a string of one or more digits. The base designators are $ for hexadecimal, none for decimal, @ for octal, and % for binary. As with Intel-type numeric constants, a-f are converted to upper case by the assembler. Thus, all of the following evaluate to 255 (decimal):

$ff   255   @377   %11111111

If a numeric constant has a value that is too large to fit into a 16-bit word, it will be truncated on the left to make it fit. Thus, for example, $123456 is truncated to $3456.
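
The two conventions and the 16-bit truncation rule can be modeled with a short Python sketch. It is an illustration of the rules as described here, not the assembler's own code; the name parse_constant is hypothetical, and raising ValueError stands in for the assembler's error flags.

```python
DIGITS = "0123456789ABCDEF"
MOTOROLA_PREFIX = {"$": 16, "@": 8, "%": 2}
INTEL_SUFFIX = {"H": 16, "D": 10, "O": 8, "Q": 8, "B": 2}

def parse_constant(text):
    """Return the 16-bit value of an Intel- or Motorola-type constant."""
    s = text.upper()  # a-f are folded to upper case
    if s[:1] in MOTOROLA_PREFIX:
        base, digits = MOTOROLA_PREFIX[s[0]], s[1:]
    elif s[:1].isdigit():
        # Intel type: must start with 0-9; a trailing letter selects the base.
        if s[-1] in INTEL_SUFFIX:
            base, digits = INTEL_SUFFIX[s[-1]], s[:-1]
        else:
            base, digits = 10, s
    else:
        raise ValueError("not a numeric constant: " + text)
    if not digits:
        raise ValueError("no digits in constant: " + text)

    value = 0
    for ch in digits:
        d = DIGITS.find(ch)
        if d < 0 or d >= base:
            raise ValueError("illegal digit in constant: " + text)
        value = (value * base + d) & 0xFFFF  # truncate on the left to 16 bits
    return value
```

Under this model, every constant in the two example rows above evaluates to 255, and $123456 evaluates to $3456.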

String Constants

A string constant is zero or more characters enclosed in either single quotes (' ') or double quotes (" "). Single quotes only match single quotes, and double quotes only match double quotes, so if you want to put a single quote in a string, you can do it like this: "'". In all contexts except the DB and DS statements, the first character or two of the string constant are all that are used. The rest is ignored. Noting that the ASCII codes for "A" and "B" are $41 and $42, respectively, will explain the following examples:

"" and ''           evaluate to $0000
"A" and 'A'         evaluate to $0041
"AB"                evaluates to $4142

Note that the null string "" is legal and evaluates to $0000.
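
The evaluation rule for string constants used in expressions can be written as a small Python sketch. The function name string_value is hypothetical, and the sketch assumes plain ASCII input; it only models the "first character or two" rule described above.

```python
def string_value(s):
    """Value of a string constant in an expression: first two characters only."""
    value = 0
    for ch in s[:2]:  # anything after the second character is ignored
        value = (value << 8) | (ord(ch) & 0xFF)
    return value
```

With this model, string_value("AB") and string_value("ABC") both give $4142.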

Expressions

An expression is made up of labels, numeric constants, and string constants glued together with arithmetic operators, logical operators, and parentheses in the usual way that algebraic expressions are made. Operators have the following fairly natural order of precedence:

Highest        anything in parentheses
               unary +, unary -
               *, /, MOD, SHL, SHR
               binary +, binary -
               LT, LE, EQ, GE, GT, NE
               NOT
               AND
               OR, XOR
Lowest         HIGH, LOW

A few notes about the various operators are in order:

  1. The remainder operator MOD yields the remainder from dividing its left operand by its right operand.
  2. The shifting operators SHL and SHR shift their left operand to the left or right the number of bits specified by their right operand.
  3. The relational operators LT, LE, EQ, GE, GT, and NE can also be written as <, <= or =<, =, >= or =>, and <> or ><, respectively. They evaluate to $FFFF if the statement is true, 0 otherwise.
  4. The logical operators NOT, AND, OR, and XOR do bitwise operations on their operand(s).
  5. HIGH and LOW extract the high or low byte of an expression.
  6. The special symbol * can be used in place of a label or constant to represent the value of the program counter before any of the current line has been processed.

Some examples are in order at this point:

2 + 3 * 4                          evaluates to 14
(2 + 3) * 4                        evaluates to 20
NOT %11110000 XOR %00001010        evaluates to %00000101
HIGH $1234 SHL 1                   evaluates to $0024
@001 EQ 0                          evaluates to 0
@001 = 2 SHR 1                     evaluates to $FFFF

All arithmetic is unsigned with overflow from the 16-bit word ignored. Thus:

32768 * 2                          evaluates to 0
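
Several of these results can be checked with a small Python model of 16-bit unsigned arithmetic. The helper names below (word, rel, high, low, shl, shr) are hypothetical and exist only for this sketch; they mirror the operator notes above rather than the assembler's internals.

```python
MASK = 0xFFFF

def word(x):
    """Wrap any integer into an unsigned 16-bit word (overflow is ignored)."""
    return x & MASK

def rel(flag):
    """Relational operators yield $FFFF when true and 0 otherwise."""
    return MASK if flag else 0

def high(x):
    return (word(x) >> 8) & 0xFF

def low(x):
    return word(x) & 0xFF

def shl(x, n):
    return word(x << n)

def shr(x, n):
    return word(x) >> n
```

For instance, word(32768 * 2) is 0, and because HIGH binds more loosely than SHL, HIGH $1234 SHL 1 corresponds to high(shl(0x1234, 1)), which is $24.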

Pseudo Opcodes

Unlike 6502 opcodes, pseudo opcodes (pseudo ops) do not represent machine instructions. They are, rather, directives to the assembler. These directives require various numbers and types of arguments. They will be listed individually below.

Pseudo-ops -- ALIGN

The ALIGN pseudo-op pads the object file with zeroes until the program counter is divisible by its parameter. For example, the following statement will pad the object file until the program counter is divisible by $4000:

ALIGN      $4000
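
The amount of padding that ALIGN produces follows directly from the divisibility rule. The Python sketch below shows the arithmetic; the function name align_padding is hypothetical and the sketch is not taken from the assembler source.

```python
def align_padding(pc, boundary):
    """Number of zero bytes needed to make pc divisible by boundary."""
    return (-pc) % boundary
```

For example, with the program counter at $8123, ALIGN $4000 would emit $3EDD zero bytes to reach $C000, and no bytes at all if the counter is already at $4000.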

Pseudo-ops -- BASE

The BASE pseudo-op will set the assembly program counter to a specific value without padding the file. This is useful when targeting platforms that have memory banking. For example, the following statement will set the program counter to $8000:

BASE      $8000

Note that unlike with the ORG pseudo-op, it's allowable to BASE backwards from the current assembly program counter.

Pseudo-ops -- DATE

The DATE pseudo-op inserts the date the file was assembled (in your computer's local time) as a NUL-terminated ASCII string. Regardless of your computer's locale, the date will always be in abbreviated month, day, 4-digit year format (e.g. "Feb 19 2023").

Pseudo-ops -- DB

The DB (Define Bytes) pseudo-op allows arbitrary bytes to be spliced into the object code. Its argument is a chain of one or more expressions or string constants separated by commas. Any expressions must evaluate to -128 thru 255. The sequence of bytes $FE, $FF, $00, $01, $02 could be spliced into the code with the following statement:

DB        -2, -1, 0, 1, 2

The NUL-terminated string "nyaa~" could be spliced into the code with the following statement:

DB        "nyaa~",0      ; This is 6 bytes of code.

Pseudo-ops -- DW

The DW (Define Word) pseudo-op allows 16-bit words to be spliced into the object code. Its argument is a chain of zero or more expressions separated by commas. The word is placed into memory low byte in low address, high byte in high address as per standard MOS Technology order. The sequence of bytes $FE $FF $00 $00 $01 $02 could be spliced into the code with the following statement:

DW        $FFFE, $0000, $0201
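
The byte sequences that DB and DW produce can be modeled in Python as shown below. This is a sketch of the encoding rules described in this document, assuming ASCII strings; the names encode_db and encode_dw are hypothetical, and ValueError stands in for the V error.

```python
def encode_db(values):
    """Encode DB arguments: integers in -128..255 and string constants."""
    out = bytearray()
    for v in values:
        if isinstance(v, str):
            out += v.encode("ascii")  # strings contribute one byte per character
        elif -128 <= v <= 255:
            out.append(v & 0xFF)      # negative values wrap to two's complement
        else:
            raise ValueError("DB argument out of range: %r" % v)
    return bytes(out)

def encode_dw(values):
    """Encode DW arguments as 16-bit words, low byte first."""
    out = bytearray()
    for v in values:
        v &= 0xFFFF
        out += bytes((v & 0xFF, v >> 8))
    return bytes(out)
```

Here encode_db([-2, -1, 0, 1, 2]) yields $FE, $FF, $00, $01, $02, and encode_dw([0xFFFE, 0x0000, 0x0201]) yields $FE $FF $00 $00 $01 $02, matching the DB and DW examples.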

Pseudo-ops -- END

The END pseudo-op tells the assembler that the source program is over. Any further lines of the source file are ignored and not passed on to the listing. If end-of-file is encountered on the source file before an END statement is reached, the assembler will add an END statement to the listing and flag it with a * (missing statement) error.

Pseudo-ops -- EQU

The EQU pseudo-op is used to assign a specific value to a label, thus the label on this line is REQUIRED. Once the value is assigned, it cannot be reassigned by writing the label in column 1, by another EQU statement, or by a SET statement. Thus, for example, the following statement assigns the value 2 to the label TWO:

TWO       EQU       1 + 1

The expression in the argument field must contain no forward references.

Pseudo-ops -- EXP

The EXP pseudo-op is used to add the specified symbol as a constant in the export file. This is usable as a workaround for A65's lack of a relocating linker. For example, this source file:

      ORG      $6500
      EXP      InfiniteLoop

InfiniteLoop:
      JMP      InfiniteLoop

would generate this export file:

; Autogenerated export file - do not modify!

InfiniteLoop      equ      $6500

The EXP pseudo-op will throw a fatal error if no export file was specified (otherwise this could mess up your build process).

Pseudo-ops -- IF, ELSE, ENDI

These three pseudo-ops allow the assembler to choose whether or not to assemble certain blocks of code based on the result of an expression. Code that is not assembled is passed through to the listing but otherwise ignored by the assembler. The IF pseudo-op signals the beginning of a conditionally assembled block. It requires one argument that may contain no forward references. If the value of the argument is non-zero, the block is assembled. Otherwise, the block is ignored. The ENDI pseudo-op signals the end of the conditionally assembled block. For example:

          IF   EXPRESSION     ;  This whole thing generates
          DB   $01, $02, $03  ;  no code whatsoever if
          ENDI                ;  EXPRESSION is zero.

The ELSE pseudo-op allows the assembly of either one of two blocks, but not both. The following two sequences are equivalent:

          IF   EXPRESSION
          ... some stuff ...
          ELSE
          ... some more stuff ...
          ENDI

TEMP_LAB  SET  EXPRESSION
          IF   TEMP_LAB NE 0
          ... some stuff ...
          ENDI
          IF   TEMP_LAB EQ 0
          ... some more stuff ...
          ENDI

The pseudo-ops in this group do NOT permit labels to exist on the same line as the status of the label (ignored or not) would be ambiguous.

All IF statements (even those in ignored conditionally assembled blocks) must have corresponding ENDI statements and all ELSE and ENDI statements must have a corresponding IF statement.

IF blocks can be nested up to 16 levels deep before the assembler dies of a fatal error. This should be adequate for any conceivable job, but if you need more, change the constant IFDEPTH in file a65.h and recompile the assembler.
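
The nesting and balance rules for IF, ELSE, and ENDI can be modeled with a small stack, as in the Python sketch below. It is a simplified illustration, not the assembler's implementation: the function name assembled_lines is hypothetical, source lines are reduced to (opcode, argument) pairs, and each IF argument is assumed to be a single name looked up in a dictionary.

```python
IFDEPTH = 16  # nesting limit described above

def assembled_lines(lines, values):
    """Return the (opcode, argument) pairs that would actually be assembled."""
    stack = []     # one (parent_active, condition) pair per open IF
    active = True
    kept = []
    for opcode, argument in lines:
        if opcode == "IF":
            if len(stack) == IFDEPTH:
                raise OverflowError("If Stack Overflow")
            # An IF inside an ignored block is tracked but never evaluated.
            condition = active and values[argument] != 0
            stack.append((active, condition))
            active = condition
        elif opcode == "ELSE":
            if not stack:
                raise ValueError("ELSE without IF")
            parent, condition = stack[-1]
            active = parent and not condition
        elif opcode == "ENDI":
            if not stack:
                raise ValueError("ENDI without IF")
            active, _ = stack.pop()
        elif active:
            kept.append((opcode, argument))
    if stack:
        raise ValueError("missing ENDI")
    return kept
```

In this model, exactly one of the two blocks around an ELSE is kept, and an unbalanced IF or ENDI is reported in the same situations that the I error describes.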

Pseudo-ops -- INCB

The INCB (Include Binary) pseudo-op is used to insert the contents of a file as a series of bytes into the current file at assembly time. The name of the file to be included is specified as a normal string constant, for example:

INCB      "fridge_gfx.bin"

Pseudo-ops -- INCL

The INCL pseudo-op is used to splice the contents of another file into the current file at assembly time. The name of the file to be INCLuded is specified as a normal string constant, so the following line would splice the contents of file "const.def" into the source code stream:

INCL      "const.def"

INCLuded files may, in turn, INCLude other files until four files are open simultaneously. This limit should be enough for any conceivable job, but if you need more, change the constant FILES in file a65.h and recompile the assembler.

Pseudo-ops -- MSG

The MSG pseudo-op is used to print arbitrary strings and/or expression results to the console at assembly time. For example, adding the following line at the end of the program would print out the amount of free ROM space (assuming the labels are inserted at the correct spots):

MSG       "Free bytes: ", VectorTable-EndCode

Pseudo-ops -- ORG

The ORG pseudo-op is used to set the assembly program counter to a particular value. The expression that defines this value may contain no forward references. The default initial value of the assembly program counter is $0000. The following statement would change the assembly program counter to $F000:

ORG       $F000

The first ORG statement will specify the starting address of the binary file (e.g. where it's mapped in memory). Any subsequent ORG statements will pad the binary file up to the given address. Attempting to ORG "backwards" (to an address before the program counter) will cause an error.
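
The effect of successive ORG statements on the binary file can be modeled as below. This Python sketch rests on assumptions that are not confirmed by this document: the class name ObjectImage is hypothetical, and the padding is assumed to consist of zero bytes.

```python
class ObjectImage:
    """Toy model of how ORG shapes the absolute binary output."""

    def __init__(self):
        self.start = None   # address of the first byte in the file
        self.pc = 0         # assembly program counter
        self.data = bytearray()

    def org(self, address):
        if self.start is None:
            self.start = address              # first ORG: sets the load address
        elif address < self.pc:
            raise ValueError("ORG backwards")  # reported as a V error
        else:
            self.data += bytes(address - self.pc)  # assumed zero padding
        self.pc = address

    def emit(self, payload):
        if self.start is None:
            self.start = self.pc
        self.data += payload
        self.pc += len(payload)
```

In this model, ORG $F000 followed by one byte of code and then ORG $F010 produces a 16-byte file whose first byte belongs at $F000.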

If a label is present on the same line as an ORG statement, it is assigned the new value of the assembly program counter.

Pseudo-ops -- PAGE

The PAGE pseudo-op always causes an immediate page ejection in the listing by inserting a form feed ('\f') character before the next line. If an argument is specified, the argument expression specifies the number of lines per page in the listing. Legal values for the expression are any number except 1 and 2. A value of 0 turns the listing pagination off. Thus, the following statement causes a page ejection and divides the listing into 60-line pages:

PAGE      60

Pseudo-ops -- RMB

The RMB (Reserve Memory Bytes) pseudo-op is used to reserve a block of storage for program variables, or whatever. This storage is not initialized in any way, so its value at run time will usually be random. The argument expression (which may contain no forward references) is added to the assembly program counter. The following statement would reserve 10 bytes of storage called "STORAGE":

STORAGE   RMB       10

Pseudo-ops -- SET

The SET pseudo-op functions like the EQU pseudo-op except that the SET statement can reassign the value of a label that has already been assigned by another SET statement. Like the EQU statement, the argument expression may contain no forward references. A label defined by a SET statement cannot be redefined by writing it in column 1 or with an EQU statement. The following series of statements would set the value of label "COUNT" to 1, 2, then 3:

COUNT     SET       1
COUNT     SET       2
COUNT     SET       3

Pseudo-ops -- TITL

The TITL pseudo-op sets the running title for the listing. The argument field is required and must be a string constant, though the null string ("") is legal. This title is printed after every page ejection in the listing; therefore, if page ejections have not been forced by the PAGE pseudo-op, the title will never be printed. The following statement would print the title "Random Bug Generator -- Ver 3.14159" at the top of every page of the listing:

TITL      "Random Bug Generator -- Ver 3.14159"

Assembly Errors

When a source line contains an illegal construct, the offending filename and line number are printed to stderr. The line is also flagged in the listing with a single-letter code describing the error. The meaning of each code is listed below. In addition, a count of the number of lines with errors is kept and printed on the C "stderr" device (by default, the console) after the END statement is processed. If more than one error occurs in a given line, only the first is reported. For example, the illegal label "=$#*'(" would generate the following listing line:

L  0000   FF 00 00      =$#*'(     CPX       #0

Error * -- Illegal or Missing Statement

This error occurs when either:

  1. the assembler reaches the end of the source file without seeing an END statement, or
  2. an END statement is encountered in an INCLude file.

If you are "sure" that the END statement is present when the assembler thinks that it is missing, it probably is in the ignored section of an IF block. If the END statement is missing, supply it. If the END statement is in an INCLude file, delete it.

Error ( -- Parenthesis Imbalance

For every left parenthesis, there must be a right parenthesis. Count them.

Error " -- Missing Quotation Mark

Strings have to begin and end with either " or '. Remember that " only matches " while ' only matches '.

Error A -- Illegal Addressing Mode

This error occurs if an addressing mode is specified in the argument field that is not legal with the opcode in the opcode field.

Error B -- Branch Target Too Distant

The 6502 relative branch instructions will only reach -128 to +127 bytes from the first byte of the instruction following the branch instruction. If this error occurs, the source code will have to be rearranged to shorten the distance to the branch target address or a long branch instruction that will reach anywhere (JMP) will have to be used.
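
The range check can be made concrete with a short Python sketch. It relies on the fact that a 6502 relative branch instruction is two bytes long; the function name branch_offset is hypothetical, and ValueError stands in for the B error.

```python
def branch_offset(branch_address, target):
    """Return the one-byte operand for a relative branch, or raise if too far."""
    # The offset is measured from the instruction following the 2-byte branch.
    offset = target - (branch_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError("branch target too distant")
    return offset & 0xFF  # two's-complement encoding of the signed offset
```

A branch at $8000 can therefore reach targets from $7F82 through $8081; a branch to itself encodes as $FE.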

Error D -- Illegal Digit

This error occurs if a digit greater than or equal to the base of a numeric constant is found. For example, a 2 in a binary number would cause a D error. Especially, watch for 8 or 9 in an octal number.

Error E -- Illegal Expression

This error occurs because of:

  1. a missing expression where one is required
  2. a unary operator used as a binary operator or vice-versa
  3. a missing binary operator
  4. a SHL or SHR count that is not 0 thru 15

Error I -- IF-ENDI Imbalance

For every IF there must be a corresponding ENDI. If this error occurs on an ELSE or ENDI statement, the corresponding IF is missing. If this error occurs on an END statement, one or more ENDI statements are missing.

Error L -- Illegal Label

This error occurs because of:

  1. a non-alphabetic in column 1
  2. a reserved word used as a label
  3. a missing label on an EQU or SET statement
  4. a label on an IF, ELSE, or ENDI statement

Error M -- Multiply Defined Label

This error occurs because of:

  1. a label defined in column 1 or with the EQU statement being redefined
  2. a label defined by a SET statement being redefined either in column 1 or with the EQU statement
  3. the value of the label changing between assembly passes

Error O -- Illegal Opcode

The opcode field of a source line may contain only a valid machine opcode, a valid pseudo-op, or nothing at all. Anything else causes this error.

Error P -- Phasing Error

This error occurs because of:

  1. a forward reference in an EQU, ORG, RMB, or SET statement
  2. a label disappearing between assembly passes

Error R -- Illegal Register

This error occurs either when the register designator A, X, or Y is used with a machine opcode that does not permit it, or when the register designator is missing with a machine opcode that requires it.

Error S -- Illegal Syntax

This error means that an argument field is scrambled. Sort the mess out and reassemble.

Error T -- Too Many Arguments

This error occurs if there are more items (expressions, register designators, etc.) in the argument field than the opcode or pseudo-op requires. The assembler ignores the extra items but issues this error in case something is really mangled.

Error U -- Undefined Label

This error occurs if a label is referenced in an expression but not defined anywhere in the source program. If you are "sure" you have defined the label, note that upper and lower case letters in labels are different. Defining "LABEL" does not define "Label."

Error V -- Illegal Value

This error occurs because:

  1. an index offset is not 0 thru 255
  2. an 8-bit immediate value is not -128 thru 255
  3. a DB argument is not -128 thru 255
  4. an ORG statement is attempting to seek backwards in the file
  5. an INCL argument refers to a file that does not exist

Warning Messages

Some errors that occur during the parsing of the cross-assembler command line are non-fatal. The cross-assembler flags these with a message on the C "stdout" device (by default, the console) beginning with the word "Warning." The messages are listed below:

Warning -- Illegal Option Ignored

The only options that the cross-assembler knows are -e, -l, and -o. Any other command line argument beginning with - will draw this warning.

Warning -- -e Option Ignored -- No File Name

Warning -- -l Option Ignored -- No File Name

Warning -- -o Option Ignored -- No File Name

The -e, -l, and -o options require a file name to tell the assembler where to put the export file, listing file, or object file. If this file name is missing, the option is ignored.

Warning -- Extra Source File Ignored

The cross-assembler will only assemble one file at a time, so source file names after the first are ignored. To assemble a second file, invoke the assembler again. Note that under CP/M-80, the old trick of reexecuting a core image will NOT work as the initialized data areas are not reinitialized prior to the second run.

Warning -- Extra Listing File Ignored

Warning -- Extra Object File Ignored

The cross-assembler will only generate one listing and one object file per assembly run, so -l and -o options after the first are ignored.

Fatal Error Messages

Several errors that occur during the parsing of the cross-assembler command line or during the assembly run are fatal. The cross-assembler flags these with a message on the C "stdout" device (by default, the console) beginning with the words "Fatal Error." The messages are explained below:

Fatal Error -- No Source File Specified

This one is self-explanatory. The assembler does not know what to assemble.

Fatal Error -- No Export File Specified

This error is thrown if your code includes the "EXP" pseudo-op but you ran the assembler without specifying an export file to write your exports to.

Fatal Error -- Source File Did Not Open

The assembler could not open the source file. The most likely cause is that the source file as specified on the command line does not exist. On larger systems, there could also be privilege violations. Rarely, a read error in the disk directory could cause this error.

Fatal Error -- Listing File Did Not Open

Fatal Error -- Object File Did Not Open

This error indicates either a defective listing or object file name or a full disk directory. Correct the file name or make more room on the disk.

Fatal Error -- Error Reading Source File

This error generally indicates a read error in the disk data space. Use your backup copy of the source file (You do have one, don't you?) to recreate the mangled file and reassemble.

Fatal Error -- Disk or Directory Full

This one is self-explanatory. Some more space must be found either by deleting files or by using a disk with more room on it.

Fatal Error -- File Stack Overflow

This error occurs if you exceed the INCLude file limit of four files open simultaneously. This limit can be increased by increasing the constant FILES in file A65.H or A65C.H and recompiling the cross-assembler.

Fatal Error -- If Stack Overflow

This error occurs if you exceed the nesting limit of 16 IF blocks. This limit can be increased by increasing the constant IFDEPTH in file A65.H or A65C.H and recompiling the cross-assembler.

Fatal Error -- Too Many Symbols

Congratulations! You have run out of memory. The space for the cross-assembler's symbol table is allocated at run-time using the C library function alloc(), so the cross-assembler will use all available memory. The only solutions to this problem are to lessen the number of labels in the source program, to use a larger memory model (MSDOS/PCDOS systems only), or to add more memory to your machine.