8080  to Z80  Source  Translation   6 Jun  82   ver 0.02

-- Overview --
     This   program  translates  8080  opcodes  and
operands (source code) to their  Z80 equivalent. It
also  does  certain upper/lower  case  translation,
and  formatting of  the  output  file.  The program
consists  of  6 modules, four of which are provided
in  source code. The  two I/O  modules are specific
to the TRS80  model  I, and  are  not included. The
routines  in  these two modules  are  explained  in
sufficient  detail  to allow  their  implementation
for another system. The program  has  been designed
to  handle  correctly  virtually  all  valid  input
conditions.  There are, however, certain cases that
will have to be  fixed by  using an editor program.
For example, this program adds colons to all
labels, even those on EQU instructions.
-- Modules included --
     The following modules are provided:
- SRCTR - program  mainline,  handling of comments,
output formatting and support routines
-  TXT8Z8  -  translate  record  text  (opcode  and
operands) from 8080 to Z80
- OPCDTB - opcode table and lookup routine
- OPRND  - routines to transliterate operands
     These modules are written in Z80 code to
assemble and link as-is using Microsoft's M80 macro
assembler and L80 linking loader. The code should
be  compatible   with  any  assembler  using  Zilog
opcodes, but  some minor changes  may  be required.
With  the  exception  of  the code in  the mainline
that  is  concerned with setting  up filenames, and
the length  of  the  FCB/DCB's,  all  code provided
should be compatible with CP/M.
-- Instruction syntax --
     The  program  assumes  the   following   input
syntax. Every record may start with a line number,
which may have the  high-order bit  of  each  digit
set   on   (the   format   used   by    Microsoft's
editor/assembler).  Following the line number there
may be  a special  tab  character with its h/o  bit
on. These digits and tab characters will  be copied
to output if present. Neither needs to be present
for any one record. Between  records there may be a
page marker (8CH)  character. There may also  be  a
file  header,  consisting  of  the  character  0D3H
followed by  a  six  character  program name. There
may or may not be a 1AH  character  as end of  file
marker.  All  these characters  are copied  to  the
output file if present, but may be absent.
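     The record prefix described above can be
sketched in Python as follows. This is only an
illustration of the rule, not the SRCTR code; the
name split_prefix is made up for the example, and
the high-bit digit range B0H-B9H is inferred from
the description of digits with the high-order bit
set.

```python
SPECIAL_TAB = 0x89  # tab (09H) with its high-order bit set


def split_prefix(record: bytes) -> tuple[bytes, bytes]:
    """Split a record into (line number prefix, text)."""
    i = 0
    # Digits may be plain (30H-39H) or have the high bit set (B0H-B9H).
    while i < len(record) and 0x30 <= (record[i] & 0x7F) <= 0x39:
        i += 1
    # An optional special tab may follow the line number.
    if i < len(record) and record[i] == SPECIAL_TAB:
        i += 1
    return record[:i], record[i:]
```

     Both parts of the prefix are optional, so a
record without a line number yields an empty
prefix.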
     The  first non-digit  (other  than the special
tab 89H) in each record  is treated as the start of
text  for  that   record.  Every  record  must   be
terminated by  a 0DH carriage return. Blanks  (20H)
and  regular  tabs (09H)  are treated  as delimiter
characters around the  opcode  and  after operands.
Multiple operands must be separated by a comma; no
other character is accepted. The semi-colon
character  (;)  signals  the start  of  comments in
that  record. Comments are terminated only by a  CR
(0DH) as  end of record. A semi-colon that  is part
of  a  quoted  string does not signal the  start of
comments. Only the  apostrophe (27H)  is recognized
as opening and closing 'quote' for quoted strings.
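     A minimal Python sketch of the comment rule
above: scan for the first semi-colon that is not
inside an apostrophe-quoted string. The function
name find_comment is hypothetical and the code is
an illustration only, not the routine used in
SRCTR.

```python
def find_comment(text: str) -> int:
    """Return the index of the ';' that starts the comment, or -1."""
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == "'":
            in_quotes = not in_quotes   # apostrophe opens or closes a string
        elif ch == ";" and not in_quotes:
            return i
    return -1
```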
     If  the  first  character  in  the  record  is
non-blank (and not  a semi-colon), it is assumed to
be  a  label. (This program will not handle opcodes
starting in column  one.)  The end of  the label is
signalled  by a blank or tab, or by the last of one
or  more colons  (:). (M80  requires colons on  all
labels  except those  on EQU, and uses  2 colons as
one  way to identify  external symbols.) The opcode
is  preceded  by  one or more  blanks/tabs  or by a
label, and terminated  by  one or more blanks/tabs,
a semi-colon (start of  comment), or  a CR  (end of
record).  Operands  must  be preceded  by an opcode
and followed by a blank/tab, semicolon or CR.
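     The label and opcode rules above might be
sketched like this in Python. The function name
split_record and the three-part return value are
assumptions made for the example; the real program
works character by character on the input file.

```python
def split_record(text: str) -> tuple[str, str, str]:
    """Split record text into (label, opcode, remainder)."""
    text = text.rstrip("\r")
    i = 0
    label = ""
    if text and text[0] not in " \t;":
        # A label runs to a blank/tab, or through the last of its colons.
        while i < len(text) and text[i] not in " \t:;":
            i += 1
        while i < len(text) and text[i] == ":":
            i += 1
        label = text[:i]
    # Skip the blanks/tabs that precede the opcode.
    while i < len(text) and text[i] in " \t":
        i += 1
    start = i
    # The opcode ends at a blank/tab, a semi-colon, or end of record.
    while i < len(text) and text[i] not in " \t;":
        i += 1
    return label, text[start:i], text[i:].lstrip(" \t")
```

     The remainder holds the operands and any
comment; separating those two needs the quoted
string check described earlier.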
     On  machine  instructions  there may be  zero,
one  or  two  operands.   Two   operands   must  be
separated by  a comma. Only pseudo opcodes, such as
DB, may have more than two operands (i.e.  multiple
commas).  The number of  operands must  be  correct
for the  8080  opcode involved. The program  checks
the operand syntax only on a few instructions; on
others, operands are copied without inspection.
     If  the  input   file  does  not  follow  this
syntax, SRCTR will probably not translate the file
properly, especially the operands.
-- Output formatting --
     Alphabetic  characters in labels  and  opcodes
are translated to  upper case.  Bit flag options in
the program  can be  set to change this and several
other options.  Alpha  characters  in  operands are
translated to  upper  case only  if the  opcode was
recognized  (opcode  table), and  the  character is
not within a quoted string.
     Alphabetic   characters    in   comments   are
translated  to  lower  case  (with option  flags to
change),  subject   to  the  following.  The  first
character  in  a  comment  (after  the  ;)  is  not
changed (fine  for upper to lower translation,  not
for lower  to  upper).  Conversion  stops  (for the
current record's comments  only) if a character  is
read that is already in the proper case.
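     The default upper to lower translation of
comments can be illustrated with the Python sketch
below. It takes the rule literally (the character
right after the ; is left alone, even if it is a
blank), which is an assumption; lower_comment is a
made-up name.

```python
def lower_comment(comment: str) -> str:
    """Lower-case a comment that starts with ';', per the rules above."""
    out = list(comment[:2])        # the ';' and the first character after it
    converting = True
    for ch in comment[2:]:
        if converting and ch.islower():
            converting = False     # already in the target case: stop
        out.append(ch.lower() if converting else ch)
    return "".join(out)
```

     A comment typed in capitals is converted, while
a comment already in mixed case is left as its
author wrote it.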
     Tabs  and  blanks preceding and following  the
opcode  on  input  are  removed,  as  are  tabs and
blanks   following    operands   after   recognized
opcodes.
     On  output   a  tab  character   is   inserted
preceding the  opcode (after  the  label  if  any),
unless  the label extends  to or past the first tab
position  (position 8).  In  that case  a  blank is
used.
     A blank rather than  a tab is inserted between
the  opcode  and  the operands.  This  is both  for
reasons  of  personal  preference  and  because  it
makes the listing more compact.  It  allows  column
24 (3rd tab)  to be  used  as  the  standard  start
position for comments, rather than 32 (4th tab).
     Following  the  operands  (if  any), tabs  are
inserted if there is a  comment, to  start comments
in the  standard  position  (set  up  as  24).  Tab
characters  (other  than  those  within   a  quoted
string)  that   are  put   out  past  the  standard
comments position,  are  translated to  blanks.  If
the  previous  character put to  the output file is
also a blank, such translated tabs  are  completely
suppressed.
     Where  a  comment  is  preceded  only  by  tab
characters  and/or  blanks, the program outputs  as
many tab  characters as  it found tabs on input, up
to  a  maximum.  This  maximum   is  based  on  the
standard comments  position  (24) divided by the  8
characters each tab character is equivalent to.
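     Worked out, the maximum is 24 / 8 = 3 tabs. A
small Python sketch of this rule follows; the names
are hypothetical, and only the values 24 and 8 come
from the text.

```python
COMMENT_COLUMN = 24   # standard start position for comments
TAB_WIDTH = 8         # columns per tab character


def leading_tabs(whitespace: str) -> str:
    """Tabs to put out before a comment preceded only by blanks/tabs."""
    max_tabs = COMMENT_COLUMN // TAB_WIDTH      # 24 // 8 == 3
    return "\t" * min(whitespace.count("\t"), max_tabs)
```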
     When  a comment consists of only a semi-colon,
this   program  drops  that  semi-colon.  This   is
typically  used  for  blank   lines   inserted  for
spacing purposes. M80 accepts lines consisting of
a line number only. SRCTR does not delete any tabs
or blanks preceding  such a solitary semi-colon. If
the  file  is  reprocessed,  such  null tabs/blanks
will be dropped, unless they follow an  opcode that
SRCTR does not recognize.
     Following every label the  program puts  out a
colon character,  unless the label  already has one
or more colons. It will do  this even for labels on
EQU (where M80 does not accept colons). The  latter
will have to be removed using a text editor.
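     For those who prefer a script to a text editor,
the clean-up could look roughly like the Python
sketch below. This is not part of SRCTR; the
pattern assumes the label starts in column one
(no line number prefix) and that EQU follows it
after blanks or tabs.

```python
import re

# One colon on a column-one label that is directly followed by EQU.
EQU_LABEL = re.compile(r"^([^\s:;]+):(?=[ \t]+EQU\b)", re.IGNORECASE)


def strip_equ_colon(line: str) -> str:
    """Remove the colon SRCTR added to a label on an EQU line."""
    return EQU_LABEL.sub(r"\1", line)
```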
-- Opcode translation --
     The  program  uses a table that  contains  all
8080 opcodes  for machine  instructions. The  table
is in opcode  sequence.  All  machine  instructions
(as   opposed  to   the  pseudo  opcode   assembler
instructions  such as  TITLE,  DB or DEFB) are four
characters or less. The pseudo opcodes DB, DM,  DS,
DW, SETL,  DEFB, DEFM, DEFS, DEFW and DEFL are also
included  in  the  table.  Opcodes  (recognized  as
opcodes from the syntax) longer than 4 characters
are  marked as not  found, as are shorter  ones not
matched in the table. Matching  is  done  in  upper
case,  regardless  of opcode  case  translation  on
output.
     Machine opcodes  that  are  found in the table
are on  output  replaced  by  their Z80 equivalent.
Pseudo opcodes Dx may  be  translated to  DEFx,  or
vice versa. This  is an option (bit flag)  that  is
independent of the machine opcode translation.
-- Operand translation --
     For conditional calls, returns and  jumps, the
second  and  third (if present)  character  of  the
8080  opcode  are  used  to generate  the condition
specification for  their Z80  equivalent.  E.g. CPO
... becomes CALL PO,... and  RNZ translates to  RET
NZ.
     Bit flags  in  the  opcode  table identify how
the   operands  are  to  be  translated.  Where  no
translation of  operands is  required, all bits are
off. The  following bits  are provided  (see OPCDTB
for the specific bit numbers used):
- Z80 opcode only - not used at present,
- Translate condition -  used for conditional call,
return  and  jump  opcodes  to  identify  that  the
condition information  must  be extracted from  the
8080 opcode,
-  M operand allowed -  the M for memory is allowed
as  the  first  (or  only)  operand,  translates to
(HL); if this bit is not set, an M used  as operand
will be copied to  output as  is  (except  for case
translation possibly),
- M  may  be  first  or second operand - as  above,
when M is allowed on second operand, e.g. for MOV,
- Translate  register pair - this  instruction  may
have  a  register  pair  as  its  first  (or  only)
operand; single byte operands B, D and H translate
to BC, DE and HL, the 3 byte operand PSW
translates  to  AF;  anything  else  (including SP)
remains unchanged,
- Register pair may be  second operand - as  above,
used for DAD and LDAX (can't rightly remember why),
-  Add A as  1st operand  - the 8080 opcode implies
register  A  as  the  first  operand  (e.g. ADD  or
LDAX); 'A,' is  generated  as output,  followed  by
the input operand (after its translation if any),
- Add A as 2nd operand - the 8080 opcode implies
register A as the second operand (e.g. STA or OUT);
',A' is generated as output, after the input
operand (translated as required),
- Add HL  as first operand -  as above, for example
DAD,
-  Add  HL as second operand - as  above, used  for
SHLD only,
- Add brackets to first operand - used on memory
reference instructions such as LDA and SHLD to
indicate  reference  is   to  content  rather  than
address, also used on OUT,
- Add brackets to second operand - as  above,  used
on STA, IN and others,
-  DEFx  or Dx pseudo  opcode - used to distinguish
these from machine instructions,
-  Extended  translate  - the  opcodes  PCHL,  RST,
SPHL, XCHG  and  XTHL  require  operand  adjustment
quite unlike any other  opcode; for RST  the  input
operand  must  be multiplied by 8,  for the  others
there  is  no input operand, for each  of these the
output operand is always the same.
-- Z80 to Z80 translation --
     Since   SRCTR   does   file   formatting   and
upper/lower  case translation as  well  as 8080  to
Z80  opcode  and  operand  translation,  it  may be
useful to  process a Z80  source file. This  can be
done only if those Z80 opcodes that  look like 8080
opcodes  are  first   modified  so  they  are   not
recognized. The Z80 opcodes  involved are ADC, ADD,
CP,  IN, JP,  OUT,  RST and SUB. I normally use the
text  editor to change  them  by adding  an unusual
combination of  characters, e.g. ADD to ADD?*. Then
after  processing  with SRCTR it  is very  easy  to
remove all ?*, (and colons on EQU labels).
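     The same marking trick can be scripted. The
Python sketch below is one possible way to do it,
not something supplied with SRCTR: it marks the
listed opcodes only when they appear in the opcode
field, and assumes at least one blank or tab before
the opcode.

```python
import re

LOOKALIKES = ("ADC", "ADD", "CP", "IN", "JP", "OUT", "RST", "SUB")
MARK = "?*"   # the unusual character combination suggested above

# Optional label, blanks/tabs, then one of the opcodes as a whole word.
OPCODE_FIELD = re.compile(
    r"^([^\s;]*[ \t]+)(" + "|".join(LOOKALIKES) + r")(?=[ \t;\r]|$)",
    re.IGNORECASE,
)


def mask(line: str) -> str:
    """Mark a look-alike opcode so SRCTR will not recognize it."""
    return OPCODE_FIELD.sub(r"\g<1>\g<2>" + MARK, line)


def unmask(line: str) -> str:
    """Remove the marks again after processing with SRCTR."""
    return line.replace(MARK, "")
```

     Run mask over every line before SRCTR and
unmask over every line of its output.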
     SRCTR will still find CALL,  Dx, DEFx, DI, EI,
NOP, POP and PUSH in its table, but  their operands
will   not  be   adjusted  other  than   for   case
translation.   (Operands    on    the   8080   CALL
instruction are never translated, nor are those  of
Dx and DEFx; DI,  EI and NOP do not  have operands.
POP  and PUSH would be changed only if they had  B,
D, H or  PSW as operand.) No  case translation will
occur  for  the  operands  of  other  opcodes  (all
not-found). Case  translation  on comments and file
formatting will take place.
-- I/O routines --
     The following input output  routines are used.
Their  source  code  is  not  provided.  They   are
specific  to the  TRS80 Model  I, and would require
drastic changes to run on another machine. Here
is  a  functional  description  of  each  of  these
routines.
     When a program  gets  control from TRSDOS (and
its   equivalents),  register  HL>  points  to  the
second character  following  the  program  name  in
DOS's command buffer, or to the first character
following if that is  a CR (i.e.  no  operands). If
the program is invoked as:

SRCTR:3 src8080/asm to srcz80/mac:1 <CR>

(where SRCTR is the program name and :3 its drive
number (equivalent to D:), src8080/asm is the input
file, and the output file srcz80/mac must be placed
on the second drive), HL> will point to src8080...
etc.
It  is   up  to  the   program  to  interpret   the
parameters and extract any filenames etc.
     Other things  to  note  are  that under TRSDOS
(etc),  file   control  blocks  are  32  bytes  and
buffers  256 bytes  (256 byte physical  sectors  on
disk).  There  is no DMA (direct memory access)  on
disk I/O  on the  TRS  model  I,  and  no  Set  DMA
address, rather  the buffer address  is  passed  to
DOS's open routine  and  is  stored in the  FCB for
that file.
     OPENS  extracts  the next  filename  from  the
string pointed  at  by HL>,  sets it up in  the FCB
whose address is  passed in  <DE>, and calls  DOS's
open  routine, passing the FCB address in  <DE> and
the  buffer  address in <HL>. By  OPENS  convention
the buffer address used is the FCB  address plus 33
(32 byte FCB and  one  byte options).  The  file is
opened for single byte I/O. OPENS handles the
display of error messages, based on DOS's return
code in <A>; a zero return code means open ok. The
nature of the problem  is signalled by the specific
code in <A> (codes 1 to 63).
     OPENS  in  my  implementation  will  also   do
things like display the filename it  extracts,  ask
for  a  new name  on  an  open  error,  and  a  few
optional  features  like   that.  These   are   not
material to the use made of OPENS in this program.
     When  OPENS returns control it must  return in
<A>  a  00H for successful open, anything  else for
an open failure. With  the mainline code as written
(from label LP up to label PROC, or lines 15300 to
20100  and  subroutine  TSTTO  at  lines  71600  to
74900),  OPENS must return  HL>  to  point  in  the
parameter string past the name of the file  it just
opened (in  our  example  to  the  blank  following
src8080/asm). The  routine TSTTO  then  tests for '
to ' between the  filenames,  with  multiple blanks
allowed.
     OPENS  is next used (with a different FCB), to
open  the  output file,  as  described  above.  The
current  address  in  <HL>  is  saved  to  look for
further files  after processing  the  current pair.
As  coded it  is  possible  to  invoke  SRCTR  with
multiple  file  pairs (limited  only by the size of
DOS's command buffer), e.g.:

SRCTR infileA to outfileA   infileB  to  outfileB -
infileC to outfileC <CR>

The dash  (-) has special meaning  to my version of
OPENS. It  signals a wait  with the message  'Press
<ENTER>  to  continue'.  This allows  switching  of
diskettes  (TRSDOS does  not  require  a reboot  to
switch  diskettes, even  to  write to  such a newly
inserted diskette).
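     As an illustration of the parameter handling
described above, here is a Python sketch that
splits a command tail into file pairs. It is a
simplification under stated assumptions: it works
on whole blank-separated words rather than on the
HL> pointer, the name parse_pairs is made up, and
the dash is simply recorded as a flag on the pair
that follows it.

```python
def parse_pairs(command: str) -> list[tuple[str, str, bool]]:
    """Return (infile, outfile, wait_first) for each 'in to out' pair."""
    tokens = command.split()          # multiple blanks are allowed
    pairs, wait, i = [], False, 0
    while i < len(tokens):
        if tokens[i] == "-":          # dash: pause so diskettes can be swapped
            wait = True
            i += 1
            continue
        if i + 2 >= len(tokens) or tokens[i + 1] != "to":
            raise ValueError(f"expected '<infile> to <outfile>' at {tokens[i]!r}")
        pairs.append((tokens[i], tokens[i + 2], wait))
        wait = False
        i += 3
    return pairs
```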
     CLOSES  closes  the  file  for the  FCB  whose
address is passed in <DE>.
     GETCS gets one single  character from the file
whose FCB  address is passed in <DE>. The character
read  is  returned  in  <A>.  When  DOS  returns  a
non-zero return code to GETCS, it does  not  return
control via the  regular return,  but rather  jumps
to the routine EOFRTS defined in  SRCTR. This makes
it  unnecessary  to  test  for  EOF  and  exception
conditions everywhere GETCS is used. DOS recognizes
EOF  from information it  maintains in  the  file's
directory block  and FCB  (sector and relative byte
address  of  EOF),  rather  than   from  a  special
character.  (A   far  more  general  approach  than
CP/M's  1AH, which works fine for text  files,  but
not for COM or REL files etc.  In general, TRSDOS's
disk and  directory  organization is  more flexible
and more powerful than CP/M's.)
     When  the  routine at EOFRTS gets control, <A>
contains  00H for a regular end  of  file, non-zero
for any  other  condition. On entry  to EOFRTS  the
current stackpointer content is undefined.  This is
why SP is immediately restored from SPSAV.
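     In a high-level language the GETCS/EOFRTS
arrangement corresponds roughly to raising an
exception. The Python sketch below shows the idea
only; the names EndOfInput, getc and copy_all are
invented for the example, and the exception
unwinding plays the part of restoring SP from
SPSAV.

```python
import io


class EndOfInput(Exception):
    """Stands in for the jump to EOFRTS; code 0 is a regular end of file."""

    def __init__(self, code: int):
        super().__init__(code)
        self.code = code


def getc(stream: io.BufferedIOBase) -> int:
    """Return the next byte, or leave via EndOfInput instead of returning."""
    data = stream.read(1)
    if not data:
        raise EndOfInput(0)
    return data[0]


def copy_all(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy bytes until end of file; return the end-of-file code."""
    try:
        while True:                   # no EOF test needed at the call site
            dst.write(bytes([getc(src)]))
    except EndOfInput as end:         # plays the role of EOFRTS
        return end.code
```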
     PUTCS puts the single character  passed in <A>
to the file whose  FCB  address  is passed in <DE>.
If an  error  occurs  PUTCS passes  control  to the
routine  PUTERS,  analogous  to   the  GETCS/EOFRTS
approach.
     DISPS displays the  character  in  <A> on  the
screen  at the  current cursor position. DISPS also
scans the  keyboard  and  returns any  key pressed,
01H  (control-A) for Break, 00H if no  key pressed.
In  my implementation  DISPS also  handles variable
speed scrolling.
     TXTDIS  uses  routine  DISPS  to  display  the
string whose address is  passed  in HL>. The string
must  be  terminated  by  00H. TXTDIS  returns  HL>
pointing  to  the  character  following   this  00H
(ideal for in-line messages - see routine XMSG).
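     The pointer convention of TXTDIS can be pictured
with this Python sketch, in which memory is a bytes
object and HL> is an index into it. The function
name and the display callback are assumptions for
the example.

```python
from typing import Callable


def txtdis(memory: bytes, hl: int, display: Callable[[int], None]) -> int:
    """Display the 00H-terminated string at hl; return the index past the 00H."""
    while memory[hl] != 0:
        display(memory[hl])
        hl += 1
    return hl + 1     # points at whatever follows the terminator
```

     With an in-line message, the returned position
is where the code following the message resumes.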
     The  routine WAITS  waits for  any  key to  be
pressed. It returns the key pressed in <A>.
     DISERR  displays   the  error   message  whose
address is  passed  in  <HL> (terminated  by  00H),
preceded  by <graphics  symbol>  <CR>  '***  ', and
followed by ' *** - Press  any key'. It then  waits
until  a  key  is pressed.  This ensures that error
messages do not scroll off the screen unseen.  In my
implementation,   DISERR   also  erases  the  error
message after  a key has been pressed, and restores
the cursor to the first position occupied by the
message  (watch  for  scrolling of the screen),  so
the display can continue where it left off.
     DISERC is identical to DISERR, but also
displays the hex value for the character passed in
<A>. This is displayed  following the main message,
and before the ' *** - Press ...'.
-- Other implementation notes --
     The two  byte  bit  flags  field  STUS  (whose
address  is   maintained   in  IX   throughout  the
program) controls most  options.  Not all bit flags
shown  have  their  supporting  logic  implemented.
Notably  translate Z80 to  8080  and  remove colons
from labels are not supported.
     The opcode  table  lookup  routine maintains a
count  for  all  lookups  it attempted,  and a  hit
count for each table entry. No code  is provided to
display these counts.
     Routine   OUTDLM  is   used   to   output  tab
characters to the  output file and to  the display.
OUTDLM expands tabs on  display to the standard tab
position   (multiples  of   8).  When  the  current
display position (not  counting the line number  or
special tab, but  with  regular  tabs expanded)  is
past  the  standard  start position  for  comments,
OUTDLM  substitutes blanks  for tabs.  Furthermore,
if  the previous character put out  is also a blank
or tab, these substituted blanks are suppressed.
OUTDLM will not substitute blanks for tabs (or
suppress them) when the quoted string flag is
set.
     The  routine  GETASC  gets the next  character
from the input file and validates it to be a  valid
text  character  (including blank,  tab,  line feed
and  carriage   return).  Invalid   characters  are
displayed  and dropped. This  protects the  program
from  spurious  characters  in  the input file. The
routine  GETC  is   used  when  any  character   is
acceptable.
     The convention is  used for most routines that
on  entry <A> contains the  next  character  to  be
processed. Most routines also return in <A> the
first character they did not process. Many routines,
such  as FLUSHB (flush blanks/tabs), recognize that
the  character  received  in  <A>  is  not   to  be
processed  by   them   (e.g.  FLUSHB  receiving   a
non-blank/tab).  In  this  case  the  routine  does
nothing  and  returns  in  <A>  the very  character
received.  This  approach  simplifies  the  calling
routine since it can always call a routine such  as
FLUSHB.
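     The calling convention can be sketched in Python
with the character in <A> passed in and returned as
an ordinary value. Only the convention is taken
from the text; the signature is an assumption, with
getc standing in for the input routine.

```python
from typing import Callable


def flushb(ch: str, getc: Callable[[], str]) -> str:
    """Skip blanks/tabs; return the first character not consumed."""
    while ch in (" ", "\t"):
        ch = getc()
    return ch
```

     Called with a non-blank character, flushb reads
nothing and hands the same character back, so the
caller never has to test before calling it.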
     The text  translate module TXT8Z8 extracts the
opcode   (it   gets   control   when  the  mainline
recognizes that there is an opcode),  and  looks it
up via the LOOKUP routine in module  OPCDTB. If the
opcode  is  not  found  (including too  long),  the
opcode  and  any  operands are  flushed (copied  to
output   file  and  display).  These   opcodes  are
upper/lower  case  translated, their  operands  are
not (this is to accommodate such opcodes as TITLE
and  SUBTTL). Tabs and/or blanks between the opcode
and  its operand are  replaced  by a  single blank.
Flushing  of   the   operand   continues   till  we
encounter  a   ;   that   is   not  between  quotes
(comment),  or  a CR  (end  of record).  While  the
operand  of  an  unrecognized opcode is flushed  as
is,   the  mainline   routine   OUTDSP   may  still
translate or even drop  tab characters  when  we're
past the standard comments position.
     When  the opcode is  found,  its option  flags
from the  opcode table  have been  set up in  OPOPS
(IY register  throughout) by  the  LOOKUP  routine.
Based on these bit flags TXT8Z8 first  selects  out
those opcodes not requiring operand  adjustment (no
bits set or pseudo opcodes).
     For  all  other opcodes the routine  EXTOPR is
used to extract the zero, one or  two operands from
input,   and  place  them  in  field   OPRND1  (two
operands are  consecutive, without  a  comma).  The
first  and second operand length are put  in OPRCT1
and OPRCT2.
     For  all operand  translation  and  generation
the destination  field is  OPRND  with  count field
OPRCNT.  During  the  translation  and   generation
process DE>  points to the next position in  OPRND,
while  <B>  contains  the   count   of   characters
presently in  OPRND.  HL>  is  used  as the  source
pointer  to input  operand characters  and  to  the
operands generated for extended translates.
     After  various   operands  are  generated  and
translated,   the   8080  opcode   in   OPCODE   is
translated  to  Z80  code  (for  all   but   pseudo
opcodes). Next buffers are dumped in  the following
sequence:
OPCODE - instruction opcode buffer,
 (next a blank is inserted into the output stream)
OPRND - operand  translate buffer  (based on length
in <B>/OPRCNT),
OPRND1 - input operand buffer (based  on  length in
OPRCT1 and OPRCT2),
Input file - if we did not yet  encounter a CR or ;
to  signal end of text for the current  record (see
field  UDLM), text characters are copied from input
to output.
     The routines used to  dump these  buffers  and
to process  further input file characters check for
quoted strings,  and perform  case  translation  as
appropriate. Note that the  quoted string flag  may
have   been  set  by  EXTOPR   during  the  operand
extraction, and is  reset  before  dumping  of  the
operand buffers starts.
     The  LOOKUP  routine  in  OPCDTB  matches  the
input opcode  (search  argument) against  the table
opcode.  It uses OPCODE and OPCCNT  for the  search
argument  and its length. LOOKUP matches  initially
on  the first  character only. When that matches it
then  compares  the  remaining characters  based on
the input opcode length. If all this agrees it
verifies that the table  opcode is  not longer than
the input one.
     If  the  match   on   the  subsequent   opcode
characters fails,  LOOKUP continues matching on the
first byte of  the next  table entry.  The  routine
assumes the  table to be  in ascending order, as it
terminates the search  when the first  byte  of the
current  table entry is  greater than that  of  the
search argument.
     On  a  full  match  LOOKUP  updates the  table
entry's hit  count, saves the table  entry  address
and  the option flags (in OPOPS).  The  8080 opcode
in OPCODE is not yet replaced, as it is still used
for condition code translation and extended
translate; also, translation of pseudo opcodes may
not be required.
     The routine  TRCOND sets up the condition code
information  for  conditional jump, call and return
instructions.  It copies  bytes 2  (and  3 if  8080
opcode  is  3  bytes  long)  to the  operand  field
OPRND.  A  comma  is  added  for  all but  RET type
opcodes.
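     A Python sketch of the condition code handling:
the first letter of the 8080 opcode selects JP,
CALL or RET, and the remaining one or two letters
become the condition. The function name is made up,
and the caller is assumed to pass only opcodes
flagged as conditional in the opcode table (CMP, for
instance, must never get here).

```python
Z80_BASE = {"J": "JP", "C": "CALL", "R": "RET"}


def translate_conditional(opcode: str, operand: str = "") -> str:
    """Translate an 8080 conditional jump, call or return."""
    op = opcode.upper()
    base = Z80_BASE[op[0]]
    condition = op[1:3]                 # bytes 2 and, if present, 3
    if base == "RET":
        return f"{base} {condition}"    # no address follows, so no comma
    return f"{base} {condition},{operand}"
```

     Note that the 8080 opcode JP (jump on plus)
comes out as JP P,addr under this rule.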
     Routines  OPER1 and OPER2 translate the  input
operand from HL>  in OPRND1 to DE> in OPRND.  These
routines handle translation of register pair names
and of M to  (HL).  For  other  operands  requiring
parentheses, these are added separately (routines
NOEX and H2NDR in TXT8Z8  call OPENER and CLOSER as
required).  All operands are  copied/translated  to
OPRND.
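     The operand translation itself is small. In the
Python sketch below the two boolean parameters
stand in for the 'translate register pair' and
'M operand allowed' bit flags from the opcode
table; the function name and signature are
assumptions for the example.

```python
REGISTER_PAIRS = {"B": "BC", "D": "DE", "H": "HL", "PSW": "AF"}


def translate_operand(operand: str, pair_allowed: bool, m_allowed: bool) -> str:
    """Translate one 8080 operand according to the opcode's flags."""
    key = operand.upper()
    if m_allowed and key == "M":
        return "(HL)"                   # M for memory
    if pair_allowed and key in REGISTER_PAIRS:
        return REGISTER_PAIRS[key]
    return operand                      # anything else, including SP
```

     The flags matter: B on PUSH means the pair BC,
while B on MOV is the single register and must stay
as is.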
     EXTR   is   the  routine   that  handles   the
'extended  translate'.  Based   on  the  first  and
second character  of  the  input opcode (8080),  it
goes  to  the  appropriate   subroutine.   The  RST
routine  gets the input  operand,  multiplies it by
8, and converts  it to  hex  in OPRND. The  routine
can  handle decimal and hex  numbers on input (0 to
7 only), but not binary (bbbbB).
     For  the  other  extended   translate  opcodes
(PCHL,  SPHL,  XCHG  and XTHL), the program  copies
the appropriate constant into OPRND.
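     For reference, the extended translates can be
sketched in Python like this. The Z80 equivalents
are the standard ones; the output format of the RST
address (two hex digits followed by H) and the
function name are assumptions, since the text does
not give the exact format SRCTR produces.

```python
FIXED = {
    "PCHL": "JP (HL)",
    "SPHL": "LD SP,HL",
    "XCHG": "EX DE,HL",
    "XTHL": "EX (SP),HL",
}


def extended_translate(opcode: str, operand: str = "") -> str:
    """Translate PCHL, SPHL, XCHG, XTHL or RST to Z80 form."""
    op = opcode.upper()
    if op in FIXED:
        return FIXED[op]                # constant output, no input operand
    if op != "RST":
        raise ValueError(f"not an extended translate opcode: {opcode}")
    digits = operand.upper().rstrip("H")    # accept 7 as well as 7H
    n = int(digits, 16)
    if not 0 <= n <= 7:
        raise ValueError(f"RST operand out of range: {operand}")
    return f"RST {n * 8:02X}H"          # restart number times 8, in hex
```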
-- Disclaimer --
     Reasonable effort has been  made to  test  the
program; however, there are almost certainly bugs
left. The program  works fine  on  MODEM217.ASM and
on my own Z80 source files.
     If  you  do fix  bugs  or  make  enhancements,
please  put an  updated version  back on the boards
(Technical CBBS in  Dearborn, Mi.   (313)  846-6127
(110, 300,  450 or  600 baud), and my local  board:
Mississauga RCP/M, Toronto, Ont. (416) 826-5394)
     This write-up is copyright  Frans  Van Duinen,
June 1982. Permission is granted to use,  copy  and
distribute for non-commercial purposes.


Happy computing,   Frans