A v68k Execution Shell: xv68k
In the last issue, I discussed my rationale for writing a 68K emulator and some of my future plans for it. This article, however, will be limited to currently existing functionality. The v68k library handles decoding and interpreting 68K instructions; the xv68k program is an application of v68k which adds the missing pieces to form a complete (albeit simple) emulator.
One of the functions of an emulator is containment, to act as a sort of blast shield protecting the host environment from whatever disasters befall the program being emulated. (In fact, the very first instruction xv68k runs on the emulated processor clears the Supervisor bit in the status register, switching from supervisor mode to user mode, so that privileged operations cause an exception.) At the same time, it’s necessary for emulated code to be able to have some effect on the outside world (besides merely consuming resources), since otherwise there would be no value in running it. The trick is to be selective about what effects are possible from within the emulator. We certainly don’t want to give untrusted code carte blanche to the screen or the file system. Initially, it would be sufficient to have some concept of output, which emulated code can produce but whose particulars are controlled by the host system.
Fortunately, such an abstraction exists already (and has for decades): POSIX file descriptors and standard I/O. If we allow an untrusted program only the ability to write to already-opened filehandles, the worst it can do is spam your terminal or temporarily fill up your filesystem (assuming you’ve either left that file descriptor referring to the tty device or redirected it to a file on disk). The xv68k application allows the emulated program to write to any open file descriptor, by giving it access to the native write() system call.
This hand-wave merits further discussion. There must be a 68K instruction that somehow codes for a native operation. The TRAP instruction is a natural fit for system calls: The processor takes a Trap exception, and the installed exception handler, having full access to the state of the caller (and now running in supervisor mode) can implement the call in whatever way necessary and then return. In fact, this is exactly how system calls work in MacRelix, a POSIX-like environment for classic Mac OS (and therefore a spiritual cousin to Cygwin) on 68K. The only problem is that as far as the emulator application is concerned, an exception is nothing out of the ordinary — v68k’s microcode for the TRAP instruction switches to supervisor mode and constructs an exception stack frame, but for xv68k it’s just another instruction executed successfully before continuing to the next one. A jump-out-of-the-system instruction is required — i.e. one that is not handled by the processor itself, but by something external.
Fortuitously, this too already exists. The BKPT instruction "supports breakpoints for debug monitors and real-time hardware emulators" according to the M68000 Family Programmer’s Reference Manual. It’s not supported the same way on all 68K processor models, or at all by the 68000, but xv68k doesn’t need it to be: that the instruction is recognized by assemblers and disassemblers and isn’t used for any other purpose (and therefore will never be seen in user code) is sufficient, since xv68k provides both the implementation and the call sites. Our use of it will loosely mimic the way it works on the 68020 and 68030: Execution of the BKPT instruction sets an abnormal processor condition, causing a break in the instruction step loop. The application then checks which condition caused the break; in the event of a hardware breakpoint, it optionally performs any necessary processing and then acknowledges the breakpoint, submitting an instruction opcode to substitute for the BKPT instruction. There are eight different breakpoint vectors (numbered 0 through 7); xv68k recognizes BKPT #2 for system calls. It expects a system call number in register D0 and the arguments (and return address) on the stack, and will return a result in D0. If the system call number isn’t recognized, xv68k does nothing and doesn’t acknowledge the breakpoint, causing the processor to take an illegal instruction exception.
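The acknowledge-or-decline decision described above can be sketched in a few lines of Python. This is only an illustration, not the actual xv68k code: the function name, the register dictionary, and the system call table are all hypothetical, and substituting RTS for every handled call is an assumption generalized from the write() case covered later.

```python
BKPT_BASE = 0x4848       # BKPT #n encodes as 0x4848 | n
RTS_OPCODE = 0x4E75      # assumed substitute opcode for a handled call
SYSCALL_VECTOR = 2       # xv68k recognizes BKPT #2 for system calls


def acknowledge_breakpoint(opcode, regs, syscalls):
    """Return a substitute opcode, or None to leave the breakpoint
    unacknowledged (the processor then takes an illegal instruction
    exception).

    `regs` is a hypothetical dict of register values and `syscalls`
    a hypothetical table mapping call numbers to host functions.
    """
    if opcode & 0xFFF8 != BKPT_BASE:
        return None                      # not a BKPT instruction
    if opcode & 0x0007 != SYSCALL_VECTOR:
        return None                      # some other breakpoint vector
    handler = syscalls.get(regs["D0"])
    if handler is None:
        return None                      # unrecognized system call number
    # The result goes back in D0, truncated to 32 bits.
    regs["D0"] = handler(regs) & 0xFFFFFFFF
    return RTS_OPCODE
```

Returning None rather than raising keeps the policy (take an illegal instruction exception) in the step loop, which is where the article places it.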
The prospect of littering user code with BKPT instructions is averted by combining the breakpoint mechanism with the existing practice of using TRAP #0 for system calls as in MacRelix, as shown in the implementation of write():
    MOVEQ.L  #4,D0
    TRAP     #0
The first instruction sets D0 to the value 0x00000004, which is the system call number for write(). The 'Q' stands for 'Quick' (since the value is extracted from the instruction opcode itself, rather than having to be read from somewhere). The '.L' suffix means that this is a Long-sized operation (affecting all 32 bits of the destination); however, MOVEQ is always long-sized. The '#' symbol means that the 4 is a literal value, not a memory address. The TRAP instruction causes a Trap exception along one of sixteen trap vectors (numbered 0 through 15), selected by the operand. The calling code has already pushed its arguments and return address on the stack when write() is called.
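To make "extracted from the instruction opcode itself" concrete, here is a small Python sketch of how these one-word instructions are encoded. The helper names are made up for illustration; the bit layouts follow the standard 68K encodings (MOVEQ is 0111 rrr0 dddddddd, TRAP is 0x4E40 plus the vector, BKPT is 0x4848 plus the vector).

```python
def moveq(data, dreg):
    """Encode MOVEQ #data,Dn: the 8-bit immediate lives in the low byte
    of the opcode and is sign-extended to 32 bits when executed."""
    if not -128 <= data <= 127:
        raise ValueError("MOVEQ immediate must fit in a signed byte")
    if not 0 <= dreg <= 7:
        raise ValueError("data register must be D0 through D7")
    return 0x7000 | (dreg << 9) | (data & 0xFF)


def trap(vector):
    """Encode TRAP #vector (sixteen vectors, 0 through 15)."""
    if not 0 <= vector <= 15:
        raise ValueError("TRAP vector must be 0 through 15")
    return 0x4E40 | vector


def bkpt(vector):
    """Encode BKPT #vector (eight vectors, 0 through 7)."""
    if not 0 <= vector <= 7:
        raise ValueError("BKPT vector must be 0 through 7")
    return 0x4848 | vector
```

Each of these instructions occupies a single 16-bit word, so the write() glue is four bytes in total.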
When the Trap 0 exception occurs, the processor switches to supervisor mode, builds the exception stack frame, and jumps to the exception handler’s address. The top of the stack now contains copies of several special registers as they were before the exception: the 16-bit Status Register (SR) (which we’ll ignore) and the 32-bit Program Counter (PC), which contains the address of the next instruction to execute. When the exception is taken, the PC has advanced past the TRAP instruction and is pointing to whatever follows it.
xv68k installs this exception handler for Trap 0:
    LEA      (2,SP),A0
    SUBQ.L   #2,(A0)
    MOVEA.L  (A0),A0
    MOVE.W   #0x484A,(A0)
    RTE
SP is the Stack Pointer. It’s an alias for register A7, where it’s always stored. On the 68K architecture, the stack grows down. Parentheses indicate dereference, i.e. while SP refers to the value (a memory address) stored in A7, (SP) refers to the value stored at that memory address. Commas within parentheses denote addition, so (2,SP) means the value stored at the address SP + 2. Since the stack grows down (from high memory toward low memory), higher addresses are deeper in the stack. Two bytes away from the top of the stack is the saved PC.
Load Effective Address in this case sets A0 to SP + 2. The semantics of LEA are as if C’s address-of operator were applied: the address, not the value, of its operand is used. A0 now points to the copy of the PC stored in the stack. The next instruction (Subtract Quick) decrements that stored PC by two, so that it points to the TRAP instruction again. (Returning now would result in an infinite loop.)
MOVEA is like MOVE, but used when the destination is an address register. Since A0 points to the stored PC, (A0) is the stored PC itself. After the move, A0 equals the stored PC. The second move is Word-sized, affecting only 16 bits of the destination. The word at the stored PC was the opcode for TRAP #0; now it’s BKPT #2.
RTE is Return from Exception; it deallocates the exception stack frame and restores the previous state — except as modified by the exception handler. The PC once again contains the address of the TRAP instruction, but now that address contains a BKPT instead. The next instruction step will execute the BKPT and initiate a system call.
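The handler's effect on memory can be traced with a short Python simulation. It is a sketch under simplifying assumptions: emulated memory is a flat bytearray, the exception frame is just the 16-bit SR followed by the 32-bit PC, and the function name is hypothetical. It models what the five instructions do, not how v68k executes them.

```python
import struct

TRAP_0 = 0x4E40   # opcode for TRAP #0
BKPT_2 = 0x484A   # opcode for BKPT #2


def trap0_handler(mem, sp):
    """Mimic the Trap 0 handler on big-endian memory `mem`, with the
    exception frame at `sp`. Returns the PC that RTE would restore."""
    a0 = sp + 2                                  # LEA (2,SP),A0
    (saved_pc,) = struct.unpack_from(">L", mem, a0)
    struct.pack_into(">L", mem, a0, saved_pc - 2)  # SUBQ.L #2,(A0)
    (a0,) = struct.unpack_from(">L", mem, a0)    # MOVEA.L (A0),A0
    struct.pack_into(">H", mem, a0, BKPT_2)      # MOVE.W #0x484A,(A0)
    (restored_pc,) = struct.unpack_from(">L", mem, sp + 2)
    return restored_pc                           # RTE pops SR, then PC
```

Note that the handler patches the code in place, so the next step at the restored PC fetches BKPT #2 from the address where TRAP #0 used to be.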
To implement the write() system call, xv68k reads three 32-bit arguments from the 68K stack: an integer file descriptor, a buffer address, and an integer byte count. It translates the buffer address from an emulated memory offset to a native pointer, calls the native write(), and sets D0 to the result. Then it acknowledges the breakpoint with the opcode for RTS (Return from Subroutine), so that control returns to the calling code.
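As a rough illustration of that host-side step, the following Python sketch reads the arguments and forwards them to the native call. Several details are assumptions rather than facts about xv68k: emulated memory is modeled as a bytearray indexed from zero, the return address is taken to sit at SP with the three arguments above it, and failures are reported as -1 in the 32-bit result. The name sys_write is hypothetical.

```python
import os
import struct

ERROR_RESULT = 0xFFFFFFFF   # -1 as an unsigned 32-bit value (assumed convention)


def sys_write(mem, sp):
    """Forward an emulated write() call to the host.

    Assumed stack layout (big-endian, 32 bits each):
    sp+0 return address, sp+4 fd, sp+8 buffer address, sp+12 byte count.
    Returns the value to place in D0.
    """
    fd, buf, count = struct.unpack_from(">LLL", mem, sp + 4)
    # Translate the emulated address, refusing anything outside memory.
    if buf + count > len(mem):
        return ERROR_RESULT
    try:
        return os.write(fd, bytes(mem[buf:buf + count]))
    except OSError:
        return ERROR_RESULT
```

The bounds check matters for containment: without it, a bad buffer address from emulated code would be the host's problem rather than the guest's.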
Finally, it’s possible to be heard in the real world. Here’s a sample user program:
main:
    PEA      12
    PEA      data
    PEA      1
    BSR.S    write
    MOVEQ    #0,D0
    RTS

write:
    MOVEQ    #4,D0
    TRAP     #0

data:
    DC.B     'Hello world\n'
PEA is Push Effective Address. It’s similar to LEA, but instead of placing the computed address in a register, it’s pushed onto the stack. We use it three times, to push three arguments onto the stack (in reverse order): a byte count of 12, the address of our string data, and the file descriptor (1 for standard out). We could have written MOVE.L #12,-(SP) instead of PEA 12, but that instruction would occupy six bytes instead of four.
BSR is Branch to Subroutine. It pushes the return address onto the stack before branching, so the called code should RTS instead of branching back. After write() returns (via the RTS supplied to acknowledge the breakpoint), main() itself returns, though not before clearing D0 (the result register), which becomes the program’s exit status.
All told, Hello World in 68K assembler occupies 34 bytes, of which 12 bytes are data and the remaining 22 bytes are code, including the 4-byte glue for the write() system call and not counting the kernel.
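The tally can be checked with a few lines of Python. The per-instruction sizes are assumptions consistent with the totals above: each PEA is taken to use a 16-bit operand (four bytes in all, as stated earlier for PEA 12), and every other instruction is a single 16-bit word.

```python
# (instruction, size in bytes); sizes assume 16-bit PEA operands
code_sizes = [
    ("PEA 12", 4),
    ("PEA data", 4),
    ("PEA 1", 4),
    ("BSR.S write", 2),
    ("MOVEQ #0,D0", 2),
    ("RTS", 2),
    ("MOVEQ #4,D0", 2),   # start of the write() glue
    ("TRAP #0", 2),
]
data = b"Hello world\n"

code_bytes = sum(size for _, size in code_sizes)
glue_bytes = sum(size for name, size in code_sizes[-2:])
total_bytes = code_bytes + len(data)
```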
That’s all the space we have in this issue, but in the next one we’ll implement our first Mac Toolbox trap, _SysBeep. (It’s harder than it sounds.) Stay optimized!