Assembly hacking, or ASM hacking, is the process of modifying a game's code instead of its data. This requires reverse-engineering the relevant portion of the game.
First off, if this is your first ROM hacking project, you're probably not going to want to do this, and even if you do, you are not likely to pull it off unless you are already familiar with assembly languages. You must learn to walk before you can run. :)
Assembly languages
Knowing an assembly language doesn't help a whole lot if the language you know is the wrong one. On the other hand, it will certainly make things a whole lot easier than if you are completely unfamiliar with assembly. Each processor has its own language; for instance, the 6502 is very different from the M68000 (and much harder to use).
The order of assembly languages you may come across, from easiest to toughest to use, is roughly:
This is not necessarily the order in which they are easy to learn. For instance, the 6502 has far fewer instructions than the Z80. However, it has fewer registers, and they are smaller, so that it is more difficult to figure out the best way to do something -- something that may be perfectly straightforward on another processor, such as how to multiply two 16-bit numbers together. In general, the more powerful a processor is (in terms of functionality rather than speed), the easier it is to use but the harder it is to learn.
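To give a feel for why multiplying two 16-bit numbers takes real work on a CPU with no multiply instruction, here is a rough Python sketch of shift-and-add, a common way to do it by hand. The function name is made up for illustration, and this models the algorithm only, not actual 6502 code; on the real chip each 16-bit shift or add has to be done one byte at a time.

```python
def mul16_shift_add(a, b):
    """Multiply two 16-bit values using only shifts and adds."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = 0
    for _ in range(16):      # one pass per bit of the multiplier
        if b & 1:            # low bit set: add the shifted multiplicand
            result = (result + a) & 0xFFFFFFFF
        a <<= 1              # shift multiplicand left (an ASL/ROL chain on a 6502)
        b >>= 1              # shift multiplier right (an LSR/ROR chain on a 6502)
    return result            # 32-bit product
```

On a processor with a hardware multiply instruction, this whole loop collapses into one or two instructions, which is the sense in which a more capable processor is easier to use.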
All assembly languages have a few things in common. For instance, each processor, and therefore its assembly language, has a few registers. A program carries out its tasks by manipulating these registers and memory. However, which registers are present and what they can do vary from processor to processor. For instance, the 6502 and Z80 both have a register named A; the two are similar in purpose, but are used in entirely different ways. Most processors have a flags register, and they often have the same kinds of flags, but the exact conditions that set or clear a given flag vary from processor to processor.
Basics of assembly language
Kinds of registers
All processors have at least one register that acts as an "accumulator". On the 6502 and Z80, it's named A; on the x86, it's AX; on the M68000, every D register acts as a sort of accumulator.
The accumulator is the primary register used for math.
Indexing registers
All processors have at least one indexing register. On the 6502, they're X and Y; on the Z80, the primary indexing registers are IX and IY. All A registers are indexing registers on the M68000.
Counting registers
Registers used for counting aren't uncommon. On the 6502, these are usually X and Y (same as the indexing registers).
Program counter
The program counter, or PC, is a register that holds the address of the current location in code. For example, if the PC is 8000, then the next instruction to execute is located at 8000. In MIPS, the PC register is not directly accessible.
Stack pointer
See the stack, below.
MIPS specific
Argument registers
Arguments are the values passed to a function for it to process. MIPS uses $a0 - $a3 for the first four, and any further arguments are passed on the stack.
Return registers
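Return registers hold the values a function returns to its caller: $v0 and $v1 on MIPS machines. As with argument registers, any further values needed are sent to the stack.
Temporary registers
These registers ($t0 - $t9) hold temporary values. They are not preserved across function calls, so any function is free to overwrite them.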
Saved registers
These registers ($s0 - $s7) are saved across function calls. This is done by saving the original value of the saved registers needed to the stack at the beginning of a function, then restoring them from the stack after they are no longer needed, usually before the function returns.
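The save-and-restore pattern can be sketched in Python. This is only a toy model: the `regs` dictionary, the `stack` list, and the `callee` function are invented for illustration, and a real MIPS function would adjust $sp and use sw/lw instead of a list.

```python
regs = {"$s0": 111, "$s1": 222, "$v0": 0}   # caller's register values (made up)
stack = []

def callee():
    # Prologue: save the $s registers this function is about to use.
    stack.append(regs["$s0"])
    stack.append(regs["$s1"])

    # Body: now free to overwrite $s0 and $s1.
    regs["$s0"] = 5
    regs["$s1"] = 7
    regs["$v0"] = regs["$s0"] * regs["$s1"]   # return value goes in $v0

    # Epilogue: restore in the reverse order of saving, then return.
    regs["$s1"] = stack.pop()
    regs["$s0"] = stack.pop()

callee()
```

After the call, the caller sees its original $s0 and $s1 again, with the result in $v0.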
Other MIPS registers
$zero : Also known as $r0, this register is read-only and is always zero.
$at : assembler temporary.
$k0/$k1 : reserved for OS kernel.
$gp : global pointer.
$fp : Also known as $s8, this register is the frame pointer.
$ra : Return address, the address a function returns to. JAL (and JALR, by default) automatically stores the return address in this register.
The stack
The stack is a region in memory that contains temporary storage. Sometimes the processor specifies where the stack goes; for instance, the 6502 specifies that the stack is from $0100 to $01FF.
To use the stack, you push the contents of a register onto it, and later pop the value back into a register (it doesn't have to be the same register). Suppose register A holds a value that you need to keep, but you need register A for something else right now. So you push A onto the stack. Let's say the stack has the bytes 00 01 02, and A holds 10. Then the stack becomes 00 01 02 10. When you pop the value back into A, A will hold 10 again, and the stack will again become 00 01 02. The stack pointer contains the address at the top of the stack, so in our example, when you push 10 onto the stack, the stack pointer contains the address of that byte.
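The example above can be walked through with a Python list standing in for the stack. This only models the order of the bytes; the variable names are made up, and the list does not model the stack pointer or the fact that on many CPUs (the 6502 included) the stack grows downward in memory.

```python
stack = [0x00, 0x01, 0x02]   # bytes already on the stack
a = 0x10                     # value in register A that we need to keep

stack.append(a)              # push A (PHA on a 6502)
after_push = list(stack)     # snapshot: 00 01 02 10

a = 0x99                     # A gets reused for something else

a = stack.pop()              # pop back into A (PLA on a 6502)
```

At the end, `a` holds 0x10 again and `stack` is back to 00 01 02.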
It's important to note that calling a subroutine (such an instruction is usually named JSR, BSR, or CALL; the first two mean Jump/Branch to SubRoutine) pushes an address onto the stack -- this is how the CPU knows where it came from. That means you can't just push something onto the stack before calling a routine and pop it from within the routine: the first thing you would pop is the return address, not your data.
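Here is a small Python sketch of that pitfall. It is deliberately simplified: the return address is a single list entry with a made-up value, whereas a real CPU pushes it as several bytes (and the 6502 actually stores the address minus one).

```python
RETURN_ADDRESS = 0x8123        # hypothetical address to resume at after the call

stack = []

def naive_routine():
    # Tries to fetch the caller's value, but the top of the stack
    # is the return address that the call instruction just pushed.
    return stack.pop()

stack.append(0x42)             # caller pushes a value it wants the routine to see
stack.append(RETURN_ADDRESS)   # what JSR/BSR/CALL does implicitly
got = naive_routine()          # the routine pops the wrong thing
```

Here `got` ends up as the return address rather than 0x42, and the stack no longer holds an address to return to, so a real program would most likely crash when the routine returned.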
Reverse engineering
First, you must know what you hope to accomplish. Otherwise, you have no hack to make. You must have a means of finding the code you want to modify. For example, if you want to insert an MTE routine, you must find the part of the game that pulls the next byte from the script. Now that you know that, you can find it more easily.
You need a debugger. On the NES, FCE Ultra XD is your friend. Let's say you're using it and you know that the first block of text in the game is located in the ROM at 12345 (hexadecimal). Well, that's not where it's going to be loaded in memory, because the 6502 can only address 64 KB of memory and 12345 goes past the 64 KB barrier, which is at FFFF. Instead, it is likely that a bank will be switched in somewhere around 8000 through DFFF. However, your script block will still be located at some x345 in memory. So, you set breakpoints for reads from 8345, 9345, A345, B345, C345, and D345. Then let's say emulation stops at a line like this:
$EDB5:B1 20 LDA ($20),Y @ $8B00 = #$80
That bit with the "@ $8B00" means that ($20),Y points to that address. So write down this line and look at $8B00 in memory and see if the bytes match what's in your script. If so, you've found the bit of code you need.
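The two calculations in this walkthrough can be sketched in Python. Both function names are made up, and the first one takes the same shortcut as the text: it assumes the low three hex digits of the ROM offset carry over unchanged into the CPU address, and it ignores any file header. The second models how the 6502's `($20),Y` addressing mode arrives at an address: it reads a two-byte little-endian pointer from zero page and adds Y. The pointer bytes and Y values in the usage lines are invented, since the debugger line only shows the final address.

```python
def candidate_addresses(rom_offset, first=0x8000, last=0xD000, step=0x1000):
    """List the CPU addresses where a ROM offset might show up once banked in."""
    low = rom_offset & (step - 1)            # keep the low bits, e.g. 0x345
    return [base + low for base in range(first, last + 1, step)]

def indirect_indexed(memory, zp_addr, y):
    """Effective address of LDA (zp),Y given a dict of memory bytes."""
    lo = memory[zp_addr]                     # low byte of the pointer
    hi = memory[(zp_addr + 1) & 0xFF]        # high byte, still within zero page
    return (((hi << 8) | lo) + y) & 0xFFFF   # add Y, wrap at 64 KB

breakpoints = candidate_addresses(0x12345)
target = indirect_indexed({0x20: 0x00, 0x21: 0x8B}, 0x20, 0)
```

With these inputs, `breakpoints` holds 8345 through D345 in steps of 1000 (hex), and `target` is 8B00, matching the address shown after the "@" in the debugger line.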