Book 1

Chapter 3 - Basic Text Editing, Part 1: ASCII & Basic Control Codes


The following programs are used in this chapter: WinHex 32, Windows Character Map
The following files are used in this chapter: theend.txt, faxanadu.nes (NES ROM)

All "standard" text on a PC, such as something you might view in Notepad or in your web browser and even the text in this article, is stored in a format called American Standard Code for Information Interchange. Put simply, the ASCII standard tells most programs which of the 256 possible byte values (0x00 through 0xFF) is set aside for the display of the letter "A", which is set aside for the letter "B", and so on. Every character in this article, including letters, numbers, spaces, and punctuation, is stored as a single byte in the .html file that contains its text. Confused yet? Well, here's a text file. Go ahead, take a look. In fact, open it in a new window. Make sure that new window isn't maximized, and position it so you can continue reading this article fairly well. You'll also want to save it to your hard drive somewhere, ideally where you keep your copy of WinHex32 (find it in the Tools archive if you don't have it). Now launch the aforementioned hex editor. Choose "Open File for Editing..." from the File menu, and then open up theend.txt.

As you scroll the cursor around the file in WinHex32, you'll see that each character is represented by a separate, unique hexadecimal code. For instance, "C" is 0x43, and "u" is 0x75. If you move the cursor over to the "R" in Ross Smith's name and type a capital "S" (making sure that your cursor is active in the text pane, of course -- if it's not, press Tab to move it from the hex pane to the text pane), you'll notice that the 0x52 will become 0x53, as "S" is the next letter after "R" -- in other words, "S" is "R+1." Similarly, if you move the cursor to the hex pane and enter "41" as your new value (please note that you should NOT type the 0x prefix in an editor unless it tells you to!), whatever letter was displaying will change to a capital "A."
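If you'd like to check those byte values without a hex editor, here is a minimal Python sketch of the same idea. The helper name to_hex is my own invention for illustration; ord, chr, and str.encode are standard Python built-ins.

```python
def to_hex(text):
    """Return the ASCII bytes of text as spaced hex pairs, like a hex pane."""
    return " ".join(f"{b:02X}" for b in text.encode("ascii"))

# "C" is 0x43 and "u" is 0x75, as seen in the hex pane.
print(to_hex("Cu"))        # 43 75

# "S" is the letter after "R", so its byte value is one higher.
print(chr(ord("R") + 1))   # S

# Entering 41 in the hex pane produces a capital "A".
print(chr(0x41))           # A
```

Note that the 0x prefix shows up here only because Python requires it for hexadecimal literals; the hex editor itself just wants the two digits.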

The "Character Map" program packaged with Windows will confirm this as well. If you open it up (by default, via the start menu, it's in Program Files/Accessories/System Tools) and hover your cursor over any of the letters in it, it will show you the hexadecimal equivalent (in the case shown to the left, it's displaying the Unicode equivalent; ignore everything but the "61"). You can confirm this by cross-referencing theend.txt in WinHex32 against Windows' built-in character map program. Note that the extended ASCII set -- in other words, anything from 0x80 up; the last printable standard character is the tilde (~) at 0x7E -- is not "standard" ASCII, and the characters there may not display in all fonts.

Go ahead and play with WinHex if you want. In fact, here's a little exercise you can do: Scroll down to 0x1B6 in the file. You should see "RIMMER: Lister." there, with the cursor blinking on the first R in Rimmer. Now, make sure the cursor is in the text pane and type "ASSMAN", then save the file. Open it up in a text editor and behold the changes!
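The same overwrite can be sketched in a few lines of Python. This is only an illustration of what typing over text in the hex editor does, not a replacement for it; the function name overwrite_text and the sample bytes are hypothetical stand-ins rather than the real contents of theend.txt.

```python
def overwrite_text(data: bytes, offset: int, new_text: str) -> bytes:
    """Return a copy of data with new_text written over the bytes at offset.

    The result is always the same length as the input, which mirrors
    typing over characters in a hex editor's text pane.
    """
    patch = new_text.encode("ascii")
    if offset < 0 or offset + len(patch) > len(data):
        raise ValueError("patch would run past the end of the data")
    buf = bytearray(data)
    buf[offset:offset + len(patch)] = patch  # same-length slice: nothing shifts
    return bytes(buf)

# Stand-in data; in the real file the text sits at offset 0x1B6.
sample = b"RIMMER: Lister."
print(overwrite_text(sample, 0, "ASSMAN"))  # b'ASSMAN: Lister.'
```

To try it on the real file, you would read theend.txt in binary mode ("rb"), call the function with offset 0x1B6, and write the result back in binary mode ("wb"). Overwriting rather than inserting matters: nothing after the edit moves, so every other offset in the file stays valid.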

The same general principles work with ROM images. Note that while most old console games do not use ASCII for their text storage, a small handful do. Faxanadu is one of these. Take this conversation at the beginning of the game:

Before you do anything, make a backup of your Faxanadu ROM. This is extremely important! Backups are your friend. If something goes wrong, you can always, always, ALWAYS restore from a backup, but only if you actually made one! Now load up your Faxanadu ROM in WinHex32, and navigate to 0x3480B (ideally via the Go To option, which seems to be migratory in WinHex and is either in the Edit or Search menu). Alternatively, you can type one of the words into WinHex's Text Search prompt and look for the text that way, but do note that Faxanadu uses non-ASCII space characters, so a multi-word string such as "I've been" will not turn up any results.
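To see why a multi-word search fails, and how you might work around it, here is a rough Python sketch. It assumes the game's space byte is 0xFD (the value discussed later in this chapter) and that the remaining characters are plain ASCII. The function names and the sample bytes are made up for illustration and are not real ROM data.

```python
FAX_SPACE = 0xFD  # space byte this chapter reports for Faxanadu

def encode_for_search(text: str, space_byte: int = FAX_SPACE) -> bytes:
    """Encode text as ASCII, swapping ordinary spaces (0x20) for space_byte."""
    return text.encode("ascii").replace(b" ", bytes([space_byte]))

def find_text(rom: bytes, text: str) -> int:
    """Return the offset of the first match, or -1 if there is none."""
    return rom.find(encode_for_search(text))

# Stand-in bytes, not taken from the actual ROM.
fake_rom = b"\x00\x00I've\xfdbeen\xfdon\xfda\xfdlong\xfdjourney.\x00"

print(find_text(fake_rom, "I've been"))  # 2
print(fake_rom.find(b"I've been"))       # -1 (plain ASCII space never matches)
```

With a real ROM you would read the file in binary mode and pass its bytes in place of fake_rom. A single word with no spaces works with either approach, which is why searching for one word at a time succeeds in WinHex.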

There are a few things going on here which I have not yet covered, but will shortly. For now, we're simply going to focus on basic text replacement, so move the cursor to the text pane and to the word "journey" (0x3481F). Type the word "sojourn" on top of it (both words are seven letters long, so nothing else gets overwritten), save the file, and load it up in an emulator to see your results. If you did it properly, it should look like the screen to the right. Good job! If the text did not change, the only thing I can think of that could be wrong is that your Faxanadu ROM is read-only. Open up its standard Windows Properties window and ensure that it isn't read-only, then try again.

Now that you've got the basics of text editing down, it's time to get into some of the weirder stuff that Faxanadu is doing. You will notice, as I mentioned earlier, that Faxanadu's spaces are represented by the byte 0xFD. The technical reason for this is probably to save space in the font and instead simply place a null byte into the tilemap that serves as the text display mechanism -- more on this in the next chapter and future chapters.

You will notice, of course, that there are more things going on here than simply weird spaces. The game has to know when to insert a line break, because the window is only so large; otherwise you'll get something weird, like the picture to the left. The game also has to know when to pause text output and wait for a keypress, as the window can only hold so many lines of text (a "section break"), and it has to know when to terminate the flow of text so it doesn't simply display stuff indefinitely. All of these -- and, technically, the space character -- are referred to as "control codes." I'll get into how to get the codes to display in your hex editor in the next chapter.

Go ahead and play with replacing the linebreaks (0xFE) and section breaks (0xFC). You'll see them placed in the text between words and at the ends of sentences, in the same places they are in the game. Note that the section break actually does two things -- first, it pauses the flow of text and waits for the user to press a button, and second, it inserts a linebreak before it starts displaying text again. In other games, you might not be so fortunate -- you could have a byte that simply halts the display of text and is followed by a line break, or you could have one that halts the display of text and is followed by a control code that clears the display window before showing the next segment of text.
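If it helps to picture what the game does with these bytes, here is a small Python sketch that turns a run of Faxanadu-style text into something readable. It uses only the three byte values named in this chapter (0xFD, 0xFE, 0xFC); the render function, the "[wait for button]" marker, and the sample bytes are my own illustrative choices, and any byte the sketch doesn't recognize is shown as a hex tag rather than guessed at.

```python
SPACE = 0xFD          # space
LINE_BREAK = 0xFE     # move to the next line
SECTION_BREAK = 0xFC  # wait for a button press, then start a new line

def render(data: bytes) -> str:
    """Convert Faxanadu-style text bytes into a readable string."""
    out = []
    for b in data:
        if b == SPACE:
            out.append(" ")
        elif b == LINE_BREAK:
            out.append("\n")
        elif b == SECTION_BREAK:
            out.append("\n[wait for button]\n")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))           # printable ASCII passes through
        else:
            out.append(f"<{b:02X}>")     # unknown byte: show it, don't guess
    return "".join(out)

# Stand-in bytes, not copied from the ROM.
sample = b"Hello\xfdthere.\xfeHow\xfdare\xfdyou?\xfcFine."
print(render(sample))
```

Swapping a 0xFE for a 0xFC in the sample (or the reverse) and rendering again is a quick way to preview the kind of change you'd be making in the hex editor.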

That brings us to the final important point of the article -- control codes can do a lot more than you might initially think. In some cases, they may control a portrait of a character that gets displayed to graphically portray the person who's speaking. In other cases, they may display full words -- a type of simple compression that can drastically reduce the amount of space a game's script takes up in the ROM. Control codes can also slow down the output of text, change the music that's playing, alter the color of the font, or display kanji. They can do virtually anything. They're not always just one byte, either -- it's very possible that they may take an extra byte as a "parameter" that specifies which color to change the text to, or which kanji to display, or how fast to make the text go. Some even control low-level assembly code that's embedded within the game's text. Part of being a translation hacker is being able to figure out exactly what a control code is, how many parameter bytes (if any) follow it, and what it does. The last is not always necessary, but it can be nice to know if something goes wrong.
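To make the idea of parameter bytes concrete, here is a rough Python sketch of a script tokenizer. Everything in the CONTROL_CODES table is hypothetical: the byte values, names, and parameter counts are invented for illustration and do not describe Faxanadu or any other real game. The point is only to show why you need to know how many bytes follow each code.

```python
# Hypothetical table: control byte -> (name, number of parameter bytes).
# These values are made up for the example; every game defines its own.
CONTROL_CODES = {
    0xF0: ("color", 1),
    0xF1: ("speed", 1),
    0xF2: ("portrait", 1),
    0xFE: ("line_break", 0),
}

def tokenize(data: bytes, codes=CONTROL_CODES):
    """Split script bytes into (name, params) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in codes:
            name, n_params = codes[b]
            params = data[i + 1:i + 1 + n_params]
            if len(params) < n_params:
                raise ValueError(f"control code {b:#04x} at offset {i} is cut off")
            tokens.append((name, tuple(params)))
            i += 1 + n_params            # skip the code and its parameters
        else:
            tokens.append(("char", (b,)))
            i += 1
    return tokens

sample = b"\xf0\x02Hi\xfe"
print(tokenize(sample))
# [('color', (2,)), ('char', (72,)), ('char', (105,)), ('line_break', ())]
```

Notice that a parameter byte is consumed along with its control code. If the parameter count in the table were wrong, a parameter such as 0x41 would be mistaken for the letter "A" (or a real letter would be swallowed as a parameter), which is exactly the kind of garbage you see when a control code has been misidentified.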