The Do-It-Yourself Corner, Chapter 2: Are We Having Fun Yet?
Topic: The Do-It-Yourself Corner
So few people seem to be reading this...it's got me a little bit worried. ^_^;
DON'T PANIC!: The translation project is still proceeding. I haven't forgotten any of you. The final (to my knowledge) script block has been dumped and I'm formatting it as we speak. This particular one is full of all the little messages that you expect to see in RPGs when you pick up items or perform tasks.
I also discovered that my VWF has one last kink to work out: the VWF will be more or less perfected after I hone this final element. I think.
So without further ado, Chapter 2: Making the Table. Or whatever.
Like I said before: everything in a ROM is based around numbers. Those numbers mean different things depending on context: it's like having a language that only has 256 words, but their meaning and required grammar are completely different depending on whether our little 1337-speaker is, say, sitting in the bathroom, driving his car or ordering a cappuccino.
Graphics, thanks to hardware conveniences, are usually stored in a format that someone somewhere has fully documented and built into a tile editor. Text is another story: there are as many possible ways to store text in a game as there are books in Borges' Library of Babel. Sometimes you get lucky, and a game uses an established encoding format: when you sit down at Notepad and type up a document, you are in most cases using the standard ASCII encoding system (interestingly, Sylvanian Families uses part of the ASCII set). Japanese also has many standards, such as EUC, JIS and its variants, and the variants and subsets of the Unicode system.
If you've ever cracked an alphanumeric 3-18-25-16-20-15-7-18-1-13, you're familiar with the concept behind a table. There are, however, three differences:
1. The encoded text will very likely not be signposted in any helpful way. You'll have to hunt for it in a sea of non-text gibberish data.
2. The cipher you're trying to crack isn't written in English: it's written in Japanese. The classic "Etaoin Shrdlu" will not save you.
3. You won't just be working out letters/symbols. Some numbers do not represent text, but are nonetheless significant. Line breaks, page breaks, text effects...all of these are represented with their own numbers, and are collectively referred to as "control codes." Ignore them only at your peril. (Sylvanian Families uses so many of these that its script can be considered a compiled programming language.)
Right. I won't insult your intelligence any longer with vague references to "numbers." If you know anything about computers, you know they don't count "one, two, three". These numbers I've made so much of come down to rows upon rows of tiny on/off switches. Natively, a computer "thinks" in binary, or "base 2": it uses the numbers 0 and 1 and a place value system to represent every number from 0 to whatever. (An old joke says that there are 10 kinds of people in the world: those who understand binary and those who don't.)
Of course, most of your work will probably never delve directly into the magical world of binary. Binary digits, or "bits", are grouped in sets of eight, called "bytes", each of which can represent every value from 0 to 255. This arrangement paves the way for a stopgap between the base 2 world of the computer's brain and the base 10 world of ours: the hexadecimal (base 16) number system. It uses the digits 0-9 and the letters A-F to represent the values 0 through 15, and place value to accomplish the rest. The "tens" column represents "sixteens": 10 hex is 16, 20h is 32, 30h is 48, all the way up to FF, which is (15 * 16) + 15, or 255. So FF, the highest two-digit hexadecimal number, is also conveniently the largest value a byte can hold. (Also conveniently, 100h comes out to a clean 256.)
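If you'd rather check that arithmetic than take my word for it, here's a quick sketch in Python. Python is purely my pick for illustration (none of the tools in this chapter need it), and the helper names hex_to_int and int_to_hex are made up for the example.

```python
# Convert between hexadecimal text and ordinary base-10 numbers.
# The function names here are invented for this example.

def hex_to_int(text):
    """Read a string of hex digits as a number."""
    return int(text, 16)

def int_to_hex(value):
    """Write a number as at least two uppercase hex digits."""
    return format(value, "02X")

for h in ("10", "20", "30", "FF", "100"):
    print(h + "h =", hex_to_int(h))

# FF worked out by place value: fifteen sixteens plus fifteen ones.
assert 15 * 16 + 15 == hex_to_int("FF") == 255

# The same byte written out as eight binary digits.
print(format(0xFF, "08b"))
```

Running it the other way, int_to_hex(10) gives "0A", which is the two-digit form you'll see in a hex editor.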
This is the "hex" in "hex editor": you will be looking at the ROM as byte data, represented in the form of hexadecimal numbers. Exciting, isn't it?
Now, down to business. Open up JWPce (or whatever) and load your ROM in Tile Molester (again, or whatever). Search the ROM for the font. Tile Molester should choose the appropriate format on its own, but on occasion you'll need to tweak it. SNES/SFC games store most of their graphics in a 4bpp format (you don't need to understand what that means right now), but the font is usually stored in a 2bpp format (the exact same one used by Game Boy, in fact). It might be stored in an unusual format, so if you don't find it using the default, try a different one.
If you can't find the font and you've tried all the different formats, then choose a different project. Its time will come. You are but a learner now, but when next you meet, you'll be the master...and you'll have a neat helmet, a red lightsaber and James Earl Jones doing your voice. Or something like that.
Games often have multiple fonts: if you find one that's not the main font, note its location anyhow. Sylvanian Families actually had me stymied for a while because even though I found the actual font the game printed, it was not in the same order as the game's table: another font elsewhere, however, was. Once we've found the font, which we will assume is in the same order as the table, we go on to the next step, which actually involves playing the game. ^_^
Even if you can't actually understand a word of it, watch for any patterns of characters in the game's text that sit close together in the font. We'll need to know such chinks in the game's armor for our final step, which involves our hex editor's special power tool for table making: the Relative Search function. Relative Search takes a pattern of characters and searches for it in the ROM by relative position rather than exact value: if you searched for "king", it would find every run of four bytes whose values are spaced the same way as the letters in the word ("i" two places before "k", "n" five places after "i", "g" seven places before "n"). That works whenever the game keeps its letters in alphabetical order, whatever number it starts counting from. For example, if all our Relative Search matches for "king" turn up the same four byte values, then it's safe to assume we've found the values for "k", "i", "n" and "g", and we can extrapolate the rest with the aid of the font.
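For the curious, here's roughly what a relative search boils down to, sketched in Python. This is my own minimal version, not the code from any particular hex editor, and the "ROM" bytes are invented for the demonstration: I'm pretending the game stores "a" as 40h and counts up from there.

```python
def relative_search(data, pattern):
    """Return every offset where the bytes rise and fall by the same
    amounts as the characters in pattern do."""
    codes = [ord(ch) for ch in pattern]
    # Gaps between neighbouring characters: "king" gives [-2, 5, -7].
    deltas = [b - a for a, b in zip(codes, codes[1:])]
    hits = []
    for start in range(len(data) - len(pattern) + 1):
        window = data[start:start + len(pattern)]
        if all(window[i + 1] - window[i] == d for i, d in enumerate(deltas)):
            hits.append(start)
    return hits

# Made-up data: if "a" were 40h, "king" would be 4A 48 4D 46.
fake_rom = bytes([0x00, 0x12, 0x4A, 0x48, 0x4D, 0x46, 0xFF])
for offset in relative_search(fake_rom, "king"):
    found = fake_rom[offset:offset + 4]
    print(f"match at {offset:X}h:", " ".join(f"{b:02X}" for b in found))
```

Notice that the search never needs to know what value "k" actually has. It only compares gaps, which is why it can find text in an encoding you haven't worked out yet.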
The same thing applies here, only we'll be doing it with Japanese characters. We can't search for "king", but we can search for patterns. Let's assume our font has all the "plain" characters in order and leaves special characters such as dakuten and small kana outside the main run. For the sake of argument, let's say this is a "Ghost in the Shell: Stand Alone Complex" game and we've decided to use the name of those lovable, inescapable Tachikoma as our key. In Japanese, Tachikoma is written in katakana as "TA-CHI-KO-MA", which are close enough in the font for us to Relative Search. So let's take a look at the syllabary, and assign an English letter to each character:
A-ko
B-sa
C-shi
D-su
E-se
F-so
G-ta
H-chi
I-tsu
J-te
K-to
L-na
M-ni
N-nu
O-ne
P-no
Q-ha
R-hi
S-fu
T-he
U-ho
V-ma
(Trust me, that "A-ko" isn't in there on purpose. I've never even watched it. >_>)
So the pattern we're looking for is going to be "GHAV". If we get results from the Relative Search that are all the same, we've found "TA", "CHI", "KO" and "MA" and can begin work on extrapolating the rest of the table. If not, we'll have to refine our search a bit, maybe choose a different pattern to look for...
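Here's a small Python sketch of both halves of that step: building the letter pattern from the kana, and extrapolating the run of kana once a match turns up. The function names are mine, and the matched bytes (36 37 30 45) are invented for the example; your game will hand you different numbers. It also leans on the same assumption as above, that the font keeps these kana in one unbroken run.

```python
# The run of kana from the list above, in font order, written as romaji.
KANA = ["ko", "sa", "shi", "su", "se", "so", "ta", "chi", "tsu", "te", "to",
        "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho", "ma"]

def letter_pattern(syllables):
    """Turn kana into the A, B, C... pattern to feed a relative search."""
    return "".join(chr(ord("A") + KANA.index(s)) for s in syllables)

def extrapolate(matched_bytes, syllables):
    """From the bytes a search matched, guess a value for every kana in
    the run. Only valid if the font really is in this order."""
    base = matched_bytes[0] - KANA.index(syllables[0])
    return {base + i: kana for i, kana in enumerate(KANA)}

name = ["ta", "chi", "ko", "ma"]
print(letter_pattern(name))

# Hypothetical search result, purely for illustration.
table = extrapolate(bytes([0x36, 0x37, 0x30, 0x45]), name)
for value, kana in table.items():
    print(f"{value:02X}={kana}")
```

If the extrapolated values don't line up with what the game actually prints, the font probably isn't in the order we assumed, and it's back to hunting for another font or another pattern.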
But once we're sure we've found the correct pattern, we can start work on the table!
Table files are usually saved with the extension ".tbl", but they can really be any plain text format. WindHex seems to support Shift-JIS exclusively; Atlas and romjuice are fine with UTF-8. So we'll probably have to save it in two formats; no biggie for JWPce.
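If you'd rather not re-save by hand, a few lines of Python can write the same table out in both encodings. This is only a sketch: the function name, the file names and the sample entries are all invented, and it writes into a throwaway temporary folder so it can't clobber anything.

```python
import os
import tempfile

def save_table_both_ways(entries, folder, stem="game"):
    """Write the same table twice: once as Shift-JIS, once as UTF-8."""
    text = "".join(line + "\n" for line in entries)
    paths = {}
    for label, codec in (("sjis", "shift_jis"), ("utf8", "utf-8")):
        path = os.path.join(folder, f"{stem}_{label}.tbl")
        with open(path, "w", encoding=codec) as handle:
            handle.write(text)
        paths[label] = path
    return paths

# Hypothetical entries, just to have something to write.
sample = ["30=コ", "36=タ", "37=チ", "45=マ"]
written = save_table_both_ways(sample, tempfile.mkdtemp())
for label, path in written.items():
    print(label, os.path.getsize(path), "bytes")
```

The two files hold the same characters but different bytes, which is exactly why a tool expecting one encoding will show garbage when fed the other.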
Let's first start with a blank table. Assuming there are 256 characters or fewer in the font, go into JWPce, switch to regular ASCII mode, then start with this:
00=
01=
02=
03=
04=
05=
06=
07=
08=
09=
0A=
0B=
0C=
0D=
0E=
0F=
10=
11=
...
And work your way through to...
...
EE=
EF=
F0=
F1=
F2=
F3=
F4=
F5=
F6=
F7=
F8=
F9=
FA=
FB=
FC=
FD=
FE=
FF=
And make sure there's a blank line at the end. Copy-and-paste helps a lot; just make sure you don't have any duplicates.
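If typing out 256 lines sounds like a recipe for exactly those duplicates, you can also let a script do it. A minimal Python sketch (the function name is my own, and I'm reading "blank line at the end" as "the file ends with a line break after FF="):

```python
def blank_table():
    """Return the text of an empty table: 00= through FF=, one per line."""
    lines = [f"{value:02X}=" for value in range(256)]
    # Ending on a newline leaves a blank last line in the editor.
    return "\n".join(lines) + "\n"

text = blank_table()
print(text[:20])               # the first few entries
print(len(text.splitlines()))  # how many entries we made
```

Paste the output into JWPce (or write it to a file) and carry on from there.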
Switch back to Japanese mode and put down the symbol each byte value represents after its corresponding equals sign. Once you're done with that, work out the control codes: in the simplest case, you'll just have line break and page break codes to puzzle through. If every line ends in "FE" and a page of text ends in "FF", mark those values with "<line>" and "<end>" or whatever suits your fancy. If you're not so lucky, there will be more, but you can work them out by playing the game and taking note of what happens when text with those control codes appears on screen. Mark any unknown non-text values with the number in pointy brackets, like "<1F>", "<CE>", "<D5>", "<42>", etc. There's a good reason for this, but it'll wait till another time.
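To see what a finished table buys you, here's a rough Python sketch of how a table can turn raw bytes back into readable text. It is not how WindHex, Atlas or romjuice actually work inside; the function names, the sample table and the sample bytes are all invented, and it only handles the simple one-byte-per-symbol case.

```python
def parse_table(text):
    """Read 'XX=symbol' lines into a dict of byte value -> symbol."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, symbol = line.split("=", 1)
        if symbol:    # entries left empty are treated as unknown
            table[int(key, 16)] = symbol
    return table

def decode(data, table):
    """Swap each byte for its symbol; unknown bytes become <XX>."""
    return "".join(table.get(byte, f"<{byte:02X}>") for byte in data)

# A tiny hypothetical table and some made-up script bytes.
sample_table = "30=コ\n36=タ\n37=チ\n45=マ\nFE=<line>\nFF=<end>\n"
table = parse_table(sample_table)
script = bytes([0x36, 0x37, 0x30, 0x45, 0x1F, 0xFE, 0xFF])
print(decode(script, table))
```

The byte 1F isn't in the sample table, so it comes out as "<1F>" rather than vanishing, which keeps every byte of the original accounted for.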
There are probably people out there much better suited to writing a readme for romhacking beginners, and those same people are probably watching my silly, inefficient style and laughing. It's entirely possible I'm doing more harm than good by writing these, but I thought I'd fill in the space between major project updates by giving back to the online community. These are the techniques I've used, for the most part, anyhow, and they've served me in good stead.
Until next time, ladies, gentlemen, Demi-Fiends and you fuzzy things sitting on a shelf.
Posted by Ryusui
at 10:03 PM PDT