File Storage Method

This page discusses the best way to store a file for a dedicated Sega Genesis compiler.

In General
Since an assembly file usually contains many repeated strings (such as opcodes and variable names), some form of LZ77 compression would suit it well. Each length/distance pair could use 4 bits for the length and 12 bits for the distance, so that a pair fits in 2 bytes.
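As a rough illustration of the 4-bit/12-bit split, here is a minimal Python sketch that packs one length/distance pair into 2 bytes and unpacks it again. The bit layout (length in the high 4 bits, distance in the low 12 bits, high byte first) and the names pack_pair and unpack_pair are assumptions made for this example; the text above does not fix them.

```python
def pack_pair(length, distance):
    """Pack a 4-bit length and a 12-bit distance into 2 bytes.

    Assumed layout (not specified by this page): length in the high
    4 bits, distance in the low 12 bits, high byte stored first.
    """
    if not 0 <= length <= 0xF:
        raise ValueError("length must fit in 4 bits")
    if not 0 <= distance <= 0xFFF:
        raise ValueError("distance must fit in 12 bits")
    word = (length << 12) | distance
    return bytes([word >> 8, word & 0xFF])


def unpack_pair(pair):
    """Inverse of pack_pair: return (length, distance)."""
    word = (pair[0] << 8) | pair[1]
    return word >> 12, word & 0xFFF
```

With this layout a pair can reach back at most 4095 bytes and copy at most 15 bytes, unless the stored values are biased (for example, storing length minus a minimum match length), which is a common LZ77 refinement but not something this page specifies.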

Another variation: in addition to the 2-byte length/distance pairs, there could be a pair that uses 3 bytes. The extra byte would extend the length field, giving the length a total of 12 bits (4 + 8).

Header
Files for a dedicated Sega Genesis compiler could use the following header:
 * Byte 1:  Encryption Format
 * Byte 2:  Encryption Parameter
 * Bytes 3-4:  Checksum of file
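The header layout above could be read along these lines. This is only a sketch: the Header type and parse_header function are hypothetical names, and the checksum is assumed to be stored high byte first (the 68000's native byte order), which this page does not state.

```python
from typing import NamedTuple


class Header(NamedTuple):
    encryption_format: int     # byte 1
    encryption_parameter: int  # byte 2
    checksum: int              # bytes 3-4


def parse_header(data):
    """Split the first 4 bytes of a file into the header fields."""
    if len(data) < 4:
        raise ValueError("header needs at least 4 bytes")
    # Assumption: the 16-bit checksum is big-endian (high byte first).
    checksum = (data[2] << 8) | data[3]
    return Header(data[0], data[1], checksum)
```

For example, parse_header(bytes([0x01, 0x02, 0x12, 0x34])) yields format $01, parameter $02, and checksum $1234 under the big-endian assumption.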

File
Here is the file structure:
 * Plain text is represented as plain ASCII (bytes $20-$7F).
 * $00:  End of file.
 * $01:  Whitespace before and after an opcode's name.
 * $0A or $0D: New line.
 * $80-$FF:  Other entities besides text, such as an input/output box, graphics, etc.
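The byte ranges listed above can be turned into a small classifier. The following Python sketch is illustrative only; the category names and the functions classify and scan are made up for this example, and bytes that the list does not cover are reported as "unassigned".

```python
def classify(byte):
    """Return the category of one file byte, following the list above."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte == 0x00:
        return "end_of_file"
    if byte == 0x01:
        return "opcode_whitespace"
    if byte in (0x0A, 0x0D):
        return "new_line"
    if 0x20 <= byte <= 0x7F:
        return "text"
    if byte >= 0x80:
        return "entity"
    return "unassigned"  # e.g. $02-$09: no meaning given on this page


def scan(data):
    """Classify bytes in order, stopping after the end-of-file marker."""
    kinds = []
    for byte in data:
        kind = classify(byte)
        kinds.append(kind)
        if kind == "end_of_file":
            break
    return kinds
```

A real reader would also need to know how many bytes each $80-$FF entity occupies, which is not defined here, so scan treats every byte as a single item.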

Checksum
Each file has a 16-bit checksum. The checksum is calculated using this procedure:
 * 1) Checksum starts at 0.
 * 2) Circular shift the checksum right by one bit.
 * 3) Add the next byte from the file to the checksum, starting with the first byte.
 * 4) If there are bytes left, go back to step 2.
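The procedure above can be sketched in Python as follows. Two details are assumptions, since the steps do not spell them out: the addition wraps around at 16 bits, and the caller decides which bytes are passed in (for instance, whether the header is included). The function name checksum16 is hypothetical.

```python
def checksum16(data):
    """16-bit checksum: rotate right by one bit, then add each byte."""
    checksum = 0  # step 1
    for byte in data:
        # Step 2: rotate right by one bit within 16 bits; the bit that
        # falls off the bottom re-enters at the top.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        # Step 3: add the byte. Assumption: overflow wraps at 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    return checksum  # step 4 is the loop itself
```

As a small worked case, the two bytes $01 $01 give $0001 after the first byte; rotating right turns that into $8000, and adding the second byte gives $8001.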

Encryption
To look at encryption types, go here.

For examples, go here.

Encryption can be used not only to prevent file hacking; given the right conditions, it can also be used to make a file that is more compressible with RLE.

Since one byte is devoted to the type and one byte to the parameter, there are in theory up to 256 × 256 = 65,536 ways that a file can be encrypted. Therefore, when you are not editing one of these files, you can run it through all the different encryption types and see which one produces the smallest RLE-compressed file. It's like running an antivirus program: do it on your own time. When saving a file, only the first few encryption types are checked, to save time. Parameter checking stops once a certain amount of time has elapsed, but you can choose to check more encryption types if you want to.

Since the goal is to produce the smallest RLE-compressed file, the encryption parameter cannot remain constant throughout the encryption procedure: with a constant parameter, equal bytes would map to equal bytes, so runs of identical bytes would keep the same lengths. Instead, the parameter changes according to a certain formula based on the byte values.
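To make the idea of searching for the smallest RLE result concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the actual encryption types are defined elsewhere, so the two transforms below (identity and a delta-style transform whose key becomes the previous plaintext byte) are hypothetical stand-ins, and rle_size assumes a simple RLE of (count, value) pairs with runs capped at 255.

```python
def rle_size(data):
    """Size in bytes under an assumed (count, value) RLE, runs <= 255."""
    size = 0
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        size += 2  # one count byte plus one value byte per run
        i += run
    return size


def identity(data, parameter):
    """Hypothetical type 0: leaves the data unchanged."""
    return bytes(data)


def delta(data, parameter):
    """Hypothetical type 1: subtract a key that follows the byte values."""
    out = bytearray()
    key = parameter
    for byte in data:
        out.append((byte - key) & 0xFF)
        key = byte  # the parameter changes based on the byte just seen
    return bytes(out)


TRANSFORMS = {0: identity, 1: delta}  # stand-ins, not the real type list


def best_encryption(data, parameters=range(256)):
    """Try every (type, parameter) and return (size, type, parameter)."""
    best = None
    for fmt, transform in TRANSFORMS.items():
        for parameter in parameters:
            size = rle_size(transform(data, parameter))
            if best is None or size < best[0]:
                best = (size, fmt, parameter)
    return best
```

For the input bytes $01 through $08, the identity transform leaves eight separate runs (16 bytes of RLE), while the delta stand-in with parameter 0 turns the data into eight $01 bytes, a single run of 2 bytes. A save-time version could simply pass a shorter parameters range or stop after a time budget.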