Fun with code golf

| tags: asm code eso golf

Introduction

Lately, I've been wasting spending time doing code golf at the (very informal) Anarchy Golf server. I don't claim to be any sort of expert on the topic, but here are some recent entries I've made, explained.

For those not in the know, the goal in code golf is to write the shortest possible program in language X that accomplishes a task Y. The golf server has a set of test inputs, and your program needs to produce output matching the corresponding reference outputs in order to be accepted.

The descriptions assume some familiarity with the programming languages in question. Forth, C and the Z80 instruction set are well-known; Befunge(-93), Befunge-98 and Burlesque perhaps less so.

849. A006520 - Forth, C, Befunge-98, Burlesque

Problem 849 – A006520. The task was to generate first 500 terms of sequence A006520 from The On-Line Encyclopedia of Integer Sequences®.

Forth

anagolf-849-a006520.fs

1 502 2 [do] dup 1 .r cr [i] 0 [i] - and + [loop]

There actually aren't that many tricks to explain here.

The brackets are needed, as do, loop and i are compile-only words, but the program is executed in interpret state. Wrapping the code inside a compiled word would fix that, but the colon definition costs exactly as many bytes as it would save (when 4 bracket-pairs are needed):

: f 1 502 2 do dup 1 .r cr i 0 i - and + loop ; f

The sequence is the cumulative sum of A006519, which, in turn, is the sequence of (for term n) highest powers of 2 dividing n. In terms of bits, that's equivalent to counting the number of trailing 0s in the binary representation of n, or more exactly the value 100...00 with the same amount of zeros. We can find that very easily with the relatively well-known expression (n & (−n)), where & is the bitwise AND. In Forth terms (for the number on top of the stack) that would be 0 over - and, but as we need a looping construct anyway, we can use the short i word to get the current index of a counted do loop, and do i 0 i - and instead. The straightforward dup negate and is longer, thanks to the length of negate.

The golf server is picky about trailing whitespace (except on the last line), so the standard Forth . word (which prints a decimal number, followed by a space) can't be used. 1 .r is the shortest workaround I've found.

C

anagolf-849-a006520.c

b;main(a){for(;a<501;)printf("%d\n",b+=a&-a++);}

The C solution I have isn't exactly novel (several other people ended with effectively equivalent solutions), and there's not too much trickery involved. The obsolete implicit int behavior is used throughout, and the variables are also initialized implicitly: b to 0 (as it has static storage duration due to being in file scope) and a to 1 (since it corresponds to the argc argument of main, and the program is called with no extra arguments).

Befunge-98

anagolf-849-a006520.b98

"LOOBNRTS"4($$4($$1+:'%0p:0\-A+:SDa,'X:'S6*1+`k@

Befunge-98 shares the same problem with Forth that the standard integer output command . appends a space. (So does Befunge-93 in theory, actually, but the golf server's implementation seems to have been tweaked to remove that.) One workaround is to use S (integer to string) and D (print string) from the STRN fingerprint, but that implies quite a lot of overhead to load the fingerprint: "NRTS"4($$.

In this code, we save one pair of "" by combining the STRN and BOOL (bitwise AND) fingerprint names in a single string, though the net benefit is just 1 byte as the $$$$ (to drop outputs of () cannot be combined to a 3k$.

There isn't much else to explain about the program. To get the next term of A006520, we do :0\-A+, the Befunge equivalent to the Forth 0 over - and (or more exactly dup 0 swap - and + since Befunge doesn't have an over command). The :'%0p...'X is used to get around the lack of stack manipulation commands in Befunge; it forwards the current counter value by writing a copy of it over the letter X, so that the :0\-A+ can destroy it and add to the sum accumulator. 'S6*1+ is simply a short way to write 499 as 83*6+1.

(In retrospect, '2a*\`j@ would've saved a byte.)

Burlesque

anagolf-849-a006520.blsq

500{{J0j.-&&}GO++}GO#S
500{roJ0j?-&&++}GO#S     (too slow to run on the server)
500{{256g_}GO++}GO#S     (post-deadline, shorter)

Burlesque is quite a golf-friendly language, due to the large number of built-in commands that are just 1 or 2 characters long. I ended up with 500{{J0j.-&&}GO++}GO#S, which translates approximately to: 500{...}GO#S — generate a list of f(1) ... f(500) and put it on stack. {...}GO++ — for each element n, take the sum of g(1) ... g(n). J0j.-&& — for each of those elements i, use i & (−i).

Another golf server user, teebee, had insight to use the gcd built-in, which lets you replace J0j-&& with 256g_, saving two bytes.

851. What have I got in my POCKET - Befunge, Befunge-98, Z80

Problem 851 – What have I got in my POCKET. The task here was to map e.g. PhOCanKdsEesT to handses — which probably just about every solution implemented by just removing all uppercase letters, completely ignoring the POCKET aspect.

Befunge

anagolf-851-pocket.bef

~::1+%" "/>2-#,_

There are two key concepts in this solution.

One is a very efficient EOF test, which is possibly a well-known trick, though I discovered it independently. You may have noticed that there is no @ end execution command in the program at all. Read input until EOF is instead done by ~:1+%. An equivalent expression in C would be (c = getchar(), c % (c+1)). For any regular character (from 0 to 255), c % (c+1) == c. For EOF (−1), however, (-1) % 0 causes an exception; the standard Befunge rule for division by zero is to ask the user what the result should be (!), but this seems to apply only to /, not %. The golf server ignores all output to standard error, so crash-to-exit (when shorter than exiting cleanly) is a common strategy.

The other key idea is to test for uppercase letters by " "/2-. Dividing by 32 (" ") neatly partitions the 0..255 range to 8 classes: 0..31, 32..63, 64..95, 96..127 and so on. In the ASCII character set, all uppercase letters (and a couple of inconsequential special characters) fall in the 64..95 block, so " "/2- is zero iff the top of stack was uppercase. Then we simply print the character if the above expression was nonzero.

The actual printing is done by >2-#,_. The print command is first jumped over (#,); then the if (_) branches left if the character needs to be printed. Going left executes the sequence ,22- which prints the character and leaves a 0 on top of the stack, so that re-executing the _ will now proceed right.

Befunge-98

anagolf-851-pocket.b98

#@~:' /2-!j,

This is very close to the Befunge-93 solution, except with some space-saving made possible by commands Befunge-98 adds. The read-until-EOF reduces to a (well-defined) #@~ as the Befunge-98 character input command ~ reflects when at EOF. ' is a single-character quote, replacing the " ". Finally, !j, does the conditional printing in a simpler way using the computed jump command j.

Z80

anagolf-851-pocket.z80

next:
    call $8003
    jr nc, continue
    halt
continue:
    push de
    rlca
    cp $40
    ret pe
    rrca

The Z80 execution environment at the golf server is not terribly well documented, but in short, everything (memory and registers) is initialized to zero, the user code loaded at $0000, and if the PC becomes $8000 (resp. $8003), the emulator executes a putchar (resp. getchar) operation. putchar accepts input in the A register, and getchar returns in A with carry flag clear, or sets carry (and does not touch A) if at EOF. Execution terminates when a halt instruction is executed.

As a consequence of the zero initialization, putchar can be called by falling off the end of the program (after pushing a return address), or by rst 38h if the code is small enough. $00 is the Z80 nop instruction, so the emulator will simply execute (approximately) 32K nops between the end of the program and address $8000. This does take some time, so for longer outputs this strategy may run against the execution time limits.

Conceptually, this solution is very similar to the Befunge one. The upper half does the input until EOF task, and has no tricks whatsoever. For the lower half:

push de
Pushes $0000 (initial value of DE) to act as a return address; otherwise the top of the stack would be at the start of the program.
rlca
Essentially multiplies the input character by 2 (carry is always clear from the getchar), which never causes any overflow problems as all the input data is ASCII. This means we want to print all other values except $80..$bf, as the ASCII uppercase range is approximately $40..$5f.
cp $40
Compares (as in, subtracts and sets flags, but does not write result) with $40 (64). Given that our target values are $80..$bf, or in signed-integer terms from -128 to -65, the signed subtraction will overflow iff the input character was uppercase.
ret pe
Returns to start of program if the cp $40 caused a signed overflow.
rrca
Converts A back to the original character. Then we fall through to putchar, and return to the $0000 pushed earlier.

853. Count the Overlap - Forth

Problem 853 – Count the Overlap. Given three rectangles (x, y, width, height) A, B and C, compute the areas of overlap between all pairs AB, AC, BC and the triplet ABC.

Forth

anagolf-853-overlap.fs

: o { a b c d } a c max a b + c d + min over - 0 max ;
: s o 2>r o 2r> ;
: t s nip * nip ;
: g
stdin slurp-fid bounds do
i c@ bl = if
i 1+ 2 s>number? * + ?dup or
depth 12 = if
{ a b c d e f g h i j k l }
e g i k f h j l s b d 2rot a c t
e g i k f h j l t
a c i k b d j l t
a c e g b d f h t
." ab " . ." ac " . ." bc " . ." abc " 1 .r cr
then then loop ;

This I publish more for its entertainment value than for any real insights. It's very straight-forward; you can also see the original anagolf-853-overlap.fs for a more well-formatted version with reasonable local variable names.

There are a few minor space-saving tricks. * + is used as a shorter alternative to 2drop, as at least one of the two uninteresting results of s>number? is always 0 in this case. Somewhat similarly, ?dup or is used as a shorter ?dup drop to do drop if zero; if the original value was nonzero (and dupped by ?dup), we or it with itself, leaving one copy; and if it was zero, we or the zero with whatever was on the stack, dropping it.

An earlier version used the following for parsing user input, which was even sillier:

: ax ; : ay ; : aw ; : ah ;
: bx ; : by ; : bw ; : bh ;
: cx ; : cy ; : cw ; : ch ;
\ ...
evaluate
\ ...

...and more!

I also have some potentially interesting solutions for problem 856 (Mail merge) in Z80 and Befunge-98, but that's still under deadline, so those will have to wait until part II.