Tcl's [binary scan] and the Go Challenge 1

Published . Updated .

In the programming language Tcl the command [binary scan] decodes binary data according to a format string. For example,

binary scan $data {cu4 f1 a*} fourBytes aFloat remainder

will read four unsigned 8-bit integers from the string data and put them into the variable fourBytes, then read one 32-bit floating point number and put it in aFloat, then, finally, put the remaining string in remainder (without altering data itself).

I immediately thought of this command when reading the Go Challenge 1, a programming challenge that involves reverse engineering a binary file format and writing a program that extracts data from it. Indeed, the command proved very useful in solving the challenge.

In the following GitHub gist is my solution to the challenge. It shows how to use [binary scan] with commentary explaining how Tcl works along the way. The solution passes go test. With it I hope to illustrate why I think [binary scan] and its counterpart [binary format] are a great way to manipulate binary data from a simple hashbang script.

The challenge is fun, so if you haven't solved it yourself I encourage you to go (no pun intended) and try it before you read my solution lest it spoils some of the fun for you.

Code

GitHub gist.

decoder.tcl

#!/usr/bin/env tclsh
package require fileutil

namespace eval ::decoder {
    # Allow us to use +, -, *, /, etc. as commands outside of the [expr] DSL.
    namespace path ::tcl::mathop
}

proc ::decoder::decode-file {filename} {
    # Read the entire file into memory as binary. [::fileutil::cat] runs the
    # command ::fileutil::cat and substitutes its return value in its place.
    # $filename substitutes the value of that variable.
    set data [::fileutil::cat -encoding binary $filename]

    # Decode the header with [binary scan]. The curly braces around the format
    # string quote it. They are analogous to single quotes in the POSIX shell.
    # The components of the format string used to decode the file header have
    # the following meaning:
    #
    # a6  - a string of six characters
    # W   - one 64-bit big-endian integer
    # A32 - a string of 32 characters with trailing blanks and nulls discarded
    # r1  - one little-endian 32-bit float
    # a*  - the remainder of the string unchanged
    #
    # The first four values are assigned to elements in the array header. The
    # remainder of the string is assigned to the variable data replacing its
    # previous value.
    binary scan $data {a6 W A32 r1 a*} \
            header(magic) header(length) header(version) header(tempo) data

    # {} is an empty sting as well as an empty list.
    set tracks {}
    set charsRemaining [- $header(length) 36]
    while {$charsRemaining > 0} {
        # Assign each item of the list returned by [decode-track] to a variable.
        lassign [decode-track $data] newTrack charsDecoded data
        # Append $newTrack to the list tracks.
        lappend tracks $newTrack
        # Decrease charsRemaining by $charsDecoded.
        incr charsRemaining -$charsDecoded
    }

    # A list with the format of {key1 value1 key2 value2 ...} is also a valid
    # Tcl dictionary. We will use this fact later in [pretty-print]. [array get]
    # converts an array to a dictionary.
    return [list header [array get header] tracks $tracks]

    # Aside: Arrays are an older feature of Tcl. They allow a convenient syntax
    # shortcut of "name(key)" to access their elements, which we have used
    # above. However, unlike most things in the language they are *not*
    # immutable values. This makes it necessary to convert our headers array to
    # a dictionary before we return it.
}

proc ::decoder::decode-track {data} {
    set charsDecoded [string length $data]

    # Read the length of the track name. "cu" means an unsigned 8-bit integer.
    binary scan $data {cu1 cu3 cu1 a*} track(number) _ track(nameLength) data

    # Below we substitute $track(nameLength) in the format string to decode a
    # variable-length field. You don't need double quotes to perform
    # substitution in Tcl.
    binary scan $data a${track(nameLength)}cu16a* track(name) track(score) data

    incr charsDecoded -[string length $data]
    return [list [array get track] $charsDecoded $data]
}

proc ::decoder::pretty-print {data} {
    # The command [dict with dictionary code] assigns the values of the keys in
    # $dictionary to variables with the same names in the current scope
    # (procedure), runs the code then put the changes back into the dictionary.
    # E.g.,
    #
    # set test {foo bar}; dict with test {puts $foo; set foo baz}
    #
    # outputs "bar" then changes the value of test to {foo baz}. Since we don't
    # want to change any of the values we will leave the code part empty.
    dict with data { # no code here } ;# $header is now a dictionary.
    dict with header {}
    puts "Saved with HW Version: $version"
    puts "Tempo: [format-float $tempo]"
    foreach track $tracks {
        dict with track {}
        # Alter score with a map...
        set score [string map {0 - 1 x} [join $score ""]]
        # ...and then with a regular expression.
        regsub {(....)(....)(....)(....)} $score {|\1|\2|\3|\4|} score
        puts "($number) $name\t$score"
    }
}

proc ::decoder::format-float {f} {
    # The first argument to the [if] command is an expression. To make doing
    # math more convenient expressions use the [expr] DSL. It is infix (meaning
    # that you write "$a + $b * $c" rather than "[+ $a [* $b $c]]") and has
    # functions that take arguments in parentheses.
    if {round($f) == $f} {
        return [format %.0f $f]
    } else {
        return [format %.1f $f]
    }
}

set decoded [::decoder::decode-file [lindex $argv 0]]
::decoder::pretty-print $decoded

decoder_test.go.diff

 decoder_test.go | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/decoder_test.go b/decoder_test.go
index a3f0e19..92763a2 100644
--- a/decoder_test.go
+++ b/decoder_test.go
@@ -2,6 +2,7 @@ package drum
 
 import (
 	"fmt"
+	"os/exec"
 	"path"
 	"testing"
 )
@@ -61,7 +62,9 @@ Tempo: 999
 	}
 
 	for _, exp := range tData {
-		decoded, err := DecodeFile(path.Join("fixtures", exp.path))
+		output, err := exec.Command("tclsh", "decoder.tcl",
+			path.Join("fixtures", exp.path)).Output()
+		decoded := string(output)
 		if err != nil {
 			t.Fatalf("something went wrong decoding %s - %v", exp.path, err)
 		}
-- 
2.1.0

A "thanks" goes out to Poor Yorick and @rich123 for suggestions that helped improve this code and its comments.


Tags: Tcl, Go, binary data, programming, old blog