Text/sub-text encoding

Imagine a redundant code in which multiple code-words map to the same meaning. The redundancy might be there to make the code robust on a noisy channel, or it might not. With such a code we could send further information through the choice of precisely which code-word to use for each meaning. Thus the text would have a sub-text.
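As a minimal sketch of the idea, assume a toy code book in which every meaning has exactly two interchangeable code-words, so that each choice carries one bit of sub-text. The code book, the names `CODEBOOK`, `encode` and `decode`, and the one-bit-per-meaning rate are all made up for illustration.

```python
# Hypothetical toy code book: each meaning has two interchangeable code-words.
CODEBOOK = {
    "yes": ["00", "01"],
    "no": ["10", "11"],
}

# Reverse lookup: code-word -> (meaning, index of the variant that was chosen).
DECODE = {
    word: (meaning, index)
    for meaning, variants in CODEBOOK.items()
    for index, word in enumerate(variants)
}


def encode(text, subtext_bits):
    """Encode each meaning, letting one sub-text bit pick the code-word variant."""
    if len(text) != len(subtext_bits):
        raise ValueError("this sketch carries exactly one sub-text bit per meaning")
    return [CODEBOOK[meaning][bit] for meaning, bit in zip(text, subtext_bits)]


def decode(words):
    """Recover both the text (meanings) and the sub-text (variant choices)."""
    pairs = [DECODE[word] for word in words]
    text = [meaning for meaning, _ in pairs]
    subtext_bits = [index for _, index in pairs]
    return text, subtext_bits


message = encode(["yes", "no", "yes"], [1, 0, 1])
print(message)          # ['01', '10', '01']
print(decode(message))  # (['yes', 'no', 'yes'], [1, 0, 1])
```

A receiver that only cares about the text can ignore the variant index. A receiver that knows the protocol gets the extra bits for free.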

The sub-text might encode a different type of data from the text. For example, a video stream might encode large features as text and fine detail as sub-text.

A noisy channel might only allow the text to be recovered, whereas a less noisy channel allows both text and sub-text to be recovered. Carrying a sub-text reduces the robustness of the text encoding, but it does not remove that robustness entirely, even if the sub-text is transmitted at the maximum possible rate.
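The trade-off can be sketched with a made-up five-bit code. I assume one text bit per code-word, protected by majority vote, and one unprotected sub-text bit hidden in the last position. This particular code and the function names are illustrative assumptions, not a recommended design.

```python
def encode_bit(text_bit, subtext_bit):
    """Five-bit code-word: four copies of the text bit, then text bit XOR sub-text bit."""
    return [text_bit] * 4 + [text_bit ^ subtext_bit]


def decode_word(word):
    """Majority vote recovers the text bit; the last position then yields the sub-text bit."""
    text_bit = 1 if sum(word) >= 3 else 0
    subtext_bit = word[-1] ^ text_bit
    return text_bit, subtext_bit


def flip(word, position):
    """Simulate channel noise by flipping a single bit."""
    noisy = list(word)
    noisy[position] ^= 1
    return noisy


sent = encode_bit(0, 1)                      # [0, 0, 0, 0, 1]
print(decode_word(sent))                     # clean channel: (0, 1)
print(decode_word(flip(sent, 4)))            # one flip: (0, 0), text survives, sub-text lost
print(decode_word(flip(flip(sent, 0), 1)))   # two flips: (1, 0), text lost as well
```

A plain five-fold repetition code with no sub-text would have survived the two flips in the last line, so the text has become less robust. It still survives any single flip, so the robustness has not vanished.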

One example of a redundant code is to encode a model for each datum. Having a model allows the datum to be encoded more concisely, but there are multiple possible models, which introduces redundancy. Such codes are easier to decode than non-redundant codes, since only the sender need perform model estimation. MML (Minimum Message Length) is an example of this.

If an MML code is transmitted with a sub-text, the information from the choice of model can be recovered and perhaps shouldn't be counted in the message length. This kind of consideration already occurs in Snob, although hackishly: part of the text is converted into sub-text.
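One way to make this accounting concrete is sketched below, under assumptions of my own: three candidate models with made-up priors and likelihoods, a sender who picks the model according to the posterior, and a sub-text capacity measured as the entropy of that choice. Under those assumptions, subtracting the recoverable sub-text bits from the expected two-part length leaves exactly the length of coding the datum with the models marginalised out.

```python
import math

# Hypothetical numbers: prior and likelihood of one datum under three candidate models.
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.25]

# Marginal probability of the datum, and the posterior over models.
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

# Two-part length in bits when a model is stated and the datum is coded with it.
two_part = [-math.log2(p) - math.log2(l) for p, l in zip(prior, likelihood)]

# Assumption: the sender chooses the model according to the posterior, so the
# choice can carry up to the posterior entropy in sub-text bits.
expected_length = sum(q * t for q, t in zip(posterior, two_part))
subtext_bits = -sum(q * math.log2(q) for q in posterior)

# Length left over once the recoverable sub-text is not counted.
net_length = expected_length - subtext_bits

print(net_length)
print(-math.log2(evidence))  # the same value, up to floating-point rounding
```

The equality is algebraic rather than a coincidence of the numbers chosen, so any valid prior and likelihood values give the same agreement.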

I propose that:

Such a text/sub-text protocol is the best way of thinking about MML. Assume any model/data pair is part of an on-going communication. This allows the lattice constant terms to be dropped from the equations, making everything rather simpler. (Note: Chris Wallace has published a paper along these lines, so this isn't a new idea on my part. There's been a curious lack of follow-through on the idea though.)

Text/sub-text encoding is also a good description of human language usage. Note also that in noisy environments, or in environments that require high accuracy (e.g. aeroplane radio, military), sub-text tends to get dropped to increase the robustness of the text.