Let A and B be bit strings (sequences of 0’s and 1’s).
If A is equal to the leftmost length(A) bits of B, then A is said to be a prefix of B.
For example, 00110 is a prefix of 001101001, but not of 00111 or 100110.
A prefix-free code of size n is a collection of n distinct bit strings such that no string is a prefix of any other. For example, this is a prefix-free code of size 6:
0000, 0001, 001, 01, 10, 11
Now suppose that it costs one penny to transmit a ‘0’ bit, but four pence to transmit a ‘1’.
Then the total cost of the prefix-free code shown above is 35 pence, which happens to be the cheapest possible for the skewed pricing scheme in question.
In short, we write Cost(6) = 35.
What is Cost(109) ?
This problem is an application of the Huffman coding scheme, a strategy for choosing bit sequences optimally in fact. For creating a prefix-free code, Huffman algorithm is one of the most efficient approaches.
The optimal strategy to minimize the costs involves choosing the cheaper bit (0 in this case) more frequently.
Since Huffman algorithm tells us to use the more frequent symbols with less bits, we start from a binary tree: each sibling (a node’s direct path to another node) represents a bit, where left is 0 and right is 1. The level of the node indicates how many bits are in the prefix code associated with that node. So nodes at the first level (just below root) have 1 bit codes, at second level 2 bits codes, and so on.
Then we start allocating these codes to our input size, first assigning the cheaper bits. So the distribution would look something like this:
– 1st level: 1 node with 0 bit (1 penny)
– 2nd level: 2 nodes with 0 bits (2 pennies)
– 3rd level: 4 nodes with 0 bits (4 pennies)
– 4th level: 8 nodes with 0 bits (8 pennies)
We repeat this until we’ve got less nodes than a full level can accommodate. For example, let’s consider an example of allocating for 15 codes:
– 1st level: 1 node with 0 bit (1 penny)
– 2nd level: 2 nodes with 0 bits (2 pennies)
– 3rd level: 4 nodes with 0 bits (4 pennies)
– 4th level: 7 nodes with 0 bits (7 pennies), 1 nodes with 1 bit (4 pennies)
Cost(15) = 1 + 2 + 4 + 7 + 4 = 18.
In the general case, let’s consider an example of allocating for 109 codes:
As 2 to the power of 6 is 64, and 2 to the power of 7 is 128, we know that we can fill full levels up to level 6, which accommodates 64 nodes. The remaining 45 codes are allocated at the 7th level. Since 45 is more than half of 2^6 (64), we can see the 0 bits are all taken, so 64 codes will consist of 0’s (costing one penny each) and the remaining will consist of 1’s (costing four pennies each).
Here is a calculation:
– 1st level: Cost(1) = 1 penny (0 bit)
– 2nd level: Cost(2) = 2 pennies
– 3rd level: Cost(4) = 4 pennies
– 4th level: Cost(8) = 8 pennies
– 5th level: Cost(16) = 16 pennies
– 6th level: Cost(32) = 32 pennies
– 7th level: Cost(45) = 45 pennies for 0 bits + 4 * (109 – 1 – 2 – 4 – 8 – 16 – 32 – 45) pennies for 1 bits.
Now adding all them up:
Cost(109) = 1 + 2 + 4 + 8 + 16 + 32 + 45 + 4 * (109 – 108) = 108 + 4 = 112 pennies.
Hence, the cost for a prefix-free code of size 109 with the given pricing scheme is 112 pennies.
More Answers:
The Primality of $2n^2 – 1$Balanced Numbers
Perfect Right-angled Triangles