+ # number of bytes in the set must be a power of 2.
+ #
+ # Consider a different example, the set 0x53, 0x54, 0x73, and 0x74. That
+ # requires 4 tests using either ranges or individual values, and even
+ # though the number in the set is a power of 2, it doesn't qualify for the
+ # mask optimization described above because the number of bits that are
+ # different is too large for that. However, the set can be expressed as
+ # two branches with masks thusly:
+ # (c & 0xDF) == 0x53 || (c & 0xDF) == 0x54
+ # a branch savings of 50%. This is done by splitting the set into two
+ # subsets each of which has 2 elements, and within each set the values
+ # differ by 1 byte.
+ #
+ # This function attempts to find some way to save some branches using the
+ # mask technique. If not, it returns an empty list; if so, it
+ # returns a list consisting of
+ # [ [compare1, mask1], [compare2, mask2], ...
+ # [compare_n, undef], [compare_m, undef], ...
+ # ]
+ # The <mask> is undef in the above for those bytes that must be tested
+ # for individually.
+ #
+ # This function does not attempt to find the optimal set. To do so would
+ # probably require testing all possible combinations, and keeping track of
+ # the current best one.
+ #
+ # There are probably much better algorithms, but this is the one I (khw)
+ # came up with. We start with doing a bit-wise compare of every byte in
+ # the set with every other byte. The results are sorted into arrays of
+ # all those that differ by the same bit positions. These are stored in a
+ # hash with the each key being the bits they differ in. Here is the hash
+ # for the 0x53, 0x54, 0x73, 0x74 set:
+ # {
+ # 4 => {
+ # "0,1,2,5" => [
+ # 83,
+ # 116,
+ # 84,
+ # 115
+ # ]
+ # },
+ # 3 => {
+ # "0,1,2" => [
+ # 83,
+ # 84,
+ # 115,
+ # 116
+ # ]
+ # }
+ # 1 => {
+ # 5 => [
+ # 83,
+ # 115,
+ # 84,
+ # 116
+ # ]
+ # },
+ # }
+ #
+ # The set consisting of values which differ in the 4 bit positions 0, 1,
+ # 2, and 5 from some other value in the set consists of all 4 values.
+ # Likewise all 4 values differ from some other value in the 3 bit
+ # positions 0, 1, and 2; and all 4 values differ from some other value in
+ # the single bit position 5. The keys at the uppermost level in the above
+ # hash, 1, 3, and 4, give the number of bit positions that each sub-key
+ # below it has. For example, the 4 key could have as its value an array
+ # consisting of "0,1,2,5", "0,1,2,6", and "3,4,6,7", if the inputs were
+ # such. The best optimization will group the most values into a single
+ # mask. The most values will be the ones that differ in the most
+ # positions, the ones with the largest value for the topmost key. These
+ # keys, are thus just for convenience of sorting by that number, and do
+ # not have any bearing on the core of the algorithm.
+ #
+ # We start with an element from largest number of differing bits. The
+ # largest in this case is 4 bits, and there is only one situation in this
+ # set which has 4 differing bits, "0,1,2,5". We look for any subset of
+ # this set which has 16 values that differ in these 4 bits. There aren't
+ # any, because there are only 4 values in the entire set. We then look at
+ # the next possible thing, which is 3 bits differing in positions "0,1,2".
+ # We look for a subset that has 8 values that differ in these 3 bits.
+ # Again there are none. So we go to look for the next possible thing,
+ # which is a subset of 2**1 values that differ only in bit position 5. 83
+ # and 115 do, so we calculate a mask and base for those and remove them
+ # from every set. Since there is only the one set remaining, we remove
+ # them from just this one. We then look to see if there is another set of
+ # 2 values that differ in bit position 5. 84 and 116 do, so we calculate
+ # a mask and base for those and remove them from every set (again only
+ # this set remains in this example). The set is now empty, and there are
+ # no more sets to look at, so we are done.
+
+ if ($list_count == 256) { # All 256 is trivially masked