LeetCode 137 is to find the number which apprears exactly once in an array assuming other numbers always appear exactly three times. It's very easy to come up with a linear runtime algorithm by using hash table which takes O(n) space. A better approach is to modulo count the number of occurrences of 1 at every bit and compose the final result based on this information. But to implement it efficiently, some bitwise tricks are used which remind me some good old college time when I was learning Digital Circuit and CPU Architeture. So I think I should take these notes down here. Also, this problem could be generalize to find the number which apprears exactly p times in an array while others always appear k times, assuming (p < k). And for LeetCode 137, there is also a two-line solution based on the special parameters (k, p) = (3, 1). I will talk about it at the end.
First, we can use hash table to count the appearance of every number and then check which number's count is exactly one. This can give a linear runtime algorithm having linear space complexity. Some people argues that this is gonna be practically slow because hashing is an expensive operation. But for this special problem, we can use a simple modulo function to compute the hash number. But one thing to note is that although we can choose a prime large enough to give extremely low collision rate, this algorithm is not strictly guaranteed to be always correct.
Another idea is that we can count the occurence of 1 at every bit of the
numbers. The difference is that whenever the count reaches k we reset it to 0
and carry on. At the end, since all numbers except one appear k times, the final
value of the count is actually p or 0. We can easily construct the number itself
based on this information. The only thing here is how to implement these
counters efficiently. A trivial implementation could use an array and calculate
every bit of every number and count it accordingly. Definitely, the time
complexity will be
O(nlog(MAX_INT))and the space complexity
O(log(MAX_INT)). But by using some bitwise tricks, we can change these two
metrics to have a
log(k) scale factor. When k is less than MAX_INT, we can
have an algorithm with better asymptotic performance.
For simplicity, we will assume the numbers are 32 bits long. So we need 32 counters in total. The key observation here is that if k is of m bits, we are having a (32, m) matrix of 1-bit value. The above algorithm we mentioned are updating the counters from a row based point of view. But we can also update it in a column based way.
m...321 count n1 n2 n3 ... nk ... 1 ....... <- . . . . 2 ....... <- . . . . 3 ....... <- . . . . . . . . . . . . . . . . . . 32 ....... <- . . . .
If we can found a way to update
counter[i][m] bitwisely with
1-bit number n. Then we can pack the updates of all 32 counters at an
integer. We can see this technique as a bitwise level parallel computation. But
how can we update
counter[i][1..m] bitwisely? For simplicity, we use xm...x1
to represent the counter. There are two things we need to work out
The reset part is relative easy. If you have experience with circuit design, it is easily to see that we can check if k reaches by compute a value out of xm..x1. The fundamental rule of circuit design or digital logic is that you can work out a mapping between any bit pattern once you have written down the truth table. Assume k = 0b101, we are expecting following truth table for reset
x 321 -> x 321 000 000 001 001 010 010 011 011 100 100 k 101 000
We can see only when
x3x2x1 == 101,
x3 & (~x2) & (x1) == 1. Then the reset
could be expressed as
mask = ~(x3 & (~x2) & x1) x3 = x3 & mask x2 = x2 & mask x1 = x1 & mask
To work out the add part, we start with x1. Because for x1, this is no carry. We are expecting following truth table
x 0 n -> x 0 0 0 0 1 0 1 0 1 1 1 1 0
The mapping could be written out directly as
x0 = ((x0) & (~n)) | ((~x0) & (n))
= x0 ^ n. Perhaps, this is why XOR sign is chosen to be a circle plus, just to
remind you of its ability to act as a circle around in-place plus without carry
for circuit design.
To calculate x2..xm, we need to take carry into consideration.
x i carry -> x i 0 0 0 1 0 1 0 1 1 1 1 0
Actually, it's the same table for x0. So we can write out
xi = xi ^ carry_i.
But how to compute carry? Just think when we will have a carry for xi. Only when xi-1, ..., x1, and n are all 1. This leads to
carry_i = xi-1 & .. & x1 & n
Then we can write out the code to update these counters. Note, we generalize the compuatation of x0 by using n as carry0
x = [0 for _ in xrange(m)] k = bin(k)[2:][::-1] for n in nums: carry = n for i in xrange(m): x[i], carry = x[i] ^ carry, x[i] & carry mask = -1 for i in xrange(m): mi = x[i] if k[i] == '1' else ~x[i] mask = mask & mi mask = ~mask for i in xrange(m): x[i] = x[i] & mask the_number = 0 for i in xrange(m): the_number = the_number | x[i] return the_number
At the end of above code, we add a loop to compute the single number. Since its final occurence count of '1' at any bit is either p or 0, the bit value at ith bit of the single number can be computed the single number by
the_number = xm | ... | x1
Besides the general solution, there could also be a specialized solution for specific k. If we know any specific k for sure, we can do better. As in LeetCode 137, k = 3. The truth table to update count is
x 10 n -> x 10 00 0 00 01 0 01 . 10 0 10 00 1 01 . 01 1 10 10 1 00
Here comes the true beauty of digital logic. We can write out the mapping directly as
x0 = ((~x1) & x0 & (~n)) | ((~x1) & (~x0) & n) = (~x1) & ((x0 & (~n)) | ((~x0) & n)) = (~x1) & (x0 ^ n)
You can save the original value of x0, and then use it to calculate x1 in the smae way above. But we can also use the computed value of x0 as a new input to calculate x1. Namely, we substitute the left x0 column with the right x0 column in above truth table. Then we have
x 10 n -> x 10 00 0 00 01 0 01 10 0 . 10 01 1 01 00 1 . 10 10 1 00
Remeber, the true beauty. We have
x1 = (x1 & (~x0) & (~n)) | ((~x1) & (~x0) & n) = (~x0) & ((x1 & (~n)) | (~x1) & n) = (~x0) & (x1 ^ n)
Then we have following lines of magic to solve LeetCode 137. Note p = 1. It means x1 will always be 0 at the end.
x = [0, 0] for n in nums: x = (~x) & (x ^ n) x = (~x) & (x ^ n) return x
At the end of this post, take a while to savor the true beauty of digital logic. It could be singular as well as single.