The stupidity of C's strtoul
I wanted to parse long strings of ASCII-represented hex into the numeric values they represent.
That is, "0123456789ABCDEF" into a numerical 0x0123456789ABCDEF.
Of course, the problem with this is that the number might be larger than can be parsed into a single integer: however wide the type is (8, 16, 32, 64 bits, or more), the string may be longer. So I decided to use repeated calls to unsigned long strtoul(const char *str, char **end, int base). The idea was simple: end would be set to point to the end of whatever was parsed; the subsequent call to strtoul would pick up from there and parse the next n hex characters (16 on a modern system, usually) or run to the end of the string.
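The intended loop looked roughly like the sketch below (a minimal sketch with illustrative names; the rest of this post explains why it doesn't actually work):

    #include <stdlib.h>

    /* Hypothetical sketch of the intended approach: walk the string with
     * repeated strtoul calls, hoping `end` advances past only what was
     * actually consumed into the returned value. */
    void parse_hex(const char *str)
    {
        const char *p = str;
        char *end;
        while (*p != '\0') {
            unsigned long chunk = strtoul(p, &end, 16);
            /* ...store `chunk` into some big-number representation... */
            (void)chunk;
            if (end == p)
                break;   /* nothing parsed; stop */
            p = end;     /* hoped-for behavior: resume at the first unparsed digit */
        }
    }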
But here's why I think strtoul does something really stupid: end gets set to the character just past the last valid digit of the number, however long that run of digits is; not past the last digit strtoul was actually able to fit into an unsigned long. On overflow it simply returns ULONG_MAX and sets errno to ERANGE. In other words, for strings longer than the standard 16 (or however many) hex characters, end does not get set to anything useful.
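To make that concrete, something like the following (assuming a typical 64-bit unsigned long) shows the problem:

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *str = "0123456789ABCDEF0123456789ABCDEF"; /* 32 hex digits */
        char *end;

        errno = 0;
        unsigned long val = strtoul(str, &end, 16);

        /* With a 64-bit unsigned long this overflows: val == ULONG_MAX,
         * errno == ERANGE, and end points past ALL 32 digits rather than
         * after the 16 that actually fit. */
        printf("val = %lx, errno = %d, consumed = %td of %zu chars\n",
               val, errno, end - str, strlen(str));
        return 0;
    }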
Coupled with endianness issues (x86 is little-endian, but humans almost always write numbers big-endian) and the need to pad the last chunk (on the left or the right, depending on system endianness) in some cases, I decided it was easier to just consume two characters at a time, check that each was within the right bounds, and do character subtraction to get each byte value, sticking the resulting bytes into a byte array. That gives an endian-less, arbitrarily large integer, already in a perfectly serialized format for transmission over a network or through a hardware communication protocol, which was the end consumer of this input anyway.
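Roughly, that approach looks something like the sketch below (illustrative names, not my exact code; it assumes an even number of hex characters, with any required padding already applied):

    #include <stddef.h>

    /* Map one hex character to its 4-bit value, or -1 if out of bounds. */
    static int hex_nibble(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    /* Convert each pair of hex characters into one byte of `out`.
     * Returns the number of bytes written, or -1 on a bad character. */
    long hex_to_bytes(const char *str, size_t len, unsigned char *out)
    {
        long n = 0;
        for (size_t i = 0; i + 1 < len; i += 2) {
            int hi = hex_nibble(str[i]);
            int lo = hex_nibble(str[i + 1]);
            if (hi < 0 || lo < 0)
                return -1;   /* character out of bounds */
            out[n++] = (unsigned char)((hi << 4) | lo);
        }
        return n;
    }

The resulting byte array keeps the same order as the human-written string (most significant byte first), which is exactly the order you want on the wire.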
Ches Koblents
October 27, 2015