hexdump.c: hexdump library

description

hexdump.c is a C library implementation of the arcane BSD command-line utility, hexdump(1). It allows specifying simple formatting programs for display and analysis of binary blobs. For example, the format specification

"%08.8_ax  " 8/1 "%02x " "  " 8/1 "%02x "
"  |" 16/1 "%_p" "|\n"

produces the more familiar output

00000000  54 68 65 20 71 75 69 63  6b 20 62 72 6f 77 6e 20  |The quick brown |
00000010  66 6f 78 20 6a 75 6d 70  73 20 6f 76 65 72 20 74  |fox jumps over t|
00000020  68 65 20 6c 61 7a 79 20  64 6f 67                 |he lazy dog|
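
For readers unfamiliar with the format language, here is a minimal sketch in plain C of what that specification asks for on each 16-byte row. It does not use hexdump.c at all; hexrow() is a hypothetical helper written only for illustration, and it hard-codes the layout that the format string above describes declaratively.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format up to 16 bytes of src as one row: offset, two groups of eight
 * hex bytes, then the printable text between bars. Returns whatever
 * snprintf() returns. hexrow is a hypothetical name, not part of hexdump.c.
 */
int hexrow(char *dst, size_t lim, unsigned long off,
           const unsigned char *src, size_t n)
{
    char hex[16 * 3 + 2], txt[16 + 1];
    size_t i, h = 0;

    if (n > 16)
        n = 16;

    for (i = 0; i < 16; i++) {
        if (i == 8)
            hex[h++] = ' '; /* extra gap between the two groups */
        if (i < n)
            h += (size_t)snprintf(&hex[h], sizeof hex - h, "%02x ", (unsigned)src[i]);
        else
            h += (size_t)snprintf(&hex[h], sizeof hex - h, "   "); /* pad a short row */
    }

    /* Like %_p: printable characters as-is, everything else as '.' */
    for (i = 0; i < n; i++)
        txt[i] = isprint(src[i]) ? (char)src[i] : '.';
    txt[n] = '\0';

    return snprintf(dst, lim, "%08lx  %s |%s|\n", off, hex, txt);
}
```

Calling hexrow() once per 16-byte block, with off advanced by 16 each time, should reproduce the rows shown above. The point of the format language is that none of this needs to be written by hand.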

hexdump.c can be built as a simple library, a Lua module, or a command-line utility, or just dropped into your project without fuss. Unlike the traditional utility, and apart from a few bugs (see below), hexdump.c allows specifying the byte order of the converted words. It also allows processing of odd-sized words (e.g. 3 bytes), and in the future will allow processing of words larger than 4 bytes.
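
To make the byte order and odd-sized word points concrete, here is a small sketch of how a word of 1 to 4 bytes can be assembled in either order. This is not the code hexdump.c uses internally; load_word() and the word_order constants are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the real library exposes its own byte order flags. */
enum word_order { WORD_BIG_ENDIAN, WORD_LITTLE_ENDIAN };

/* Assemble the first n bytes of src (at most 4) into one word. */
uint32_t load_word(const unsigned char *src, size_t n, enum word_order order)
{
    uint32_t word = 0;
    size_t i;

    for (i = 0; i < n && i < 4; i++) {
        if (order == WORD_BIG_ENDIAN)
            word = (word << 8) | src[i];          /* first byte is most significant */
        else
            word |= (uint32_t)src[i] << (8 * i);  /* first byte is least significant */
    }

    return word;
}
```

Because the word is built one byte at a time, a 3-byte word needs no special casing: the bytes 12 34 56 load as 0x123456 in big-endian order and as 0x563412 in little-endian order.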

I wrote hexdump.c because I kept rewriting fixed-format output generators over and over, in both C and various scripting languages, for simple hexadecimal conversion of checksums, analysis of I/O buffers, etc. Finally I said to myself, why not solve this once and for all? hexdump(1) is what I use on the command line, so I decided to copy its semantics.

Instead of refactoring or otherwise copying the BSD implementation, I took the opportunity to flex some creative muscle and implement hexdump by translating the formatting specification to instructions for a simple virtual machine. I mean... why not, right?
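
As a rough illustration of the idea (and only that), the sketch below compiles a toy format into bytecode and then runs it over the input. Everything here is an assumption made for the example: the three opcodes, the toy syntax in which "%x" means "one input byte as two hex digits" and any other character is a literal, and the names vm_compile() and vm_run(). The real instruction set and parser in hexdump.c are considerably richer.

```c
#include <stddef.h>

enum { OP_HALT, OP_PUTC, OP_HEX8 }; /* hypothetical opcodes */

/* Translate the toy format into bytecode; returns the number of code bytes. */
size_t vm_compile(const char *fmt, unsigned char *code, size_t lim)
{
    size_t n = 0;

    if (lim == 0)
        return 0;

    while (*fmt && n + 3 <= lim) { /* room for opcode, operand and OP_HALT */
        if (fmt[0] == '%' && fmt[1] == 'x') {
            code[n++] = OP_HEX8;
            fmt += 2;
        } else {
            code[n++] = OP_PUTC;
            code[n++] = (unsigned char)*fmt++;
        }
    }
    code[n++] = OP_HALT;

    return n;
}

/* Run the program repeatedly until the input is consumed; returns output length. */
size_t vm_run(const unsigned char *code, const unsigned char *src, size_t n,
              char *dst, size_t lim)
{
    static const char digits[] = "0123456789abcdef";
    size_t pc = 0, in = 0, out = 0, mark = 0;

    if (lim == 0)
        return 0;

    while (out + 2 < lim) { /* each instruction emits at most two characters */
        unsigned char op = code[pc++];

        if (op == OP_PUTC) {
            dst[out++] = (char)code[pc++];
        } else if (op == OP_HEX8) {
            if (in < n) {
                dst[out++] = digits[src[in] >> 4];
                dst[out++] = digits[src[in] & 0x0f];
                in++;
            }
        } else { /* OP_HALT: end of one pass over the program */
            if (in >= n || in == mark)
                break; /* input exhausted, or the pass consumed nothing */
            mark = in;
            pc = 0;
        }
    }
    dst[out] = '\0';

    return out;
}
```

Compiling "%x:" and running it over the bytes de ad be yields "de:ad:be:". The appeal of the design is that parsing happens once, and the per-byte work is a tight dispatch loop.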

hexdump.c is fairly conformant to the manual page description of hexdump(1). Known bugs include the lack of floating point support (i.e. no %E, %e, %f, %G, or %g conversions), the inability to handle %_A address conversions, and, in a multiline format string, the lack of implicit looping of a trailing formatting unit to consume the remainder of a block. Because hexdump.c doesn't generate a parse tree of the formatting string, these latter two are more difficult to support and will have to wait until I have the patience to add the necessary black magic (i.e. splicing instructions into the generated code after analyzing more context).

Note that the original BSD implementation contains a typo, printing the ASCII label "dcl" instead of "dc1" for the %_u conversion of octet 021 (0x11). This typo also manifests in the POSIX od(1) utility, which on BSD systems is implemented with hexdump. I've filed bug reports with FreeBSD, OpenBSD, NetBSD, Debian, and DragonFly BSD. Apple appears to have quietly fixed their copy of BSD hexdump in OS X. (UPDATE: OpenBSD, NetBSD, FreeBSD, and Debian have fixed this in their respective trunks.)

I'm unaware of any other independent implementations of hexdump. Some Linux distributions repackage BSD hexdump, although not od, which comes from GNU coreutils. Red Hat's "hexdump" from util-linux appears to be a simple wrapper around GNU coreutils od, which cannot handle arbitrary formats. Ironically, od is much larger than either BSD hexdump or hexdump.c, which belies POSIX's stated reason for excluding hexdump. Solaris doesn't provide anything named hexdump, just od.

todo

Finish hxd_help() to describe where in the format string parsing failed.

Improve conformance.

Add new conversions, such as a Base64 encoder and ANSI-like cursor movements for more sophisticated formatting declarations.

news

2013-09-26

Fixed 32-bit word conversion bug found with Clang Static Analyzer.

Tagged rel-20130926 (50d164dce3b0ced83a6c250a91adb723aa2d8283).

2013-04-26

Some more MinGW patches from Ross Berteig for things I overlooked.

2013-04-12

Tagged rel-20130412 (2c85bc369a6c53730da0b4f4c16a1d0f8e56850d).

2013-04-12

Add MinGW patches and documentation from Ross Berteig.

Add capability to disable padding. Instead of padding out a formatting specification when the buffer is empty, the entire unit is skipped if the unit byte count is larger than the number of bytes remaining in the block buffer. This can be done on a per-unit basis by appending "?" to the unit byte count, or globally by passing the HXD_NOPADDING flag to hxd_open().

This allows hexdump.c to mimic the -i switch of xxd(1), which generates a C array initializer. The format is " " 12/1? "0x%02x, " "\n".
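
For comparison, here is roughly what that format asks for, written out by hand in plain C without hexdump.c. array_body() is a hypothetical helper, and the exact whitespace at the end of each line is an assumption; the library's output may differ slightly.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the bytes of src as "0x%02x, " items, twelve per line, each line
 * led by a single space. Nothing is padded: a short final line simply
 * stops early. Returns the number of characters written.
 */
size_t array_body(char *dst, size_t lim, const unsigned char *src, size_t n)
{
    size_t i, out = 0;

    if (lim == 0)
        return 0;
    dst[0] = '\0';

    for (i = 0; i < n; i++) {
        const char *lead = (i % 12 == 0) ? " " : "";
        const char *tail = (i % 12 == 11 || i + 1 == n) ? "\n" : "";
        int len = snprintf(dst + out, lim - out, "%s0x%02x, %s",
                           lead, (unsigned)src[i], tail);

        if (len < 0 || (size_t)len >= lim - out) {
            dst[out] = '\0'; /* out of room: drop the partial item */
            break;
        }
        out += (size_t)len;
    }

    return out;
}
```

Wrapping the result in "unsigned char name[] = {" and "};" gives a usable initializer, since C permits the trailing comma.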

2013-02-12

Add some conversion instructions which bypass snprintf. hexdump.c can now outperform both BSD hexdump(1) and GNU od(1), depending on the format string and the number of fast conversions issued. (Solaris od(1), however, is crazy fast.)

2013-02-11

Moved predefined hexdump(1) formats -b, -c, -C, -d, -o, -x from inlined string literals within the command-line utility getopt() block to public API macros in the header. In C these are respectively available as HEXDUMP_b, HEXDUMP_c, HEXDUMP_C, HEXDUMP_d, HEXDUMP_o, and HEXDUMP_x. From Lua they are available as the "b", "c", "C", "d", "o", and "x" fields of the module table.

2013-02-11

Tagged rel-20130210 (80db760f6eea95cc5098b3d33c438061ec662ee5).

2013-02-10

Fixed width specification bug.

Added flags to specify byte order of loaded words, and default to the native endian order instead of big-endian.

Added all the pre-specified formats to the command-line utility, including -b, -c, -C, -d, -o, and -x.

Fixed Lua module to export, accept, and apply byte order flags.

2013-02-09

Fixed compilation on Linux and Solaris. Unlike every other platform, Linux defaults to a C99 environment—lacking POSIX interfaces—which is why everybody reflexively defines _GNU_SOURCE. In this case, it was merely needed for getopt(3) for the command-line utility. Solaris required a few tweaks to build with SunPro C. The default CFLAGS in the included Makefile won't work for Solaris. If using c99(1) on Solaris, similar to Linux you'll need to define _XOPEN_SOURCE appropriately to access getopt(3).
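
A minimal sketch of the feature test macro approach, assuming a strict C99 compile where getopt(3) would otherwise be hidden. The value 600 requests the SUSv3 (POSIX.1-2001) interfaces; parse_verbose() is a hypothetical helper and is not taken from the hexdump.c utility.

```c
/* Request the SUSv3 interfaces; this must precede every #include. */
#define _XOPEN_SOURCE 600

#include <unistd.h> /* getopt(3), optind */

/*
 * Count -v flags and report where the operands start.
 * Returns -1 on an unknown option.
 */
int parse_verbose(int argc, char *argv[], int *first_operand)
{
    int opt, verbose = 0;

    while ((opt = getopt(argc, argv, "v")) != -1) {
        if (opt == 'v')
            verbose++;
        else
            return -1; /* getopt() has already printed a diagnostic */
    }
    *first_operand = optind;

    return verbose;
}
```

The same effect can be had by passing -D_XOPEN_SOURCE=600 in CFLAGS instead of editing the source.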

Fixed processing of stdin from command-line utility.

Added Lua bindings. Requires GNU Make now so the proper compiler and linker flags can be applied.

Tagged rel-20130209 (28638b4bc1f3753bb56d7c5f20d7a283df9fe786).

2013-02-08

Published project.

Tagged rel-20130208 (1ac7880b426ecd702d09b666fd1c8e3de1326de1).

usage

Create a context with hxd_open(). Compile the format string with hxd_compile(). Write your data as smallish chunks using hxd_write(), interleaving consumption of the formatted data with hxd_read(). Use hxd_flush() to signal end-of-file, and read any remaining data with hxd_read() again.

The context can be reused—without recompiling the format string—by calling hxd_reset(), which will reset the internal buffers and other state. Then call hxd_write(), hxd_flush(), and hxd_read() like before.

The source file contains a simple command-line utility when built with -DHEXDUMP_MAIN. Compiling with -DHEXDUMP_LUALIB will expose and build the Lua bindings.

The header file contains inline documentation of the interfaces.

license

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

source

git clone https://25thandClement.com/~william/projects/hexdump.git

Or visit the GitHub mirror

download

hexdump-20130926.tgz (may not be the most recent release)
