The du command in Hadoop 0.20.x prints file sizes in bytes and does not have an option for “human-readable” output like the Unix du command (i.e. file sizes in units like kilobytes, megabytes, gigabytes, and terabytes).
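For example, checking the usage of a home directory on 0.20.x gives you the raw byte count next to the path (the hostname and number below are made up for illustration):

$ hadoop dfs -dus /user/alice
hdfs://namenode:9000/user/alice	73461247123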

I looked around, and this feature supposedly went in a long time ago as part of HADOOP-4861, but it never made it into the Hadoop 0.20.x and 1.x releases. I checked the code for the latest version of Hadoop, and it seems to have made it back into Hadoop 2.x. See line 129 of FsUsage.java.
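So if you are already on Hadoop 2.x, you can get human-readable sizes directly with the -h flag and skip the workaround below (the path and output here are illustrative):

$ hdfs dfs -du -s -h /user/alice
68.4 G  /user/alice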

For those of us who are still on Hadoop 0.20.x or 1.0, here is a workaround using the shell and AWK (note that $DIR is kept quoted so the glob is expanded by HDFS rather than by the local shell):

$ DIR='/*'

$ hadoop dfs -dus "$DIR" | awk '{
    used = $2;    # -dus prints the path in field 1 and the size in bytes in field 2
    du[1024 ** 4] = "TB"; du[1024 ** 3] = "GB"; du[1024 ** 2] = "MB"; du[1024] = "KB";
    for (unit = 1024 ** 4; unit >= 1024; unit /= 1024)
        if (used >= unit) { printf "%.2f %s \t %s\n", used / unit, du[unit], $1; next }
    printf "%d B \t %s\n", used, $1    # anything under 1 KB falls through as plain bytes
}'
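Run against the root of the cluster, the output looks something like this (hostname and sizes here are made up):

1.07 TB 	 hdfs://namenode:9000/user
23.50 GB 	 hdfs://namenode:9000/tmp
512.00 KB 	 hdfs://namenode:9000/jobtracker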