Jan 21 2009
Find Duplicate Files with a Shell Script
This shell script finds duplicate files in a given directory by comparing their MD5 checksums. Files are matched on content, which must be strictly identical, rather than on filename or creation date.
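To illustrate the content-based matching, here is a minimal sketch (the temporary directory and file names are invented for the demonstration): two files with identical content produce the same MD5 checksum no matter what they are called, which is exactly what the duplicate search keys on.

```shell
# Create a throwaway directory with two identical files and one unique file.
tmpdir=$(mktemp -d)
echo "same content" > "$tmpdir/report.txt"
echo "same content" > "$tmpdir/copy_of_report.bak"
echo "other content" > "$tmpdir/unique.txt"

# uniq -w 32 compares only the 32-character checksum column, so the two
# identical files are listed as a duplicate group; unique.txt is omitted.
md5sum "$tmpdir"/* | sort | uniq -w 32 -d --all-repeated=separate

rm -r "$tmpdir"
```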
This is especially useful for reclaiming space taken by large files. The find option -size can speed up the search by restricting it to only the largest files.
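As a sketch of that idea (the 10 MB threshold and /usr/bin path are arbitrary choices for the example), -size +10M keeps small files out of the pipeline so only large candidates get checksummed:

```shell
# Only checksum files larger than 10 MB; xargs -r skips md5sum entirely
# when find matches nothing, avoiding a hang on stdin.
find /usr/bin -type f -size +10M -print0 \
  | xargs -0 -r -n1 md5sum \
  | sort -k 1,32 \
  | uniq -w 32 -d --all-repeated=separate \
  | sed -e 's/^[0-9a-f]*\ *//;'
```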
admin@fileserver$ find /usr/bin -type f -print0 | xargs -0 -n1 md5sum \
    | sort -k 1,32 | uniq -w 32 -d --all-repeated=separate \
    | sed -e 's/^[0-9a-f]*\ *//;'
/usr/bin/c2ph
/usr/bin/pstruct

/usr/bin/pgrep
/usr/bin/pkill

/usr/bin/perl
/usr/bin/perl5.8.8
/usr/bin/suidperl

...
The same command can also be run against Windows file systems mounted via Samba.