Jan 21 2009
Find Duplicate Files with a Shell Script
This shell script finds duplicate files in a given directory by comparing their MD5 checksums. Files are matched on content, which must be strictly identical, rather than on filename or creation date.
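To illustrate the content-based matching, here is a minimal sketch (the temporary directory and file names are invented for the demonstration): two files with identical content produce the same MD5 checksum no matter what they are called, which is exactly what the duplicate search keys on.

```shell
# Create a throwaway directory with two identical files and one unique file.
tmpdir=$(mktemp -d)
echo "same content" > "$tmpdir/report.txt"
echo "same content" > "$tmpdir/copy_of_report.bak"
echo "other content" > "$tmpdir/unique.txt"

# uniq -w 32 compares only the 32-character checksum column, so the two
# identical files are listed as a duplicate group; unique.txt is omitted.
md5sum "$tmpdir"/* | sort | uniq -w 32 -d --all-repeated=separate

rm -r "$tmpdir"
```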
This is especially useful for reclaiming space taken by large files. The find option -size can speed up the search by restricting it to only the largest files.
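As a sketch of that idea (the 10 MB threshold and /usr/bin path are arbitrary choices for the example), -size +10M keeps small files out of the pipeline so only large candidates get checksummed:

```shell
# Only checksum files larger than 10 MB; xargs -r skips md5sum entirely
# when find matches nothing, avoiding a hang on stdin.
find /usr/bin -type f -size +10M -print0 \
  | xargs -0 -r -n1 md5sum \
  | sort -k 1,32 \
  | uniq -w 32 -d --all-repeated=separate \
  | sed -e 's/^[0-9a-f]*\ *//;'
```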
admin@fileserver$ find /usr/bin -type f -print0 | xargs -0 -n1 md5sum \
    | sort -k 1,32 | uniq -w 32 -d --all-repeated=separate \
    | sed -e 's/^[0-9a-f]*\ *//;'
/usr/bin/c2ph
/usr/bin/pstruct

/usr/bin/pgrep
/usr/bin/pkill

/usr/bin/perl
/usr/bin/perl5.8.8
/usr/bin/suidperl

...
The same command can also be run against Windows file systems mounted via Samba.