A useful tool I came across recently is freedup. If you’re like me, you have potentially many copies of the same file scattered across your disk (this is particularly true of me because I keep version-controlled backups of my most important files). Whilst multiple copies of the same file make restoring from older copies easy, they also make it easy to chew through disk space. To solve this, it’s possible to link two identical files together so they share the same data on the disk. In essence, a file name is just a pointer telling the filesystem where to find the data, so it’s easy to have two names point at the same location on the disk. That way you don’t need two copies of the same data, you just point both files at it. These types of links are often called hard links.
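As a rough sketch of what that means in practice (the file names here are just placeholders), a hard link can be made by hand with ln, and ls -i confirms that both names end up pointing at the same inode:

cp report.pdf backup/report.pdf      # an ordinary copy: two inodes, twice the disk space
ln -f report.pdf backup/report.pdf   # replace the copy with a hard link to the original
ls -li report.pdf backup/report.pdf  # both names now show the same inode number and a link count of 2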
Whilst it’s possible to find all duplicates manually and link each pair together (through the ln command), it’s tedious if you have hundreds or thousands of duplicates. That’s where freedup comes in handy. It searches the specified paths for duplicates and hard links them together for you, telling you how much space you’re saving in the process. A typical command might look like:
freedup -c -d -t -0 -n ./
where -c counts the disk space saved by each link, -d forces modification times to be identical, and -t and -0 disable the use of the external hash functions. Most importantly at this stage, -n makes freedup perform a dummy run. Piping the output into a file or a pager (like less) lets you verify it’s looking in the right places for files and linking what you expected. Remove the -n, rerun the command, and it’ll link the files identified in the dummy run. In my case this saved several gigabytes on my external disk, which is not to be sniffed at.
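Put together, the workflow looks roughly like this (the output file name is just a placeholder, and the find line is simply one way to spot-check the result afterwards):

freedup -c -d -t -0 -n ./ > dry-run.txt   # dummy run: save the proposed links for review
less dry-run.txt                          # check it’s looking in the right places
freedup -c -d -t -0 ./                    # real run: actually create the hard links
find ./ -type f -links +1                 # list regular files that now share their data with another name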