dupmerge overview
=================

Dupmerge reads a list of files from standard input (e.g., as produced by
"find . -print") and searches it securely for identical files. When it finds
two or more identical files, all but one are unlinked to reclaim the
disk space and recreated as hard links to the remaining copy.
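
The idea can be illustrated with a short Python sketch. This is not dupmerge's
own code: the function name merge_duplicates, the ".dupmerge-tmp" suffix and
the grouping by size are assumptions made for illustration. Candidates are
compared byte by byte rather than by hash, and each duplicate is replaced by
a hard link to the copy that is kept. It assumes all files live on one file
system that supports hard links.

```python
import filecmp
import os


def merge_duplicates(paths):
    """Replace byte-identical files in `paths` with hard links to one copy.

    Returns the number of files that were replaced by a hard link.
    """
    by_size = {}
    for path in paths:
        # Only regular files are candidates; symbolic links are skipped.
        if os.path.isfile(path) and not os.path.islink(path):
            by_size.setdefault(os.path.getsize(path), []).append(path)

    replaced = 0
    for group in by_size.values():
        keepers = []  # one representative per distinct content
        for path in group:
            for keeper in keepers:
                if os.path.samefile(keeper, path):
                    break  # already the same inode, nothing to do
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(keeper, path, shallow=False):
                    # Link under a temporary name first, then rename over
                    # the duplicate, so the name never goes missing.
                    tmp = path + ".dupmerge-tmp"  # hypothetical suffix
                    os.link(keeper, tmp)
                    os.replace(tmp, path)
                    replaced += 1
                    break
            else:
                keepers.append(path)
    return replaced
```

Grouping by size first avoids comparing files that cannot be identical;
the byte-wise comparison is what makes the result independent of any hash.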

Remarks: 
dupmerge should be used only for backups or archives, where duplicate
files are not needed; it should not be used without nodo mode for /home,
/tmp, /var and most other directories.
The normal mode, hard linking of identical files, causes no problems in backups
or archives and can also be used on CDs/DVDs. On file systems without hard
links, e.g. FAT (FAT12, FAT16, FAT32, VFAT ...), it can work only with soft
links (often called shortcuts).
The sparse mode never causes problems (on file systems that support sparse files).
The deletion mode can cause trouble, e.g. with ebooks or HTML documents in
which the same picture occurs more than once. Therefore the deletion mode
should only be used with files that are not associated with each other, e.g.
audio or video files. The deletion mode works on all (writable) file systems.

Normal mode: Saves approx. 20 % space.

Sparse mode: Saves approx. 0.2 % space.

Deletion mode: Deletes approx. 10 % of the files.

Many similar programs can be found on freshmeat.net or sourceforge.net by
searching for "duplicate".
I found clink, dmerge, duff, Dupseek, epac, fdf, fdfind, fdupe, fdupes,
find_duplicates, freedup, freedups, fslint, ftwin, highlnk, WeedIt, and whatpix.

Most of these programs are not secure: highlnk and FSlint use md5sum, which
computes a cryptographically weak hash, and are therefore vulnerable to MD5
collisions. Hashing makes them fast (O(n)) but not safe.
Another point is handling file names as zero-terminated strings to avoid
problems with unusual file names, which dupmerge does correctly.
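
Why zero termination matters can be shown with a minimal Python sketch. A NUL
byte cannot occur in a Unix path name, whereas spaces and newlines can, so
splitting on NUL is unambiguous. The helper names below are hypothetical, and
the sketch assumes the list was produced with "find . -print0"; it does not
show how dupmerge itself parses its input.

```python
import sys


def split_nul_names(data):
    """Split a bytes buffer of NUL-terminated file names into a list."""
    # A trailing NUL leaves an empty last element, which is dropped.
    return [name for name in data.split(b"\0") if name]


def read_names_from_stdin():
    """Read NUL-terminated file names from standard input as bytes."""
    # Raw bytes are used because file names need not be valid text.
    return split_nul_names(sys.stdin.buffer.read())
```

A name containing a newline survives intact: split_nul_names(b"a\nb\0c d\0")
yields the two names b"a\nb" and b"c d", where a line-based reader would
report three.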

If you want to delete all hard links (regular files with more than one hard
link), you only have to type
find . -type f -links +1 -exec rm -- {} \;
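
Because this command deletes files, it can be useful to list the candidates
first. The following Python sketch only reports regular files whose link count
is above one and deletes nothing; the function name multiply_linked is an
assumption made for illustration.

```python
import os
import stat


def multiply_linked(root):
    """Yield regular files under `root` that have more than one hard link."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat does not follow symbolic links, like "find -type f".
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                yield path


if __name__ == "__main__":
    for path in multiply_linked("."):
        print(path)
```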


RF, 2007-10-29
