The idea is that similar things have similar hashes.
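To make that concrete, here is a minimal sketch of one well-known scheme with this property, SimHash. The post does not name a particular algorithm, so the choice of SimHash, the whitespace tokenizer, the use of MD5 as the per-token hash, and the 64-bit width are all assumptions made for illustration.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash-style fingerprint of `text` (illustrative sketch)."""
    # One counter per bit position; each token votes +1 or -1 on every bit.
    votes = [0] * bits
    for token in text.lower().split():  # assumption: naive whitespace tokens
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    # A fingerprint bit is set when the votes for that position are positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


a = simhash("similar things have similar hashes")
b = simhash("similar things have quite similar hashes")
# Number of differing bits between the two fingerprints.
print(bin(a ^ b).count("1"))
```

Because every token contributes to every bit, changing a few tokens tends to flip only a few bits, so the number of differing bits between two fingerprints can serve as a rough dissimilarity score.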
Candidates for comparing strings: levenshtein, similar_text, gmp_hamdist (Hamming distance), and possibly libdistance?
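The names above match PHP built-ins; as a rough illustration of what two of these measures compute, here is a small Python sketch. The function names `levenshtein` and `hamming_distance` are hypothetical helpers written for this example, not the PHP functions themselves.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(x ^ y).count("1")


print(levenshtein("kitten", "sitting"))    # 3
print(hamming_distance(0b1011, 0b1001))    # 1
```

Edit distance suits comparing the strings directly, while the bitwise Hamming distance is the natural measure for comparing two fixed-width hashes.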
Comments

Some interesting ideas about topic map similarity, which may be related:
http://kill.devc.at/node/186