Content Hashing

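The idea is that similar inputs get similar hashes, so the distance between two hashes says something about how alike the inputs are. A cryptographic hash such as MD5 does the opposite: a one-character change in the input yields a completely different digest.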
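One well-known scheme with this property is simhash. Below is a minimal Python sketch, not a reference implementation. It assumes whitespace-separated tokens with equal weights, uses MD5 only as a per-token bit source, and produces a 64-bit fingerprint. The names `simhash` and `hamming` are illustrative and do not come from any library. Each token votes +1 or -1 on every bit position, and the sign of each tally becomes one bit of the fingerprint. Because of this, inputs that share most tokens tend to end up with fingerprints that differ in only a few bits.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Simhash fingerprint over lowercase, whitespace-separated tokens.

    Simplification: every token has weight 1; real systems often use
    shingles (overlapping n-grams) and frequency-based weights.
    """
    counts = [0] * bits
    for token in text.lower().split():
        # Stable per-token hash; only its low `bits` bits are used below.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")
        for i in range(bits):
            # Bit i of the token hash votes for (+1) or against (-1) bit i.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


print(hamming(simhash("the quick brown fox"), simhash("fox brown quick the")))  # 0
```

The sketch treats the text as a bag of tokens, so word order is ignored and reordered text gets the same fingerprint. On longer texts, near-duplicates tend to land a few bits apart, while unrelated texts sit around `bits / 2` apart on average.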

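For comparing strings in PHP: levenshtein() (edit distance), similar_text() (a similarity score based on matching characters), gmp_hamdist() (Hamming distance between two GMP numbers, e.g. two hash fingerprints), and maybe libdistance?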
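To show what two of these metrics measure, here is a small Python sketch. The first function is the standard dynamic-programming edit distance with unit costs for insertion, deletion and substitution, which is intended to match PHP's levenshtein() with its default costs. One assumption to note: PHP compares bytes, while this sketch compares Python characters (code points), so results can differ on multibyte UTF-8 input. The second function is the plain Hamming distance for equal-length strings. gmp_hamdist() applies the same idea to the bits of two integers. Both function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions (cost 1 each)."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when chars match
            ))
        prev = curr
    return prev[-1]


def hamming_str(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("kitten", "sitting"))    # 3
print(hamming_str("karolin", "kathrin"))   # 3
```

Levenshtein works on strings of any length but costs O(len(a) * len(b)) time. Hamming distance is linear but only defined for equal lengths, which is why it suits fixed-size hashes rather than raw text.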