Unicode vs Latin-1


Well, it's taken over a year to figure out, but I've finally got HubMed converting between Unicode and Latin-1 properly, so there should be no more errors when searching for names with special characters.

Perl 5.6.1 has a strange level of Unicode compatibility, which means that tricks that work in 5.6.0 or 5.8 don't apply. In the end, it was a combination of the following that fixed the problem:

  1. Adding 'use utf8;' at the start of the script.

  2. Adding '$query = pack("U*", unpack("U*", $query));' to set the UTF-8 flag on the search string.

  3. Using '$query =~ s/\x{$value}/$latin{$value}/g' to replace each four-digit hexadecimal Unicode value with the corresponding Latin-1 character, as found in the MEDLINE character database.
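
For anyone wanting to see the three steps together, here is a minimal sketch rather than the actual HubMed code. The %latin table is a hypothetical two-entry stand-in for the MEDLINE character data, and to_latin1 is a name made up for illustration. It also builds each character with chr(hex(...)) instead of interpolating into \x{...}, which is a swapped-in variant of step 3 that does not depend on how a given Perl version parses the pattern. On recent Perls the pack/unpack round trip in step 2 should leave a character string unchanged; it is kept only to mirror the 5.6.1 fix.

```perl
use strict;
use warnings;
use utf8;    # step 1 (only matters if the source contains literal non-ASCII text)

# Hypothetical stand-in for the MEDLINE character table:
# four-digit hex code point => replacement character.
my %latin = (
    '0107' => 'c',    # LATIN SMALL LETTER C WITH ACUTE
    '0142' => 'l',    # LATIN SMALL LETTER L WITH STROKE
);

sub to_latin1 {
    my ($query) = @_;

    # Step 2: rebuild the string from its code points so that Perl
    # treats it as characters rather than raw bytes.
    $query = pack("U*", unpack("U*", $query));

    # Step 3 (variant): replace each listed code point with its mapping.
    for my $value (keys %latin) {
        my $char = chr(hex $value);
        $query =~ s/\Q$char\E/$latin{$value}/g;
    }
    return $query;
}

# Example: "Paw\x{142}owski" contains U+0142 in place of a plain "l".
print to_latin1("Paw\x{142}owski"), "\n";
```

A real table would have one entry per character in the MEDLINE data, and code points not in the table pass through untouched.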