Saturday, November 20, 2010

Method for Checking the Values of Strings in Perl

Although I'm convinced that Perl is far from an ideal language to work with, I recently had to do some text processing with it. Some strings looked equal but evidently weren't, because my comparisons weren't behaving as expected.

My brother came to my rescue (twice) and came up with this handy subroutine:
    # Print the numeric code of every byte in a string, one per line.
    sub print_codes {
        my $value = shift;
        foreach (unpack("C*", $value)) {
            print " '$_'\n";
        }
    }


Essentially, one can pass a string to print_codes() and it will print the numeric code of each byte in it, including invisible ones such as \0 and \r.
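To make that concrete, here is a small sketch of how two strings that look alike can differ byte by byte. The byte_codes() helper is a hypothetical variant of print_codes() that returns the codes instead of printing them, and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical variant of print_codes(): return the byte codes as a list
# instead of printing them, so they are easy to join or compare.
sub byte_codes {
    my $value = shift;
    return unpack("C*", $value);
}

my $clean = "ab";
my $dirty = "a\0b\r";   # may look just like "ab" when printed

print join(" ", byte_codes($clean)), "\n";   # prints: 97 98
print join(" ", byte_codes($dirty)), "\n";   # prints: 97 0 98 13
```

The 0 and the 13 in the second line are the \0 and \r that a plain print would not show.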

In my case, we determined that (1) the input string was in UTF-16 (from a Gmail .csv export) but was being compared to a UTF-8 string, and (2) chomp was removing only \n and leaving \r behind. Makeshift solutions for those issues:
  1. Remove the extra \0 characters from the UTF-16 strings (careful: this is only safe for plain ASCII text, so it will break if you deal with other languages):
    s/\0//g;
  2. At the beginning of your code, set the input record separator so that chomp removes \r\n:
    $/ = "\r\n";
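
A less makeshift alternative, which I did not use at the time, is to decode the UTF-16 data properly with the core Encode module and strip the line ending with a regex. The following is only a sketch: clean_line() is a hypothetical helper, and it assumes the data is little-endian UTF-16, possibly with a byte-order mark, which may not match every export:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn one raw UTF-16LE line into a clean Perl string.
# Assumes little-endian data; adjust the encoding name if yours differs.
sub clean_line {
    my ($raw) = @_;
    my $text = decode('UTF-16LE', $raw);   # bytes -> characters
    $text =~ s/^\x{FEFF}//;                # drop a leading byte-order mark, if any
    $text =~ s/\r?\n\z//;                  # strip a trailing \n or \r\n
    return $text;
}

# Sample data: "hi" plus \r\n, encoded as UTF-16LE with a byte-order mark.
my $raw = "\xFF\xFE" . "h\0i\0\r\0\n\0";
print clean_line($raw) eq "hi" ? "equal\n" : "different\n";   # prints: equal
```

Unlike s/\0//g, decoding keeps non-ASCII characters intact. When reading from a file, opening it with an encoding layer such as '<:encoding(UTF-16)' should achieve the same decoding without a helper.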
