Author Topic: How to include multiple regular expressions in a ValueConv expression?

CWCorrea

  • Newbie
  • *
  • Posts: 2
Hello Phil,

First, thank you very much for such a fine application as ExifTool. Now I cannot imagine my life without it. Second, please bear with me, as I'm totally new to Perl (actually, I started learning about regular expressions and Perl just because of ExifTool). Third, please excuse the long post.

Now, this is what I would like to do:

I have several thousand pictures taken with different cameras, and I would like to sort them by camera make and model. I reviewed the ExifTool documentation and several examples in the ExifTool Forum, and I understand how to do it. Because the Model tag of several pictures contains characters that are not valid for my filesystem (btw, I use OS X Snow Leopard), I created a user-defined tag called MyModel with a regex that deletes any character that is not valid for me (based on an example you gave in the forum).

Code: [Select]
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyModel => {
            Require => 'Model',
            ValueConv => '$val =~ s/[^A-Za-z0-9\-\_\.\,\(\)\ ]//g; $val',
        },
    },
);
1;  #end

Using the MyModel tag works well, but I don't like the end result so I decided to find a way to process the Model tag contents to suit my taste and needs. For this, I created a list of unique models with the following command:

exiftool -s -r -T -Model . | sort -u > models.txt

Then I wrote a small Perl script to apply some regular expressions to the list of camera models:

Code: [Select]
#!/usr/bin/perl -w
#

use strict;
use warnings;

my $InFile;
my $argnum;

foreach $argnum (0 .. $#ARGV) {
    $InFile = $ARGV[$argnum];
    # Three-argument open with a lexical filehandle
    open my $fh, '<', $InFile or die "Cannot open $InFile: $!";
    while (<$fh>) {
        my ($model) = $_;
        chomp($model);
        #
        # Remove non-printable characters
        #
        $model =~ s/[^[:print:]]+//g;
        #
        # Remove excess horizontal and vertical whitespace
        # e.g.: "DC200      (V01.00)" gets transformed into "DC200 (V01.00)"
        #
        $model =~ s/[\h\v]+/ /g;
        #
        # Remove whitespace from the start and end of the string
        # e.g.: " PDC 5350" gets transformed into "PDC 5350"
        #
        $model =~ s/^\s+//;
        $model =~ s/\s+$//;
        #
        # Replace slash and underscore characters with hyphen-minus
        #
        $model =~ tr{/_}{--};
        #
        # Remove characters other than [A-Z],[a-z],[0-9],'-','.',',','(',')' and space
        # e.g.: "PENTAX *ist DL" gets transformed into "PENTAX ist DL"
        #
        $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;
        #
        # If after transformation the $model variable is empty give it the value "Unknown model"
        #
        if ($model eq '') {
            $model = 'Unknown model';
        }
        #
        # For some Hewlett Packard cameras, clean strange characters after (Vdd.dd)
        # e.g.: "HP Photosmart M22 (V01.00) +ëÕKÄ" gets transformed into "HP Photosmart M22 (V01.00)"
        #
        $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};

        print "$model\n";
    }
    # Close each file inside the loop, once it has been read
    close $fh or die "Cannot close $InFile: $!";
}

Finally I got the results I was looking for, but now I have seven regular expressions and one conditional statement to work with.

My question is: how can I include multiple regular expressions and the conditional in a ValueConv expression in the MyModel user defined tag?

I guess that the solution is to use a code reference like ValueConv => sub { }, just like the one used in the BigImage tag example in the sample .ExifTool_config file. Unfortunately, I have not yet found a way to correctly include my Perl code in the code reference for ValueConv.

I would appreciate any suggestion on this. Thank you!

Kind regards,

Christian W. Correa

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14896
    • ExifTool Home Page
Hi Christian,

Looking good so far.  I admit the user-defined tag documentation isn't very well organized.

The easiest way is like this:

Code: [Select]
ValueConv => q{
    my $model = $val[0];
    # place arbitrary regular expressions and other Perl code here
    ...
},
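
To see why $val[0] is visible inside the q{ } string, here is a minimal stand-alone sketch of the idea. It is only an illustration: ExifTool's own evaluation code is not shown in this thread, and run_conv is a hypothetical stand-in for whatever calls the conversion. The point is that q{ } is an ordinary string, so the Perl inside it only runs later, when the caller evaluates it with @val in scope.

```perl
use strict;
use warnings;

# The conversion is stored as plain text; q{...} is just a single-quoted string.
my $value_conv = q{
    my $model = $val[0];
    $model =~ s/^\s+//;   # trim leading whitespace
    $model =~ s/\s+$//;   # trim trailing whitespace
    $model;               # the last statement's value is the result
};

# run_conv is a hypothetical helper (not an ExifTool function): it puts
# @val in scope and evaluates the expression text.
sub run_conv {
    my ($expr, @val) = @_;
    my $result = eval $expr;
    die "conversion failed: $@" if $@;
    return $result;
}

print run_conv($value_conv, '  PDC 5350 '), "\n";
```

Because the string is evaluated as a block of Perl, it can hold as many statements, regular expressions and conditionals as needed.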

Using a code reference is also possible:

Code: [Select]
ValueConv => sub {
    my $val = shift;
    my $model = $$val[0];
    # place your code here
    ...
},

However, with the second method the formatted (print-converted) values are not immediately accessible. (This shouldn't matter for you, because the Model tag has no print conversion.)
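
As a rough sketch of the code-reference form with a few cleanup steps filled in, something like the following could be tried outside ExifTool first. The direct call at the end is an assumption made for testing only; it stands in for ExifTool invoking the conversion with a reference to the list of required values.

```perl
use strict;
use warnings;

# A code-reference conversion shaped like the snippet above: the first
# argument is a reference to the list of required tag values.
my $value_conv = sub {
    my $val = shift;
    my $model = $$val[0];
    $model =~ s/[\h\v]+/ /g;   # collapse runs of whitespace (needs Perl 5.10+)
    $model =~ s/^\s+//;        # trim leading whitespace
    $model =~ s/\s+$//;        # trim trailing whitespace
    $model = 'Unknown model' if $model eq '';
    return $model;
};

# Hypothetical direct call, standing in for ExifTool invoking the conversion
print $value_conv->(['  DC200      (V01.00) ']), "\n";
```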

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

CWCorrea

Hello Phil,

Thank you for your prompt answer. I feel like I've discovered a gem!

Here is my implementation of the MyModel tag using your first method. It really works great! Maybe others will find it useful. I included my comments so newbies like me can understand what each regular expression does:

Code: [Select]
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
#
# Cleans Model text
#
        MyModel => {
            Require => 'Model',
            ValueConv => q{
                my $model = $val[0];
                #
                # Remove non-printable characters
                #
                $model =~ s/[^[:print:]]+//g;
                # Remove excess horizontal and vertical whitespace
                # e.g.: "DC200      (V01.00)" gets transformed into "DC200 (V01.00)"
                #
                $model =~ s/[\h\v]+/ /g;
                #
                # Remove whitespace from the start and end of the string
                # e.g.: " PDC 5350" gets transformed into "PDC 5350"
                #
                $model =~ s/^\s+//;
                $model =~ s/\s+$//;
                #
                # Replace slash and underscore characters with hyphen-minus
                #
                $model =~ tr{/_}{--};
                #
                # Removes characters different from [A-Z],[a-z],[0-9],'-','.',',','(',')' and space
                # e.g.: "PENTAX *ist DL" gets transformed into "PENTAX ist DL"
                #
                $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;
                #
                # If after transformation the $model variable is empty give it the value "Unknown model"
                #
                if ($model eq '') {
                    $model ='Unknown model';
                }
                #
                # For some Hewlett Packard cameras, cleans strange characters after (Vdd.dd)
                # e.g.: "HP Photosmart M22 (V01.00) +ëÕKÄ" gets transformed into "HP Photosmart M22 (V01.00)"
                #
                $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};
                return $model;
            },
        },
    },
);

1;  #end
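
For anyone who wants to check the cleanup steps outside ExifTool before putting them in a config file, here is a minimal sketch that wraps the same substitutions, in the same order, in a subroutine and runs them on the sample strings from the comments. clean_model is a hypothetical helper name, not something ExifTool provides.

```perl
use strict;
use warnings;

# clean_model is a hypothetical helper name, not part of ExifTool.
# It applies the same steps as the ValueConv above, in the same order.
sub clean_model {
    my ($model) = @_;
    $model =~ s/[^[:print:]]+//g;             # drop non-printable characters
    $model =~ s/[\h\v]+/ /g;                  # collapse runs of whitespace (needs Perl 5.10+)
    $model =~ s/^\s+//;                       # trim leading whitespace
    $model =~ s/\s+$//;                       # trim trailing whitespace
    $model =~ tr{/_}{--};                     # slash and underscore become hyphens
    $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;  # keep only the allowed characters
    $model = 'Unknown model' if $model eq '';
    $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};     # cut anything after "(Vdd.dd)"
    return $model;
}

# Sample inputs taken from the comments in the config file
for my $raw ('DC200      (V01.00)', ' PDC 5350', 'PENTAX *ist DL', '') {
    printf "%-22s => %s\n", "[$raw]", clean_model($raw);
}
```

Keeping the steps in one subroutine makes it easy to add a new camera model string to the list and see the result before touching the config file.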

On a related topic, I think it would be interesting to have a section in the forum where users can share and discuss code snippets and recipes to do interesting things with ExifTool. That would be a great way to empower users so we can learn by example and help increase the knowledge of the ExifTool user community.


Christian W.

Phil Harvey

Hi Christian,

Glad that worked.  Thanks for posting your config file.

We tried having a "Solutions" board in this forum, which is close to what you suggested, but nobody ever posted there.  What you suggest is maybe better suited to a Wiki, but organizing the Wiki in a useful way would take some effort.

- Phil