Author Topic: emojis, DOS 8.3 filename  (Read 563 times)

Yagop

  • Newbie
  • *
  • Posts: 5
emojis, DOS 8.3 filename
« on: March 29, 2019, 10:46:20 PM »
Hello. Right now I use ExifTool 11.33 for Windows.

When there are emojis (or even a single emoji), such as smilies or flags, in the filename, ExifTool performs its intended tagging action, but it converts the image's filename into DOS 8.3 format (all capital letters plus a tilde). One tag I then can't write correctly, for example, is "-OriginalFileName<${filename;s/\.jpg$//i}", because the short DOS 8.3 filename is written instead of the long filename with emojis. I think that whatever the tagging job is, ExifTool will always convert the name to DOS 8.3 format as long as the original long filename contains an emoji.

YouTube, for example, has some video titles with emojis, and I use those titles as filenames for the thumbnails I download (e.g. hqdefault.jpg, maxresdefault.jpg). I use ExifTool in a batch file to handle multiple images easily, so if I have many images with emojis, I won't be able to remember their original long filenames after they are converted to DOS 8.3 format.

Thank you and more power.

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #1 on: March 30, 2019, 09:37:43 AM »
Thanks for this report, but I don't think there is much I can do about this because of the poor support for special characters in Windows file names in Perl.

You'll have to find some external utility to work around this.  For example, if you could write the filename in UTF8 to a sidecar.txt file, then you could do this in ExifTool:

exiftool "-originalfilename<=%d%f.txt" ...

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
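A minimal Python sketch of the sidecar approach described above, assuming one .txt sidecar per JPEG with the same base name, and that the tag should hold the name without the .jpg extension (as in the original `${filename;s/\.jpg$//i}`). The helper name `write_name_sidecars` is hypothetical. Python 3 handles Unicode filenames on Windows, so the emojis survive in the .txt contents. No trailing newline is written, since it would presumably become part of the tag value.

```python
from pathlib import Path

def write_name_sidecars(directory, ext=".jpg"):
    """For each <name><ext> file, write <name>.txt containing <name> in UTF-8.

    No BOM and no trailing newline are written, so the file holds only the name.
    Returns the sidecar paths that were written.
    """
    written = []
    # sorted() takes a snapshot of the listing before any sidecars are created
    for image in sorted(Path(directory).iterdir()):
        if image.is_file() and image.suffix.lower() == ext.lower():
            sidecar = image.with_suffix(".txt")
            sidecar.write_text(image.stem, encoding="utf-8")
            written.append(sidecar)
    return written
```

Whether ExifTool then finds each `%d%f.txt` sidecar for an emoji-named image may itself depend on the filename handling discussed in this thread.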

herb

  • Sr. Member
  • ****
  • Posts: 287
Re: emojis, DOS 8.3 filename
« Reply #2 on: March 30, 2019, 11:37:06 AM »
Hello Yagop, hello Phil,

please allow an additional question:
Quote
but it converts the image's filename into DOS 8.3 format
When is this done by Exiftool?

I also use Exiftool 11.33 on a Windows 7 system and exiftool.exe is started with -stay_open by my C++ application.
I did a short test with a filename that contained 1 emoji.
Code:
exiftool.exe -iptc:header<${filename} testfile.jpg
with the proper charsets (UTF8) for filename and IPTC, and it worked properly.

What did I understand wrong?

Best regards
Herb

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #3 on: March 30, 2019, 11:02:45 PM »
Quote
but it converts the image's filename into DOS 8.3 format
When is this done by Exiftool?

ExifTool doesn't do this.  It is possible that this is done somehow in the standard libraries that ExifTool uses.

Quote
I also use Exiftool 11.33 on a Windows 7 system and exiftool.exe is started with -stay_open by my C++ application.
I did a short test with a filename that contained 1 emoji.
Code:
exiftool.exe -iptc:header<${filename} testfile.jpg
with the proper charsets (UTF8) for filename and IPTC, and it worked properly.

I have seen similar problems where the behaviour seems to depend somehow on the system settings.

Yagop: What version of ExifTool are you using?  Newer versions try to use the Windows-specific I/O libraries if possible, rather than the standard libraries.

- Phil

Yagop

  • Newbie
  • *
  • Posts: 5
Re: emojis, DOS 8.3 filename
« Reply #4 on: March 31, 2019, 01:20:17 AM »
Thanks to both of you for your examples and tips. I then used an external Unicode/UTF8 text file (e.g. z.txt) containing the emoji text, and used "-OriginalFileName<=z.txt". It worked and the emojis are tagged inside, but the images with emoji filenames are still converted to DOS 8.3 format. I'm OK with that, since I can still rename them back, for example by using ExifToolGUI (I just tested it too) to view and copy the emoji tags I wrote earlier.

My intention is to preserve the filenames, no matter how peculiar (Unicode, emoji, etc.), at least as tags inside the files, because I have lost a few hard disks before and lost the filenames of many of my files even after recovery. If I embed the filenames, I can rename the files back.

Quote
Yagop: What version of ExifTool are you using?  Newer versions try to use the Windows-specific I/O libraries if possible, rather than the standard libraries.
Right now I use ExifTool 11.33, and Windows 7 64-bit.

Quote
Thanks for this report, but I don't think there is much I can do about this because of the poor support for special characters in Windows file names in Perl.
Sad to hear that, but I'm OK with that if that's the case. There was a similar emoji problem with MKVToolNix, which only recently updated one of its libraries to handle emojis for the first time.

Again, thanks.

herb

  • Sr. Member
  • ****
  • Posts: 287
Re: emojis, DOS 8.3 filename
« Reply #5 on: March 31, 2019, 11:31:45 AM »
Hello,

@Phil: Thanks for the clarifications.
Just some more info: I repeated the test from my previous post with Perl as well, specifically CitrusPerl 5.24.1 (with no further packages installed) and, of course, the ExifTool Perl package. The described error did not occur.

@Yagop: So I would be interested which "environment" you are using that changes the "unicode-filename" to a "dos-8.3-filename"

Thanks and best regards
Herb

Yagop

  • Newbie
  • *
  • Posts: 5
Re: emojis, DOS 8.3 filename
« Reply #6 on: July 30, 2019, 12:35:20 PM »
Quote
@Yagop: So I would be interested which "environment" you are using that changes the "unicode-filename" to a "dos-8.3-filename"
Apologies for the very late reply, as I don't visit the web very often nowadays. I'm not sure I understood correctly, but here goes. Again, I use Windows 7 SP2 64-bit.

My goal is to preserve the filenames as tags (along with any other available info) so that I can recover that info later, for example to rename the files back to their original filenames if a disastrous hard disk failure ever strikes again. Back when I didn't have ExifTool, I lost all the filenames of my accumulated images and other files forever.

One scenario where I am forced to embed emojis is the YouTube videos I download that have emojis in their titles. After my past experiences, including those disk failures, I ended up using Matroska as my preferred media container, especially because of its very flexible tagging support. Just last year, I think, with the help of the author of chapterEditor, the developer of MKVToolNix updated one of its libraries to handle emojis in its next release. So the Matroska format itself and the related third-party programs apparently benefited.

In my experience, there are two occasions where ExifTool inadvertently converts the filenames to DOS 8.3 format.
  • If a JPG (or any image file) filename contains emojis, and after I use ExifTool to tag them, they are converted to 8.3.
  • If a folder contains a file of any type (image, movie, audio, doesn't matter) whose name contains an emoji, and I use ExifTool to tag an image in that same folder, that file's name is converted to 8.3. It doesn't matter whether the image I tag with ExifTool has a plain Latin alphanumeric filename or an emoji in its own name: all other files with an emoji in their filenames are inadvertently converted to 8.3.
I ended up tagging the MKVs with the emojis normally (using, for example, MKVToolNix itself, Mp3tag, or chapterEditor), but I simply removed the emojis from the filenames, since they are preserved as tags inside the MKV anyway. That way I prevent any accidental 8.3 conversion. I take the same approach with image files (using ExifTool). And even if I overlook one and it is accidentally converted to 8.3, I can still recover the original long filenames from the Matroska and JPG tags.
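The workaround of removing emojis from filenames can be sketched in Python as below. `strip_surrogate_chars` is a hypothetical helper, and it assumes that only characters above U+FFFF need removing (those that require a UTF-16 surrogate pair, which covers most emoji and all flag symbols); BMP symbols such as ☺ are left alone. It only computes the new name; the original name is assumed to be saved in a tag such as OriginalFileName before any rename.

```python
import os
import re

def strip_surrogate_chars(filename):
    """Return filename with all non-BMP characters removed from the base name.

    Whitespace left behind is collapsed and trimmed; the extension is kept.
    Raises ValueError if nothing remains after removal.
    """
    stem, ext = os.path.splitext(filename)
    kept = "".join(c for c in stem if ord(c) <= 0xFFFF)
    kept = re.sub(r"\s+", " ", kept).strip()
    if not kept:
        raise ValueError(f"no characters left after cleaning {filename!r}")
    return kept + ext

print(strip_surrogate_chars("Video \U0001F600 title \U0001F1EF\U0001F1F5.jpg"))  # prints: Video title.jpg
```

Raising an error for an all-emoji name, rather than inventing a placeholder, leaves that decision to the caller.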

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #7 on: July 30, 2019, 12:44:59 PM »
Quote
If inside a folder I have a filetype (be it an image file, movie file, audio file, doesn't matter) which contains an emoji, and after I use ExifTool to tag an image in that same folder, the filetype is converted to 8.3.

You're saying that using ExifTool to write to one file in a folder causes the names of other files in the folder to be converted to 8.3 format?  This can't be ExifTool that is doing this.

- Phil

herb

  • Sr. Member
  • ****
  • Posts: 287
Re: emojis, DOS 8.3 filename
« Reply #8 on: July 30, 2019, 03:08:41 PM »
Hello,

@Yagop: Can you please tell us how you call Exiftool (e.g. from a DOS box), and give an example of your Exiftool command, including an explicitly used filename, etc.?

Thanks and
Best regards
Herb

herb

  • Sr. Member
  • ****
  • Posts: 287
Re: emojis, DOS 8.3 filename
« Reply #9 on: August 12, 2019, 02:35:32 AM »
Hello Phil,

there is really something strange when Exiftool has to work with filenames that contain surrogate characters:

I did the following test with the WIN-version of Exiftool 11.61.
I have a directory whose pathname contains only ASCII characters (F:\dirtest), and in the directory there is 1 image.
The filename contains ASCII characters and also an emoji (which needs a surrogate pair in UTF-16 and is represented here by XX): P11982XX.JPG
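Why an emoji counts as a "surrogate" here: the Windows wide-character (W) file APIs use UTF-16, where characters above U+FFFF, which covers most emoji, take two 16-bit code units (a high and a low surrogate). The Python sketch below, with hypothetical helper names, computes the pair and flags filenames containing such characters. These are presumably the names that trigger the "No support for unicode surrogates" warning in this post, though that is inferred from the message text, not from the Win32::FindFile source.

```python
def surrogate_pair(ch):
    """Return the UTF-16 (high, low) surrogate pair for a character above U+FFFF."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError(f"U+{cp:04X} fits in one UTF-16 unit; no surrogate pair")
    offset = cp - 0x10000            # 20-bit value, split into two 10-bit halves
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

def needs_surrogates(name):
    """True if any character in name lies outside the Basic Multilingual Plane."""
    return any(ord(c) > 0xFFFF for c in name)

high, low = surrogate_pair("\U0001F600")   # grinning face emoji
print(f"U+1F600 -> {high:04X} {low:04X}")  # prints: U+1F600 -> D83D DE00
print(needs_surrogates("P11982\U0001F600.JPG"), needs_surrogates("caf\u00e9.JPG"))  # prints: True False
```

Accented Latin letters such as é are single UTF-16 units, which fits the warning being specific to surrogates rather than to non-ASCII names in general.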

Using the following command to, e.g., create an IPTC tag:
Code:
exiftool.exe -charset filename=utf8 -IPTC:Caption-Abstract=caption -progress -ext jpg F:\dirtest
I get this response from Exiftool:
Code:
Warning: [Win32::FindFile] No support for unicode surrogates - F:/dirtest
Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG

The strange thing now also is:
- the original image is deleted/removed
- an image file is created with 8.3-format filename: P11982~1.JPG_original

Which part of exiftool or Perl can do this?

Hint:
When I start Exiftool and specify the filename (with emoji) explicitly, everything works properly.
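Since naming the files explicitly works, one possible workaround is to list the directory outside ExifTool and pass the full names in an argument file. A minimal Python sketch, assuming ExifTool's `-@ ARGFILE` option reads one argument per line and that `-charset filename=utf8` makes it interpret those names as UTF-8; the helper `write_argfile` and the file name `files.txt` are hypothetical.

```python
from pathlib import Path

def write_argfile(directory, argfile, ext=".jpg"):
    """Write one absolute path per line (UTF-8, LF endings) for each matching file.

    Python 3 lists directories through Unicode-aware APIs, so emoji in names survive.
    Returns the list of paths written.
    """
    paths = [str(p.resolve()) for p in sorted(Path(directory).iterdir())
             if p.is_file() and p.suffix.lower() == ext.lower()]
    # newline="\n" prevents CRLF translation on Windows
    with open(argfile, "w", encoding="utf-8", newline="\n") as fh:
        for path in paths:
            fh.write(path + "\n")
    return paths
```

The resulting file could then be used as, for example, `exiftool.exe -charset filename=utf8 -IPTC:Caption-Abstract=caption -@ files.txt`, which, if the observation above holds, avoids the directory scan that emits the Win32::FindFile warning.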

Thanks for your help in advance.

Best regards
Herb

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #10 on: August 12, 2019, 08:01:57 AM »
This is unfortunate.  It must be the Win32::FindFile package that is somehow renaming the file.  I've looked into this package and it looks like it just calls the Windows function FindNextFileW.  I can't find any references for problems like this with FindNextFileW, and I don't understand why it should do anything to the file names.  Unfortunately I don't know what I can do to help with this problem.

- Phil

obetz

  • Sr. Member
  • ****
  • Posts: 153
Re: emojis, DOS 8.3 filename
« Reply #11 on: August 12, 2019, 10:01:25 AM »
Quote
This is unfortunate.  It must be the Win32::FindFile package that is somehow renaming the file.

The result looks like the intended rename of the original file (there is no -overwrite_original).

Without support for surrogate pairs, it might simply fall back to the short file name.

Quote
The strange thing now also is:
- the original image is deleted/removed
- an image file is created with 8.3-format filename: P11982~1.JPG_original

I guess it's not "deleted/removed" but renamed.

After all, I would never even consider using surrogate pairs in file names. I'm sure there are other applications that don't support them correctly. I taught my colleagues, friends and family to use POSIX-compatible file names: plain ASCII, no whitespace.

Oliver

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #12 on: August 12, 2019, 10:09:11 AM »
Quote
The result looks like the intended rename of the original file (there is no -overwrite_original).

Without support for surrogate pairs, it might simply fall back to the short file name.

You may be right.  I'll look into this.

- Phil

herb

  • Sr. Member
  • ****
  • Posts: 287
Re: emojis, DOS 8.3 filename
« Reply #13 on: August 13, 2019, 03:48:33 AM »
Hello Phil, hello Oliver

thanks to both of you for looking into this.

In the meantime I did some tests in order to get an overview:
A)
Giving one single file (fully qualified) to Exiftool
- path and/or filename also contains emoticons (surrogates)
- with or without option -overwrite_original_in_place
all is working properly

B)
Giving files to Exiftool with (e.g.) -ext jpg and path-information
- path contains emoticons (surrogates)
  I get the following information:
    1 directories scanned
    0 image files read

- only filename contains emoticons (surrogates)
  -- with option -overwrite_original_in_place
      All files are updated properly and
      I get a warning - surrogates not supported
     
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG [1/2]
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG [2/2]
          1 directories scanned
          2 image files updated
      Warning: [Win32::FindFile] No support for unicode surrogates - F:/Work_Eixm/Emotics/dirtest

  -- without option -overwrite_original_in_place
      The original image file is "replaced" with the file <8.3-filename>.jpg_original
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG [1/2]
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG [2/2]
          1 directories scanned
          0 image files read
      Warning: [Win32::FindFile] No support for unicode surrogates - F:/Work_Eixm/Emotics/dirtest
      Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG
      Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG


Best regards
Herb

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14895
    • ExifTool Home Page
Re: emojis, DOS 8.3 filename
« Reply #14 on: August 13, 2019, 07:17:19 AM »
Ah, interesting.  Thanks Herb.

- Phil