Author Topic: No support for unicode surrogates | emoji  (Read 797 times)

Anonan

  • Jr. Member
  • **
  • Posts: 22
No support for unicode surrogates | emoji
« on: January 01, 2019, 01:58:36 PM »
ExifTool throws the exception "No support for unicode surrogates at script/exiftool line 3553." when it is run on files whose names contain emoji.

Examples of file names: (see the attachment).
This forum does not support emoji either, so I can't post example file names that contain emoji here.


And yes, I don't like emoji either. I don't use them, but other people do, so support for them is needed.
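
For readers unfamiliar with the term in the error message: emoji sit outside the Basic Multilingual Plane, so UTF-16 (the encoding Windows uses for file names) stores each one as a pair of "surrogate" code units. The Python sketch below only illustrates that encoding; it is not ExifTool code, and the helper names `utf16_code_units` and `needs_surrogates` are made up for this example.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as upper-case hex strings."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


def needs_surrogates(text):
    """True if any character lies above U+FFFF and so needs a surrogate pair."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    print(utf16_code_units("\u00ba"))      # masculine ordinal indicator: ['00BA']
    print(utf16_code_units("\U0001F600"))  # grinning face emoji: ['D83D', 'DE00']
    print(needs_surrogates("360\u00ba Test.mp4"))  # False
    print(needs_surrogates("\U0001F600.txt"))      # True
```

A name such as "360º Test.mp4" therefore contains no surrogates at all, which suggests it is a separate problem from the emoji case.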

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 14425
    • ExifTool Home Page
Re: No support for unicode surrogates | emoji
« Reply #1 on: January 01, 2019, 02:22:19 PM »
Windows special characters are really a pain.  (I'm assuming you are on Windows.)

What version of ExifTool are you using?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Anonan

Re: No support for unicode surrogates | emoji
« Reply #2 on: January 01, 2019, 02:31:57 PM »
11.2.2.0 and 11.2.3.0 (I tested the latter just now; the result is the same). Yes, I use Windows 10.

I have also tried using both cmd.exe and Git Bash.

Anonan

Re: No support for unicode surrogates | emoji
« Reply #3 on: January 01, 2019, 03:03:00 PM »
It also does not support symbols like º (https://en.wiktionary.org/wiki/º). Do not confuse it with the degree symbol ° (https://en.wikipedia.org/wiki/Degree_symbol), which ExifTool handles normally.
Example of file name: "360º Test.mp4"
In this case the program just writes "No matching files".

StarGeek

  • Global Moderator
  • ExifTool Freak
  • *****
  • Posts: 2368
Re: No support for unicode surrogates | emoji
« Reply #4 on: January 01, 2019, 03:17:28 PM »
> It also does not support symbols like º (https://en.wiktionary.org/wiki/º). Do not confuse it with the degree symbol ° (https://en.wikipedia.org/wiki/Degree_symbol), which ExifTool handles normally.
> Example of file name: "360º Test.mp4"
> In this case the program just writes "No matching files".

This would seem to be a FAQ #18 answer, as when I change the code page to 65001, it works fine.

Code:
C:\>exiftool -g1 -a -s -PNG:all "Y:\!temp\bb\360º Test.png"
---- PNG ----
ImageWidth                      : 336
ImageHeight                     : 509
BitDepth                        : 8
ColorType                       : Grayscale with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Gamma                           : 2.2
WhitePointX                     : 0.3127
WhitePointY                     : 0.329
RedX                            : 0.64
RedY                            : 0.33
GreenX                          : 0.3
GreenY                          : 0.6
BlueX                           : 0.15
BlueY                           : 0.06
BackgroundColor                 : 255
Label                           : FinalDesignArt
ModifyDate                      : 2018:11:15 11:02:46

Troubleshooting hints:
* When posting, include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).
* Double all percent signs (%) in a Windows batch file.
* If your GPS coords are negative, make sure and set the GpsLatitudeRef and GpsLongitudeRef tags correctly.

Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #5 on: January 01, 2019, 05:15:31 PM »
I can't figure out that line number.  Line 3553 of exiftool version 11.22 doesn't do anything that could possibly generate a warning like that. :/

I guess I'll have to try this myself when I can.

What was the exact command you used?  (Maybe do a screen grab of the command and the warning you get.)

- Phil

Anonan

Re: No support for unicode surrogates | emoji
« Reply #6 on: January 02, 2019, 06:13:46 AM »
It's strange, but today I get the exception on line 3547. (The result is the same for both 11.2.2 and 11.2.3; Win 10, RUS; "chcp 65001" does not affect the results.)

I run "exiftool.exe *" in a folder that contains one or more files with emoji in their names.
File names: https://pastebin.com/gtNj96mg (I cannot post them here; otherwise I get the forum error "The message body was left empty.")
Finally I get:
"No support for unicode surrogates at script/exiftool line 3547."
Nothing else is printed to the console.



> Maybe do a screen grab of the command and the warning you get.
Ok, I will do this later.

Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #7 on: January 02, 2019, 07:15:27 AM »
OK.  Line 3547 would be an error in the Win32::FindFile package.  There isn't much I can do about this.

Try not using wildcards when you specify file names on the command line.

- Phil

Anonan

Re: No support for unicode surrogates | emoji
« Reply #8 on: January 02, 2019, 07:39:44 AM »
Oh, wait. The error on line 3553 occurs when I just use "exiftool.exe FILENAME".
Wildcards work fine when there are no files with such names.

Look at the attachment.
(Mirror: https://i.imgur.com/opg7Rj9.png)

CMD displays emoji incorrectly, but handles them correctly.
I can even copy these ⍰⍰ and paste them into a text editor that supports displaying unicode surrogates, and see the correct "icon".

Or I can concatenate all files into one with "copy /b *.txt concated.txt", and this command works fine even if the file names contain unicode surrogates (CMD just displays them like ⍰⍰).
« Last Edit: January 02, 2019, 08:20:36 AM by Anonan »

Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #9 on: January 02, 2019, 08:47:20 AM »
OK.  The underlying problem is that Win32::FindFile does not support these surrogate codes.  The reason I'm using Win32::FindFile in the first place is because of the lack of built-in support in ActivePerl for Windows Unicode file names.  The situation is unfortunate, but one possible work-around could be to create a hard link with a plain ASCII name to the file with the surrogate characters, then run exiftool on the hard link.
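
A minimal sketch of that hard-link work-around in Python, assuming the link is created on the same volume as the original (hard links cannot cross volumes). The function name `link_with_ascii_name` and the `link_0001.ext` naming scheme are hypothetical, chosen only for illustration; exiftool would then be run on the returned path.

```python
import os
import tempfile


def link_with_ascii_name(original, link_dir, index):
    """Create a hard link to `original` under a plain ASCII name; return its path.

    The naming scheme (link_0001.ext) is hypothetical. `link_dir` must be on
    the same volume as `original`.
    """
    ext = os.path.splitext(original)[1]
    if not ext.isascii():
        ext = ""  # drop a non-ASCII extension rather than copy the problem
    link_path = os.path.join(link_dir, f"link_{index:04d}{ext}")
    os.link(original, link_path)  # raises an error if link_path already exists
    return link_path


if __name__ == "__main__":
    # Demo in a throwaway folder with a placeholder file, not a real image.
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "\U0001F600 photo.jpg")
        with open(original, "wb") as handle:
            handle.write(b"placeholder bytes")
        link = link_with_ascii_name(original, folder, 1)
        print(os.path.basename(link))
```

Because both names refer to the same file data, anything written through the link also shows up under the original name; the link can be deleted afterwards without touching the original.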

- Phil

Anonan

Re: No support for unicode surrogates | emoji
« Reply #10 on: January 02, 2019, 09:24:52 AM »
Can the program just skip files with unicode surrogates in their names instead of stopping?
And at the end write the names of the files that were skipped to be processed manually by me.

I need to get meta info from a lot of files, and only a few of them contain unicode surrogates in their names, but the program does not work at all in this case.
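
Until the program does this itself, the request can be approximated outside ExifTool with a pre-scan. The Python sketch below is only an illustration under that assumption, not ExifTool behaviour: it splits a folder into names that need no surrogates and names that do, and reports the second group on stderr before any processing starts. The function name `split_by_name_support` is hypothetical.

```python
import os
import sys


def split_by_name_support(directory):
    """Return (supported, skipped) lists of file paths in `directory`.

    A file is "skipped" when its name has a character above U+FFFF,
    i.e. one that UTF-16 stores as a surrogate pair.
    """
    supported, skipped = [], []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue  # subfolders are ignored in this sketch
        if any(ord(ch) > 0xFFFF for ch in entry.name):
            skipped.append(entry.path)
        else:
            supported.append(entry.path)
    return sorted(supported), sorted(skipped)


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    supported, skipped = split_by_name_support(folder)
    for path in skipped:
        # ascii() escapes the emoji so the console code page cannot garble it
        print("skipped:", ascii(path), file=sys.stderr)
    print(len(supported), "file(s) can be passed to exiftool by name")
```

The `supported` list can then be handed to exiftool as explicit file names, while the `skipped` list is renamed or hard-linked and processed separately.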

Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #11 on: January 02, 2019, 09:30:14 AM »
I'll see what I can do.

- Phil

Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #12 on: January 02, 2019, 10:39:07 AM »
I've managed to reproduce this.  (The hardest part was figuring out how to create a file with a surrogate character in its name.  I finally did it by creating the file on a Mac then sending it to the Windows machine.)

I will patch ExifTool 11.24 to catch this error from Win32::FindFile and issue a warning or error instead.

Thanks for this report.

- Phil

Anonan

Re: No support for unicode surrogates | emoji
« Reply #13 on: January 02, 2019, 11:11:56 AM »
> And at the end write the names of the files that were skipped to be processed manually by me.
Probably it's better to show them also at the start (in the "err" stream) to be able to stop the program, fix the names and restart the program.
That avoids having to run it twice, since processing can take several minutes when you have several gigabytes of data.


> The hardest part was figuring out how to create a file with a surrogate character in its name.
For example, right-click a text input in Chrome/Opera and choose the first option in the context menu.




Phil Harvey

Re: No support for unicode surrogates | emoji
« Reply #14 on: January 02, 2019, 12:27:44 PM »
> Probably it's better to show them also at the start (in the "err" stream) to be able to stop the program, fix the names and restart the program.

This is problematic.  For one, there will likely be a problem interpreting the file name(s) in the ExifTool stderr messages due to character set problems.  I'll be outputting these messages in UTF-8.  The other thing is that it would be very hard for me to find these files beforehand.  So you will unfortunately be stuck trying to process them in a second pass.

- Phil