Discussion:
[Expat-bugs] [ expat-Bugs-3514595 ] "Unknown encoding error" with "iso-8859-15"
SourceForge.net
2012-04-03 14:20:43 UTC
Bugs item #3514595, was opened at 2012-04-03 07:20
Message generated for change (Tracker Item Submitted) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=3514595&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: https://www.google.com/accounts ()
Assigned to: Nobody/Anonymous (nobody)
Summary: "Unknown encoding error" with "iso-8859-15"

Initial Comment:
hello,
I have 2 files which look similar except for the encoding. One (OK) is "iso-8859-1", the other (KO) is "iso-8859-15".
I have tried Expat versions 2.0 and 2.1, and both generate an error when parsing the "iso-8859-15" file.
Can you please explain whether this is a bug and, if not, why?
I have downloaded Expat from these links:
http://sourceforge.net/projects/expat/files/expat_win32/2.1.0/expat-win32bin-2.1.0.exe/download
and
http://sourceforge.net/projects/expat/files/expat_win32/2.0.0/expat_win32bin_2_0_0.exe/download

and this is my result:
C:\Expat-2.1.0\Bin>dir
Volume in drive C has no label.
Volume Serial Number is 455A-51B9

Directory of C:\Expat-2.1.0\Bin

03/04/2012 16:01 <DIR> .
03/04/2012 16:01 <DIR> ..
03/04/2012 14:35 5.176 FILE_KO.xml
03/04/2012 14:35 1.419 FILE_OK.xml
24/03/2012 15:32 131.584 libexpat.dll
24/03/2012 15:32 17.234 libexpat.lib
24/03/2012 15:32 498.530 libexpatMT.lib
24/03/2012 15:32 134.656 libexpatw.dll
24/03/2012 15:32 17.316 libexpatw.lib
24/03/2012 15:32 515.268 libexpatwMT.lib
24/03/2012 15:32 70.144 xmlwf.exe
9 File(s) 1.391.327 bytes
2 Dir(s) 43.431.870.464 bytes free

C:\Expat-2.1.0\Bin>xmlwf.exe FILE_KO.xml
FILE_KO.xml:1:30: unknown encoding

C:\Expat-2.1.0\Bin>xmlwf.exe FILE_OK.xml

C:\Expat-2.1.0\Bin>

----------------------------------------------------------------------

SourceForge.net
2012-04-03 15:32:31 UTC
Bugs item #3514595, was opened at 2012-04-03 07:20
Message generated for change (Comment added) made by kwaclaw
Category: None
Group: Not a Bug
Status: Open
Resolution: Rejected
Priority: 5
Private: No
Submitted By: https://www.google.com/accounts ()
Assigned to: Nobody/Anonymous (nobody)
Summary: "Unknown encoding error" with "iso-8859-15"

----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2012-04-03 08:32

Message:
Expat does not support iso-8859-15. It only supports UTF-8, UTF-16,
ISO-8859-1, and US-ASCII. The XML specification does not require an XML
parser to support anything else. Your best bet is to convert the document
to UTF-8 or UTF-16.
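
The conversion suggested above could be sketched in Python roughly as follows. This is an illustration only and not part of Expat: the helper name `to_utf8` and the regular expression are assumptions made for this example, and the sketch assumes the input really is ISO-8859-15 with its XML declaration at the very start of the file.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration that sits at the
# very start of the document, e.g. <?xml version="1.0" encoding="iso-8859-15"?>
_DECL_ENCODING = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def to_utf8(raw: bytes, source_encoding: str = "iso-8859-15") -> bytes:
    """Re-encode an XML document as UTF-8 and update its encoding declaration."""
    text = raw.decode(source_encoding)
    # Rewrite only the leading declaration. A document without one is left
    # unchanged, since UTF-8 is what a parser assumes when nothing is declared.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")


# Hypothetical sample input: byte 0xA4 is the Euro sign in ISO-8859-15.
sample = b'<?xml version="1.0" encoding="iso-8859-15"?>\n<price>10 \xa4</price>'
converted = to_utf8(sample)
```

The declaration has to be rewritten along with the bytes. If it still said "iso-8859-15" after the conversion, the parser would report the same "unknown encoding" error.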

----------------------------------------------------------------------

SourceForge.net
2012-04-04 09:11:41 UTC
Bugs item #3514595, was opened at 2012-04-03 07:20
Message generated for change (Comment added) made by
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2012-04-04 02:11

Message:
From what I read here
http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding
it seems that the W3C recognizes the importance of the encoding in question,
especially as it contains the Euro (€) symbol.
[...]
Although an XML processor is required to read only entities in the UTF-8
and UTF-16 encodings, it is recognized that other encodings are used around
the world, and it may be desired for XML processors to read entities that
use them. In the absence of external character encoding information (such
as MIME headers), parsed entities which are stored in an encoding other
than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text
Declaration) containing an encoding declaration:
[...]
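
To illustrate the encoding declaration the quoted passage refers to, here is a minimal Python sketch that reads the declared name from the first bytes of a file and checks it against the encodings listed earlier in this thread as supported by Expat. The function names, the `BUILTIN` set, and the regular expression are assumptions made for this example. It only handles ASCII-compatible input, so UTF-16 files are out of scope.

```python
import re

# Encoding names taken from the maintainer's comment in this thread;
# compared case-insensitively.
BUILTIN = {"utf-8", "utf-16", "iso-8859-1", "us-ascii"}

# XML declaration at the very start of the byte stream, capturing the
# quoted value of its encoding pseudo-attribute.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def declared_encoding(head: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _DECL.match(head)
    return match.group(2).decode("ascii") if match else None


def needs_conversion(head: bytes) -> bool:
    """True if the document declares an encoding outside the BUILTIN set."""
    name = declared_encoding(head)
    return name is not None and name.lower() not in BUILTIN
```

A check like this could be run on the first line of each file to decide which ones need to be converted before they are handed to the parser.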

----------------------------------------------------------------------

SourceForge.net
2012-04-04 13:38:53 UTC
Bugs item #3514595, was opened at 2012-04-03 07:20
Message generated for change (Comment added) made by kwaclaw
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2012-04-04 06:38

Message:
I am not sure what you are suggesting. Your quote from the spec basically
states that an XML parser only needs to support the UTF-8 and UTF-16 encodings,
but if another encoding is used, it must be indicated so that parsers that
might support it can process the document properly.

Just convert to Unicode. The Euro symbol is part of Unicode, so you should
not have a problem.
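
As a small illustration of that point, the following Python sketch shows where the Euro sign ends up after decoding. The helper name `describe` is made up for this example. Byte 0xA4 is the Euro sign in ISO-8859-15, while the same byte is the generic currency sign in ISO-8859-1.

```python
def describe(raw: bytes, encoding: str) -> tuple:
    """Decode a single-byte character; return its code point and UTF-8 bytes (hex)."""
    ch = raw.decode(encoding)
    return (f"U+{ord(ch):04X}", ch.encode("utf-8").hex())


print(describe(b"\xa4", "iso-8859-15"))  # ('U+20AC', 'e282ac')
print(describe(b"\xa4", "iso-8859-1"))   # ('U+00A4', 'c2a4')
```

This is also why a file cannot simply be relabelled as ISO-8859-1: the bytes have to be decoded as ISO-8859-15 first, or the Euro sign silently turns into a different character.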

----------------------------------------------------------------------
