[phpxmlrpc] Special special chars in XML Response
korn at prometa.de
Fri Sep 21 10:04:52 BST 2007
Thank you again for your valuable hints.
I managed to get some response from my XMLRPC server by following your
suggestions and additionally adding "case 'CP1252_':" and "case
'CP1252_UTF-8':" to the switch statement we talked about earlier (I am not
sure which one it picks, though). Still, what I get does not seem right.
I get this if I set the xmlrpc_internalencoding to CP1252 and the
XML_OPTION_TARGET_ENCODING to UTF-8:
<value><string>aÅ dÅ â€¦Å â€ â€žâ‚¬Æ’dâ€¦câ€ Æ’Ë†dÅ â€ ccÅ dË†â€žâ€°â€žâ€¡â€¡bâ€¦a</string></value>
And this if XML_OPTION_TARGET_ENCODING is ISO-8859-1:
<value><string>cbâ€°dâ€¡â€¦â€žÅ â€¦â€ â€šÅ eË†baâ€šeeaeâ‚¬Æ’dâ€ â€°â‚¬â€šcâ€šfâ€°</string></value>
(It is just a sessionID with numbers and letters)
Unfortunately, I do not understand much of all that encoding-stuff.
For now I am switching over to another approach where data loss cannot
be entirely ruled out (in the case of CP1252-encoded chars).
Thanks for your help.
Gaetano Giunta wrote:
> Ok, I have seen that line 922 is actually line 932 on my version of
> the lib.
> This hints to the fact that you are writing an xmlrpcserver.
> xmlrpc_defencoding has nothing to do with the problem.
> The patch I would recommend to xmlrpcs.inc is the following:
> if (!in_array($GLOBALS['xmlrpc_internalencoding'],
>     array('UTF-8', 'ISO-8859-1', 'US-ASCII')))
>     xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
> else
>     xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING,
>         $GLOBALS['xmlrpc_internalencoding']);
> What this patch does is that
> - it makes sure that no warning is emitted
> - most importantly, it makes sure the charset encoding of the data as
> seen by the user code is not dependent on the encoding of data
> received over the net (as opposed to just prepending an @ in front of
> the call that emits the warning)
> - it picks the charset encoding with the widest range, to avoid data loss
> This means that, when $xmlrpc_internalencoding is set to a charset
> other than the 3 allowed, incoming data will always be in UTF8.
> It is up to your code to treat it appropriately in xmlrpc method
> handlers (e.g. via utf8_decode or using mbstring for UTF8 -> CP1252
> conversion)
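The UTF-8 to CP1252 step mentioned above could look roughly like the following inside a method handler. This is only a minimal sketch: it assumes the mbstring extension is loaded, and the helper name utf8_to_cp1252 is hypothetical, not part of the phpxmlrpc library.

```php
<?php
// Hypothetical helper (not part of phpxmlrpc): converts a UTF-8 string,
// as handed to a method handler, into CP1252 (Windows-1252).
// Assumes the mbstring extension is available.
function utf8_to_cp1252($utf8)
{
    // Characters that do not exist in CP1252 are replaced by mbstring's
    // substitute character, so some data loss remains possible.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// Example input: a bullet and an ellipsis given as UTF-8 byte sequences.
$cp1252 = utf8_to_cp1252("\xE2\x80\xA2 item\xE2\x80\xA6");
```

Only characters that have a CP1252 code point survive the round trip unchanged, which is why the patch keeps the data in UTF-8 until the handler decides what to do with it.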
>> The answer was clearly given without enough thinking...
>> The first question is: are you using the lib to write the client, the
>> server or both?
>> Then some explanations:
>> - on line 922, if the server has received some CP1252 text, it should
>> default to $GLOBALS['xmlrpc_defencoding']='UTF-8'. Did you also
>> change that variable? otherwise I cannot explain it...
>> - are you using php 4 or 5? there are some differences between the
>> xml parser used by php
>> - there is some more work surely to be done for everything to work
>> fine. Setting internalencoding to CP1252 before emitting (encoding)
>> data is fine, but, as you have seen, it cannot be used when decoding
>> it. And both server and client decode data (request and response,
>> respectively). Since cp1252 is not supported by the php4 xml parser,
>> we have to find some workaround
>>> Hi Gaetano,
>>> thank you for your fast reply and advice! I implemented the steps as
>>> you described, but when setting
>>> $GLOBALS['xmlrpc_internalencoding']='CP1252'; I am now getting the
>>> following error:
>>> Warning: xml_parser_set_option() [function.xml-parser-set-option]:
>>> Unsupported target encoding "CP1252" in
>>> ...\module_xmlrpc\lib\xmlrpcs.inc on line 922
>>> The PHP documentation says the only supported encodings are ISO-8859-1, US-ASCII and
>>> UTF-8: http://de3.php.net/xml_parser_set_option
>>> How can I further tackle this issue?
>>> Thanks and best regards,
>>> Matthias Korn
>>> Gaetano Giunta wrote:
>>>> The characters you are sending are very likely part of the
>>>> windows charset, aka, cp 1252.
>>>> There is no support for that right now, but it is quite easy to
>>>> add it:
>>>> in xmlrpc.inc, on line 152, an array is already defined with the
>>>> necessary translation. Using array_keys() and array_values() on it,
>>>> you can modify function xmlrpc_encode_entitites(), adding a new case:
>>>> case 'CP1252_US-ASCII':
>>>>     $escaped_data = str_replace(array('&', '"', "'", '<', '>'),
>>>>         array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'), $data);
>>>>     $escaped_data = str_replace($GLOBALS['xml_iso88591_Entities']['in'],
>>>>         $GLOBALS['xml_iso88591_Entities']['out'], $escaped_data);
>>>>     $escaped_data = str_replace(array_keys($GLOBALS['cp1252_to_xmlent']),
>>>>         array_values($GLOBALS['cp1252_to_xmlent']), $escaped_data);
>>>>     break;
>>>> then of course you have to declare your internal encoding as CP1252
>>>> ... and maybe check out if there is any decoding function to be
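To illustrate the idea behind that extra case outside the library, here is a small self-contained sketch. The function name escape_cp1252_for_xml and the five-entry map are assumptions made for this example; the real translation array in xmlrpc.inc covers more characters.

```php
<?php
// Hypothetical, reduced CP1252 byte => XML numeric entity map.
// Only a few characters are listed here for illustration.
$cp1252_sample_map = array(
    "\x80" => '&#x20AC;', // euro sign
    "\x84" => '&#x201E;', // double low-9 quotation mark
    "\x85" => '&#x2026;', // horizontal ellipsis
    "\x93" => '&#x201C;', // left double quotation mark
    "\x95" => '&#x2022;', // bullet
);

// Hypothetical helper: escapes XML special chars first, then replaces the
// mapped CP1252 bytes with numeric entities, so the result is plain ASCII
// as long as the input contains no other high bytes.
function escape_cp1252_for_xml($data, $map)
{
    $escaped = str_replace(
        array('&', '"', "'", '<', '>'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'),
        $data
    );
    // The entities are inserted after '&' has been escaped, so their
    // leading ampersand is not escaped a second time.
    return str_replace(array_keys($map), array_values($map), $escaped);
}

$xml_safe = escape_cp1252_for_xml("a\x95b & \x85", $cp1252_sample_map);
```

Escaping the ampersand before inserting the numeric entities matters: done the other way round, every generated entity would be mangled into "&amp;#x...;".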
>>>>> I have an encoding problem of some sort. The data (strings) I'm
>>>>> sending through xmlresp contains some really nasty characters
>>>>> (e.g. • „ “ …) and breaks the XML parser on the client side. Most
>>>>> of the characters get automatically converted to their
>>>>> corresponding XML entities by your library, but not those listed
>>>>> above.
>>>>> How can I convert them so that my XML parser doesn't break? (I can
>>>>> verify it's broken in Internet Explorer, which probably uses the
>>>>> same parser)
>>>>> Best regards,
>>>>> Matthias Korn
>>>>> phpxmlrpc mailing list
>>>>> phpxmlrpc at lists.usefulinc.com