[phpxmlrpc] Special special chars in XML Response

Matthias Korn korn at prometa.de
Fri Sep 21 10:04:52 BST 2007


Hi Gaetano,

thank you again for your valuable hints.

I managed to get a response from my XML-RPC server by following your
suggestions and additionally adding "case 'CP1252_':" and
"case 'CP1252_UTF-8':" to the switch statement we talked about earlier (I
am not sure which one the library picks, though). Still, what I get does
not seem right.

I get this if I set the xmlrpc_internalencoding to CP1252 and the 
XML_OPTION_TARGET_ENCODING to UTF-8:
<member>
<name>SessionID</name>
<value><string>aŠdŠ…Š†„€ƒd…c†ƒˆdŠ†ccŠdˆ„‰„‡‡b…a</string></value>
</member>

And this if XML_OPTION_TARGET_ENCODING is ISO-8859-1:
<member>
<name>SessionID</name>
<value><string>cb‰d‡…„Š…†‚Šeˆba‚eeae€ƒd†‰€‚c‚f‰</string></value>
</member>

(It is just a sessionID with numbers and letters)

Unfortunately, I do not understand much about all this encoding stuff.

For now I am switching to another approach, where data loss cannot be
completely ruled out (in the case of CP1252-encoded characters).

Thanks for your help.
Matthias Korn

Gaetano Giunta wrote:
> Ok, I have seen that line 922 is actually line 932 on my version of 
> the lib.
>
> This suggests that you are writing an xmlrpc server.
> xmlrpc_defencoding has nothing to do with the problem.
>
> The patch I would recommend to xmlrpcs.inc is the following:
>     if (!in_array($GLOBALS['xmlrpc_internalencoding'], array('UTF-8', 'ISO-8859-1', 'US-ASCII')))
>     {
>         xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
>     }
>     else
>     {
>         xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $GLOBALS['xmlrpc_internalencoding']);
>     }
>
> What this patch does:
> - it makes sure that no warning is emitted
> - most importantly, it makes sure that the charset encoding of the data as
> seen by the user code does not depend on the encoding of the data
> received over the net (as opposed to just prepending an @ to
> xml_parser_set_option)
> - it picks the charset encoding with the widest range, to avoid data loss
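As a minimal sketch, the selection logic of the patch above can be pulled out into a small pure function so it can be checked in isolation. The name choose_target_encoding() is hypothetical and not part of phpxmlrpc; the patch itself inlines this test.

```php
<?php
// Hypothetical helper mirroring the patch: pick the XML_OPTION_TARGET_ENCODING
// value for a given internal encoding.
function choose_target_encoding($internalEncoding)
{
    // The only target encodings the PHP xml extension accepts.
    $supported = array('UTF-8', 'ISO-8859-1', 'US-ASCII');
    // Anything else (e.g. CP1252) falls back to UTF-8, the widest range,
    // so that no characters are lost while parsing.
    return in_array($internalEncoding, $supported, true) ? $internalEncoding : 'UTF-8';
}
```

With this, choose_target_encoding('CP1252') yields 'UTF-8', while the three supported names are passed through unchanged.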
>
> This means that, when $xmlrpc_internalencoding is set to a charset
> other than the three allowed ones, incoming data will always be in UTF-8.
> It is up to your code to treat it appropriately in the xmlrpc method
> handlers (e.g. via utf8_decode, or using mbstring for the UTF-8 -> CP1252
> translation).
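A minimal sketch of the handler-side conversion, assuming the mbstring extension is loaded; utf8_to_cp1252() is a hypothetical helper, not a phpxmlrpc function. utf8_decode() only targets ISO-8859-1, so characters such as curly quotes or the euro sign, which sit in the 0x80-0x9F range of CP1252, come out as '?'. mbstring maps them to the proper CP1252 bytes.

```php
<?php
// Hypothetical helper: convert a UTF-8 string received by an xmlrpc method
// handler into CP1252 (mbstring name 'Windows-1252', alias 'CP1252').
// Assumes the mbstring extension is available.
function utf8_to_cp1252($s)
{
    return mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
}
```

For example, the UTF-8 sequence for a left double quotation mark (E2 80 9C) becomes the single CP1252 byte 0x93.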
>
> Bye
> Gaetano
>> My previous answer was clearly given without enough thought...
>>
>> The first question is: are you using the lib to write the client, the 
>> server or both?
>>
>> Then some explanations:
>> - on line 922, if the server has received some CP1252 text, it should
>> default to $GLOBALS['xmlrpc_defencoding']='UTF-8'. Did you also
>> change that variable? Otherwise I cannot explain it...
>> - are you using PHP 4 or 5? There are some differences between the
>> XML parsers used by PHP
>> - there is surely some more work to be done for everything to work
>> fine. Setting internalencoding to CP1252 before emitting (encoding)
>> data is fine but, as you have seen, it cannot be used when decoding
>> it. And both server and client decode data (request and response,
>> respectively). Since CP1252 is not supported by the PHP 4 XML parser,
>> we have to find some workaround.
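One possible workaround on the decoding side, sketched under assumptions: the mbstring and xml extensions are loaded, parse_cp1252_payload() is a hypothetical name, and this is not what the library currently does. The idea is to re-encode a CP1252 payload as UTF-8 and rewrite its XML declaration before handing it to a parser that is told to expect UTF-8.

```php
<?php
// Hypothetical workaround: the xml extension cannot parse CP1252 input,
// so convert the raw payload to UTF-8 first. Assumes mbstring is loaded.
function parse_cp1252_payload($payload)
{
    // Re-encode every CP1252 byte (including 0x80-0x9F) as UTF-8.
    $utf8 = mb_convert_encoding($payload, 'UTF-8', 'Windows-1252');
    // Rewrite the encoding pseudo-attribute of the XML declaration (if any)
    // so the parser is not told the document is still CP1252.
    $utf8 = preg_replace('/(<\?xml[^>]*encoding=["\'])[^"\']*(["\'])/i', '${1}UTF-8${2}', $utf8, 1);

    // Source and target encodings are both UTF-8 now.
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    $values = array();
    $ok = xml_parse_into_struct($parser, $utf8, $values);
    // The parser is released when $parser goes out of scope.
    return $ok ? $values : false;
}
```

The method handlers then see UTF-8 data, which matches the behaviour of the patch discussed above.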
>>
>> Bye
>> Gaetano
>>
>>> Hi Gaetano,
>>>
>>> thank you for your fast reply and advice! I implemented the steps as 
>>> you described, but when setting 
>>> $GLOBALS['xmlrpc_internalencoding']='CP1252'; I am now getting the 
>>> following error:
>>>
>>> Warning: xml_parser_set_option() [function.xml-parser-set-option]:
>>> Unsupported target encoding "CP1252" in
>>> ...\module_xmlrpc\lib\xmlrpcs.inc on line 922
>>>
>>> The PHP documentation says that only ISO-8859-1, US-ASCII and
>>> UTF-8 are supported: http://de3.php.net/xml_parser_set_option
>>>
>>> How can I further tackle this issue?
>>>
>>> Thanks and best regards,
>>> Matthias Korn
>>>
>>> Gaetano Giunta wrote:
>>>> The characters you are sending are very likely part of the
>>>> Windows charset, a.k.a. CP1252.
>>>> There is no support for that right now, but it is quite easy to
>>>> add it:
>>>>
>>>> in xmlrpc.inc, on line 152, an array is already defined with the
>>>> necessary translation. Using array_keys() and array_values() on it,
>>>> you can modify function xmlrpc_encode_entitites(), adding a new case:
>>>> case 'CP1252_US-ASCII':
>>>>     // escape the five XML special characters first
>>>>     $escaped_data = str_replace(array('&', '"', "'", '<', '>'), array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'), $data);
>>>>     // then the ISO-8859-1 range
>>>>     $escaped_data = str_replace($GLOBALS['xml_iso88591_Entities']['in'], $GLOBALS['xml_iso88591_Entities']['out'], $escaped_data);
>>>>     // finally the CP1252-only characters (keys: CP1252 bytes, values: entities)
>>>>     $escaped_data = str_replace(array_keys($GLOBALS['cp1252_to_xmlent']), array_values($GLOBALS['cp1252_to_xmlent']), $escaped_data);
>>>>     break;
>>>>
>>>> Then, of course, you have to declare your internal encoding as CP1252
>>>> ... and maybe check whether there is any decoding function that needs
>>>> to be patched...
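The escaping steps of the proposed case can be tried out in isolation. In this sketch, $cp1252_subset is a hypothetical, partial stand-in for the library's CP1252 translation array (only the characters from the original question plus the euro sign), escape_cp1252_for_ascii() is a hypothetical name, and the ISO-8859-1 step is left out for brevity.

```php
<?php
// Hypothetical, partial CP1252 table: CP1252 byte => numeric character reference.
$cp1252_subset = array(
    "\x80" => '&#8364;', // euro sign
    "\x84" => '&#8222;', // double low-9 quotation mark
    "\x85" => '&#8230;', // horizontal ellipsis
    "\x93" => '&#8220;', // left double quotation mark
    "\x94" => '&#8221;', // right double quotation mark
    "\x95" => '&#8226;', // bullet
);

function escape_cp1252_for_ascii($data, $map)
{
    // Escape the XML specials first, so the '&' of the references added
    // below is not escaped a second time.
    $escaped = str_replace(array('&', '"', "'", '<', '>'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'), $data);
    // Then replace each CP1252-only byte with its character reference.
    return str_replace(array_keys($map), array_values($map), $escaped);
}
```

The order matters: running the CP1252 replacement first would turn '&#8226;' into '&amp;#8226;' in the next step.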
>>>>
>>>> bye
>>>> Gaetano
>>>>
>>>>> Hi,
>>>>>
>>>>> I have an encoding problem of some sort. The data (strings) I'm
>>>>> sending through xmlrpcresp contains some really nasty characters
>>>>> (e.g. • „ “ …) that break the XML parser on the client side. Most
>>>>> of the characters get automatically converted to their
>>>>> corresponding XML entities by your library, but not those listed
>>>>> above.
>>>>>
>>>>> How can I convert them so that my XML parser doesn't break? (I can 
>>>>> verify it's broken in Internet Explorer, which probably uses the 
>>>>> same parser)
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Matthias Korn
>>>>> _______________________________________________
>>>>> phpxmlrpc mailing list
>>>>> phpxmlrpc at lists.usefulinc.com
>>>>> http://lists.usefulinc.com/cgi-bin/mailman/listinfo/phpxmlrpc 

