[phpxmlrpc] Special special chars in XML Response

Gaetano Giunta giunta.gaetano at gmail.com
Wed Sep 19 23:12:27 BST 2007


Ok, I have seen that line 922 is actually line 932 on my version of the lib.

This suggests that you are writing an XML-RPC server.
xmlrpc_defencoding has nothing to do with the problem.

The patch I would recommend to xmlrpcs.inc is the following:
            if (!in_array($GLOBALS['xmlrpc_internalencoding'], array('UTF-8', 'ISO-8859-1', 'US-ASCII')))
            {
                xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
            }
            else
            {
                xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $GLOBALS['xmlrpc_internalencoding']);
            }

This patch does three things:
- it makes sure that no warning is emitted
- most importantly, it makes sure that the charset encoding of the data as
seen by the user code does not depend on the encoding of the data received
over the net (as opposed to just prepending an @ to the
xml_parser_set_option call)
- it picks the charset encoding with the widest range, to avoid data loss
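
To make the selection logic easier to try in isolation, here is a minimal sketch of the same decision as a standalone function. The name pick_target_encoding() is hypothetical and not part of the lib; the sketch assumes the PHP xml extension is loaded.

```php
<?php
// Hypothetical helper, not part of the library: return an encoding that
// xml_parser_set_option() accepts for XML_OPTION_TARGET_ENCODING.
function pick_target_encoding($internalEncoding)
{
    // The three target encodings supported by the PHP XML parser.
    $supported = array('UTF-8', 'ISO-8859-1', 'US-ASCII');
    // Anything else falls back to UTF-8, the charset with the widest range.
    return in_array($internalEncoding, $supported, true) ? $internalEncoding : 'UTF-8';
}

// Usage: an unsupported internal encoding no longer triggers a warning,
// because the parser is only ever given one of the supported values.
$parser = xml_parser_create();
$ok = xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, pick_target_encoding('CP1252'));
```

With 'CP1252' as input the parser is configured for UTF-8; with any of the three supported values the input is passed through unchanged.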

This means that, when $xmlrpc_internalencoding is set to a charset other
than the three supported ones, incoming data will always be in UTF-8.
It is up to your code to treat it appropriately in the xmlrpc method
handlers (e.g. via utf8_decode, or using mbstring for UTF-8 -> CP1252
translation).
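
As a rough sketch of that last step, a method handler could convert the UTF-8 strings it receives along these lines. The function name utf8_to_cp1252() is made up for illustration; the sketch assumes that either the mbstring or the iconv extension is available. Note that utf8_decode only targets ISO-8859-1, so characters that exist only in CP1252 (such as the bullet or the ellipsis) would be lost with it.

```php
<?php
// Hypothetical helper for use inside an xmlrpc method handler: convert a
// UTF-8 string, as delivered by the parser, into CP1252.
function utf8_to_cp1252($utf8)
{
    if (function_exists('mb_convert_encoding')) {
        // mbstring knows CP1252 under the name Windows-1252.
        return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
    }
    // Fallback when mbstring is missing; assumes the iconv extension.
    return iconv('UTF-8', 'CP1252//TRANSLIT', $utf8);
}

$bullet   = "\xE2\x80\xA2"; // U+2022 BULLET, encoded as UTF-8
$ellipsis = "\xE2\x80\xA6"; // U+2026 HORIZONTAL ELLIPSIS, encoded as UTF-8

$bulletCp1252   = utf8_to_cp1252($bullet);   // single byte 0x95 in CP1252
$ellipsisCp1252 = utf8_to_cp1252($ellipsis); // single byte 0x85 in CP1252
```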

Bye
Gaetano
> The answer was clearly given without enough thinking...
>
> The first question is: are you using the lib to write the client, the 
> server or both?
>
> Then some explanations:
> - on line 922, if the server has received some CP1252 text, it should 
> default to $GLOBALS['xmlrpc_defencoding']='UTF-8'. Did you also change 
> that variable? Otherwise I cannot explain it...
> - are you using PHP 4 or 5? There are some differences between the XML 
> parsers used by the two versions
> - there is surely some more work to be done before everything works 
> fine. Setting internalencoding to CP1252 before emitting (encoding) 
> data is fine but, as you have seen, it cannot be used when decoding 
> it. And both server and client decode data (request and response, 
> respectively). Since CP1252 is not supported by the PHP 4 XML parser, 
> we have to find some workaround
>
> Bye
> Gaetano
>
>> Hi Gaetano,
>>
>> thank you for your fast reply and advice! I implemented the steps as 
>> you described, but when setting 
>> $GLOBALS['xmlrpc_internalencoding']='CP1252'; I am now getting the 
>> following error:
>>
>> Warning:  xml_parser_set_option() [function.xml-parser-set-option]: 
>> Unsupported target encoding "CP1252" in 
>> ...\module_xmlrpc\lib\xmlrpcs.inc on line 922
>>
>> The PHP documentation says that only ISO-8859-1, US-ASCII and 
>> UTF-8 are supported: http://de3.php.net/xml_parser_set_option
>>
>> How can I further tackle this issue?
>>
>> Thanks and best regards,
>> Matthias Korn
>>
>> Gaetano Giunta schrieb:
>>> The characters you are sending are very likely part of the Windows 
>>> charset, aka CP1252.
>>> There is no support for that right now, but it is quite easy to 
>>> add it:
>>>
>>> in xmlrpc.inc, on line 152, an array is already defined with the 
>>> necessary translation. Using array_keys() and array_values() on it, 
>>> you can modify function xmlrpc_encode_entitites(), adding a new case:
>>> case 'CP1252_US-ASCII':
>>>                // escape the five XML special characters first
>>>                $escaped_data = str_replace(array('&', '"', "'", '<', '>'), array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'), $data);
>>>                // then the ISO-8859-1 characters
>>>                $escaped_data = str_replace($GLOBALS['xml_iso88591_Entities']['in'], $GLOBALS['xml_iso88591_Entities']['out'], $escaped_data);
>>>                // then the CP1252-only characters (keys are raw bytes, values are entities)
>>>                $escaped_data = str_replace(array_keys($GLOBALS['cp1252_to_xmlent']), array_values($GLOBALS['cp1252_to_xmlent']), $escaped_data);
>>>                break;
>>>
>>> then of course you have to declare your internal encoding as CP1252
>>> ... and maybe check out if there is any decoding function to be 
>>> patched...
>>>
>>> bye
>>> Gaetano
>>>
>>>> Hi,
>>>>
>>>> I have an encoding problem of some sort. The data (strings) I'm 
>>>> sending through xmlresp contains some really nasty characters (e.g. 
>>>> • „ “ …) and breaks the XML parser on the client side. Most of the 
>>>> characters get automatically converted to their corresponding XML 
>>>> entities by your library, but not those listed above.
>>>>
>>>> How can I convert them so that my XML parser doesn't break? (I can 
>>>> verify it's broken in Internet Explorer, which probably uses the 
>>>> same parser)
>>>>
>>>>
>>>> Best regards,
>>>> Matthias Korn