[phpxmlrpc] xmlrpc_encode_entitites causing parse error
giunta.gaetano at sea-aeroportimilano.it
Tue Nov 15 09:11:31 GMT 2005
- when serializing, the lib tries to encode all chars outside of the ASCII range as XML character entities
- this has the main benefit that such an xml is valid regardless of the charset assumed by the parser, i.e. we do not need to add a 'charset' parameter to either the HTTP Content-type header or the XML prologue
- it is also the best solution I could come up with to solve the long-standing problems with charset encodings (I also tried the other way round, i.e. explicitly stating the charset used for the xml, in a private fork of the lib I use for personal projects, but I would rather stick with the current approach, as it solves the problem in a more elegant way)
- unfortunately, as I work with non-mbstring enabled installs by default, I assumed that the internal string representation was iso-8859-1, and coded the xmlrpc_encode_entitites function accordingly
- I am now looking at the PHP man page for utf8_decode, and there are a few examples of correct utf8-to-xmlentities functions that might be of use
- basically, I see two options to extend the lib to make up for your problem:
+ extend the xmlrpc_encode_entitites function to take into account the xmlrpc_internalencoding global var, and use two different algorithms depending on its value (better solution but slower)
+ add a 'workaround' solution: a class var of server/client objects that prevents non-ASCII chars from being escaped.
+ note that both things could actually be combined...
Would you be willing to test the patches?
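
A minimal sketch of how the two options above could fit together, for illustration only. It is not the library's code and not the proposed patch: the function name sketch_encode_entities and the parameters $internalEncoding and $escapeNonAscii are hypothetical stand-ins for xmlrpc_encode_entitites, the xmlrpc_internalencoding global and the proposed class var. It assumes only two encodings matter (ISO-8859-1 and UTF-8) and decodes UTF-8 by hand, so it does not need mbstring.

```php
<?php
// Hypothetical helper; NOT part of the phpxmlrpc library.
function sketch_encode_entities(
    string $data,
    string $internalEncoding = 'ISO-8859-1', // assumed stand-in for the global var
    bool $escapeNonAscii = true              // assumed stand-in for the class var
): string {
    $special = ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                '"' => '&quot;', "'" => '&apos;'];
    $isUtf8 = strtoupper($internalEncoding) === 'UTF-8';
    $len = strlen($data);
    $out = '';

    for ($i = 0; $i < $len; $i++) {
        $ch = $data[$i];
        $byte = ord($ch);

        if ($byte < 0x80) {
            // ASCII: escape only the XML special characters.
            $out .= $special[$ch] ?? $ch;
            continue;
        }
        if (!$escapeNonAscii) {
            // Workaround mode: pass non-ASCII bytes through untouched.
            $out .= $ch;
            continue;
        }
        if (!$isUtf8) {
            // ISO-8859-1: one byte is one character, and the byte value
            // is also the Unicode code point.
            $out .= '&#' . $byte . ';';
            continue;
        }

        // UTF-8: work out the sequence length from the lead byte.
        if (($byte & 0xE0) === 0xC0) {
            $extra = 1; $code = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $extra = 2; $code = $byte & 0x0F;
        } elseif (($byte & 0xF8) === 0xF0) {
            $extra = 3; $code = $byte & 0x07;
        } else {
            $out .= '?'; // stray continuation byte or invalid lead byte
            continue;
        }

        // Fold the continuation bytes into a single code point.
        $valid = ($i + $extra < $len);
        for ($j = 1; $valid && $j <= $extra; $j++) {
            $next = ord($data[$i + $j]);
            if (($next & 0xC0) !== 0x80) {
                $valid = false;
                break;
            }
            $code = ($code << 6) | ($next & 0x3F);
        }
        if (!$valid) {
            $out .= '?'; // truncated or malformed sequence
            continue;
        }

        $i += $extra;
        $out .= '&#' . $code . ';';
    }
    return $out;
}

// Hiragana "a" (U+3042) is the three bytes E3 81 82 in UTF-8.
echo sketch_encode_entities("\xE3\x81\x82", 'UTF-8'), "\n"; // prints &#12354;
```

The point of the UTF-8 branch is that a multi-byte character becomes one entity for its code point, instead of each byte being treated as a separate iso-8859-1 character. With $escapeNonAscii set to false the bytes are sent as-is, so the receiving parser would have to be told the charset, which is exactly what the entity approach avoids.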
> -----Original Message-----
> From: phpxmlrpc-bounces at lists.usefulinc.com
> [mailto:phpxmlrpc-bounces at lists.usefulinc.com]On Behalf Of a.h.s. boy
> Sent: Tuesday, November 15, 2005 12:17 AM
> To: phpxmlrpc at lists.usefulinc.com
> Subject: [phpxmlrpc] xmlrpc_encode_entitites causing parse error
> I'm using the XML-RPC library to retrieve calendar listing records
> from a calendar website. Both the client and the server are using the
> latest XML-RPC library.
> Both client and server are using UTF-8 encoding all around, and I've
> adjusted $xmlrpc_internalencoding.
> Some of the calendar entries are in Japanese, input with UTF-8
> encoding, and displayed on the site with UTF-8 encoding. (See http://
> If I make an XMLRPC request to retrieve some Japanese entries, the
> library chokes and returns an "Invalid token" error. After what seems
> like 90 hours of debugging (checking the strings and arrays at
> various stages of encoding and parsing), I tracked the problem down
> to the default case of xmlrpc_encode_entitites()
> if ($code < 32 || $code > 159)
> $character = ("&#".strval($code).";");
> If I simply comment out that code, leaving a blank default case, the
> XML is now valid and parses (and displays) exactly as expected. I
> have NOT debugged the code to the extent where I can tell exactly
> what character's entity reference might be the exact cause of the
> problem...it's all complicated by the fact that I don't read
> Japanese, so debugging is that much harder.
> Any idea why the entity conversion is causing the XML to become
> invalid? Is it feasible to leave off the
> There's an example page at http://dev.dadaimc.org/mod/calendar/
> index.php with debugging turned on, but it'll only be valid for today
> (11/14/05 -0500), after which time the Japanese entry will no longer
> be part of the results. But I'd be happy to reproduce the problem
> upon request.
> a.h.s. boy
> spud(at)nothingness.org "as yes is to if,love is to yes"