[phpxmlrpc] xmlrpc_encode_entitites causing parse error

a.h.s. boy (lists) spudlists at nothingness.org
Tue Nov 15 15:34:23 GMT 2005


On Nov 15, 2005, at 4:11 AM, Gaetano Giunta wrote:

> Brief analysis:
>
> - the lib tries to encode all chars outside of the ASCII range as  
> 'XML character entity' when serializing

I understand the theory, but one of the benefits to using UTF-8 in  
the first place is its ability to properly render all sorts of  
languages and character sets. Debugging becomes brutal when you're  
staring at a huge string of numeric character entities.

> - this has the main benefit that such an xml is valid regardless of  
> the charset assumed by the parser, i.e. we do not need to add a  
> 'charset' parameter to either the HTTP Content-type header or the  
> XML prologue

Well...apparently it isn't valid XML despite the lack of charset...or  
we wouldn't be having this discussion! ;-)

> - it is also the best solution I could come up with to solve the  
> long-standing problems with charset encodings (I also tried the  
> other way round, e.g. explicitly stating the charset used for xml,  
> in a private fork of the lib I use for personal projects, but I  
> would rather stick with the current approach, as it solves the  
> problem in a more elegant way)

Believe me, I totally understand the issue of long-standing charset  
encoding problems! I've been developing a CMS that needs to handle  
multiple languages, alphabets, directionality, and XML-RPC/RSS feeds  
all on the same page! Not easy, especially if your own linguistic  
range is limited to English and Romance languages!

But I'm also a fan of proper declarations...and I'd rather have an  
XML feed explicitly declare its charset encoding (and work) than try  
to be "universal" and fail. :-)

I'll admit to not being fully familiar with all the XMLRPC library  
code -- only enough to debug a bit -- but it appears that  
$xmlrpc_internalencoding is declared as a global variable, though it  
is only used in object methods. Could it be changed to be a property  
of the xmlrpcmsg and xmlrpc_server classes? That way it could be set  
through scripting with

$xmlrpcmsg->set_internalencoding($foo);

or something similar? That would be more flexible, and since you  
_always_ know what the encoding is, you can send it in the XML  
prologue, which is what that parameter is designed for anyway.
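A rough sketch of the idea, just to make it concrete. Every name here is hypothetical (EncodingAwareMessage, set_internalencoding, get_internalencoding, prologue); none of it is the library's actual API, and the default value is an assumption on my part:

```php
<?php
// Hypothetical sketch: the encoding lives on the object instead of in a
// global variable, so scripts can change it without editing the library.
class EncodingAwareMessage
{
    // Assumed default; the real library's default may differ.
    private $internalencoding = 'ISO-8859-1';

    // Let calling scripts choose the encoding at run time.
    public function set_internalencoding($charset)
    {
        $this->internalencoding = strtoupper($charset);
    }

    public function get_internalencoding()
    {
        return $this->internalencoding;
    }

    // Since the encoding is always known, declare it in the XML prologue.
    public function prologue()
    {
        return '<?xml version="1.0" encoding="' . $this->internalencoding . '"?>';
    }
}

$msg = new EncodingAwareMessage();
$msg->set_internalencoding('utf-8');
echo $msg->prologue(), "\n";
```

The same property could sit on the server class too, so both directions declare whatever encoding the script actually uses.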

> - basically, I see two options to extend the lib to make up for  
> your problem:
>   + extend the xmlrpc_encode_entitites function to take into  
> account the xmlrpc_internalencoding global var, and use 2 different  
> parsing algorithms (better solution but slower)

Well...UTF-8 should only require converting "&", "<", and '"'  
explicitly, and the rest is assumed to be valid. So the only fork  
you'd need in the code is to convert additional entities for non- 
UTF-8 encodings. Shouldn't slow anything down...in fact, it would  
make UTF-8 faster, since it would skip additional processing.
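Something like the following is what I have in mind. This is only a sketch under assumptions: encode_for_xml is a hypothetical helper, not the library's xmlrpc_encode_entitites, and the non-UTF-8 branch assumes ISO-8859-1 input, where each byte value is also the character's code point. I've escaped ">" as well, which is harmless:

```php
<?php
// Hypothetical helper illustrating a single fork on the charset.
function encode_for_xml($data, $charset = 'ISO-8859-1')
{
    // Markup characters are escaped whatever the charset is.
    // '&' goes first so the entities produced below are not re-escaped.
    $escaped = str_replace(
        array('&', '<', '>', '"'),
        array('&amp;', '&lt;', '&gt;', '&quot;'),
        $data
    );

    if (strtoupper($charset) === 'UTF-8') {
        // UTF-8 path: multibyte sequences pass through untouched.
        // The XML prologue then has to declare encoding="UTF-8".
        return $escaped;
    }

    // Other path (assumes ISO-8859-1): turn every byte above the ASCII
    // range into a numeric character reference.
    $out = '';
    $len = strlen($escaped);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($escaped[$i]);
        $out .= ($byte < 128) ? $escaped[$i] : '&#' . $byte . ';';
    }
    return $out;
}

echo encode_for_xml("caf\xC3\xA9 & <b>", 'UTF-8'), "\n";
echo encode_for_xml("caf\xE9", 'ISO-8859-1'), "\n";
```

The UTF-8 branch returns right after the str_replace call, so it never enters the per-byte loop at all.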

In fact, I may be mistaken, but it seems like older versions of the  
library didn't even do the entity translation...at least, in the  
course of my own development, I know I included some entity  
conversion routines to process the data _before_ I sent it to the  
XMLRPC library (but it may have been redundant on my part). Though I  
admit I do like the idea that I can pass _anything_ to the XMLRPC  
library and have it properly encoded for me!

> Would you be willing to test the patches?

Absolutely...but I do think you should give some serious thought to  
making the internal encoding variable more scriptable so no one ever  
needs to hard-code changes in the script file. I hate having to  
remember to change the variable value whenever I upgrade the library...

Cheers,
spud.


-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org            "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------


