[phpxmlrpc] xmlrpc_encode_entitites causing parse error
a.h.s. boy (lists)
spudlists at nothingness.org
Tue Nov 15 15:34:23 GMT 2005
On Nov 15, 2005, at 4:11 AM, Gaetano Giunta wrote:
> Brief analysis:
>
> - the lib tries to encode all chars outside of the ASCII range as
> 'XML character entity' when serializing
I understand the theory, but one of the benefits to using UTF-8 in
the first place is its ability to properly render all sorts of
languages and character sets. Debugging becomes brutal when you're
staring at a huge string of HTML entities.
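For anyone weighing this trade-off, here is a rough sketch of the approach Gaetano describes. It is written in Python rather than PHP and is not phpxmlrpc's actual xmlrpc_encode_entitites code; the helper name encode_all_non_ascii is hypothetical. The sketch escapes the markup characters, then turns every non-ASCII code point into a numeric character reference. It shows both sides of the argument. The payload is pure ASCII and parses to the same text under any declared charset, but it is much harder to read while debugging.

```python
import xml.etree.ElementTree as ET

def encode_all_non_ascii(text: str) -> str:
    # Escape markup first, so the '&' of the references added below
    # is not itself escaped a second time.
    out = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # Every code point above U+007F becomes a decimal reference, e.g. &#233;.
    return "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in out)

original = "\u00e9t\u00e9 <\u65e5\u672c>"   # "été <日本>"
payload = encode_all_non_ascii(original)
print(payload)  # &#233;t&#233; &lt;&#26085;&#26412;&gt;

# The payload is pure ASCII, so it parses to the same text whichever
# charset the XML prologue declares.
for charset in ("UTF-8", "ISO-8859-1"):
    doc = f'<?xml version="1.0" encoding="{charset}"?><string>{payload}</string>'
    print(charset, ET.fromstring(doc.encode("ascii")).text == original)  # True
```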
> - this has the main benefit that such an xml is valid regardless of
> the charset assumed by the parser, i.e. we do not need to add a
> 'charset' parameter to either the HTTP Content-type header or the
> XML prologue
Well...apparently it isn't valid XML despite the lack of charset...or
we wouldn't be having this discussion! ;-)
> - it is also the best solution I could come up with to solve the
> long-standing problems with charset encodings (I also tried the
> other way round, i.e. explicitly stating the charset used for xml,
> in a private fork of the lib I use for personal projects, but I
> would rather stick with the current approach, as it solves the
> problem in a more elegant way)
Believe me, I totally understand the issue of long-standing charset
encoding problems! I've been developing a CMS that needs to handle
multiple languages, alphabets, directionality, and XML-RPC/RSS feeds
all on the same page! Not easy, especially if your own linguistic
range is limited to English and Romance languages!
But I'm also a fan of proper declarations...and I'd rather have an
XML feed explicitly declare its charset encoding (and work) than try
to be "universal" and fail. :-)
I'll admit to not being fully familiar with all the XMLRPC library
code -- only enough to debug a bit -- but it appears that
$xmlrpc_internalencoding is declared as a global variable, though it
is only used in object methods. Could it be changed to be a property
of the xmlrpcmsg and xmlrpc_server classes? That way it could be set
through scripting with
$xmlrpcmsg->set_internalencoding($foo);
or something similar? That would be more flexible, and since you
_always_ know what the encoding is, you can send it in the XML
prologue, which is what that parameter is designed for anyway.
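Here is a minimal sketch of that proposal. It is again in Python and entirely hypothetical. The Message class and its set_internal_encoding method stand in for xmlrpcmsg and the suggested set_internalencoding(); they are not an existing phpxmlrpc API. The encoding is stored on the instance and written into both the XML prologue and the Content-Type charset. Only markup characters are escaped. As an assumption of this sketch, characters the chosen charset cannot represent fall back to numeric character references.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

class Message:
    """Hypothetical stand-in for an xmlrpcmsg with a per-instance encoding."""

    def __init__(self, method: str, internal_encoding: str = "UTF-8"):
        self.method = method
        self.internal_encoding = internal_encoding
        self.params: list[str] = []

    def set_internal_encoding(self, encoding: str) -> None:
        self.internal_encoding = encoding

    def content_type(self) -> str:
        # The same charset is announced in the HTTP header...
        return f"text/xml; charset={self.internal_encoding}"

    def serialize(self) -> bytes:
        # Escape only &, < and >; every other character is sent as-is.
        params = "".join(
            f"<param><value><string>{escape(p)}</string></value></param>"
            for p in self.params)
        # ...and in the XML prologue.
        doc = (f'<?xml version="1.0" encoding="{self.internal_encoding}"?>'
               f"<methodCall><methodName>{escape(self.method)}</methodName>"
               f"<params>{params}</params></methodCall>")
        # Characters the charset cannot represent become &#NNN; references.
        return doc.encode(self.internal_encoding, errors="xmlcharrefreplace")

msg = Message("examples.echo", "ISO-8859-1")
msg.params.append("caf\u00e9 & co \u65e5")
wire = msg.serialize()
# The parser honours the declared charset; U+65E5 travelled as &#26085;.
assert ET.fromstring(wire).find(".//string").text == "caf\u00e9 & co \u65e5"
```

Because the charset is declared explicitly, the common case stays readable on the wire. Character references are used only as a fallback, not for every non-ASCII character.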
> - basically, I see two options to extend the lib to make up for
> your problem:
> + extend the xmlrpc_encode_entitites function to take into
> account the xmlrpc_internalencoding global var, and use 2 different
> parsing algorithms (better solution but slower)
Well...UTF-8 should only require converting "&", "<", and '"'
explicitly, and the rest is assumed to be valid. So the only fork
you'd need in the code is to convert additional entities for non-
UTF-8 encodings. Shouldn't slow anything down...in fact, it would
make UTF-8 faster, since it would skip additional processing.
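The fork described above might look roughly like the following Python sketch. The name encode_entities is hypothetical and is not the library's function. Markup characters are escaped for every charset, and UTF-8 output returns right away. Only other charsets pay for the extra pass that turns non-ASCII characters into character references. Treating every non-UTF-8 charset as ASCII-only is a simplifying assumption; a fuller version would keep the characters the target charset can represent.

```python
from xml.sax.saxutils import escape

def encode_entities(text: str, charset: str = "UTF-8") -> str:
    # Markup characters must be escaped whatever the charset:
    # escape() handles &, < and >; the dict adds the double quote.
    escaped = escape(text, {'"': "&quot;"})
    if charset.upper() in ("UTF-8", "UTF8"):
        # UTF-8 can represent every code point, so nothing more is needed.
        return escaped
    # Other charsets (simplified here to ASCII): non-ASCII characters
    # become numeric character references such as &#233;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(encode_entities('caf\u00e9 "x" <y>', "ISO-8859-1"))  # caf&#233; &quot;x&quot; &lt;y&gt;
```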
I may be mistaken, but it seems that older versions of the
library didn't even do the entity translation...at least, in the
course of my own development, I know I included some entity
conversion routines to process the data _before_ I sent it to the
XMLRPC library (but it may have been redundant on my part). Though I
admit I do like the idea that I can pass _anything_ to the XMLRPC
library and have it properly encoded for me!
> Would you be willing to test the patches?
Absolutely...but I do think you should give some serious thought to
making the internal encoding variable more scriptable so no one ever
needs to hard-code changes in the script file. I hate having to
remember to change the variable value whenever I upgrade the library...
Cheers,
spud.
-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------