[phpxmlrpc] Character encoding: many questions

Gaetano Giunta giunta.gaetano at sea-aeroportimilano.it
Mon Jun 16 16:54:30 BST 2003


Hello,
just noticed that this list is alive and well.
What is the relationship with the SourceForge list? It seems a lot of talk about development is happening here!

Some thoughts on character encoding:

- I am probably not a protocol guru, but my understanding of the specs is that xml-rpc messages should perhaps not default to UTF-8 encoding, even though that is the default stated in the XML spec.

The reason is simple: HTTP takes precedence over XML in determining the character encoding, and when content is sent labeled with the text/xml media type, the spec to apply is RFC 3023 (quote):

  Conformant with [RFC2046], if a text/xml entity is received with
      the charset parameter omitted, MIME processors and XML processors
      MUST use the default charset value of "us-ascii"[ASCII].  In cases
      where the XML MIME entity is transmitted via HTTP, the default
      charset value is still "us-ascii".  (Note: There is an
      inconsistency between this specification and HTTP/1.1, which uses
      ISO-8859-1[ISO8859] as the default for a historical reason.  Since
      XML is a new format, a new default should be chosen for better
      I18N.  US-ASCII was chosen, since it is the intersection of UTF-8
      and ISO-8859-1 and since it is already used by MIME.)

Apparently, when a request is received with no charset specified, US-ASCII should be assumed!
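
To make the rule quoted above concrete, here is a minimal sketch of how a server might derive the charset of a text/xml request from its Content-Type header. The function name charset_from_content_type is hypothetical and is not part of the phpxmlrpc library; the sketch assumes the header value has already been extracted as a string and does not handle every legal parameter syntax.

```php
<?php
// Hypothetical helper (not part of phpxmlrpc): pick the charset of an
// incoming text/xml request from the value of its Content-Type header.
function charset_from_content_type($contentType)
{
    // Look for a charset parameter, optionally quoted, e.g.
    //   text/xml; charset="ISO-8859-1"
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', (string) $contentType, $m)) {
        // Charset names are case-insensitive; normalize to upper case.
        return strtoupper($m[1]);
    }
    // RFC 3023: with the charset parameter omitted, text/xml is us-ascii.
    return 'US-ASCII';
}
```

Called with 'text/xml' it returns 'US-ASCII'; called with 'text/xml; charset=iso-8859-1' it returns 'ISO-8859-1'.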

- The server (and possibly the client) should be able to determine the encoding used by the received http request, and try to cope with it (either rejecting or processing it), rather than just assuming it is the 'standard'.
I tried to implement this sort of mechanism for the server, and put together a descendant class of xmlrpcserver that can: 1-decide which encodings it accepts, 2-guesstimate the received request's encoding and 3-decide which encoding to use for responses. Part 2 is carried out following the guidelines found in http://www.yale.edu/pclt/encoding/
You can find the code in the attached files. I did not post it as a patch to SF because I am not sure it is correct. Anyone interested, please take a look and feel free to comment.
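
For readers without the attachments, part 2 (guessing the encoding) can be sketched roughly as follows. This is not the attached code and does not claim to reproduce the guidelines at the URL above: it is a simplified illustration that only looks at a byte order mark and at the encoding declared in the XML prolog. The function name guess_xml_encoding is hypothetical.

```php
<?php
// Hypothetical helper (not the attached implementation): guess the
// encoding of a raw XML request body. Returns null when neither a byte
// order mark nor an XML declaration gives a usable hint.
function guess_xml_encoding($body)
{
    // 1. Byte order marks. (A UTF-32 BOM is not distinguished here.)
    if (strncmp($body, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($body, "\xFE\xFF", 2) === 0 || strncmp($body, "\xFF\xFE", 2) === 0) {
        return 'UTF-16';
    }
    // 2. The encoding pseudo-attribute of the XML declaration. This only
    //    works when the declaration itself is readable as ASCII.
    $re = '/^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']/';
    if (preg_match($re, $body, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}
```

A server could compare the result with the charset announced in the HTTP header and with its own list of accepted encodings before deciding whether to process or reject the request.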

- The new xlate function apparently deals with 'HTML' entities. Shouldn't we deal only with xml-defined entities?

- Should the xmlrpc client/server classes provide the functionality to convert the strings sent/received to and from the specified char encoding, or leave this task to the app layer?

- Finally, (this was probably answered by Edd many moons ago, but I cannot find it anymore) why is htmlentities() used to escape the string data in xml messages instead of a plain translation of '<' and '&'? Is this related to the last point above?
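
To illustrate the difference behind the last question, here is a minimal sketch of the 'plain translation' alternative. The function name xml_escape is hypothetical; it also escapes '>', which XML strictly requires only inside the sequence ']]>' but which is harmless to escape everywhere.

```php
<?php
// Hypothetical helper: escape only the characters XML itself treats as
// markup, leaving every other character untouched.
function xml_escape($s)
{
    // strtr() with an array does not rescan its own output, so the '&'
    // of a freshly produced '&lt;' is never escaped a second time.
    return strtr($s, array('&' => '&amp;', '<' => '&lt;', '>' => '&gt;'));
}

// By contrast, htmlentities() also turns accented characters into HTML
// named entities such as &eacute;, which XML does not predefine (it
// predefines only &lt; &gt; &amp; &apos; and &quot;).
```

With this helper, 'a < b & c' becomes 'a &lt; b &amp; c', while an accented string such as 'café' passes through unchanged and relies on the declared character encoding instead of on entities.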

Thanks,
Gaetano Giunta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sea_xmlrpc_server.inc
Type: application/octet-stream
Size: 16601 bytes
Desc: not available
Url : http://lists.usefulinc.com/pipermail/phpxmlrpc/attachments/20030616/d80115cc/sea_xmlrpc_server-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sea_logging.inc
Type: application/octet-stream
Size: 10751 bytes
Desc: not available
Url : http://lists.usefulinc.com/pipermail/phpxmlrpc/attachments/20030616/d80115cc/sea_logging-0001.obj