#6319 closed defect (fixed)
WEB: XML Parse Error with RSS feed
Reported by: | SF/mase76 | Owned by: | bluegr |
---|---|---|---|
Priority: | normal | Component: | Web |
Version: | | Keywords: | |
Cc: | | Game: | |
Description
Hi! I got an error with your RSS feed with Tiny Tiny RSS: This XML document is invalid, likely due to invalid characters. XML error: Undeclared entity error at line 64, column 24 I reported this to TTRSS, but such errors seem to be site related.
Thomas
Ticket imported from: #3612781. Ticket imported from: bugs/6319.
Change History (14)
comment:1 by , 12 years ago
Summary: | Error with RSS feed → WEB: XML Parse Error with RSS feed |
---|
comment:2 by , 12 years ago
comment:3 by , 12 years ago
$ xmllint scummvm.rss
scummvm.rss:64: parser error : Entity 'eacute' not defined
<title>Touch&eacute;: The Adventures of the Fifth Musketeer Music Enhanced So
No, &eacute; is /not/ a valid XML token. You see it a lot in RSS feeds, and it always pisses me off, because this is /invalid XML/.
Since our RSS feed specifies a UTF-8 encoding, we should use the actual UTF-8 encoded é.
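For anyone who wants to reproduce this without xmllint, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name `parses` and the shortened sample titles are made up for illustration; they are not taken from the feed generator.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# HTML named entity: never declared in plain XML, so the parser rejects it.
broken = "<title>Touch&eacute;: The Adventures of the Fifth Musketeer</title>"

# The literal character (written here as \u00e9) is accepted.
fixed = "<title>Touch\u00e9: The Adventures of the Fifth Musketeer</title>"

# One of the five predefined XML entities, for comparison.
predefined = "<title>Sam &amp; Max</title>"

print(parses(broken), parses(fixed), parses(predefined))
```

Only the first string should fail, which matches the xmllint complaint about 'eacute'.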
comment:4 by , 12 years ago
See also: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML
Note that there's no eacute.
comment:5 by , 12 years ago
drmccoy: I stand corrected. I didn't know that XML standardised anything outside of the tag format, though I suspected RSS-XML would be a stricter, more specific definition, including escaping of non-ASCII characters and/or Unicode support.
Since this is generated from our website news feed, I suspect the code in https://github.com/scummvm/scummvm-web/blob/master/templates/feed_rss.tpl will need some work to replace this and other HTML ISO-8859-1 escaped characters with their Unicode equivalents... or should we switch the RSS feed to ISO-8859-1?
comment:6 by , 12 years ago
If we switch the RSS to ISO-8859-1, we have to replace the "&eacute;" with "é". There are no predefined entities for those characters in XML at all, and none in RSS > 0.9 either. And we really don't want to use RSS 0.9. Also, this won't help with our Atom feed (which, by the way, is currently broken by the inclusion of the invalid &eacute; as well).
What we could theoretically do is define those entities ourselves, by adding a DTD with entity definitions (see https://en.wikipedia.org/wiki/SGML_entity#Syntax), but from what I heard, many RSS readers won't parse that correctly.
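To illustrate the DTD idea, here is a rough sketch of a feed fragment that declares the entity in an internal DTD subset, parsed with Python's standard xml.etree.ElementTree. The feed content is a made-up minimal example, and the fact that this parser accepts it says nothing about how real RSS readers behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed: the internal DTD subset declares &eacute;
# so that the reference in <title> becomes well-formed.
FEED_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE rss [
  <!ENTITY eacute "&#233;">
]>
<rss version="2.0"><channel><title>Touch&eacute;</title></channel></rss>"""

root = ET.fromstring(FEED_WITH_DTD)
title = root.find("channel/title").text

# The parser expands the declared entity into the actual character.
print(title)
```

This only works when the consumer honours the internal subset, which is exactly the part many feed readers are said to get wrong.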
So, we really should fix our RSS generator to convert the HTML entities into UTF-8 characters. Interestingly, the ScummVM Planet feed already does that somehow.
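The conversion described above could look roughly like this. It is a Python sketch of the idea only; the real generator is PHP/Smarty, and this is not the fix that was eventually committed. The function name `html_entities_to_xml_text` is hypothetical.

```python
import html
from xml.sax.saxutils import escape


def html_entities_to_xml_text(text: str) -> str:
    """Decode HTML entities, then re-escape only what XML requires."""
    # Step 1: turn every HTML entity into its character,
    # e.g. "&eacute;" becomes the character U+00E9 and "&amp;" becomes "&".
    decoded = html.unescape(text)
    # Step 2: escape the characters XML itself reserves (&, <, >).
    return escape(decoded)


sample = "Touch&eacute; &amp; Co &lt;3"
converted = html_entities_to_xml_text(sample)
print(converted)
```

Written out with an encoding that matches the feed's XML declaration (UTF-8 here), the result contains the real é plus only XML's own predefined entities.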
comment:7 by , 12 years ago
Not surprising, as the planet uses a different web server config. Its templates are here: https://github.com/scummvm/scummvm-sites/tree/web-planet/scummvm_template . Looking at these, it should be possible to update the main site's RSS and Atom feeds to fix this.
comment:8 by , 12 years ago
Owner: | set to |
---|
comment:9 by , 12 years ago
djwillis: Since fixing this will need someone who can do Smarty / PHP code to modify the templates, can you look at this as I assume you did this for the Planet code?
comment:10 by , 11 years ago
I tried to add the feed to the ownCloud newsreader. It throws an invalid XML error. So that is the second reader that cannot handle the feed.
comment:11 by , 11 years ago
mase76: Thank you for the further information, but as you can see from the comments here, we know the cause of this... however we don't have many developers with familiarity with PHP/Smarty CMS who can work on the tpl code to fix this.
If you are capable of PHP, then please feel free to provide a patch to our code to fix this: https://github.com/scummvm/scummvm-web/blob/master/templates/feed_rss.tpl
The planet setup deals with this, but is subtly different and thus the solution can NOT be copied across... hence why this bug is still open... https://github.com/scummvm/scummvm-sites/tree/web-planet/scummvm_template
If you can't fix this, please be patient as djwillis and most of the other project developers are very busy IRL and thus this may take some time to fix...
comment:12 by , 11 years ago
Owner: | changed from | to
---|---|
Resolution: | → fixed |
Status: | new → closed |
comment:13 by , 11 years ago
This has now been fixed in commit eb35f1bb8c69066474ff8c07e6fc70a93b4b8193. Thanks for reporting, closing.
comment:14 by , 6 years ago
Component: | → Web |
---|
mase76: The parse error is associated with this line in our current RSS feed output: <title>Touch&eacute;: The Adventures of the Fifth Musketeer Music Enhanced Soundtrack Released</title>
Specifically with the "&eacute;" token, which is used to provide an e with acute accent, as Touché is French! :)
I'm not sure myself if this is a bug as I think this is correctly escaped and not malformed XML AFAIK... I suspect your reader is not dealing well with non-ASCII characters?