Incorrect HTML encoding produced by HtmlEscapeMode #21913
Comments
Added Library-Html, Area-Library, Triaged labels.
HtmlEscape is in dart:convert, not dart:html. Removed Library-Html label.
Slash does indeed not need escaping, nor does nbsp. The ATTRIBUTE mode isn't documented, but it assumes a double-quoted attribute (and not XHTML).
This comment was originally written by @hoylen Encoding the apostrophe as a numeric character reference sounds good (it is shocking that IE8 doesn't support &apos;). HTML allows either single-quoted or double-quoted attributes: single quotes are not just for XHTML. Earlier HTML versions are based on SGML, which supports them, and HTML 5 explicitly talks about single quotes in [1]. The best solution is to fix the implementation of ATTRIBUTE mode; it shouldn't take much effort. The other options aren't very useful for users: documenting the assumption or removing the implementation would mean everyone has to implement their own attribute escaping mode. It is better to have a single correct implementation of such a useful function in the library. [1] http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#attribute-value-(single-quoted)-state
I've stopped encoding nbsp, but have retained the escaping of slash in the unknown context. I've added escaping of < and > for attributes (as you say they are not allowed in strict XHTML, and it's better to not mislead the user). (I admit to having no idea what bad stuff you can do with a slash when < and & are already escaped, it might only be extra precaution because slashes have meaning in some context, or it's addressing known bugs in some HTML parsers). Added Fixed label.
This comment was originally written by @hoylen Good. So the ATTRIBUTE mode will now escape all the characters < > & ' " and /. I too don't know how an unescaped slash can cause problems in compliant HTML parsers. I suspect the reason is either "it can't hurt to escape all special characters used in markup, so let's do it anyway", or that it has the potential to cause problems with non-compliant parsers (e.g. some quick regular expression hack someone put together in a few minutes instead of using a proper HTML parser).
This comment was originally written by daven...@gmail.com I think this might be related? Having trouble with HtmlEscapeMode: http://stackoverflow.com/questions/30061271/sanitize-html-with-htmlescape-only-and
This comment was originally written by @hoylen So some people expect slashes not to be escaped [1] and some want it to be escaped [2]. So whatever the encoder does, please document it, because there is no standard expectation of what HTML escaping does. [1] http://stackoverflow.com/questions/30061271/sanitize-html-with-htmlescape-only-and
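Since expectations differ on whether the slash should be escaped, one way for a caller to make the choice explicit is to build a custom escape mode instead of relying on a predefined one. The following is only a sketch: it assumes a Dart 2 or later SDK, where HtmlEscapeMode exposes a public const constructor with named flags (the 1.8.3 SDK discussed in this issue may not offer this), and the mode name 'attributeNoSlash' is hypothetical.

```dart
import 'dart:convert';

// Assumption: Dart 2+ SDK. The name below is made up for illustration.
const HtmlEscapeMode attributeNoSlash = HtmlEscapeMode(
  name: 'attributeNoSlash',
  escapeLtGt: true, // escape < and >
  escapeQuot: true, // escape "
  escapeApos: true, // escape '
  escapeSlash: false, // leave / untouched
);

const HtmlEscape escaper = HtmlEscape(attributeNoSlash);

void main() {
  // Same test string as in the original report.
  print(escaper.convert("& < > ' \" /"));
}
```

The exact character references emitted for the quotes depend on the SDK version, so it is worth printing the result once and checking it against what the surrounding markup expects.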
This comment was originally written by dave...@gmail.com But the 1.10 notes seem to confirm that HtmlEscapeMode.ELEMENT should not escape forward slash. From https://github.com/dart-lang/bleeding_edge/blob/master/dart/CHANGELOG.md: "POTENTIALLY BREAKING Fix behavior of HtmlEscape. It no longer escapes no-break space (U+00A0) anywhere or forward slash (/, U+002F) in element context. Slash is still escaped using HtmlEscapeMode.UNKNOWN. r45003, r45153, r45189". Actually, I am able to get it to work as expected in a new main(), but with that same code inside my broader code, printing to the browser's console, I can't. Investigating.
This comment was originally written by daven...@gmail.com Opened: https://code.google.com/p/dart/issues/detail?id=23400&thanks=23400&ts=1430924595
This issue was originally filed by @hoylen
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Expected:
Unknown: & < > ' " /
Element: & < > ' " /
Attribute: & < > ' " /
Got:
Unknown: & < > ' " /
Element: & < > ' " /
Attribute: & < > ' " /
The contents of an HTML element permit the slash character, so the "/" does not need to be encoded as a character reference. Slashes don't need to be encoded at all.
An HTML attribute can be single or double quoted. If it is single quoted, then the value must not contain a literal single quote, so the "'" needs to be escaped as a character reference. Since the convert method does not know how the caller will quote the value, it should encode both single quotes and double quotes. Also, in some situations (e.g. strict XHTML), literal greater-than and less-than characters inside the contents of an attribute are not permitted, so it is safer to encode them too.
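The attribute rule described above can be sketched as a small stand-alone function. This is not the dart:convert implementation; escapeAttributeValue is a hypothetical helper written for illustration (in Dart 2 syntax), and the choice of &#39; for the apostrophe is an assumption made here because &apos; is not understood by older browsers.

```dart
/// Hypothetical helper, not part of dart:convert. Escapes text so that it is
/// safe inside either a single-quoted or a double-quoted attribute value.
String escapeAttributeValue(String input) {
  final out = StringBuffer();
  for (final rune in input.runes) {
    switch (rune) {
      case 0x26: // &
        out.write('&amp;');
        break;
      case 0x3C: // <
        out.write('&lt;');
        break;
      case 0x3E: // >
        out.write('&gt;');
        break;
      case 0x22: // "
        out.write('&quot;');
        break;
      case 0x27: // '
        out.write('&#39;');
        break;
      default: // everything else, including /, is copied unchanged
        out.writeCharCode(rune);
    }
  }
  return out.toString();
}

void main() {
  final value = escapeAttributeValue("& < > ' \" /");
  print(value); // prints: &amp; &lt; &gt; &#39; &quot; /
}
```

Because both quote characters are escaped, the result can be placed in an attribute regardless of which quote style the caller uses.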
What version of the product are you using?
1.8.3
On what operating system?
linux_x64 (CentOS 7)
Please provide any additional information below.
// HTML escape test
import 'dart:io';
import 'dart:convert';
HtmlEscape _default = new HtmlEscape();
HtmlEscape _unknown = new HtmlEscape(HtmlEscapeMode.UNKNOWN);
HtmlEscape _escape_CDATA = new HtmlEscape(HtmlEscapeMode.ELEMENT);
HtmlEscape _escape_PCDATA = new HtmlEscape(HtmlEscapeMode.ATTRIBUTE);
void main() {
String test = "& < > ' \" /"; // the embedded double quote must be backslash-escaped
print(" Default: ${_default.convert(test)}");
print(" Unknown: ${_unknown.convert(test)}");
print(" Element: ${_escape_CDATA.convert(test)}");
print("Attribute: ${_escape_PCDATA.convert(test)}");
}
//EOF