Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect HTML encoding produced by HtmlEscapeMode #21913

Closed
DartBot opened this issue Dec 18, 2014 · 10 comments
Closed

Incorrect HTML encoding produced by HtmlEscapeMode #21913

DartBot opened this issue Dec 18, 2014 · 10 comments

Comments

@DartBot
Copy link

DartBot commented Dec 18, 2014

This issue was originally filed by @hoylen


What steps will reproduce the problem?

  1. Run the test program below, which simply invokes HtmlEscape convert methods on & < > ' " /

What is the expected output? What do you see instead?

Expected:
  Unknown: &amp; &lt; &gt; &apos; &quot; /
  Element: &amp; &lt; &gt; ' " /
  Attribute: &amp; &lt; &gt; &apos; &quot; /

Got:
  Unknown: &amp; &lt; &gt; &#x27; &quot; &#x2F;
  Element: &amp; &lt; &gt; ' " &#x2F;
  Attribute: &amp; < > ' &quot; /

The contents of a HTML element permits the slash character, so the "/" does not need to be encoded as "&#x2F". Slashes don't need to be encoded at all.

A HTML attributed can be single or double quoted. If it is single quoted, then the value must not contain a literal single quote, so the "'" needs to be escaped as "&apos;". Since the convert method does not know how the caller will quote the value, it should encode both single quotes and double quotes. Also, in some situations (e.g strict XHTML), literal greater-than and less-than characters inside the contents of an attribute is not permitted, so it is safer to encode them too.

What version of the product are you using?

1.8.3

On what operating system?
linux_x64 (CentOS 7)

Please provide any additional information below.

// HTML escape test

import 'dart:io';
import 'dart:convert';

HtmlEscape _default = new HtmlEscape();
HtmlEscape _unknown = new HtmlEscape(HtmlEscapeMode.UNKNOWN);
HtmlEscape _escape_CDATA = new HtmlEscape(HtmlEscapeMode.ELEMENT);
HtmlEscape _escape_PCDATA = new HtmlEscape(HtmlEscapeMode.ATTRIBUTE);

void main() {
  String test = "& < > ' " /";

  print(" Default: ${_default.convert(test)}");
  print(" Unknown: ${_unknown.convert(test)}");
  print(" Element: ${_escape_CDATA.convert(test)}");
  print("Attribute: ${_escape_PCDATA.convert(test)}");
}

//EOF

@lrhn
Copy link
Member

lrhn commented Dec 19, 2014

Added Library-Html, Area-Library, Triaged labels.

@alan-knight
Copy link
Contributor

HtmlEscape is in dart:convert, not dart:html.


Removed Library-Html label.
Added Library-Convert label.

@lrhn
Copy link
Member

lrhn commented Mar 19, 2015

Slash does indeed not need escaping, nor does nbsp.
The apos->#x2f is probably due to some browsers not understanding ' (IE 8 being a known culprit). I bet Dart isn't claiming to support any such browser, but I'll keep it for now (or maybe change it to &#­39; because that's shorter).

The ATTRIBUTE mode isn't documented, but it assumes a double-quoted attribute (and not XHTML).
It's all about knowing the expected context.
I'll make the HtmlEscapeMode constructor public so you can create your own modes.


Set owner to @lrhn.
Added Started label.

@DartBot
Copy link
Author

DartBot commented Mar 19, 2015

This comment was originally written by @hoylen


Encoding the apostrophe as / or &#­39; sounds good (it is shocking that IE8 doesn't support ').

HTML allows for either single quoted or double quoted attributes: single quotes are not just for XHTML. Earlier HTML versions are based on SGML which supports them, and HTML 5 explicitly talks about single quotes in [1].

The best solution is to fix the implementation of ATTRIBUTE mode - it shouldn't take much effort. The other options aren't very useful for users: documenting the assumption or removing the implementation will mean everyone will have to implement their own attribute escaping mode - better to have a single correct implementation of such a useful function in the library.

[1] http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#attribute-value-(single-quoted)-state

@lrhn
Copy link
Member

lrhn commented Apr 20, 2015

I've stopped encoding nbsp, but have retained the escaping of slash in the unknown context.
It's recommended for avoiding some XSS attacks: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content

I've added escaping of < and > for attributes (as you say they are not allowed in strict XHTML, and it's better to not mislead the user).

(I admit to having no idea what bad stuff you can do with a slash when < and & are already escaped, it might only be extra precaution because slashes have meaning in some context, or it's addressing known bugs in some HTML parsers).


Added Fixed label.

@DartBot
Copy link
Author

DartBot commented Apr 20, 2015

This comment was originally written by @hoylen


Good. So the ATTRIBUTE mode will now escape all the characters < > & ' " and /

I too don't know how an unescaped slash can cause problems in compliant HTML parsers. I suspect the reason could either be, "it can't hurt to escape all special characters used in markup, so let's do it anyway" or it has the potential to cause problems with non-compliant parsers (e.g. some quick regular expression hack someone put together in a few minutes instead of using a proper HTML parser).

@DartBot
Copy link
Author

DartBot commented May 5, 2015

This comment was originally written by daven...@gmail.com


I think this might be related? Having trouble with HtmlEscapeMode:

http://stackoverflow.com/questions/30061271/sanitize-html-with-htmlescape-only-and

@DartBot
Copy link
Author

DartBot commented May 5, 2015

This comment was originally written by @hoylen


So some people expect slashes not to be escaped [1] and some want it to be escaped [2]. So whatever the encoder does, please document it, because there is no standard expectation of what HTML escaping does.

[1] http://stackoverflow.com/questions/30061271/sanitize-html-with-htmlescape-only-and
[2] https://code.google.com/p/dart/issues/detail?id=23212

@DartBot
Copy link
Author

DartBot commented May 6, 2015

This comment was originally written by dave...@gmail.com


But 1.10 notes seem to confirm that HtmlEscapeMode.ELEMENT should not escape forward slash.

https://github.com/dart-lang/bleeding_edge/blob/master/dart/CHANGELOG.md

POTENTIALLY BREAKING Fix behavior of HtmlEscape. It no longer escapes no-break space (U+00A0) anywhere or forward slash (/, U+002F) in element context. Slash is still escaped using HtmlEscapeMode.UNKNOWN. r45003, r45153, r45189

Actually, I am able to get it work as expected in a new main(), but that same code within my broader code and printing to the browser's console, I can't. Investigating.

@DartBot
Copy link
Author

DartBot commented May 6, 2015

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants