
[VM] _JsonUtf8Parser fails on UTF-8 with BOM #33251

Closed
lexaknyazev opened this issue May 27, 2018 · 4 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-convert type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)


@lexaknyazev
Contributor

Dart SDK 2.0.0-dev.58.0

import 'dart:convert';

void main() {
  final data = [0xEF, 0xBB, 0xBF, 0x7B, 0x7D]; // UTF-8 BOM followed by "{}"

  print(json.decode(utf8.decode(data))); // prints "{}"

  print(utf8.decoder.fuse(json.decoder).convert(data)); // fails in VM
}

The code above runs fine when compiled to JS but fails in the Dart VM:

Unhandled exception:
FormatException: Unexpected character (at offset 0)
#0      _ChunkedJsonParser.fail (dart:convert-patch/dart:convert/convert_patch.dart:1362)
#1      _ChunkedJsonParser.parseNumber (dart:convert-patch/dart:convert/convert_patch.dart:1258)
#2      _ChunkedJsonParser.parse (dart:convert-patch/dart:convert/convert_patch.dart:926)
#3      _JsonUtf8Decoder.convert (dart:convert-patch/dart:convert/convert_patch.dart:65)
#4      main
@lrhn
Member

lrhn commented May 28, 2018

A leading BOM could probably be accepted, even though it shouldn't be there to begin with.

From: https://tools.ietf.org/html/rfc7159#section-8.1

Implementations MUST NOT add a byte order mark to the beginning of a
JSON text. In the interests of interoperability, implementations
that parse JSON texts MAY ignore the presence of a byte order mark
rather than treating it as an error.

It means that we may ignore a BOM. For interoperability, it's probably a good idea to be permissive since users are not always in control of the encoding.

@lrhn lrhn added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-convert type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) labels May 28, 2018
@lexaknyazev
Contributor Author

lexaknyazev commented May 28, 2018

I'm more concerned that there's a difference between Dart VM and compiled JS behavior. It complicates writing portable code.

Normative docs on this matter are kinda mixed.

From RFC-7159 (JSON), section 8.1

Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

But the ES 5.1/6.0 (ECMA-262) definitions of JSON white-space don't include the BOM (U+FEFF), so JSON.parse shouldn't accept it. Since the code above works, I suspect that the JS runtime removes the BOM from the string beforehand.

UPD: Oops... I didn't see the edit above with the same quote.

@lexaknyazev
Contributor Author

@lrhn Is there any chance that someone from the Dart team will be able to fix this soon?
If not, I could prepare a CL myself, because it's apparently a blocking issue for me (so instead of putting the workaround into my app, it could be implemented in the SDK).
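A minimal app-side workaround, sketched here under the assumption that the whole JSON document is available as a single byte list: drop a leading UTF-8 BOM (`EF BB BF`) before handing the bytes to the fused `utf8.decoder.fuse(json.decoder)` converter. `stripUtf8Bom` and `decodeJsonBytes` are hypothetical helper names for illustration, not part of `dart:convert`.

```dart
import 'dart:convert';

/// Hypothetical helper: returns [bytes] without a leading UTF-8 BOM
/// (0xEF 0xBB 0xBF). If no BOM is present, [bytes] is returned unchanged.
List<int> stripUtf8Bom(List<int> bytes) {
  if (bytes.length >= 3 &&
      bytes[0] == 0xEF &&
      bytes[1] == 0xBB &&
      bytes[2] == 0xBF) {
    return bytes.sublist(3);
  }
  return bytes;
}

/// Hypothetical helper: decodes UTF-8 encoded JSON with the fused decoder,
/// after removing a leading BOM so the JSON parser never sees it.
Object? decodeJsonBytes(List<int> bytes) =>
    utf8.decoder.fuse(json.decoder).convert(stripUtf8Bom(bytes));
```

With this, `decodeJsonBytes([0xEF, 0xBB, 0xBF, 0x7B, 0x7D])` yields an empty map instead of throwing. For chunked input (for example a `Stream<List<int>>`), only the first chunk would need the check, and a BOM split across chunk boundaries would need extra buffering that this sketch does not handle.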

@lrhn
Member

lrhn commented May 29, 2018

I'll look into it today. It turned out to be easy, so I'll make a CL for it.
