String.split documentation is misleading #46280

jamesderlin · 2021-06-07T20:09:15Z

https://api.dart.dev/stable/2.13.2/dart-core/String/split.html states:

Empty matches at the beginning and end of the strings are ignored, and so are empty matches right after another match.

var string = "abba";
// Matches:   ^^ ^^
string.split(RegExp(r"b*"));        // ['a', 'a']
                                        // not ['', 'a', 'a', '']
                                        // not ['a', '', 'a']

1. I don't understand the usage of "matches" in the above statement. The matches of the search pattern are where splits should occur, but "empty matches ... are ignored" and the example seem to imply that it's talking about the resulting tokens, which is inconsistent wording.
2. The Matches: ^^ ^^ line doesn't make sense to me. Is it supposed to show what's matched by the search pattern? If so, shouldn't it point to just bb? Or, if it's supposed to point to the resulting tokens, shouldn't it point to the two a's?
3. The code behaves as described only because of the regular expression used. ~~It is not generally true that empty matches are ignored.~~ If we instead used string.split('b'), it would result in ['a', '', 'a'], and if we used string.split('a'), we'd get ['', 'bb', ''].
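A minimal Dart program reproducing the three splits discussed above (assuming a current Dart SDK; print shows an empty string as nothing between two commas):

```dart
void main() {
  const string = 'abba';

  // The regular expression from the documentation: b* can also match the
  // empty string, and the leading and trailing empty matches are dropped.
  print(string.split(RegExp(r'b*'))); // [a, a]

  // A plain String pattern: the two adjacent 'b' matches are not merged,
  // so an empty string appears between them.
  print(string.split('b')); // [a, , a]

  // Non-empty matches at the very start and end still produce empty tokens.
  print(string.split('a')); // [, bb, ]
}
```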

jamesderlin · 2021-06-07T20:26:26Z

Okay, I think I see now what it's trying to say. The RE b* produces an empty match at the beginning and at the end of the string, so this part of the documentation is using "matches" consistently with the rest. I misunderstood what the example was trying to demonstrate. The example should perhaps split something like 'aabbaa' instead (which produces ['a', 'a', 'a', 'a']).
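A small Dart sketch listing the matches that RegExp.allMatches reports for b* on 'abba', then comparing the splits of 'abba' and 'aabbaa'. The printed offsets are what a current SDK produces; treat this as an illustration rather than a specification:

```dart
void main() {
  final bStar = RegExp(r'b*');

  // List every match of b* in 'abba' with its start and end offsets.
  for (final m in bStar.allMatches('abba')) {
    print('${m.start}..${m.end}: "${m[0]}"');
  }
  // 0..0: ""    empty match at the start
  // 1..3: "bb"
  // 3..3: ""    empty match right after "bb"
  // 4..4: ""    empty match at the end

  // Only the "bb" match causes a split.
  print('abba'.split(bStar)); // [a, a]

  // In 'aabbaa' the empty matches at offsets 1 and 5 are neither at an end
  // of the string nor directly after another match, so they split too.
  print('aabbaa'.split(bStar)); // [a, a, a, a]
}
```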

I'm still confused by point 2 and think that the documentation should clearly state that consecutive matches of the search pattern aren't automatically coalesced (and therefore can result in empty tokens in the result).
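A short Dart sketch of the non-coalescing behavior. The RegExp(r',+') and where-filter variants are common workarounds, not something split does on its own, and nobody in this thread proposes them:

```dart
void main() {
  const csv = 'a,,b';

  // Two consecutive ',' matches are not merged: an empty token appears.
  print(csv.split(',')); // [a, , b]

  // Workaround: let the pattern consume the whole run of separators,
  // so there is only one match to split on.
  print(csv.split(RegExp(r',+'))); // [a, b]

  // Alternative: drop empty tokens after splitting. Note that this also
  // drops empty tokens at the start and end of the result.
  print(csv.split(',').where((s) => s.isNotEmpty).toList()); // [a, b]
}
```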

lrhn · 2021-06-08T12:38:25Z

More documentation and more examples are probably a good idea. The behavior is complex and slightly inconsistent (leading/trailing empty matches are ignored).

An example could be prose like:

The string "abba" contains four matches of RegExp("b*"): an empty match before the first a, a match of bb,
an empty match after bb and before a, and an empty match after the last a (see [RegExp.allMatches]).
The split method ignores empty matches at the start and end of the input string, as well as right after another match,
so only the match of bb is used for splitting. The result is therefore ["a", "a"].

If a non-empty match immediately follows another match, the two are not combined, and the result will contain the
empty string between the two matches.
Also, an empty match followed by a non-empty match at the same position is treated as two matches.
That's not something which can occur naturally from a [String] or [RegExp] pattern; it requires a custom-written
[Pattern] implementation which can somehow produce different matches at the same point of the string.
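The custom-Pattern case can be sketched as follows. EmptyThenNonEmpty is a hypothetical class written only for this illustration: for each occurrence of a needle string it reports an empty match and then the real match at the same offset, which neither String nor RegExp ever does.

```dart
/// Hypothetical Pattern, for illustration only: for every occurrence of
/// [needle] it first reports an empty match and then the real match,
/// both starting at the same position.
class EmptyThenNonEmpty implements Pattern {
  final String needle;
  const EmptyThenNonEmpty(this.needle);

  @override
  Iterable<Match> allMatches(String string, [int start = 0]) sync* {
    for (final match in needle.allMatches(string, start)) {
      // The empty String pattern matches (emptily) at any valid position.
      yield ''.matchAsPrefix(string, match.start)!;
      yield match;
    }
  }

  @override
  Match? matchAsPrefix(String string, [int start = 0]) =>
      needle.matchAsPrefix(string, start);
}

void main() {
  // Reported matches: an empty one at 1..1, then "b" at 1..2.
  // split treats them as two matches, so an empty token appears.
  print('abc'.split(EmptyThenNonEmpty('b'))); // [a, , c]

  // For comparison, the plain String pattern yields a single match.
  print('abc'.split('b')); // [a, c]
}
```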

The ^ notation isn't particularly readable. (Seemed like a great idea at the time!)

Another option is to mark the actual matches as []a[bb][]a[], but it will still need prose to explain.
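The bracket notation can be generated mechanically. markMatches below is a hypothetical helper, not an SDK API; it writes the text between matches unchanged and wraps each match (including empty ones) in square brackets:

```dart
/// Hypothetical helper: renders [input] with every match of [pattern]
/// wrapped in square brackets.
String markMatches(String input, Pattern pattern) {
  final out = StringBuffer();
  var previousEnd = 0;
  for (final m in pattern.allMatches(input)) {
    out
      ..write(input.substring(previousEnd, m.start)) // unmatched text
      ..write('[')
      ..write(input.substring(m.start, m.end)) // matched text, may be empty
      ..write(']');
    previousEnd = m.end;
  }
  out.write(input.substring(previousEnd)); // trailing unmatched text
  return out.toString();
}

void main() {
  print(markMatches('abba', RegExp(r'b*'))); // []a[bb][]a[]
  print(markMatches('abba', 'b')); // a[b][b]a
}
```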

The behavior isn't because it's a RegExp; it's the general behavior for empty matches. If you do "abba".split(""), you get ["a", "b", "b", "a"]: we still ignore the leading and trailing empty matches.
We don't need to ignore empty matches after another match for String patterns, because our String.allMatches deliberately avoids producing those. If a Pattern implementation doesn't avoid them (like RegExp, but it need not be a RegExp), the rule still applies.
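A quick Dart check of that claim, assuming a current SDK: splitting on an empty String and on an empty RegExp gives the same result, with no empty strings at either end.

```dart
void main() {
  // Empty String pattern: the empty matches at the start and end of the
  // input are ignored, so no '' appears at either end of the result.
  print('abba'.split('')); // [a, b, b, a]

  // An empty RegExp gives the same result: the rule is about empty
  // matches in general, not about RegExp specifically.
  print('abba'.split(RegExp(''))); // [a, b, b, a]
}
```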

devoncarew added the area-core-library and library-core labels Jun 7, 2021

dart-bot closed this as completed in fdcc930 Jun 14, 2021
