Issue 4935175632846848: Issue 1527 - Properly escape generated CSS selectors

Sebastian Noack

Nov. 5, 2014, 2:20 p.m. (2014-11-05 14:20:01 UTC) #1

Wladimir Palant

Wouldn't it be better to overescape slightly and have simpler rules instead?

Nov. 5, 2014, 2:29 p.m. (2014-11-05 14:29:18 UTC) #2

Sebastian Noack

On 2014/11/05 14:29:18, Wladimir Palant wrote: > Wouldn't it be better to overescape slightly ...

Nov. 5, 2014, 2:41 p.m. (2014-11-05 14:41:14 UTC) #3

Wladimir Palant

Only commenting on the escaping code right now, not the rest of it. http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/include.postload.js File ...

Nov. 5, 2014, 3:14 p.m. (2014-11-05 15:14:41 UTC) #4

Sebastian Noack


Nov. 5, 2014, 6:09 p.m. (2014-11-05 18:09:04 UTC) #5

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
File include.postload.js (right):

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
include.postload.js:31: if (code >= 128)
I just realized that I can easily check in the regex for non-ascii in order to
get rid of that check.

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
include.postload.js:35: if ((code >= 0 && code <= 31) || (code >= 48 && code <= 57) || code == 127)
On 2014/11/05 15:14:41, Wladimir Palant wrote:
> How about:
> 
>   if (code <= 0x1F || /\d/.test(chr) || code == 0x7F)
> 
> Why I prefer this:
> 
> 1. The character code cannot be negative, and there is nothing around that this
> check should be consistent with.
> 2. Hexadecimal makes the connection to regular expressions below easier to
> understand.
> 3. Digits are better described with a regexp rather than ASCII codes.

I had already considered using that regex to check for digits, but I find it
slightly inconsistent. Never mind, though.
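A minimal sketch of the per-character escaping discussed above, assuming a hypothetical `escapeChar(chr)` callback for `String.prototype.replace()`. It uses Wladimir's suggested condition: control characters, DEL (0x7F) and digits get a hexadecimal escape, everything else a plain backslash. Digits need the hex form because in CSS `\1` is itself read as a hex escape for U+0001, not as the digit "1". This is a reconstruction for illustration, not the code from the patch.

```javascript
// Hypothetical reconstruction of the escapeChar() helper discussed in this review.
function escapeChar(chr)
{
  var code = chr.charCodeAt(0);

  // Control characters, DEL and digits need a hexadecimal escape: "\1" would be
  // parsed as the code point U+0001, not as the digit "1". The trailing space
  // terminates the hex escape.
  if (code <= 0x1F || /\d/.test(chr) || code == 0x7F)
    return "\\" + code.toString(16) + " ";

  // Everything else can be escaped by prefixing it with a backslash.
  return "\\" + chr;
}
```

For example, `escapeChar("1")` yields `\31 ` and `escapeChar(".")` yields `\.`.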

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
include.postload.js:36: return "\\" + code.toString(16) + (pos + 1 < s.length ? " " : "");
On 2014/11/05 15:14:41, Wladimir Palant wrote:
> Why create a special case for end of string? That extra space certainly won't
> hurt.

I find the extra space at the end confusing. It can easily be misread at the end
of a quoted string, and it looks ugly when there are duplicated spaces between
two tokens. But given that this is a corner case, I guess we can remove that
check to simplify the code.
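To see why the space after a hex escape matters at all, here is a minimal decoder for CSS hex escapes, assuming the CSS Syntax rule that an escape consumes up to six hex digits plus one optional whitespace character. `hexEscape()` and `decodeHexEscapes()` are hypothetical helpers for illustration; the decoder ignores non-hex escapes such as `\.`, so it is not a full tokenizer.

```javascript
// Hypothetical helper: hex-escape one character, always appending a space.
function hexEscape(chr)
{
  return "\\" + chr.charCodeAt(0).toString(16) + " ";
}

// Minimal decoder for CSS hex escapes: a backslash, 1-6 hex digits, then a
// single optional whitespace character that only terminates the escape.
// Other escapes (e.g. "\.") are left untouched by this sketch.
function decodeHexEscapes(s)
{
  return s.replace(/\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\n\r\f])?/g, function(match, hex)
  {
    var code = parseInt(hex, 16);
    // Invalid code points (NULL, surrogates, beyond U+10FFFF) become U+FFFD.
    if (code == 0 || (code >= 0xD800 && code <= 0xDFFF) || code > 0x10FFFF)
      return "\uFFFD";
    return String.fromCodePoint(code);
  });
}

// Without the space, the following digits are swallowed into the escape.
var withoutSpace = decodeHexEscapes("\\31" + "23");       // "\u3123"
var withSpace = decodeHexEscapes(hexEscape("1") + "23");  // "123"
```

At the very end of a string the space is redundant but harmless, which is why always appending it keeps the code simpler.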

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
include.postload.js:44: return '"' + value.replace(/["\\\r\n\f\0]/g, escapeChar) + '"';
On 2014/11/05 15:14:41, Wladimir Palant wrote:
> How about generally escaping all non-printable characters?
> 
>   return '"' + value.replace(/[\x00-\x1F"\\]/g, escapeChar) + '"';
> 
> This removes a bunch of assumptions, and non-printable characters are generally
> uncommon enough that the overescaping here shouldn't matter.

Yeah, unescaped control characters make quoted strings harder to read anyway. I
have restructured the code so that the code points for the control characters
are given only once.
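A sketch of a string-quoting helper along the lines Wladimir suggested, escaping double quotes, backslashes and all control characters U+0000 to U+001F. The names `quote()` and `escapeChar()` are assumptions for illustration, and the control-character range is spelled out inline here rather than shared as in the restructured patch.

```javascript
// Hypothetical escapeChar(): hex escapes for control characters, a plain
// backslash for everything else (here only '"' and '\').
function escapeChar(chr)
{
  var code = chr.charCodeAt(0);
  if (code <= 0x1F || code == 0x7F)
    return "\\" + code.toString(16) + " ";
  return "\\" + chr;
}

// Wrap a value in double quotes for use in a CSS attribute selector,
// escaping quotes, backslashes and all C0 control characters.
function quote(value)
{
  return '"' + value.replace(/[\x00-\x1F"\\]/g, escapeChar) + '"';
}

// Usage: build an attribute selector for an arbitrary attribute value.
var selector = "[title=" + quote('say "hi"\n') + "]";
```

Here `selector` becomes `[title="say \"hi\"\a "]`; the newline turns into the hex escape `\a `.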

http://codereview.adblockplus.org/4935175632846848/diff/5105650963054592/incl...
include.postload.js:48: return s.replace(/^\d|^-(?![^\d-])|[^\w-]/g, escapeChar);
On 2014/11/05 15:14:41, Wladimir Palant wrote:
> How about always escaping leading dashes? These are uncommon anyway and the rule
> will become much simpler:
> 
>   // Alphanumerical characters, dash and underscore generally don't need to be
>   // escaped, first character can only be a letter however.
>   return s.replace(/^[\d-]|[^\w-]/g, escapeChar);

I'd prefer to keep the logic for escaping dashes. Dashes in IDs and class names
are pretty common, and occasionally they occur as the first character. Not
over-escaping those doesn't add any complexity to the JS code, just a simple
branch in the regex, which is justified IMO.
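The two candidate rules for IDs and class names can be compared side by side. This sketch assumes a hypothetical `escapeChar()` like the one discussed above; `escapeSimple()` uses Wladimir's always-escape-a-leading-dash regex (with the `g` flag), and `escapeLookahead()` uses the look-ahead regex from this patch set. Both function names are made up for the comparison.

```javascript
// Hypothetical escapeChar(): hex escapes for control characters, DEL and
// digits, a plain backslash for everything else.
function escapeChar(chr)
{
  var code = chr.charCodeAt(0);
  if (code <= 0x1F || /\d/.test(chr) || code == 0x7F)
    return "\\" + code.toString(16) + " ";
  return "\\" + chr;
}

// Always escape a leading digit or dash (simpler rule).
function escapeSimple(s)
{
  return s.replace(/^[\d-]|[^\w-]/g, escapeChar);
}

// Escape a leading dash only when it is followed by a digit, another dash,
// or nothing at all (look-ahead rule).
function escapeLookahead(s)
{
  return s.replace(/^\d|^-(?![^\d-])|[^\w-]/g, escapeChar);
}

["-sidebar", "-1col", "9lives", "a.b"].forEach(function(id)
{
  console.log(id, "=>", escapeSimple(id), "|", escapeLookahead(id));
});
```

The two rules only differ for identifiers like `-sidebar`, which the look-ahead rule leaves readable while the simple rule turns it into `\-sidebar`.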

Wladimir Palant

Looks good except for that "id with leading dash" logic. Hoping to get some feedback ...

Nov. 5, 2014, 7:56 p.m. (2014-11-05 19:56:52 UTC) #6

Sebastian Noack

http://codereview.adblockplus.org/4935175632846848/diff/5685265389584384/include.postload.js File include.postload.js (right): http://codereview.adblockplus.org/4935175632846848/diff/5685265389584384/include.postload.js#newcode50 include.postload.js:50: /^\d|^-(?![^\d-])|[^\w-\u0080-\uFFFF]/g, On 2014/11/05 19:56:52, Wladimir Palant wrote: > Constructing ...

Nov. 6, 2014, 8:25 a.m. (2014-11-06 08:25:13 UTC) #7

kzar

On 2014/11/05 19:56:52, Wladimir Palant wrote: > Looks good except for that "id with leading ...

Nov. 6, 2014, 12:27 p.m. (2014-11-06 12:27:48 UTC) #8

Sebastian Noack

On 2014/11/06 12:27:48, kzar wrote: > On 2014/11/05 19:56:52, Wladimir Palant wrote: > > Looks ...

Nov. 6, 2014, 12:40 p.m. (2014-11-06 12:40:47 UTC) #9

Sebastian Noack

As discussed with Wladimir, I've added a test case and renamed the function to escapeCSS(). ...

Nov. 6, 2014, 1:53 p.m. (2014-11-06 13:53:13 UTC) #10

Wladimir Palant

http://codereview.adblockplus.org/4935175632846848/diff/5685265389584384/include.postload.js File include.postload.js (right): http://codereview.adblockplus.org/4935175632846848/diff/5685265389584384/include.postload.js#newcode50 include.postload.js:50: /^\d|^-(?![^\d-])|[^\w-\u0080-\uFFFF]/g, On 2014/11/06 08:25:14, Sebastian Noack wrote: > You ...

Nov. 6, 2014, 2:13 p.m. (2014-11-06 14:13:59 UTC) #11

Wladimir Palant

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/include.postload.js File include.postload.js (right): http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/include.postload.js#newcode39 include.postload.js:39: return '"' + value.replace(new RegExp('["\\\\]|' + ctrlChar.source, "g"), escapeChar) ...

Nov. 6, 2014, 9:26 p.m. (2014-11-06 21:26:06 UTC) #12

Sebastian Noack


Nov. 10, 2014, 12:13 p.m. (2014-11-10 12:13:18 UTC) #13

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/incl...
File include.postload.js (right):

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/incl...
include.postload.js:39: return '"' + value.replace(new RegExp('["\\\\]|' + ctrlChar.source, "g"), escapeChar) + '"';
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Generating regular expressions dynamically for no good reason whatsoever is a
> clear step back as far as readability and maintainability goes :-(

The idea was to avoid hard-coding the char codes for control characters twice.
But the way I previously reused the regex led to buggy behavior, since RegExp
objects using the global flag persist the position of the last match and resume
at that position the next time .test() is called. But frankly, I dislike this
construct too, and have removed it now.
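The pitfall described above can be reproduced in a few lines: a RegExp with the `g` flag stores `lastIndex` after a successful `test()`, so reusing the same object starts the next search part-way through the string. `ctrlChar` mirrors the name used in the review; the exact pattern is an assumption.

```javascript
// A shared, global regular expression (the g flag makes it stateful).
var ctrlChar = /[\x00-\x1F]/g;

var first = ctrlChar.test("\n");   // true, lastIndex is now 1
var second = ctrlChar.test("\n");  // false: the search starts at index 1
                                   // (lastIndex is reset to 0 on failure)

// Without the g flag, test() ignores lastIndex and always starts at 0.
var ctrlCharOnce = /[\x00-\x1F]/;
var third = ctrlCharOnce.test("\n");   // true
var fourth = ctrlCharOnce.test("\n");  // true
```

Dropping the `g` flag for `test()`, or resetting `lastIndex = 0` before each call, avoids the alternating results.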

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/incl...
include.postload.js:44: return s.replace(/^\d|^-(?![^\d\-])|[^\w\-\u0080-\uFFFF]/g, escapeChar);
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Thinking more about this, I'm definitely opposed to a special case for dashes
> not followed by a digit or dash here.
> 
> 1) This special case is irrelevant on real websites.
> 2) It makes the rules more complicated and much harder to verify (as opposed to:
> "always escape dashes if they are the first character").
> 3) Mozilla implements this logic differently which shows that this rule isn't
> canonical.
> 4) This makes the otherwise trivial regular expression much harder to
> understand.

It's probably not worth arguing about this here. But I still think that the
generated rule is easier to read when the leading dash isn't escaped, and that it
therefore should only be escaped when necessary. I also don't think a look-ahead
makes the regex much harder to understand.

Instead of escaping the leading dash, you can also escape the digit or dash in
the second position. That is what Gecko's CSS.escape() function does when the
string starts with a dash followed by a digit. However, Gecko doesn't escape
sequences of leading dashes, since those need no escaping there.
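For comparison, here is a sketch of the identifier-serialization steps that the current CSSOM specification defines for `CSS.escape()`, matching the behavior described above: a digit right after a leading dash is hex-escaped, a lone `-` is backslash-escaped, and a run of leading dashes is left alone. `cssEscapeSketch()` is a hypothetical name; this is a re-implementation from the specification for illustration, not Gecko's code.

```javascript
// Sketch of the CSSOM "serialize an identifier" steps used by CSS.escape().
function cssEscapeSketch(value)
{
  var chars = Array.from(String(value)); // iterate by code point, not code unit
  var result = "";

  chars.forEach(function(chr, index)
  {
    var code = chr.codePointAt(0);
    var isDigit = code >= 0x30 && code <= 0x39;

    if (code == 0)
      result += "\uFFFD";                        // NULL becomes U+FFFD
    else if (code <= 0x1F || code == 0x7F ||
             (index == 0 && isDigit) ||
             (index == 1 && isDigit && chars[0] == "-"))
      result += "\\" + code.toString(16) + " ";  // escape as code point
    else if (index == 0 && chr == "-" && chars.length == 1)
      result += "\\-";                           // a lone dash
    else if (code >= 0x80 || chr == "-" || chr == "_" || isDigit ||
             /[A-Za-z]/.test(chr))
      result += chr;                             // safe as-is
    else
      result += "\\" + chr;                      // escape as character
  });

  return result;
}
```

So `-1a` becomes `-\31 a` (the digit is escaped, not the dash), while `--foo` passes through unchanged.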

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
File qunit/tests/cssEscaping.js (right):

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
qunit/tests/cssEscaping.js:7: frame.addEventListener("load", function()
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Nit: This should be part of test setup (defined in module() call), with teardown
> that will remove the frame from the document again. Setup can also call stop()
> and start() to perform asynchronous actions.
> 
> Note the setup function can also initialize escapeCSS and quote properties:
> 
>    this.escapeCSS = frame.contentWindow.escapeCSS;

I had already wondered whether QUnit has setup/teardown hooks.

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
qunit/tests/cssEscaping.js:14: document.body.removeChild(frame);
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Won't removing the frame immediately prevent escapeCSS from calling other
> functions from that frame?

No, the test passes without any errors. I think that's because the cached
functions keep references either to their window object or directly to the
functions they call and the variables they access.

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
qunit/tests/cssEscaping.js:16: function testSelector(opts) {
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Nit: opening bracket on next line please.

Done.

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
qunit/tests/cssEscaping.js:92: testEscape("-" + chr);
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> How about testing a letter as leading character as well?

I considered that case redundant. But fair enough.

http://codereview.adblockplus.org/4935175632846848/diff/5728116278296576/quni...
qunit/tests/cssEscaping.js:96: testEscape("😻♥ä");
On 2014/11/06 21:26:06, Wladimir Palant wrote:
> Better not have Unicode characters in a JavaScript file - browsers don't always
> recognize UTF-8 correctly. "\u1234\u4321\u009F" will be less error-prone.

I actually enjoyed seeing a cat with heart-shaped eyes when reading this file.
;) But never mind.
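One way to follow that advice without losing the original test data is to convert the literal into `\uXXXX` escapes. The `toUnicodeEscapes()` helper below is hypothetical; it works on UTF-16 code units, so a character outside the BMP, like the cat emoji (U+1F63B), comes out as a surrogate pair.

```javascript
// Hypothetical helper: turn every non-ASCII UTF-16 code unit into a \uXXXX
// escape so that a test string can be pasted into an ASCII-only source file.
function toUnicodeEscapes(s)
{
  return s.replace(/[^\x00-\x7F]/g, function(unit)
  {
    var hex = unit.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + ("000" + hex).slice(-4);  // pad to four hex digits
  });
}

// The cat emoji, heart and a-umlaut from the original test, written with escapes.
var escaped = toUnicodeEscapes("\uD83D\uDE3B\u2665\u00E4");
```

The result is the ASCII text `\uD83D\uDE3B\u2665\u00E4`, which can be used directly as a string literal in the test file.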

Wladimir Palant

Nov. 10, 2014, 12:34 p.m. (2014-11-10 12:34:15 UTC) #14

LGTM with the last nit fixed.

http://codereview.adblockplus.org/4935175632846848/diff/5704837555552256/quni...
File qunit/tests/cssEscaping.js (right):

http://codereview.adblockplus.org/4935175632846848/diff/5704837555552256/quni...
qunit/tests/cssEscaping.js:7: frame.srcdoc='<script src="../include.postload.js"></script>';
Nit: Missing spaces around =

Issue 4935175632846848: Issue 1527 - Properly escape generated CSS selectors (Closed)


Patch Set 1

Patch Set 2: Addressed comments

Patch Set 3: Renamed function and added test case

Patch Set 4: Addressed comments
