|
|
Created:
April 16, 2015, 10:53 a.m. by kzar Modified:
April 23, 2015, 3:26 p.m. CC:
saroyanm Visibility:
Public. |
DescriptionIssue 2139 - Allow nested translations for tag attributes.
Patch Set 1 #
Total comments: 3
Patch Set 2 : Use complex regexp instead of keeping track of nested levels ourselves. #Patch Set 3 : Rebased on top of Wladimir's changes for link resolving. #
Total comments: 2
Patch Set 4 : Improved regexp some more #MessagesTotal messages: 9
Patch Set 1
http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... File cms/converters.py (right): http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... cms/converters.py:194: r"\{\{\s*([\w\-]+)(?:\[(.*?)\])?\s+(.*)\}\}", Only one nesting level is actually allowed here. So instead of finding positions above we can use a more complicated regexp: r"\{\{\s*" r"([\w\-]+)" # String ID r"(?:\[(.*?)\])?" # Optional comment r"\s+" r"(?:[^\{]|\{(?!{)|\{\{.*\}\})*?" # Translatable text (could contain other translatable strings) r"\}\}" Yes, you can split the regexp into multiple lines like that.
http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... File cms/converters.py (right): http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... cms/converters.py:194: r"\{\{\s*([\w\-]+)(?:\[(.*?)\])?\s+(.*)\}\}", On 2015/04/16 21:01:13, Wladimir Palant wrote: > Only one nesting level is actually allowed here. So instead of finding positions > above we can use a more complicated regexp: > > r"\{\{\s*" > r"([\w\-]+)" # String ID > r"(?:\[(.*?)\])?" # Optional comment > r"\s+" > r"(?:[^\{]|\{(?!{)|\{\{.*\}\})*?" # Translatable text (could contain other > translatable strings) > r"\}\}" > > Yes, you can split the regexp into multiple lines like that. With a couple of tweaks I managed to get your regexp matching how we want: r"\{\{\s*" r"([\w\-]+)" # String ID r"(?:\[(.*?)\])?" # Optional comment r"\s+" r"((?:[^\{]|\{(?!\{)|\{\{.*\}\})*?)" # Translatable text r"\}\}" Problem is when that regexp is used with the re.S (dot matches all) flag it runs forever (or a long time at least) maxing out one of my CPU cores. Without the flag it works fine, but I guess we want to support multi-line default translation strings. Any ideas what's happening?
Patch Set 2 : Use complex regexp instead of keeping track of nested levels ourselves. http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... File cms/converters.py (right): http://codereview.adblockplus.org/6440550915899392/diff/5629499534213120/cms/... cms/converters.py:194: r"\{\{\s*([\w\-]+)(?:\[(.*?)\])?\s+(.*)\}\}", On 2015/04/21 11:28:57, kzar wrote: > On 2015/04/16 21:01:13, Wladimir Palant wrote: > > Only one nesting level is actually allowed here. So instead of finding > positions > > above we can use a more complicated regexp: > > > > r"\{\{\s*" > > r"([\w\-]+)" # String ID > > r"(?:\[(.*?)\])?" # Optional comment > > r"\s+" > > r"(?:[^\{]|\{(?!{)|\{\{.*\}\})*?" # Translatable text (could contain other > > translatable strings) > > r"\}\}" > > > > Yes, you can split the regexp into multiple lines like that. > > With a couple of tweaks I managed to get your regexp matching how we want: > > r"\{\{\s*" > r"([\w\-]+)" # String ID > r"(?:\[(.*?)\])?" # Optional comment > r"\s+" > r"((?:[^\{]|\{(?!\{)|\{\{.*\}\})*?)" # Translatable text > r"\}\}" > > Problem is when that regexp is used with the re.S (dot matches all) flag it runs > forever (or a long time at least) maxing out one of my CPU cores. Without the > flag it works fine, but I guess we want to support multi-line default > translation strings. > > Any ideas what's happening? OK I made another change to the regexp to hopefully avoid exponential complexity. For the contents of nested tags I replaced the .* with (?:[^\}]|\}(?!\}))*? . As we know there can only be one level of nesting I ensure "}}" does not appear inside the child tag. I think that avoids backticking crazily each time the final "}}" of the regexp doesn't match. The regexp is getting pretty huge though now, I worry it's less readable than my old solution.
Patch Set 3 : Rebased on top of Wladimir's changes for link resolving.
LGTM
http://codereview.adblockplus.org/6440550915899392/diff/5170986005561344/cms/... File cms/converters.py (right): http://codereview.adblockplus.org/6440550915899392/diff/5170986005561344/cms/... cms/converters.py:182: r"((?:[^\{]|\{(?!\{)|\{\{(?:[^\}]|\}(?!\}))*?\}\})*?)" # Translatable text Yes, this line isn't very readable now. How about splitting it up: r"((?:(?!{{).|" # Translatable text r"{{(?:(?!}}).)*}}" # Nested translation r")*?)" Note that the quantifier for the nested translation doesn't need to be greedy any more, so I removed the question mark there. Also, I realized that { isn't a special character in regular expressions so we probably shouldn't escape it. Finally, it seems that (?:[^{]|{(?!{)| can be replaced by (?:(?!{{).| which is slightly simpler.
Patch Set 4 : Improved regexp some more http://codereview.adblockplus.org/6440550915899392/diff/5170986005561344/cms/... File cms/converters.py (right): http://codereview.adblockplus.org/6440550915899392/diff/5170986005561344/cms/... cms/converters.py:182: r"((?:[^\{]|\{(?!\{)|\{\{(?:[^\}]|\}(?!\}))*?\}\})*?)" # Translatable text On 2015/04/22 13:52:24, Wladimir Palant wrote: > Yes, this line isn't very readable now. How about splitting it up: > > r"((?:(?!{{).|" # Translatable text > r"{{(?:(?!}}).)*}}" # Nested translation > r")*?)" > > Note that the quantifier for the nested translation doesn't need to be greedy > any more, so I removed the question mark there. Also, I realized that { isn't a > special character in regular expressions so we probably shouldn't escape it. > Finally, it seems that (?:[^{]|{(?!{)| can be replaced by (?:(?!{{).| which is > slightly simpler. Nice, that looks much better and still seems to work fine. Done.
LGTM |