Diff: abp/filters/parser.py

Issue 29845767: Issue 6685 - Offer incremental filter list downloads (Closed) Base URL: https://hg.adblockplus.org/python-abp/
Patch Set: Use iterables instead of str, stop repeating code Created Aug. 20, 2018, 6:18 p.m.
 # This file is part of Adblock Plus <https://adblockplus.org/>,
 # Copyright (C) 2006-present eyeo GmbH
 #
 # Adblock Plus is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 3 as
 # published by the Free Software Foundation.
 #
 # Adblock Plus is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
(...skipping 122 matching lines...)
 Header = _line_type('Header', 'version', '[{.version}]')
 EmptyLine = _line_type('EmptyLine', '', '')
 Comment = _line_type('Comment', 'text', '! {.text}')
 Metadata = _line_type('Metadata', 'key value', '! {0.key}: {0.value}')
 Filter = _line_type('Filter', 'text selector action options', '{.text}')
 Include = _line_type('Include', 'target', '%include {0.target}%')


-METADATA_REGEXP = re.compile(r'!\s*(\w+)\s*:\s*(.*)')
+METADATA_REGEXP = re.compile(r'!\s*([\w-]+)\s*:\s*(.*)')
 METADATA_KEYS = {'Homepage', 'Title', 'Expires', 'Checksum', 'Redirect',
-                 'Version'}
+                 'Version', 'Diff-URL', 'Diff-Expires'}
Sebastian Noack 2018/08/21 19:42:45 I would prefer if python-abp would be agnostic of
Vasily Kuznetsov 2018/08/22 13:10:50 Makes sense and I support it. I actually thought i
Sebastian Noack 2018/08/22 14:31:06 You are right, well spotted.
rhowell 2018/08/27 22:06:26 It appears this was done to prevent mistaking a co
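The new side of this hunk widens the key pattern from `\w+` to `[\w-]+`, which is what allows hyphenated keys such as `Diff-URL` to match at all. A minimal sketch of the difference, using only the two patterns from the diff; the sample line and its URL are made up for illustration:

```python
import re

# Both patterns are copied from the old and new sides of the diff.
OLD_METADATA_REGEXP = re.compile(r'!\s*(\w+)\s*:\s*(.*)')
NEW_METADATA_REGEXP = re.compile(r'!\s*([\w-]+)\s*:\s*(.*)')

# Hypothetical input line; the URL is a placeholder.
line = '! Diff-URL: https://example.com/list.diff'

old_match = OLD_METADATA_REGEXP.match(line)
new_match = NEW_METADATA_REGEXP.match(line)

# `\w+` stops at the hyphen, so the old pattern never reaches the colon.
print(old_match)
# The new pattern captures the whole hyphenated key and its value.
print(new_match.groups())
```

With the old pattern, such a line would fall through to `Comment` in `_parse_comment` even if the key were added to `METADATA_KEYS`.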
 INCLUDE_REGEXP = re.compile(r'%include\s+(.+)%')
 HEADER_REGEXP = re.compile(r'\[(Adblock(?:\s*Plus\s*[\d\.]+?)?)\]', flags=re.I)
 HIDING_FILTER_REGEXP = re.compile(r'^([^/*|@"!]*?)#([@?])?#(.+)$')
 FILTER_OPTIONS_REGEXP = re.compile(
     r'\$(~?[\w-]+(?:=[^,]+)?(?:,~?[\w-]+(?:=[^,]+)?)*)$'
 )


 def _parse_comment(text):
     match = METADATA_REGEXP.match(text)
     if match and match.group(1) in METADATA_KEYS:
Sebastian Noack 2018/08/21 19:42:46 Note that metadata keys are case-insensitive. If w
Vasily Kuznetsov 2018/08/22 13:10:50 Should we then maybe adopt some canonical capitali
Sebastian Noack 2018/08/22 14:31:06 Initially, I had a similar thought, but on the othe
Vasily Kuznetsov 2018/08/24 13:03:19 The output that doesn't diverge from the input unn
rhowell 2018/08/27 22:06:26 Acknowledged.
         return Metadata(match.group(1), match.group(2))
     return Comment(text[1:].strip())
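The thread on `_parse_comment` raises the point that metadata keys are case-insensitive, and that output should not diverge from the input unnecessarily. One way those two goals could be combined is sketched below. This is an illustration only, not necessarily what the patch ended up doing: `parse_comment_case_insensitive` and `_METADATA_KEYS_LOWER` are hypothetical names, and the `namedtuple` line types are simplified stand-ins for the ones built by `_line_type`.

```python
import re
from collections import namedtuple

# Simplified stand-ins for the parser's line types (assumption: the real
# ones are created by _line_type and also provide to_string()).
Metadata = namedtuple('Metadata', 'key value')
Comment = namedtuple('Comment', 'text')

METADATA_REGEXP = re.compile(r'!\s*([\w-]+)\s*:\s*(.*)')
METADATA_KEYS = {'Homepage', 'Title', 'Expires', 'Checksum', 'Redirect',
                 'Version', 'Diff-URL', 'Diff-Expires'}
# Hypothetical helper set: the known keys, lowercased for comparison only.
_METADATA_KEYS_LOWER = {key.lower() for key in METADATA_KEYS}


def parse_comment_case_insensitive(text):
    """Treat known keys as metadata regardless of capitalization."""
    match = METADATA_REGEXP.match(text)
    if match and match.group(1).lower() in _METADATA_KEYS_LOWER:
        # Keep the key exactly as written so the output mirrors the input.
        return Metadata(match.group(1), match.group(2))
    return Comment(text[1:].strip())


lowercase_title = parse_comment_case_insensitive('! title: Example list')
plain_comment = parse_comment_case_insensitive('! Note: just a comment')
print(lowercase_title)
print(plain_comment)
```

Comparing on a lowercased copy while storing the key as written avoids choosing a canonical capitalization, at the cost of leaving any normalization to the consumer.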


 def _parse_header(text):
     match = HEADER_REGEXP.match(text)
     if not match:
         raise ParseError('Malformed header', text)
     return Header(match.group(1))

(...skipping 115 matching lines...)
         line_text = line_text.decode('utf-8')

     content = line_text.strip()

     if content == '':
         line = EmptyLine()
     elif content.startswith('!'):
         line = _parse_comment(content)
     elif content.startswith('%') and content.endswith('%'):
         line = _parse_instruction(content)
     elif content.startswith('[') and content.endswith(']'):
Sebastian Noack 2018/08/21 19:42:45 Somewhat unrelated to these changes, but this is i
Vasily Kuznetsov 2018/08/22 13:10:50 Makes sense. I created a separate ticket for it: h
rhowell 2018/08/27 22:06:26 Acknowledged.
         line = _parse_header(content)
     else:
         line = parse_filter(content)

     assert line.to_string().replace(' ', '') == content.replace(' ', '')
     return line


 def parse_filterlist(lines):
     """Parse filter list from an iterable.
(...skipping 11 matching lines...)
     Raises
     ------
     ParseError
         Thrown during iteration for invalid filter list lines.
     TypeError
         If `lines` is not iterable.

     """
     for line in lines:
         yield parse_line(line)
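The docstring notes that `ParseError` is thrown during iteration. Because `parse_filterlist` is a generator function, nothing is parsed until the result is consumed, and the same holds for the `TypeError` on a non-iterable argument. A self-contained sketch of that behaviour follows; `ParseError` and `parse_line` here are simplified stand-ins written for this example, not the real implementations from `parser.py`.

```python
class ParseError(Exception):
    """Stand-in for the parser's ParseError (simplified for this sketch)."""


def parse_line(line_text):
    # Hypothetical stub: reject any bracketed line other than '[Adblock]'
    # and return every other line stripped.
    content = line_text.strip()
    if content.startswith('[') and content != '[Adblock]':
        raise ParseError('Malformed header: ' + content)
    return content


def parse_filterlist(lines):
    # Same shape as the function in the diff: a lazy generator.
    for line in lines:
        yield parse_line(line)


caught = []

# Calling the generator function does not parse anything yet.
parsed = parse_filterlist(['[Adblock]', '[oops', 'example.com##.ad'])
first = next(parsed)  # the first line parses fine

try:
    next(parsed)  # the malformed line is only reached now
except ParseError:
    caught.append('ParseError')

# A non-iterable argument is likewise only rejected on first iteration.
not_iterable = parse_filterlist(42)
try:
    next(not_iterable)
except TypeError:
    caught.append('TypeError')

print(first, caught)
```

Callers that want errors up front can wrap the call in `list(...)`, which forces the whole input to be parsed immediately.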