
Delta Between Two Patch Sets: abp/filters/blocks.py

Issue 30053555: Issue 7471 - Add an API for working with blocks of filters (Closed)
Base URL: https://hg.adblockplus.org/python-abp
Left Patch Set: Created May 8, 2019, 4:33 p.m.
Right Patch Set: Adjust the API in response to review comments Created May 9, 2019, 4:22 p.m.
LEFT | RIGHT
1 # This file is part of Adblock Plus <https://adblockplus.org/>, 1 # This file is part of Adblock Plus <https://adblockplus.org/>,
2 # Copyright (C) 2006-present eyeo GmbH 2 # Copyright (C) 2006-present eyeo GmbH
3 # 3 #
4 # Adblock Plus is free software: you can redistribute it and/or modify 4 # Adblock Plus is free software: you can redistribute it and/or modify
5 # it under the terms of the GNU General Public License version 3 as 5 # it under the terms of the GNU General Public License version 3 as
6 # published by the Free Software Foundation. 6 # published by the Free Software Foundation.
7 # 7 #
8 # Adblock Plus is distributed in the hope that it will be useful, 8 # Adblock Plus is distributed in the hope that it will be useful,
9 # but WITHOUT ANY WARRANTY; without even the implied warranty of 9 # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 # GNU General Public License for more details. 11 # GNU General Public License for more details.
12 # 12 #
13 # You should have received a copy of the GNU General Public License 13 # You should have received a copy of the GNU General Public License
14 # along with Adblock Plus. If not, see <http://www.gnu.org/licenses/>. 14 # along with Adblock Plus. If not, see <http://www.gnu.org/licenses/>.
15 15
16 """Extract blocks of filters separated by comments.""" 16 """Extract blocks of filters separated by comments.
17
18 Blocks of filters separated by comments are common in real world filter lists
19 (e.g. easylist). This structure itself is not documented or standardized but
20 it's often useful to be able to parse it.
21
22 This module exports one function, to_blocks(), that further processes a filter
23 list (after it has been parsed by abp.filters.parser) by splitting it into
24 blocks of filters. The comments preceding each block are merged to produce the
25 block description.
26
27 Some filter lists (e.g. ABP exception list) also make use of variable notation
28 ("!:varname=value") to define specific attributes of filter blocks. This
29 module supports this notation and collects those variables in a dictionary
30 that's placed into the `variables` attribute of the block. If variables are
31 present in comments preceding a block, only non-variable comments that follow
32 the first variable declaration will be included in the block description.
33
34 Blocks also provide a method to convert them to dictionaries: .to_dict() --
35 this can be used for JSON conversion.
36
37 Example
38 -------
39
40 The following code will dump the blocks as dictionaries:
41
42 from abp.filters import parse_filterlist
43 from abp.filters.blocks import to_blocks
44
45 with open(fl_path) as f:
46 for block in to_blocks(parse_filterlist(f)):
47 print(block.to_dict())
48
49 This will produce output like this:
50
51 {'variables': {'partner_token': 'abc', 'partner_id': '3372',
52 'type': 'partner'}, 'description': 'Some comments', 'filters': [...]}
53
54 """
17 55
18 from __future__ import unicode_literals 56 from __future__ import unicode_literals
19 57
20 import re 58 import re
21 59
22 from abp.filters.parser import ParseError 60 __all__ = ['to_blocks']
23 61
24 VAR_REGEXP = re.compile(r'^:(\w+)=(.*)$') 62 VAR_REGEXP = re.compile(r'^:(\w+)=(.*)$')
25 63
26 64
27 class FilterBlock(object): 65 class FiltersBlock(object):
28 """A block of filters. 66 """A block of filters (preceded by comments)."""
29
30 Blocks are consecutive groups of filters separated by comments.
31
32 """
33 67
34 def __init__(self, comments, filters): 68 def __init__(self, comments, filters):
69 """Create a filter block from filters and comments preceding them."""
35 self.filters = filters 70 self.filters = filters
71 self.variables = {}
36 descr_lines = [] 72 descr_lines = []
73
37 for comment in comments: 74 for comment in comments:
38 match = VAR_REGEXP.search(comment.text) 75 match = VAR_REGEXP.search(comment.text)
39 if match: 76 if match:
77 if not self.variables:
78 # Normal comments before first variable are not included in
79 # the description.
80 descr_lines = []
40 name, value = match.groups() 81 name, value = match.groups()
41 if name.startswith('_') or name in {'filters', 'description'}: 82 self.variables[name] = value
42 raise ParseError('Invalid variable name',
43 comment.to_string())
44 setattr(self, name, value)
45 else: 83 else:
46 descr_lines.append(comment.text) 84 descr_lines.append(comment.text)
85
47 self.description = '\n'.join(descr_lines) 86 self.description = '\n'.join(descr_lines)
48 87
49 def _asdict(self): 88 def to_dict(self):
50 ret = dict(self.__dict__) 89 ret = dict(self.__dict__)
51 ret['filters'] = [f._asdict() for f in ret['filters']] 90 ret['filters'] = [f.to_dict() for f in ret['filters']]
52 return ret 91 return ret
53 92
54 93
55 def to_blocks(parsed_lines): 94 def to_blocks(parsed_lines):
56 """Convert a sequence of parser filter list lines to blocks. 95 """Convert a sequence of parser filter list lines to blocks.
57 96
58 Parameters 97 Parameters
59 ---------- 98 ----------
60 parsed_lines : iterable of namedtuple 99 parsed_lines : iterable of namedtuple
61 Parsed filter list (see `parser.py` for details on how it's 100 Parsed filter list (see `parser.py` for details on how it's
62 represented). 101 represented).
63 102
64 Returns 103 Returns
65 ------- 104 -------
66 blocks : iterable of FilterBlock. 105 blocks : iterable of FiltersBlock.
106 Blocks extracted from the parsed filter list. Each block carries
107 filters in `.filters` attribute, comments in `.description` attribute
108 and variable-defining comments in `.variables`.
67 109
68 """ 110 """
69 comments = [] 111 comments = []
70 filters = [] 112 filters = []
71 113
72 for line in parsed_lines: 114 for line in parsed_lines:
73 if line.type == 'comment': 115 if line.type == 'comment':
74 if filters: 116 if filters:
75 yield FilterBlock(comments, filters) 117 yield FiltersBlock(comments, filters)
76 comments = [] 118 comments = []
77 filters = [] 119 filters = []
78 comments.append(line) 120 comments.append(line)
79 elif line.type == 'filter': 121 elif line.type == 'filter':
80 filters.append(line) 122 filters.append(line)
81 123
82 if filters: 124 if filters:
83 yield FilterBlock(comments, filters) 125 yield FiltersBlock(comments, filters)
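
The behaviour of the right patch set can be tried out without installing python-abp. The following is a minimal sketch, not the reviewed code itself: `Line`, `split_comments` and `group_lines` are hypothetical names introduced here for illustration, and `Line` is only a stand-in for the parser's line objects (it assumes, as the regular expression in the patch suggests, that a comment's `.text` no longer carries the leading "!"). It mirrors two rules from the patch: a comment that follows filters starts a new block, and ordinary comments seen before the first variable declaration are dropped from the description.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for parsed lines; the real objects come from
# abp.filters.parser and carry more information.
Line = namedtuple('Line', 'type text')

# Same pattern as in the patch: ":name=value" inside a comment.
VAR_REGEXP = re.compile(r'^:(\w+)=(.*)$')


def split_comments(comments):
    """Return (variables, description) for the comments preceding a block."""
    variables = {}
    descr_lines = []
    for comment in comments:
        match = VAR_REGEXP.search(comment.text)
        if match:
            if not variables:
                # Comments before the first variable are not kept.
                descr_lines = []
            name, value = match.groups()
            variables[name] = value
        else:
            descr_lines.append(comment.text)
    return variables, '\n'.join(descr_lines)


def group_lines(lines):
    """Yield (comments, filters) pairs, one pair per block."""
    comments, filters = [], []
    for line in lines:
        if line.type == 'comment':
            if filters:
                # A comment after filters closes the current block.
                yield comments, filters
                comments, filters = [], []
            comments.append(line)
        elif line.type == 'filter':
            filters.append(line)
    if filters:
        yield comments, filters


lines = [
    Line('comment', 'Header text'),
    Line('comment', ':type=partner'),
    Line('comment', 'Some comments'),
    Line('comment', ':partner_id=3372'),
    Line('filter', '||example.com^'),
    Line('comment', 'Second block'),
    Line('filter', '@@||example.org^'),
]

blocks = []
for block_comments, block_filters in group_lines(lines):
    variables, description = split_comments(block_comments)
    blocks.append({
        'variables': variables,
        'description': description,
        'filters': [f.text for f in block_filters],
    })

for block in blocks:
    print(block)
```

In this sketch "Header text" is discarded because it comes before the first variable, while "Some comments" is kept. The second block has no variables, so its description is simply its comment text.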