LEFT | RIGHT |
---|---|
1 # python-abp | 1 # python-abp |
2 | 2 |
3 This repository contains a library for working with Adblock Plus filter lists | 3 This repository contains a library for working with Adblock Plus filter lists |
4 and the script that is used for building Adblock Plus filter lists from the | 4 and the script that is used for building Adblock Plus filter lists from the |
5 form in which they are authored into the format suitable for consumption by the | 5 form in which they are authored into the format suitable for consumption by the |
6 adblocking software. | 6 adblocking software. |
mathias
2017/08/08 12:24:35
For an introduction that is a bit too much. How ab
| |
7 | 7 |
8 ## Installation | 8 ## Installation |
9 | 9 |
10 Prerequisites: | 10 Prerequisites: |
11 | 11 |
12 * Linux, Mac OS X or Windows (any modern Unix should work too), | 12 * Linux, Mac OS X or Windows (any modern Unix should work too), |
13 * Python (2.7 or 3.5+), | 13 * Python (2.7 or 3.5+), |
14 * pip. | 14 * pip. |
15 | 15 |
16 To install: | 16 To install: |
(...skipping 30 matching lines...)
47 The first instruction contains a URL that will be fetched and inserted at the | 47 The first instruction contains a URL that will be fetched and inserted at the |
48 point of reference. | 48 point of reference. |
49 The second one contains a path inside the easylist repository. | 49 The second one contains a path inside the easylist repository. |
50 `flrender` needs to be able to find a copy of the repository on the local | 50 `flrender` needs to be able to find a copy of the repository on the local |
51 filesystem. We use the `-i` option to point it to the right directory: | 51 filesystem. We use the `-i` option to point it to the right directory: |
52 | 52 |
53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt | 53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt |
54 | 54 |
55 Now the second reference above will be resolved to | 55 Now the second reference above will be resolved to |
56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will | 56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will |
57 be read from this file. | 57 be loaded from this file. |
58 | 58 |
59 Directories that contain filter list fragments that are used during rendering | 59 Directories that contain filter list fragments that are used during rendering |
60 are called sources. | 60 are called sources. |
61 They are normally working copies of the repositories that contain filter list | 61 They are normally working copies of the repositories that contain filter list |
62 fragments. | 62 fragments. |
63 Each source is identified by a name: that's the part that comes before ":" | 63 Each source is identified by a name: that's the part that comes before ":" |
64 in the include instruction and it should be the same as what comes before "=" | 64 in the include instruction and it should be the same as what comes before "=" |
65 in the `-i` option. | 65 in the `-i` option. |
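A minimal sketch of how a `name:path` reference could be resolved against the `-i name=directory` mappings described above. The helpers `parse_source_options` and `resolve_include` are hypothetical: this is not `flrender`'s actual implementation, it assumes the reference has already been extracted from the include instruction, and it only reproduces the worked example from the text.

```python
import posixpath


def parse_source_options(options):
    """Hypothetical helper: turn ["easylist=/home/abc/easylist", ...] into a dict."""
    sources = {}
    for option in options:
        name, _, directory = option.partition("=")
        sources[name] = directory
    return sources


def resolve_include(target, sources):
    """Hypothetical helper: resolve "name:relative/path" to a local file path.

    Targets that look like URLs are returned unchanged, since those are
    fetched rather than read from a source directory.
    """
    if "://" in target:
        return target
    name, _, relative_path = target.partition(":")
    if name not in sources:
        raise KeyError("unknown source: " + name)
    return posixpath.join(sources[name], relative_path)


sources = parse_source_options(["easylist=/home/abc/easylist"])
print(resolve_include("easylist:easylist/easylist_general_block.txt", sources))
# /home/abc/easylist/easylist/easylist_general_block.txt
```

The source name before ":" is simply used as a key into the directories given with `-i`, which is why the two names have to match.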
66 | 66 |
67 Commonly used sources have generally accepted names. For example the main | 67 Commonly used sources have generally accepted names. For example the main |
(...skipping 22 matching lines...)
90 If `filterlist.txt` contains a filter list: | 90 If `filterlist.txt` contains a filter list: |
91 | 91 |
92 [Adblock Plus 2.0] | 92 [Adblock Plus 2.0] |
93 ! Title: Example list | 93 ! Title: Example list |
94 | 94 |
95 abc.com,cdf.com##div#ad1 | 95 abc.com,cdf.com##div#ad1 |
96 abc.com/ad$image | 96 abc.com/ad$image |
97 @@/abc\.com/ | 97 @@/abc\.com/ |
98 ... | 98 ... |
99 | 99 |
100 the output will look similar to the following: | 100 the output will look something like: |
101 | 101 |
102 Header(version='Adblock Plus 2.0') | 102 Header(version='Adblock Plus 2.0') |
103 Metadata(key='Title', value='Example list') | 103 Metadata(key='Title', value='Example list') |
104 EmptyLine() | 104 EmptyLine() |
105 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) | 105 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) |
106 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) | 106 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) |
107 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) | 107 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) |
108 ... | 108 ... |
109 | 109 |
110 In general `parse_filterlist` takes an iterable of strings (such as a list or | |
111 an open file) and returns an iterable of parsed filter list lines. Each line | |
112 will have its `.type` attribute set to a string indicating its type. It will | |
113 also have a `.to_string()` method that converts it to a unicode string in the | |
114 filter list format (most of the time it's the same as the string from which the | |
115 filter was parsed). Further attributes depend on the type of the line. | |
116 | |
117 **Note:** `parse_filterlist` returns an iterator, not a list, and only consumes | |
118 the input lines when its output is iterated over. This allows much more memory | |
119 efficient handling of large filter lists, however there are two things to watch | |
120 out for: | |
121 | |
122 **Note:** iteration over parsed lines may throw a `ParseError` exception if a | |
123 line cannot be parsed. The exception will contain the information about the | |
124 error and the original line that failed parsing. | |
mathias
2017/08/08 12:24:35
It is not clear what bits this is about (I assume
Vasily Kuznetsov
2017/08/08 14:31:12
Yeah, we've discussed this. But for now that chang
| |
125 | |
126 - When you're parsing filters from a file, you need to complete the iteration | |
127 before you close the file. | |
128 - After you iterate over the output of `parse_filterlist` once, it will be | |
129 consumed and you won't be able to iterate over it again. | |
130 | |
131 If this gets in your way, you probably want to convert the output | |
mathias
2017/08/08 12:24:34
Everything in this section from here on, maybe inc
| |
132 of `parse_filterlist` to a list: | |
133 | |
134 lines_list = list(parse_filterlist(filterlist)) | |
135 | |
136 This will load the whole file into memory, but unless you're dealing with a | |
137 gigantic filter list, that should not be a problem. | |
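Both pitfalls listed above apply to any lazy iterator. The sketch below uses `lazy_parse`, a hypothetical stand-in generator rather than `parse_filterlist` itself (it does no real parsing), and in-memory `io.StringIO` objects instead of files; a real file object behaves the same way when closed.

```python
import io


def lazy_parse(lines):
    """Hypothetical stand-in for a lazy parser such as parse_filterlist.

    It only strips newlines, but it reads its input one line at a time,
    and only when its output is iterated over.
    """
    for line in lines:
        yield line.rstrip("\n")


# Pitfall 1: the output can only be iterated over once.
source = io.StringIO("[Adblock Plus 2.0]\n! Title: Example list\nabc.com/ad$image\n")
parsed = lazy_parse(source)
first_pass = list(parsed)   # consumes the generator
second_pass = list(parsed)  # already exhausted, so this is empty
print(len(first_pass), len(second_pass))  # 3 0

# Pitfall 2: closing the input before iterating makes iteration fail.
closed_source = io.StringIO("abc.com/ad$image\n")
pending = lazy_parse(closed_source)
closed_source.close()
failed_after_close = False
try:
    list(pending)
except ValueError:  # raised because the input is already closed
    failed_after_close = True
print(failed_after_close)  # True

# Workaround: materialize the lines while the input is still open.
source = io.StringIO("abc.com,cdf.com##div#ad1\nabc.com/ad$image\n")
lines_list = list(lazy_parse(source))
source.close()
print(lines_list)  # ['abc.com,cdf.com##div#ad1', 'abc.com/ad$image']
```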
138 | |
139 ### Line types | |
140 | |
141 As mentioned above, lines of different types have different attributes: | |
142 | |
143 | type | attributes | | |
mathias
2017/08/08 12:24:35
Are you sure this kind of table markup is supporte
Vasily Kuznetsov
2017/08/08 14:31:12
Indeed the table markup was not part of the origin
| |
144 |------------|------------------------------------------------------------------------| | |
145 | header | `version` - plugin version string | | |
146 | emptyline | no attributes | | |
147 | comment | `text` - text of the comment | | |
148 | metadata | `key` - name of the metadata field, `value` - value of the field | | |
149 | include | `target` - url/path of the file to include | | |
150 | filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options | | |
151 | |
152 #### Filter attributes | |
mathias
2017/08/08 12:24:35
This section mentions "Selector" but not ".selecto
| |
153 | |
154 Selector is a dictionary with two keys: | |
155 | |
156 | key | meaning | | |
157 |--------------|------------------------------------------------------------------| | |
158 | type | 'css', 'abp-simple', 'url-pattern', 'url-regexp', 'extended-css' | | |
159 | value | the selector itself, the meaning is type-dependent | | |
160 | |
161 It's preferable to import the `SELECTOR_TYPE` namespace from `abp.filters` to refer | |
162 to selector types instead of using strings. `SELECTOR_TYPE` contains constants | |
163 for each selector type: `SELECTOR_TYPE.CSS`, `SELECTOR_TYPE.ABP_SIMPLE`, | |
164 `SELECTOR_TYPE.URL_PATTERN`, `SELECTOR_TYPE.URL_REGEXP` and | |
165 `SELECTOR_TYPE.XCSS`. | |
166 | |
167 Action instructs adblocking software on what should be done with the items | |
168 matching the selector: | |
169 | |
170 | action | meaning | | |
171 |--------|------------------------------------------------------------------------| | |
172 | block | block http(s) request that matches the selector | | |
173 | allow | allow http(s) request that matches the selector (whitelist the resource) | | |
174 | hide | hide the DOM element that matches the selector | | |
175 | show | show the DOM element that matches the selector (whitelist the element) | | |
176 | |
177 The action constants are contained in the `FILTER_ACTION` namespace, which can also | |
178 be imported from `abp.filters` (`FILTER_ACTION.BLOCK`, `FILTER_ACTION.ALLOW`, | |
179 etc.). | |
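Because `selector` is a plain dictionary and `action` a string, code that consumes parsed filters can dispatch on them directly. To run without `python-abp` installed, this sketch uses a hypothetical `Filter` namedtuple whose fields and values are copied from the sample output above; it is not the library's own class. With the real library, the `SELECTOR_TYPE` and `FILTER_ACTION` constants would replace the literal strings used here.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in with the same fields as the Filter objects shown above.
Filter = namedtuple("Filter", ["text", "selector", "action", "options"])

filters = [
    Filter("abc.com,cdf.com##div#ad1", {"type": "css", "value": "div#ad1"},
           "hide", [("domain", [("abc.com", True), ("cdf.com", True)])]),
    Filter("abc.com/ad$image", {"type": "url-pattern", "value": "abc.com/ad"},
           "block", [("image", True)]),
    Filter("@@/abc\\.com/", {"type": "url-regexp", "value": "abc\\.com"},
           "allow", []),
]


def group_by_action(filters):
    """Map each action ('block', 'allow', 'hide', 'show') to the filter texts using it."""
    groups = defaultdict(list)
    for f in filters:
        groups[f.action].append(f.text)
    return dict(groups)


# 'hide' and 'show' act on DOM elements; 'block' and 'allow' act on requests.
ELEMENT_ACTIONS = {"hide", "show"}
request_filters = [f.text for f in filters if f.action not in ELEMENT_ACTIONS]

# The meaning of selector["value"] depends on selector["type"].
css_selectors = [f.selector["value"] for f in filters if f.selector["type"] == "css"]

print(group_by_action(filters))
print(request_filters)
print(css_selectors)
```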
180 | |
181 Options is a list of tuples consisting of option name and option value. The | |
182 option value is `True` or `False` for flags or, for options with a value, it's | |
183 a string, list of strings or a list of `(string, boolean)` tuples. See | |
184 [documentation on authoring the filter rules][2] for the list of existing | |
185 options and their meanings. | |
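Since options are `(name, value)` tuples, a dictionary is a convenient way to look them up. The helpers below (`option_dict`, `split_domains`) are hypothetical, not part of `abp.filters`. They assume that each option name appears at most once per filter and that, in the `domain` option, a `False` flag marks a domain the filter does not apply to (written with `~` in the filter text).

```python
def option_dict(options):
    """Turn a filter's list of (name, value) tuples into a dict.

    Assumes each option name appears at most once per filter.
    """
    return dict(options)


def split_domains(options):
    """Split the 'domain' option into (included, excluded) domain lists.

    Assumes a False flag marks an excluded (~-prefixed) domain.
    """
    domains = option_dict(options).get("domain", [])
    included = [name for name, applies in domains if applies]
    excluded = [name for name, applies in domains if not applies]
    return included, excluded


print(split_domains([("domain", [("abc.com", True), ("cdf.com", True)])]))
# (['abc.com', 'cdf.com'], [])
print(split_domains([("domain", [("abc.com", True), ("cdf.com", False)])]))
# (['abc.com'], ['cdf.com'])
print(option_dict([("image", True)]).get("image"))  # True
```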
186 | |
187 ### Other functions | |
188 | |
189 `abp.filters` module also exports a lower-level function for parsing individual | 110 `abp.filters` module also exports a lower-level function for parsing individual |
190 lines of a filter list: `parse_line`. It returns a parsed line object just like | 111 lines of a filter list: `parse_line`. It returns a parsed line object just like |
191 the items in the iterator returned by `parse_filterlist`. | 112 the items in the iterator returned by `parse_filterlist`. |
113 | |
114 For further information on the library API, use `help()` on `abp.filters` and |
115 its contents in an interactive Python session, read the docstrings or look at the |
116 tests for some usage examples. | |
192 | 117 |
193 ## Testing | 118 ## Testing |
194 | 119 |
195 Unit tests for `python-abp` are located in the `/tests` directory. | 120 Unit tests for `python-abp` are located in the `/tests` directory. |
196 [Pytest][3] is used for quickly running the tests | 121 [Pytest][3] is used for quickly running the tests |
197 during development. | 122 during development. |
198 [Tox][4] is used for testing in different | 123 [Tox][4] is used for testing in different |
199 environments (Python 2.7, Python 3.5+ and PyPy) and code quality | 124 environments (Python 2.7, Python 3.5+ and PyPy) and code quality |
200 reporting. | 125 reporting. |
201 | 126 |
202 In order to execute the tests, first create and activate the development | 127 In order to execute the tests, first create and activate the development |
203 virtualenv: | 128 virtualenv: |
204 | 129 |
205 $ python setup.py devenv | 130 $ python setup.py devenv |
206 $ . devenv/bin/activate | 131 $ . devenv/bin/activate |
207 | 132 |
208 With the development virtualenv activated, use pytest for a quick test run: | 133 With the development virtualenv activated, use pytest for a quick test run: |
209 | 134 |
210 (devenv) $ pytest tests | 135 (devenv) $ pytest tests |
211 | 136 |
212 and tox for a comprehensive report: | 137 and tox for a comprehensive report: |
213 | 138 |
214 (devenv) $ tox | 139 (devenv) $ tox |
215 | 140 |
141 ## Development | |
142 | |
143 When adding new functionality, add tests for it (preferably first). Code | |
144 coverage (as measured by `tox -e qa`) should not decrease and the tests | |
145 should pass in all Tox environments. | |
146 | |
147 All public functions, classes and methods should have docstrings compliant with | |
148 [NumPy/SciPy documentation guide][5]. One exception is the constructors of | |
149 classes that the user is not expected to instantiate (such as exceptions). | |
216 | 150 |
217 [1]: https://adblockplus.org/filters#special-comments | 151 [1]: https://adblockplus.org/filters#special-comments |
218 [2]: https://adblockplus.org/filters#options | 152 [2]: https://adblockplus.org/filters#options |
219 [3]: http://pytest.org/ | 153 [3]: http://pytest.org/ |
220 [4]: https://tox.readthedocs.org/ | 154 [4]: https://tox.readthedocs.org/ |
155 [5]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt | |