Side by Side Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)
Patch Set: Created June 14, 2017, 5:45 p.m.
OLD NEW
1 # python-abp 1 # python-abp
2 2
3 This repository contains the script that is used for building Adblock Plus 3 This repository contains a library for working with Adblock Plus filter lists
4 filter lists from the form in which they are authored into the format suitable 4 and the script that is used for building Adblock Plus filter lists from the
5 for consumption by the adblocking software. 5 form in which they are authored into the format suitable for consumption by the
6 adblocking software.
6 7
7 ## Installation 8 ## Installation
8 9
9 Prerequisites: 10 Prerequisites:
10 11
11 * Linux, Mac OS X or Windows (any modern Unix should work too), 12 * Linux, Mac OS X or Windows (any modern Unix should work too),
12 * Python (2.7 or 3.5), 13 * Python (2.7, 3.5 or 3.6),
13 * pip. 14 * pip.
14 15
15 To install: 16 To install:
16 17
17 $ pip install -U python-abp 18 $ pip install -U python-abp
18 19
19 ## Rendering of filter lists 20 ## Rendering of filter lists
20 21
21 The filter lists are originally authored in relatively smaller parts focused 22 The filter lists are originally authored in relatively smaller parts focused
22 on a particular type of filters, related to a specific topic or relevant 23 on a particular type of filters, related to a specific topic or relevant
23 for a particular geographical area. 24 for a particular geographical area.
24 We call these parts _filter list fragments_ (or just _fragments_) 25 We call these parts _filter list fragments_ (or just _fragments_)
25 to distinguish them from full filter lists that are 26 to distinguish them from full filter lists that are
26 consumed by the adblocking software such as Adblock Plus. 27 consumed by the adblocking software such as Adblock Plus.
27 28
28 Rendering is a process that combines filter list fragments into a filter list. 29 Rendering is a process that combines filter list fragments into a filter list.
29 It starts with one fragment that can include other ones and so forth. 30 It starts with one fragment that can include other ones and so forth.
30 The produced filter list is marked with a version, a timestamp and 31 The produced filter list is marked with a version, a timestamp and
31 a [checksum](https://adblockplus.org/filters#special-comments). 32 a [checksum][1].
32 33
33 Python-abp contains a script that can do this called `flrender`: 34 Python-abp contains a script that can do this called `flrender`:
34 35
35 $ flrender fragment.txt filterlist.txt 36 $ flrender fragment.txt filterlist.txt
36 37
37 This will take the top level fragment in `fragment.txt`, render it and save into 38 This will take the top level fragment in `fragment.txt`, render it and save into
38 `filterlist.txt`. 39 `filterlist.txt`.
39 40
40 Fragments might reference other fragments that should be included into them. 41 Fragments might reference other fragments that should be included into them.
41 The references come in two forms: http(s) includes and local includes: 42 The references come in two forms: http(s) includes and local includes:
42 43
43 %include http://www.server.org/dir/list.txt% 44 %include http://www.server.org/dir/list.txt%
44 %include easylist:easylist/easylist_general_block.txt 45 %include easylist:easylist/easylist_general_block.txt%
45 46
46 The first instruction contains a URL that will be fetched and inserted at the 47 The first instruction contains a URL that will be fetched and inserted at the
47 point of reference. 48 point of reference.
48 The second one contains a path inside easylist repository. 49 The second one contains a path inside easylist repository.
49 `flrender` needs to be able to find a copy of the repository on the local 50 `flrender` needs to be able to find a copy of the repository on the local
50 filesystem. We use `-i` option to point it to the right directory: 51 filesystem. We use `-i` option to point it to the right directory:
51 52
52 $ flrender -i easylist=/home/abc/easylist input.txt output.txt 53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt
53 54
54 Now the second reference above will be resolved to 55 Now the second reference above will be resolved to
(...skipping 13 matching lines...)
68 If you don't know all the source names that are needed to render some list, 69 If you don't know all the source names that are needed to render some list,
69 just run `flrender` and it will report what it's missing: 70 just run `flrender` and it will report what it's missing:
70 71
71 $ flrender easylist.txt output/easylist.txt 72 $ flrender easylist.txt output/easylist.txt
72 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener 73 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
73 al_block.txt' from 'easylist.txt' 74 al_block.txt' from 'easylist.txt'
74 75
75 You can clone the necessary repositories to a local directory and add `-i` 76 You can clone the necessary repositories to a local directory and add `-i`
76 options accordingly. 77 options accordingly.
77 78
79 ## Library API
80
81 Python-abp can also be used as a library for parsing filter lists. For example,
82 to read a filter list (we use Python 3 syntax here but the API is the same in Python 2):
83
84 from abp.filters import parse_filterlist
85
86 with open('filterlist.txt') as filterlist:
87     for line in parse_filterlist(filterlist):
88         print(line)
89
90 If `filterlist.txt` contains a filter list, the output will look similar to
91 the following:
92
93 Header(version='Adblock Plus 2.0')
94 Metadata(key='Title', value='Example List')
95 EmptyLine()
96 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value':
97 'div#ad1'}, action='hide', options={'domains-include': ['abc.com',
98 'cdf.com'], 'domains-none': True})
99 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value':
100 'abc.com/ad'}, action='block', options={'types-none': True,
101 'types-include': ['image']})
102 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value':
103 'abc\\.com'}, action='allow', options={})
104 ...
105
106 In general `parse_filterlist` takes an iterable of strings (such as a list or
107 an open file) and returns an iterable of parsed filter list lines. Each line
108 will have its `.type` attribute set to a string indicating its type. It will
109 also have a `.to_string()` method that converts it to a unicode string in the
110 filter list format (most of the time it's the same as the string from which the
111 filter was parsed). Further attributes depend on the type of the line.
112
113 **Note:** `parse_filterlist` returns an iterator, not a list, and only consumes
114 the input lines when its output is iterated over. This allows much more memory
115 efficient handling of large filter lists, however there are two things to watch
116 out for:
117
118 - When you're parsing filters from a file, you need to complete the iteration
119 before you close the file.
120 - Once you iterate over the output of `parse_filterlist`, it will be
121 consumed and you won't be able to iterate over it again.
122
123 If you find that either of these issues is a problem for you, you probably want to
124 convert the output of `parse_filterlist` to a list:
125
126 lines_list = list(parse_filterlist(filterlist))
127
128 This will load the whole file into memory but unless you're dealing with a
129 gigantic filter list that should not be a problem.
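
The two caveats above apply to any lazy Python iterator, so they can be illustrated without the library itself. The sketch below uses `lazy_parse`, a hypothetical stand-in that only mimics the lazy behaviour described for `parse_filterlist`; it does no real parsing and is not part of python-abp.

```python
def lazy_parse(lines):
    """Hypothetical stand-in for parse_filterlist: yields items lazily."""
    for line in lines:
        # Input is only consumed as the caller iterates over the output.
        yield line.strip()


source = ['[Adblock Plus 2.0]\n', '! Title: Example List\n', 'abc.com/ad$image\n']

parsed = lazy_parse(source)
first_pass = list(parsed)   # iterating consumes the iterator
second_pass = list(parsed)  # a second pass finds nothing left

# Converting to a list once gives an object that can be iterated repeatedly.
lines_list = list(lazy_parse(source))
pass_a = [item for item in lines_list]
pass_b = [item for item in lines_list]
```

The same reasoning explains the file caveat: if the list is built inside the `with` block, the result stays usable after the file is closed.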
130
131 ### Line types
132
133 As mentioned before, lines of different types have different attributes:
134
135 | type | attributes |
136 |------------|------------------------------------------------------------------------|
137 | header | `version` - plugin version string |
138 | emptyline | no attributes |
139 | comment | `text` - text of the comment |
140 | metadata | `key` - name of the metadata field, `value` - value of the field |
141 | include | `target` - url/path of the file to include |
142 | invalid | `text` - full text of the line, `error` - error message |
143 | filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |
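
As a rough illustration of how the `.type` attribute and the per-type attributes from the table might be used together, here is a minimal sketch. The namedtuples are hypothetical stand-ins for parsed line objects so that the snippet runs without the library; in real use the input would presumably be the output of `parse_filterlist`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for parsed lines; only the attribute names
# (type, text, key, value, ...) are taken from the table above.
Comment = namedtuple('Comment', 'type text')
Metadata = namedtuple('Metadata', 'type key value')
Filter = namedtuple('Filter', 'type text selector action options')


def summarize(lines):
    """Count lines per type and collect metadata fields into a dict."""
    counts = Counter()
    metadata = {}
    for line in lines:
        counts[line.type] += 1
        if line.type == 'metadata':
            # `key` and `value` only exist on metadata lines.
            metadata[line.key] = line.value
    return counts, metadata


sample = [
    Metadata('metadata', 'Title', 'Example List'),
    Comment('comment', 'just a comment'),
    Filter('filter', 'abc.com/ad$image',
           {'type': 'url-pattern', 'value': 'abc.com/ad'},
           'block', {'types-none': True, 'types-include': ['image']}),
]
counts, metadata = summarize(sample)
```

Checking `.type` before touching type-specific attributes avoids attribute errors on lines that do not carry them.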
144
145 #### Filter attributes
146
147 Selector is a dictionary with two keys:
148
149 | key | meaning |
150 |--------------|----------------------------------------------------|
151 | type | 'css', 'abp-simple', 'url-pattern', 'url-regexp' |
152 | value | the selector itself, the meaning is type-dependent |
153
154 Options is a dictionary with a variable set of keys. Only options that are
155 actually present in the filter will be stored there. The list of possible options
156 and their meanings can be found in the [documentation on authoring the filter
157 rules][2].
158
159 There are four classes of options that are handled differently:
160
161 - Type options (that make the rule apply or not apply to certain types of
162 requests and resources):
163 - `types-include`: List of additional types to which the rule applies.
164 - `types-exclude`: List of types to which the rule doesn't apply.
165 - `types-none`: If this is `True`, the filter only applies to the types
166 in `types-include`. Otherwise all types except for `document`, `popup`,
167 `elemhide`, `generichide` and `genericblock` are implicitly included.
168 - Domain options (that make the rule apply or not apply to specific domains):
169 - `domains-include`: List of domains to which the rule applies (it will also
170 apply to any subdomains unless they are excluded).
171 - `domains-exclude`: Excluded domains (their subdomains are also excluded
172 unless specifically included).
173 - `domains-none`: If this is `True`, all domains that are not mentioned by
174 `domains-include` and `domains-exclude` are excluded. Otherwise they are
175 included.
176 - `sitekeys`: List of sitekeys that can be used to activate the rule.
177 - Flags: `third-party`, `collapse`, `match-case`, etc. See the [documentation][2]
178 for more information on their meaning.
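
One possible reading of the type and domain rules above, written as two small helpers. This is a sketch of how a consumer of the `options` dictionary might interpret it, not code from python-abp: `applies_to_type` and `applies_to_domain` are hypothetical names, and the assumption that the most specific matching domain wins is an interpretation of the wording about subdomains.

```python
# Types that are not implicitly included, as listed under `types-none`.
IMPLICITLY_EXCLUDED = {'document', 'popup', 'elemhide',
                       'generichide', 'genericblock'}


def applies_to_type(options, content_type):
    """Decide whether a filter applies to a request/resource type."""
    if content_type in options.get('types-exclude', []):
        return False
    if content_type in options.get('types-include', []):
        return True
    if options.get('types-none', False):
        return False
    return content_type not in IMPLICITLY_EXCLUDED


def applies_to_domain(options, domain):
    """Decide whether a filter applies on a domain (assumed semantics)."""
    include = set(options.get('domains-include', []))
    exclude = set(options.get('domains-exclude', []))
    candidate = domain
    while candidate:
        # Check the domain itself first, then each parent domain, so the
        # most specific mention decides the outcome.
        if candidate in include:
            return True
        if candidate in exclude:
            return False
        _, _, candidate = candidate.partition('.')
    # Unmentioned domains are excluded only when `domains-none` is set.
    return not options.get('domains-none', False)


hide_opts = {'domains-include': ['abc.com', 'cdf.com'], 'domains-none': True}
image_opts = {'types-none': True, 'types-include': ['image']}
```

With `hide_opts` (taken from the sample output earlier), `applies_to_domain(hide_opts, 'www.abc.com')` is true under these assumptions, while an unrelated domain is rejected because `domains-none` is set.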
179
180 ### Other functions
181
182 The `abp.filters` module also exports two lower-level functions for parsing
183 individual lines of a filter list or individual filters. Not very surprisingly,
184 they are called `parse_line` and `parse_filter` respectively. Both will return
185 a parsed line object just like the items in the iterator returned by
186 `parse_filterlist`. The difference between them is that `parse_line` tries to
187 do line type detection and `parse_filter` will always try to interpret its input
188 as a filter. Both functions will raise a `ParseError` exception on invalid input instead of
189 returning a line with `type="invalid"`.
190
78 ## Testing 191 ## Testing
79 192
80 Unit tests for `python-abp` are located in the `/tests` directory. 193 Unit tests for `python-abp` are located in the `/tests` directory.
81 [Pytest](http://pytest.org/) is used for quickly running the tests 194 [Pytest][3] is used for quickly running the tests
82 during development. 195 during development.
83 [Tox](https://tox.readthedocs.org/) is used for testing in different 196 [Tox][4] is used for testing in different
84 environments (Python 2.7, Python 3.5 and PyPy) and code quality 197 environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality
85 reporting. 198 reporting.
86 199
87 In order to execute the tests, first create and activate development 200 In order to execute the tests, first create and activate development
88 virtualenv: 201 virtualenv:
89 202
90 $ python setup.py devenv 203 $ python setup.py devenv
91 $ . devenv/bin/activate 204 $ . devenv/bin/activate
92 205
93 With the development virtualenv activated use pytest for a quick test run: 206 With the development virtualenv activated use pytest for a quick test run:
94 207
95 (devenv) $ py.test tests 208 (devenv) $ pytest tests
96 209
97 and tox for a comprehensive report: 210 and tox for a comprehensive report:
98 211
99 (devenv) $ tox 212 (devenv) $ tox
213
214
215 [1]: https://adblockplus.org/filters#special-comments
216 [2]: https://adblockplus.org/filters#options
217 [3]: http://pytest.org/
218 [4]: https://tox.readthedocs.org/