Side by Side Diff: README.md

Issue 29968569: Issue 4014 - Publish python-abp on PyPI (Closed) Base URL: https://hg.adblockplus.org/python-abp/
Patch Set: Address comments on PS2, add README ToC Created Dec. 29, 2018, 1:29 a.m.
OLDNEW
1
1 # python-abp 2 # python-abp
2 3
3 This repository contains a library for working with Adblock Plus filter lists 4 This repository contains a library for working with Adblock Plus filter lists,
4 and the script that is used for building Adblock Plus filter lists from the 5 a script for rendering diffs between filter lists, and the script that is used
5 form in which they are authored into the format suitable for consumption by the 6 for building Adblock Plus filter lists from the form in which they are authored
6 adblocking software. 7 into the format suitable for consumption by the adblocking software (aka
8 rendering).
7 9
10 ## Table of Contents
11
12 - [Installation](#installation)
13 - [Rendering of filter lists](#rendering)
14 - [Generating diffs](#diffs)
15 - [Library API](#library)
16 - [Testing](#testing)
17 - [Development](#development)
18 - [Using the library with R](#r)
19
20 <a id="installation"></a>
8 ## Installation 21 ## Installation
9 22
10 Prerequisites: 23 Prerequisites:
11 24
12 * Linux, Mac OS X or Windows (any modern Unix should work too), 25 * Linux, Mac OS X or Windows (any modern Unix should work too),
13 * Python (2.7 or 3.5+), 26 * Python (2.7 or 3.5+),
14 * pip. 27 * pip.
15 28
16 To install: 29 To install:
17 30
18 $ pip install -U python-abp 31 $ pip install --upgrade python-abp
19 32
33 <a id="rendering"></a>
20 ## Rendering of filter lists 34 ## Rendering of filter lists
21 35
22 The filter lists are originally authored in relatively smaller parts focused 36 The filter lists are originally authored in relatively smaller parts focused
23 on a particular type of filters, related to a specific topic or relevant 37 on particular types of filters, related to a specific topic or relevant for a
24 for particular geographical area. 38 particular geographical area.
25 We call these parts _filter list fragments_ (or just _fragments_) 39 We call these parts _filter list fragments_ (or just _fragments_) to
26 to distinguish them from full filter lists that are 40 distinguish them from full filter lists that are consumed by the adblocking
27 consumed by the adblocking software such as Adblock Plus. 41 software such as Adblock Plus.
28 42
29 Rendering is a process that combines filter list fragments into a filter list. 43 Rendering is a process that combines filter list fragments into a filter list.
30 It starts with one fragment that can include other ones and so forth. 44 It starts with one fragment that can include other ones and so forth.
31 The produced filter list is marked with a [version and a timestamp][1]. 45 The produced filter list is marked with a [version and a timestamp][1].
32 46
33 Python-abp contains a script that can do this called `flrender`: 47 Python-abp contains a script that can do this called `flrender`:
34 48
35 $ flrender fragment.txt filterlist.txt 49 $ flrender fragment.txt filterlist.txt
36 50
37 This will take the top level fragment in `fragment.txt`, render it and save into 51 This will take the top level fragment in `fragment.txt`, render it and save it
38 `filterlist.txt`. 52 into `filterlist.txt`.
39 53
40 The `flrender` script can also be used by only specifying `fragment.txt`: 54 The `flrender` script can also be used by only specifying `fragment.txt`:
41 55
42 $flrender fragment.txt 56 $ flrender fragment.txt
43 57
44 in which case the rendering result will be sent to `stdout`. Moreover, when 58 in which case the rendering result will be sent to `stdout`. Moreover, when
45 it's run with no positional arguments: 59 it's run with no positional arguments:
46 60
47 $flrender 61 $ flrender
48 62
49 it will read from `stdin` and send the results to `stdout`. 63 it will read from `stdin` and send the results to `stdout`.
50 64
51 Fragments might reference other fragments that should be included into them. 65 Fragments might reference other fragments that should be included into them.
52 The references come in two forms: http(s) includes and local includes: 66 The references come in two forms: http(s) includes and local includes:
53 67
54 %include http://www.server.org/dir/list.txt% 68 %include http://www.server.org/dir/list.txt%
55 %include easylist:easylist/easylist_general_block.txt% 69 %include easylist:easylist/easylist_general_block.txt%
56 70
57 The first instruction contains a URL that will be fetched and inserted at the 71 The http include contains a URL that will be fetched and inserted at the point
58 point of reference. 72 of reference.
59 The second one contains a path inside easylist repository. 73 The local include contains a path inside the easylist repository.
60 `flrender` needs to be able to find a copy of the repository on the local 74 `flrender` needs to be able to find a copy of the repository on the local
61 filesystem. We use the `-i` option to point it to the right directory: 75 filesystem. We use the `-i` option to point it to the right directory:
62 76
63 $ flrender -i easylist=/home/abc/easylist input.txt output.txt 77 $ flrender -i easylist=/home/abc/easylist input.txt output.txt
64 78
65 Now the second reference above will be resolved to 79 Now the local include referenced above will be resolved to:
66 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will 80 `/home/abc/easylist/easylist/easylist_general_block.txt`
67 be loaded from this file. 81 and the fragment will be loaded from this file.
68 82
69 Directories that contain filter list fragments that are used during rendering 83 Directories that contain filter list fragments that are used during rendering
70 are called sources. 84 are called sources.
71 They are normally working copies of the repositories that contain filter list 85 They are normally working copies of the repositories that contain filter list
72 fragments. 86 fragments.
73 Each source is identified by a name: that's the part that comes before ":" 87 Each source is identified by a name: that's the part that comes before ":" in
74 in the include instruction and it should be the same as what comes before "=" 88 the include instruction and it should be the same as what comes before "=" in
75 in the `-i` option. 89 the `-i` option.
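The mapping from a local include to a file path described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not `flrender`'s actual implementation: the helper names `parse_source_option` and `resolve_local_include` are hypothetical, and http(s) includes are deliberately not handled.

```python
import posixpath


def parse_source_option(option):
    """Split a `-i`-style 'NAME=PATH' string into (name, path).

    Hypothetical helper for illustration; not part of python-abp.
    """
    name, sep, path = option.partition('=')
    if not sep or not name or not path:
        raise ValueError('expected NAME=PATH, got {!r}'.format(option))
    return name, path


def resolve_local_include(include, sources):
    """Map 'NAME:relative/path' to a path inside the source called NAME.

    `sources` is a dict of source name -> directory, as built from the
    `-i` options. Only local includes are handled in this sketch.
    """
    name, sep, rel_path = include.partition(':')
    if not sep:
        raise ValueError('expected NAME:PATH, got {!r}'.format(include))
    if name not in sources:
        raise KeyError('Unknown source: {!r}'.format(name))
    return posixpath.join(sources[name], rel_path)


# The part before "=" in `-i` must match the part before ":" in the include.
sources = dict([parse_source_option('easylist=/home/abc/easylist')])
resolved = resolve_local_include(
    'easylist:easylist/easylist_general_block.txt', sources)
print(resolved)
```

With the `-i` value from the example above, the include resolves to `/home/abc/easylist/easylist/easylist_general_block.txt`. An include that names a source missing from `sources` raises an error, which mirrors the "Unknown source" situation.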
76 90
77 Commonly used sources have generally accepted names. For example the main 91 Commonly used sources have generally accepted names. For example the main
78 EasyList repository is referred to as `easylist`. 92 EasyList repository is referred to as `easylist`.
79 If you don't know all the source names that are needed to render some list, 93 If you don't know all the source names that are needed to render some list,
80 just run `flrender` and it will report what it's missing: 94 just run `flrender` and it will report what it's missing:
81 95
82 $ flrender easylist.txt output/easylist.txt 96 $ flrender easylist.txt output/easylist.txt
83 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener 97 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
84 al_block.txt' from 'easylist.txt' 98 al_block.txt' from 'easylist.txt'
85 99
86 You can clone the necessary repositories to a local directory and add `-i` 100 You can clone the necessary repositories to a local directory and add `-i`
87 options accordingly. 101 options accordingly.
88 102
89 ## Rendering diffs 103 <a id="diffs"></a>
104 ## Generating diffs
90 105
91 A diff allows a client running ad blocking software such as Adblock Plus to 106 A diff allows a client running ad blocking software such as Adblock Plus to
92 update the filter lists incrementally, instead of downloading a new copy of a 107 update the filter lists incrementally, instead of downloading a new copy of a
93 full list during each update. This is meant to lessen the amount of resources 108 full list during each update. This is meant to lessen the amount of resources
94 used when updating filter lists (e.g. network data, memory usage, battery 109 used when updating filter lists (e.g. network data, memory usage, battery
95 consumption, etc.), allowing clients to update their lists more frequently using 110 consumption, etc.), allowing clients to update their lists more frequently
96 less resources. 111 using less resources.
97 112
98 Python-abp contains a script called `fldiff` that will find the diff between the 113 python-abp contains a script called `fldiff` that will find the diff between
99 latest filter list, and any number of previous filter lists: 114 the latest filter list, and any number of previous filter lists:
100 115
101 $ fldiff -o diffs/easylist easylist.txt archive/* 116 $ fldiff -o diffs/easylist/ easylist.txt archive/*
102 117
103 where `-o diffs/easylist` is the (optional) output directory where the diffs 118 where `-o diffs/easylist/` is the (optional) output directory where the diffs
104 should be written, `easylist.txt` is the most recent version of the filter list, 119 should be written, `easylist.txt` is the most recent version of the filter
105 and `archive/*` is the directory where all the archived filter lists are. When 120 list, and `archive/*` is the directory where all the archived filter lists are.
106 called like this, the shell should automatically expand the `archive/*` 121 When called like this, the shell should automatically expand the `archive/*`
107 directory, giving the script each of the filenames separately. 122 directory, giving the script each of the filenames separately.
108 123
109 In the above example, the output of each archived `list[version].txt` will be 124 In the above example, the output of each archived `list[version].txt` will be
110 written to `diffs/diff[version].txt`. If the output argument is omitted, the 125 written to `diffs/diff[version].txt`. If the output argument is omitted, the
111 diffs will be written to the current directory. 126 diffs will be written to the current directory.
112 127
113 The script produces three types of lines, as specified in the [technical 128 The script produces three types of lines, as specified in the [technical
114 specification][5]: 129 specification][5]:
115 130
116 * Special comments of the form `! <name>:[ <value>]` 131 * Special comments of the form `! <name>:[ <value>]`
117 * Added filters of the form `+ <filter-text>` 132 * Added filters of the form `+ <filter-text>`
118 * Removed filters of the form `- <filter-text>` 133 * Removed filters of the form `- <filter-text>`
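To make the three line types concrete, here is a minimal Python sketch that classifies diff lines and applies them to a set of filters. It assumes only the line shapes listed above; the function names `classify_diff_line` and `apply_diff` are hypothetical, and treating a filter list as an unordered set is a simplification, not a description of how `fldiff` or any client actually works.

```python
def classify_diff_line(line):
    """Return (kind, payload) for one line of a filter list diff.

    Hypothetical helper for illustration; not part of python-abp.
    """
    if line.startswith('! '):
        # Special comment: "! <name>:[ <value>]" (the value may be empty).
        name, sep, value = line[2:].partition(':')
        if not sep:
            raise ValueError('not a special comment: {!r}'.format(line))
        return 'comment', (name.strip(), value.strip())
    if line.startswith('+ '):
        return 'add', line[2:]
    if line.startswith('- '):
        return 'remove', line[2:]
    raise ValueError('unrecognized diff line: {!r}'.format(line))


def apply_diff(filters, diff_lines):
    """Apply added/removed filters to a collection of filter texts.

    Special comments are ignored here; a real client would also use
    them to update the list metadata.
    """
    result = set(filters)
    for line in diff_lines:
        kind, payload = classify_diff_line(line)
        if kind == 'add':
            result.add(payload)
        elif kind == 'remove':
            result.discard(payload)
    return result


old_filters = ['abc.com,cdf.com##div#ad1', 'abc.com/ad$image']
diff_lines = [
    '! Title: Example list',
    '+ @@/abc\\.com/',
    '- abc.com/ad$image',
]
new_filters = apply_diff(old_filters, diff_lines)
print(sorted(new_filters))
```

The point of the sketch is that a client only needs the leading marker (`!`, `+` or `-`) to decide what to do with each line, which is what keeps incremental updates cheap.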
119 134
120 135 <a id="library"></a>
121 ## Library API 136 ## Library API
122 137
123 Python-abp can also be used as a library for parsing filter lists. For example 138 python-abp can also be used as a library for parsing filter lists. For example
124 to read a filter list (we use Python 3 syntax here but the API is the same): 139 to read a filter list (we use Python 3 syntax here but the API is the same):
125 140
126 from abp.filters import parse_filterlist 141 from abp.filters import parse_filterlist
127 142
128 with open('filterlist.txt') as filterlist: 143 with open('filterlist.txt') as filterlist:
129 for line in parse_filterlist(filterlist): 144 for line in parse_filterlist(filterlist):
130 print(line) 145 print(line)
131 146
132 If `filterlist.txt` contains a filter list: 147 If `filterlist.txt` contains this filter list:
133 148
134 [Adblock Plus 2.0] 149 [Adblock Plus 2.0]
135 ! Title: Example list 150 ! Title: Example list
136 151
137 abc.com,cdf.com##div#ad1 152 abc.com,cdf.com##div#ad1
138 abc.com/ad$image 153 abc.com/ad$image
139 @@/abc\.com/ 154 @@/abc\.com/
140 ...
141 155
142 the output will look something like: 156 the output will look something like:
143 157
144 Header(version='Adblock Plus 2.0') 158 Header(version='Adblock Plus 2.0')
145 Metadata(key='Title', value='Example list') 159 Metadata(key='Title', value='Example list')
146 EmptyLine() 160 EmptyLine()
147 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) 161 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
148 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) 162 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
149 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) 163 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
150 ...
151 164
152 `abp.filters` module also exports a lower-level function for parsing individual 165 The `abp.filters` module also exports a lower-level function for parsing
153 lines of a filter list: `parse_line`. It returns a parsed line object just like 166 individual lines of a filter list: `parse_line`. It returns a parsed line
154 the items in the iterator returned by `parse_filterlist`. 167 object just like the items in the iterator returned by `parse_filterlist`.
155 168
156 For further information on the library API use `help()` on `abp.filters` and 169 For further information on the library API use `help()` on `abp.filters` and
157 its contents in interactive Python session, read the docstrings or look at the 170 its contents in an interactive Python session, read the docstrings, or look at
158 tests for some usage examples. 171 the tests for some usage examples.
159 172
173 <a id="testing"></a>
160 ## Testing 174 ## Testing
161 175
162 Unit tests for `python-abp` are located in the `/tests` directory. 176 Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2]
163 [Pytest][2] is used for quickly running the tests 177 is used for quickly running the tests during development. [Tox][3] is used for
164 during development. 178 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
165 [Tox][3] is used for testing in different 179 quality reporting.
166 environments (Python 2.7, Python 3.5+ and PyPy) and code quality
167 reporting.
168 180
169 In order to execute the tests, first create and activate development 181 In order to execute the tests, first create and activate a development
170 virtualenv: 182 virtualenv:
171 183
172 $ python setup.py devenv 184 $ python setup.py devenv
173 $ . devenv/bin/activate 185 $ . devenv/bin/activate
174 186
175 With the development virtualenv activated use pytest for a quick test run: 187 With the development virtualenv activated use pytest for a quick test run:
176 188
177 (devenv) $ pytest tests 189 (devenv) $ pytest tests
178 190
179 and tox for a comprehensive report: 191 and tox for a comprehensive report:
180 192
181 (devenv) $ tox 193 (devenv) $ tox
182 194
195 <a id="development"></a>
183 ## Development 196 ## Development
184 197
185 When adding new functionality, add tests for it (preferably first). Code 198 When adding new functionality, add tests for it (preferably first). If some
186 coverage (as measured by `tox -e qa`) should not decrease and the tests 199 code will never be reached on a certain version of Python, it may be exempted
187 should pass in all Tox environments. 200 from coverage tests by adding a comment, e.g. `# pragma: no py2 cover`.
188 201
189 All public functions, classes and methods should have docstrings compliant with 202 All public functions, classes and methods should have docstrings compliant with
190 [NumPy/SciPy documentation guide][4]. One exception is the constructors of 203 [NumPy/SciPy documentation guide][4]. One exception is the constructors of
191 classes that the user is not expected to instantiate (such as exceptions). 204 classes that the user is not expected to instantiate (such as exceptions).
192 205
193 206 <a id="r"></a>
194 ## Using the library with R 207 ## Using the library with R
195 208
196 Clone the repo to your local machine. Then create a virtualenv and install 209 Clone the repo to your local machine. Then create a virtualenv and install
197 python-abp there: 210 python-abp there:
198 211
199 $ cd python-abp 212 $ cd python-abp
200 $ virtualenv env 213 $ virtualenv env
201 $ pip install --upgrade . 214 $ pip install --upgrade .
202 215
203 Then import it with `reticulate` in R: 216 Then import it with `reticulate` in R:
204 217
205 > library(reticulate) 218 > library(reticulate)
206 > use_virtualenv("~/python-abp/env", required=TRUE) 219 > use_virtualenv("~/python-abp/env", required=TRUE)
207 > abp <- import("abp.filters.rpy") 220 > abp <- import("abp.filters.rpy")
208 221
209 Now you can use the functions with `abp$functionname`, e.g. 222 Now you can use the functions with `abp$functionname`, e.g.
210 `abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")` 223 `abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")`
211 224
212 225
213 [1]: https://adblockplus.org/filters#special-comments 226 [1]: https://adblockplus.org/filters#special-comments
214 [2]: http://pytest.org/ 227 [2]: http://pytest.org/
215 [3]: https://tox.readthedocs.org/ 228 [3]: https://tox.readthedocs.org/
216 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt 229 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
217 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/ 230 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/