Side by Side Diff: README.md

Issue 29968569: Issue 4014 - Publish python-abp on PyPI (Closed)
Base URL: https://hg.adblockplus.org/python-abp/
Patch Set: Address comments on PS1, update README
Created: Dec. 28, 2018, 9:48 p.m.
OLD NEW
1 # python-abp 1 # python-abp
2 2
3 This repository contains a library for working with Adblock Plus filter lists 3 This repository contains a library for working with Adblock Plus filter lists,
4 and the script that is used for building Adblock Plus filter lists from the 4 a script for rendering diffs between filter lists, and the script that is used
5 form in which they are authored into the format suitable for consumption by the 5 for building Adblock Plus filter lists from the form in which they are authored
6 adblocking software. 6 into the format suitable for consumption by the adblocking software (aka
7 rendering).
7 8
8 ## Installation 9 ## Installation
9 10
10 Prerequisites: 11 Prerequisites:
11 12
12 * Linux, Mac OS X or Windows (any modern Unix should work too), 13 * Linux, Mac OS X or Windows (any modern Unix should work too),
13 * Python (2.7 or 3.5+), 14 * Python (2.7 or 3.5+),
14 * pip. 15 * pip.
15 16
16 To install: 17 To install:
17 18
18 $ pip install -U python-abp 19 $ pip install --upgrade python-abp
19 20
20 ## Rendering of filter lists 21 ## Rendering of filter lists
21 22
22 The filter lists are originally authored in relatively smaller parts focused 23 The filter lists are originally authored in relatively smaller parts focused
23 on a particular type of filters, related to a specific topic or relevant 24 on particular types of filters, related to a specific topic or relevant for a
24 for particular geographical area. 25 particular geographical area.
25 We call these parts _filter list fragments_ (or just _fragments_) 26 We call these parts _filter list fragments_ (or just _fragments_) to
26 to distinguish them from full filter lists that are 27 distinguish them from full filter lists that are consumed by the adblocking
27 consumed by the adblocking software such as Adblock Plus. 28 software such as Adblock Plus.
28 29
29 Rendering is a process that combines filter list fragments into a filter list. 30 Rendering is a process that combines filter list fragments into a filter list.
30 It starts with one fragment that can include other ones and so forth. 31 It starts with one fragment that can include other ones and so forth.
31 The produced filter list is marked with a [version and a timestamp][1]. 32 The produced filter list is marked with a [version and a timestamp][1].
32 33
33 Python-abp contains a script that can do this called `flrender`: 34 Python-abp contains a script that can do this called `flrender`:
34 35
35 $ flrender fragment.txt filterlist.txt 36 $ flrender fragment.txt filterlist.txt
36 37
37 This will take the top level fragment in `fragment.txt`, render it and save into 38 This will take the top level fragment in `fragment.txt`, render it and save it
38 `filterlist.txt`. 39 into `filterlist.txt`.
39 40
40 The `flrender` script can also be used by only specifying `fragment.txt`: 41 The `flrender` script can also be used by only specifying `fragment.txt`:
41 42
42 $flrender fragment.txt 43 $ flrender fragment.txt
43 44
44 in which case the rendering result will be sent to `stdout`. Moreover, when 45 in which case the rendering result will be sent to `stdout`. Moreover, when
45 it's run with no positional arguments: 46 it's run with no positional arguments:
46 47
47 $flrender 48 $ flrender
48 49
49 it will read from `stdin` and send the results to `stdout`. 50 it will read from `stdin` and send the results to `stdout`.
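The three invocation forms described above (two positional arguments, one, or none) follow a common command-line pattern: optional positionals that fall back to the standard streams. Below is a minimal sketch of that pattern using `argparse`. It is not the actual `flrender` implementation; the names `build_parser` and `describe_streams` and the program name `render-sketch` are hypothetical and exist only for this illustration.

```python
import argparse


def build_parser():
    """Build a parser with two optional positional arguments (hypothetical)."""
    parser = argparse.ArgumentParser(prog='render-sketch')
    # nargs='?' makes each positional optional; None means "not given".
    parser.add_argument('infile', nargs='?', default=None)
    parser.add_argument('outfile', nargs='?', default=None)
    return parser


def describe_streams(argv):
    """Return which input and output would be used for the given arguments."""
    args = build_parser().parse_args(argv)
    source = args.infile if args.infile is not None else '<stdin>'
    target = args.outfile if args.outfile is not None else '<stdout>'
    return source, target


print(describe_streams(['fragment.txt', 'filterlist.txt']))
print(describe_streams(['fragment.txt']))
print(describe_streams([]))
```

A real tool would open the named files (or use `sys.stdin` and `sys.stdout`) instead of returning labels; the labels just make the fallback visible.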
50 51
51 Fragments might reference other fragments that should be included into them. 52 Fragments might reference other fragments that should be included into them.
52 The references come in two forms: http(s) includes and local includes: 53 The references come in two forms: http(s) includes and local includes:
53 54
54 %include http://www.server.org/dir/list.txt% 55 %include http://www.server.org/dir/list.txt%
55 %include easylist:easylist/easylist_general_block.txt% 56 %include easylist:easylist/easylist_general_block.txt%
56 57
57 The first instruction contains a URL that will be fetched and inserted at the 58 The http include contains a URL that will be fetched and inserted at the point
58 point of reference. 59 of reference.
59 The second one contains a path inside easylist repository. 60 The local include contains a path inside the easylist repository.
60 `flrender` needs to be able to find a copy of the repository on the local 61 `flrender` needs to be able to find a copy of the repository on the local
61 filesystem. We use the `-i` option to point it to the right directory: 62 filesystem. We use the `-i` option to point it to the right directory:
62 63
63 $ flrender -i easylist=/home/abc/easylist input.txt output.txt 64 $ flrender -i easylist=/home/abc/easylist input.txt output.txt
64 65
65 Now the second reference above will be resolved to 66 Now the local include referenced above will be resolved to:
66 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will 67 `/home/abc/easylist/easylist/easylist_general_block.txt`
67 be loaded from this file. 68 and the fragment will be loaded from this file.
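The resolution rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the code `flrender` uses: the helper names `parse_source_option` and `resolve_include`, the regular expression, and the error messages are all assumptions made for this example.

```python
import posixpath
import re

# Matches instructions such as "%include easylist:easylist/file.txt%".
INCLUDE_RE = re.compile(r'^\s*%include\s+(.+?)%\s*$')


def parse_source_option(option):
    """Split an "-i" style value "name=path" into a (name, path) pair."""
    name, sep, path = option.partition('=')
    if not sep or not name or not path:
        raise ValueError('Expected NAME=PATH, got {!r}'.format(option))
    return name, path


def resolve_include(line, sources):
    """Classify an include line as ('http', url) or ('local', path).

    Returns None when the line is not an include instruction.
    """
    match = INCLUDE_RE.match(line)
    if match is None:
        return None
    target = match.group(1).strip()
    if target.startswith(('http://', 'https://')):
        return 'http', target
    # Local include: the source name comes before ":", the path after it.
    name, sep, relative = target.partition(':')
    if not sep:
        raise ValueError('Invalid include: {!r}'.format(target))
    if name not in sources:
        raise KeyError('Unknown source: {!r}'.format(name))
    return 'local', posixpath.join(sources[name], relative)


sources = dict([parse_source_option('easylist=/home/abc/easylist')])
print(resolve_include(
    '%include easylist:easylist/easylist_general_block.txt%', sources))
# → ('local', '/home/abc/easylist/easylist/easylist_general_block.txt')
```

The sketch uses `posixpath` so that the result matches the Unix-style path in the example regardless of the platform it runs on.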
68 69
69 Directories that contain filter list fragments that are used during rendering 70 Directories that contain filter list fragments that are used during rendering
70 are called sources. 71 are called sources.
71 They are normally working copies of the repositories that contain filter list 72 They are normally working copies of the repositories that contain filter list
72 fragments. 73 fragments.
73 Each source is identified by a name: that's the part that comes before ":" 74 Each source is identified by a name: that's the part that comes before ":" in
74 in the include instruction and it should be the same as what comes before "=" 75 the include instruction and it should be the same as what comes before "=" in
75 in the `-i` option. 76 the `-i` option.
76 77
77 Commonly used sources have generally accepted names. For example the main 78 Commonly used sources have generally accepted names. For example the main
78 EasyList repository is referred to as `easylist`. 79 EasyList repository is referred to as `easylist`.
79 If you don't know all the source names that are needed to render some list, 80 If you don't know all the source names that are needed to render some list,
80 just run `flrender` and it will report what it's missing: 81 just run `flrender` and it will report what it's missing:
81 82
82 $ flrender easylist.txt output/easylist.txt 83 $ flrender easylist.txt output/easylist.txt
83 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener 84 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
84 al_block.txt' from 'easylist.txt' 85 al_block.txt' from 'easylist.txt'
85 86
86 You can clone the necessary repositories to a local directory and add `-i` 87 You can clone the necessary repositories to a local directory and add `-i`
87 options accordingly. 88 options accordingly.
88 89
89 ## Rendering diffs 90 ## Rendering diffs
90 91
91 A diff allows a client running ad blocking software such as Adblock Plus to 92 A diff allows a client running ad blocking software such as Adblock Plus to
92 update the filter lists incrementally, instead of downloading a new copy of a 93 update the filter lists incrementally, instead of downloading a new copy of a
93 full list during each update. This is meant to lessen the amount of resources 94 full list during each update. This is meant to lessen the amount of resources
94 used when updating filter lists (e.g. network data, memory usage, battery 95 used when updating filter lists (e.g. network data, memory usage, battery
95 consumption, etc.), allowing clients to update their lists more frequently using 96 consumption, etc.), allowing clients to update their lists more frequently
96 less resources. 97 using less resources.
97 98
98 Python-abp contains a script called `fldiff` that will find the diff between the 99 python-abp contains a script called `fldiff` that will find the diff between
99 latest filter list, and any number of previous filter lists: 100 the latest filter list, and any number of previous filter lists:
100 101
101 $ fldiff -o diffs/easylist easylist.txt archive/* 102 $ fldiff -o diffs/easylist/ easylist.txt archive/*
102 103
103 where `-o diffs/easylist` is the (optional) output directory where the diffs 104 where `-o diffs/easylist/` is the (optional) output directory where the diffs
104 should be written, `easylist.txt` is the most recent version of the filter list, 105 should be written, `easylist.txt` is the most recent version of the filter
105 and `archive/*` is the directory where all the archived filter lists are. When 106 list, and `archive/*` is the directory where all the archived filter lists are.
106 called like this, the shell should automatically expand the `archive/*` 107 When called like this, the shell should automatically expand the `archive/*`
107 directory, giving the script each of the filenames separately. 108 directory, giving the script each of the filenames separately.
108 109
109 In the above example, the output of each archived `list[version].txt` will be 110 In the above example, the output of each archived `list[version].txt` will be
110 written to `diffs/diff[version].txt`. If the output argument is omitted, the 111 written to `diffs/diff[version].txt`. If the output argument is omitted, the
111 diffs will be written to the current directory. 112 diffs will be written to the current directory.
112 113
113 The script produces three types of lines, as specified in the [technical 114 The script produces three types of lines, as specified in the [technical
114 specification][5]: 115 specification][5]:
115 116
116 * Special comments of the form `! <name>:[ <value>]` 117 * Special comments of the form `! <name>:[ <value>]`
117 * Added filters of the form `+ <filter-text>` 118 * Added filters of the form `+ <filter-text>`
118 * Removed filters of the form `- <filter-text>` 119 * Removed filters of the form `- <filter-text>`
119 120
120
121 ## Library API 121 ## Library API
122 122
123 Python-abp can also be used as a library for parsing filter lists. For example 123 python-abp can also be used as a library for parsing filter lists. For example
124 to read a filter list (we use Python 3 syntax here but the API is the same): 124 to read a filter list (we use Python 3 syntax here but the API is the same):
125 125
126 from abp.filters import parse_filterlist 126 from abp.filters import parse_filterlist
127 127
128 with open('filterlist.txt') as filterlist: 128 with open('filterlist.txt') as filterlist:
129 for line in parse_filterlist(filterlist): 129 for line in parse_filterlist(filterlist):
130 print(line) 130 print(line)
131 131
132 If `filterlist.txt` contains a filter list: 132 If `filterlist.txt` contains this filter list:
133 133
134 [Adblock Plus 2.0] 134 [Adblock Plus 2.0]
135 ! Title: Example list 135 ! Title: Example list
136 136
137 abc.com,cdf.com##div#ad1 137 abc.com,cdf.com##div#ad1
138 abc.com/ad$image 138 abc.com/ad$image
139 @@/abc\.com/ 139 @@/abc\.com/
140 ...
141 140
142 the output will look something like: 141 the output will look something like:
143 142
144 Header(version='Adblock Plus 2.0') 143 Header(version='Adblock Plus 2.0')
145 Metadata(key='Title', value='Example list') 144 Metadata(key='Title', value='Example list')
146 EmptyLine() 145 EmptyLine()
147 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) 146 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
148 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) 147 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
149 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) 148 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
150 ...
151 149
152 `abp.filters` module also exports a lower-level function for parsing individual 150 The `abp.filters` module also exports a lower-level function for parsing
153 lines of a filter list: `parse_line`. It returns a parsed line object just like 151 individual lines of a filter list: `parse_line`. It returns a parsed line
154 the items in the iterator returned by `parse_filterlist`. 152 object just like the items in the iterator returned by `parse_filterlist`.
155 153
156 For further information on the library API use `help()` on `abp.filters` and 154 For further information on the library API use `help()` on `abp.filters` and
157 its contents in interactive Python session, read the docstrings or look at the 155 its contents in an interactive Python session, read the docstrings, or look at
158 tests for some usage examples. 156 the tests for some usage examples.
159 157
160 ## Testing 158 ## Testing
161 159
162 Unit tests for `python-abp` are located in the `/tests` directory. 160 Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2]
163 [Pytest][2] is used for quickly running the tests 161 is used for quickly running the tests during development. [Tox][3] is used for
164 during development. 162 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
165 [Tox][3] is used for testing in different 163 quality reporting.
166 environments (Python 2.7, Python 3.5+ and PyPy) and code quality
167 reporting.
168 164
169 In order to execute the tests, first create and activate development 165 In order to execute the tests, first create and activate a development
170 virtualenv: 166 virtualenv:
171 167
172 $ python setup.py devenv 168 $ python setup.py devenv
173 $ . devenv/bin/activate 169 $ . devenv/bin/activate
174 170
175 With the development virtualenv activated use pytest for a quick test run: 171 With the development virtualenv activated use pytest for a quick test run:
176 172
177 (devenv) $ pytest tests 173 (devenv) $ pytest tests
178 174
179 and tox for a comprehensive report: 175 and tox for a comprehensive report:
180 176
181 (devenv) $ tox 177 (devenv) $ tox
182 178
183 ## Development 179 ## Development
184 180
185 When adding new functionality, add tests for it (preferably first). Code 181 When adding new functionality, add tests for it (preferably first). Code
186 coverage (as measured by `tox -e qa`) should not decrease and the tests 182 coverage (as measured by `tox -e coverage2` and `tox -e coverage3`) should not
187 should pass in all Tox environments. 183 decrease and the tests should pass in all tox environments.
188 184
189 All public functions, classes and methods should have docstrings compliant with 185 All public functions, classes and methods should have docstrings compliant with
190 [NumPy/SciPy documentation guide][4]. One exception is the constructors of 186 [NumPy/SciPy documentation guide][4]. One exception is the constructors of
191 classes that the user is not expected to instantiate (such as exceptions). 187 classes that the user is not expected to instantiate (such as exceptions).
192 188
193
194 ## Using the library with R 189 ## Using the library with R
195 190
196 Clone the repo to your local machine. Then create a virtualenv and install 191 Clone the repo to your local machine. Then create a virtualenv and install
197 python-abp there: 192 python-abp there:
198 193
199 $ cd python-abp 194 $ cd python-abp
200 $ virtualenv env 195 $ virtualenv env
201 $ pip install --upgrade . 196 $ pip install --upgrade .
202 197
203 Then import it with `reticulate` in R: 198 Then import it with `reticulate` in R:
204 199
205 > library(reticulate) 200 > library(reticulate)
206 > use_virtualenv("~/python-abp/env", required=TRUE) 201 > use_virtualenv("~/python-abp/env", required=TRUE)
207 > abp <- import("abp.filters.rpy") 202 > abp <- import("abp.filters.rpy")
208 203
209 Now you can use the functions with `abp$functionname`, e.g. 204 Now you can use the functions with `abp$functionname`, e.g.
210 `abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")` 205 `abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")`
211 206
212 207
213 [1]: https://adblockplus.org/filters#special-comments 208 [1]: https://adblockplus.org/filters#special-comments
214 [2]: http://pytest.org/ 209 [2]: http://pytest.org/
215 [3]: https://tox.readthedocs.org/ 210 [3]: https://tox.readthedocs.org/
216 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt 211 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
217 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/ 212 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/