Delta Between Two Patch Sets: README.rst

Issue 29968569: Issue 4014 - Publish python-abp on PyPI (Closed) Base URL: https://hg.adblockplus.org/python-abp/
Left Patch Set: Address comments on PS2, add README ToC Created Dec. 29, 2018, 1:29 a.m.
Right Patch Set: Address comment on PS5 Created Jan. 3, 2019, 9:32 p.m.
LEFT | RIGHT
(no file at all)
1 # python-abp 1 python-abp
2 2 ==========
3 This repository contains a library for working with Adblock Plus filter lists 3
4 and the script that is used for building Adblock Plus filter lists from the 4 This repository contains a library for working with Adblock Plus filter lists,
5 form in which they are authored into the format suitable for consumption by the 5 a script for rendering diffs between filter lists, and the script that is used
6 adblocking software. 6 for building Adblock Plus filter lists from the form in which they are authored
7 7 into the format suitable for consumption by the adblocking software (aka
8 ## Installation 8 rendering).
9
10 .. contents::
11
12
13 Installation
14 ------------
9 15
10 Prerequisites: 16 Prerequisites:
11 17
12 * Linux, Mac OS X or Windows (any modern Unix should work too), 18 * Linux, Mac OS X or Windows (any modern Unix should work too),
13 * Python (2.7 or 3.5+), 19 * Python (2.7 or 3.5+),
14 * pip. 20 * pip.
15 21
16 To install: 22 To install::
17 23
18 $ pip install -U python-abp 24 $ pip install --upgrade python-abp
19 25
20 ## Rendering of filter lists 26
27 Rendering of filter lists
28 -------------------------
21 29
22 The filter lists are originally authored in relatively smaller parts focused 30 The filter lists are originally authored in relatively smaller parts focused
23 on a particular type of filters, related to a specific topic or relevant 31 on particular types of filters, related to a specific topic or relevant for a
24 for particular geographical area. 32 particular geographical area.
25 We call these parts _filter list fragments_ (or just _fragments_) 33 We call these parts *filter list fragments* (or just *fragments*) to
26 to distinguish them from full filter lists that are 34 distinguish them from full filter lists that are consumed by the adblocking
27 consumed by the adblocking software such as Adblock Plus. 35 software such as Adblock Plus.
28 36
29 Rendering is a process that combines filter list fragments into a filter list. 37 Rendering is a process that combines filter list fragments into a filter list.
30 It starts with one fragment that can include other ones and so forth. 38 It starts with one fragment that can include other ones and so forth.
31 The produced filter list is marked with a [version and a timestamp][1]. 39 The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_.
32 40
33 Python-abp contains a script that can do this called `flrender`: 41 Python-abp contains a script that can do this called ``flrender``::
34 42
35 $ flrender fragment.txt filterlist.txt 43 $ flrender fragment.txt filterlist.txt
36 44
37 This will take the top level fragment in `fragment.txt`, render it and save into 45
38 `filterlist.txt`. 46 This will take the top level fragment in ``fragment.txt``, render it and save it
39 47 into ``filterlist.txt``.
40 The `flrender` script can also be used by only specifying `fragment.txt`: 48
41 49 The ``flrender`` script can also be used by only specifying ``fragment.txt``::
42 $flrender fragment.txt 50
43 51 $ flrender fragment.txt
44 in which case the rendering result will be sent to `stdout`. Moreover, when 52
45 it's run with no positional arguments: 53
46 54 in which case the rendering result will be sent to ``stdout``. Moreover, when
47 $flrender 55 it's run with no positional arguments::
48 56
49 it will read from `stdin` and send the results to `stdout`. 57 $ flrender
58
59
60 it will read from ``stdin`` and send the results to ``stdout``.
50 61
51 Fragments might reference other fragments that should be included into them. 62 Fragments might reference other fragments that should be included into them.
52 The references come in two forms: http(s) includes and local includes: 63 The references come in two forms: http(s) includes and local includes::
53 64
54 %include http://www.server.org/dir/list.txt% 65 %include http://www.server.org/dir/list.txt%
55 %include easylist:easylist/easylist_general_block.txt% 66 %include easylist:easylist/easylist_general_block.txt%
56 67
57 The first instruction contains a URL that will be fetched and inserted at the 68
58 point of reference. 69 The http include contains a URL that will be fetched and inserted at the point
59 The second one contains a path inside easylist repository. 70 of reference.
60 `flrender` needs to be able to find a copy of the repository on the local 71 The local include contains a path inside the easylist repository.
61 filesystem. We use `-i` option to point it to the right directory: 72 ``flrender`` needs to be able to find a copy of the repository on the local
73 filesystem. We use ``-i`` option to point it to the right directory::
62 74
63 $ flrender -i easylist=/home/abc/easylist input.txt output.txt 75 $ flrender -i easylist=/home/abc/easylist input.txt output.txt
64 76
65 Now the second reference above will be resolved to 77
66 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will 78 Now the local include referenced above will be resolved to:
67 be loaded from this file. 79 ``/home/abc/easylist/easylist/easylist_general_block.txt``
80 and the fragment will be loaded from this file.
68 81
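
The include resolution described in the README text above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the code from ``abp/filters/sources.py``: the helper name ``resolve_include`` and the ``sources`` dictionary (source name mapped to a local directory, as passed via ``-i``) are assumptions made for this example.

```python
import posixpath
import re

# Hypothetical helper, NOT part of python-abp: it mirrors the README's
# description of how an include instruction is interpreted.
INCLUDE_RE = re.compile(r'^%include\s+(.+)%$')


def resolve_include(line, sources):
    """Return ('http', url) or ('local', path) for an %include line.

    `sources` maps a source name (the part before ":") to a directory,
    like the NAME=PATH pairs given to the -i option.
    """
    match = INCLUDE_RE.match(line.strip())
    if match is None:
        raise ValueError('Not an include instruction: {!r}'.format(line))
    target = match.group(1).strip()
    # http(s) includes are fetched as-is.
    if target.startswith(('http://', 'https://')):
        return 'http', target
    # Local includes have the form "<source name>:<path inside source>".
    name, _, path = target.partition(':')
    if name not in sources:
        raise ValueError("Unknown source: '{}'".format(name))
    return 'local', posixpath.join(sources[name], path)


print(resolve_include(
    '%include easylist:easylist/easylist_general_block.txt%',
    {'easylist': '/home/abc/easylist'},
))
```

With the ``-i easylist=/home/abc/easylist`` mapping from the README, the local include resolves to the same path the text gives.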
69 Directories that contain filter list fragments that are used during rendering 82 Directories that contain filter list fragments that are used during rendering
70 are called sources. 83 are called sources.
71 They are normally working copies of the repositories that contain filter list 84 They are normally working copies of the repositories that contain filter list
72 fragments. 85 fragments.
73 Each source is identified by a name: that's the part that comes before ":" 86 Each source is identified by a name: that's the part that comes before ":" in
74 in the include instruction and it should be the same as what comes before "=" 87 the include instruction and it should be the same as what comes before "=" in
75 in the `-i` option. 88 the ``-i`` option.
76 89
77 Commonly used sources have generally accepted names. For example the main 90 Commonly used sources have generally accepted names. For example the main
78 EasyList repository is referred to as `easylist`. 91 EasyList repository is referred to as ``easylist``.
79 If you don't know all the source names that are needed to render some list, 92 If you don't know all the source names that are needed to render some list,
80 just run `flrender` and it will report what it's missing: 93 just run ``flrender`` and it will report what it's missing::
81 94
82 $ flrender easylist.txt output/easylist.txt 95 $ flrender easylist.txt output/easylist.txt
83 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener 96 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
84 al_block.txt' from 'easylist.txt' 97 al_block.txt' from 'easylist.txt'
85 98
86 You can clone the necessary repositories to a local directory and add `-i` 99
100 You can clone the necessary repositories to a local directory and add ``-i``
87 options accordingly. 101 options accordingly.
88 102
89 ## Rendering diffs 103
104 Generating diffs
105 ----------------
90 106
91 A diff allows a client running ad blocking software such as Adblock Plus to 107 A diff allows a client running ad blocking software such as Adblock Plus to
92 update the filter lists incrementally, instead of downloading a new copy of a 108 update the filter lists incrementally, instead of downloading a new copy of a
93 full list during each update. This is meant to lessen the amount of resources 109 full list during each update. This is meant to lessen the amount of resources
94 used when updating filter lists (e.g. network data, memory usage, battery 110 used when updating filter lists (e.g. network data, memory usage, battery
95 consumption, etc.), allowing clients to update their lists more frequently using 111 consumption, etc.), allowing clients to update their lists more frequently
96 less resources. 112 using less resources.
97 113
98 Python-abp contains a script called `fldiff` that will find the diff between the 114 python-abp contains a script called ``fldiff`` that will find the diff between
99 latest filter list, and any number of previous filter lists: 115 the latest filter list, and any number of previous filter lists::
100 116
101 $ fldiff -o diffs/easylist easylist.txt archive/* 117 $ fldiff -o diffs/easylist/ easylist.txt archive/*
102 118
103 where `-o diffs/easylist` is the (optional) output directory where the diffs 119
104 should be written, `easylist.txt` is the most recent version of the filter list, 120 where ``-o diffs/easylist/`` is the (optional) output directory where the diffs
105 and `archive/*` is the directory where all the archived filter lists are. When 121 should be written, ``easylist.txt`` is the most recent version of the filter
106 called like this, the shell should automatically expand the `archive/*` 122 list, and ``archive/*`` is the directory where all the archived filter lists are.
123 When called like this, the shell should automatically expand the ``archive/*``
107 directory, giving the script each of the filenames separately. 124 directory, giving the script each of the filenames separately.
108 125
109 In the above example, the output of each archived `list[version].txt` will be 126 In the above example, the output of each archived ``list[version].txt`` will be
110 written to `diffs/diff[version].txt`. If the output argument is omitted, the 127 written to ``diffs/diff[version].txt``. If the output argument is omitted, the
111 diffs will be written to the current directory. 128 diffs will be written to the current directory.
112 129
113 The script produces three types of lines, as specified in the [technical 130 The script produces three types of lines, as specified in the `technical
114 specification][5]: 131 specification <https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/>`_:
115 132
116 * Special comments of the form `! <name>:[ <value>]` 133
117 * Added filters of the form `+ <filter-text>` 134 * Special comments of the form ``! <name>:[ <value>]``
118 * Removed filters of the form `- <filter-text>` 135 * Added filters of the form ``+ <filter-text>``
119 136 * Removed filters of the form ``- <filter-text>``
120 137
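
To make the three line types listed above concrete, here is a minimal sketch of how a consumer might classify lines of a diff. It assumes only the line forms quoted in the README; the function name ``classify_diff_line`` and the returned tuples are invented for this example and are not part of python-abp or the technical specification.

```python
import re

# Matches "! <name>:[ <value>]"; the value part is optional.
SPECIAL_COMMENT_RE = re.compile(r'^!\s*([^:]+):\s*(.*)$')


def classify_diff_line(line):
    """Classify one diff line (hypothetical helper for illustration).

    Returns ('add', text), ('remove', text) or ('comment', name, value).
    """
    line = line.rstrip('\n')
    if line.startswith('+ '):
        return ('add', line[2:])
    if line.startswith('- '):
        return ('remove', line[2:])
    match = SPECIAL_COMMENT_RE.match(line)
    if match:
        return ('comment', match.group(1).strip(), match.group(2))
    raise ValueError('Unrecognised diff line: {!r}'.format(line))


for sample in ['! Version: 123', '+ abc.com/ad$image', '- @@/abc\\.com/']:
    print(classify_diff_line(sample))
```

The sample lines reuse filters from the README's example list; the ``Version`` value is made up.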
121 ## Library API 138
122 139 Library API
123 Python-abp can also be used as a library for parsing filter lists. For example 140 -----------
141
142 python-abp can also be used as a library for parsing filter lists. For example
124 to read a filter list (we use Python 3 syntax here but the API is the same): 143 to read a filter list (we use Python 3 syntax here but the API is the same):
144
145 .. code-block:: python
125 146
126 from abp.filters import parse_filterlist 147 from abp.filters import parse_filterlist
127 148
128 with open('filterlist.txt') as filterlist: 149 with open('filterlist.txt') as filterlist:
129 for line in parse_filterlist(filterlist): 150 for line in parse_filterlist(filterlist):
130 print(line) 151 print(line)
131 152
132 If `filterlist.txt` contains a filter list: 153
154 If ``filterlist.txt`` contains this filter list::
133 155
134 [Adblock Plus 2.0] 156 [Adblock Plus 2.0]
135 ! Title: Example list 157 ! Title: Example list
136 158
137 abc.com,cdf.com##div#ad1 159 abc.com,cdf.com##div#ad1
138 abc.com/ad$image 160 abc.com/ad$image
139 @@/abc\.com/ 161 @@/abc\.com/
140 ... 162
141 163
142 the output will look something like: 164 the output will look something like:
165
166 .. code-block:: python
143 167
144 Header(version='Adblock Plus 2.0') 168 Header(version='Adblock Plus 2.0')
145 Metadata(key='Title', value='Example list') 169 Metadata(key='Title', value='Example list')
146 EmptyLine() 170 EmptyLine()
147 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) 171 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
148 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) 172 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
149 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) 173 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
150 ... 174
151 175
152 `abp.filters` module also exports a lower-level function for parsing individual 176 The ``abp.filters`` module also exports a lower-level function for parsing
153 lines of a filter list: `parse_line`. It returns a parsed line object just like 177 individual lines of a filter list: ``parse_line``. It returns a parsed line
154 the items in the iterator returned by `parse_filterlist`. 178 object just like the items in the iterator returned by ``parse_filterlist``.
155 179
156 For further information on the library API use `help()` on `abp.filters` and 180 For further information on the library API use ``help()`` on ``abp.filters`` and
157 its contents in interactive Python session, read the docstrings or look at the 181 its contents in an interactive Python session, read the docstrings, or look at
158 tests for some usage examples. 182 the tests for some usage examples.
159 183
160 ## Testing 184
161 185 Testing
162 Unit tests for `python-abp` are located in the `/tests` directory. 186 -------
163 [Pytest][2] is used for quickly running the tests 187
164 during development. 188 Unit tests for ``python-abp`` are located in the ``/tests`` directory. `Pytest <http://pytest.org/>`_
165 [Tox][3] is used for testing in different 189 is used for quickly running the tests during development. `Tox <https://tox.readthedocs.org/>`_ is used for
166 environments (Python 2.7, Python 3.5+ and PyPy) and code quality 190 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
167 reporting. 191 quality reporting.
168 192
169 In order to execute the tests, first create and activate development 193 Use tox for a comprehensive report of unit tests and test coverage::
170 virtualenv: 194
171 195 $ tox
172 $ python setup.py devenv 196
173 $ . devenv/bin/activate 197
174 198 Development
175 With the development virtualenv activated use pytest for a quick test run: 199 -----------
176 200
177 (devenv) $ pytest tests 201 When adding new functionality, add tests for it (preferably first). If some
178 202 code will never be reached on a certain version of Python, it may be exempted
179 and tox for a comprehensive report: 203 from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``.
180
181 (devenv) $ tox
182
183 ## Development
184
185 When adding new functionality, add tests for it (preferably first). Code
186 coverage (as measured by `tox -e qa`) should not decrease and the tests
187 should pass in all Tox environments.
188 204
189 All public functions, classes and methods should have docstrings compliant with 205 All public functions, classes and methods should have docstrings compliant with
190 [NumPy/SciPy documentation guide][4]. One exception is the constructors of 206 `NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_.
191 classes that the user is not expected to instantiate (such as exceptions). 207 One exception is the constructors of classes that the user is not expected to
192 208 instantiate (such as exceptions).
193 209
194 ## Using the library with R 210
195 211 Using the library with R
196 Clone the repo to you local machine. Then create a virtualenv and install 212 ------------------------
197 python abp there: 213
198 214 Clone the repo to your local machine. Then create a virtualenv and install
199 $ cd python-abp 215 python-abp there::
200 $ virtualenv env 216
201 $ pip install --upgrade . 217 $ cd python-abp
202 218 $ virtualenv env
203 Then import it with `reticulate` in R: 219 $ pip install --upgrade .
204 220
205 > library(reticulate) 221
206 > use_virtualenv("~/python-abp/env", required=TRUE) 222 Then import it with ``reticulate`` in R:
207 > abp <- import("abp.filters.rpy") 223
208 224 .. code-block:: R
209 Now you can use the functions with `abp$functionname`, e.g. 225
210 `abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")` 226 > library(reticulate)
211 227 > use_virtualenv("~/python-abp/env", required=TRUE)
212 228 > abp <- import("abp.filters.rpy")
213 [1]: https://adblockplus.org/filters#special-comments 229
214 [2]: http://pytest.org/ 230 Now you can use the functions with ``abp$functionname``, e.g.
215 [3]: https://tox.readthedocs.org/ 231 ``abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``.
216 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
217 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/