| Index: README.md |
| =================================================================== |
| --- a/README.md |
| +++ b/README.md |
| @@ -1,10 +1,23 @@ |
| + |
| # python-abp |
| -This repository contains a library for working with Adblock Plus filter lists |
| -and the script that is used for building Adblock Plus filter lists from the |
| -form in which they are authored into the format suitable for consumption by the |
| -adblocking software. |
| +This repository contains a library for working with Adblock Plus filter lists, |
| +a script for rendering diffs between filter lists, and the script that is used |
| +for building Adblock Plus filter lists from the form in which they are authored |
| +into the format suitable for consumption by the adblocking software (aka |
| +rendering). |
| +## Table of Contents |
| + |
| +- [Installation](#installation) |
| +- [Rendering of filter lists](#rendering) |
| +- [Generating diffs](#diffs) |
| +- [Library API](#library) |
| +- [Testing](#testing) |
| +- [Development](#development) |
| +- [Using the library with R](#r) |
| + |
| +<a id="installation"></a> |
| ## Installation |
| Prerequisites: |
| @@ -15,16 +28,17 @@ |
| To install: |
| - $ pip install -U python-abp |
| + $ pip install --upgrade python-abp |
| +<a id="rendering"></a> |
|
Sebastian Noack
2018/12/29 03:01:24
Injecting those HTML snippets across the Markdown-
Sebastian Noack
2018/12/29 18:29:16
I had a go translating the README to restructuredT
rhowell
2019/01/03 04:42:27
Yeah, looks good, thanks. I added a newline here a
|
| ## Rendering of filter lists |
| The filter lists are originally authored in relatively smaller parts focused |
| -on a particular type of filters, related to a specific topic or relevant |
| -for particular geographical area. |
| -We call these parts _filter list fragments_ (or just _fragments_) |
| -to distinguish them from full filter lists that are |
| -consumed by the adblocking software such as Adblock Plus. |
| +on particular types of filters, related to a specific topic or relevant for a |
| +particular geographical area. |
| +We call these parts _filter list fragments_ (or just _fragments_) to |
| +distinguish them from full filter lists that are consumed by the adblocking |
| +software such as Adblock Plus. |
| Rendering is a process that combines filter list fragments into a filter list. |
| It starts with one fragment that can include other ones and so forth. |
| @@ -34,17 +48,17 @@ |
| $ flrender fragment.txt filterlist.txt |
| -This will take the top level fragment in `fragment.txt`, render it and save into |
| -`filterlist.txt`. |
| +This will take the top level fragment in `fragment.txt`, render it and save it |
| +into `filterlist.txt`. |
| The `flrender` script can also be used by only specifying `fragment.txt`: |
| - $flrender fragment.txt |
| + $ flrender fragment.txt |
| in which case the rendering result will be sent to `stdout`. Moreover, when |
| it's run with no positional arguments: |
| - $flrender |
| + $ flrender |
| it will read from `stdin` and send the results to `stdout`. |
| @@ -54,25 +68,25 @@ |
| %include http://www.server.org/dir/list.txt% |
| %include easylist:easylist/easylist_general_block.txt% |
| -The first instruction contains a URL that will be fetched and inserted at the |
| -point of reference. |
| -The second one contains a path inside easylist repository. |
| +The http include contains a URL that will be fetched and inserted at the point |
| +of reference. |
| +The local include contains a path inside the easylist repository. |
| `flrender` needs to be able to find a copy of the repository on the local |
| filesystem. We use the `-i` option to point it to the right directory: |
| $ flrender -i easylist=/home/abc/easylist input.txt output.txt |
| -Now the second reference above will be resolved to |
| -`/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will |
| -be loaded from this file. |
| +Now the local include referenced above will be resolved to: |
| +`/home/abc/easylist/easylist/easylist_general_block.txt` |
| +and the fragment will be loaded from this file. |
| Directories that contain filter list fragments that are used during rendering |
| are called sources. |
| They are normally working copies of the repositories that contain filter list |
| fragments. |
| -Each source is identified by a name: that's the part that comes before ":" |
| -in the include instruction and it should be the same as what comes before "=" |
| -in the `-i` option. |
| +Each source is identified by a name: that's the part that comes before ":" in |
| +the include instruction and it should be the same as what comes before "=" in |
| +the `-i` option. |
| Commonly used sources have generally accepted names. For example the main |
| EasyList repository is referred to as `easylist`. |
| @@ -86,24 +100,25 @@ |
| You can clone the necessary repositories to a local directory and add `-i` |
| options accordingly. |
| -## Rendering diffs |
| +<a id="diffs"></a> |
| +## Generating diffs |
| A diff allows a client running ad blocking software such as Adblock Plus to |
| update the filter lists incrementally, instead of downloading a new copy of a |
| full list during each update. This is meant to lessen the amount of resources |
| used when updating filter lists (e.g. network data, memory usage, battery |
| -consumption, etc.), allowing clients to update their lists more frequently using |
| -less resources. |
| +consumption, etc.), allowing clients to update their lists more frequently |
| +using fewer resources. |
| -Python-abp contains a script called `fldiff` that will find the diff between the |
| -latest filter list, and any number of previous filter lists: |
| +python-abp contains a script called `fldiff` that will find the diff between |
| +the latest filter list, and any number of previous filter lists: |
| - $ fldiff -o diffs/easylist easylist.txt archive/* |
| + $ fldiff -o diffs/easylist/ easylist.txt archive/* |
| -where `-o diffs/easylist` is the (optional) output directory where the diffs |
| -should be written, `easylist.txt` is the most recent version of the filter list, |
| -and `archive/*` is the directory where all the archived filter lists are. When |
| -called like this, the shell should automatically expand the `archive/*` |
| +where `-o diffs/easylist/` is the (optional) output directory where the diffs |
| +should be written, `easylist.txt` is the most recent version of the filter |
| +list, and `archive/*` is the directory where all the archived filter lists are. |
| +When called like this, the shell should automatically expand the `archive/*` |
| directory, giving the script each of the filenames separately. |
| In the above example, the output of each archived `list[version].txt` will be |
| @@ -117,10 +132,10 @@ |
| * Added filters of the form `+ <filter-text>` |
| * Removed filters of the form `- <filter-text>` |
| - |
| +<a id="library"></a> |
| ## Library API |
| -Python-abp can also be used as a library for parsing filter lists. For example |
| +python-abp can also be used as a library for parsing filter lists. For example |
| to read a filter list (we use Python 3 syntax here but the API is the same): |
| from abp.filters import parse_filterlist |
| @@ -129,7 +144,7 @@ |
| for line in parse_filterlist(filterlist): |
| print(line) |
| -If `filterlist.txt` contains a filter list: |
| +If `filterlist.txt` contains this filter list: |
| [Adblock Plus 2.0] |
| ! Title: Example list |
| @@ -137,7 +152,6 @@ |
| abc.com,cdf.com##div#ad1 |
| abc.com/ad$image |
| @@/abc\.com/ |
| - ... |
| the output will look something like: |
| @@ -147,26 +161,24 @@ |
| Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) |
| Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) |
| Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) |
| - ... |
| -`abp.filters` module also exports a lower-level function for parsing individual |
| -lines of a filter list: `parse_line`. It returns a parsed line object just like |
| -the items in the iterator returned by `parse_filterlist`. |
| +The `abp.filters` module also exports a lower-level function for parsing |
| +individual lines of a filter list: `parse_line`. It returns a parsed line |
| +object just like the items in the iterator returned by `parse_filterlist`. |
| For further information on the library API use `help()` on `abp.filters` and |
| -its contents in interactive Python session, read the docstrings or look at the |
| -tests for some usage examples. |
| +its contents in an interactive Python session, read the docstrings, or look at |
| +the tests for some usage examples. |
| +<a id="testing"></a> |
| ## Testing |
| -Unit tests for `python-abp` are located in the `/tests` directory. |
| -[Pytest][2] is used for quickly running the tests |
| -during development. |
| -[Tox][3] is used for testing in different |
| -environments (Python 2.7, Python 3.5+ and PyPy) and code quality |
| -reporting. |
| +Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2] |
| +is used for quickly running the tests during development. [Tox][3] is used for |
| +testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code |
| +quality reporting. |
| -In order to execute the tests, first create and activate development |
| +In order to execute the tests, first create and activate a development |
| virtualenv: |
| $ python setup.py devenv |
| @@ -180,17 +192,18 @@ |
| (devenv) $ tox |
|
Sebastian Noack
2019/01/03 05:22:03
Running tox is still relevant, but you would run i
rhowell
2019/01/03 21:32:28
Done.
|
| +<a id="development"></a> |
| ## Development |
| -When adding new functionality, add tests for it (preferably first). Code |
| -coverage (as measured by `tox -e qa`) should not decrease and the tests |
| -should pass in all Tox environments. |
| +When adding new functionality, add tests for it (preferably first). If some |
| +code will never be reached on a certain version of Python, it may be exempted |
| +from coverage tests by adding a comment, e.g. `# pragma: no py2 cover`. |
| All public functions, classes and methods should have docstrings compliant with |
| [NumPy/SciPy documentation guide][4]. One exception is the constructors of |
| classes that the user is not expected to instantiate (such as exceptions). |
| - |
| +<a id="r"></a> |
| ## Using the library with R |
| Clone the repo to your local machine. Then create a virtualenv and install |