Index: README.md |
=================================================================== |
--- a/README.md |
+++ b/README.md |
@@ -1,10 +1,23 @@ |
+ |
# python-abp |
-This repository contains a library for working with Adblock Plus filter lists |
-and the script that is used for building Adblock Plus filter lists from the |
-form in which they are authored into the format suitable for consumption by the |
-adblocking software. |
+This repository contains a library for working with Adblock Plus filter lists, |
+a script for rendering diffs between filter lists, and the script that is used |
+for building Adblock Plus filter lists from the form in which they are authored |
+into the format suitable for consumption by the adblocking software (aka |
+rendering). |
+## Table of Contents |
+ |
+- [Installation](#installation) |
+- [Rendering of filter lists](#rendering) |
+- [Generating diffs](#diffs) |
+- [Library API](#library) |
+- [Testing](#testing) |
+- [Development](#development) |
+- [Using the library with R](#r) |
+ |
+<a id="installation"></a> |
## Installation |
Prerequisites: |
@@ -15,16 +28,17 @@ |
To install: |
- $ pip install -U python-abp |
+ $ pip install --upgrade python-abp |
+<a id="rendering"></a> |
## Rendering of filter lists |
The filter lists are originally authored in relatively smaller parts focused |
-on a particular type of filters, related to a specific topic or relevant |
-for particular geographical area. |
-We call these parts _filter list fragments_ (or just _fragments_) |
-to distinguish them from full filter lists that are |
-consumed by the adblocking software such as Adblock Plus. |
+on particular types of filters, related to a specific topic or relevant for a |
+particular geographical area. |
+We call these parts _filter list fragments_ (or just _fragments_) to |
+distinguish them from full filter lists that are consumed by the adblocking |
+software such as Adblock Plus. |
Rendering is a process that combines filter list fragments into a filter list. |
It starts with one fragment that can include other ones and so forth. |
@@ -34,17 +48,17 @@ |
$ flrender fragment.txt filterlist.txt |
-This will take the top level fragment in `fragment.txt`, render it and save into |
-`filterlist.txt`. |
+This will take the top level fragment in `fragment.txt`, render it and save it |
+into `filterlist.txt`. |
The `flrender` script can also be used by only specifying `fragment.txt`: |
- $flrender fragment.txt |
+ $ flrender fragment.txt |
in which case the rendering result will be sent to `stdout`. Moreover, when |
it's run with no positional arguments: |
- $flrender |
+ $ flrender |
it will read from `stdin` and send the results to `stdout`. |
@@ -54,25 +68,25 @@ |
%include http://www.server.org/dir/list.txt% |
%include easylist:easylist/easylist_general_block.txt% |
-The first instruction contains a URL that will be fetched and inserted at the |
-point of reference. |
-The second one contains a path inside easylist repository. |
+The HTTP include contains a URL that will be fetched and inserted at the point |
+of reference. |
+The local include contains a path inside the easylist repository. |
`flrender` needs to be able to find a copy of the repository on the local |
filesystem. We use the `-i` option to point it to the right directory: |
$ flrender -i easylist=/home/abc/easylist input.txt output.txt |
-Now the second reference above will be resolved to |
-`/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will |
-be loaded from this file. |
+Now the local include referenced above will be resolved to: |
+`/home/abc/easylist/easylist/easylist_general_block.txt` |
+and the fragment will be loaded from this file. |
Directories that contain filter list fragments that are used during rendering |
are called sources. |
They are normally working copies of the repositories that contain filter list |
fragments. |
-Each source is identified by a name: that's the part that comes before ":" |
-in the include instruction and it should be the same as what comes before "=" |
-in the `-i` option. |
+Each source is identified by a name: that's the part that comes before ":" in |
+the include instruction and it should be the same as what comes before "=" in |
+the `-i` option. |
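
The name matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how `flrender` is implemented: the helper names `parse_sources` and `resolve_include` are hypothetical, and error handling is reduced to a single `KeyError`.

```python
import posixpath


def parse_sources(options):
    """Turn '-i' style values such as 'easylist=/home/abc/easylist'
    into a {name: directory} mapping (hypothetical helper)."""
    sources = {}
    for option in options:
        name, _, directory = option.partition('=')
        sources[name] = directory
    return sources


def resolve_include(target, sources):
    """Resolve the target of an include instruction (hypothetical helper).

    URLs are returned unchanged; 'name:path' targets are looked up in
    the sources mapping and joined with the source directory.
    """
    if target.startswith(('http://', 'https://')):
        return target
    name, _, path = target.partition(':')
    if name not in sources:
        raise KeyError('Unknown source: ' + name)
    return posixpath.join(sources[name], path)


sources = parse_sources(['easylist=/home/abc/easylist'])
print(resolve_include('easylist:easylist/easylist_general_block.txt', sources))
```

With the mapping from the `-i` example, the include target resolves to the same path as shown in the text.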
Commonly used sources have generally accepted names. For example the main |
EasyList repository is referred to as `easylist`. |
@@ -86,24 +100,25 @@ |
You can clone the necessary repositories to a local directory and add `-i` |
options accordingly. |
-## Rendering diffs |
+<a id="diffs"></a> |
+## Generating diffs |
A diff allows a client running ad blocking software such as Adblock Plus to |
update the filter lists incrementally, instead of downloading a new copy of a |
full list during each update. This is meant to lessen the amount of resources |
used when updating filter lists (e.g. network data, memory usage, battery |
-consumption, etc.), allowing clients to update their lists more frequently using |
-less resources. |
+consumption, etc.), allowing clients to update their lists more frequently |
+using fewer resources. |
-Python-abp contains a script called `fldiff` that will find the diff between the |
-latest filter list, and any number of previous filter lists: |
+python-abp contains a script called `fldiff` that will find the diff between |
+the latest filter list and any number of previous filter lists: |
- $ fldiff -o diffs/easylist easylist.txt archive/* |
+ $ fldiff -o diffs/easylist/ easylist.txt archive/* |
-where `-o diffs/easylist` is the (optional) output directory where the diffs |
-should be written, `easylist.txt` is the most recent version of the filter list, |
-and `archive/*` is the directory where all the archived filter lists are. When |
-called like this, the shell should automatically expand the `archive/*` |
+where `-o diffs/easylist/` is the (optional) output directory where the diffs |
+should be written, `easylist.txt` is the most recent version of the filter |
+list, and `archive/*` is the directory where all the archived filter lists are. |
+When called like this, the shell should automatically expand the `archive/*` |
directory, giving the script each of the filenames separately. |
In the above example, the output of each archived `list[version].txt` will be |
@@ -117,10 +132,10 @@ |
* Added filters of the form `+ <filter-text>` |
* Removed filters of the form `- <filter-text>` |
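
To make the `+`/`-` line forms above concrete, here is a minimal sketch that computes added and removed filters between two versions of a list. It is not the `fldiff` implementation: the function name `filter_diff` is hypothetical, and both the sorted output order and the skipping of header and comment lines are assumptions made for the example.

```python
def filter_diff(old_lines, new_lines):
    """Return '+ <filter-text>' and '- <filter-text>' lines describing
    how to get from old_lines to new_lines (hypothetical helper)."""

    def filters(lines):
        # Assumption: anything that is not empty, a '[...]' header or a
        # '!' comment counts as a filter.
        result = set()
        for line in lines:
            text = line.strip()
            if text and not text.startswith(('!', '[')):
                result.add(text)
        return result

    old, new = filters(old_lines), filters(new_lines)
    added = ['+ ' + text for text in sorted(new - old)]
    removed = ['- ' + text for text in sorted(old - new)]
    return added + removed


old_list = ['[Adblock Plus 2.0]', '! Title: Example list',
            'abc.com,cdf.com##div#ad1', 'abc.com/ad$image']
new_list = ['[Adblock Plus 2.0]', '! Title: Example list',
            'abc.com/ad$image', '@@/abc\\.com/']

for line in filter_diff(old_list, new_list):
    print(line)
```

A client holding the old list could apply such lines by adding every `+` filter and dropping every `-` filter, which is why the diff is usually much smaller than the full list.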
- |
+<a id="library"></a> |
## Library API |
-Python-abp can also be used as a library for parsing filter lists. For example |
+python-abp can also be used as a library for parsing filter lists. For example, |
to read a filter list (we use Python 3 syntax here but the API is the same): |
from abp.filters import parse_filterlist |
@@ -129,7 +144,7 @@ |
for line in parse_filterlist(filterlist): |
print(line) |
-If `filterlist.txt` contains a filter list: |
+If `filterlist.txt` contains this filter list: |
[Adblock Plus 2.0] |
! Title: Example list |
@@ -137,7 +152,6 @@ |
abc.com,cdf.com##div#ad1 |
abc.com/ad$image |
@@/abc\.com/ |
- ... |
the output will look something like: |
@@ -147,26 +161,24 @@ |
Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) |
Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) |
Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) |
- ... |
-`abp.filters` module also exports a lower-level function for parsing individual |
-lines of a filter list: `parse_line`. It returns a parsed line object just like |
-the items in the iterator returned by `parse_filterlist`. |
+The `abp.filters` module also exports a lower-level function for parsing |
+individual lines of a filter list: `parse_line`. It returns a parsed line |
+object just like the items in the iterator returned by `parse_filterlist`. |
For further information on the library API use `help()` on `abp.filters` and |
-its contents in interactive Python session, read the docstrings or look at the |
-tests for some usage examples. |
+its contents in an interactive Python session, read the docstrings, or look at |
+the tests for some usage examples. |
+<a id="testing"></a> |
## Testing |
-Unit tests for `python-abp` are located in the `/tests` directory. |
-[Pytest][2] is used for quickly running the tests |
-during development. |
-[Tox][3] is used for testing in different |
-environments (Python 2.7, Python 3.5+ and PyPy) and code quality |
-reporting. |
+Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2] |
+is used for quickly running the tests during development. [Tox][3] is used for |
+testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code |
+quality reporting. |
-In order to execute the tests, first create and activate development |
+In order to execute the tests, first create and activate a development |
virtualenv: |
$ python setup.py devenv |
@@ -180,17 +192,18 @@ |
(devenv) $ tox |
+<a id="development"></a> |
## Development |
-When adding new functionality, add tests for it (preferably first). Code |
-coverage (as measured by `tox -e qa`) should not decrease and the tests |
-should pass in all Tox environments. |
+When adding new functionality, add tests for it (preferably first). If some |
+code will never be reached on a certain version of Python, it may be exempted |
+from coverage tests by adding a comment, e.g. `# pragma: no py2 cover`. |
All public functions, classes and methods should have docstrings compliant with |
[NumPy/SciPy documentation guide][4]. One exception is the constructors of |
classes that the user is not expected to instantiate (such as exceptions). |
- |
+<a id="r"></a> |
## Using the library with R |
Clone the repo to your local machine. Then create a virtualenv and install |