Unified Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)
Patch Set: Created June 14, 2017, 5:45 p.m.
Index: README.md
===================================================================
--- a/README.md
+++ b/README.md
@@ -1,20 +1,21 @@
# python-abp
-This repository contains the script that is used for building Adblock Plus
-filter lists from the form in which they are authored into the format suitable
-for consumption by the adblocking software.
+This repository contains a library for working with Adblock Plus filter lists
+and the script that is used for building Adblock Plus filter lists from the
+form in which they are authored into the format suitable for consumption by the
+adblocking software.
## Installation
Prerequisites:
* Linux, Mac OS X or Windows (any modern Unix should work too),
-* Python (2.7 or 3.5),
+* Python (2.7, 3.5 or 3.6),
* pip.
To install:
$ pip install -U python-abp
## Rendering of filter lists
@@ -23,30 +24,30 @@
for particular geographical area.
We call these parts _filter list fragments_ (or just _fragments_)
to distinguish them from full filter lists that are
consumed by the adblocking software such as Adblock Plus.
Rendering is a process that combines filter list fragments into a filter list.
It starts with one fragment that can include other ones and so forth.
The produced filter list is marked with a version, a timestamp and
-a [checksum](https://adblockplus.org/filters#special-comments).
+a [checksum][1].
Python-abp contains a script that can do this called `flrender`:
$ flrender fragment.txt filterlist.txt
This will take the top level fragment in `fragment.txt`, render it and save into
`filterlist.txt`.
Fragments might reference other fragments that should be included into them.
The references come in two forms: http(s) includes and local includes:
%include http://www.server.org/dir/list.txt%
- %include easylist:easylist/easylist_general_block.txt
+ %include easylist:easylist/easylist_general_block.txt%
The first instruction contains a URL that will be fetched and inserted at the
point of reference.
The second one contains a path inside easylist repository.
`flrender` needs to be able to find a copy of the repository on the local
filesystem. We use the `-i` option to point it to the right directory:
$ flrender -i easylist=/home/abc/easylist input.txt output.txt
@@ -70,30 +71,148 @@
$ flrender easylist.txt output/easylist.txt
Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
al_block.txt' from 'easylist.txt'
You can clone the necessary repositories to a local directory and add `-i`
options accordingly.
+## Library API
+
+Python-abp can also be used as a library for parsing filter lists. For example,
+to read a filter list (we use Python 3 syntax here but the API is the same in
+Python 2):
+
+    from abp.filters import parse_filterlist
+
+    with open('filterlist.txt') as filterlist:
+        for line in parse_filterlist(filterlist):
+            print(line)
+
+If `filterlist.txt` contains a filter list, the output will look similar to
+the following:
+
+ Header(version='Adblock Plus 2.0')
+ Metadata(key='Title', value='Example List')
+ EmptyLine()
+ Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value':
+ 'div#ad1'}, action='hide', options={'domains-include': ['abc.com',
+ 'cdf.com'], 'domains-none': True})
+ Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value':
+ 'abc.com/ad'}, action='block', options={'types-none': True,
+ 'types-include': ['image']})
+ Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value':
+ 'abc\\.com'}, action='allow', options={})
+ ...
+
+In general, `parse_filterlist` takes an iterable of strings (such as a list or
+an open file) and returns an iterable of parsed filter list lines. Each line
+will have its `.type` attribute set to a string indicating its type. It will
+also have a `.to_string()` method that converts it to a unicode string in the
+filter list format (most of the time it's the same as the string from which the
+filter was parsed). Further attributes depend on the type of the line.
+
+**Note:** `parse_filterlist` returns an iterator, not a list, and only consumes
+the input lines when its output is iterated over. This allows much more
+memory-efficient handling of large filter lists, but there are two things to
+watch out for:
+
+- When you're parsing filters from a file, you need to complete the iteration
+  before you close the file.
+- Once you have iterated over the output of `parse_filterlist`, it is
+  consumed and you won't be able to iterate over it again.
+
+If either of these issues affects you, you probably want to
+convert the output of `parse_filterlist` to a list:
+
+ lines_list = list(parse_filterlist(filterlist))
+
+This will load the whole file into memory but unless you're dealing with a
+gigantic filter list that should not be a problem.
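
The one-shot behaviour described in the note is ordinary Python iterator behaviour, so it can be illustrated without the library. The sketch below uses a hypothetical stand-in generator, `fake_parse`, which is not part of python-abp and merely strips lines. It only shows why a second pass over a lazy result comes back empty and why converting to a list avoids that.

```python
def fake_parse(lines):
    """Hypothetical stand-in for a lazy parser: yields one item per input line."""
    for line in lines:
        yield line.strip()


source = ['[Adblock Plus 2.0]\n', '! Title: Example List\n', 'abc.com/ad$image\n']

lazy = fake_parse(source)
first_pass = list(lazy)    # consumes the generator
second_pass = list(lazy)   # the generator is exhausted, so this is empty

# Converting to a list once gives a result that can be traversed repeatedly.
materialized = list(fake_parse(source))
lengths = [len(materialized), len(materialized)]
```

The same reasoning applies to files: a lazy result must be fully consumed, or converted to a list, inside the `with` block that holds the file open.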
+
+### Line types
+
+As mentioned before, lines of different types have different attributes:
+
+| type | attributes |
+|------------|------------------------------------------------------------------------|
+| header | `version` - plugin version string |
+| emptyline | no attributes |
+| comment | `text` - text of the comment |
+| metadata | `key` - name of the metadata field, `value` - value of the field |
+| include | `target` - url/path of the file to include |
+| invalid | `text` - full text of the line, `error` - error message |
+| filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |
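
Because every parsed line carries a `.type` string, code that consumes the parser output can dispatch on it. The following is a minimal sketch of that idea. The `Comment` and `Invalid` namedtuples are hypothetical stand-ins created only for this example (real line objects would come from `parse_filterlist`), and `summarize` is an illustrative helper, not a library function.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins that mimic the attributes listed in the table above.
Comment = namedtuple('Comment', 'type text')
Invalid = namedtuple('Invalid', 'type text error')


def summarize(lines):
    """Count lines by their `.type` and collect (text, error) of invalid ones."""
    counts = Counter()
    errors = []
    for line in lines:
        counts[line.type] += 1
        if line.type == 'invalid':
            errors.append((line.text, line.error))
    return counts, errors


sample = [
    Comment('comment', 'first comment'),
    Invalid('invalid', '%bad', 'example error message'),
    Comment('comment', 'second comment'),
]
counts, errors = summarize(sample)
```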
+
+#### Filter attributes
+
+Selector is a dictionary with two keys:
+
+| key | meaning |
+|--------------|----------------------------------------------------|
+| type | 'css', 'abp-simple', 'url-pattern', 'url-regexp' |
+| value | the selector itself, the meaning is type-dependent |
+
+Options is a dictionary with a variable set of keys. Only options that are
+actually present in the filter will be stored there. The list of possible options
+and their meanings can be found in [documentation on authoring the filter
+rules][2].
+
+There are four classes of options that are handled differently:
+
+- Type options (that make the rule apply or not apply to certain types of
+ requests and resources):
+ - `types-include`: List of additional types to which the rule applies.
+ - `types-exclude`: List of types to which the rule doesn't apply.
+ - `types-none`: If this is `True`, the filter only applies to the types
+ in `types-include`. Otherwise all types except for `document`, `popup`,
+ `elemhide`, `generichide` and `genericblock` are implicitly included.
+- Domain options (that make the rule apply or not apply to specific domains):
+ - `domains-include`: List of domains to which the rule applies (it will also
+ apply to any subdomains unless they are excluded).
+ - `domains-exclude`: Excluded domains (their subdomains are also excluded
+ unless specifically included).
+ - `domains-none`: If this is `True`, all domains that are not mentioned by
+ `domains-include` and `domains-exclude` are excluded. Otherwise they are
+ included.
+- `sitekeys`: List of sitekeys that can be used to activate the rule.
+- Flags: `third-party`, `collapse`, `match-case`, etc. See [documentation][2]
+ for more information on their meaning.
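
The type and domain options above can be read as a small decision procedure. The sketch below is one possible interpretation, not code from python-abp: `applies_to_type` and `applies_to_domain` are hypothetical helpers, and the assumptions that an explicit exclusion beats an inclusion for types and that the most specific mentioned domain wins are ours, based on the descriptions in the list.

```python
# Types that are not implicitly included, as listed in the description above.
NON_DEFAULT_TYPES = {'document', 'popup', 'elemhide', 'generichide', 'genericblock'}


def applies_to_type(options, request_type):
    """Hypothetical helper: does a filter with `options` apply to this type?"""
    if request_type in options.get('types-exclude', []):
        return False
    if request_type in options.get('types-include', []):
        return True
    if options.get('types-none'):
        # Only the explicitly included types apply.
        return False
    return request_type not in NON_DEFAULT_TYPES


def applies_to_domain(options, domain):
    """Hypothetical helper: the most specific mentioned (sub)domain wins."""
    included = options.get('domains-include', [])
    excluded = options.get('domains-exclude', [])
    parts = domain.split('.')
    # Check the domain itself, then each parent domain in turn.
    for i in range(len(parts)):
        candidate = '.'.join(parts[i:])
        if candidate in included:
            return True
        if candidate in excluded:
            return False
    # Not mentioned at all: excluded only if `domains-none` is set.
    return not options.get('domains-none', False)


domain_opts = {'domains-include': ['abc.com'],
               'domains-exclude': ['ads.abc.com'],
               'domains-none': True}
type_opts = {'types-none': True, 'types-include': ['image']}
```

With these example options, `www.abc.com` matches through its parent `abc.com`, while `x.ads.abc.com` is rejected because the closer ancestor `ads.abc.com` is excluded.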
+
+### Other functions
+
+The `abp.filters` module also exports two lower-level functions for parsing
+individual lines of a filter list or individual filters. They are called
+`parse_line` and `parse_filter` respectively. Both return a parsed line object
+just like the items in the iterator returned by `parse_filterlist`. The
+difference between them is that `parse_line` tries to detect the line type,
+whereas `parse_filter` always tries to interpret its input as a filter. Both
+functions raise a `ParseError` exception instead of returning a line with
+`type="invalid"`.
+
## Testing
Unit tests for `python-abp` are located in the `/tests` directory.
-[Pytest](http://pytest.org/) is used for quickly running the tests
+[Pytest][3] is used for quickly running the tests
during development.
-[Tox](https://tox.readthedocs.org/) is used for testing in different
-environments (Python 2.7, Python 3.5 and PyPy) and code quality
+[Tox][4] is used for testing in different
+environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality
reporting.
In order to execute the tests, first create and activate development
virtualenv:
$ python setup.py devenv
$ . devenv/bin/activate
With the development virtualenv activated use pytest for a quick test run:
- (devenv) $ py.test tests
+ (devenv) $ pytest tests
and tox for a comprehensive report:
(devenv) $ tox
+
+
+ [1]: https://adblockplus.org/filters#special-comments
+ [2]: https://adblockplus.org/filters#options
+ [3]: http://pytest.org/
+ [4]: https://tox.readthedocs.org/