Unified Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)
Patch Set: Update README to match the changes from https://codereview.adblockplus.org/29465715/ Created Aug. 7, 2017, 8:28 p.m.
Index: README.md
===================================================================
--- a/README.md
+++ b/README.md
@@ -1,20 +1,21 @@
# python-abp
-This repository contains the script that is used for building Adblock Plus
-filter lists from the form in which they are authored into the format suitable
-for consumption by the adblocking software.
+This repository contains a library for working with Adblock Plus filter lists
+and the script that is used for building Adblock Plus filter lists from the
+form in which they are authored into the format suitable for consumption by the
+adblocking software.
mathias 2017/08/08 12:24:35 For an introduction that is a bit too much. How ab
## Installation
Prerequisites:
* Linux, Mac OS X or Windows (any modern Unix should work too),
-* Python (2.7 or 3.5),
+* Python (2.7 or 3.5+),
* pip.
To install:
$ pip install -U python-abp
## Rendering of filter lists
@@ -23,30 +24,30 @@
for particular geographical area.
We call these parts _filter list fragments_ (or just _fragments_)
to distinguish them from full filter lists that are
consumed by the adblocking software such as Adblock Plus.
Rendering is a process that combines filter list fragments into a filter list.
It starts with one fragment that can include other ones and so forth.
The produced filter list is marked with a version, a timestamp and
-a [checksum](https://adblockplus.org/filters#special-comments).
+a [checksum][1].
Python-abp contains a script that can do this called `flrender`:
$ flrender fragment.txt filterlist.txt
This will take the top level fragment in `fragment.txt`, render it and save into
`filterlist.txt`.
Fragments might reference other fragments that should be included into them.
The references come in two forms: http(s) includes and local includes:
%include http://www.server.org/dir/list.txt%
- %include easylist:easylist/easylist_general_block.txt
+ %include easylist:easylist/easylist_general_block.txt%
The first instruction contains a URL that will be fetched and inserted at the
point of reference.
The second one contains a path inside easylist repository.
`flrender` needs to be able to find a copy of the repository on the local
filesystem. We use the `-i` option to point it to the right directory:
$ flrender -i easylist=/home/abc/easylist input.txt output.txt
@@ -70,30 +71,150 @@
$ flrender easylist.txt output/easylist.txt
Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
al_block.txt' from 'easylist.txt'
You can clone the necessary repositories to a local directory and add `-i`
options accordingly.
+## Library API
+
+Python-abp can also be used as a library for parsing filter lists. For example
+to read a filter list (we use Python 3 syntax here but the API is the same):
+
+ from abp.filters import parse_filterlist
+
+ with open('filterlist.txt') as filterlist:
+     for line in parse_filterlist(filterlist):
+         print(line)
+
+If `filterlist.txt` contains a filter list:
+
+ [Adblock Plus 2.0]
+ ! Title: Example list
+
+ abc.com,cdf.com##div#ad1
+ abc.com/ad$image
+ @@/abc\.com/
+ ...
+
+the output will look similar to the following:
+
+ Header(version='Adblock Plus 2.0')
+ Metadata(key='Title', value='Example list')
+ EmptyLine()
+ Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
+ Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
+ Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
+ ...
+
+In general `parse_filterlist` takes an iterable of strings (such as a list or
+an open file) and returns an iterable of parsed filter list lines. Each line
+will have its `.type` attribute set to a string indicating its type. It will
+also have a `.to_string()` method that converts it to a unicode string in the
+filter list format (most of the time it's the same as the string from which the
+filter was parsed). Further attributes depend on the type of the line.
+
+**Note:** `parse_filterlist` returns an iterator, not a list, and only consumes
+the input lines when its output is iterated over. This allows much more memory
+efficient handling of large filter lists, however there are two things to watch
+out for:
+
+**Note:** iteration over parsed lines may throw a `ParseError` exception if a
+line cannot be parsed. The exception will contain the information about the
+error and the original line that failed parsing.
mathias 2017/08/08 12:24:35 It is not clear what bits this is about (I assume
Vasily Kuznetsov 2017/08/08 14:31:12 Yeah, we've discussed this. But for now that chang
+
+- When you're parsing filters from a file, you need to complete the iteration
+ before you close the file.
+- Once you iterate over the output of `parse_filterlist`, it will be
+ consumed and you won't be able to iterate over it again.
+
+If you find that this is bothering you, you probably want to convert the output
mathias 2017/08/08 12:24:34 Everything in this section from here on, maybe inc
+of `parse_filterlist` to a list:
+
+ lines_list = list(parse_filterlist(filterlist))
+
+This will load the whole file into memory but unless you're dealing with a
+gigantic filter list that should not be a problem.
+
+### Line types
+
+As mentioned above, lines of different types have different attributes:
+
+| type | attributes |
mathias 2017/08/08 12:24:35 Are you sure this kind of table markup is supporte
Vasily Kuznetsov 2017/08/08 14:31:12 Indeed the table markup was not part of the origin
+|------------|------------------------------------------------------------------------|
+| header | `version` - plugin version string |
+| emptyline | no attributes |
+| comment | `text` - text of the comment |
+| metadata | `key` - name of the metadata field, `value` - value of the field |
+| include | `target` - url/path of the file to include |
+| filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |
+
+#### Filter attributes
mathias 2017/08/08 12:24:35 This section mentions "Selector" but not ".selecto
+
+Selector is a dictionary with two keys:
+
+| key | meaning |
+|--------------|------------------------------------------------------------------|
+| type | 'css', 'abp-simple', 'url-pattern', 'url-regexp', 'extended-css' |
+| value | the selector itself, the meaning is type-dependent |
+
+It's preferable to import `SELECTOR_TYPE` namespace from `abp.filters` to refer
+to filter types instead of using strings. `SELECTOR_TYPE` contains constants
+for each filter type: `SELECTOR_TYPE.CSS`, `SELECTOR_TYPE.ABP_SIMPLE`,
+`SELECTOR_TYPE.URL_PATTERN`, `SELECTOR_TYPE.URL_REGEXP` and
+`SELECTOR_TYPE.XCSS`.
+
+Action instructs adblocking software on what should be done with the items
+matching the selector:
+
+| action | meaning |
+|--------|------------------------------------------------------------------------|
+| block | block http(s) request that matches the selector |
+| allow | allow http(s) request that matches the filter (whitelist the resource) |
+| hide | hide the DOM element that matches the selector |
+| show | show the DOM element that matches the selector (whitelist the element) |
+
+The action constants are contained in `FILTER_ACTION` namespace, which can also
+be imported from `abp.filters` (`FILTER_ACTION.BLOCK`, `FILTER_ACTION.ALLOW`,
+etc.)
+
+Options is a list of tuples consisting of option name and option value. The
+option value is `True` or `False` for flags or, for options with a value, it's
+a string, list of strings or a list of `(string, boolean)` tuples. See
+[documentation on authoring the filter rules][2] for the list of existing
+options and their meanings.
+
+### Other functions
+
+`abp.filters` module also exports a lower-level function for parsing individual
+lines of a filter list: `parse_line`. It returns a parsed line object just like
+the items in the iterator returned by `parse_filterlist`.
+
## Testing
Unit tests for `python-abp` are located in the `/tests` directory.
-[Pytest](http://pytest.org/) is used for quickly running the tests
+[Pytest][3] is used for quickly running the tests
during development.
-[Tox](https://tox.readthedocs.org/) is used for testing in different
-environments (Python 2.7, Python 3.5 and PyPy) and code quality
+[Tox][4] is used for testing in different
+environments (Python 2.7, Python 3.5+ and PyPy) and code quality
reporting.
In order to execute the tests, first create and activate development
virtualenv:
$ python setup.py devenv
$ . devenv/bin/activate
With the development virtualenv activated use pytest for a quick test run:
- (devenv) $ py.test tests
+ (devenv) $ pytest tests
and tox for a comprehensive report:
(devenv) $ tox
+
+
+ [1]: https://adblockplus.org/filters#special-comments
+ [2]: https://adblockplus.org/filters#options
+ [3]: http://pytest.org/
+ [4]: https://tox.readthedocs.org/
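
The two iterator caveats described in the Library API section of this patch (keep the input open until iteration finishes; the output can only be iterated once) can be illustrated without installing the library. The sketch below is not part of the patch and does not call python-abp: `parse_lines_lazily` is a hypothetical stand-in that only mimics the lazy behaviour the README attributes to `parse_filterlist`, and `SAMPLE` is made-up input. Python 3 is assumed.

```python
import io


def parse_lines_lazily(lines):
    """Hypothetical stand-in for parse_filterlist (NOT the real API).

    It yields one item per input line and, like any generator, reads
    the input only while its output is being iterated over.
    """
    for line in lines:
        yield line.rstrip('\n')


# Made-up three-line input used only for this illustration.
SAMPLE = '[Adblock Plus 2.0]\n! Title: Example list\nabc.com/ad$image\n'

# Caveat 1: the input must stay open until iteration is complete.
source = io.StringIO(SAMPLE)
parsed = parse_lines_lazily(source)  # nothing has been read yet
source.close()
try:
    next(parsed)  # the first read happens here, after the close
    closed_error = None
except ValueError as exc:  # "I/O operation on closed file"
    closed_error = exc

# Caveat 2: the output is consumed by the first full iteration.
parsed = parse_lines_lazily(io.StringIO(SAMPLE))
first_pass = list(parsed)
second_pass = list(parsed)  # already exhausted, so this is empty

# Workaround from the README: materialise the output as a list once,
# then iterate over the list as often as needed.
lines_list = list(parse_lines_lazily(io.StringIO(SAMPLE)))
```

With the real library the same pattern would presumably apply to the iterator returned by `parse_filterlist`, but that is an inference from the README text rather than something verified here.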