Index: README.md |
=================================================================== |
new file mode 100644 |
--- /dev/null |
+++ b/README.md |
@@ -0,0 +1,48 @@ |
+abpcrawler |
+========== |
+ |
+This tool loads a list of websites in Firefox and records which requests are |
+blocked by the [Adblock Plus extension](http://adblockplus.org). |
+ |
+Requirements |
+------------ |
+ |
+* [Python 2.7](https://www.python.org) |
+* [The Jinja2 module](http://jinja.pocoo.org/docs) |
+* [The mozrunner module](https://pypi.python.org/pypi/mozrunner) |
+* [Firefox](https://www.mozilla.org/en-US/firefox/) |
+ |
+Running |
+------- |
+ |
+Execute the following: |
+ |
+    ./run.py -b /usr/bin/firefox urls.txt outputdir |
+ |
+This will run the specified Firefox binary to crawl the URLs from `urls.txt` |
+(one URL per line). The resulting data and screenshots will be written to the |
+`outputdir` directory. Firefox will close automatically once all URLs have been |
+processed. |
+ |
+The complete list of command line flags: |
+ |
+    -h, --help            show help message and exit |
+    -b BINARY, --binary BINARY |
+                          path to the Firefox binary |
+    -a ABPDIR, --abpdir ABPDIR |
+                          path to the Adblock Plus repository |
+    -f url [url ...], --filters url [url ...] |
+                          filter lists to install in Adblock Plus. The arguments |
+                          can also have the format path=url, the data will be |
+                          read from the specified path then. |
+    -t TIMEOUT, --timeout TIMEOUT |
+                          Load timeout (seconds) |
+    -x MAXTABS, --maxtabs MAXTABS |
+                          Maximal number of tabs to open in parallel |
+ |
+License |
+------- |
+ |
+This Source Code is subject to the terms of the Mozilla Public License |
+version 2.0 (the "License"). You can obtain a copy of the License at |
+http://mozilla.org/MPL/2.0/. |