Index: sitescripts/crawler/README.md
===================================================================
--- a/sitescripts/crawler/README.md
+++ b/sitescripts/crawler/README.md
@@ -4,7 +4,7 @@
 Backend for the Adblock Plus Crawler. It provides the following URLs:
 
 * */crawlableSites* - Return a list of sites to be crawled
-* */crawlerData* - Receive data on filtered elements
+* */crawlerRequests* - Receive all requests made and whether they were filtered
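+
+Purely as an illustration, assuming the URLs live under a deployment-specific
+address and are protected by the basic auth credentials mentioned below, a
+client might talk to them like this (the host name and the _requests.json_
+payload are placeholders; the actual data format is up to the crawler client):
+
+    curl -u user:password https://crawler.example.com/crawlableSites
+    curl -u user:password -X POST --data-binary @requests.json \
+        https://crawler.example.com/crawlerRequests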
 
 Required packages
 -----------------
@@ -21,6 +21,10 @@
 
 Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.
 
+If you want to import crawlable sites or domain-specific filters from
+easylist (see below), you need to make _easylist\_repository_ point to
+the local Mercurial repository of easylist.
+
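+For reference, a _crawler_ section with the repository configured might look
+like this (assuming the usual INI syntax of the sitescripts configuration
+files; the path is only an example):
+
+    [crawler]
+    easylist_repository=/path/to/easylist
+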
 Also make sure that the following keys are configured in the _DEFAULT_
 section:
 
@@ -31,14 +35,12 @@
 * _basic\_auth\_username_
 * _basic\_auth\_password_
 
-Extracting crawler sites
-------------------------
+Importing crawlable sites from easylist
+---------------------------------------
 
-Make _filter\_list\_repository_ in the _crawler_ configuration section
-point to the local Mercurial repository of a filter list.
+    python -m sitescripts.crawler.bin.import_sites
 
-Then execute the following:
+Importing domain-specific filters from easylist
+-----------------------------------------------
 
-    python -m sitescripts.crawler.bin.extract_sites > sites.sql
-
-Now you can execute the insert statements from _crawler.sql_.
+    python -m sitescripts.crawler.bin.import_filters