| Index: sitescripts/crawler/README.md |
| =================================================================== |
| --- a/sitescripts/crawler/README.md |
| +++ b/sitescripts/crawler/README.md |
| @@ -4,7 +4,7 @@ |
| Backend for the Adblock Plus Crawler. It provides the following URLs: |
| * */crawlableSites* - Return a list of sites to be crawled |
| -* */crawlerData* - Receive data on filtered elements |
| +* */crawlerRequests* - Receive all requests made, and whether they were filtered |
| Required packages |
| ----------------- |
| @@ -21,6 +21,10 @@ |
| Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_. |
| +If you want to import crawlable sites from easylist (see below), you |
| +need to make _easylist\_repository_ point to the local Mercurial |
| +repository of easylist. |
| + |
| Also make sure that the following keys are configured in the _DEFAULT_ |
| section: |
| @@ -31,14 +35,7 @@ |
| * _basic\_auth\_username_ |
| * _basic\_auth\_password_ |
| -Extracting crawler sites |
| ------------------------- |
| +Importing crawlable sites from easylist |
| +--------------------------------------- |
| -Make _filter\_list\_repository_ in the _crawler_ configuration section |
| -point to the local Mercurial repository of a filter list. |
| - |
| -Then execute the following: |
| - |
| - python -m sitescripts.crawler.bin.extract_sites > sites.sql |
| - |
| -Now you can execute the insert statements from _crawler.sql_. |
| + python -m sitescripts.crawler.bin.import_sites |