
Unified Diff: sitescripts/crawler/README.md

Issue 8492019: sitescripts: Collect unmatched filters (Closed)
Patch Set: Created Oct. 2, 2012, 5:02 a.m.
Index: sitescripts/crawler/README.md
===================================================================
--- a/sitescripts/crawler/README.md
+++ b/sitescripts/crawler/README.md
@@ -4,7 +4,7 @@
Backend for the Adblock Plus Crawler. It provides the following URLs:
* */crawlableSites* - Return a list of sites to be crawled
-* */crawlerData* - Receive data on filtered elements
+* */crawlerRequests* - Receive all requests made, and whether they were filtered
Required packages
-----------------
@@ -21,6 +21,10 @@
Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.
+If you want to import crawlable sites or domain-specific filters from
+easylist (see below), you need to make _easylist\_repository_ point to
+the local Mercurial repository of easylist.
+
Also make sure that the following keys are configured in the _DEFAULT_
section:
@@ -31,14 +35,12 @@
* _basic\_auth\_username_
* _basic\_auth\_password_
-Extracting crawler sites
-------------------------
+Importing crawlable sites from easylist
+---------------------------------------
-Make _filter\_list\_repository_ in the _crawler_ configuration section
-point to the local Mercurial repository of a filter list.
+ python -m sitescripts.crawler.bin.import_sites
-Then execute the following:
+Importing domain-specific filters from easylist
+-----------------------------------------------
- python -m sitescripts.crawler.bin.extract_sites > sites.sql
-
-Now you can execute the insert statements from _crawler.sql_.
+ python -m sitescripts.crawler.bin.import_filters
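
The configuration this README change describes could look roughly like the sketch below. It assumes the sitescripts configuration file is INI-style, which the diff itself does not state, and it uses Python 3's `configparser`, whereas the project may target a different Python version. The repository path, the credential values, and the `load_crawler_settings` helper are all hypothetical; only the section and key names come from the README, and the _DEFAULT_ section needs further keys that this hunk does not show.

```python
import configparser

# Hypothetical sample configuration; the path and credentials are placeholders.
SAMPLE_CONFIG = """
[DEFAULT]
basic_auth_username = example-user
basic_auth_password = example-password

[crawler]
easylist_repository = /path/to/easylist
"""


def load_crawler_settings(text):
    """Parse INI text and return the crawler section as a dict.

    Keys from the DEFAULT section are included, because configparser
    merges defaults into every section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("crawler"):
        raise ValueError("missing [crawler] section")
    return dict(parser.items("crawler"))


settings = load_crawler_settings(SAMPLE_CONFIG)
print(settings["easylist_repository"])
```

With a section like this in place, the two import commands added by the patch would have a repository path to read from.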