Unified Diff: sitescripts/crawler/README.md

Issue 9045097: sitescripts: Unmerged changes (Closed)
Patch Set: Created Dec. 21, 2012, 9:39 a.m.
Index: sitescripts/crawler/README.md
===================================================================
--- a/sitescripts/crawler/README.md
+++ b/sitescripts/crawler/README.md
@@ -4,7 +4,7 @@
Backend for the Adblock Plus Crawler. It provides the following URLs:
* */crawlableSites* - Return a list of sites to be crawled
-* */crawlerData* - Receive data on filtered elements
+* */crawlerRequests* - Receive all requests made, and whether they were filtered
Required packages
-----------------
@@ -21,6 +21,10 @@
Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.
+If you want to import crawlable sites from easylist (see below), you
+need to make _easylist\_repository_ point to the local Mercurial
+repository of easylist.
+
Also make sure that the following keys are configured in the _DEFAULT_
section:
@@ -31,14 +35,7 @@
* _basic\_auth\_username_
* _basic\_auth\_password_
-Extracting crawler sites
-------------------------
+Importing crawlable sites from easylist
+---------------------------------------
-Make _filter\_list\_repository_ in the _crawler_ configuration section
-point to the local Mercurial repository of a filter list.
-
-Then execute the following:
-
- python -m sitescripts.crawler.bin.extract_sites > sites.sql
-
-Now you can execute the insert statements from _crawler.sql_.
+ python -m sitescripts.crawler.bin.import_sites