Side by Side Diff: sitescripts/crawler/README.md

Issue 8327353: Crawler backend (Closed)
Patch Set: Created Sept. 14, 2012, 2:23 p.m.
1 crawler
2 =======
3
4 Backend for the Adblock Plus Crawler. It provides the following URLs:
5
6 * */crawlableSites* - Return a list of sites to be crawled
7 * */crawlerRun, /crawlerData* - Receive data on filtered elements
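
As a rough illustration of how a client might address the first URL, the sketch below builds a request for */crawlableSites*. It assumes the backend is protected by HTTP Basic authentication using the _basic\_auth\_*_ credentials, which this README does not state explicitly. The base URL and the function name `build_crawlable_sites_request` are hypothetical. The response format is not described here, so the sketch does not attempt to parse it.

```python
import base64
import urllib.request


def build_crawlable_sites_request(base_url, username, password):
    """Build a GET request for /crawlableSites with a Basic auth header.

    Assumption: the backend expects HTTP Basic authentication.
    """
    # Basic auth sends base64("username:password") in the Authorization header.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    # base_url is a placeholder for wherever the backend is deployed.
    request = urllib.request.Request(base_url.rstrip("/") + "/crawlableSites")
    request.add_header("Authorization", "Basic " + token)
    return request
```

The returned object can be passed to `urllib.request.urlopen` once the real host and credentials are known.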
8
9 Required packages
10 -----------------
11
12 * [simplejson](http://pypi.python.org/pypi/simplejson/)
13
14 Database setup
15 --------------
16
17 Execute the statements in _schema.sql_ against your database.
18
19 Configuration
20 -------------
21
22 Add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.
23
24 Also make sure that the following keys are configured in the _DEFAULT_
25 section:
26
27 * _database_
28 * _dbuser_
29 * _dbpassword_
30 * _basic\_auth\_realm_
31 * _basic\_auth\_username_
32 * _basic\_auth\_password_
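
The requirements above can be checked with a short script before starting the backend. This is a minimal sketch that assumes the configuration file uses standard INI syntax. It relies on Python's standard-library `configparser` rather than sitescripts' own configuration loader, and the function name `missing_crawler_keys` is hypothetical.

```python
import configparser

# Keys the README says must be configured in the DEFAULT section.
REQUIRED_KEYS = [
    "database",
    "dbuser",
    "dbpassword",
    "basic_auth_realm",
    "basic_auth_username",
    "basic_auth_password",
]


def missing_crawler_keys(config_text):
    """Return a list of problems found in the given configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    problems = []
    # The crawler section may be empty, but it has to exist.
    if not parser.has_section("crawler"):
        problems.append("section [crawler] is missing")
    # defaults() exposes the keys defined in the DEFAULT section.
    for key in REQUIRED_KEYS:
        if key not in parser.defaults():
            problems.append("DEFAULT key %s is missing" % key)
    return problems
```

Calling `missing_crawler_keys(open(path).read())` on the configuration file returns an empty list when both the _crawler_ section and all six keys are present.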
33
34 Extracting crawler sites
35 ------------------------
36
37 Make _filter\_list\_repository_ in the _crawler_ configuration section
38 point to the local Mercurial repository of a filter list.
39
40 Then execute the following:
41
42 python -m sitescripts.crawler.bin.extract_crawler_sites > crawler_sites.sql
43
44 Now you can execute the insert statements from _crawler\_sites.sql_.