
Side by Side Diff: sitescripts/crawler/README.md

Issue 9045097: sitescripts: Unmerged changes (Closed)
Patch Set: Created Dec. 21, 2012, 9:39 a.m.
--- sitescripts/crawler/README.md (OLD)
+++ sitescripts/crawler/README.md (NEW)
@@ -1,44 +1,41 @@
 crawler
 =======
 
 Backend for the Adblock Plus Crawler. It provides the following URLs:
 
 * */crawlableSites* - Return a list of sites to be crawled
-* */crawlerData* - Receive data on filtered elements
+* */crawlerRequests* - Receive all requests made, and whether they were filtered
 
 Required packages
 -----------------
 
 * [simplejson](http://pypi.python.org/pypi/simplejson/)
 
 Database setup
 --------------
 
 Just execute the statements in _schema.sql_.
 
 Configuration
 -------------
 
 Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.
 
+If you want to import crawlable sites from easylist (see below), you
+need to make _easylist\_repository_ point to the local Mercurial
+repository of easylist.
+
 Also make sure that the following keys are configured in the _DEFAULT_
 section:
 
 * _database_
 * _dbuser_
 * _dbpassword_
 * _basic\_auth\_realm_
 * _basic\_auth\_username_
 * _basic\_auth\_password_
 
-Extracting crawler sites
-------------------------
+Importing crawlable sites from easylist
+---------------------------------------
 
-Make _filter\_list\_repository_ in the _crawler_ configuration section
-point to the local Mercurial repository of a filter list.
-
-Then execute the following:
-
-python -m sitescripts.crawler.bin.extract_sites > sites.sql
-
-Now you can execute the insert statements from _crawler.sql_.
+python -m sitescripts.crawler.bin.import_sites
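
As a rough illustration of the configuration the new README describes, the sketch below checks an INI-style file for a _crawler_ section and the six _DEFAULT_ keys. It is only a sketch under stated assumptions: it uses Python 3's standard `configparser` rather than whatever config loading sitescripts itself performs, and `REQUIRED_DEFAULT_KEYS`, `SAMPLE_CONFIG`, `check_config`, and all sample values (including the repository path) are hypothetical names made up for the example.

```python
import configparser

# Keys the README says must be present in the DEFAULT section.
REQUIRED_DEFAULT_KEYS = [
    "database",
    "dbuser",
    "dbpassword",
    "basic_auth_realm",
    "basic_auth_username",
    "basic_auth_password",
]

# Hypothetical sample configuration; every value here is a placeholder.
SAMPLE_CONFIG = """
[DEFAULT]
database = crawler
dbuser = crawler
dbpassword = secret
basic_auth_realm = Crawler
basic_auth_username = crawler
basic_auth_password = secret

[crawler]
easylist_repository = /path/to/easylist
"""


def check_config(text):
    """Return (has_crawler_section, missing_default_keys) for an INI string."""
    config = configparser.ConfigParser()
    config.read_string(text)
    defaults = config.defaults()
    missing = [key for key in REQUIRED_DEFAULT_KEYS if key not in defaults]
    return config.has_section("crawler"), missing


if __name__ == "__main__":
    has_crawler, missing = check_config(SAMPLE_CONFIG)
    print("crawler section present:", has_crawler)
    print("missing DEFAULT keys:", missing)
```

Since the README says an empty _crawler_ section is enough, the check only looks for the section's presence; _easylist\_repository_ matters only when importing crawlable sites from easylist.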