crawler
=======

Backend for the Adblock Plus Crawler. It provides the following URLs:

* */crawlableSites* - Return a list of sites to be crawled
* */crawlerRequests* - Receive all requests made, and whether they were filtered

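For illustration, here is a minimal sketch of querying the site list over HTTP with basic authentication. The base URL and credentials are placeholders, not part of this project — substitute your deployment's address and the _basic\_auth\_*_ values from your configuration.

```python
import base64
import urllib.request

# Placeholder deployment address and credentials -- adjust to your setup.
base_url = "http://localhost:5000"
username, password = "user", "password"

# Build a request for the site list and attach a basic-auth header.
request = urllib.request.Request(base_url + "/crawlableSites")
token = base64.b64encode(
    ("%s:%s" % (username, password)).encode("utf-8")).decode("ascii")
request.add_header("Authorization", "Basic " + token)

# urllib.request.urlopen(request) would then return the response,
# one crawlable site per line.
```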
Required packages
-----------------

* [simplejson](http://pypi.python.org/pypi/simplejson/)

Database setup
--------------

Just execute the statements in _schema.sql_.

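For example, assuming a MySQL server (an assumption — adjust the client invocation for your database engine), where the database and user names are placeholders matching the _database_ and _dbuser_ configuration keys:

```shell
# Load the schema into the configured database (placeholder names).
mysql -u crawler_user -p crawler < schema.sql
```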
Configuration
-------------

Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_.

If you want to import crawlable sites from easylist (see below), you
need to make _easylist\_repository_ point to the local Mercurial
repository of easylist.

Also make sure that the following keys are configured in the _DEFAULT_
section:

* _database_
* _dbuser_
* _dbpassword_
* _basic\_auth\_realm_
* _basic\_auth\_username_
* _basic\_auth\_password_

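Put together, a configuration could look like the following sketch — every value shown is a placeholder, and the _crawler_ section's _easylist\_repository_ key is only needed if you import sites from easylist:

```ini
[DEFAULT]
database=crawler
dbuser=crawler_user
dbpassword=changeme
basic_auth_realm=Crawler
basic_auth_username=crawler
basic_auth_password=changeme

[crawler]
easylist_repository=/path/to/easylist
```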
Importing crawlable sites from easylist
---------------------------------------

Run the following command:

    python -m sitescripts.crawler.bin.import_sites