Rietveld Code Review Tool

Issue 8327353: Crawler backend (Closed)

Created: Sept. 14, 2012, 2:23 p.m. by Felix Dahlke
Modified: Sept. 28, 2012, 2:31 p.m.
Reviewers: Wladimir Palant
Visibility: Public

Description

I've made various changes and improvements to the crawler backend while working on the crawler.

Patch Set 1
Total comments: 2

Patch Set 2: README fix
Total comments: 20

Patch Set 3
Total comments: 9

Patch Set 4
Total comments: 8

Patch Set 5
Total comments: 1

Stats: +382 lines, -43 lines

M multiplexer.py: 1 chunk, +17 lines, -13 lines, 0 comments
A sitescripts/crashes/__init__.py: 0 chunks, 0 comments
A sitescripts/crashes/web/__init__.py: 0 chunks, 0 comments
A sitescripts/crashes/web/submitCrash.py: 1 chunk, +55 lines, -0 lines, 0 comments
A sitescripts/crawler/README.md: 1 chunk, +44 lines, -0 lines, 0 comments
A sitescripts/crawler/bin/__init__.py: 0 chunks, 0 comments
A sitescripts/crawler/bin/extract_crawler_sites.py: 1 chunk, +36 lines, -0 lines, 0 comments
A sitescripts/crawler/schema.sql: 1 chunk, +27 lines, -0 lines, 0 comments
M sitescripts/crawler/web/crawler.py: 1 chunk, +102 lines, -24 lines, 0 comments
M sitescripts/extensions/bin/createNightlies.py: 5 chunks, +51 lines, -11 lines, 0 comments
A sitescripts/extensions/template/androidupdates.xml: 1 chunk, +11 lines, -0 lines, 0 comments
M sitescripts/extensions/utils.py: 1 chunk, +2 lines, -0 lines, 0 comments
A sitescripts/hg/__init__.py: 0 chunks, 0 comments
A sitescripts/hg/bin/__init__.py: 0 chunks, 0 comments
A sitescripts/hg/bin/irchook.py: 1 chunk, +13 lines, -0 lines, 0 comments
M sitescripts/web.py: 2 chunks, +29 lines, -0 lines, 1 comment

Messages

Total messages: 15
Felix Dahlke
Sept. 14, 2012, 2:31 p.m. (2012-09-14 14:31:27 UTC) #1
Felix Dahlke
Sept. 14, 2012, 2:40 p.m. (2012-09-14 14:40:57 UTC) #2
Wladimir Palant
http://codereview.adblockplus.org/8327353/diff/1/sitescripts/crawler/bin/extract_crawler_sites.py
File sitescripts/crawler/bin/extract_crawler_sites.py (right):

http://codereview.adblockplus.org/8327353/diff/1/sitescripts/crawler/bin/extract_crawler_sites.py#newcode23
sitescripts/crawler/bin/extract_crawler_sites.py:23: matches = re.match(r"[A-Z]:.*(https?://.*)", line)
What if the URL is ...
Sept. 14, 2012, 2:42 p.m. (2012-09-14 14:42:13 UTC) #3
Felix Dahlke
Uploaded a new patch set with a quick README fix.
Sept. 14, 2012, 2:42 p.m. (2012-09-14 14:42:44 UTC) #4
Wladimir Palant
http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/bin/extract_crawler_sites.py
File sitescripts/crawler/bin/extract_crawler_sites.py (right):

http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/bin/extract_crawler_sites.py#newcode20
sitescripts/crawler/bin/extract_crawler_sites.py:20: if line == "":
Is this really a good ...
Sept. 14, 2012, 5:24 p.m. (2012-09-14 17:24:18 UTC) #5
Felix Dahlke
Wow, a lot of helpful suggestions, thanks :) Particularly happy about "for line in process.stdout" ...
Sept. 15, 2012, 2:40 a.m. (2012-09-15 02:40:19 UTC) #6
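
For reference, the "for line in process.stdout" idiom mentioned in message #6 can be sketched as follows. This is a minimal illustration and not the code under review: the helper name `read_lines` and the child command are assumptions made up for the example.

```python
import subprocess
import sys


def read_lines(command):
    """Yield each line of the command's standard output without its newline.

    `read_lines` is a hypothetical helper used only for this sketch.
    """
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, universal_newlines=True
    )
    try:
        # Iterating over the pipe stops at end of file, so no explicit
        # check for an empty string is needed to end the loop.
        for line in process.stdout:
            yield line.rstrip("\n")
    finally:
        process.stdout.close()
        process.wait()


# Illustrative child process that prints three lines, one of them blank.
child = [sys.executable, "-c", "print('a'); print(''); print('b')"]
lines = list(read_lines(child))
```

Blank lines in the output arrive as empty strings after stripping and do not end the iteration; only end of file does.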
Felix Dahlke
http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/bin/extract_crawler_sites.py
File sitescripts/crawler/bin/extract_crawler_sites.py (right):

http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/bin/extract_crawler_sites.py#newcode20
sitescripts/crawler/bin/extract_crawler_sites.py:20: if line == "":
On 2012/09/14 17:24:18, Wladimir Palant ...
Sept. 26, 2012, 3:20 p.m. (2012-09-26 15:20:30 UTC) #7
Wladimir Palant
Sorry, I cannot really review that without interdiff - too many changes to compare them ...
Sept. 27, 2012, 6:13 a.m. (2012-09-27 06:13:24 UTC) #8
Felix Dahlke
On 2012/09/27 06:13:24, Wladimir Palant wrote:
> Sorry, I cannot really review that without interdiff ...
Sept. 27, 2012, 6:23 a.m. (2012-09-27 06:23:58 UTC) #9
Wladimir Palant
http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/web/crawler.py
File sitescripts/crawler/web/crawler.py (right):

http://codereview.adblockplus.org/8327353/diff/3002/sitescripts/crawler/web/crawler.py#newcode49
sitescripts/crawler/web/crawler.py:49: if current_line < 5 or not line:
On 2012/09/26 ...
Sept. 27, 2012, 7:34 a.m. (2012-09-27 07:34:17 UTC) #10
Felix Dahlke
http://codereview.adblockplus.org/8327353/diff/18002/sitescripts/crawler/bin/extract_crawler_sites.py
File sitescripts/crawler/bin/extract_crawler_sites.py (right):

http://codereview.adblockplus.org/8327353/diff/18002/sitescripts/crawler/bin/extract_crawler_sites.py#newcode19
sitescripts/crawler/bin/extract_crawler_sites.py:19: matches = re.match(r".*\b(https?://\S*)", line)
On 2012/09/27 07:34:17, Wladimir Palant ...
Sept. 27, 2012, 9:26 a.m. (2012-09-27 09:26:24 UTC) #11
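
To illustrate how the pattern quoted in message #11 behaves, here is a small sketch. Only the regular expression comes from the review; the function name `extract_url` and the sample lines are assumptions, since the real input format of extract_crawler_sites.py is not shown in this thread.

```python
import re

# Pattern as quoted in the review for extract_crawler_sites.py line 19.
URL_PATTERN = re.compile(r".*\b(https?://\S*)")


def extract_url(line):
    """Return the URL captured from the line, or None if there is none.

    `extract_url` is a hypothetical wrapper used only for this sketch.
    """
    match = URL_PATTERN.match(line)
    return match.group(1) if match else None


# Made-up sample lines, not taken from the actual crawler input.
samples = [
    "S: some text http://example.com/page trailing words",
    "no url on this line",
    "see https://example.org/a?b=c",
]
urls = [extract_url(sample) for sample in samples]
```

Because the capture group uses `\S*`, it stops at the first whitespace after the URL, whereas the `(https?://.*)` group quoted from patch set 1 would also capture any text following the URL on the same line. The leading `.*` is greedy, so if a line contained several URLs this pattern would capture the last one.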
Wladimir Palant
http://codereview.adblockplus.org/8327353/diff/23003/sitescripts/crawler/web/crawler.py
File sitescripts/crawler/web/crawler.py (right):

http://codereview.adblockplus.org/8327353/diff/23003/sitescripts/crawler/web/crawler.py#newcode28
sitescripts/crawler/web/crawler.py:28: @basic_auth
Shouldn't this be @basic_auth(config_section="crawler")?

http://codereview.adblockplus.org/8327353/diff/23003/sitescripts/crawler/web/crawler.py#newcode44
sitescripts/crawler/web/crawler.py:44: raise ValueError("Content-Type ...
Sept. 27, 2012, 1:44 p.m. (2012-09-27 13:44:51 UTC) #12
Felix Dahlke
http://codereview.adblockplus.org/8327353/diff/23003/sitescripts/crawler/web/crawler.py
File sitescripts/crawler/web/crawler.py (right):

http://codereview.adblockplus.org/8327353/diff/23003/sitescripts/crawler/web/crawler.py#newcode28
sitescripts/crawler/web/crawler.py:28: @basic_auth
On 2012/09/27 13:44:51, Wladimir Palant wrote:
> Shouldn't ...
Sept. 27, 2012, 2:15 p.m. (2012-09-27 14:15:33 UTC) #13
Wladimir Palant
LGTM

http://codereview.adblockplus.org/8327353/diff/29002/sitescripts/web.py
File sitescripts/web.py (right):

http://codereview.adblockplus.org/8327353/diff/29002/sitescripts/web.py#newcode29
sitescripts/web.py:29: return decorator
I already suspected that the old ...
Sept. 27, 2012, 2:24 p.m. (2012-09-27 14:24:13 UTC) #14
Felix Dahlke
Sept. 27, 2012, 2:25 p.m. (2012-09-27 14:25:23 UTC) #15
On 2012/09/27 14:24:13, Wladimir Palant wrote:
> LGTM

Hurray!

> http://codereview.adblockplus.org/8327353/diff/29002/sitescripts/web.py
> File sitescripts/web.py (right):
>
> http://codereview.adblockplus.org/8327353/diff/29002/sitescripts/web.py#newco...
> sitescripts/web.py:29: return decorator
> I already suspected that the old version didn't work :)

It did work until I added the parameter - and actually used it :)
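
As background for the `@basic_auth(config_section="crawler")` and `return decorator` remarks, a decorator that takes a parameter is usually written as a factory that returns the actual decorator. The sketch below shows that general pattern only. It is not the code in sitescripts/web.py: the `CREDENTIALS` dictionary, the WSGI-style handler signature, and the handler name are all assumptions made for the example.

```python
import base64
import functools

# Hypothetical credentials, standing in for values read from a config file.
CREDENTIALS = {"crawler": ("user", "secret")}


def basic_auth(config_section):
    """Decorator factory: returns a decorator bound to config_section."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(environ, start_response):
            expected = "%s:%s" % CREDENTIALS[config_section]
            header = environ.get("HTTP_AUTHORIZATION", "")
            scheme, _, encoded = header.partition(" ")
            try:
                supplied = base64.b64decode(encoded).decode("utf-8")
            except (ValueError, UnicodeDecodeError):
                supplied = None
            if scheme.lower() != "basic" or supplied != expected:
                start_response(
                    "401 Unauthorized",
                    [("WWW-Authenticate", 'Basic realm="crawler"')],
                )
                return [b""]
            return handler(environ, start_response)
        return wrapper
    # Without this return, the factory would hand back None and the
    # decorated name would no longer refer to a callable handler.
    return decorator


@basic_auth(config_section="crawler")
def crawler_handler(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

With a factory like this, the parentheses matter: a bare `@basic_auth` would pass the handler itself as `config_section` and bind the name to the inner `decorator` function instead of a wrapped handler.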
