Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code

Issue 5288886037118976: Adblock Plus Crawler rewrite (Closed)

Created:
April 24, 2015, 3:38 p.m. by Wladimir Palant
Modified:
Sept. 14, 2016, 2:38 p.m.
CC:
Felix Dahlke
Visibility:
Public.

Description

Don`t hesitate to nit pick, this code was written rather quickly so readability might not be ideal.

Patch Set 1 #

Total comments: 117

Patch Set 2 : Addressed comments #

Total comments: 4
Unified diffs Side-by-side diffs Delta from patch set Stats (+1156 lines, -0 lines) Patch
A .hgignore View 1 1 chunk +7 lines, -0 lines 0 comments Download
A README.md View 1 1 chunk +48 lines, -0 lines 0 comments Download
A build.py View 1 chunk +16 lines, -0 lines 0 comments Download
A chrome/locale/en-US/meta.properties View 1 chunk +6 lines, -0 lines 0 comments Download
A dependencies View 1 chunk +3 lines, -0 lines 0 comments Download
A ensure_dependencies.py View 1 chunk +299 lines, -0 lines 0 comments Download
A icon.png View Binary file 0 comments Download
A icon64.png View Binary file 0 comments Download
A lib/commandLine.js View 1 1 chunk +59 lines, -0 lines 1 comment Download
A lib/crawler.js View 1 1 chunk +398 lines, -0 lines 2 comments Download
A lib/debug.js View 1 1 chunk +24 lines, -0 lines 1 comment Download
A lib/main.js View 1 1 chunk +88 lines, -0 lines 0 comments Download
A metadata.gecko View 1 1 chunk +8 lines, -0 lines 0 comments Download
A run.py View 1 1 chunk +200 lines, -0 lines 0 comments Download

Messages

Total messages: 11
Wladimir Palant
April 24, 2015, 3:38 p.m. (2015-04-24 15:38:46 UTC) #1
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/.hgignore File .hgignore (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/.hgignore#newcode5 .hgignore:5: *.pyc Nit: A while ago we started to list ...
April 27, 2015, 2:55 p.m. (2015-04-27 14:55:50 UTC) #2
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py File run.py (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py#newcode154 run.py:154: print >>handle, 'url=%s' % url On 2015/04/27 14:55:50, Sebastian ...
April 28, 2015, 9:21 a.m. (2015-04-28 09:21:36 UTC) #3
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py File run.py (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py#newcode25 run.py:25: def __init__(self, parameters): On 2015/04/27 14:55:50, Sebastian Noack wrote: ...
April 28, 2015, 10:42 a.m. (2015-04-28 10:42:25 UTC) #4
saroyanm
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/README.md File README.md (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/README.md#newcode26 README.md:26: Optionally, you can provide the path to the Adblock ...
May 4, 2015, 6:13 p.m. (2015-05-04 18:13:43 UTC) #5
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/commandLine.js File lib/commandLine.js (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/commandLine.js#newcode8 lib/commandLine.js:8: * @module commandLine On 2015/05/04 18:13:43, saroyanm wrote: > ...
May 4, 2015, 8:39 p.m. (2015-05-04 20:39:01 UTC) #6
saroyanm
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/commandLine.js File lib/commandLine.js (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/commandLine.js#newcode8 lib/commandLine.js:8: * @module commandLine On 2015/05/04 20:39:01, Sebastian Noack wrote: ...
May 5, 2015, 9:36 a.m. (2015-05-05 09:36:39 UTC) #7
Wladimir Palant
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/.hgignore File .hgignore (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/.hgignore#newcode5 .hgignore:5: *.pyc On 2015/04/27 14:55:50, Sebastian Noack wrote: > Nit: ...
May 7, 2015, 12:04 a.m. (2015-05-07 00:04:59 UTC) #8
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py File run.py (right): http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/run.py#newcode50 run.py:50: urlhash = hashlib.new('md5', data['url']).hexdigest() On 2015/05/07 00:04:59, Wladimir Palant ...
May 7, 2015, 12:46 a.m. (2015-05-07 00:46:45 UTC) #9
Sebastian Noack
http://codereview.adblockplus.org/5288886037118976/diff/5673385510043648/lib/commandLine.js File lib/commandLine.js (right): http://codereview.adblockplus.org/5288886037118976/diff/5673385510043648/lib/commandLine.js#newcode31 lib/commandLine.js:31: onShutdown.add((function() Nit: I'd rather use an arrow function instead ...
May 7, 2015, 12:33 p.m. (2015-05-07 12:33:04 UTC) #10
saroyanm
May 7, 2015, 1:19 p.m. (2015-05-07 13:19:00 UTC) #11
LGTM with small Nit.

http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/READ...
File README.md (right):

http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/READ...
README.md:29: License
On 2015/05/07 00:04:59, Wladimir Palant wrote:
> On 2015/05/04 18:13:43, saroyanm wrote:
> > Is there a purpose why we use MPL instead of GPL ? Same questions goes to
> other
> > JS files License headers.
> 
> This extension was written before we switched to the GPL, and I didn't bother
> relicensing it. Frankly, none of our Firefox extensions have been relicensed
so
> far - with exception of Adblock Plus itself.

Got it, thanks.

http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/...
File lib/crawler.js (right):

http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/...
lib/crawler.js:319: result.screenshot = canvas.toDataURL("image/jpeg", 0.8);
On 2015/05/07 00:04:59, Wladimir Palant wrote:
> On 2015/05/04 18:13:43, saroyanm wrote:
> > Maybe make sense to let user specify the quality of image same way it's done
> for
> > "maxtabs" ?
> 
> Well, changing maxtabs is very useful when testing. Given that there won't be
a
> "user" here but merely a server running this, I cannot really imagine anybody
> tweaking this parameter. Still, we can easily add it if we need it.

Good point.

http://codereview.adblockplus.org/5288886037118976/diff/5629499534213120/lib/...
lib/crawler.js:324: result.error = "Capturing screenshot failed: " + e;
On 2015/05/07 00:04:59, Wladimir Palant wrote:
> On 2015/05/04 18:13:43, saroyanm wrote:
> > Isn't result.error redundant ?
> 
> No. The exception is merely reported to the console but not stored with the
data
> - useful for debugging, nothing beyond that. The result variable on the other
> hand will be saved in a file, so somebody will be able to look up why the
> screenshot for this particular site wasn't saved.

Yes, missed that. 
Good point.

http://codereview.adblockplus.org/5288886037118976/diff/5673385510043648/lib/...
File lib/crawler.js (right):

http://codereview.adblockplus.org/5288886037118976/diff/5673385510043648/lib/...
lib/crawler.js:132: if ((flags & Ci.nsIWebProgressListener.STATE_STOP) &&
Nit: We usually also move operators to new line, for consistency.
if ((flags & Ci.nsIWebProgressListener.STATE_STOP)
    && (flags & Ci.nsIWebProgressListener.STATE_IS_WINDOW))

Powered by Google App Engine
This is Rietveld