|
|
Created:
March 16, 2016, 2:18 p.m. by sergei Modified:
Sept. 23, 2016, 10:51 a.m. Reviewers:
Wladimir Palant CC:
tschuster Visibility:
Public. |
Description#abpcrawler project
Patch Set 1 #
Total comments: 25
Patch Set 2 : address comments #Patch Set 3 : add missed semicolon #Patch Set 4 : eliminate race conditions #
Total comments: 4
Patch Set 5 : address comment about comment #Patch Set 6 : Address comment about tab keeping window alive #
Total comments: 6
Patch Set 7 : remove additional empty line #MessagesTotal messages: 9
https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:13: Cu.import("resource://gre/modules/Promise.jsm"); Please remove this if you don't use Promise.defer() - we should be using the built-in Promise class. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:28: * Allocates a new tab "about:blank" in the `browser`. Please don't use backticks, these have no special meaning for JSDoc. Also, 'Allocates a new tab "about:blank"' => "Creates a blank tab" https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:29: * The method returns a {Promise} promise which is resolved with the `tab` "{Promise} promise" => "promise" please, the type doesn't add any information. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:30: * when `outerWindowID` is already initialized. Too many implementation details here. "which is resolved with the tab when `outerWindowID` is already initialized" => "which resolves once the tab is fully initialized." Besides, why do you describe the return value twice? That text belongs under @return below. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:32: * See: https://bugzilla.mozilla.org/show_bug.cgi?id=1256602#c1 This belongs into a comment above the actual implementation, not into JSDoc. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:35: * The tabbed browser where tabs should be created "tabs" => "the tab" https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:36: * @return {Promise} promise which will be resolved with the tab. Please use the more specific Promise.<tab> as type here. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:43: return new Promise((resolve, reject)=> Nit: Space before => please. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:68: // the array of `resolve` functions of {Promise} promises returned by `getTab`. This is a code comment, it's intended to be read by humans. Please (please!) don't use backticks or type annotations. Nit: how about "The queue containing resolve functions of promises waiting for a tab." https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:70: // Keep at least one tab alive to prevent browser from closing of it self. "browser from closing of it self" => "the browser from closing itself"? https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:79: * @result {Promise} Nit: Promise.<tab> here as well. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:86: // close initial tab, we don't need it anymore. Nit: Please start sentences with a capital letter. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:111: this._tabs--; Now you have the problem that in the theoretical scenario where tabs are closed faster than new allocation requests come in you will close the browser window along the way (last tab removed).
https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:13: Cu.import("resource://gre/modules/Promise.jsm"); On 2016/03/16 14:52:52, Wladimir Palant wrote: > Please remove this if you don't use Promise.defer() - we should be using the > built-in Promise class. Currently defer is still used by LoadListener. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:28: * Allocates a new tab "about:blank" in the `browser`. On 2016/03/16 14:52:52, Wladimir Palant wrote: > Please don't use backticks, these have no special meaning for JSDoc. Also, > 'Allocates a new tab "about:blank"' => "Creates a blank tab" Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:29: * The method returns a {Promise} promise which is resolved with the `tab` On 2016/03/16 14:52:52, Wladimir Palant wrote: > "{Promise} promise" => "promise" please, the type doesn't add any information. Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:30: * when `outerWindowID` is already initialized. On 2016/03/16 14:52:52, Wladimir Palant wrote: > Too many implementation details here. "which is resolved with the tab when > `outerWindowID` is already initialized" => "which resolves once the tab is fully > initialized." Besides, why do you describe the return value twice? That text > belongs under @return below. Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:32: * See: https://bugzilla.mozilla.org/show_bug.cgi?id=1256602#c1 On 2016/03/16 14:52:52, Wladimir Palant wrote: > This belongs into a comment above the actual implementation, not into JSDoc. Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:35: * The tabbed browser where tabs should be created On 2016/03/16 14:52:52, Wladimir Palant wrote: > "tabs" => "the tab" Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:36: * @return {Promise} promise which will be resolved with the tab. On 2016/03/16 14:52:52, Wladimir Palant wrote: > Please use the more specific Promise.<tab> as type here. Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:43: return new Promise((resolve, reject)=> On 2016/03/16 14:52:52, Wladimir Palant wrote: > Nit: Space before => please. Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:68: // the array of `resolve` functions of {Promise} promises returned by `getTab`. On 2016/03/16 14:52:52, Wladimir Palant wrote: > This is a code comment, it's intended to be read by humans. Please (please!) > don't use backticks or type annotations. > > Nit: how about "The queue containing resolve functions of promises waiting for a > tab." Sounds good! https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:70: // Keep at least one tab alive to prevent browser from closing of it self. On 2016/03/16 14:52:52, Wladimir Palant wrote: > "browser from closing of it self" => "the browser from closing itself"? Done. https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:111: this._tabs--; On 2016/03/16 14:52:52, Wladimir Palant wrote: > Now you have the problem that in the theoretical scenario where tabs are closed > faster than new allocation requests come in you will close the browser window > along the way (last tab removed). Yes. However I think it should be addressed at another level. What about setting browser.tabs.closeWindowWithLastTab to false? Do we need it and should it be in that commit or should I revert it to the array of tabs?
https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29338443/lib/crawler.js#newc... lib/crawler.js:111: this._tabs--; On 2016/03/16 15:47:00, sergei wrote: > On 2016/03/16 14:52:52, Wladimir Palant wrote: > > Now you have the problem that in the theoretical scenario where tabs are > closed > > faster than new allocation requests come in you will close the browser window > > along the way (last tab removed). > > Yes. However I think it should be addressed at another level. What about setting > browser.tabs.closeWindowWithLastTab to false? Do we need it and should it be in > that commit or should I revert it to the array of tabs? I have addressed that issue. BTW, having an array of tabs generates a lot of erros like ###!!! [Child][MessageChannel] Error: (msgtype=0x2C0040,name=PBrowser::Msg___delete__) Channel closing: too late to send/recv, messages will be lost ###!!! [Child][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost ###!!! [Child][MessageChannel] Error: (msgtype=0x480108,name=PContent::Msg_AsyncMessage) Channel closing: too late to send/recv, messages will be lost at the end. The current version with one tab still can generate it, but it seems fast enough, so it happens not so often. It also sometimes generates something like the snippet below. I guess it comes from the race condition that the browser is begin closed and some parts are already destroyed and some are still running. ************************* A coding exception was thrown in a Promise resolution callback. See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Pro... Full message: TypeError: this._removingTabs is undefined Full stack: _beginRemoveTab@chrome://browser/content/tabbrowser.xml:2306:13 removeTab@chrome://browser/content/tabbrowser.xml:2209:18 TabAllocator.prototype.releaseTab@resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/bootstrap.js -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/lib/crawler.js:119:5 TabAllocator.prototype.releaseTab/this._tab<@resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/bootstrap.js -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/lib/crawler.js:113:9 Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:937:23 this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:816:7 Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:747:11 this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:779:7 this.PromiseWalker.completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:714:7 TabAllocator.prototype._createTab/</onBrowserInit@resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/bootstrap.js -> jar:file:///tmp/tmpPxNRXz.mozrunner/extensions/abpcrawler@adblockplus.org.xpi!/lib/crawler.js:76:9 *************************
Sorry about taking so long, this review simply got forgotten. If this happens again - please ask me, don't wait a year. https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js#newc... lib/crawler.js:55: }); There is little point pre-allocating a tab, you can just as well have this._tabKeepingWindowAlive and remove that tab in _createTab before resolving. You can also use the same code to get rid of the race condition on release - just set this._tabKeepingWindowAlive instead of removing the last tab. You won't need to mess with the bookkeeping then, tab count is updated either way - currently verifying that the tab count is correct under all circumstances is really complicated. https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js#newc... lib/crawler.js:73: // https://bugzilla.mozilla.org/show_bug.cgi?id=1256602#c1 Please don't use URL-only comments, it should be obvious what this is about. For example: // "Browser:Init" message is sent once the browser is ready, see https://bugzil.la/1256602#c1 Also, this comment is misplaced, should be above the addMessageListener call.
https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js#newc... lib/crawler.js:55: }); On 2016/09/14 15:00:13, Wladimir Palant wrote: > There is little point pre-allocating a tab, you can just as well have > this._tabKeepingWindowAlive and remove that tab in _createTab before resolving. > You can also use the same code to get rid of the race condition on release - > just set this._tabKeepingWindowAlive instead of removing the last tab. You won't > need to mess with the bookkeeping then, tab count is updated either way - > currently verifying that the tab count is correct under all circumstances is > really complicated. Done. https://codereview.adblockplus.org/29338442/diff/29339639/lib/crawler.js#newc... lib/crawler.js:73: // https://bugzilla.mozilla.org/show_bug.cgi?id=1256602#c1 On 2016/09/14 15:00:13, Wladimir Palant wrote: > Please don't use URL-only comments, it should be obvious what this is about. For > example: > > // "Browser:Init" message is sent once the browser is ready, see > https://bugzil.la/1256602#c1 > > Also, this comment is misplaced, should be above the addMessageListener call. Done.
https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:68: return Promise.resolve(tab); I think that rather than introducing a _removeTabKeepingWindowAlive() function I would have added an nested function onReady(): function onReady() { if (this._tabKeepingWindowAlive) { this._browser.removeTab(this._tabKeepingWindowAlive); delete this._tabKeepingWindowAlive; } resolve(tab); } You can then call it when the tab is ready - regardless of whether that happens synchronously. Means moving the tab.linkedBrowser.outerWindowID check inside the promise of course. Up to you whether you want to do it this way but IMHO it makes more sense to keep all related actions in one place. https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:83: Nit: Don't add this extra newline please. https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:112: tab.linkedBrowser.loadURI('about:blank', null, null); What's the point of navigating away if we are ignoring this tab and will close it soon anyway?
https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js File lib/crawler.js (right): https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:68: return Promise.resolve(tab); On 2016/09/16 07:10:37, Wladimir Palant wrote: > I think that rather than introducing a _removeTabKeepingWindowAlive() function I > would have added an nested function onReady(): > > function onReady() > { > if (this._tabKeepingWindowAlive) > { > this._browser.removeTab(this._tabKeepingWindowAlive); > delete this._tabKeepingWindowAlive; > } > resolve(tab); > } > > You can then call it when the tab is ready - regardless of whether that happens > synchronously. Means moving the tab.linkedBrowser.outerWindowID check inside the > promise of course. Up to you whether you want to do it this way but IMHO it > makes more sense to keep all related actions in one place. It's a valid point but I would like to keep it this way because I find it cleaner. There are some tiny things (bigger depth of callbacks calls, the generalization is used merely in two places, it would be good to put the first part of onReady in a separate named function _removeTabKeepingWindowAlive, the heaviness of creation of functions on each call of createTab), whose contribution seems negligible in the current context but I would like to avoid it. In addition, I would like to have a simple API (it's better to have more small functions with less responsibility than a few functions which do a lot), dealing with that keeping alive tab is actually not related to createTab function, so I thought rather about the following _createTab() { const tab = this._allocateTab(); // where _allocateTab is _createTab without _removeTabKeepingWindowAlive. tab.then(() => this._removeTabKeepingWindowAlive()); return tab; } So, finally, for the sake of simplicity, I decided that it would be better to simply put two calls of this._removeTabKeepingWindowAlive() into _createTab. https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:83: On 2016/09/16 07:10:37, Wladimir Palant wrote: > Nit: Don't add this extra newline please. Done. https://codereview.adblockplus.org/29338442/diff/29353170/lib/crawler.js#newc... lib/crawler.js:112: tab.linkedBrowser.loadURI('about:blank', null, null); On 2016/09/16 07:10:37, Wladimir Palant wrote: > What's the point of navigating away if we are ignoring this tab and will close > it soon anyway? I have a version when the crawler is always running (of course with possibility for graceful shutdown) and is waiting for a new tasks. If the tab is neither immediately closed nor navigated away then the web page will be "alive" until arriving of the next task. On practice it means that the web page can be opened for hours between the scheduling of portions of URLs. Some web pages consume a lot of resources (network traffic and leak memory) and I would not like their counters/analytics scripts to send data for a long time. It just puts Firefox in a cleaner state.
LGTM |