Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code

Issue 29367507: Issue #3595 - Add an actual scheduler; use joined threads for file system

Created:
Dec. 14, 2016, 4:38 p.m. by Eric
Modified:
April 3, 2017, 3:35 p.m.
Reviewers:
sergei, anton, Oleksandr, hub
CC:
Felix Dahlke
Visibility:
Public.

Description

Issue #3595 - Add an actual scheduler; use joined threads for file system Add `SchedulerT`, an actual scheduler, albeit not sophisticated. It still uses threads only once per task and still has no capability for delayed execution, but it does not detach threads. Its destructor joins all pending threads. It contains its own internal thread to perform the join, implemented as class `OperationRunner`. Added unit tests for the scheduler and its components. Add a scheduler instance to `JsEngine`. Convert all file system operations to use this new scheduler. Add a `TearDown()` function the base test class. Call a new function `WaitForQuietScheduler()` on the engine in it; it does not complete until all pending tasks have completed and their threads joined. All file system tests pass without alteration. Moved `StartImmediatelyInSingleUseThread()` out of a rump class `Scheduler`, which was removed. Renamed it `StartImmediatelyInSingleUseDetachedThread` to emphasize that the thread runs detached. -- Depends upon https://codereview.adblockplus.org/29367003/

Patch Set 1 : #

Total comments: 62
Unified diffs Side-by-side diffs Delta from patch set Stats (+490 lines, -39 lines) Patch
M include/AdblockPlus/JsEngine.h View 5 chunks +34 lines, -1 line 10 comments Download
M libadblockplus.gyp View 1 chunk +1 line, -0 lines 0 comments Download
M src/FileSystemJsObject.cpp View 5 chunks +5 lines, -5 lines 0 comments Download
M src/GlobalJsObject.cpp View 2 chunks +4 lines, -3 lines 0 comments Download
M src/JsEngine.cpp View 4 chunks +30 lines, -7 lines 4 comments Download
M src/Scheduler.h View 2 chunks +214 lines, -13 lines 31 comments Download
M src/Scheduler.cpp View 2 chunks +80 lines, -1 line 15 comments Download
M src/WebRequestJsObject.cpp View 1 chunk +1 line, -1 line 0 comments Download
M test/BaseJsTest.h View 1 chunk +2 lines, -8 lines 0 comments Download
M test/BaseJsTest.cpp View 1 chunk +13 lines, -0 lines 0 comments Download
A test/SchedulerTest.cpp View 1 chunk +106 lines, -0 lines 2 comments Download

Messages

Total messages: 6
Eric
This change goes a good way to getting detached threads under control. We have three ...
Dec. 14, 2016, 5:34 p.m. (2016-12-14 17:34:17 UTC) #1
sergei
https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus/JsEngine.h File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus/JsEngine.h#newcode78 include/AdblockPlus/JsEngine.h:78: extern const ImmediateSingleUseThreadType ImmediateSingleUseThread; ///< Dummy constant for scheduling ...
Jan. 20, 2017, 1:08 p.m. (2017-01-20 13:08:14 UTC) #2
sergei
https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#newcode172 src/Scheduler.h:172: * Could you please move that Scheduler with the ...
March 30, 2017, 1:14 p.m. (2017-03-30 13:14:41 UTC) #3
Eric
Added more reviewers. If I haven't added people that ought to see this, let me ...
March 30, 2017, 2:20 p.m. (2017-03-30 14:20:55 UTC) #4
Eric
Addressed Sergei's comments. Usually I get complaints about over-documenting, but working through these responses I ...
March 30, 2017, 5:16 p.m. (2017-03-30 17:16:59 UTC) #5
sergei
April 3, 2017, 3:35 p.m. (2017-04-03 15:35:34 UTC) #6
The functionality I want is
https://gist.github.com/abby-sergz/12a2c1d25fc93c17838c878a11d7fe89, I'm pretty
sure that the commit with it does not require so much changes as in the current
code review, like changes in JsEngine or in some *JsObject.cpp. I think it will
be fair to let you create a codereview with it because you have already
submitted it here, though it should be shaped a bit.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
File include/AdblockPlus/JsEngine.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task,
ImmediateSingleUseThreadType);
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Although it's expected that task can be copied internally, it would be still
> > better to pass it as const reference.
> 
> In the present code, it's passed by value intentionally. We have to invoke the
> copy constructor because we'll eventually run it out of scope, and the
cleanest
> way of doing this is to do it as part of the call.
> 
> In later code, I changed this to an rvalue argument. The reason for that is to
> put responsibility for allocation inside the scheduler instead of relying upon
> its callers to do the right thing. The caller creates an argument and calls it
> with std::move. If I had figured this out before, I would have done it that
way
> from the outset.

Acknowledged.

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp
File src/JsEngine.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne...
src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique
after we upgrade out of VS2012
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > We don't need this comment about std::make_unique
> 
> Is that because we're planning never to upgrade from VS2012?
> 
> Comments like this allow later code maintenance using `grep`.
Maybe, though I think that std::make_shared are much more important than
std::make_unique in general. So, I would rather expect that anyway one will have
to go through the code and simple change all places, including std::make_unique
where it's needed or everywhere.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp
File src/Scheduler.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:50: */
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > That comment is good however I don't think we need it
> 
> Behavior is always clearer *after* you read the comment.
> 
> It's also not a trivial comment. There could be other ways of implementing
> `ThreadMain` and you have to avoid race conditions upon construction, where
this
> class gets a queued task but no notification, which would cause the object to
> freeze up and never run anything.

I'm talking about implementation without race conditions and other bugs.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:75: if (!isRunning)
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Why not to put isRunning into while condition?
> 
> Because it's not correct. It opens up an opportunity for a race condition.

Could you please explain when that race condition occurs?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:86: if (op)
Why not to prevent putting of empty functions in the queue?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:91: catch (std::exception& e)
it generates a warning about unused local variable.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:111: th = std::move(std::thread(f));
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Do we really need to explicitly say std::move here?
> 
> It avoids a second copy, since there's already one in the call by value.
Copy of what? JIC, std::thread cannot be copied.

> 
> I don't think there's a semantic rule in the C++ specification that makes the
> `move` call redundant, but there might be.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h
File src/Scheduler.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:81: * If a user wants to execute an operation in a different
thread,
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > We don't need this part "If a user wants to execute an operation in a
> different
> > thread ...".
> 
> Documenting responsibilities is part of what makes documentation good. The
line
> you want to remove is documenting a responsibility that this class does not
> assume.
> 
> Responsibility allocation is part of the design, so it's got to be part of the
> documentation. For where this started, see
>     https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card
Sorry, but I have not seen on the provisioned link anything explaining the
reason to have such comment.
If you think that this comment should be then where are comments like
- if a user wants to execute an operation immediately ...
- if a user wants to execute an operation in 5 min ...
- if a user wants to execute an operation in 3 min ...
...
- if a user wants to execute an operation and show a green button
...
?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:85: class OperationRunner
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > Why not to call it ActiveObject?
> 
> Because it's not a generic active object. It's tailored for use inside the
> scheduler.

Maybe it's not a generic active object but it is a simple active object which
can be used as a base for any another active object.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:99: std::queue<std::function<void()>> queue;
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > Basically, it's a synchronized queue which has already appeared several
times.
> > So, it would be better to have it as a separate class. However, currently,
> it's
> > not important.
> 
> It's not a separate class because the mutex is used for more than queue
> management. You need atomicity between queue operations and notification, for
> instance. If you separate them, you get race conditions.

Could you please point to that race condition in the example I have posted?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:133: class SingleUseWorker
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > It looks like there was a lot of discussions, finally decided that it's
better
> > if a thread class has no member start/Run/whatever, but here we again
> introduce
> > it...
> 
> It doesn't look from my perspective that there were any discussions at all.
When
> did these purported discussions happen? Why was I not included?
There were a lot of talks regarding designing a thread class in different C++
communities like boost and Qt and maybe during considering proposals for C++11,
strangely I cannot find any link, but one of main conclusions was that it's a
bad idea to have such method like run or start on a thread class because in this
case one can call it several times on the same object. Therefore there is only
one way to start a thread which is by passing a callable to the constructor or
to a run method accepting callable. Of course there is still plenty ways to use
it a wrong way but, because member methods start have proven themselves on
practice, the recommendation for designers of a thread class was to try another
interface and see whether it helps.

> 
> Besides, this isn't a "thread class". If you want a name for it, it's a
"worker
> class" and acts as a resource, which I described in my previous message.
OK, clear, I would propose to introduce that abstraction when it's really
needed.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:172: *
On 2017/03/30 14:20:55, Eric wrote:
> On 2017/03/30 13:14:41, sergei wrote:
> > Could you please move that Scheduler with the functionality described in few
> > lines above into a separate codereview? We will soon need it for android.
> 
> You might know what you want, but this description is far too short to make
> sense of what you are actually asking. I'll also remind you that this review
> depended upon a previous one, and your request might involve pulling in code
> from the previous one.
So, what are the difficulties to create a codereview only with SchedulerT (I
think the name should be changed) and its dependencies?

> 
> > BTW, Scheduler should be even simpler, not a template class and it seems
> instead
> > of SingleUseWorker we can directly use std::thread.
> 
> No, it should not be simpler. If you want me to do this, the template stays.
If
> you understood this code better, you'd see that the template is necessary for
> the scheduler to work as written.
Apparently the template is not necessary in my example, so the template is not
necessary.

> 
> Hard-coding the particular behaviors of std::thread is a bad idea. std::thread
> runs its body upon construction, so if you use that interface, it becomes
> impractical to use a thread pool later because the life-cycle of std::thread
is
> different from what you want with a thread pool. std::thread does not act as a
> manageable resource (nor should it), but a proper scheduler needs to be able
to
> manage its thread resources. Using raw threads diminishes the code that's
> already written.
I think that thread pool should be a completely different class because with the
current approach when a new instance of a worker is created for each task and
with `monitor` the implementation of the thread pool is not going to be simple,
the same for implementation of timers.
BTW, so far there is no need in thread pool, so I would like to ask to not
complicate the code before it's needed.

> 
> SingleUseWorker above is a wrapper around a raw thread that converts it into a
> resource. It's also why it has a Run() method, because it's acting as a
resource
> and not a raw thread. One of the characteristics of a resource is that it
> separates (1) life-cycle, for example, acquire and release from (2) usage,
here
> running a function body. std::thread does not have that kind of interface, so
it
> is *necessary* to add it.
I consider std::thread also as resource and so far the interface of std::thread
is enough.
> 
> ----
> 
> Furthermore, this review was number 2 of 12 (eventually) in a series that
fixed
> all the structural problems with threads and engine allocation, completing all
> the substantive work for both #3595 and #3593 (they are interrelated, which
> isn't obvious). The usage of the Worker template argument is slated for
> modification, which is yet another reason not to use raw threads. Right now,
the
> only usages of a worker resource are `Run()` and `Join()`, but it's slated for
a
> `Cancel()` method.
That's good but why not to introduce that abstraction in a separate commit?
Smaller commits are always better. What concerns the number of reviews, I don't
think it's an issue because anyway these codereviews should be adapted to
whatever state of master, and I think that the easiest way is to simply pick
small portions with simple functionality and rebase the rest afterwards.

> 
> In detail, review 9 of 12 (if I'm counting correctly) allows interruption of
> timer from SetTimeout. This works by changing the executor for timer bodies to
> contain an condition variable. Wake up the condition variable before its timer
> fires and you forestall execution of the body. During development that
executor
> began as a class specific to timers, but the plan is for some of its interface
> to migrate into the worker interface so that the scheduler can interrupt and
> cancel pending work.
I see the point but for present I would like to don't pull that abstraction. I
doubt that the proposed implementation in
https://codereview.adblockplus.org/29370876/ is the right way we should go now
(because there are more elegant approaches) and maybe later (I think about
clearTimeout).
Cancelling functionality in general has a lot of questions, for instance I would
implement it slightly differently, most likely even on another level, and it
highly depends on requirements. So far, I would propose to create corresponding
issues and discuss it there. For example,
- we will definitely need `clearTimeout` however it's not very important right
now because it's never used.
- `XMLHttpRequest.abort()` is similar to the previous one
- so far, we don't use any cancellable API to work with file system, so no need
for such issue
- an issue with arguments against current cancelling of timers when we are
destroying JsEngine and it would be good to see a reasoned proposal
- cancelling of running file system operations and web requests when JsEngine is
being destroyed. Though I think it should be a part of another issue. I don't
mind to discuss it in https://issues.adblockplus.org/ticket/3595, because the
latter is actually a meta issue for subsequent work, although the description
should be adjusted to its generic title, even more the things mentioned in "What
to change" are already done.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:175: *   - Separate schedulers for each `JsEngine` without any
shared facilities.
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > I would remove this string because whether there is a "Separate schedulers
for
> > each `JsEngine`" is not defined yet, it rather looks like an advantage and
> > scheduler should not be aware about any JsEngine.
> 
> Since you've not delved into this code the way I have, I'll forgive this
> misapprehension. If you want to be able to destroy JsEngine instances, engines
> and schedulers must absolutely be aware of each other.
> 
> This line documents the particular relationship between the present scheduler
> and the present engine.
> 
> I don't see any reason as of today why we cannot eventually use a single
> scheduler for multiple engine instances. Schedulers generically require their
> own thread(s), so it would be better eventually to use a singleton instance,
but
> we're not there yet. Thus the documentation.

I think that scheduler should not be so direct member of JsEngine, I would like
to provide users of libadblockplus with possibility to use their own scheduler
for asynchronous operations. It's useful because, e.g. on MS Windows only one
worker thread is enough to support all our asynchronous operations. It means
changing of interface of injected interfaces like WebRequest and FileSystem to
have asynchronous methods and having a separate instances of scheduler in each
default implementation. All asynchronous operations should be cancelled when the
implementation is destroyed by JsEngine.
So, should JsEngine and scheduler be aware about each other? - No. JsEngine
should take into account that it schedules asynchronous tasks through certain
interfaces and that these tasks either completed or stopped and cleaned after
destroying of a corresponding interface. Neither scheduler nor injected
interfaces should know where they are used.

I can expect here some reaction like, "I have sent XX codereviews, why did you
not tell it earlier/stopped me, where are issues with that, etc". Apparently on
my question https://issues.adblockplus.org/ticket/3595#comment:2 instead of
opinions and following discussion, I have received
https://codereview.adblockplus.org/29367003/ and
https://codereview.adblockplus.org/29367507/ which are actually good but should
be slightly adjusted, so I planned they will be changed during the reviewing
process, and then within a pretty shot time span a bunch of following
codereviews. BTW, there are also good things in them but I would like to pick
them gradually.
I think that before doing such much work it would be better to firstly discuss
it.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:176: *   - Working threads are single-use. No provision for a
thread pool.
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > That's also questionable.
> 
> It is absolutely not questionable that this class, as written, uses only
> single-use threads and has no provision for a thread pool. Did you not see the
> "limitations and inefficiencies" point just three lines above.
Why should it have a thread pool? It's similar to the example in another comment
thread, where is a comment that it does not show a big green button?
> 
> Documenting deficiencies is important for code maintenance. It's impractical
> and, more significantly, must slower to try to make everything perfect at each
> iteration. That practice creates huge friction.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:188: std::mutex m;
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > it should be rather std::recursive_mutex because if Worker does not run in a
a
> > thread and instead does the everything immediately, thus in the same thread,
> > there will be an exception.
> 
> No, it should not be recursive mutex. That would be masking an error. There's
no
> correct usage scenario where the mutex would ever be called recursively.
Acknowledged.

> And for
> the record, of all the errors I encountered developing this code, not once did
I
> ever have the kind of exception you're positing.
> 
> Since this is apparently not an obvious point, when you run a task through
this
> scheduler, the second thing it does (after locking the mutex) is to obtain a
new
> worker, and that worker is always a separate thread. See the `emplace`
statement
> in `Run()` below; it invokes the default constructor for `Worker`.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:226: task->Join();
here is a race condition, it can happen that move-assignment in
SingleUseWorker::Run has not happened yet but body has already finished and
monitor has already reached that point and inside SingleUseWorker::Join it
accessed SingleUseWorker::th which is default constructed yet.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:253: void Run(std::function<void()> body)
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > It would be easier to read if argument "body" is renamed to something like a
> > "op" (from operation) or "call".
> 
> I've used "body" consistently for a function that's subject to scheduling and
> management.

I still think that another name is better here, body of what is used here? Such
words like task, operation, function, callable, executable, runnable fit much
better here than body, IMO.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); });
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > I'm not sure that it's a wise decision to assume that Worker runs in some
> > another thread.
> 
> This is not just an assumption. It's a design principle.
> 
> If you don't want a task to be scheduled, don't use the scheduler. If you
> schedule it, it runs in some arbitrary thread different that the one it's
called
> in.
> 
> Perhaps you don't recall, but the high-level goal in using asynchrony at all
is
> to support JavaScript callbacks without re-entering v8. In order to do that,
you
> *must* restart the call stack from the top, and the only mechanism we have
> available for that is a thread. If you were to implement a Worker without a
> thread inside somewhere, it would be incorrect in the only usage we have for
it.
> 
> > In this change it's OK but looking at the final code it seems
> > very questionable.
> 
> Apparently you don't understand how it works, so please indicate what you
don't
> understand so that I can expand upon the documentation.
OK, let's say I want to implement timers using SetTimer from Win API and I'm
allowed to create only one thread for all timers per instance of Scheduler. What
will be the implementation of template class Worker?
JIC, It's merely a theoretical question, I'm not going to use Win API for
timers.

> 
> Ever since this code was working, I've had no problems with dangling threads
for
> tasks run through this scheduler. The mechanism here is correct, reliable, and
> efficient.
> 
> > So, I  would rather have it changed here, in this codereview.
> > "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather
some
> > kind of a next function which is executed after worker has finished, which
> > happens not exactly right after a call of "body".
> 
> That's the way the code works already. `JoinAndRemove` is called within the
> monitor thread and not within whatever thread is inside the worker. What does
> happen immediately upon completion of the body is enqueuing a `JoinAndRemove`
> operation onto the monitor.
> 
> Also note that the scheduler takes full responsibility for the entire life
cycle
> of its tasks. It starts them up initially and detects when they're finished.
> Workers have no awareness that they're being used by schedulers, so the
> scheduler does not have to present any kind of interface to the worker. This
is
> far simpler than any alternative where the worker notifies the scheduler that
> it's finished.
> 
> > It's clear if consider a case
> > when we use facilities for asynchronous IO.
> 
> This mechanism works just fine for asynchronous I/O. I rewrote all of it to
use
> this mechanism. You say that you've read my code, but if you have, statement
> like this lead me believe you did so at best only cursorily or didn't
> understand.
What I really don't understand is the reason you are always repeating that I
didn't understand something. Please stay professional.

Powered by Google App Engine
This is Rietveld