Issue 29367507: Issue #3595 - Add an actual scheduler; use joined threads for file system

Created:
Dec. 14, 2016, 4:38 p.m. by Eric

Modified:
April 3, 2017, 3:35 p.m.

Reviewers:
sergei, anton, Oleksandr, hub

CC:
Felix Dahlke

Visibility:
Public.

Description

Issue #3595 - Add an actual scheduler; use joined threads for file system Add `SchedulerT`, an actual scheduler, albeit not sophisticated. It still uses threads only once per task and still has no capability for delayed execution, but it does not detach threads. Its destructor joins all pending threads. It contains its own internal thread to perform the join, implemented as class `OperationRunner`. Added unit tests for the scheduler and its components. Add a scheduler instance to `JsEngine`. Convert all file system operations to use this new scheduler. Add a `TearDown()` function the base test class. Call a new function `WaitForQuietScheduler()` on the engine in it; it does not complete until all pending tasks have completed and their threads joined. All file system tests pass without alteration. Moved `StartImmediatelyInSingleUseThread()` out of a rump class `Scheduler`, which was removed. Renamed it `StartImmediatelyInSingleUseDetachedThread` to emphasize that the thread runs detached. -- Depends upon https://codereview.adblockplus.org/29367003/

Patch Set 1 : #

Total comments: 62

Created: Dec. 14, 2016, 5:38 p.m.

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Stats (+490 lines, -39 lines)			Patch
M	include/AdblockPlus/JsEngine.h	View	5 chunks	+34 lines, -1 line	10 comments	Download
M	libadblockplus.gyp	View	1 chunk	+1 line, -0 lines	0 comments	Download
M	src/FileSystemJsObject.cpp	View	5 chunks	+5 lines, -5 lines	0 comments	Download
M	src/GlobalJsObject.cpp	View	2 chunks	+4 lines, -3 lines	0 comments	Download
M	src/JsEngine.cpp	View	4 chunks	+30 lines, -7 lines	4 comments	Download
M	src/Scheduler.h	View	2 chunks	+214 lines, -13 lines	31 comments	Download
M	src/Scheduler.cpp	View	2 chunks	+80 lines, -1 line	15 comments	Download
M	src/WebRequestJsObject.cpp	View	1 chunk	+1 line, -1 line	0 comments	Download
M	test/BaseJsTest.h	View	1 chunk	+2 lines, -8 lines	0 comments	Download
M	test/BaseJsTest.cpp	View	1 chunk	+13 lines, -0 lines	0 comments	Download
A	test/SchedulerTest.cpp	View	1 chunk	+106 lines, -0 lines	2 comments	Download

Messages

Total messages: 6

Expand All Messages | Collapse All Messages

Eric

This change goes a good way to getting detached threads under control. We have three ...

Dec. 14, 2016, 5:34 p.m. (2016-12-14 17:34:17 UTC) #1

This change goes a good way to getting detached threads under control. We have
three classes of detached threads at present: (1) file system, (2) web request,
(3) timeout. This changes addresses only item (1). 

The unit tests for the file system worked the first time after I cut over to the
new scheduler. Furthermore, I verified in the debugger that there were no
dangling threads just to make sure. I have high confidence this change causes no
problems.

The web requests didn't work the first time, though, so rather than delay
further I'm leaving them out of the scope of this change. Our implementation
calls back into the engine in its detached thread, which is more complicated
that what the file system operations do. I expect that the scheduler in this
change will work for web requests after some reworking of that operation, and I
also expect not to need to change the scheduler for this task.

Timeout, however, is a different story. When I used the new scheduler for
timeout, the unit tests effectively hanged. adblockpluscore sets up some
timeouts that are an hour long. Because the present timeout implementation uses
a sleeping thread (ugh!), there's no way of stopping the timeout. We can't use
an "immediate execution" policy for this and will need to augment the scheduler.

Issue #3595 mentions using `libuv` for asynchronous I/O. It became apparent to
me writing this that the structure of the code isn't ready for it yet. For the
file system operations, we have a callback to execute. Regardless of how we do
I/O, the callback needs a thread of execution, which would be a worker thread.
In order to use asynchronous I/O, some thread needs to receive the completion
notice, which would be a thread in the scheduler. The legacy code has only
detached (bad!) worker threads and no scheduler thread. This change introduces a
basic worker thread and a proper scheduler thread. It should be considered part
of the preparation needed to put in asynchronous I/O.

https://codereview.adblockplus.org/29367507/diff/29367520/include/AdblockPlus...
File include/AdblockPlus/JsEngine.h (right):

https://codereview.adblockplus.org/29367507/diff/29367520/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task,
ImmediateSingleUseThreadType);
Dummy argument anticipates needing delayed execution in order to implement
timeout.

sergei

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus/JsEngine.h File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus/JsEngine.h#newcode78 include/AdblockPlus/JsEngine.h:78: extern const ImmediateSingleUseThreadType ImmediateSingleUseThread; ///< Dummy constant for scheduling ...

Jan. 20, 2017, 1:08 p.m. (2017-01-20 13:08:14 UTC) #2

sergei

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#newcode172 src/Scheduler.h:172: * Could you please move that Scheduler with the ...

March 30, 2017, 1:14 p.m. (2017-03-30 13:14:41 UTC) #3

Eric

Added more reviewers. If I haven't added people that ought to see this, let me ...

March 30, 2017, 2:20 p.m. (2017-03-30 14:20:55 UTC) #4

Added more reviewers. If I haven't added people that ought to see this, let me
know. Moved Felix to Cc (where I thought I had put him originally, but perhaps
not).

I'll address some of the prior technical comments in another message.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h
File src/Scheduler.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:172: *
On 2017/03/30 13:14:41, sergei wrote:
> Could you please move that Scheduler with the functionality described in few
> lines above into a separate codereview? We will soon need it for android.

You might know what you want, but this description is far too short to make
sense of what you are actually asking. I'll also remind you that this review
depended upon a previous one, and your request might involve pulling in code
from the previous one.

> BTW, Scheduler should be even simpler, not a template class and it seems
instead
> of SingleUseWorker we can directly use std::thread.

No, it should not be simpler. If you want me to do this, the template stays. If
you understood this code better, you'd see that the template is necessary for
the scheduler to work as written.

Hard-coding the particular behaviors of std::thread is a bad idea. std::thread
runs its body upon construction, so if you use that interface, it becomes
impractical to use a thread pool later because the life-cycle of std::thread is
different from what you want with a thread pool. std::thread does not act as a
manageable resource (nor should it), but a proper scheduler needs to be able to
manage its thread resources. Using raw threads diminishes the code that's
already written.

SingleUseWorker above is a wrapper around a raw thread that converts it into a
resource. It's also why it has a Run() method, because it's acting as a resource
and not a raw thread. One of the characteristics of a resource is that it
separates (1) life-cycle, for example, acquire and release from (2) usage, here
running a function body. std::thread does not have that kind of interface, so it
is *necessary* to add it.

----

Furthermore, this review was number 2 of 12 (eventually) in a series that fixed
all the structural problems with threads and engine allocation, completing all
the substantive work for both #3595 and #3593 (they are interrelated, which
isn't obvious). The usage of the Worker template argument is slated for
modification, which is yet another reason not to use raw threads. Right now, the
only usages of a worker resource are `Run()` and `Join()`, but it's slated for a
`Cancel()` method.

In detail, review 9 of 12 (if I'm counting correctly) allows interruption of
timer from SetTimeout. This works by changing the executor for timer bodies to
contain an condition variable. Wake up the condition variable before its timer
fires and you forestall execution of the body. During development that executor
began as a class specific to timers, but the plan is for some of its interface
to migrate into the worker interface so that the scheduler can interrupt and
cancel pending work.

Eric

Addressed Sergei's comments. Usually I get complaints about over-documenting, but working through these responses I ...

March 30, 2017, 5:16 p.m. (2017-03-30 17:16:59 UTC) #5

Addressed Sergei's comments.

Usually I get complaints about over-documenting, but working through these
responses I think there may not be enough here. Please let me know if anything
remains unclear. This code is subtle and there are lots of small possible
modifications that would lead instantly to defects.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
File include/AdblockPlus/JsEngine.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:141: * - timing policy: start execution
immediately.
On 2017/01/20 13:08:12, sergei wrote:
> I would remove the comment about timing policy because no one can guarantee
that
> execution will be started immediately.

"Immediately" simply means "at the present time, without delay".

I would have thought it was obvious that this was not an assertion about the
real-time behavior of execution, since we are not in an RTOS (real-time
operating system) environment, but apparently not.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:147: *   The schedule policy--create a new thread
and discard it afterwards.
On 2017/01/20 13:08:12, sergei wrote:
> I would remove that parameter because there is no choice and this parameters
has
> no influence.

This parameter is a marker, acting both as documentation and disambiguation by
function signature. There's not a second version of `Schedule()` in the present
code, but there has been in development code. When I've rewritten this function,
I've created a second marker class and another implementation and then moved
over callers from old to new incrementally, which greatly speeds development.
And because it's a parameter, not a new function name, it play well with
template instantiation.

Once the scheduler can be considered finished, then it will be time to get rid
of this argument. Even in the initial series of 12 review, it wasn't time yet.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task,
ImmediateSingleUseThreadType);
On 2017/01/20 13:08:13, sergei wrote:
> Although it's expected that task can be copied internally, it would be still
> better to pass it as const reference.

In the present code, it's passed by value intentionally. We have to invoke the
copy constructor because we'll eventually run it out of scope, and the cleanest
way of doing this is to do it as part of the call.

In later code, I changed this to an rvalue argument. The reason for that is to
put responsibility for allocation inside the scheduler instead of relying upon
its callers to do the right thing. The caller creates an argument and calls it
with std::move. If I had figured this out before, I would have done it that way
from the outset.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:291: // Forward declaration for PImpl idiom
On 2017/01/20 13:08:13, sergei wrote:
> We don't need this comment.

Yes, we do. Dangling declarations with zero documentation are always confusing.

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp
File src/JsEngine.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne...
src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique
after we upgrade out of VS2012
On 2017/01/20 13:08:13, sergei wrote:
> We don't need this comment about std::make_unique

Is that because we're planning never to upgrade from VS2012?

Comments like this allow later code maintenance using `grep`.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp
File src/Scheduler.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:50: */
On 2017/01/20 13:08:13, sergei wrote:
> That comment is good however I don't think we need it

Behavior is always clearer *after* you read the comment.

It's also not a trivial comment. There could be other ways of implementing
`ThreadMain` and you have to avoid race conditions upon construction, where this
class gets a queued task but no notification, which would cause the object to
freeze up and never run anything.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:75: if (!isRunning)
On 2017/01/20 13:08:13, sergei wrote:
> Why not to put isRunning into while condition?

Because it's not correct. It opens up an opportunity for a race condition.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:83: // Assert `m` is unlocked
On 2017/01/20 13:08:13, sergei wrote:
> I would remove that comment.

Nope; it stays. Tracking when the mutex is locked and when it's unlocked is
critical to auditing this code. It wasn't there when I first drafted the code.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:103: if (th.joinable())
On 2017/01/20 13:08:13, sergei wrote:
> This implementation with if (th.joinable()) should be in Join method.

Yes, you need to be able to join the thread either explicitly or in the
destructor. I fixed this in a later revision by moving this code to Join() and
calling Join() here in the destructor.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:111: th = std::move(std::thread(f));
On 2017/01/20 13:08:13, sergei wrote:
> Do we really need to explicitly say std::move here?

It avoids a second copy, since there's already one in the call by value. 

I don't think there's a semantic rule in the C++ specification that makes the
`move` call redundant, but there might be.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h
File src/Scheduler.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:81: * If a user wants to execute an operation in a different
thread,
On 2017/01/20 13:08:14, sergei wrote:
> We don't need this part "If a user wants to execute an operation in a
different
> thread ...".

Documenting responsibilities is part of what makes documentation good. The line
you want to remove is documenting a responsibility that this class does not
assume.

Responsibility allocation is part of the design, so it's got to be part of the
documentation. For where this started, see
    https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:85: class OperationRunner
On 2017/01/20 13:08:14, sergei wrote:
> Why not to call it ActiveObject?

Because it's not a generic active object. It's tailored for use inside the
scheduler.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:99: std::queue<std::function<void()>> queue;
On 2017/01/20 13:08:14, sergei wrote:
> Basically, it's a synchronized queue which has already appeared several times.
> So, it would be better to have it as a separate class. However, currently,
it's
> not important.

It's not a separate class because the mutex is used for more than queue
management. You need atomicity between queue operations and notification, for
instance. If you separate them, you get race conditions.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:133: class SingleUseWorker
On 2017/01/20 13:08:13, sergei wrote:
> It looks like there was a lot of discussions, finally decided that it's better
> if a thread class has no member start/Run/whatever, but here we again
introduce
> it...

It doesn't look from my perspective that there were any discussions at all. When
did these purported discussions happen? Why was I not included?

Besides, this isn't a "thread class". If you want a name for it, it's a "worker
class" and acts as a resource, which I described in my previous message.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:175: *   - Separate schedulers for each `JsEngine` without any
shared facilities.
On 2017/01/20 13:08:14, sergei wrote:
> I would remove this string because whether there is a "Separate schedulers for
> each `JsEngine`" is not defined yet, it rather looks like an advantage and
> scheduler should not be aware about any JsEngine.

Since you've not delved into this code the way I have, I'll forgive this
misapprehension. If you want to be able to destroy JsEngine instances, engines
and schedulers must absolutely be aware of each other.

This line documents the particular relationship between the present scheduler
and the present engine.

I don't see any reason as of today why we cannot eventually use a single
scheduler for multiple engine instances. Schedulers generically require their
own thread(s), so it would be better eventually to use a singleton instance, but
we're not there yet. Thus the documentation.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:176: *   - Working threads are single-use. No provision for a
thread pool.
On 2017/01/20 13:08:13, sergei wrote:
> That's also questionable.

It is absolutely not questionable that this class, as written, uses only
single-use threads and has no provision for a thread pool. Did you not see the
"limitations and inefficiencies" point just three lines above.

Documenting deficiencies is important for code maintenance. It's impractical
and, more significantly, must slower to try to make everything perfect at each
iteration. That practice creates huge friction.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:188: std::mutex m;
On 2017/01/20 13:08:14, sergei wrote:
> it should be rather std::recursive_mutex because if Worker does not run in a a
> thread and instead does the everything immediately, thus in the same thread,
> there will be an exception.

No, it should not be recursive mutex. That would be masking an error. There's no
correct usage scenario where the mutex would ever be called recursively. And for
the record, of all the errors I encountered developing this code, not once did I
ever have the kind of exception you're positing.

Since this is apparently not an obvious point, when you run a task through this
scheduler, the second thing it does (after locking the mutex) is to obtain a new
worker, and that worker is always a separate thread. See the `emplace` statement
in `Run()` below; it invokes the default constructor for `Worker`.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:253: void Run(std::function<void()> body)
On 2017/01/20 13:08:13, sergei wrote:
> It would be easier to read if argument "body" is renamed to something like a
> "op" (from operation) or "call".

I've used "body" consistently for a function that's subject to scheduling and
management.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); });
On 2017/01/20 13:08:13, sergei wrote:
> I'm not sure that it's a wise decision to assume that Worker runs in some
> another thread.

This is not just an assumption. It's a design principle.

If you don't want a task to be scheduled, don't use the scheduler. If you
schedule it, it runs in some arbitrary thread different that the one it's called
in.

Perhaps you don't recall, but the high-level goal in using asynchrony at all is
to support JavaScript callbacks without re-entering v8. In order to do that, you
*must* restart the call stack from the top, and the only mechanism we have
available for that is a thread. If you were to implement a Worker without a
thread inside somewhere, it would be incorrect in the only usage we have for it.

> In this change it's OK but looking at the final code it seems
> very questionable.

Apparently you don't understand how it works, so please indicate what you don't
understand so that I can expand upon the documentation.

Ever since this code was working, I've had no problems with dangling threads for
tasks run through this scheduler. The mechanism here is correct, reliable, and
efficient.

> So, I  would rather have it changed here, in this codereview.
> "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather some
> kind of a next function which is executed after worker has finished, which
> happens not exactly right after a call of "body".

That's the way the code works already. `JoinAndRemove` is called within the
monitor thread and not within whatever thread is inside the worker. What does
happen immediately upon completion of the body is enqueuing a `JoinAndRemove`
operation onto the monitor.

Also note that the scheduler takes full responsibility for the entire life cycle
of its tasks. It starts them up initially and detects when they're finished.
Workers have no awareness that they're being used by schedulers, so the
scheduler does not have to present any kind of interface to the worker. This is
far simpler than any alternative where the worker notifies the scheduler that
it's finished.

> It's clear if consider a case
> when we use facilities for asynchronous IO.

This mechanism works just fine for asynchronous I/O. I rewrote all of it to use
this mechanism. You say that you've read my code, but if you have, statement
like this lead me believe you did so at best only cursorily or didn't
understand.

https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest.cpp
File test/SchedulerTest.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest....
test/SchedulerTest.cpp:101: auto g = f;
On 2017/01/20 13:08:14, sergei wrote:
> why do we need g?

It's explicitly a copy of f. Perhaps unnecessary, but it means we can be certain
that we're running 100 different function objects once each rather than a single
function object 100 times. It's written this way so as not to rely on particular
calling conventions of Run(). This is a stress test for `Run()`, so feeding its
argument distinct objects is a bit better.

sergei

April 3, 2017, 3:35 p.m. (2017-04-03 15:35:34 UTC) #6

The functionality I want is
https://gist.github.com/abby-sergz/12a2c1d25fc93c17838c878a11d7fe89, I'm pretty
sure that the commit with it does not require so much changes as in the current
code review, like changes in JsEngine or in some *JsObject.cpp. I think it will
be fair to let you create a codereview with it because you have already
submitted it here, though it should be shaped a bit.

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
File include/AdblockPlus/JsEngine.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus...
include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task,
ImmediateSingleUseThreadType);
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Although it's expected that task can be copied internally, it would be still
> > better to pass it as const reference.
> 
> In the present code, it's passed by value intentionally. We have to invoke the
> copy constructor because we'll eventually run it out of scope, and the
cleanest
> way of doing this is to do it as part of the call.
> 
> In later code, I changed this to an rvalue argument. The reason for that is to
> put responsibility for allocation inside the scheduler instead of relying upon
> its callers to do the right thing. The caller creates an argument and calls it
> with std::move. If I had figured this out before, I would have done it that
way
> from the outset.

Acknowledged.

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp
File src/JsEngine.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne...
src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique
after we upgrade out of VS2012
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > We don't need this comment about std::make_unique
> 
> Is that because we're planning never to upgrade from VS2012?
> 
> Comments like this allow later code maintenance using `grep`.
Maybe, though I think that std::make_shared are much more important than
std::make_unique in general. So, I would rather expect that anyway one will have
to go through the code and simple change all places, including std::make_unique
where it's needed or everywhere.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp
File src/Scheduler.cpp (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:50: */
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > That comment is good however I don't think we need it
> 
> Behavior is always clearer *after* you read the comment.
> 
> It's also not a trivial comment. There could be other ways of implementing
> `ThreadMain` and you have to avoid race conditions upon construction, where
this
> class gets a queued task but no notification, which would cause the object to
> freeze up and never run anything.

I'm talking about implementation without race conditions and other bugs.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:75: if (!isRunning)
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Why not to put isRunning into while condition?
> 
> Because it's not correct. It opens up an opportunity for a race condition.

Could you please explain when that race condition occurs?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:86: if (op)
Why not to prevent putting of empty functions in the queue?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:91: catch (std::exception& e)
it generates a warning about unused local variable.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n...
src/Scheduler.cpp:111: th = std::move(std::thread(f));
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > Do we really need to explicitly say std::move here?
> 
> It avoids a second copy, since there's already one in the call by value.
Copy of what? JIC, std::thread cannot be copied.

> 
> I don't think there's a semantic rule in the C++ specification that makes the
> `move` call redundant, but there might be.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h
File src/Scheduler.h (right):

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:81: * If a user wants to execute an operation in a different
thread,
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > We don't need this part "If a user wants to execute an operation in a
> different
> > thread ...".
> 
> Documenting responsibilities is part of what makes documentation good. The
line
> you want to remove is documenting a responsibility that this class does not
> assume.
> 
> Responsibility allocation is part of the design, so it's got to be part of the
> documentation. For where this started, see
>     https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card
Sorry, but I have not seen on the provisioned link anything explaining the
reason to have such comment.
If you think that this comment should be then where are comments like
- if a user wants to execute an operation immediately ...
- if a user wants to execute an operation in 5 min ...
- if a user wants to execute an operation in 3 min ...
...
- if a user wants to execute an operation and show a green button
...
?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:85: class OperationRunner
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > Why not to call it ActiveObject?
> 
> Because it's not a generic active object. It's tailored for use inside the
> scheduler.

Maybe it's not a generic active object but it is a simple active object which
can be used as a base for any another active object.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:99: std::queue<std::function<void()>> queue;
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > Basically, it's a synchronized queue which has already appeared several
times.
> > So, it would be better to have it as a separate class. However, currently,
> it's
> > not important.
> 
> It's not a separate class because the mutex is used for more than queue
> management. You need atomicity between queue operations and notification, for
> instance. If you separate them, you get race conditions.

Could you please point to that race condition in the example I have posted?

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:133: class SingleUseWorker
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > It looks like there was a lot of discussions, finally decided that it's
better
> > if a thread class has no member start/Run/whatever, but here we again
> introduce
> > it...
> 
> It doesn't look from my perspective that there were any discussions at all.
When
> did these purported discussions happen? Why was I not included?
There were a lot of talks regarding designing a thread class in different C++
communities like boost and Qt and maybe during considering proposals for C++11,
strangely I cannot find any link, but one of main conclusions was that it's a
bad idea to have such method like run or start on a thread class because in this
case one can call it several times on the same object. Therefore there is only
one way to start a thread which is by passing a callable to the constructor or
to a run method accepting callable. Of course there is still plenty ways to use
it a wrong way but, because member methods start have proven themselves on
practice, the recommendation for designers of a thread class was to try another
interface and see whether it helps.

> 
> Besides, this isn't a "thread class". If you want a name for it, it's a
"worker
> class" and acts as a resource, which I described in my previous message.
OK, clear, I would propose to introduce that abstraction when it's really
needed.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:172: *
On 2017/03/30 14:20:55, Eric wrote:
> On 2017/03/30 13:14:41, sergei wrote:
> > Could you please move that Scheduler with the functionality described in few
> > lines above into a separate codereview? We will soon need it for android.
> 
> You might know what you want, but this description is far too short to make
> sense of what you are actually asking. I'll also remind you that this review
> depended upon a previous one, and your request might involve pulling in code
> from the previous one.
So, what are the difficulties to create a codereview only with SchedulerT (I
think the name should be changed) and its dependencies?

> 
> > BTW, Scheduler should be even simpler, not a template class and it seems
> instead
> > of SingleUseWorker we can directly use std::thread.
> 
> No, it should not be simpler. If you want me to do this, the template stays.
If
> you understood this code better, you'd see that the template is necessary for
> the scheduler to work as written.
Apparently the template is not necessary in my example, so the template is not
necessary.

> 
> Hard-coding the particular behaviors of std::thread is a bad idea. std::thread
> runs its body upon construction, so if you use that interface, it becomes
> impractical to use a thread pool later because the life-cycle of std::thread
is
> different from what you want with a thread pool. std::thread does not act as a
> manageable resource (nor should it), but a proper scheduler needs to be able
to
> manage its thread resources. Using raw threads diminishes the code that's
> already written.
I think that thread pool should be a completely different class because with the
current approach when a new instance of a worker is created for each task and
with `monitor` the implementation of the thread pool is not going to be simple,
the same for implementation of timers.
BTW, so far there is no need in thread pool, so I would like to ask to not
complicate the code before it's needed.

> 
> SingleUseWorker above is a wrapper around a raw thread that converts it into a
> resource. It's also why it has a Run() method, because it's acting as a
resource
> and not a raw thread. One of the characteristics of a resource is that it
> separates (1) life-cycle, for example, acquire and release from (2) usage,
here
> running a function body. std::thread does not have that kind of interface, so
it
> is *necessary* to add it.
I consider std::thread also as resource and so far the interface of std::thread
is enough.
> 
> ----
> 
> Furthermore, this review was number 2 of 12 (eventually) in a series that
fixed
> all the structural problems with threads and engine allocation, completing all
> the substantive work for both #3595 and #3593 (they are interrelated, which
> isn't obvious). The usage of the Worker template argument is slated for
> modification, which is yet another reason not to use raw threads. Right now,
the
> only usages of a worker resource are `Run()` and `Join()`, but it's slated for
a
> `Cancel()` method.
That's good but why not to introduce that abstraction in a separate commit?
Smaller commits are always better. What concerns the number of reviews, I don't
think it's an issue because anyway these codereviews should be adapted to
whatever state of master, and I think that the easiest way is to simply pick
small portions with simple functionality and rebase the rest afterwards.

> 
> In detail, review 9 of 12 (if I'm counting correctly) allows interruption of
> timer from SetTimeout. This works by changing the executor for timer bodies to
> contain an condition variable. Wake up the condition variable before its timer
> fires and you forestall execution of the body. During development that
executor
> began as a class specific to timers, but the plan is for some of its interface
> to migrate into the worker interface so that the scheduler can interrupt and
> cancel pending work.
I see the point but for present I would like to don't pull that abstraction. I
doubt that the proposed implementation in
https://codereview.adblockplus.org/29370876/ is the right way we should go now
(because there are more elegant approaches) and maybe later (I think about
clearTimeout).
Cancelling functionality in general has a lot of questions, for instance I would
implement it slightly differently, most likely even on another level, and it
highly depends on requirements. So far, I would propose to create corresponding
issues and discuss it there. For example,
- we will definitely need `clearTimeout` however it's not very important right
now because it's never used.
- `XMLHttpRequest.abort()` is similar to the previous one
- so far, we don't use any cancellable API to work with file system, so no need
for such issue
- an issue with arguments against current cancelling of timers when we are
destroying JsEngine and it would be good to see a reasoned proposal
- cancelling of running file system operations and web requests when JsEngine is
being destroyed. Though I think it should be a part of another issue. I don't
mind to discuss it in https://issues.adblockplus.org/ticket/3595, because the
latter is actually a meta issue for subsequent work, although the description
should be adjusted to its generic title, even more the things mentioned in "What
to change" are already done.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:175: *   - Separate schedulers for each `JsEngine` without any
shared facilities.
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > I would remove this string because whether there is a "Separate schedulers
for
> > each `JsEngine`" is not defined yet, it rather looks like an advantage and
> > scheduler should not be aware about any JsEngine.
> 
> Since you've not delved into this code the way I have, I'll forgive this
> misapprehension. If you want to be able to destroy JsEngine instances, engines
> and schedulers must absolutely be aware of each other.
> 
> This line documents the particular relationship between the present scheduler
> and the present engine.
> 
> I don't see any reason as of today why we cannot eventually use a single
> scheduler for multiple engine instances. Schedulers generically require their
> own thread(s), so it would be better eventually to use a singleton instance,
but
> we're not there yet. Thus the documentation.

I think that scheduler should not be so direct member of JsEngine, I would like
to provide users of libadblockplus with possibility to use their own scheduler
for asynchronous operations. It's useful because, e.g. on MS Windows only one
worker thread is enough to support all our asynchronous operations. It means
changing of interface of injected interfaces like WebRequest and FileSystem to
have asynchronous methods and having a separate instances of scheduler in each
default implementation. All asynchronous operations should be cancelled when the
implementation is destroyed by JsEngine.
So, should JsEngine and scheduler be aware about each other? - No. JsEngine
should take into account that it schedules asynchronous tasks through certain
interfaces and that these tasks either completed or stopped and cleaned after
destroying of a corresponding interface. Neither scheduler nor injected
interfaces should know where they are used.

I can expect here some reaction like, "I have sent XX codereviews, why did you
not tell it earlier/stopped me, where are issues with that, etc". Apparently on
my question https://issues.adblockplus.org/ticket/3595#comment:2 instead of
opinions and following discussion, I have received
https://codereview.adblockplus.org/29367003/ and
https://codereview.adblockplus.org/29367507/ which are actually good but should
be slightly adjusted, so I planned they will be changed during the reviewing
process, and then within a pretty shot time span a bunch of following
codereviews. BTW, there are also good things in them but I would like to pick
them gradually.
I think that before doing such much work it would be better to firstly discuss
it.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:176: *   - Working threads are single-use. No provision for a
thread pool.
On 2017/03/30 17:16:59, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > That's also questionable.
> 
> It is absolutely not questionable that this class, as written, uses only
> single-use threads and has no provision for a thread pool. Did you not see the
> "limitations and inefficiencies" point just three lines above.
Why should it have a thread pool? It's similar to the example in another comment
thread, where is a comment that it does not show a big green button?
> 
> Documenting deficiencies is important for code maintenance. It's impractical
> and, more significantly, must slower to try to make everything perfect at each
> iteration. That practice creates huge friction.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:188: std::mutex m;
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:14, sergei wrote:
> > it should be rather std::recursive_mutex because if Worker does not run in a
a
> > thread and instead does the everything immediately, thus in the same thread,
> > there will be an exception.
> 
> No, it should not be recursive mutex. That would be masking an error. There's
no
> correct usage scenario where the mutex would ever be called recursively.
Acknowledged.

> And for
> the record, of all the errors I encountered developing this code, not once did
I
> ever have the kind of exception you're positing.
> 
> Since this is apparently not an obvious point, when you run a task through
this
> scheduler, the second thing it does (after locking the mutex) is to obtain a
new
> worker, and that worker is always a separate thread. See the `emplace`
statement
> in `Run()` below; it invokes the default constructor for `Worker`.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:226: task->Join();
here is a race condition, it can happen that move-assignment in
SingleUseWorker::Run has not happened yet but body has already finished and
monitor has already reached that point and inside SingleUseWorker::Join it
accessed SingleUseWorker::th which is default constructed yet.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:253: void Run(std::function<void()> body)
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > It would be easier to read if argument "body" is renamed to something like a
> > "op" (from operation) or "call".
> 
> I've used "body" consistently for a function that's subject to scheduling and
> management.

I still think that another name is better here, body of what is used here? Such
words like task, operation, function, callable, executable, runnable fit much
better here than body, IMO.

https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new...
src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); });
On 2017/03/30 17:16:58, Eric wrote:
> On 2017/01/20 13:08:13, sergei wrote:
> > I'm not sure that it's a wise decision to assume that Worker runs in some
> > another thread.
> 
> This is not just an assumption. It's a design principle.
> 
> If you don't want a task to be scheduled, don't use the scheduler. If you
> schedule it, it runs in some arbitrary thread different that the one it's
called
> in.
> 
> Perhaps you don't recall, but the high-level goal in using asynchrony at all
is
> to support JavaScript callbacks without re-entering v8. In order to do that,
you
> *must* restart the call stack from the top, and the only mechanism we have
> available for that is a thread. If you were to implement a Worker without a
> thread inside somewhere, it would be incorrect in the only usage we have for
it.
> 
> > In this change it's OK but looking at the final code it seems
> > very questionable.
> 
> Apparently you don't understand how it works, so please indicate what you
don't
> understand so that I can expand upon the documentation.
OK, let's say I want to implement timers using SetTimer from Win API and I'm
allowed to create only one thread for all timers per instance of Scheduler. What
will be the implementation of template class Worker?
JIC, It's merely a theoretical question, I'm not going to use Win API for
timers.

> 
> Ever since this code was working, I've had no problems with dangling threads
for
> tasks run through this scheduler. The mechanism here is correct, reliable, and
> efficient.
> 
> > So, I  would rather have it changed here, in this codereview.
> > "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather
some
> > kind of a next function which is executed after worker has finished, which
> > happens not exactly right after a call of "body".
> 
> That's the way the code works already. `JoinAndRemove` is called within the
> monitor thread and not within whatever thread is inside the worker. What does
> happen immediately upon completion of the body is enqueuing a `JoinAndRemove`
> operation onto the monitor.
> 
> Also note that the scheduler takes full responsibility for the entire life
cycle
> of its tasks. It starts them up initially and detects when they're finished.
> Workers have no awareness that they're being used by schedulers, so the
> scheduler does not have to present any kind of interface to the worker. This
is
> far simpler than any alternative where the worker notifies the scheduler that
> it's finished.
> 
> > It's clear if consider a case
> > when we use facilities for asynchronous IO.
> 
> This mechanism works just fine for asynchronous I/O. I rewrote all of it to
use
> this mechanism. You say that you've read my code, but if you have, statement
> like this lead me believe you did so at best only cursorily or didn't
> understand.
What I really don't understand is the reason you are always repeating that I
didn't understand something. Please stay professional.

Expand All Messages | Collapse All Messages