|
|
DescriptionIssue #3595 - Add an actual scheduler; use joined threads for file system
Add `SchedulerT`, an actual scheduler, albeit not sophisticated. It still uses
threads only once per task and still has no capability for delayed execution,
but it does not detach threads. Its destructor joins all pending threads. It
contains its own internal thread to perform the join, implemented as class
`OperationRunner`. Added unit tests for the scheduler and its components.
Add a scheduler instance to `JsEngine`. Convert all file system operations to
use this new scheduler.
Add a `TearDown()` function the base test class. Call a new function
`WaitForQuietScheduler()` on the engine in it; it does not complete until all
pending tasks have completed and their threads joined. All file system tests
pass without alteration.
Moved `StartImmediatelyInSingleUseThread()` out of a rump class `Scheduler`,
which was removed. Renamed it `StartImmediatelyInSingleUseDetachedThread` to
emphasize that the thread runs detached.
--
Depends upon https://codereview.adblockplus.org/29367003/
Patch Set 1 : #
Total comments: 62
MessagesTotal messages: 6
This change goes a good way to getting detached threads under control. We have three classes of detached threads at present: (1) file system, (2) web request, (3) timeout. This changes addresses only item (1). The unit tests for the file system worked the first time after I cut over to the new scheduler. Furthermore, I verified in the debugger that there were no dangling threads just to make sure. I have high confidence this change causes no problems. The web requests didn't work the first time, though, so rather than delay further I'm leaving them out of the scope of this change. Our implementation calls back into the engine in its detached thread, which is more complicated that what the file system operations do. I expect that the scheduler in this change will work for web requests after some reworking of that operation, and I also expect not to need to change the scheduler for this task. Timeout, however, is a different story. When I used the new scheduler for timeout, the unit tests effectively hanged. adblockpluscore sets up some timeouts that are an hour long. Because the present timeout implementation uses a sleeping thread (ugh!), there's no way of stopping the timeout. We can't use an "immediate execution" policy for this and will need to augment the scheduler. Issue #3595 mentions using `libuv` for asynchronous I/O. It became apparent to me writing this that the structure of the code isn't ready for it yet. For the file system operations, we have a callback to execute. Regardless of how we do I/O, the callback needs a thread of execution, which would be a worker thread. In order to use asynchronous I/O, some thread needs to receive the completion notice, which would be a thread in the scheduler. The legacy code has only detached (bad!) worker threads and no scheduler thread. This change introduces a basic worker thread and a proper scheduler thread. It should be considered part of the preparation needed to put in asynchronous I/O. https://codereview.adblockplus.org/29367507/diff/29367520/include/AdblockPlus... File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367520/include/AdblockPlus... include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task, ImmediateSingleUseThreadType); Dummy argument anticipates needing delayed execution in order to implement timeout.
https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:78: extern const ImmediateSingleUseThreadType ImmediateSingleUseThread; ///< Dummy constant for scheduling policy Since it's not used I would propose to remove any ImmediateSingleUseThread*. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:141: * - timing policy: start execution immediately. I would remove the comment about timing policy because no one can guarantee that execution will be started immediately. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:147: * The schedule policy--create a new thread and discard it afterwards. I would remove that parameter because there is no choice and this parameters has no influence. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task, ImmediateSingleUseThreadType); Although it's expected that task can be copied internally, it would be still better to pass it as const reference. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:291: // Forward declaration for PImpl idiom We don't need this comment. https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp File src/JsEngine.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne... src/JsEngine.cpp:31: SchedulerImpl() {} // = default; We don't need this line. https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne... src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique after we upgrade out of VS2012 We don't need this comment about std::make_unique https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp File src/Scheduler.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:50: */ That comment is good however I don't think we need it because ThreadMain will be written that way or similar anyway. The worker thread can be paused for instance between lines 68 and 69, and we add items into queue with notifying the condition variable. Afterwards the worker thread continues the execution, of course it should process already pending items as soon as possible. So it's not related to construction or starting of a thread, it's just how it should be. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:75: if (!isRunning) Why not to put isRunning into while condition? we set and read isRunning only from worker thread, we don't need to have a lock to access it. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:83: // Assert `m` is unlocked I would remove that comment. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:103: if (th.joinable()) This implementation with if (th.joinable()) should be in Join method. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:111: th = std::move(std::thread(f)); Do we really need to explicitly say std::move here? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:81: * If a user wants to execute an operation in a different thread, We don't need this part "If a user wants to execute an operation in a different thread ...". https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:85: class OperationRunner Why not to call it ActiveObject? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:99: std::queue<std::function<void()>> queue; Basically, it's a synchronized queue which has already appeared several times. So, it would be better to have it as a separate class. However, currently, it's not important. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:133: class SingleUseWorker It looks like there was a lot of discussions, finally decided that it's better if a thread class has no member start/Run/whatever, but here we again introduce it... https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:175: * - Separate schedulers for each `JsEngine` without any shared facilities. I would remove this string because whether there is a "Separate schedulers for each `JsEngine`" is not defined yet, it rather looks like an advantage and scheduler should not be aware about any JsEngine. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:176: * - Working threads are single-use. No provision for a thread pool. That's also questionable. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:188: std::mutex m; it should be rather std::recursive_mutex because if Worker does not run in a a thread and instead does the everything immediately, thus in the same thread, there will be an exception. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:253: void Run(std::function<void()> body) It would be easier to read if argument "body" is renamed to something like a "op" (from operation) or "call". https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); }); I'm not sure that it's a wise decision to assume that Worker runs in some another thread. In this change it's OK but looking at the final code it seems very questionable. So, I would rather have it changed here, in this codereview. "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather some kind of a next function which is executed after worker has finished, which happens not exactly right after a call of "body". It's clear if consider a case when we use facilities for asynchronous IO. https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest.cpp File test/SchedulerTest.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest.... test/SchedulerTest.cpp:101: auto g = f; why do we need g?
https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:172: * Could you please move that Scheduler with the functionality described in few lines above into a separate codereview? We will soon need it for android. BTW, Scheduler should be even simpler, not a template class and it seems instead of SingleUseWorker we can directly use std::thread.
Added more reviewers. If I haven't added people that ought to see this, let me know. Moved Felix to Cc (where I thought I had put him originally, but perhaps not). I'll address some of the prior technical comments in another message. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:172: * On 2017/03/30 13:14:41, sergei wrote: > Could you please move that Scheduler with the functionality described in few > lines above into a separate codereview? We will soon need it for android. You might know what you want, but this description is far too short to make sense of what you are actually asking. I'll also remind you that this review depended upon a previous one, and your request might involve pulling in code from the previous one. > BTW, Scheduler should be even simpler, not a template class and it seems instead > of SingleUseWorker we can directly use std::thread. No, it should not be simpler. If you want me to do this, the template stays. If you understood this code better, you'd see that the template is necessary for the scheduler to work as written. Hard-coding the particular behaviors of std::thread is a bad idea. std::thread runs its body upon construction, so if you use that interface, it becomes impractical to use a thread pool later because the life-cycle of std::thread is different from what you want with a thread pool. std::thread does not act as a manageable resource (nor should it), but a proper scheduler needs to be able to manage its thread resources. Using raw threads diminishes the code that's already written. SingleUseWorker above is a wrapper around a raw thread that converts it into a resource. It's also why it has a Run() method, because it's acting as a resource and not a raw thread. One of the characteristics of a resource is that it separates (1) life-cycle, for example, acquire and release from (2) usage, here running a function body. std::thread does not have that kind of interface, so it is *necessary* to add it. ---- Furthermore, this review was number 2 of 12 (eventually) in a series that fixed all the structural problems with threads and engine allocation, completing all the substantive work for both #3595 and #3593 (they are interrelated, which isn't obvious). The usage of the Worker template argument is slated for modification, which is yet another reason not to use raw threads. Right now, the only usages of a worker resource are `Run()` and `Join()`, but it's slated for a `Cancel()` method. In detail, review 9 of 12 (if I'm counting correctly) allows interruption of timer from SetTimeout. This works by changing the executor for timer bodies to contain an condition variable. Wake up the condition variable before its timer fires and you forestall execution of the body. During development that executor began as a class specific to timers, but the plan is for some of its interface to migrate into the worker interface so that the scheduler can interrupt and cancel pending work.
Addressed Sergei's comments. Usually I get complaints about over-documenting, but working through these responses I think there may not be enough here. Please let me know if anything remains unclear. This code is subtle and there are lots of small possible modifications that would lead instantly to defects. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:141: * - timing policy: start execution immediately. On 2017/01/20 13:08:12, sergei wrote: > I would remove the comment about timing policy because no one can guarantee that > execution will be started immediately. "Immediately" simply means "at the present time, without delay". I would have thought it was obvious that this was not an assertion about the real-time behavior of execution, since we are not in an RTOS (real-time operating system) environment, but apparently not. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:147: * The schedule policy--create a new thread and discard it afterwards. On 2017/01/20 13:08:12, sergei wrote: > I would remove that parameter because there is no choice and this parameters has > no influence. This parameter is a marker, acting both as documentation and disambiguation by function signature. There's not a second version of `Schedule()` in the present code, but there has been in development code. When I've rewritten this function, I've created a second marker class and another implementation and then moved over callers from old to new incrementally, which greatly speeds development. And because it's a parameter, not a new function name, it play well with template instantiation. Once the scheduler can be considered finished, then it will be time to get rid of this argument. Even in the initial series of 12 review, it wasn't time yet. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task, ImmediateSingleUseThreadType); On 2017/01/20 13:08:13, sergei wrote: > Although it's expected that task can be copied internally, it would be still > better to pass it as const reference. In the present code, it's passed by value intentionally. We have to invoke the copy constructor because we'll eventually run it out of scope, and the cleanest way of doing this is to do it as part of the call. In later code, I changed this to an rvalue argument. The reason for that is to put responsibility for allocation inside the scheduler instead of relying upon its callers to do the right thing. The caller creates an argument and calls it with std::move. If I had figured this out before, I would have done it that way from the outset. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:291: // Forward declaration for PImpl idiom On 2017/01/20 13:08:13, sergei wrote: > We don't need this comment. Yes, we do. Dangling declarations with zero documentation are always confusing. https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp File src/JsEngine.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne... src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique after we upgrade out of VS2012 On 2017/01/20 13:08:13, sergei wrote: > We don't need this comment about std::make_unique Is that because we're planning never to upgrade from VS2012? Comments like this allow later code maintenance using `grep`. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp File src/Scheduler.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:50: */ On 2017/01/20 13:08:13, sergei wrote: > That comment is good however I don't think we need it Behavior is always clearer *after* you read the comment. It's also not a trivial comment. There could be other ways of implementing `ThreadMain` and you have to avoid race conditions upon construction, where this class gets a queued task but no notification, which would cause the object to freeze up and never run anything. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:75: if (!isRunning) On 2017/01/20 13:08:13, sergei wrote: > Why not to put isRunning into while condition? Because it's not correct. It opens up an opportunity for a race condition. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:83: // Assert `m` is unlocked On 2017/01/20 13:08:13, sergei wrote: > I would remove that comment. Nope; it stays. Tracking when the mutex is locked and when it's unlocked is critical to auditing this code. It wasn't there when I first drafted the code. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:103: if (th.joinable()) On 2017/01/20 13:08:13, sergei wrote: > This implementation with if (th.joinable()) should be in Join method. Yes, you need to be able to join the thread either explicitly or in the destructor. I fixed this in a later revision by moving this code to Join() and calling Join() here in the destructor. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:111: th = std::move(std::thread(f)); On 2017/01/20 13:08:13, sergei wrote: > Do we really need to explicitly say std::move here? It avoids a second copy, since there's already one in the call by value. I don't think there's a semantic rule in the C++ specification that makes the `move` call redundant, but there might be. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:81: * If a user wants to execute an operation in a different thread, On 2017/01/20 13:08:14, sergei wrote: > We don't need this part "If a user wants to execute an operation in a different > thread ...". Documenting responsibilities is part of what makes documentation good. The line you want to remove is documenting a responsibility that this class does not assume. Responsibility allocation is part of the design, so it's got to be part of the documentation. For where this started, see https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:85: class OperationRunner On 2017/01/20 13:08:14, sergei wrote: > Why not to call it ActiveObject? Because it's not a generic active object. It's tailored for use inside the scheduler. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:99: std::queue<std::function<void()>> queue; On 2017/01/20 13:08:14, sergei wrote: > Basically, it's a synchronized queue which has already appeared several times. > So, it would be better to have it as a separate class. However, currently, it's > not important. It's not a separate class because the mutex is used for more than queue management. You need atomicity between queue operations and notification, for instance. If you separate them, you get race conditions. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:133: class SingleUseWorker On 2017/01/20 13:08:13, sergei wrote: > It looks like there was a lot of discussions, finally decided that it's better > if a thread class has no member start/Run/whatever, but here we again introduce > it... It doesn't look from my perspective that there were any discussions at all. When did these purported discussions happen? Why was I not included? Besides, this isn't a "thread class". If you want a name for it, it's a "worker class" and acts as a resource, which I described in my previous message. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:175: * - Separate schedulers for each `JsEngine` without any shared facilities. On 2017/01/20 13:08:14, sergei wrote: > I would remove this string because whether there is a "Separate schedulers for > each `JsEngine`" is not defined yet, it rather looks like an advantage and > scheduler should not be aware about any JsEngine. Since you've not delved into this code the way I have, I'll forgive this misapprehension. If you want to be able to destroy JsEngine instances, engines and schedulers must absolutely be aware of each other. This line documents the particular relationship between the present scheduler and the present engine. I don't see any reason as of today why we cannot eventually use a single scheduler for multiple engine instances. Schedulers generically require their own thread(s), so it would be better eventually to use a singleton instance, but we're not there yet. Thus the documentation. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:176: * - Working threads are single-use. No provision for a thread pool. On 2017/01/20 13:08:13, sergei wrote: > That's also questionable. It is absolutely not questionable that this class, as written, uses only single-use threads and has no provision for a thread pool. Did you not see the "limitations and inefficiencies" point just three lines above. Documenting deficiencies is important for code maintenance. It's impractical and, more significantly, must slower to try to make everything perfect at each iteration. That practice creates huge friction. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:188: std::mutex m; On 2017/01/20 13:08:14, sergei wrote: > it should be rather std::recursive_mutex because if Worker does not run in a a > thread and instead does the everything immediately, thus in the same thread, > there will be an exception. No, it should not be recursive mutex. That would be masking an error. There's no correct usage scenario where the mutex would ever be called recursively. And for the record, of all the errors I encountered developing this code, not once did I ever have the kind of exception you're positing. Since this is apparently not an obvious point, when you run a task through this scheduler, the second thing it does (after locking the mutex) is to obtain a new worker, and that worker is always a separate thread. See the `emplace` statement in `Run()` below; it invokes the default constructor for `Worker`. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:253: void Run(std::function<void()> body) On 2017/01/20 13:08:13, sergei wrote: > It would be easier to read if argument "body" is renamed to something like a > "op" (from operation) or "call". I've used "body" consistently for a function that's subject to scheduling and management. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); }); On 2017/01/20 13:08:13, sergei wrote: > I'm not sure that it's a wise decision to assume that Worker runs in some > another thread. This is not just an assumption. It's a design principle. If you don't want a task to be scheduled, don't use the scheduler. If you schedule it, it runs in some arbitrary thread different that the one it's called in. Perhaps you don't recall, but the high-level goal in using asynchrony at all is to support JavaScript callbacks without re-entering v8. In order to do that, you *must* restart the call stack from the top, and the only mechanism we have available for that is a thread. If you were to implement a Worker without a thread inside somewhere, it would be incorrect in the only usage we have for it. > In this change it's OK but looking at the final code it seems > very questionable. Apparently you don't understand how it works, so please indicate what you don't understand so that I can expand upon the documentation. Ever since this code was working, I've had no problems with dangling threads for tasks run through this scheduler. The mechanism here is correct, reliable, and efficient. > So, I would rather have it changed here, in this codereview. > "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather some > kind of a next function which is executed after worker has finished, which > happens not exactly right after a call of "body". That's the way the code works already. `JoinAndRemove` is called within the monitor thread and not within whatever thread is inside the worker. What does happen immediately upon completion of the body is enqueuing a `JoinAndRemove` operation onto the monitor. Also note that the scheduler takes full responsibility for the entire life cycle of its tasks. It starts them up initially and detects when they're finished. Workers have no awareness that they're being used by schedulers, so the scheduler does not have to present any kind of interface to the worker. This is far simpler than any alternative where the worker notifies the scheduler that it's finished. > It's clear if consider a case > when we use facilities for asynchronous IO. This mechanism works just fine for asynchronous I/O. I rewrote all of it to use this mechanism. You say that you've read my code, but if you have, statement like this lead me believe you did so at best only cursorily or didn't understand. https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest.cpp File test/SchedulerTest.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/test/SchedulerTest.... test/SchedulerTest.cpp:101: auto g = f; On 2017/01/20 13:08:14, sergei wrote: > why do we need g? It's explicitly a copy of f. Perhaps unnecessary, but it means we can be certain that we're running 100 different function objects once each rather than a single function object 100 times. It's written this way so as not to rely on particular calling conventions of Run(). This is a stress test for `Run()`, so feeding its argument distinct objects is a bit better.
The functionality I want is https://gist.github.com/abby-sergz/12a2c1d25fc93c17838c878a11d7fe89, I'm pretty sure that the commit with it does not require so much changes as in the current code review, like changes in JsEngine or in some *JsObject.cpp. I think it will be fair to let you create a codereview with it because you have already submitted it here, though it should be shaped a bit. https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... File include/AdblockPlus/JsEngine.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/include/AdblockPlus... include/AdblockPlus/JsEngine.h:149: void Schedule(std::function<void()> task, ImmediateSingleUseThreadType); On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > Although it's expected that task can be copied internally, it would be still > > better to pass it as const reference. > > In the present code, it's passed by value intentionally. We have to invoke the > copy constructor because we'll eventually run it out of scope, and the cleanest > way of doing this is to do it as part of the call. > > In later code, I changed this to an rvalue argument. The reason for that is to > put responsibility for allocation inside the scheduler instead of relying upon > its callers to do the right thing. The caller creates an argument and calls it > with std::move. If I had figured this out before, I would have done it that way > from the outset. Acknowledged. https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp File src/JsEngine.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/JsEngine.cpp#ne... src/JsEngine.cpp:91: scheduler(new SchedulerImpl()) // use std::make_unique after we upgrade out of VS2012 On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > We don't need this comment about std::make_unique > > Is that because we're planning never to upgrade from VS2012? > > Comments like this allow later code maintenance using `grep`. Maybe, though I think that std::make_shared are much more important than std::make_unique in general. So, I would rather expect that anyway one will have to go through the code and simple change all places, including std::make_unique where it's needed or everywhere. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp File src/Scheduler.cpp (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:50: */ On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > That comment is good however I don't think we need it > > Behavior is always clearer *after* you read the comment. > > It's also not a trivial comment. There could be other ways of implementing > `ThreadMain` and you have to avoid race conditions upon construction, where this > class gets a queued task but no notification, which would cause the object to > freeze up and never run anything. I'm talking about implementation without race conditions and other bugs. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:75: if (!isRunning) On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > Why not to put isRunning into while condition? > > Because it's not correct. It opens up an opportunity for a race condition. Could you please explain when that race condition occurs? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:86: if (op) Why not to prevent putting of empty functions in the queue? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:91: catch (std::exception& e) it generates a warning about unused local variable. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.cpp#n... src/Scheduler.cpp:111: th = std::move(std::thread(f)); On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > Do we really need to explicitly say std::move here? > > It avoids a second copy, since there's already one in the call by value. Copy of what? JIC, std::thread cannot be copied. > > I don't think there's a semantic rule in the C++ specification that makes the > `move` call redundant, but there might be. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h File src/Scheduler.h (right): https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:81: * If a user wants to execute an operation in a different thread, On 2017/03/30 17:16:59, Eric wrote: > On 2017/01/20 13:08:14, sergei wrote: > > We don't need this part "If a user wants to execute an operation in a > different > > thread ...". > > Documenting responsibilities is part of what makes documentation good. The line > you want to remove is documenting a responsibility that this class does not > assume. > > Responsibility allocation is part of the design, so it's got to be part of the > documentation. For where this started, see > https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card Sorry, but I have not seen on the provisioned link anything explaining the reason to have such comment. If you think that this comment should be then where are comments like - if a user wants to execute an operation immediately ... - if a user wants to execute an operation in 5 min ... - if a user wants to execute an operation in 3 min ... ... - if a user wants to execute an operation and show a green button ... ? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:85: class OperationRunner On 2017/03/30 17:16:59, Eric wrote: > On 2017/01/20 13:08:14, sergei wrote: > > Why not to call it ActiveObject? > > Because it's not a generic active object. It's tailored for use inside the > scheduler. Maybe it's not a generic active object but it is a simple active object which can be used as a base for any another active object. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:99: std::queue<std::function<void()>> queue; On 2017/03/30 17:16:59, Eric wrote: > On 2017/01/20 13:08:14, sergei wrote: > > Basically, it's a synchronized queue which has already appeared several times. > > So, it would be better to have it as a separate class. However, currently, > it's > > not important. > > It's not a separate class because the mutex is used for more than queue > management. You need atomicity between queue operations and notification, for > instance. If you separate them, you get race conditions. Could you please point to that race condition in the example I have posted? https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:133: class SingleUseWorker On 2017/03/30 17:16:59, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > It looks like there was a lot of discussions, finally decided that it's better > > if a thread class has no member start/Run/whatever, but here we again > introduce > > it... > > It doesn't look from my perspective that there were any discussions at all. When > did these purported discussions happen? Why was I not included? There were a lot of talks regarding designing a thread class in different C++ communities like boost and Qt and maybe during considering proposals for C++11, strangely I cannot find any link, but one of main conclusions was that it's a bad idea to have such method like run or start on a thread class because in this case one can call it several times on the same object. Therefore there is only one way to start a thread which is by passing a callable to the constructor or to a run method accepting callable. Of course there is still plenty ways to use it a wrong way but, because member methods start have proven themselves on practice, the recommendation for designers of a thread class was to try another interface and see whether it helps. > > Besides, this isn't a "thread class". If you want a name for it, it's a "worker > class" and acts as a resource, which I described in my previous message. OK, clear, I would propose to introduce that abstraction when it's really needed. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:172: * On 2017/03/30 14:20:55, Eric wrote: > On 2017/03/30 13:14:41, sergei wrote: > > Could you please move that Scheduler with the functionality described in few > > lines above into a separate codereview? We will soon need it for android. > > You might know what you want, but this description is far too short to make > sense of what you are actually asking. I'll also remind you that this review > depended upon a previous one, and your request might involve pulling in code > from the previous one. So, what are the difficulties to create a codereview only with SchedulerT (I think the name should be changed) and its dependencies? > > > BTW, Scheduler should be even simpler, not a template class and it seems > instead > > of SingleUseWorker we can directly use std::thread. > > No, it should not be simpler. If you want me to do this, the template stays. If > you understood this code better, you'd see that the template is necessary for > the scheduler to work as written. Apparently the template is not necessary in my example, so the template is not necessary. > > Hard-coding the particular behaviors of std::thread is a bad idea. std::thread > runs its body upon construction, so if you use that interface, it becomes > impractical to use a thread pool later because the life-cycle of std::thread is > different from what you want with a thread pool. std::thread does not act as a > manageable resource (nor should it), but a proper scheduler needs to be able to > manage its thread resources. Using raw threads diminishes the code that's > already written. I think that thread pool should be a completely different class because with the current approach when a new instance of a worker is created for each task and with `monitor` the implementation of the thread pool is not going to be simple, the same for implementation of timers. BTW, so far there is no need in thread pool, so I would like to ask to not complicate the code before it's needed. > > SingleUseWorker above is a wrapper around a raw thread that converts it into a > resource. It's also why it has a Run() method, because it's acting as a resource > and not a raw thread. One of the characteristics of a resource is that it > separates (1) life-cycle, for example, acquire and release from (2) usage, here > running a function body. std::thread does not have that kind of interface, so it > is *necessary* to add it. I consider std::thread also as resource and so far the interface of std::thread is enough. > > ---- > > Furthermore, this review was number 2 of 12 (eventually) in a series that fixed > all the structural problems with threads and engine allocation, completing all > the substantive work for both #3595 and #3593 (they are interrelated, which > isn't obvious). The usage of the Worker template argument is slated for > modification, which is yet another reason not to use raw threads. Right now, the > only usages of a worker resource are `Run()` and `Join()`, but it's slated for a > `Cancel()` method. That's good but why not to introduce that abstraction in a separate commit? Smaller commits are always better. What concerns the number of reviews, I don't think it's an issue because anyway these codereviews should be adapted to whatever state of master, and I think that the easiest way is to simply pick small portions with simple functionality and rebase the rest afterwards. > > In detail, review 9 of 12 (if I'm counting correctly) allows interruption of > timer from SetTimeout. This works by changing the executor for timer bodies to > contain an condition variable. Wake up the condition variable before its timer > fires and you forestall execution of the body. During development that executor > began as a class specific to timers, but the plan is for some of its interface > to migrate into the worker interface so that the scheduler can interrupt and > cancel pending work. I see the point but for present I would like to don't pull that abstraction. I doubt that the proposed implementation in https://codereview.adblockplus.org/29370876/ is the right way we should go now (because there are more elegant approaches) and maybe later (I think about clearTimeout). Cancelling functionality in general has a lot of questions, for instance I would implement it slightly differently, most likely even on another level, and it highly depends on requirements. So far, I would propose to create corresponding issues and discuss it there. For example, - we will definitely need `clearTimeout` however it's not very important right now because it's never used. - `XMLHttpRequest.abort()` is similar to the previous one - so far, we don't use any cancellable API to work with file system, so no need for such issue - an issue with arguments against current cancelling of timers when we are destroying JsEngine and it would be good to see a reasoned proposal - cancelling of running file system operations and web requests when JsEngine is being destroyed. Though I think it should be a part of another issue. I don't mind to discuss it in https://issues.adblockplus.org/ticket/3595, because the latter is actually a meta issue for subsequent work, although the description should be adjusted to its generic title, even more the things mentioned in "What to change" are already done. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:175: * - Separate schedulers for each `JsEngine` without any shared facilities. On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:14, sergei wrote: > > I would remove this string because whether there is a "Separate schedulers for > > each `JsEngine`" is not defined yet, it rather looks like an advantage and > > scheduler should not be aware about any JsEngine. > > Since you've not delved into this code the way I have, I'll forgive this > misapprehension. If you want to be able to destroy JsEngine instances, engines > and schedulers must absolutely be aware of each other. > > This line documents the particular relationship between the present scheduler > and the present engine. > > I don't see any reason as of today why we cannot eventually use a single > scheduler for multiple engine instances. Schedulers generically require their > own thread(s), so it would be better eventually to use a singleton instance, but > we're not there yet. Thus the documentation. I think that scheduler should not be so direct member of JsEngine, I would like to provide users of libadblockplus with possibility to use their own scheduler for asynchronous operations. It's useful because, e.g. on MS Windows only one worker thread is enough to support all our asynchronous operations. It means changing of interface of injected interfaces like WebRequest and FileSystem to have asynchronous methods and having a separate instances of scheduler in each default implementation. All asynchronous operations should be cancelled when the implementation is destroyed by JsEngine. So, should JsEngine and scheduler be aware about each other? - No. JsEngine should take into account that it schedules asynchronous tasks through certain interfaces and that these tasks either completed or stopped and cleaned after destroying of a corresponding interface. Neither scheduler nor injected interfaces should know where they are used. I can expect here some reaction like, "I have sent XX codereviews, why did you not tell it earlier/stopped me, where are issues with that, etc". Apparently on my question https://issues.adblockplus.org/ticket/3595#comment:2 instead of opinions and following discussion, I have received https://codereview.adblockplus.org/29367003/ and https://codereview.adblockplus.org/29367507/ which are actually good but should be slightly adjusted, so I planned they will be changed during the reviewing process, and then within a pretty shot time span a bunch of following codereviews. BTW, there are also good things in them but I would like to pick them gradually. I think that before doing such much work it would be better to firstly discuss it. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:176: * - Working threads are single-use. No provision for a thread pool. On 2017/03/30 17:16:59, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > That's also questionable. > > It is absolutely not questionable that this class, as written, uses only > single-use threads and has no provision for a thread pool. Did you not see the > "limitations and inefficiencies" point just three lines above. Why should it have a thread pool? It's similar to the example in another comment thread, where is a comment that it does not show a big green button? > > Documenting deficiencies is important for code maintenance. It's impractical > and, more significantly, must slower to try to make everything perfect at each > iteration. That practice creates huge friction. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:188: std::mutex m; On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:14, sergei wrote: > > it should be rather std::recursive_mutex because if Worker does not run in a a > > thread and instead does the everything immediately, thus in the same thread, > > there will be an exception. > > No, it should not be recursive mutex. That would be masking an error. There's no > correct usage scenario where the mutex would ever be called recursively. Acknowledged. > And for > the record, of all the errors I encountered developing this code, not once did I > ever have the kind of exception you're positing. > > Since this is apparently not an obvious point, when you run a task through this > scheduler, the second thing it does (after locking the mutex) is to obtain a new > worker, and that worker is always a separate thread. See the `emplace` statement > in `Run()` below; it invokes the default constructor for `Worker`. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:226: task->Join(); here is a race condition, it can happen that move-assignment in SingleUseWorker::Run has not happened yet but body has already finished and monitor has already reached that point and inside SingleUseWorker::Join it accessed SingleUseWorker::th which is default constructed yet. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:253: void Run(std::function<void()> body) On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > It would be easier to read if argument "body" is renamed to something like a > > "op" (from operation) or "call". > > I've used "body" consistently for a function that's subject to scheduling and > management. I still think that another name is better here, body of what is used here? Such words like task, operation, function, callable, executable, runnable fit much better here than body, IMO. https://codereview.adblockplus.org/29367507/diff/29367521/src/Scheduler.h#new... src/Scheduler.h:259: tt->Run([this, body, tt]() { WorkerMain(body, tt); }); On 2017/03/30 17:16:58, Eric wrote: > On 2017/01/20 13:08:13, sergei wrote: > > I'm not sure that it's a wise decision to assume that Worker runs in some > > another thread. > > This is not just an assumption. It's a design principle. > > If you don't want a task to be scheduled, don't use the scheduler. If you > schedule it, it runs in some arbitrary thread different that the one it's called > in. > > Perhaps you don't recall, but the high-level goal in using asynchrony at all is > to support JavaScript callbacks without re-entering v8. In order to do that, you > *must* restart the call stack from the top, and the only mechanism we have > available for that is a thread. If you were to implement a Worker without a > thread inside somewhere, it would be incorrect in the only usage we have for it. > > > In this change it's OK but looking at the final code it seems > > very questionable. > > Apparently you don't understand how it works, so please indicate what you don't > understand so that I can expand upon the documentation. OK, let's say I want to implement timers using SetTimer from Win API and I'm allowed to create only one thread for all timers per instance of Scheduler. What will be the implementation of template class Worker? JIC, It's merely a theoretical question, I'm not going to use Win API for timers. > > Ever since this code was working, I've had no problems with dangling threads for > tasks run through this scheduler. The mechanism here is correct, reliable, and > efficient. > > > So, I would rather have it changed here, in this codereview. > > "monitor.Run([this, task]() { JoinAndRemove(task); });" should be rather some > > kind of a next function which is executed after worker has finished, which > > happens not exactly right after a call of "body". > > That's the way the code works already. `JoinAndRemove` is called within the > monitor thread and not within whatever thread is inside the worker. What does > happen immediately upon completion of the body is enqueuing a `JoinAndRemove` > operation onto the monitor. > > Also note that the scheduler takes full responsibility for the entire life cycle > of its tasks. It starts them up initially and detects when they're finished. > Workers have no awareness that they're being used by schedulers, so the > scheduler does not have to present any kind of interface to the worker. This is > far simpler than any alternative where the worker notifies the scheduler that > it's finished. > > > It's clear if consider a case > > when we use facilities for asynchronous IO. > > This mechanism works just fine for asynchronous I/O. I rewrote all of it to use > this mechanism. You say that you've read my code, but if you have, statement > like this lead me believe you did so at best only cursorily or didn't > understand. What I really don't understand is the reason you are always repeating that I didn't understand something. Please stay professional. |