Issue 29323363: Issue 2868 - Don't pull the notification repo when parsing (Closed)

Created:
Aug. 7, 2015, 6:34 a.m. by Felix Dahlke

Modified:
Aug. 7, 2015, 7:05 p.m.

Reviewers:
Sebastian Noack, mathias

CC:
Wladimir Palant

Visibility:
Public.

Description

Issue 2868 - Don't pull the notification repo when parsing

We need to keep it up to date on the infrastructure side, see: https://issues.adblockplus.org/ticket/2869

Patch Set 1 #

Created: Aug. 7, 2015, 6:34 a.m.


Stats (+0 lines, -1 line)
M sitescripts/notifications/parser.py: 1 chunk, +0 lines, -1 line, 0 comments

Messages

Total messages: 10


Sebastian Noack

If you only want to prevent multiple threads from concurrently running hg pull, how about ...

Aug. 7, 2015, 7:58 a.m. (2015-08-07 07:58:06 UTC) #2

Felix Dahlke

Aug. 7, 2015, 12:55 p.m. (2015-08-07 12:55:38 UTC) #3

On 2015/08/07 07:58:06, Sebastian Noack wrote:
> If you only want to prevent multiple threads from concurrently running hg
> pull, how about using synchronization?
> 
>   hg_pull_pending = False
>   hg_pull_finished = threading.Condition()
> 
>   def pull_once():
>     with hg_pull_finished:
>       if hg_pull_pending:
>         hg_pull_finished.wait()
>         return
>       hg_pull_pending = True
> 
>     subprocess.call(["hg", "-R", repo, "pull", "-q"])
> 
>     with hg_pull_finished:
>       hg_pull_pending = False
>       hg_pull_finished.notify_all()
> 
> This will only run "hg pull" if it isn't already being run by a different
> thread. If it is, we don't call it again, but simply wait until the pending
> call has finished. I don't have too strong an opinion, though. But the
> advantages of this approach would be:
> 
> 1. The repo is automatically pulled during testing and development.
> 2. The result is more predictable in production, as you don't need to consider
> two independent intervals (the cache expiration and the cron job pulling the
> repo).
> 3. You don't need to keep in mind that the repo is / needs to be updated by
> other means.
> 4. We make sure that the repo is updated where we access it. Hence we have
> higher code locality and don't need to rely on external mechanisms.

I initially liked that, but I'm actually worried about making this code too
complex. It already depends on the notifications repository being present, so why
shouldn't it depend on that repository being up to date, too? It doesn't
actually change so often that we'd have to check it continuously. We're also
doing the same with the sitescripts repository: we just assume it's present
and up to date. In the long run, if we moved away from syncing via cron
jobs and instead triggered deployments from a CI server (e.g. after tests pass),
I think that would be a pretty nice way to do it.
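Point 2 of the quoted list can be made concrete with a back-of-the-envelope bound. This is a simplified model, not anything measured on the servers: it assumes a cron job pulls the repository every `pull_interval`, that parsed notifications are cached for `cache_ttl`, and it ignores how long a pull itself takes. The function name and parameters are hypothetical.

```python
def worst_case_staleness(cache_ttl, pull_interval=None):
    """Upper bound on how long a change pushed to the notifications
    repository can take to be served (same time unit as the inputs).

    Simplified model (an assumption, not measured behaviour):
    - pull_interval=None: the repository is pulled whenever the cache
      expires and the notifications are re-parsed, so only the cache
      lifetime matters.
    - otherwise: a separate cron job pulls every pull_interval. A change
      pushed just after a pull waits up to pull_interval for the next one,
      and a cache entry refreshed just before that pull can keep serving
      the old data for up to another cache_ttl.
    """
    if pull_interval is None:
        return cache_ttl
    return pull_interval + cache_ttl


# Hypothetical numbers, in minutes.
print(worst_case_staleness(cache_ttl=1))                   # → 1 (pull on parse)
print(worst_case_staleness(cache_ttl=1, pull_interval=5))  # → 6 (cron-based pull)
```

Under this model, the cron-based approach adds the pull interval to the worst case; the argument made here is that this is acceptable because the repository rarely changes.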

The main downside I see is that sitescripts and infrastructure are more
entangled this way. They already are to a considerable degree, which makes sitescripts
difficult to test and understand in isolation. I'm not entirely sure how best to
tackle that, but I feel that moving logic that is more naturally expressed
in infrastructure into sitescripts is not the best way to achieve it.
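As quoted, the synchronization snippet is a sketch rather than runnable code: `pull_once` assigns to `hg_pull_pending` without a `global` declaration, so the first read of it raises `UnboundLocalError`, and `repo` is never defined. A minimal runnable version of the same idea, assuming Python 3, might look like this. The repository path and the injectable `pull` parameter (there only so the function can be exercised without Mercurial) are hypothetical additions.

```python
import subprocess
import threading

# Hypothetical path; the real location of the notifications repository
# is configured elsewhere and is not shown in this review.
REPO = "/path/to/notifications"

hg_pull_pending = False
hg_pull_finished = threading.Condition()


def run_hg_pull(repo):
    # Same command as in the quoted snippet.
    subprocess.call(["hg", "-R", repo, "pull", "-q"])


def pull_once(repo=REPO, pull=run_hg_pull):
    """Run pull(repo) unless another thread is already doing so.

    If a pull is in progress, wait until it has finished instead of
    starting a second one.
    """
    global hg_pull_pending  # required because this function assigns to it
    with hg_pull_finished:
        if hg_pull_pending:
            # Re-checks the flag after each wakeup, so a spurious wakeup
            # does not end the wait early.
            hg_pull_finished.wait_for(lambda: not hg_pull_pending)
            return
        hg_pull_pending = True

    try:
        pull(repo)
    finally:
        # Clear the flag even if the pull raised, then wake all waiters.
        with hg_pull_finished:
            hg_pull_pending = False
            hg_pull_finished.notify_all()
```

The `try`/`finally` is a small addition over the quoted version: without it, a failing pull would leave the flag set and every later caller would wait forever.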

mathias

On 2015/08/07 12:55:38, Felix Dahlke wrote: > The main downside I see is that sitescripts ...

Aug. 7, 2015, 1:17 p.m. (2015-08-07 13:17:32 UTC) #4

Sebastian Noack

As I said, it was just a suggestion, I'm not convinced of myself yet. But ...

Aug. 7, 2015, 1:36 p.m. (2015-08-07 13:36:25 UTC) #5

mathias

On 2015/08/07 13:36:25, Sebastian Noack wrote: > As I said, it was just a suggestion, ...

Aug. 7, 2015, 1:44 p.m. (2015-08-07 13:44:09 UTC) #6

Sebastian Noack

Well, the idea was to avoid having an additional cronjob to update the repository, by ...

Aug. 7, 2015, 2:08 p.m. (2015-08-07 14:08:17 UTC) #7

mathias

LGTM (although I was not a reviewer in this ticket before, I now somehow ended ...

Aug. 7, 2015, 2:37 p.m. (2015-08-07 14:37:21 UTC) #8

Sebastian Noack

Acquiring a lock, to check/set a variable, compared to calling a subprocess that will cause ...

Aug. 7, 2015, 3:06 p.m. (2015-08-07 15:06:12 UTC) #9

Felix Dahlke

Aug. 7, 2015, 7:02 p.m. (2015-08-07 19:02:27 UTC) #10

On 2015/08/07 15:06:12, Sebastian Noack wrote:
> Acquiring a lock to check/set a variable certainly doesn't add any measurable
> overhead compared to calling a subprocess that will cause network IO. But sure,
> if we get that subprocess completely out of this code path, that would
> certainly remove some overhead. Though I think it probably doesn't matter too
> much here, as we (have to) rely on heavy caching anyway to make this perform
> acceptably. As far as I understood it, the problem merely was that we have to
> avoid running too many concurrent "hg pull" subprocesses during cache warm-up.
> So simply not calling "hg pull" if it is already running seemed an adequate
> solution. I also listed some advantages above. But maybe I'm missing
> something. Either way, I don't care too much how this issue is tackled. So if
> everybody is happy with doing so in infrastructure, that's fine with me. LGTM.

Alright, I'd prefer to go with that one for the sake of progress. To clarify:
what I meant is that removing the hg pull here and moving it to infrastructure
_increases_ entanglement between sitescripts and infrastructure. But I suppose
we should tackle that general issue at a higher level, not with this
particular issue here.
