Better explain logcontext in run_in_background(...) and run_as_background_process(...) (#18900)

Also adds a section in the docs explaining the `sentinel` logcontext. Spawning from https://github.com/element-hq/synapse/pull/18870 ### Testing strategy 1. Run Synapse normally and with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Execute some requests 1. Shutdown the server 1. Look for any bad log entries in your homeserver logs: - `Expected logging context sentinel but found main` - `Expected logging context main was lost` - `Expected previous context` - `utime went backwards!`/`stime went backwards!` - `Called stop on logcontext POST-0 without recording a start rusage` - `Background process re-entered without a proc` Twisted trial tests: 1. Run full Twisted trial test suite. 1. Check the logs for `Test starting with non-sentinel logging context ...`
2025-09-10 10:22:53 -05:00 · 2025-09-10 10:22:53 -05:00 · 9cc4001778
commit 9cc4001778
parent c68c5dd07b
4 changed files with 146 additions and 36 deletions
--- a/changelog.d/18900.misc
+++ b/changelog.d/18900.misc
@ -0,0 +1 @@
+Better explain how we manage the logcontext in `run_in_background(...)` and `run_as_background_process(...)`.
--- a/docs/log_contexts.md
+++ b/docs/log_contexts.md
@ -59,6 +59,28 @@ def do_request_handling():
    logger.debug("phew")
 ```

+### The `sentinel` context
+
+The default logcontext is `synapse.logging.context.SENTINEL_CONTEXT`, which is an empty
+sentinel value to represent the root logcontext. This is what is used when there is no
+other logcontext set. The phrase "clear/reset the logcontext" means to set the current
+logcontext to the `sentinel` logcontext.
+
+No CPU/database usage metrics are recorded against the `sentinel` logcontext.
+
+Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
+logcontext as we want to know which server the logs came from. In practice, this is not
+always the case yet especially outside of request handling.
+
+Global things outside of Synapse (e.g. Twisted reactor code) should run in the
+`sentinel` logcontext. It's only when it calls into application code that a logcontext
+gets activated. This means the reactor should be started in the `sentinel` logcontext,
+and any time an awaitable yields control back to the reactor, it should reset the
+logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
+current logcontext to the reactor (which would then get picked up and associated with
+the next thing the reactor does).
+
+
 ## Using logcontexts with awaitables

 Awaitables break the linear flow of code so that there is no longer a single entry point
--- a/synapse/logging/context.py
+++ b/synapse/logging/context.py
@ -227,7 +227,16 @@ LoggingContextOrSentinel = Union["LoggingContext", "_Sentinel"]


 class _Sentinel:
-    """Sentinel to represent the root context"""
+    """
+    Sentinel to represent the root context
+
+    This should only be used for tasks outside of Synapse like when we yield control
+    back to the Twisted reactor (event loop) so we don't leak the current logging
+    context to other tasks that are scheduled next in the event loop.
+
+    Nothing from the Synapse homeserver should be logged with the sentinel context. i.e.
+    we should always know which server the logs are coming from.
+    """

    __slots__ = ["previous_context", "finished", "request", "tag"]

@ -616,9 +625,17 @@ class LoggingContextFilter(logging.Filter):


 class PreserveLoggingContext:
-    """Context manager which replaces the logging context
+    """
+    Context manager which replaces the logging context

-    The previous logging context is restored on exit."""
+    The previous logging context is restored on exit.
+
+    `make_deferred_yieldable` is pretty equivalent to using `with
+    PreserveLoggingContext():` (using the default sentinel context), i.e. it clears the
+    logcontext before awaiting (and so before execution passes back to the reactor) and
+    restores the old context once the awaitable completes (execution passes from the
+    reactor back to the code).
+    """

    __slots__ = ["_old_context", "_new_context"]

@ -784,6 +801,14 @@ def run_in_background(
    return from the function, and that the sentinel context is set once the
    deferred returned by the function completes.

+    To explain how the log contexts work here:
+     - When this function is called, the current context is stored ("original"), we kick
+       off the background task, and we restore that original context before returning
+     - When the background task finishes, we don't want to leak our context into the
+       reactor which would erroneously get attached to the next operation picked up by
+       the event loop. We add a callback to the deferred which will clear the logging
+       context after it finishes and yields control back to the reactor.
+
    Useful for wrapping functions that return a deferred or coroutine, which you don't
    yield or await on (for instance because you want to pass it to
    deferred.gatherResults()).
@ -795,8 +820,13 @@ def run_in_background(
    `f` doesn't raise any deferred exceptions, otherwise a scary-looking
    CRITICAL error about an unhandled error will be logged without much
    indication about where it came from.
+
+    Returns:
+        Deferred which returns the result of func, or `None` if func raises.
+        Note that the returned Deferred does not follow the synapse logcontext
+        rules.
    """
-    current = current_context()
+    calling_context = current_context()
    try:
        res = f(*args, **kwargs)
    except Exception:
@ -806,6 +836,9 @@ def run_in_background(

    # `res` may be a coroutine, `Deferred`, some other kind of awaitable, or a plain
    # value. Convert it to a `Deferred`.
+    #
+    # Wrapping the value in a deferred has the side effect of executing the coroutine,
+    # if it is one. If it's already a deferred, then we can just use that.
    d: "defer.Deferred[R]"
    if isinstance(res, typing.Coroutine):
        # Wrap the coroutine in a `Deferred`.
@ -820,20 +853,24 @@ def run_in_background(
        # `res` is a plain value. Wrap it in a `Deferred`.
        d = defer.succeed(res)

+    # The deferred has already completed
    if d.called and not d.paused:
        # The function should have maintained the logcontext, so we can
        # optimise out the messing about
        return d

-    # The function may have reset the context before returning, so
-    # we need to restore it now.
-    ctx = set_current_context(current)
+    # The function may have reset the context before returning, so we need to restore it
+    # now.
+    #
+    # Our goal is to have the caller logcontext unchanged after firing off the
+    # background task and returning.
+    set_current_context(calling_context)

-    # The original context will be restored when the deferred
-    # completes, but there is nothing waiting for it, so it will
-    # get leaked into the reactor or some other function which
-    # wasn't expecting it. We therefore need to reset the context
-    # here.
+    # The original logcontext will be restored when the deferred completes, but
+    # there is nothing waiting for it, so it will get leaked into the reactor (which
+    # would then get picked up by the next thing the reactor does). We therefore
+    # need to reset the logcontext here (set the `sentinel` logcontext) before
+    # yielding control back to the reactor.
    #
    # (If this feels asymmetric, consider it this way: we are
    # effectively forking a new thread of execution. We are
@ -841,7 +878,7 @@ def run_in_background(
    # which is supposed to have a single entry and exit point. But
    # by spawning off another deferred, we are effectively
    # adding a new exit point.)
-    d.addBoth(_set_context_cb, ctx)
+    d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
    return d


@ -859,20 +896,34 @@ def run_coroutine_in_background(
    coroutine directly rather than a function. We can do this because coroutines
    do not run until called, and so calling an async function without awaiting
    cannot change the log contexts.
-    """

-    current = current_context()
+    This is an ergonomic helper so we can do this:
+    ```python
+    run_coroutine_in_background(func1(arg1))
+    ```
+    Rather than having to do this:
+    ```python
+    run_in_background(lambda: func1(arg1))
+    ```
+    """
+    calling_context = current_context()
+
+    # Wrap the coroutine in a deferred, which will have the side effect of executing the
+    # coroutine in the background.
    d = defer.ensureDeferred(coroutine)

-    # The function may have reset the context before returning, so
-    # we need to restore it now.
-    ctx = set_current_context(current)
+    # The function may have reset the context before returning, so we need to restore it
+    # now.
+    #
+    # Our goal is to have the caller logcontext unchanged after firing off the
+    # background task and returning.
+    set_current_context(calling_context)

-    # The original context will be restored when the deferred
-    # completes, but there is nothing waiting for it, so it will
-    # get leaked into the reactor or some other function which
-    # wasn't expecting it. We therefore need to reset the context
-    # here.
+    # The original logcontext will be restored when the deferred completes, but
+    # there is nothing waiting for it, so it will get leaked into the reactor (which
+    # would then get picked up by the next thing the reactor does). We therefore
+    # need to reset the logcontext here (set the `sentinel` logcontext) before
+    # yielding control back to the reactor.
    #
    # (If this feels asymmetric, consider it this way: we are
    # effectively forking a new thread of execution. We are
@ -880,7 +931,7 @@ def run_coroutine_in_background(
    # which is supposed to have a single entry and exit point. But
    # by spawning off another deferred, we are effectively
    # adding a new exit point.)
-    d.addBoth(_set_context_cb, ctx)
+    d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
    return d


@ -888,24 +939,43 @@ T = TypeVar("T")


 def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]":
-    """Given a deferred, make it follow the Synapse logcontext rules:
-
-    If the deferred has completed, essentially does nothing (just returns another
-    completed deferred with the result/failure).
-
-    If the deferred has not yet completed, resets the logcontext before
-    returning a deferred. Then, when the deferred completes, restores the
-    current logcontext before running callbacks/errbacks.
-
-    (This is more-or-less the opposite operation to run_in_background.)
    """
+    Given a deferred, make it follow the Synapse logcontext rules:
+
+    - If the deferred has completed, essentially does nothing (just returns another
+      completed deferred with the result/failure).
+    - If the deferred has not yet completed, resets the logcontext before returning a
+      incomplete deferred. Then, when the deferred completes, restores the current
+      logcontext before running callbacks/errbacks.
+
+    This means the resultant deferred can be awaited without leaking the current
+    logcontext to the reactor (which would then get erroneously picked up by the next
+    thing the reactor does), and also means that the logcontext is preserved when the
+    deferred completes.
+
+    (This is more-or-less the opposite operation to run_in_background in terms of how it
+    handles log contexts.)
+
+    Pretty much equivalent to using `with PreserveLoggingContext():`, i.e. it clears the
+    logcontext before awaiting (and so before execution passes back to the reactor) and
+    restores the old context once the awaitable completes (execution passes from the
+    reactor back to the code).
+    """
+    # The deferred has already completed
    if deferred.called and not deferred.paused:
        # it looks like this deferred is ready to run any callbacks we give it
        # immediately. We may as well optimise out the logcontext faffery.
        return deferred

-    # ok, we can't be sure that a yield won't block, so let's reset the
-    # logcontext, and add a callback to the deferred to restore it.
+    # Our goal is to have the caller logcontext unchanged after they yield/await the
+    # returned deferred.
+    #
+    # When the caller yield/await's the returned deferred, it may yield
+    # control back to the reactor. To avoid leaking the current logcontext to the
+    # reactor (which would then get erroneously picked up by the next thing the reactor
+    # does) while the deferred runs in the reactor event loop, we reset the logcontext
+    # and add a callback to the deferred to restore it so the caller's logcontext is
+    # active when the deferred completes.
    prev_context = set_current_context(SENTINEL_CONTEXT)
    deferred.addBoth(_set_context_cb, prev_context)
    return deferred
--- a/synapse/metrics/background_process_metrics.py
+++ b/synapse/metrics/background_process_metrics.py
@ -228,6 +228,11 @@ def run_as_background_process(
    clock.looping_call and friends (or for firing-and-forgetting in the middle of a
    normal synapse async function).

+    Because the returned Deferred does not follow the synapse logcontext rules, awaiting
+    the result of this function will result in the log context being cleared (bad). In
+    order to properly await the result of this function and maintain the current log
+    context, use `make_deferred_yieldable`.
+
    Args:
        desc: a description for this background process type
        server_name: The homeserver name that this background process is being run for
@ -280,6 +285,18 @@ def run_as_background_process(
                    name=desc, **{SERVER_NAME_LABEL: server_name}
                ).dec()

+    # To explain how the log contexts work here:
+    #  - When this function is called, the current context is stored (using
+    #    `PreserveLoggingContext`), we kick off the background task, and we restore the
+    #    original context before returning (also part of `PreserveLoggingContext`).
+    #  - When the background task finishes, we don't want to leak our background context
+    #    into the reactor which would erroneously get attached to the next operation
+    #    picked up by the event loop. We use `PreserveLoggingContext` to set the
+    #    `sentinel` context and means the new `BackgroundProcessLoggingContext` will
+    #    remember the `sentinel` context as its previous context to return to when it
+    #    exits and yields control back to the reactor.
+    #
+    # TODO: Why can't we simplify to using `return run_in_background(run)`?
    with PreserveLoggingContext():
        # Note that we return a Deferred here so that it can be used in a
        # looping_call and other places that expect a Deferred.
				`@ -0,0 +1 @@`
				Better explain how we manage the logcontext in `run_in_background(...)` and `run_as_background_process(...)`.