Better explain logcontext in run_in_background(...) and run_as_background_process(...) (#18900)

Also adds a section in the docs explaining the `sentinel` logcontext.

Spawning from https://github.com/element-hq/synapse/pull/18870


### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run
synapse_homeserver --config-path homeserver.yaml`
 1. Execute some requests
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
- `Called stop on logcontext POST-0 without recording a start rusage`
    - `Background process re-entered without a proc`

Twisted trial tests:

 1. Run full Twisted trial test suite.
1. Check the logs for `Test starting with non-sentinel logging context ...`
This commit is contained in:
Eric Eastwood 2025-09-10 10:22:53 -05:00 committed by GitHub
parent c68c5dd07b
commit 9cc4001778
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
4 changed files with 146 additions and 36 deletions

1
changelog.d/18900.misc Normal file
View File

@ -0,0 +1 @@
Better explain how we manage the logcontext in `run_in_background(...)` and `run_as_background_process(...)`.

View File

@ -59,6 +59,28 @@ def do_request_handling():
logger.debug("phew")
```
### The `sentinel` context
The default logcontext is `synapse.logging.context.SENTINEL_CONTEXT`, which is an empty
sentinel value to represent the root logcontext. This is what is used when there is no
other logcontext set. The phrase "clear/reset the logcontext" means to set the current
logcontext to the `sentinel` logcontext.
No CPU/database usage metrics are recorded against the `sentinel` logcontext.
Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
logcontext as we want to know which server the logs came from. In practice, this is not
always the case yet especially outside of request handling.
Global things outside of Synapse (e.g. Twisted reactor code) should run in the
`sentinel` logcontext. It's only when it calls into application code that a logcontext
gets activated. This means the reactor should be started in the `sentinel` logcontext,
and any time an awaitable yields control back to the reactor, it should reset the
logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
current logcontext to the reactor (which would then get picked up and associated with
the next thing the reactor does).
## Using logcontexts with awaitables
Awaitables break the linear flow of code so that there is no longer a single entry point

View File

@ -227,7 +227,16 @@ LoggingContextOrSentinel = Union["LoggingContext", "_Sentinel"]
class _Sentinel:
"""Sentinel to represent the root context"""
"""
Sentinel to represent the root context
This should only be used for tasks outside of Synapse like when we yield control
back to the Twisted reactor (event loop) so we don't leak the current logging
context to other tasks that are scheduled next in the event loop.
Nothing from the Synapse homeserver should be logged with the sentinel context. i.e.
we should always know which server the logs are coming from.
"""
__slots__ = ["previous_context", "finished", "request", "tag"]
@ -616,9 +625,17 @@ class LoggingContextFilter(logging.Filter):
class PreserveLoggingContext:
"""Context manager which replaces the logging context
"""
Context manager which replaces the logging context
The previous logging context is restored on exit."""
The previous logging context is restored on exit.
`make_deferred_yieldable` is pretty equivalent to using `with
PreserveLoggingContext():` (using the default sentinel context), i.e. it clears the
logcontext before awaiting (and so before execution passes back to the reactor) and
restores the old context once the awaitable completes (execution passes from the
reactor back to the code).
"""
__slots__ = ["_old_context", "_new_context"]
@ -784,6 +801,14 @@ def run_in_background(
return from the function, and that the sentinel context is set once the
deferred returned by the function completes.
To explain how the log contexts work here:
- When this function is called, the current context is stored ("original"), we kick
off the background task, and we restore that original context before returning
- When the background task finishes, we don't want to leak our context into the
reactor which would erroneously get attached to the next operation picked up by
the event loop. We add a callback to the deferred which will clear the logging
context after it finishes and yields control back to the reactor.
Useful for wrapping functions that return a deferred or coroutine, which you don't
yield or await on (for instance because you want to pass it to
deferred.gatherResults()).
@ -795,8 +820,13 @@ def run_in_background(
`f` doesn't raise any deferred exceptions, otherwise a scary-looking
CRITICAL error about an unhandled error will be logged without much
indication about where it came from.
Returns:
Deferred which returns the result of func, or `None` if func raises.
Note that the returned Deferred does not follow the synapse logcontext
rules.
"""
current = current_context()
calling_context = current_context()
try:
res = f(*args, **kwargs)
except Exception:
@ -806,6 +836,9 @@ def run_in_background(
# `res` may be a coroutine, `Deferred`, some other kind of awaitable, or a plain
# value. Convert it to a `Deferred`.
#
# Wrapping the value in a deferred has the side effect of executing the coroutine,
# if it is one. If it's already a deferred, then we can just use that.
d: "defer.Deferred[R]"
if isinstance(res, typing.Coroutine):
# Wrap the coroutine in a `Deferred`.
@ -820,20 +853,24 @@ def run_in_background(
# `res` is a plain value. Wrap it in a `Deferred`.
d = defer.succeed(res)
# The deferred has already completed
if d.called and not d.paused:
# The function should have maintained the logcontext, so we can
# optimise out the messing about
return d
# The function may have reset the context before returning, so
# we need to restore it now.
ctx = set_current_context(current)
# The function may have reset the context before returning, so we need to restore it
# now.
#
# Our goal is to have the caller logcontext unchanged after firing off the
# background task and returning.
set_current_context(calling_context)
# The original context will be restored when the deferred
# completes, but there is nothing waiting for it, so it will
# get leaked into the reactor or some other function which
# wasn't expecting it. We therefore need to reset the context
# here.
# The original logcontext will be restored when the deferred completes, but
# there is nothing waiting for it, so it will get leaked into the reactor (which
# would then get picked up by the next thing the reactor does). We therefore
# need to reset the logcontext here (set the `sentinel` logcontext) before
# yielding control back to the reactor.
#
# (If this feels asymmetric, consider it this way: we are
# effectively forking a new thread of execution. We are
@ -841,7 +878,7 @@ def run_in_background(
# which is supposed to have a single entry and exit point. But
# by spawning off another deferred, we are effectively
# adding a new exit point.)
d.addBoth(_set_context_cb, ctx)
d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
return d
@ -859,20 +896,34 @@ def run_coroutine_in_background(
coroutine directly rather than a function. We can do this because coroutines
do not run until called, and so calling an async function without awaiting
cannot change the log contexts.
"""
current = current_context()
This is an ergonomic helper so we can do this:
```python
run_coroutine_in_background(func1(arg1))
```
Rather than having to do this:
```python
run_in_background(lambda: func1(arg1))
```
"""
calling_context = current_context()
# Wrap the coroutine in a deferred, which will have the side effect of executing the
# coroutine in the background.
d = defer.ensureDeferred(coroutine)
# The function may have reset the context before returning, so
# we need to restore it now.
ctx = set_current_context(current)
# The function may have reset the context before returning, so we need to restore it
# now.
#
# Our goal is to have the caller logcontext unchanged after firing off the
# background task and returning.
set_current_context(calling_context)
# The original context will be restored when the deferred
# completes, but there is nothing waiting for it, so it will
# get leaked into the reactor or some other function which
# wasn't expecting it. We therefore need to reset the context
# here.
# The original logcontext will be restored when the deferred completes, but
# there is nothing waiting for it, so it will get leaked into the reactor (which
# would then get picked up by the next thing the reactor does). We therefore
# need to reset the logcontext here (set the `sentinel` logcontext) before
# yielding control back to the reactor.
#
# (If this feels asymmetric, consider it this way: we are
# effectively forking a new thread of execution. We are
@ -880,7 +931,7 @@ def run_coroutine_in_background(
# which is supposed to have a single entry and exit point. But
# by spawning off another deferred, we are effectively
# adding a new exit point.)
d.addBoth(_set_context_cb, ctx)
d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
return d
@ -888,24 +939,43 @@ T = TypeVar("T")
def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]":
"""Given a deferred, make it follow the Synapse logcontext rules:
If the deferred has completed, essentially does nothing (just returns another
completed deferred with the result/failure).
If the deferred has not yet completed, resets the logcontext before
returning a deferred. Then, when the deferred completes, restores the
current logcontext before running callbacks/errbacks.
(This is more-or-less the opposite operation to run_in_background.)
"""
Given a deferred, make it follow the Synapse logcontext rules:
- If the deferred has completed, essentially does nothing (just returns another
completed deferred with the result/failure).
- If the deferred has not yet completed, resets the logcontext before returning a
incomplete deferred. Then, when the deferred completes, restores the current
logcontext before running callbacks/errbacks.
This means the resultant deferred can be awaited without leaking the current
logcontext to the reactor (which would then get erroneously picked up by the next
thing the reactor does), and also means that the logcontext is preserved when the
deferred completes.
(This is more-or-less the opposite operation to run_in_background in terms of how it
handles log contexts.)
Pretty much equivalent to using `with PreserveLoggingContext():`, i.e. it clears the
logcontext before awaiting (and so before execution passes back to the reactor) and
restores the old context once the awaitable completes (execution passes from the
reactor back to the code).
"""
# The deferred has already completed
if deferred.called and not deferred.paused:
# it looks like this deferred is ready to run any callbacks we give it
# immediately. We may as well optimise out the logcontext faffery.
return deferred
# ok, we can't be sure that a yield won't block, so let's reset the
# logcontext, and add a callback to the deferred to restore it.
# Our goal is to have the caller logcontext unchanged after they yield/await the
# returned deferred.
#
# When the caller yield/await's the returned deferred, it may yield
# control back to the reactor. To avoid leaking the current logcontext to the
# reactor (which would then get erroneously picked up by the next thing the reactor
# does) while the deferred runs in the reactor event loop, we reset the logcontext
# and add a callback to the deferred to restore it so the caller's logcontext is
# active when the deferred completes.
prev_context = set_current_context(SENTINEL_CONTEXT)
deferred.addBoth(_set_context_cb, prev_context)
return deferred

View File

@ -228,6 +228,11 @@ def run_as_background_process(
clock.looping_call and friends (or for firing-and-forgetting in the middle of a
normal synapse async function).
Because the returned Deferred does not follow the synapse logcontext rules, awaiting
the result of this function will result in the log context being cleared (bad). In
order to properly await the result of this function and maintain the current log
context, use `make_deferred_yieldable`.
Args:
desc: a description for this background process type
server_name: The homeserver name that this background process is being run for
@ -280,6 +285,18 @@ def run_as_background_process(
name=desc, **{SERVER_NAME_LABEL: server_name}
).dec()
# To explain how the log contexts work here:
# - When this function is called, the current context is stored (using
# `PreserveLoggingContext`), we kick off the background task, and we restore the
# original context before returning (also part of `PreserveLoggingContext`).
# - When the background task finishes, we don't want to leak our background context
# into the reactor which would erroneously get attached to the next operation
# picked up by the event loop. We use `PreserveLoggingContext` to set the
# `sentinel` context and means the new `BackgroundProcessLoggingContext` will
# remember the `sentinel` context as its previous context to return to when it
# exits and yields control back to the reactor.
#
# TODO: Why can't we simplify to using `return run_in_background(run)`?
with PreserveLoggingContext():
# Note that we return a Deferred here so that it can be used in a
# looping_call and other places that expect a Deferred.