As per recent proposals in MSC4140, remove authentication for
restarting/cancelling/sending a delayed event, and give each of those
actions its own endpoint. (The original consolidated endpoint is still
supported for backwards compatibility.)
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Half-Shot <will@half-shot.uk>
This was unintentionally changed in
https://github.com/element-hq/synapse/pull/19068.
There is no real bug here. Without this PR, we just printed an empty
string for the `sentinel` logcontext, whereas the prior behavior was to
print `sentinel`, which this PR restores.
Found while staring at the logs in
https://github.com/element-hq/synapse/issues/19165
### Reproduction strategy
1. Configure Synapse with
[logging](df802882bb/docs/sample_log_config.yaml)
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Notice the `asyncio - 64 - DEBUG - - Using selector: EpollSelector`
log line (notice empty string `- -`)
1. With this PR, the log line will be `asyncio - 64 - DEBUG - sentinel -
Using selector: EpollSelector` (notice `sentinel`)
This arises mostly from my recent experience adding a stream for Thread
Subscriptions
and trying to help others add their own streams.
---------
Signed-off-by: Olivier 'reivilibre' <oliverw@matrix.org>
Spawning a background process comes with a bunch of overhead, so let's
try to reduce the number of background processes we need to spawn when
handling inbound federation.
Currently, we seem to be doing roughly one per command. Instead, let's
keep the background process alive for a bit, waiting for a new command
to come in.
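A rough sketch of the idea, using asyncio and hypothetical names
(`queue`, `handle_command`) rather than Synapse's actual Twisted
machinery:

```python
import asyncio

async def command_worker(
    queue: asyncio.Queue, handle_command, idle_timeout: float = 1.0
) -> None:
    # Handle the command that woke us up, then linger briefly in case
    # more commands arrive, instead of spawning a fresh background
    # process for every single command.
    while True:
        try:
            command = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            return  # nothing new arrived; let this background process end
        await handle_command(command)
```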
I noticed this in some profiling. Basically, we prune the ratelimiters
by copying and iterating over every entry every 60 seconds. Instead,
let's use a wheel timer to track when we should potentially prune a
given key; then we a) check fewer keys, and b) can run more frequently.
Hopefully this means we don't have a large pause every time we prune a
ratelimiter with lots of keys.
Also fixes a bug where we didn't prune entries that were added via
`record_action` and never subsequently updated. This affected the media
and joins-per-room ratelimiters.
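A minimal sketch of the wheel-timer approach, assuming Synapse's
`WheelTimer` interface (`insert(now, obj, then)` / `fetch(now)`); the
`is_expired` check is hypothetical:

```python
from synapse.util.wheel_timer import WheelTimer

wheel: WheelTimer[str] = WheelTimer()

def on_action(key: str, now_ms: int, window_ms: int) -> None:
    # Schedule a pruning check for roughly when this key's rate-limit
    # window ends, instead of rescanning every key on a fixed interval.
    wheel.insert(now=now_ms, obj=key, then=now_ms + window_ms)

def prune(actions: dict, now_ms: int) -> None:
    # Only keys whose timers have fired need checking, so each pass is
    # cheap and can run more often than every 60 seconds.
    for key in wheel.fetch(now_ms):
        entry = actions.get(key)
        if entry is not None and entry.is_expired(now_ms):  # hypothetical
            del actions[key]
```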
This regressed in https://github.com/element-hq/synapse/pull/19121. I
moved things in https://github.com/element-hq/synapse/pull/19121 because
I thought that it made sense to redirect anything printed to
`stdout`/`stderr` to the logs as early as possible. But we actually want
to log any immediately apparent problems during initialization to
`stderr` in the terminal so that they are obvious and visible to the
operator.
Now, I've moved `redirect_stdio_to_logs()` back to where it was
previously, along with some proper comment context for why we have it
there.
- Move `register_start` (calls `os._exit(1)`) out of `setup` (our
composable function)
- We want to avoid `exit(...)` because we use these composable functions
in Synapse Pro for small hosts where we have multiple Synapse instances
running in the same process. We don't want a problem from one homeserver
tenant causing the entire Python process to exit and affect all of the
other homeserver tenants.
- Continuation of https://github.com/element-hq/synapse/pull/19116
- Align our app entrypoints: `homeserver` (main), `generic_worker`
(worker), and `admin_cmd`
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python process
(c.f. Synapse Pro for small hosts), we're currently diving into the
details and implications of running multiple instances of Synapse in
the same Python process.
"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Move exception handling up the stack (avoid `exit(1)` in our composable
functions)
Relevant to Synapse Pro for small hosts as we don't want to exit the
entire Python process and affect all homeserver tenants.
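A sketch of the pattern (hypothetical names, not the actual Synapse
API): composable functions raise, and only the standalone entrypoint
turns that into a process exit.

```python
import sys

class StartupError(Exception):
    """Hypothetical error raised by composable setup code on failure."""

def setup_homeserver(config: dict) -> None:
    # Composable: may run alongside other homeserver tenants, so never
    # call sys.exit()/os._exit() from here; raise instead.
    if not config.get("server_name"):
        raise StartupError("missing server_name in config")
    # ... the rest of setup ...

def main() -> None:
    # Only the top-level, single-tenant entrypoint may exit the process.
    try:
        setup_homeserver({"server_name": "example.com"})
    except StartupError as e:
        sys.exit(f"Failed to start: {e}")
```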
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python process
(c.f. Synapse Pro for small hosts), we're currently diving into the
details and implications of running multiple instances of Synapse in
the same Python process.
"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Be mindful that Synapse can be run alongside other code in the same
Python process. We shouldn't overwrite fields on a given log record
unless we know it's relevant to Synapse.
(no clobber)
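As a sketch of the no-clobber idea (the `server_name` field is assumed
here for illustration), a logging filter can leave records alone if
they already carry the field:

```python
import logging

class TenantLogFilter(logging.Filter):
    """Hypothetical filter: only annotate records that aren't already
    annotated, so other code sharing the process is left untouched."""

    def __init__(self, server_name: str) -> None:
        super().__init__()
        self._server_name = server_name

    def filter(self, record: logging.LogRecord) -> bool:
        # Don't clobber a value some other application (or another
        # Synapse tenant) already set on this record.
        if not hasattr(record, "server_name"):
            record.server_name = self._server_name
        return True
```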
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python
process, we're currently diving into the details and implications of
running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
The schema lint tries to make sure we don't add or remove indices in
schema files (rather than as background updates), *unless* the table was
created in the same schema file.
The regex that pulls out the `CREATE TABLE` SQL didn't recognise
`IF NOT EXISTS`.
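The gist of the fix, as a standalone sketch (the lint's real pattern is
more involved):

```python
import re

# Making "IF NOT EXISTS" an optional part of the pattern means tables
# created with it are still recognised as created in this schema file.
CREATE_TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?P<name>\w+)",
    re.IGNORECASE,
)

m = CREATE_TABLE_RE.search("CREATE TABLE IF NOT EXISTS foo (id TEXT)")
assert m is not None and m.group("name") == "foo"
```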
There is a test delta file that shows that we accept different types of
`CREATE TABLE` and `CREATE INDEX` statements, as well as an index
creation that doesn't have a matching create table (to show that we do
still catch it). The test delta should be removed before merge.
Fix lost logcontexts when using `timeout_deferred(...)` and things
actually time out.
Fix https://github.com/element-hq/synapse/issues/19087 (our HTTP client
times out requests using `timeout_deferred(...)`).
Fix https://github.com/element-hq/synapse/issues/19066 (`/sync` uses
`notifier.wait_for_events()`, which uses `timeout_deferred(...)` under
the hood).
### When/why did these lost logcontext warnings start happening?
```
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later but found POST-2453
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later was lost
```
In https://github.com/element-hq/synapse/pull/18828, we switched
`timeout_deferred(...)` from using `reactor.callLater(...)` to
[`clock.call_later(...)`](3b59ac3b69/synapse/util/clock.py (L224-L313))
under the hood. This meant it started dealing with logcontexts but our
`time_it_out()` callback didn't follow our [Synapse logcontext
rules](3b59ac3b69/docs/log_contexts.md).
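In sketch form, the rule the fix follows (signatures simplified; see
the logcontext docs for the details):

```python
from synapse.logging.context import LoggingContext

def time_it_out() -> None:
    # The reactor invokes this callback with the sentinel logcontext
    # active. Any real work needs its own context...
    with LoggingContext("timeout"):
        pass  # cancel the timed-out deferred, log, etc.
    # ...and exiting the `with` block restores the sentinel before
    # control returns to the reactor.
```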
Be mindful that it's possible to run Synapse multiple times in the same
Python process, so some parts of the logging setup only need to happen
once:
- We only need to set up the global log record factory and context filter
once
- We only need to redirect Twisted logging once
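A sketch of the one-time guard (the module state and factory body are
illustrative only):

```python
import logging

_GLOBAL_LOGGING_SETUP_DONE = False

def setup_global_logging_once() -> None:
    # Called once per Synapse instance; only the first call in the
    # process actually installs the process-global pieces.
    global _GLOBAL_LOGGING_SETUP_DONE
    if _GLOBAL_LOGGING_SETUP_DONE:
        return
    _GLOBAL_LOGGING_SETUP_DONE = True

    old_factory = logging.getLogRecordFactory()

    def record_factory(*args, **kwargs) -> logging.LogRecord:
        record = old_factory(*args, **kwargs)
        # ... annotate the record with Synapse-specific fields here ...
        return record

    logging.setLogRecordFactory(record_factory)
```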
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python
process, we're currently diving into the details and implications of
running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Be mindful that Synapse can be run alongside other code in the same
Python process. We shouldn't clobber other `SIGHUP` handlers, as only
one can be set at a time.
(no clobber)
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python
process, we're currently diving into the details and implications of
running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Relevant to logging, as we use a `SIGHUP` to reload the log config in
Synapse.
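A minimal sketch of the no-clobber idea (not Synapse's actual
implementation): remember whatever handler was installed before us and
call it too.

```python
import signal

def add_sighup_handler(callback) -> None:
    # Only one SIGHUP handler can be installed per process, so chain
    # onto the existing one instead of silently replacing it.
    previous = signal.getsignal(signal.SIGHUP)

    def handler(signum, frame):
        callback()
        if callable(previous):
            previous(signum, frame)

    signal.signal(signal.SIGHUP, handler)
```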
This is a common problem where we `await` a deferred without wrapping
it in `make_deferred_yieldable(...)`. But I've opted to replace the
usage of `deferLater` with something more standard for the Synapse
codebase.
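For reference, the rule being applied, in sketch form:

```python
from twisted.internet.defer import Deferred

from synapse.logging.context import make_deferred_yieldable

async def wait_for(d: Deferred):
    # Awaiting a raw deferred loses the logcontext while suspended;
    # make_deferred_yieldable(...) follows the logcontext rules for us.
    return await make_deferred_yieldable(d)
```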
Part of https://github.com/element-hq/synapse/issues/18905
It's unclear why we're only now seeing these failures happen with the
changes from https://github.com/element-hq/synapse/pull/19057.
Example failures seen in
https://github.com/element-hq/synapse/actions/runs/18477454390/job/52645183606?pr=19057
```
builtins.AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw task-_resumable_task-0-IBzAmHUoepQfLnEA. In other words, another task shouldn't have leaked their logcontext to us.
```
It has been available since Pillow 6, and Synapse is now pinned on
Pillow >=10.0.1.
Found this while looking at Debian-shipped dependencies, and figured
this may as well be updated.
It is often useful when investigating a space to get information about
that space and its children. This PR adds an Admin API to return
information about a space and its children, regardless of room
membership. It will not fetch information over federation about remote
rooms that the server is not participating in.
I couldn't really find any documentation regarding how to set up TLS
communication between Synapse and Redis, so I looked through the source
code and found it. I figured I should go ahead and document it here.
These errors are harmless and are a long-standing issue that is just now
being logged, see https://github.com/element-hq/synapse/issues/19042
```
2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com
2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com
2025-10-10 15:30:00,025 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_event_example.com already registered for server example.com
2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com
2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com
```
(more sane, standard location for this sort of thing)
The one difference here is that previously,
`start_doing_background_updates()` only ran on the main Synapse
instance. But since it now lives in `start_background_tasks()`, it will
run on the worker that is supposed to `run_background_tasks`. That
doesn't seem like a problem though.
This means we
can move the open registration config validation from `setup()` to
`HomeServerConfig.validate_config()` (much more sane).
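Roughly (the attribute names mirror the existing open-registration
check; the exact shape of `validate_config` is simplified here):

```python
from synapse.config._base import ConfigError

def validate_config(config) -> None:
    # Fail at config-parsing time rather than partway through setup().
    if config.registration.enable_registration and not (
        config.registration.registrations_require_3pid
        or config.registration.enable_registration_captcha
        or config.registration.registration_requires_token
    ):
        raise ConfigError(
            "Registration is enabled without any verification; refusing to start"
        )
```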
Spawning from looking at this area of code in
https://github.com/element-hq/synapse/pull/19015
### Background
As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. multiple instances of Synapse in the same Python
process, we're currently diving into the details and implications of
running multiple instances of Synapse in the same Python process.
"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/221
### Partial startup problem
In the context of Synapse Pro for Small Hosts, since the Twisted reactor
is already running (from the `multi_synapse` shard process itself), when
provisioning a homeserver tenant, the `reactor.callWhenRunning(...)`
callbacks will be invoked immediately. This includes the Synapse's
[`start`](0615b64bb4/synapse/app/homeserver.py (L418-L429))
callback which sets up everything (including listeners, background
tasks, etc). If we encounter an error at this point, we are partially
setup but the exception will [bubble back to
us](8be122186b/multi_synapse/app/shard.py (L114-L121))
without us having a handle to the homeserver yet so we can't call
`hs.shutdown()` and clean everything up.
### What does this PR do?
Structures Synapse so we split creating the homeserver instance from
setting everything up. This way we have access to `hs` if anything goes
wrong during setup and can subsequently `hs.shutdown()` to clean
everything up.
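Shape-wise (function names hypothetical), that looks like:

```python
async def provision_tenant(config):
    hs = create_homeserver(config)  # step 1: construction only
    try:
        await setup_homeserver(hs)  # step 2: listeners, background tasks, ...
    except Exception:
        # We always have `hs` in hand now, so a failed setup can be
        # rolled back instead of leaving a half-started tenant behind.
        await hs.shutdown()
        raise
    return hs
```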
This was originally removed in
https://github.com/element-hq/synapse/pull/18886 but it looks like it
snuck back in https://github.com/element-hq/synapse/pull/18828 during a
[bad
merge](4cd3d9172e).
Noticed while looking at Synapse setup and startup (just by
happenstance).
I don't think this has adverse effects on Synapse actually working, and
`start_background_tasks()` can be called multiple times.
### Is there a good way to audit all of these merges?
I would like to see the conflicts for each merge.
This works but it's still hard to notice anything is wrong:
```
git log --remerge-diff <commit-sha>
```
> shows the difference from mechanical merge result and the result that
is actually recorded in a merge commit
via
https://stackoverflow.com/questions/15277708/how-do-you-see-show-a-git-merge-conflict-resolution-that-was-done-given-a-mer/71181334#71181334
The following works better: specify the version range from the commit
right before the merge up to the merge itself. You can even specify
which file to look at to make it more obvious with the hindsight we
have now.
```
git log --remerge-diff <merge-commit-sha>~1..<merge-commit-sha> -- synapse/server.py
```
Example:
```
git log --remerge-diff 4cd3d9172ed7b87e509746851a376c861a27820e~1..4cd3d9172ed7b87e509746851a376c861a27820e -- synapse/server.py
```
See https://github.com/matrix-org/synapse/pull/12973 where we previously
used `version_string="Synapse/" +
get_distribution_version_string("matrix-synapse")` everywhere; and then
updated to use `version_string=f"Synapse/{SYNAPSE_VERSION}"` everywhere
except `synapse/app/homeserver.py` (why?!?!?!). This seems more like a
typo than something on purpose, especially without any context in the
comments or PR. The whole point of that PR was to solve the missing git
info in version strings.
For reference, here is what both variables look like for me locally on
the latest `develop`:
- `SYNAPSE_VERSION`: `1.139.0 (b=develop,1d2ddbc76e,dirty)`
- `VERSION`: `1.139.0`
The only reason we might want to do this is to hide the branch name
(some sensitive name that exposes a security fix, etc.). But we don't
hide anything:
`https://matrix.org/_matrix/federation/v1/version`
```json
{
"server": {
"name": "Synapse",
"version": "1.139.0rc3 (b=matrix-org-hotfixes-priv,f538ed5ac3)"
}
}
```
On `matrix.org`, the `Server` response header is masked as
`cloudflare`; it would otherwise show `1.139.0rc3` for everything from
the main process.
---
This is spawning from looking at the way we set up and start Synapse
for homeserver tenant provisioning in the Synapse Pro for Small Hosts
project (https://github.com/element-hq/synapse-small-hosts/issues/221).
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Eric Eastwood <erice@element.io>
Add debug logs wherever we change current logcontext (`LoggingContext`).
I've had to make this same set of changes over and over as I've been
debugging things so it seems useful enough to include by default.
Instead of tracing things at the `set_current_context(...)` level, I've
added the debug logging on all of the utilities that utilize
`set_current_context(...)`. It's much easier to reason about the log
context changing because of `PreserveLoggingContext` changing things
than an opaque `set_current_context(...)` call.
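As an illustration of where the logging lands (a hypothetical stand-in;
the real change instruments Synapse's existing utilities):

```python
import logging

from synapse.logging.context import SENTINEL_CONTEXT, set_current_context

logger = logging.getLogger(__name__)

class DebugPreserveLoggingContext:
    """Hypothetical stand-in: the debug log lives in the utility that
    switches contexts, so the *reason* for the switch is in the message."""

    def __enter__(self) -> None:
        self._old_context = set_current_context(SENTINEL_CONTEXT)
        logger.debug(
            "PreserveLoggingContext: %s -> sentinel", self._old_context
        )

    def __exit__(self, exc_type, exc, tb) -> None:
        set_current_context(self._old_context)
        logger.debug(
            "PreserveLoggingContext: restored %s", self._old_context
        )
```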
Revert https://github.com/element-hq/synapse/pull/18849
Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.
Fix https://github.com/element-hq/synapse/issues/19004
### Why revert?
For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as well as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears to not fully
support `ContextVar`s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.
Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. https://github.com/element-hq/synapse/pull/18804).
See
https://github.com/element-hq/synapse/issues/19004#issuecomment-3358052171
for more info.
### Does this require backporting and patch releases?
No. Since `ContextVarsScopeManager` operates just as well with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.
### Maintain cross-links between main trace and background process work
In order to maintain the functionality introduced in https://github.com/element-hq/synapse/pull/18932 (cross-links between the background process trace and currently active trace), we also needed a small change.
Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.
Added some tests to ensure we maintain the `run_as_background_process(...)` tracing behavior regardless of the tracing scope manager we use.
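In sketch form, using the generic `opentracing` API rather than
Synapse's exact helpers (names simplified):

```python
import opentracing

from synapse.logging.context import PreserveLoggingContext

def start_background_work(desc: str, func) -> None:
    # With LogContextScopeManager the active span lives on the current
    # logcontext, so capture it *before* resetting to the sentinel...
    caller_span = opentracing.tracer.active_span
    with PreserveLoggingContext():
        # ...then reference it from the background-process span so the
        # two traces stay cross-linked.
        references = (
            [opentracing.follows_from(caller_span.context)]
            if caller_span
            else None
        )
        with opentracing.tracer.start_active_span(
            f"bgproc.{desc}", references=references
        ):
            func()
```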
Prefer the utils over raw logcontext manipulation.
Spawning from adding some logcontext debug logs in
https://github.com/element-hq/synapse/pull/18966 and since we're not
logging at the `set_current_context(...)` level (see reasoning there),
this removes some usage of `set_current_context(...)`.
Specifically, `MockClock.call_later(...)` doesn't handle logcontexts
correctly. It uses the calling logcontext as the callback context
(wrong, as the logcontext could finish before the callback finishes)
and it doesn't reset back to the sentinel context before handing back
to the reactor. It has been like this since it was [introduced 10+
years ago](38da9884e7).
Instead of fixing the implementation, which would just be a copy of our
normal `Clock`, we can just remove `MockClock`.