Commit Graph

3956 Commits

Author SHA1 Message Date
Andrew Morgan
2443760d0d
Update KeyUploadServlet to handle case where client sends device_keys: null (#19023) 2025-10-07 16:23:55 +01:00
Till
42bbff8294
Validate the body of requests to /keys/upload (#17097)
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Eric Eastwood <erice@element.io>
2025-10-07 11:27:53 +01:00
Eric Eastwood
6835e7be0d
Wrap the Rust HTTP client with make_deferred_yieldable (#18903)
Wrap the Rust HTTP client with `make_deferred_yieldable` so downstream
usage doesn't need to use `PreserveLoggingContext()` or
`make_deferred_yieldable`.

> it seems like we should have some wrapper around it that uses
[`make_deferred_yieldable(...)`](40edb10a98/docs/log_contexts.md (where-you-create-a-new-awaitable-make-it-follow-the-rules))
to make things right so we don't have to do this in the downstream code.
>
> *-- @MadLittleMods,
https://github.com/element-hq/synapse/pull/18357#discussion_r2294941827*

Spawning from wanting to [remove `PreserveLoggingContext()` from the
codebase](https://github.com/element-hq/synapse/pull/18870) and thinking
that we [shouldn't have to pollute all downstream usage with
`PreserveLoggingContext()` or
`make_deferred_yieldable`](https://github.com/element-hq/synapse/pull/18357#discussion_r2294941827)

Part of https://github.com/element-hq/synapse/issues/18905 (Remove
`sentinel` logcontext where we log in Synapse)
2025-10-02 13:00:50 -05:00
Eric Eastwood
06a84f4fe0
Revert "Switch to OpenTracing's ContextVarsScopeManager (#18849)" (#19007)
Revert https://github.com/element-hq/synapse/pull/18849

Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.

Fix https://github.com/element-hq/synapse/issues/19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as good as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears to not fully
support `ContextVar`'s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. https://github.com/element-hq/synapse/pull/18804).

See
https://github.com/element-hq/synapse/issues/19004#issuecomment-3358052171
for more info.


### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` operates just as good with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.



### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in https://github.com/element-hq/synapse/pull/18932 (cross-links between the background process trace and currently active trace), we also needed a small change.

Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background` tracing behavior regardless of the tracing scope manager we use.
2025-10-02 11:27:26 -05:00
Andrew Morgan
5fff5a1893 Merge branch 'develop' of github.com:element-hq/synapse into develop 2025-10-01 09:40:38 +01:00
Devon Hudson
396de6544a
Cleanly shutdown SynapseHomeServer object (#18828)
This PR aims to allow for a clean shutdown of the `SynapseHomeServer`
object so that it can be fully deleted and cleaned up by garbage
collection without shutting down the entire python process.

Fix https://github.com/element-hq/synapse-small-hosts/issues/50

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-10-01 02:42:09 +00:00
Eric Eastwood
5adb08f3c9
Remove MockClock() (#18992)
Spawning from adding some logcontext debug logs in
https://github.com/element-hq/synapse/pull/18966 and since we're not
logging at the `set_current_context(...)` level (see reasoning there),
this removes some usage of `set_current_context(...)`.

Specifically, `MockClock.call_later(...)` doesn't handle logcontexts
correctly. It uses the calling logcontext as the callback context
(wrong, as the logcontext could finish before the callback finishes) and
it didn't reset back to the sentinel context before handing back to the
reactor. It was like this since it was [introduced 10+ years
ago](38da9884e7).
Instead of fixing the implementation which would just be a copy of our
normal `Clock`, we can just remove `MockClock`
2025-09-30 11:27:29 -05:00
Andrew Morgan
2aab171042 Remove unstable prefixes for MSC2732
This MSC was accepted in 2022. We shouldn't need to continue supporting the unstable field names.
2025-09-30 17:10:32 +01:00
Eric Eastwood
5143f93dc9
Fix server_name in logging context for multiple Synapse instances in one process (#18868)
### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process), we're
currently diving into the details and implications of running multiple
instances of Synapse in the same Python process.

"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48

### Prior art

Previously, we exposed `server_name` by providing a static logging
`MetadataFilter` that injected the values:


205d9e4fc4/synapse/config/logger.py (L216)

While this can work fine for the normal case of one Synapse instance per
Python process, this configures things globally and isn't compatible
when we try to start multiple Synapse instances because each subsequent
tenant will overwrite the previous tenant.


### What does this PR do?

We remove the `MetadataFilter` and replace it by tracking the
`server_name` in the `LoggingContext` and expose it with our existing
[`LoggingContextFilter`](205d9e4fc4/synapse/logging/context.py (L584-L622))
that we already use to expose information about the `request`.

This means that the `server_name` value follows wherever we log as
expected even when we have multiple Synapse instances running in the
same process.


### A note on logcontext

Anywhere, Synapse mistakenly uses the `sentinel` logcontext to log
something, we won't know which server sent the log. We've been fixing up
`sentinel` logcontext usage as tracked by
https://github.com/element-hq/synapse/issues/18905

Any further `sentinel` logcontext usage we find in the future can be
fixed piecemeal as normal.


d2a966f922/docs/log_contexts.md (L71-L81)


### Testing strategy

1. Adjust your logging config to include `%(server_name)s` in the format
    ```yaml
    formatters:
        precise:
format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d -
%(levelname)s - %(request)s - %(message)s'
    ```
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Make some requests (`curl
http://localhost:8008/_matrix/client/versions`, etc)
1. Open the homeserver logs and notice the `server_name` in the logs as
expected. `unknown_server_from_sentinel_context` is expected for the
`sentinel` logcontext (things outside of Synapse).
2025-09-26 17:10:48 -05:00
Eric Eastwood
2f2b854ac1
Fix logcontext handling in timeout_deferred tests (#18974)
Related to https://github.com/element-hq/synapse/issues/18905

These fixes were split off from
https://github.com/element-hq/synapse/pull/18828 where @devonh was
seeing some test failures because `timeout_deferred(...)` is being
updated to use `Clock` utilities instead of raw `reactor` methods. This
test was failing in that branch/PR until we made this new version that
handles the logcontexts properly.

While the previous version of this test does pass on `develop`, it was
using what appears completely wrong assertions, assumptions, and bad
patterns to make it happen (see diff comments below)

---

Test originally introduced in https://github.com/matrix-org/synapse/pull/4407
2025-09-26 11:10:02 -05:00
Travis Ralston
d2a966f922
Use signature support from policy servers when available (#18934)
Opening on Kegan's behalf


[MSC4284](https://github.com/matrix-org/matrix-spec-proposals/pull/4284)
has already been opened accordingly.

---------

Co-authored-by: Kegan Dougal <7190048+kegsay@users.noreply.github.com>
Co-authored-by: Eric Eastwood <erice@element.io>
2025-09-25 19:30:24 +00:00
Hugh Nimmo-Smith
fd8fa97b6a
Document and fix room_config param when user_may_create_room callback is invoked for a room upgrade (#18721)
Co-authored-by: Eric Eastwood <erice@element.io>
2025-09-24 21:42:19 +00:00
Eric Eastwood
5266e423e2
Explain how Deferred callbacks interact with logcontexts (#18914)
Spawning from
https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321

> It turns out `Deferred.cancel()` is a lot like
`Deferred.callback()`/`errback()` in that it will trash the logging
context:
> it can resume a coroutine, which will restore its own logging context,
then run:
> 
>  - until it blocks, setting the sentinel context
>  - or until it terminates, setting the context it was started with
> 
> So we need to wrap it in `with PreserveLoggingContext():`, like we do
with `.callback()`:
> 
> ```python
> with PreserveLoggingContext():
>     self.render_deferred.cancel()
> ```
>
> *-- @squahtx,
https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321*
2025-09-24 16:20:42 -05:00
Eric Eastwood
0458f691b6
Fix run_coroutine_in_background(...) incorrectly handling logcontext (#18964)
Regressed in
https://github.com/element-hq/synapse/pull/18900#discussion_r2331554278
(see conversation there for more context)


### How is this a regression?

> To give this an update with more hindsight; this logic *was* redundant
with the early return and it is safe to remove this complexity

> 
> It seems like this actually has to do with completed vs incomplete
deferreds...
> 
> To explain how things previously worked *without* the early-return
shortcut:
> 
> With the normal case of **incomplete awaitable**, we store the
`calling_context` and the `f` function is called and runs until it
yields to the reactor. Because `f` follows the logcontext rules, it sets
the `sentinel` logcontext. Then in `run_in_background(...)`, we restore
the `calling_context`, store the current `ctx` (which is `sentinel`) and
return. When the deferred completes, we restore `ctx` (which is
`sentinel`) before yielding to the reactor again (all good
)
> 
> With the other case where we see a **completed awaitable**, we store
the `calling_context` and the `f` function is called and runs to
completion (no logcontext change). *This is where the shortcut would
kick in but I'm going to continue explaining as if we commented out the
shortcut.* -- Then in `run_in_background(...)`, we restore the
`calling_context`, store the current `ctx` (which is same as the
`calling_context`). Because the deferred is already completed, our extra
callback is called immediately and we restore `ctx` (which is same as
the `calling_context`). Since we never yield to the reactor, the
`calling_context` is perfect as that's what we want again (all good
)
> 
> ---
> 
> But this also means that our early-return shortcut is no longer just
an optimization and is *necessary* to act correctly in the **completed
awaitable** case as we want to return with the `calling_context` and not
reset to the `sentinel` context. I've updated the comment in
https://github.com/element-hq/synapse/pull/18964 to explain the
necessity as it's currently just described as an optimization.
> 
> But because we made the same change to
`run_coroutine_in_background(...)` which didn't have the same
early-return shortcut, we regressed the correct behavior  . This is
being fixed in https://github.com/element-hq/synapse/pull/18964
>
>
> *-- @MadLittleMods,
https://github.com/element-hq/synapse/pull/18900#discussion_r2373582917*

### How did we find this problem?

Spawning from @wrjlewis
[seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io)
`Starting metrics collection 'typing.get_new_events' from sentinel
context: metrics will be lost` in the logs:

<details>
<summary>More logs</summary>

```
synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/
```

</details>

From the logs there, we can see things relating to
`typing.get_new_events` and
`/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led
me to trying out Sliding Sync with the typing extension enabled and
allowed me to reproduce the problem locally. Sliding Sync is a unique
scenario as it's the only place we use `gather_optional_coroutines(...)`
-> `run_coroutine_in_background(...)` (introduced in
https://github.com/element-hq/synapse/pull/17884) to exhibit this
behavior.


### Testing strategy

1. Configure Synapse to enable
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186):
Simplified Sliding Sync which is actually under
[MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575)
    ```yaml
    experimental_features:
      msc3575_enabled: true
    ```
1. Start synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Make a Sliding Sync request with one of the extensions enabled
    ```http
POST
http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
    {
      "lists": {},
      "room_subscriptions": {
            "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
                "required_state": [],
                "timeline_limit": 1
            }
        },
        "extensions": {
            "typing": {
                "enabled": true
            }
        }
    }
    ```
1. Open your homeserver logs and notice warnings about `Starting ...
from sentinel context: metrics will be lost`
2025-09-24 15:24:47 +00:00
Tulir Asokan
512b3f50cf
Update MSC4326 error code (#18947) 2025-09-24 11:57:24 +01:00
Shay
35c9cbb09d
Add an Admin API to query a piece of local or cached remote media by ID (#18911) 2025-09-23 16:25:56 -05:00
Andrew Morgan
ddc7627b22
Fix performance regression related to delayed events processing (#18926) 2025-09-23 09:47:30 +01:00
Eric Eastwood
5be7679dd9
Split loading config vs homeserver setup (#18933)
This allows us to get access to `server_name` so we can use it when
creating the `LoggingContext("main")` in the future (pre-requisite for
https://github.com/element-hq/synapse/pull/18868).

This also allows us more flexibility to parse config however we want and
setup a Synapse homeserver. Like what we do in [Synapse Pro for Small
Hosts](https://github.com/element-hq/synapse-small-hosts).

Split out from https://github.com/element-hq/synapse/pull/18868
2025-09-22 14:53:02 -05:00
Eric Eastwood
e7d98d3429
Remove sentinel logcontext in Clock utilities (looping_call, looping_call_now, call_later) (#18907)
Part of https://github.com/element-hq/synapse/issues/18905

Lints for ensuring we use `Clock.call_later` instead of
`reactor.callLater`, etc are coming in
https://github.com/element-hq/synapse/pull/18944

### Testing strategy

 1. Configure Synapse to log at the `DEBUG` level
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Wait 10 seconds for the [database profiling
loop](9cc4001778/synapse/storage/database.py (L711))
to execute
1. Notice the logcontext being used for the `Total database time` log
line

Before (`sentinel`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - sentinel - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```

After (`looping_call`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - looping_call - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```
2025-09-22 14:51:13 -05:00
Eric Eastwood
9a88d25f8e
Fix run_in_background not be awaited properly causing LoggingContext problems (#18937)
Basically, searching for any instance of `run_in_background(...)` and
making sure we wrap the deferred in `make_deferred_yieldable(...)` if we
try to `await` the result to make it follow the [Synapse logcontext
rules](https://github.com/element-hq/synapse/blob/develop/docs/log_contexts.md).

Turns out, we only have this problem in some tests (phew)

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:55:45 -05:00
Eric Eastwood
5a9ca1e3d9
Introduce Clock.call_when_running(...) to include logcontext by default (#18944)
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.

Background:

>  Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` 
>  logcontext as we want to know which server the logs came from. In practice, this is not 
>  always the case yet especially outside of request handling. 
>   
>  Global things outside of Synapse (e.g. Twisted reactor code) should run in the 
>  `sentinel` logcontext. It's only when it calls into application code that a logcontext 
>  gets activated. This means the reactor should be started in the `sentinel` logcontext, 
>  and any time an awaitable yields control back to the reactor, it should reset the 
>  logcontext to be the `sentinel` logcontext. This is important to avoid leaking the 
>  current logcontext to the reactor (which would then get picked up and associated with 
>  the next thing the reactor does). 
>
> *-- `docs/log_contexts.md`

Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:27:59 -05:00
Tulir Asokan
d80f515622
Update MSC4190 support (#18946) 2025-09-22 14:45:05 +01:00
Eric Eastwood
6f9fab1089
Fix open redirect in legacy SSO flow (idp) (#18909)
- Validate the `idp` parameter to only accept the ones that are known in
the config file
- URL-encode the `idp` parameter for safety's sake (this is the main
fix)

Fix https://github.com/matrix-org/internal-config/issues/1651 (internal
link)

Regressed in https://github.com/element-hq/synapse/pull/17972
2025-09-17 13:54:47 -05:00
Hugh Nimmo-Smith
e1036ffa48
Add get_media_upload_limits_for_user and on_media_upload_limit_exceeded callbacks to module API (#18848)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-09-12 12:26:19 +01:00
Andrew Morgan
8c98cf7e55
Remove usage of deprecated pkg_resources interface (#18910) 2025-09-12 10:57:04 +01:00
reivilibre
ada3a3b2b3
Add experimental support for MSC4308: Thread Subscriptions extension to Sliding Sync when MSC4306 and MSC4186 are enabled. (#18695)
Closes: #18436

Implements:
https://github.com/matrix-org/matrix-spec-proposals/pull/4308

Follows: #18674

Adds an extension to Sliding Sync and a companion
endpoint needed for backpaginating missed thread subscription changes,
as described in MSC4308

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-09-11 14:45:04 +01:00
reivilibre
c68c5dd07b
Update push rules for experimental MSC4306: Thread Subscriptions to follow newer draft. (#18846)
Follows: #18762

Implements: MSC4306

Closes: #18431
Closes: #18437

Move the MSC4306 push rules to a new kind `postcontent` 

Prevent users from creating user-defined `postcontent` rules 

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-09-09 18:37:04 +01:00
Eric Eastwood
ca655e4020
Start background tasks after we fork the process (daemonize) (#18886)
Spawning from https://github.com/element-hq/synapse/pull/18871

[This change](6ce2f3e59d)
was originally used to fix CPU time going backwards when we `daemonize`.

While, we don't seem to run into this problem on `develop`, I still
think this is a good change to make. We don't need background tasks
running on a process that will soon be forcefully exited and where the
reactor isn't even running yet. We now kick off the background tasks
(`run_as_background_process`) after we have forked the process and
started the reactor.

Also as simple note, we don't need background tasks running in both halves of a fork.
2025-09-09 10:10:34 -05:00
reivilibre
6fe8137a4a
Configure Synapse to run MSC4306: Thread Subscriptions Complement tests. (#18819)
Pairs with: https://github.com/matrix-org/complement/pull/795

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-09-09 11:40:10 +01:00
reivilibre
dfccde9f60
Remove obsolete and experimental /sync/e2ee endpoint. (#18583)
Introduced in: https://github.com/element-hq/synapse/pull/17167

The endpoint was part of experiments for MSC3575 but does not feature in
that MSC.

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-09-09 09:28:45 +01:00
Eric Eastwood
bff4a11b3f
Re-introduce: Fix LaterGauge metrics to collect from all servers (#18791)
Re-introduce: https://github.com/element-hq/synapse/pull/18751 that was
reverted in https://github.com/element-hq/synapse/pull/18789 (explains
why the PR was reverted in the first place).

- Adds a `cleanup` pattern that cleans up metrics from each homeserver
in the tests. Previously, the list of hooks built up until our CI
machines couldn't operate properly, see
https://github.com/element-hq/synapse/pull/18789
- Fix long-standing issue with `synapse_background_update_status`
metrics only tracking the last database listed in the config (see
https://github.com/element-hq/synapse/pull/18791#discussion_r2261706749)
2025-09-02 12:14:27 -05:00
Quentin Gliech
537e14169e
Support stable endpoint and scopes from the MSC3861 family (#18549)
This adds stable APIs for both MSC2965 and MSC2967
2025-09-02 13:55:12 +02:00
Eric Eastwood
27fc3389f3
Switch to OpenTracing's ContextVarsScopeManager (#18849)
Switch to OpenTracing's `ContextVarsScopeManager` instead of our own
custom `LogContextScopeManager`.

This is now possible because the linked Twisted issue from the comment
in our custom `LogContextScopeManager` is resolved:
https://twistedmatrix.com/trac/ticket/10301

This PR is spawning from exploring different possibilities to solve the
`scope` loss problem I was encountering in
https://github.com/element-hq/synapse/pull/18804#discussion_r2268254424.
This appears to solve the problem and I've added the additional test
from there to this PR 
2025-08-27 11:41:00 -05:00
Eric Eastwood
5c20a60f0b
Update tests to ensure all database tables are emptied when purging a room (#18794)
Spawning from wanting to confirm my replies in
https://github.com/element-hq/synapse/issues/18489

We're now using the same source of truth of the list of tables being
purged in the tests. For example, we weren't testing that
`local_current_membership` was cleared out before because the lists were
out of sync.
2025-08-13 15:05:41 -05:00
Andrew Morgan
526b875e03
Make the Make Room Admin API choose to puppet the room creator in v12 rooms (#18805) 2025-08-12 10:11:03 +00:00
Devon Hudson
9301baa5f8
Fix hydra tests 2025-08-11 11:32:57 -06:00
Devon Hudson
81f815ee33 Remove unnecessary extra deferred 2025-08-11 16:43:45 +01:00
Devon Hudson
3108fa32d3 Use internal function to appease olddeps test 2025-08-11 16:43:45 +01:00
Devon Hudson
e65a6fc58a Fix FakeClock to match new sleep signature 2025-08-11 16:43:45 +01:00
Devon Hudson
bd8f12f9c6 Fix broken test 2025-08-11 16:43:45 +01:00
Kegan Dougal
0eb7252a23 Support for room version 12 2025-08-11 16:43:45 +01:00
Eric Eastwood
ff03a51cb0
Revert "Fix LaterGauge metrics to collect from all servers (#18751)" (#18789)
This PR reverts https://github.com/element-hq/synapse/pull/18751

### Why revert?

@reivilibre
[found](https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$u9OEmMxaFYUzWHhCk1A_r50Y0aGrtKEhepF7WxWJkUA?via=matrix.org&via=node.marinchik.ink&via=element.io)
that our CI was failing in bizarre ways (thanks for stepping up to dive
into this 🙇). Examples:

- `twisted.internet.error.ProcessTerminated: A process has ended with a
probable error condition: process ended by signal 9.`
- `twisted.internet.error.ProcessTerminated: A process has ended with a
probable error condition: process ended by signal 15.`

<details>
<summary>More detailed part of the log</summary>


https://github.com/element-hq/synapse/actions/runs/16758038107/job/47500520633#step:9:6809
```
tests.util.test_wheel_timer.WheelTimerTestCase.test_single_insert_fetch
===============================================================================
Error: 
Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/disttrial.py", line 371, in task
    await worker.run(case, result)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/worker.py", line 305, in run
    return await self.callRemote(workercommands.Run, testCase=testCaseId)  # type: ignore[no-any-return]
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1187, in __iter__
    yield self
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1092, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/protocols/amp.py", line 1968, in _massageError
    error.trap(RemoteAmpError)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 431, in trap
    self.raiseException()
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 455, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 9.

tests.util.test_macaroons.MacaroonGeneratorTestCase.test_guest_access_token
-------------------------------------------------------------------------------
Ran 4325 tests in 669.321s

FAILED (skips=159, errors=62, successes=4108)
while calling from thread
Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/base.py", line 1064, in runUntilCurrent
    f(*a, **kw)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/base.py", line 790, in stop
    raise error.ReactorNotRunning("Can't stop reactor that isn't running.")
twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.

joining disttrial worker #0 failed
Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/worker.py", line 406, in exit
    await endDeferred
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1187, in __iter__
    yield self
twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 15.
```

</details>


With more debugging (thanks @devonh for also stepping in as maintainer),
we were finding that the CI was consistently failing at
`test_exposed_to_prometheus` which was a bit of smoke because of all of
the [metrics
changes](https://github.com/element-hq/synapse/issues/18592) that were
merged recently.

Locally, although I wasn't able to reproduce the bizarre errors, I could
easily see increased memory usage (~20GB vs ~2GB) and the
`test_exposed_to_prometheus` test taking a while to complete when
running a full test run (`SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial
tests`).

<img width="1485" height="78" alt="Lots of memory usage"
src="https://github.com/user-attachments/assets/811e2a96-75e5-4a3c-966c-00dc0512cea9"
/>

After updating `test_exposed_to_prometheus` to dump the
`latest_metrics_response = generate_latest(REGISTRY)`, I could see that
it's a massive 3.2GB response. Inspecting the contents, we can see 4.1M
(4,137,123) entries for just
`synapse_background_update_status{server_name="test"} 3.0` which is a
`LaterGauge`. I don't think we have 4.1M test cases so it's also unclear
why we end up with so many samples but it does make sense that we do see
a lot of duplicates because each `HomeserverTestCase` will create a
homeserver for each test case that will `LaterGauge.register_hook(...)`
(part of the https://github.com/element-hq/synapse/pull/18751 changes).

`tests/storage/databases/main/test_metrics.py`
```python
        latest_metrics_response = generate_latest(REGISTRY)
        with open("/tmp/synapse-test-metrics", "wb") as f:
            f.write(latest_metrics_response)
```

After reverting the https://github.com/element-hq/synapse/pull/18751
changes, running the full test suite locally doesn't result in memory
spikes and seems to run normally.



### Dev notes

Discussion in the
[`#synapse-dev:matrix.org`](https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$vkMATs04yqZggVVd6Noop5nU8M2DVoTkrAWshw7u1-w?via=matrix.org&via=node.marinchik.ink&via=element.io)
room.

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [ ] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [ ] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-08-06 22:14:40 +00:00
reivilibre
6514381b02
Implement the push rules for experimental MSC4306: Thread Subscriptions. (#18762)
Follows: #18756

Implements: MSC4306

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-08-06 15:33:52 +01:00
reivilibre
8306cee06a
Update implementation of MSC4306: Thread Subscriptions to include automatic subscription conflict prevention as introduced in later drafts. (#18756)
Follows: #18674

Implements new drafts of MSC4306

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
2025-08-05 18:22:53 +00:00
Eric Eastwood
076db0ab49
Fix LaterGauge metrics to collect from all servers (#18751)
Fix `LaterGauge` metrics to collect from all servers

Follow-up to https://github.com/element-hq/synapse/pull/18714

Previously, our `LaterGauge` metrics did include the `server_name` label
as expected but we were only seeing the last server being reported in
some cases. Any `LaterGauge` that we were creating multiple times was
only reporting the last instance.

This PR updates all `LaterGauge` to be created once and then we use
`LaterGauge.register_hook(...)` to add in the metric callback as before.
This works now because we store a list of callbacks instead of just one.

I noticed this problem thanks to some [tests in the Synapse Pro for
Small Hosts](https://github.com/element-hq/synapse-small-hosts/pull/173)
repo that sanity check all metrics to ensure that we can see each metric
includes data from multiple servers.


### Testing strategy

1. This is only noticeable when you run multiple Synapse instances in
the same process.
 1. TODO

(see test that was added)

### Dev notes

Previous non-global `LaterGauge`:

```
synapse_federation_send_queue_xxx
synapse_federation_transaction_queue_pending_destinations
synapse_federation_transaction_queue_pending_pdus
synapse_federation_transaction_queue_pending_edus
synapse_handlers_presence_user_to_current_state_size
synapse_handlers_presence_wheel_timer_size
synapse_notifier_listeners
synapse_notifier_rooms
synapse_notifier_users
synapse_replication_tcp_resource_total_connections
synapse_replication_tcp_command_queue
synapse_background_update_status
synapse_federation_known_servers
synapse_scheduler_running_tasks
```



### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-08-05 15:28:55 +00:00
Erik Johnston
20615115fb
Make .sleep(..) return a coroutine (#18772)
This helps ensure that mypy can catch places where we don't await on it,
like in #18763.

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-08-05 09:30:52 +01:00
Quentin Gliech
7ed55666b5
Stabilise MAS integration (#18759)
This can be reviewed commit by commit

There are a few improvements over the experimental support:

- authorisation of Synapse <-> MAS requests is simplified, with a single
shared secret, removing the need for provisioning a client on the MAS
side
- the tests actually spawn a real server, allowing us to test the rust
introspection layer
- we now check that the device advertised in introspection actually
exist, making it so that when a user logs out, the tokens are
immediately invalidated, even if the cache doesn't expire
- it doesn't rely on discovery anymore, rather on a static endpoint
base. This means users don't have to override the introspection endpoint
to avoid internet roundtrips
- it doesn't depend on `authlib` anymore, as we simplified a lot the
calls done from Synapse to MAS

We still have to update the MAS documentation about the Synapse setup,
but that can be done later.

---------

Co-authored-by: reivilibre <oliverw@element.io>
2025-08-04 15:48:45 +02:00
Erik Johnston
72cd5cccf7
Make room upgrades faster for rooms with many bans (#18574)
We do this by a) not pulling out all membership events, and b) batch
inserting bans.

One blocking concern is that this bypasses the `update_membership`
function, which otherwise all other membership events go via. In this
case it's fine (having audited what it is doing), but I'm hesitant to
set the precedent of bypassing it, given it has a lot of logic in there.

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-08-04 10:42:52 +01:00
Eric Eastwood
e16fbdcdcc
Update metrics linting to be able to handle custom metrics (#18733)
Part of https://github.com/element-hq/synapse/issues/18592
2025-08-01 15:34:11 -05:00
Eric Eastwood
e43a1cec84
Fix cache metrics to collect from all servers (#18748)
Follow-up to https://github.com/element-hq/synapse/pull/18604

Previously, our cache metrics did include the `server_name` label as
expected but we were only seeing the last server being reported. This
was caused because we would
`CACHE_METRIC_REGISTRY.register_hook(metric_name, metric.collect)` where
the `metric_name` only took into account the cache name so it would be
overwritten every time we spawn a new server.

This PR updates the register logic to include the `server_name` so we
have a hook for every cache on every server as expected.

I noticed this problem thanks to some [tests in the Synapse Pro for
Small Hosts](https://github.com/element-hq/synapse-small-hosts/pull/173)
repo that sanity check all metrics to ensure that we can see each metric
includes data from multiple servers.
2025-08-01 12:29:58 -05:00