synapse

Author	SHA1	Message	Date
Eric Eastwood	5adb08f3c9	Remove `MockClock()` (#18992) Spawning from adding some logcontext debug logs in https://github.com/element-hq/synapse/pull/18966, and since we're not logging at the `set_current_context(...)` level (see reasoning there), this removes some usage of `set_current_context(...)`. Specifically, `MockClock.call_later(...)` doesn't handle logcontexts correctly. It uses the calling logcontext as the callback context (wrong, as the logcontext could finish before the callback finishes), and it doesn't reset back to the sentinel context before handing back to the reactor. It has been like this since it was introduced 10+ years ago (`38da9884e7`). Instead of fixing the implementation, which would just be a copy of our normal `Clock`, we can simply remove `MockClock`.	2025-09-30 11:27:29 -05:00
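The two logcontext rules above (a delayed callback should not inherit the caller's logcontext, and control should return to the reactor in the `sentinel` context) can be illustrated with a toy scheduler. This is a minimal sketch, assuming a plain `contextvars.ContextVar` as a stand-in for Synapse's logcontext; `ToyClock`, `current_logcontext`, and the context names are hypothetical and are not Synapse's `Clock` API.

```python
import contextvars
import heapq
import itertools

SENTINEL = "sentinel"

# Hypothetical stand-in for Synapse's "current logcontext".
current_logcontext = contextvars.ContextVar("current_logcontext", default=SENTINEL)


class ToyClock:
    """A fake clock that runs each delayed callback in its own named context."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue = []  # heap of (due_time, seq, context_name, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback, context_name):
        heapq.heappush(
            self._queue, (self.now + delay, next(self._seq), context_name, callback)
        )

    def advance(self, seconds):
        self.now += seconds
        while self._queue and self._queue[0][0] <= self.now:
            _, _, context_name, callback = heapq.heappop(self._queue)
            # Do NOT reuse the scheduling caller's context: it may have finished.
            token = current_logcontext.set(context_name)
            try:
                callback()
            finally:
                # Restore whatever the "reactor" was running in (the sentinel here).
                current_logcontext.reset(token)


def demo_call_later():
    clock = ToyClock()
    seen = []
    token = current_logcontext.set("request-1")
    clock.call_later(5, lambda: seen.append(current_logcontext.get()), "delayed-callback")
    current_logcontext.reset(token)  # the request finishes before the callback fires
    clock.advance(5)
    return seen, current_logcontext.get()
```

`demo_call_later()` returns `(['delayed-callback'], 'sentinel')`: the callback ran in its own context rather than in `request-1`, and the caller is back in the sentinel context afterwards.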
Eric Eastwood	5143f93dc9	Fix `server_name` in logging context for multiple Synapse instances in one process (#18868) ### Background As part of Element's plan to support a light form of vhosting (virtual hosting: multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process. "Per-tenant logging" is tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48 ### Prior art Previously, we exposed `server_name` by providing a static logging `MetadataFilter` that injected the values: `205d9e4fc4/synapse/config/logger.py (L216)`. While this can work fine for the normal case of one Synapse instance per Python process, it configures things globally and isn't compatible with starting multiple Synapse instances, because each subsequent tenant will overwrite the previous tenant's values. ### What does this PR do? We remove the `MetadataFilter` and replace it by tracking the `server_name` in the `LoggingContext` and exposing it with our existing `LoggingContextFilter` (`205d9e4fc4/synapse/logging/context.py (L584-L622)`), which we already use to expose information about the `request`. This means that the `server_name` value follows wherever we log, as expected, even when we have multiple Synapse instances running in the same process. ### A note on logcontext Anywhere Synapse mistakenly uses the `sentinel` logcontext to log something, we won't know which server sent the log. We've been fixing up `sentinel` logcontext usage as tracked by https://github.com/element-hq/synapse/issues/18905. Any further `sentinel` logcontext usage we find in the future can be fixed piecemeal as normal. `d2a966f922/docs/log_contexts.md (L71-L81)` ### Testing strategy 1. Adjust your logging config to include `%(server_name)s` in the format ```yaml formatters: precise: format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' ``` 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make some requests (`curl http://localhost:8008/_matrix/client/versions`, etc.) 1. Open the homeserver logs and notice the `server_name` in the logs as expected. `unknown_server_from_sentinel_context` is expected for the `sentinel` logcontext (things outside of Synapse).	2025-09-26 17:10:48 -05:00
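The per-record approach can be sketched with the standard `logging` module: a filter reads the server name from whatever context is active when the record is emitted, instead of from a value fixed once at configuration time. This is an illustrative sketch, assuming a `contextvars.ContextVar` in place of Synapse's `LoggingContext`; `ServerNameFilter` and `current_server_name` are hypothetical names, not Synapse's `LoggingContextFilter`.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for "the server_name stored on the active LoggingContext".
# The default mirrors the placeholder used for the sentinel logcontext.
current_server_name = contextvars.ContextVar(
    "current_server_name", default="unknown_server_from_sentinel_context"
)


class ServerNameFilter(logging.Filter):
    """Copy the active server name onto every record so `%(server_name)s` works."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.server_name = current_server_name.get()
        return True  # never drop records, only annotate them


def demo_server_name() -> list:
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.addFilter(ServerNameFilter())
    handler.setFormatter(logging.Formatter("%(server_name)s - %(message)s"))

    logger = logging.getLogger("toy.server_name_demo")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False

    token = current_server_name.set("hs1.example.com")
    try:
        logger.info("handled request")  # logged inside one tenant's context
    finally:
        current_server_name.reset(token)
    logger.info("reactor tick")  # logged with no tenant context active
    return stream.getvalue().splitlines()
```

Because the value is looked up per record, two tenants in one process each get their own `server_name` without overwriting a shared global.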
Andrew Morgan	ddc7627b22	Fix performance regression related to delayed events processing (#18926 )	2025-09-23 09:47:30 +01:00
Eric Eastwood	5a9ca1e3d9	Introduce `Clock.call_when_running(...)` to include logcontext by default (#18944) Introduce `Clock.call_when_running(...)` to wrap startup code in a logcontext, ensuring we can identify which server generated the logs. Background: > Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` > logcontext as we want to know which server the logs came from. In practice, this is not > always the case yet especially outside of request handling. > > Global things outside of Synapse (e.g. Twisted reactor code) should run in the > `sentinel` logcontext. It's only when it calls into application code that a logcontext > gets activated. This means the reactor should be started in the `sentinel` logcontext, > and any time an awaitable yields control back to the reactor, it should reset the > logcontext to be the `sentinel` logcontext. This is important to avoid leaking the > current logcontext to the reactor (which would then get picked up and associated with > the next thing the reactor does). > > *-- `docs/log_contexts.md`* Also adds a lint to prefer `Clock.call_when_running(...)` over `reactor.callWhenRunning(...)`. Part of https://github.com/element-hq/synapse/issues/18905	2025-09-22 10:27:59 -05:00
reivilibre	ada3a3b2b3	Add experimental support for MSC4308: Thread Subscriptions extension to Sliding Sync when MSC4306 and MSC4186 are enabled. (#18695 ) Closes: #18436 Implements: https://github.com/matrix-org/matrix-spec-proposals/pull/4308 Follows: #18674 Adds an extension to Sliding Sync and a companion endpoint needed for backpaginating missed thread subscription changes, as described in MSC4308 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-09-11 14:45:04 +01:00
Devon Hudson	9301baa5f8	Fix hydra tests	2025-08-11 11:32:57 -06:00
Devon Hudson	bd8f12f9c6	Fix broken test	2025-08-11 16:43:45 +01:00
Kegan Dougal	0eb7252a23	Support for room version 12	2025-08-11 16:43:45 +01:00
reivilibre	6514381b02	Implement the push rules for experimental MSC4306: Thread Subscriptions. (#18762 ) Follows: #18756 Implements: MSC4306 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-08-06 15:33:52 +01:00
reivilibre	8306cee06a	Update implementation of MSC4306: Thread Subscriptions to include automatic subscription conflict prevention as introduced in later drafts. (#18756 ) Follows: #18674 Implements new drafts of MSC4306 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org> Co-authored-by: Eric Eastwood <erice@element.io>	2025-08-05 18:22:53 +00:00
reivilibre	a31d53b28f	Use `twisted.internet.testing` module in tests instead of deprecated `twisted.test.proto_helpers`. (#18728 ) Follows: #18727 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>	2025-07-30 12:32:10 +01:00
Eric Eastwood	d4af2970f3	Refactor `Histogram` metrics to be homeserver-scoped (#18724 ) Bulk refactor `Histogram` metrics to be homeserver-scoped. We also add lints to make sure that new `Histogram` metrics don't sneak in without using the `server_name` label (`SERVER_NAME_LABEL`). Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe response includes the TODO metrics with the `server_name` label ### Todo - [x] Wait for https://github.com/element-hq/synapse/pull/18656 to merge ### Dev notes ``` LoggingDatabaseConnection make_conn make_pool make_fake_db_pool ``` ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. 
* [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-07-29 15:35:38 -05:00
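A lint like the one described could, for example, walk the Python AST and flag `Histogram(...)` constructions whose `labelnames` omit the server-name label. This is a hypothetical sketch using the standard `ast` module, not Synapse's actual lint: it only inspects the `labelnames=` keyword argument (a positional label list would be reported as missing) and accepts either the `SERVER_NAME_LABEL` constant or the literal `"server_name"`.

```python
import ast
from typing import List, Optional

# Assumed value of the label constant that every Histogram must include.
SERVER_NAME_LABEL = "server_name"


def _call_name(node: ast.Call) -> Optional[str]:
    """Bare name of the called object, e.g. `Histogram` for `prom.Histogram(...)`."""
    if isinstance(node.func, ast.Name):
        return node.func.id
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    return None


def _has_server_name_label(node: ast.Call) -> bool:
    for kw in node.keywords:
        if kw.arg == "labelnames" and isinstance(kw.value, (ast.List, ast.Tuple)):
            for elt in kw.value.elts:
                if isinstance(elt, ast.Name) and elt.id == "SERVER_NAME_LABEL":
                    return True
                if isinstance(elt, ast.Constant) and elt.value == SERVER_NAME_LABEL:
                    return True
    return False


def find_unscoped_histograms(source: str) -> List[int]:
    """Line numbers of `Histogram(...)` calls that lack the server_name label."""
    offenders = [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and _call_name(node) == "Histogram"
        and not _has_server_name_label(node)
    ]
    return sorted(offenders)
```

For a module whose second line is `b = Histogram("b_seconds", "doc")` and whose other histograms pass `labelnames=[..., SERVER_NAME_LABEL]`, the function reports `[2]`.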
Eric Eastwood	5106818bd0	Refactor `GaugeBucketCollector` metrics to be homeserver-scoped (#18715) Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Adjust the number of `msecs` in the `looping_call` so that `_read_forward_extremities` (`a82b8a966a/synapse/storage/databases/main/metrics.py (L79)`) runs immediately instead of after an hour. 1. Observe that the response includes the `synapse_forward_extremities` and `synapse_excess_extremity_events` metrics with the `server_name` label.	2025-07-29 11:46:21 -05:00
Eric Eastwood	b7e7f537f1	Refactor background process metrics to be homeserver-scoped (#18670) Part of https://github.com/element-hq/synapse/issues/18592. Separated out of https://github.com/element-hq/synapse/pull/18656 because it's a bigger, unique piece of the refactor. ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe that the response includes the background process metrics (`synapse_background_process_start_count`, `synapse_background_process_db_txn_count_total`, etc.) with the `server_name` label.	2025-07-23 13:28:17 -05:00
reivilibre	875269eb53	Add experimental and incomplete support for MSC4306: Thread Subscriptions. (#18674 ) Implements: [MSC4306](https://github.com/matrix-org/matrix-spec-proposals/blob/rei/msc_thread_subscriptions/proposals/4306-thread-subscriptions.md) (partially) What's missing: - Changes to push rules Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>	2025-07-21 15:54:28 +01:00
Eric Eastwood	cda922830e	Clean up `MetricsResource` and Prometheus hacks (#18687 ) Clean up `MetricsResource`, Prometheus hacks (`_set_prometheus_client_use_created_metrics`), and better document why we care about having a separate `metrics` listener type. These clean-up changes have been split out from https://github.com/element-hq/synapse/pull/18584 since that PR was closed.	2025-07-17 11:57:19 -05:00
Eric Eastwood	88785dbaeb	Refactor cache metrics to be homeserver-scoped (#18604 ) (add `server_name` label to cache metrics). Part of https://github.com/element-hq/synapse/issues/18592	2025-07-16 16:04:57 -05:00
Andrew Morgan	be4c95baf1	Replace PyICU with Rust `icu_segmenter` crate (#18553 ) Co-authored-by: anoa's Codex Agent <codex@amorgan.xyz> Co-authored-by: Quentin Gliech <quenting@element.io>	2025-07-03 11:12:12 +01:00
Andrew Morgan	6791e6e250	Unbreak unit tests with Twisted `25.5.0` by adding a `parsePOSTFormSubmission` argument to `FakeSite` (#18577) Co-authored-by: anoa's Codex Agent <codex@amorgan.xyz>	2025-06-24 11:52:06 +01:00
Erik Johnston	33e0c25279	Clean up old `device_federation_inbox` rows (#18546 ) Fixes https://github.com/element-hq/synapse/issues/17370	2025-06-18 11:58:31 +00:00
Will Hunt	6e600c986e	Don't allow users to ignore themselves. (#18508) Fixes the self-ignore issues we've been seeing reports of, by ignoring bad requests from clients. Fixes https://github.com/element-hq/synapse/issues/11963 Fixes https://github.com/element-hq/element-web/issues/29969, although this should also be fixed on the client to avoid confusing errors popping up while rejecting invites. Related to https://github.com/matrix-org/matrix-rust-sdk/issues/5073	2025-06-06 15:37:15 +01:00
Will Hunt	8010377a88	Add support for MSC4155 Invite filtering (#18288 ) This implements https://github.com/matrix-org/matrix-spec-proposals/pull/4155, which adds support for a new account data type that blocks an invite based on some conditions in the event contents. --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-06-05 11:49:09 +01:00
Devon Hudson	99cbd33630	Merge branch 'master' into develop	2025-05-20 09:36:05 -06:00
dependabot[bot]	9d43bec326	Bump ruff from 0.7.3 to 0.11.10 (#18451 ) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Morgan <andrew@amorgan.xyz> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-05-20 15:23:30 +01:00
Erik Johnston	67920c0aca	Fix up the topological ordering for events above `MAX_DEPTH` (#18447) Synapse previously did not correctly cap the max depth of an event to the maximum canonical JSON integer. This can cause ordering issues for any events that were sent locally at the time. This PR adds a background update that caps the topological ordering of those events to the new `MAX_DEPTH`. c.f. GHSA-v56r-hwv5-mxg6 --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-05-19 13:36:30 +01:00
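The Matrix specification limits integers in canonical JSON to the range [-(2**53)+1, (2**53)-1], which is the bound referred to above. A minimal sketch of both halves of the fix, assuming `MAX_DEPTH` equals that upper bound and using a simplified, hypothetical `events(event_id, topological_ordering)` table in SQLite; the helper names are illustrative only.

```python
import sqlite3

# Canonical JSON only permits integers in [-(2**53) + 1, 2**53 - 1].
CANONICALJSON_MAX_INT = 2**53 - 1
MAX_DEPTH = CANONICALJSON_MAX_INT  # assumption: the cap described in the commit


def depth_for_new_event(prev_event_depths):
    """One more than the deepest prev event, but never above MAX_DEPTH."""
    return min(max(prev_event_depths, default=0) + 1, MAX_DEPTH)


def cap_existing_orderings(conn: sqlite3.Connection) -> int:
    """Toy version of the background fix: clamp stored orderings above MAX_DEPTH."""
    cur = conn.execute(
        "UPDATE events SET topological_ordering = ? WHERE topological_ordering > ?",
        (MAX_DEPTH, MAX_DEPTH),
    )
    return cur.rowcount  # number of rows that were out of range
```

Clamping new depths prevents the problem going forward, while the one-off update repairs rows already written with an out-of-range value.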
Devon Hudson	89cb613a4e	Revert "Add total event, unencrypted message, and e2ee event counts to stats reporting" (#18346) Reverts element-hq/synapse#18260. It is causing a failure when building release debs for `debian:bullseye` with the following error: ``` sqlite3.OperationalError: near "RETURNING": syntax error ```	2025-04-16 16:41:41 +00:00
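The error comes from the `RETURNING` clause, which SQLite only supports from version 3.35.0 onwards; an older SQLite library (presumably the one in the `debian:bullseye` build environment) rejects it at parse time. One way to stay compatible is to check the library version and fall back to `cursor.lastrowid`. This is a sketch with a hypothetical `items` table, not the code from the reverted PR.

```python
import sqlite3


def supports_returning(version_info=sqlite3.sqlite_version_info) -> bool:
    # `INSERT ... RETURNING` was added in SQLite 3.35.0; older libraries fail
    # with: sqlite3.OperationalError: near "RETURNING": syntax error
    return version_info >= (3, 35, 0)


def insert_and_get_id(conn: sqlite3.Connection, value: str) -> int:
    if supports_returning():
        row = conn.execute(
            "INSERT INTO items (value) VALUES (?) RETURNING id", (value,)
        ).fetchone()
        return row[0]
    # Fallback for older SQLite: use the rowid of the last inserted row
    return conn.execute("INSERT INTO items (value) VALUES (?)", (value,)).lastrowid
```

Both branches return the same id, so callers do not need to know which SQLite version is installed.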
Andrew Morgan	a832375bfb	Add total event, unencrypted message, and e2ee event counts to stats reporting (#18260 ) Co-authored-by: Eric Eastwood <erice@element.io>	2025-04-15 07:49:08 -07:00
Devon Hudson	1efb826b54	Delete unreferenced state groups in background (#18254) This PR fixes #18154 to avoid de-deltaing state groups, which resulted in the DB size temporarily increasing until the DB was `VACUUM`'ed. As a result, fewer state groups will get deleted now. It also attempts to improve performance by not duplicating work when processing state groups it has already processed in previous iterations. --------- Co-authored-by: Erik Johnston <erikj@element.io>	2025-03-21 17:09:49 +00:00
reivilibre	8295de87a7	Revert the background job to clear unreferenced state groups (that was introduced in v1.126.0rc1), due to a suspected issue that causes increased disk usage. (#18222 ) Revert "Add background job to clear unreferenced state groups (#18154)" This mechanism is suspected of inserting large numbers of rows into `state_groups_state`, thus unreasonably increasing disk usage. See: https://github.com/element-hq/synapse/issues/18217 This reverts commit `5121f9210c` (#18154). --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>	2025-03-07 15:44:13 +00:00
Devon Hudson	5121f9210c	Add background job to clear unreferenced state groups (#18154) Fixes #18150 --------- Co-authored-by: Erik Johnston <erikj@element.io>	2025-02-25 16:25:39 +00:00
Devon Hudson	ecad88f5c5	Cleanup deleted state group references (#18165)	2025-02-18 14:44:38 +00:00
Erik Johnston	c46d452c7c	Fix bug where purging history could lead to increase in disk space usage (#18131) When purging history, we try to delete any state groups that become unreferenced (i.e. there are no longer any events that directly reference them). When we delete a state group that is referenced by another state group, we "de-delta" that state group so that it no longer refers to the state group that is deleted. There are two bugs with this approach that we fix here: 1. There is a common pattern where we end up storing two state groups when persisting a state event: the state before and after the new state event, where the latter is stored as a delta to the former. When deleting state groups, we only deleted the "new" state and left (and potentially de-deltaed) the old state. This was due to a bug/typo when trying to find referenced state groups. 2. There are times when we store unreferenced state groups in the DB; during the purging of history, these would not get rechecked and would instead always be de-deltaed. Instead, we should check for this case and delete any unreferenced state groups rather than de-deltaing them. The effect of the above bugs is that when purging history, we'd end up with lots of unreferenced state groups that had been de-deltaed (i.e. stored as the full state). This can lead to dramatic increases in storage space used.	2025-02-03 19:04:19 +00:00
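The state-group behaviour described above can be modelled with a toy in-memory structure in which each group is a delta on an optional previous group. De-deltaing materialises a group's full state, which is why it increases storage; the sketch also applies the second fix by deleting an orphaned child group instead of de-deltaing it. `ToyStateGroups` is hypothetical and does not reflect Synapse's actual tables or deletion code.

```python
from typing import Dict, List, Optional, Set, Tuple

StateMap = Dict[Tuple[str, str], str]  # (event type, state key) -> event ID


class ToyStateGroups:
    """Each group stores (prev_group, delta); full state = prev's state + delta."""

    def __init__(self) -> None:
        self.groups: Dict[int, Tuple[Optional[int], StateMap]] = {}
        self.referenced_by_events: Set[int] = set()

    def add(self, group_id: int, prev: Optional[int], delta: StateMap, referenced: bool = True) -> None:
        self.groups[group_id] = (prev, dict(delta))
        if referenced:
            self.referenced_by_events.add(group_id)

    def full_state(self, group_id: int) -> StateMap:
        prev, delta = self.groups[group_id]
        state = self.full_state(prev) if prev is not None else {}
        state.update(delta)  # `state` is a fresh dict, so stored deltas are untouched
        return state

    def children(self, group_id: int) -> List[int]:
        return [g for g, (prev, _) in self.groups.items() if prev == group_id]

    def delete(self, group_id: int) -> None:
        for child in self.children(group_id):
            if child in self.referenced_by_events or self.children(child):
                # Still needed: de-delta it so it no longer points at `group_id`.
                # This stores the full state, which costs more space.
                self.groups[child] = (None, self.full_state(child))
            else:
                # Nothing references the child, so delete it instead of de-deltaing.
                del self.groups[child]
        del self.groups[group_id]
        self.referenced_by_events.discard(group_id)
```

Deleting a base group with one referenced child and one orphaned child keeps the first as a full-state group and drops the second entirely.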
Erik Johnston	27dbb1b429	Add locking to more safely delete state groups: Part 2 (#18130 ) This actually makes it so that deleting state groups goes via the new mechanism. c.f. #18107	2025-02-03 17:58:55 +00:00
Erik Johnston	aa6e5c2ecb	Add locking to more safely delete state groups: Part 1 (#18107) Currently we don't really have anything that stops us from deleting state groups when an in-flight event references them. This is a fairly rare race currently, but we want to be able to delete state groups more aggressively, so it is important to address this to ensure that the database remains valid. This implements the locking, but doesn't actually use it yet. See the class docstring of the new data store for an explanation of how this works. --------- Co-authored-by: Devon Hudson <devon.dmytro@gmail.com>	2025-02-03 17:29:15 +00:00
Eric Eastwood	aab3672037	Bust `_membership_stream_cache` cache when current state changes (#17732) This is particularly a problem in a state reset scenario where the membership might change without a corresponding event. This PR is targeting a scenario where a state reset happens which causes room membership to change. Previously, the cache would just hold onto stale data; now we properly bust the cache in this scenario. We have a few tests for these scenarios, which you can see are now fixed because we can remove the `FIXME` where we were previously manually busting the cache in the test itself. This is a general Synapse fix, so by its nature it helps out Sliding Sync. Fix https://github.com/element-hq/synapse/issues/17368 Prerequisite for https://github.com/element-hq/synapse/issues/17929 --- Match when we are busting `_curr_state_delta_stream_cache`	2025-01-08 10:11:09 -06:00
Devon Hudson	eda735e4bb	Remove support for python 3.8 (#17908) --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2024-11-06 19:36:01 +00:00
Eric Eastwood	a5e16a4ab5	Sliding Sync: Reset `forgotten` status when membership changes (like rejoining a room) (#17835) Reset `sliding_sync_membership_snapshots` -> `forgotten` status when membership changes (like rejoining a room). Fix https://github.com/element-hq/synapse/issues/17781 ### What was the problem before? Previously, if someone used `/forget` on one of their rooms, it would update `sliding_sync_membership_snapshots` as expected, but when someone rejoined the room (or had any membership change), the upsert didn't overwrite and reset the `forgotten` status, so the room remained `forgotten` and invisible down the Sliding Sync endpoint.	2024-10-22 11:06:46 +01:00
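The fix amounts to making every membership upsert also reset the `forgotten` flag. A minimal SQLite sketch, assuming a heavily simplified, hypothetical `membership_snapshots` table rather than the real `sliding_sync_membership_snapshots` schema (`ON CONFLICT ... DO UPDATE` requires SQLite 3.24.0 or later).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE membership_snapshots (
               room_id TEXT NOT NULL,
               user_id TEXT NOT NULL,
               membership TEXT NOT NULL,
               forgotten INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (room_id, user_id))"""
    )
    return conn


def upsert_membership(conn: sqlite3.Connection, room_id: str, user_id: str, membership: str) -> None:
    # Any membership change resets `forgotten`, so a rejoined room shows up again.
    conn.execute(
        """INSERT INTO membership_snapshots (room_id, user_id, membership, forgotten)
           VALUES (?, ?, ?, 0)
           ON CONFLICT (room_id, user_id)
           DO UPDATE SET membership = excluded.membership, forgotten = 0""",
        (room_id, user_id, membership),
    )


def forget(conn: sqlite3.Connection, room_id: str, user_id: str) -> None:
    conn.execute(
        "UPDATE membership_snapshots SET forgotten = 1 WHERE room_id = ? AND user_id = ?",
        (room_id, user_id),
    )
```

Without `forgotten = 0` in the `DO UPDATE` clause, the conflict path would update only the membership and leave the old flag in place, which is the bug described above.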
Eric Eastwood	adda2a4613	Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) (#17718) Spawning from @kegsay [pointing out](https://matrix.to/#/!cnVVNLKqgUzNTOFQkz:matrix.org/$ExOO7J8uPUQSyH-9Uxc_QCa8jlXX9uK4VRtkSC0EI3o?via=element.io&via=matrix.org&via=jki.re) that the Sliding Sync endpoint doesn't handle a large room with a lot of state well on initial sync (requesting all state via `required_state: [ ["",""] ]`) (it just takes forever). After investigating further, the slow part is just `get_events_as_list(...)` fetching all of the current state IDs out for the room (which can be 100k+ events for rooms with a lot of membership). This is just a slow thing in Synapse in general, and the same thing happens in Sync v2 or the `/state` endpoint. --- The only idea I had to improve things was to use `batch_iter` to only try fetching a fixed amount at a time instead of working with large maps, lists, and sets. This doesn't seem to have much effect, though. There is already a `batch_iter(event_ids, 200)` in `_fetch_event_rows(...)` for when we actually have to touch the database, and that's inside a queue to deduplicate work. I did notice one slight optimization: use `get_events_as_list(...)` directly instead of `get_events(...)`. `get_events(...)` just turns the result from `get_events_as_list(...)` into a dict, and since we're just iterating over the events, we don't need the dict/map.	2024-10-14 13:47:35 +01:00
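`batch_iter` splits an iterable into fixed-size chunks so that each step works on a bounded amount of data. A minimal sketch of such a helper, written from scratch here (Synapse's own implementation may differ), plus a hypothetical `fetch_in_batches` wrapper whose `fetch` callback stands in for a database lookup:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batch_iter(iterable: Iterable[T], size: int) -> Iterator[Tuple[T, ...]]:
    """Yield tuples of at most `size` items until `iterable` is exhausted."""
    source = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns ()
    return iter(lambda: tuple(islice(source, size)), ())


def fetch_in_batches(
    event_ids: Iterable[T], fetch: Callable[[Tuple[T, ...]], List[R]], size: int = 200
) -> List[R]:
    """Call the (hypothetical) `fetch` once per batch and concatenate the results."""
    results: List[R] = []
    for batch in batch_iter(event_ids, size):
        results.extend(fetch(batch))
    return results
```

For example, `list(batch_iter(range(5), 2))` yields `[(0, 1), (2, 3), (4,)]`. Batching bounds the size of each intermediate structure, but as the commit notes, it does not reduce the total work when every event still has to be loaded.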
Eric Eastwood	c2e5e9e67c	Sliding Sync: Avoid fetching left rooms and add back `newly_left` rooms (#17725) Performance optimization: We can avoid fetching rooms that the user has left themselves (which could be a significant amount), then only add back rooms that the user has `newly_left` (left in the token range of an incremental sync). In most cases, it's a lot faster to fetch fewer rooms than to fetch them all and throw them away. Since the user only leaves a room (or is state reset out) once in a blue moon, we can avoid a lot of work. Based on @erikjohnston's branch, erikj/ss_perf --------- Co-authored-by: Erik Johnston <erik@matrix.org>	2024-09-19 10:07:18 -05:00
Eric Eastwood	16af80b8fb	Sliding Sync: Use Sliding Sync tables for sorting (#17693 ) Use Sliding Sync tables for sorting (`bulk_get_last_event_pos_in_room_before_stream_ordering(...)` -> `_bulk_get_max_event_pos(...)`)	2024-09-11 12:16:24 -05:00
Erik Johnston	596b96411b	Sliding sync: various fixups to the background update (#17652 )	2024-09-11 15:38:46 +01:00
Erik Johnston	588e5b521d	Sliding Sync: Retrieve fewer events from DB in sync (#17688) When using a timeline limit of 1, we end up fetching 2 events from the DB purely to tell if the response was "limited" or not. Let's not do that.	2024-09-10 09:52:42 +01:00
Eric Eastwood	e1ed959a68	Sliding Sync: Get `bump_stamp` from new sliding sync tables because it's faster (#17658 ) Get `bump_stamp` from [new sliding sync tables](https://github.com/element-hq/synapse/pull/17512) which should be faster (performance) than flipping through the latest events in the room.	2024-09-09 16:41:25 +01:00
Eric Eastwood	26f81fb5be	Sliding Sync: Fix outlier re-persisting causing problems with sliding sync tables (#17635) Follow-up to https://github.com/element-hq/synapse/pull/17512 When running on `matrix.org`, we discovered that a remote invite is first persisted as an `outlier` and then re-persisted, at which point it is de-outliered. The first time, the `outlier` is persisted with one `stream_ordering`, but when it is persisted again and de-outliered, it is assigned a different `stream_ordering` that won't end up being used. Since we call `_calculate_sliding_sync_table_changes()` before `_update_outliers_txn()`, which fixes this discrepancy (always use the `stream_ordering` from the first time it was persisted), we're working with an unreliable `stream_ordering` value that will possibly be unused and not make it into the `events` table.	2024-08-30 08:53:57 +01:00
Erik Johnston	bb80894391	Fix background update for sliding sync (#17631 ) This reverts commit ab414f2ab8a294fbffb417003eeea0f14bbd6588. Introduced in https://github.com/element-hq/synapse/pull/17599	2024-08-29 16:58:53 +01:00
Eric Eastwood	1a6b718f8c	Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512 ) Pre-populate room data for quick filtering/sorting in the Sliding Sync API Spawning from https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578 This PR is acting as the Synapse version `N+1` step in the gradual migration being tracked by https://github.com/element-hq/synapse/issues/17623 Adding two new database tables: - `sliding_sync_joined_rooms`: A table for storing room meta data that the local server is still participating in. The info here can be shared across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the relevant room current state changes or a new event is sent in the room. - `sliding_sync_membership_snapshots`: A table for storing a snapshot of room meta data at the time of the local user's membership. Keyed on `(room_id, user_id)` and only updated when a user's membership in a room changes. Also adds background updates to populate these tables with all of the existing data. We want to have the guarantee that if a row exists in the sliding sync tables, we are able to rely on it (accurate data). And if a row doesn't exist, we use a fallback to get the same info until the background updates fill in the rows or a new event comes in triggering it to be fully inserted. This means we need a couple extra things in place until we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the `N+2` part of the gradual migration. For context on why we can't rely on the tables without these things see [1]. 1. On start-up, block until we clear out any rows for the rooms that have had events since the max-`stream_ordering` of the `sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of the `events` table). For `sliding_sync_membership_snapshots`, we can compare to the max-`stream_ordering` of `local_current_membership` - This accounts for when someone downgrades their Synapse version and then upgrades it again. 
This will ensure that we don't have any stale/out-of-date data in the `sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables, since any new events sent in rooms would have also needed to be written to the sliding sync tables. For example, a new event needs to bump `event_stream_ordering` in the `sliding_sync_joined_rooms` table, and a change to some state in the room (like the room name) needs to be written there too. Another example is someone's membership changing in a room, which affects `sliding_sync_membership_snapshots`. 1. Add another background update that will catch up with any rows that were just deleted from the sliding sync tables (based on the activity in `events`/`local_current_membership`). The rooms that need recalculating are added to the `sliding_sync_joined_rooms_to_recalculate` table. 1. Make sure rows are fully inserted. Instead of partially inserting, we need to check if the row already exists and fully insert all data if not. All of this extra functionality can be removed once the `SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync tables, so people can no longer downgrade (the `N+2` part of the gradual migration). <details> <summary><sup>[1]</sup></summary> For `sliding_sync_joined_rooms`, since we partially insert rows as state comes in, we can't rely on the existence of the row for a given `room_id`. We can't even rely on looking at whether the background update has finished. There could still be partial rows from when someone reverted their Synapse version after the background update finished, had some state changes (or new rooms), then upgraded again, and more state changes happened, leaving a partial row. For `sliding_sync_membership_snapshots`, we insert items as a whole except for the `forgotten` column ~~so we can rely on rows existing and just need to always use a fallback for the `forgotten` data. 
We can't use the `forgotten` column in the table for the same reasons above about `sliding_sync_joined_rooms`.~~ We could have an out-of-date membership from when someone reverted their Synapse version. (same problems as outlined for `sliding_sync_joined_rooms` above) Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7) </details> ### TODO - [x] Update `stream_ordering`/`bump_stamp` - [x] Handle remote invites - [x] Handle state resets - [x] Consider adding `sender` so we can filter `LEAVE` memberships and distinguish from kicks. - [x] We should add it to be able to tell leaves from kicks - [x] Consider adding `tombstone` state to help address https://github.com/element-hq/synapse/issues/17540 - [x] We should add it `tombstone_successor_room_id` - [x] Consider adding `forgotten` status to avoid extra lookup/table-join on `room_memberships` - [x] We should add it - [x] Background update to fill in values for all joined rooms and non-join membership - [x] Clean-up tables when room is deleted - [ ] Make sure tables are useful to our use case - First explored in https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables - Also explored in `76b5a576eb` - [x] Plan for how can we use this with a fallback - See plan discussed above in main area of the issue description - Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7) - [x] Plan for how we can rely on this new table without a fallback - Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add new tables and background update to backfill all rows. Since this is a new table, we don't have to add any `NOT VALID` constraints and validate them when the background update completes. Read from new tables with a fallback in cases where the rows aren't filled in yet. 
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump `SCHEMA_COMPAT_VERSION` to `87` because we don't want people to downgrade and miss writes while they are on an older version. Add a foreground update to finish off the backfill so we can read from new tables without the fallback. Application code can now rely on the new tables being populated. - Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj) ### Dev notes ``` SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase ``` ``` SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase ``` Reference: - [Development docs on background updates and worked examples of gradual migrations ](`1dfa59b238/docs/development/database_schema.md (background-updates)`) - A real example of a gradual migration: https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514 - Adding `rooms.creator` field that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/10697 - Adding `rooms.room_version` that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/6729 - Adding `room_stats_state.room_type` that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/13031 - Tables from MSC2716: `insertion_events`, `insertion_event_edges`, `insertion_event_extremities`, `batch_events` - `current_state_events` updated in `synapse/storage/databases/main/events.py` --- ``` persist_event (adds to queue) _persist_event_batch _persist_events_and_state_updates (assigns `stream_ordering` to events) _persist_events_txn _store_event_txn _update_metadata_tables_txn _store_room_members_txn 
_update_current_state_txn ``` --- > Concatenated Indexes [...] (also known as multi-column, composite or combined index) > > [...] key consists of multiple columns. > > We can take advantage of the fact that the first index column is always usable for searching > > -- https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys --- Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`), https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219 --- <details> <summary>SQL queries:</summary> Both of these are equivalent and work in SQLite and Postgres Options 1: ```sql WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS ( VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ) INSERT INTO sliding_sync_non_join_memberships (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) SELECT * FROM data_table WHERE membership != ? ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` Option 2: ```sql INSERT INTO sliding_sync_non_join_memberships (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) SELECT column1 as room_id, column2 as user_id, column3 as membership_event_id, column4 as membership, column5 as event_stream_ordering, {", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))} FROM ( VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ) as v WHERE membership != ? 
ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` If we don't need the `membership` condition, we could use: ```sql INSERT INTO sliding_sync_non_join_memberships (room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)}) VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` </details> ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. 
* [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Erik Johnston <erik@matrix.org>	2024-08-29 16:09:51 +01:00
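The "Option 1" upsert from the SQL notes in this commit can be run end to end against SQLite using Python's built-in `sqlite3` module. The sketch below is only a toy and is not Synapse code:

- The three tables keep only the columns the query touches.
- A single `room_name` column stands in for the dynamic `insert_keys`.
- Filtering out `"join"` is an assumption based on the table name `sliding_sync_non_join_memberships`.
- `upsert_membership` is a hypothetical helper.
- It needs SQLite 3.24 or newer for `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

# Toy schema; columns are simplified and `room_name` is a hypothetical
# stand-in for the dynamic `insert_keys` columns.
SCHEMA = """
CREATE TABLE room_memberships (event_id TEXT PRIMARY KEY, membership TEXT);
CREATE TABLE events (event_id TEXT PRIMARY KEY, stream_ordering INTEGER);
CREATE TABLE sliding_sync_non_join_memberships (
    room_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    membership_event_id TEXT NOT NULL,
    membership TEXT NOT NULL,
    event_stream_ordering INTEGER NOT NULL,
    room_name TEXT,
    PRIMARY KEY (room_id, user_id)
);
"""

# "Option 1": a one-row CTE whose membership/stream_ordering come from
# subqueries, followed by an upsert that skips rows with the excluded membership.
UPSERT = """
WITH data_table (room_id, user_id, membership_event_id, membership,
                 event_stream_ordering, room_name) AS (
    VALUES (
        ?, ?, ?,
        (SELECT membership FROM room_memberships WHERE event_id = ?),
        (SELECT stream_ordering FROM events WHERE event_id = ?),
        ?
    )
)
INSERT INTO sliding_sync_non_join_memberships
    (room_id, user_id, membership_event_id, membership,
     event_stream_ordering, room_name)
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id) DO UPDATE SET
    membership_event_id = EXCLUDED.membership_event_id,
    membership = EXCLUDED.membership,
    event_stream_ordering = EXCLUDED.event_stream_ordering,
    room_name = EXCLUDED.room_name
"""


def upsert_membership(conn, room_id, user_id, event_id, room_name, skip_membership="join"):
    """Hypothetical helper: insert/update one row unless its membership is `skip_membership`."""
    conn.execute(
        UPSERT,
        (room_id, user_id, event_id, event_id, event_id, room_name, skip_membership),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO room_memberships VALUES (?, ?)",
    [("$a", "invite"), ("$b", "leave"), ("$c", "join")],
)
conn.executemany("INSERT INTO events VALUES (?, ?)", [("$a", 1), ("$b", 2), ("$c", 3)])

upsert_membership(conn, "!r", "@u", "$a", "Room")    # inserts an 'invite' row
upsert_membership(conn, "!r", "@u", "$b", "Room 2")  # conflict: row updated to 'leave'
upsert_membership(conn, "!r", "@v", "$c", "Room")    # 'join' is filtered out

rows = conn.execute(
    "SELECT room_id, user_id, membership, event_stream_ordering, room_name "
    "FROM sliding_sync_non_join_memberships ORDER BY user_id"
).fetchall()
print(rows)
```

SQLite's upsert documentation notes that when an upsert is attached to `INSERT ... SELECT`, the `SELECT` needs a `WHERE` clause (even `WHERE true`). Without it, the parser can mistake `ON CONFLICT` for a join constraint. The `WHERE membership != ?` filter here happens to satisfy that requirement.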
Eric Eastwood	11db575218	Sliding Sync: Use `stream_ordering` based timeline pagination for incremental sync (#17510 ) Use `stream_ordering`-based `timeline` pagination for incremental `/sync` in Sliding Sync. Previously, we were always using a `topological_ordering` but we should only be using that for historical scenarios (initial `/sync`, newly joined, or haven't sent the room down the connection before). This is slightly different from what the [spec suggests](https://spec.matrix.org/v1.10/client-server-api/#syncing) > Events are ordered in this API according to the arrival time of the event on the homeserver. This can conflict with other APIs which order events based on their partial ordering in the event graph. This can result in duplicate events being received (once per distinct API called). Clients SHOULD de-duplicate events based on the event ID when this happens. But we've had a [discussion below in this PR](https://github.com/element-hq/synapse/pull/17510#discussion_r1699105569), and this matches what Sync v2 already does and seems to make sense. Created a spec issue https://github.com/matrix-org/matrix-spec/issues/1917 to clarify this. Related issues: - https://github.com/matrix-org/matrix-spec/issues/1917 - https://github.com/matrix-org/matrix-spec/issues/852 - https://github.com/matrix-org/matrix-spec-proposals/pull/4033	2024-08-07 11:27:50 -05:00
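To illustrate the distinction this commit draws, the sketch below picks an incremental timeline purely by arrival order (`stream_ordering`) rather than DAG order (`topological_ordering`). It is a toy model, not Synapse's implementation: `Event`, `incremental_timeline` and its `since_stream` parameter are hypothetical, and a real server paginates with stream tokens in the database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: str
    topological_ordering: int  # position in the room DAG (depth)
    stream_ordering: int  # arrival order on this homeserver


def incremental_timeline(events: List[Event], since_stream: int, limit: int) -> List[Event]:
    """Return up to `limit` of the newest events that arrived after `since_stream`,
    oldest first, ordered by arrival (stream_ordering) rather than DAG position."""
    if limit <= 0:
        return []
    new_events = sorted(
        (e for e in events if e.stream_ordering > since_stream),
        key=lambda e: e.stream_ordering,
    )
    return new_events[-limit:]


events = [
    Event("$a", topological_ordering=1, stream_ordering=1),
    Event("$b", topological_ordering=2, stream_ordering=2),
    Event("$late", topological_ordering=1, stream_ordering=4),  # early in the DAG, arrived last
    Event("$c", topological_ordering=3, stream_ordering=3),
]
print([e.event_id for e in incremental_timeline(events, since_stream=2, limit=10)])
# ['$c', '$late']
```

`$late` sits early in the DAG but reached the server last, so ordering by arrival still returns it in the next incremental sync, after `$c`. Sorting the same selection by `topological_ordering` would put it first.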
Eric Eastwood	3fee32ed6b	Order `heroes` by `stream_ordering` (as spec'ed) (#17435 ) The spec specifically mentions `stream_ordering` but that's a Synapse-specific concept. In any case, the essence of the spec is basically the first 5 members of the room, which is what ordering by `stream_ordering` accomplishes. Split off from https://github.com/element-hq/synapse/pull/17419#discussion_r1671342794 ## Spec compliance > This should be the first 5 members of the room, ordered by stream ordering, which are joined or invited. The list must never include the client’s own user ID. When no joined or invited members are available, this should consist of the banned and left users. > > -- https://spec.matrix.org/v1.10/client-server-api/#_matrixclientv3sync_roomsummary Related to https://github.com/matrix-org/matrix-spec/issues/1334	2024-07-17 13:10:15 -05:00
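The spec rule quoted in this commit translates into a small selection function. This is a minimal sketch, not Synapse's code, and it rests on several assumptions:

- Each user appears once, with their current membership and its `stream_ordering`.
- "first 5 members" is read as ascending `stream_ordering`.
- `Member` and `compute_heroes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

MAX_HEROES = 5  # "the first 5 members of the room"


@dataclass(frozen=True)
class Member:
    user_id: str
    membership: str  # "join", "invite", "leave" or "ban"
    stream_ordering: int  # stream position of this user's current membership event


def compute_heroes(members: List[Member], own_user_id: str) -> List[str]:
    """Pick room-summary heroes following the spec text quoted in the commit."""
    # Never include the requesting client's own user ID; order by stream ordering.
    others = sorted(
        (m for m in members if m.user_id != own_user_id),
        key=lambda m: m.stream_ordering,
    )
    heroes = [m.user_id for m in others if m.membership in ("join", "invite")]
    if not heroes:
        # No joined or invited members: fall back to banned and left users.
        heroes = [m.user_id for m in others if m.membership in ("leave", "ban")]
    return heroes[:MAX_HEROES]


members = [
    Member("@me:x", "join", 1),
    Member("@a:x", "join", 2),
    Member("@b:x", "leave", 3),
    Member("@c:x", "invite", 4),
    Member("@d:x", "join", 5),
    Member("@e:x", "join", 6),
    Member("@f:x", "join", 7),
    Member("@g:x", "join", 8),
]
print(compute_heroes(members, "@me:x"))
# ['@a:x', '@c:x', '@d:x', '@e:x', '@f:x']
```

The fallback branch only triggers when nobody else is joined or invited, matching the "When no joined or invited members are available" clause.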
Eric Eastwood	3fef535ff2	Add `rooms.bump_stamp` to Sliding Sync `/sync` for easier client-side sorting (#17395 ) `bump_stamp` corresponds to the `stream_ordering` of the latest event in the room whose type is one of the `DEFAULT_BUMP_EVENT_TYPES`. This helps clients sort more readily without them needing to pull in a bunch of the timeline to determine the last activity. `bump_event_types` exist because, for example, we don't want display name changes to mark the room as unread and bump it to the top. For encrypted rooms, we just have to consider any activity as a bump because we can't see the content and the client has to figure it out for themselves. Outside of Synapse, `bump_stamp` is just a free-form counter so other implementations could use `received_ts` or `origin_server_ts` (see the [Security considerations section in MSC3575 about the potential pitfalls of using `origin_server_ts`](https://github.com/matrix-org/matrix-spec-proposals/blob/kegan/sync-v3/proposals/3575-sync.md#security-considerations)). It isn't guaranteed to always go up. In the Synapse case, it could go down if an event was redacted/removed (or purged in cases of retention policies). In the future, we could add `bump_event_types` as [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575) mentions if people need to customize the event types. --- In the Sliding Sync proxy, a similar [`timestamp` field was added](https://github.com/matrix-org/sliding-sync/pull/247) for the same purpose, but the name doesn't make it obvious what it pertains to or what it's for. The `timestamp` field was also added to Ruma in https://github.com/ruma/ruma/pull/1622	2024-07-08 13:17:08 -05:00
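The `bump_stamp` rule described here reduces to taking the maximum `stream_ordering` over the room's bump-type events. The sketch below makes the following assumptions:

- `BUMP_EVENT_TYPES` is an illustrative subset, not Synapse's actual `DEFAULT_BUMP_EVENT_TYPES`.
- `m.room.encrypted` is included to model "any activity counts" in encrypted rooms.
- `compute_bump_stamp` is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple

# Illustrative subset only; Synapse's real DEFAULT_BUMP_EVENT_TYPES may differ.
# `m.room.encrypted` is included because the server can't see the real type of
# an encrypted event, so any encrypted activity has to count as a bump.
BUMP_EVENT_TYPES = frozenset(
    {"m.room.create", "m.room.message", "m.room.encrypted", "m.sticker"}
)


def compute_bump_stamp(events: Iterable[Tuple[int, str]]) -> Optional[int]:
    """Return the stream_ordering of the latest bump event, or None if there is none.

    `events` yields (stream_ordering, event_type) pairs for the room.
    """
    return max(
        (so for so, event_type in events if event_type in BUMP_EVENT_TYPES),
        default=None,
    )


room = [
    (1, "m.room.create"),
    (2, "m.room.member"),  # e.g. a display name change: not a bump
    (3, "m.room.message"),
    (4, "m.room.name"),
    (5, "m.room.member"),
]
print(compute_bump_stamp(room))  # 3
```

Because the stamp is derived from whichever bump event is currently the latest, removing that event (for example, purging it) makes the result fall back to an older one. This is why the commit notes that `bump_stamp` is not guaranteed to always go up.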
Eric Eastwood	fa91655805	Return some room data in Sliding Sync `/sync` (#17320 ) - Timeline events - Stripped `invite_state` Based on [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575): Sliding Sync	2024-07-02 11:07:05 -05:00

771 Commits