The main goal of this PR is to handle device list changes onto multiple
writers, off the main process, so that we can have logins happening
whilst Synapse is rolling-restarting.
This is quite an intrusive change, so I would advise to review this
commit by commit; I tried to keep the history as clean as possible.
There are a few things to consider:
- the `device_list_key` in stream tokens becomes a
`MultiWriterStreamToken`, which has a few implications in sync and on
the storage layer
- we had a split between `DeviceHandler` and `DeviceWorkerHandler` for
master vs. worker process. I've kept this split, but making it rather
writer vs. non-writer worker, using method overrides for doing
replication calls when needed
- there are a few operations that need to happen on a single worker at a
time. Instead of using cross-worker locks, for now I made them run on
the first writer on the list
---------
Co-authored-by: Eric Eastwood <erice@element.io>
You can now configure how much media can be uploaded by a user in a
given time period.
Note the first commit here is a refactor of create/upload content
function
This implements
https://github.com/matrix-org/matrix-spec-proposals/pull/3765 which is
already merged and, therefore, can use stable identifiers.
For `/publicRooms` and `/hierarchy`, the topic is read from the
eponymous field of the `current_state_events` table. Rather than
introduce further columns in this table, I changed the insertion /
update logic to write the plain-text topic from the rich topic into the
existing field. This will not take effect for existing rooms unless
their topic is changed. However, existing rooms shouldn't have rich
topics to begin with.
Similarly, for server-side search, I changed the insertion logic of the
`event_search` table to prefer the value from the rich topic. Again,
existing events shouldn't have rich topics and, therefore, don't need to
be migrated in the table.
Spec doc: https://spec.matrix.org/v1.15/client-server-api/#mroomtopic
Part of supporting Matrix v1.15:
https://spec.matrix.org/v1.15/client-server-api/#mroomtopic
Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org>
Co-authored-by: Eric Eastwood <erice@element.io>
This was correctly handled for the "fallback" case where the background
updates hadn't finished
---------
Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
This PR makes a few radical changes to media. This now stores the SHA256
hash of each file stored in the database (excluding thumbnails, more on
that later). If a set of media is quarantined, any additional uploads of
the same file contents or any other files with the same hash will be
quarantined at the same time.
Currently this does NOT:
- De-duplicate media, although a future extension could be to do that.
- Run any background jobs to identify the hashes of older files. This
could also be a future extension, though the value of doing so is
limited to combat the abuse of recent media.
- Hash thumbnails. It's assumed that thumbnails are parented to some
form of media, so you'd likely be wanting to quarantine the media and
the thumbnail at the same time.
We do a few things in this PR to better support caching:
1. Change `Cache-Control` header to allow intermediary proxies to cache
media *only* if they revalidate on every request. This means that the
intermediary cache will still send the request to Synapse but with a
`If-None-Match` header, at which point Synapse can check auth and
respond with a 304 and empty content.
2. Add `ETag` response header to all media responses. We hardcode this
to `1` since all media is immutable (beyond being deleted).
3. Check for `If-None-Match` header (after checking for auth), and if it
matches then respond with a 304 and empty body.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
MSC4108 relies on ETag to determine if something has changed on the
rendez-vous channel.
Strong and correct ETag comparison works if the response body is
bit-for-bit identical, which isn't the case if a proxy in the middle
compresses the response on the fly.
This adds a `no-transform` directive to the `Cache-Control` header,
which tells proxies not to transform the response body.
Additionally, some proxies (nginx) will switch to `Transfer-Encoding:
chunked` if it doesn't know the Content-Length of the response, and
'weakening' the ETag if that's the case. I've added `Content-Length`
headers to all responses, to hopefully solve that.
This basically fixes QR-code login when nginx or cloudflare is involved,
with gzip/zstd/deflate compression enabled.
Some small cleanups after Python3.8 became EOL.
- Move some type imports from `typing_extensions` to `typing`
- Remove the `abi3-py38` feature from pyo3
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Quentin Gliech <quenting@element.io>
Regressed as part of https://github.com/element-hq/synapse/pull/18107
This does two things:
1. Only check if the state groups have been deleted when calculating the
event context (as that's when we will insert them). This avoids lots of
checks for read operations.
2. Don't lock the `state_groups` rows when doing the check. This adds
overhead, and it doesn't prevent any races.
Currently we don't really have anything that stops us from deleting
state groups when an in-flight event references it. This is a fairly
rare race currently, but we want to be able to more aggressively delete
state groups so it is important to address this to ensure that the
database remains valid.
This implements the locking, but doesn't actually use it.
See the class docstring of the new data store for an explanation for how
this works.
---------
Co-authored-by: Devon Hudson <devon.dmytro@gmail.com>
This also happens for rejecting an invite. Basically, any out-of-band membership transition where we first get the membership as an `outlier` and then rely on federation filling us in to de-outlier it.
This PR mainly addresses automated test flakiness, bots/scripts, and options within Synapse like [`auto_accept_invites`](https://element-hq.github.io/synapse/v1.122/usage/configuration/config_documentation.html#auto_accept_invites) that are able to react quickly (before federation is able to push us events), but also helps in generic scenarios where federation is lagging.
I initially thought this might be a Synapse consistency issue (see issues labeled with [`Z-Read-After-Write`](https://github.com/matrix-org/synapse/labels/Z-Read-After-Write)) but it seems to be an event auth logic problem. Workers probably do increase the number of possible race condition scenarios that make this visible though (replication and cache invalidation lag).
Fix https://github.com/element-hq/synapse/issues/15012
(probably fixes https://github.com/matrix-org/synapse/issues/15012 (https://github.com/element-hq/synapse/issues/15012))
Related to https://github.com/matrix-org/matrix-spec/issues/2062
Problems:
1. We don't consider [out-of-band membership](https://github.com/element-hq/synapse/blob/develop/docs/development/room-dag-concepts.md#out-of-band-membership-events) (outliers) in our `event_auth` logic even though we expose them in `/sync`.
1. (This PR doesn't address this point) Perhaps we should consider authing events in the persistence queue as events already in the queue could allow subsequent events to be allowed (events come through many channels: federation transaction, remote invite, remote join, local send). But this doesn't save us in the case where the event is more delayed over federation.
### What happened before?
I wrote some Complement test that stresses this exact scenario and reproduces the problem: https://github.com/matrix-org/complement/pull/757
```
COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestSynapseConsistency
```
We have `hs1` and `hs2` running in monolith mode (no workers):
1. `@charlie1:hs2` is invited and joins the room:
1. `hs1` invites `@charlie1:hs2` to a room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
1. `@charlie1:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is not yet in the room, this happens as a remote join `make_join`/`send_join` which comes back with all of the auth events needed to auth successfully and now `@charlie1:hs2` is successfully joined to the room.
1. `@charlie2:hs2` is invited and and tries to join the room:
1. `hs1` invites `@charlie2:hs2` to the room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
1. Because `hs2` is already participating in the room, we also see the invite come over federation in a transaction and we start processing it (not done yet, see below)
1. `@charlie2:hs2` decides to join because it saw the invite down `/sync`. Because `hs2`, is already in the room, this happens as a local join but we deny the event because our `event_auth` logic thinks that we have no membership in the room ❌ (expected to be able to join because we saw the invite down `/sync`)
1. We finally finish processing the `@charlie2:hs2` invite event from and de-outlier it.
- If this finished before we tried to join we would have been fine but this is the race condition that makes this situation visible.
Logs for `hs2`:
```
🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=False>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
📨 Notifying about new event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
✅ on_invite_request: handled event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
🧲 do_invite_join for @user-2-charlie1:hs2 in !sfZVBdLUezpPWetrol:hs1
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$bwv8LxFnqfpsw_rhR7OrTjtz09gaJ23MqstKOcs7ygA, type=m.room.member, state_key=@user-1-alice:hs1, membership=join, outlier=True>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>
📨 Notifying about new event <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>
...
🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
✅ on_invite_request: handled event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
📬 handling received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
📮 handle_new_client_event: handling <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False>
❌ Denying new event <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False> because 403: You are not invited to this room.
synapse.http.server - 130 - INFO - POST-16 - <SynapseRequest at 0x7f460c91fbf0 method='POST' uri='/_matrix/client/v3/join/%21sfZVBdLUezpPWetrol:hs1?server_name=hs1' clientproto='HTTP/1.0' site='8080'> SynapseError: 403 - You are not invited to this room.
📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
✅ handled received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
```
This is particularly a problem in a state reset scenario where the membership
might change without a corresponding event.
This PR is targeting a scenario where a state reset happens which causes
room membership to change. Previously, the cache would just hold onto
stale data and now we properly bust the cache in this scenario.
We have a few tests for these scenarios which you can see are now fixed
because we can remove the `FIXME` where we were previously manually
busting the cache in the test itself.
This is a general Synapse thing so by it's nature it helps out Sliding
Sync.
Fix https://github.com/element-hq/synapse/issues/17368
Prerequisite for https://github.com/element-hq/synapse/issues/17929
---
Match when are busting `_curr_state_delta_stream_cache`
Adds a query param `type` to `/_synapse/admin/v1/rooms/{room_id}/state`
that filters the state event query by state event type.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
- Fetch the number of invites the provided user has sent after a given
timestamp
- Fetch the number of rooms the provided user has joined after a given
timestamp, regardless if they have left/been banned from the rooms
subsequently
- Get report IDs of event reports where the provided user was the sender
of the reported event
This is an implementation of MSC4190, which allows appservices to manage
their user's devices without /login & /logout.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Fixes#17823
While we're at it, makes a change where the redactions are sent as the
admin if the user is not a member of the server (otherwise these fail
with a "User must be our own" message).
Reset `sliding_sync_membership_snapshots` -> `forgotten` status when
membership changes (like rejoining a room).
Fix https://github.com/element-hq/synapse/issues/17781
### What was the problem before?
Previously, if someone used `/forget` on one of their rooms, it would
update `sliding_sync_membership_snapshots` as expected but when someone
rejoined the room (or had any membership change), the upsert didn't
overwrite and reset the `forgotten` status so it remained `forgotten`
and invisible down the Sliding Sync endpoint.
Fixes https://github.com/element-hq/synapse/issues/17698
This handles `required_state` changes by checking if new state has been
added to the config, and if so fetching and returning that from the
current state.
This also takes care to ensure that given a state entry S that is added,
removed and then re-added that we do *not* send S down a second time if
there have been no changes to S in the current state. This is fine for
Rust SDK (as it just remembers all state), but we might decide not to do
this behaviour in the MSC. If we decide to always send down S then its
easy enough to rip out all the code.
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
- better validation on user input
- fix an early task completion
- when checking membership in rooms, check for rooms user has been
banned from as well
The notifier is quite inefficient when it has to wake up many user
streams all at once
From a silly benchmark this takes the time to notify 1M user streams
from ~30s to ~5s
This is basically exactly the same logic as for receipts. Essentially we
just need to track which room account data we have and haven't sent down
to clients, and use that when we pull stuff out.
I think this just needs a couple of extra tests written
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Performance optimization: We can avoid fetching rooms that the user has
left themselves (which could be a significant amount), then only add
back rooms that the user has `newly_left` (left in the token range of an
incremental sync). It's a lot faster to fetch less rooms than fetch them
all and throw them away in most cases. Since the user only leaves a room
(or is state reset out) once in a blue moon, we can avoid a lot of work.
Based on @erikjohnston's branch, erikj/ss_perf
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
No need to sort if the range is large enough to cover all of the rooms
in the list. Previously, we would only do this optimization if the range
was exactly large enough.
Follow-up to https://github.com/element-hq/synapse/pull/17672
This appears to be enough to make Element Web work (or at least move it
on to the next hurdle)
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Move filters tests to rest layer in order to test the new (with sliding
sync tables) and fallback paths that Sliding Sync can use.
Also found a bug in the new path because it's not being tested which is
also fixed in this PR. We now take into account `has_known_state` when
filtering.
Spawning from
https://github.com/element-hq/synapse/pull/17662#discussion_r1755574791.
This should have been done when we started using the new sliding sync
tables in https://github.com/element-hq/synapse/pull/17630
This PR changes `from pydantic import BaseModel` to `from
synapse._pydantic_compat import BaseModel` (as well as `constr`,
`conbytes`, `conint`, `confloat`).
It allows `check_pydantic_models.py` to mock those pydantic objects only
in the synapse module, and not interfere with pydantic objects in
external dependencies.
This should solve the CI problems for #17144, which breaks because
`check_pydantic_models.py` patches pydantic models from
[scim2-models](https://scim2-models.readthedocs.io/).
/cc @DMRobertson @gotmax23
fixes#17659
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
We need to bust the `get_sliding_sync_rooms_for_user`
cache when the room encryption is updated and any
other field that is used in the query.
Follow-up to https://github.com/element-hq/synapse/pull/17630
- Bust cache for membership change (cross-reference
`get_rooms_for_user`)
- Bust cache for room `encryption` (cross-reference
`get_room_encryption`)
- Bust cache for `forgotten` (cross-reference
`did_forget`/`get_forgotten_rooms_for_user`)
For rooms with a name we can skip fetching a full room summary, as we
don't need to calculate heroes, and instead just fetch the room counts
directly.
This also changes things to not return counts and heroes for non-joined
rooms. For left/banned rooms we were returning zero values anyway, and
for invite/knock rooms we don't really want to leak such information
(even if some of is included in the stripped state).
Speed up incremental sync by avoiding extra work. We first look at the
state delta changes and only fetch and calculate further derived things
if they have changed.
I thought ruff check would also format, but it doesn't.
This runs ruff format in CI and dev scripts. The first commit is just a
run of `ruff format .` in the root directory.
This will help mitigating any discrepancies between the issuer
configured and the one returned by the OIDC provider.
This also removes the need for configuring the `account_management_url`
explicitely, as it will now be loaded from the OIDC discovery, as per
MSC2965.
Because we may now fetch stuff for the .well-known/matrix/client
endpoint, this also transforms the client well-known resource to be
asynchronous.
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
76b5a576eb
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](1dfa59b238/docs/development/database_schema.md (background-updates))
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
When returning receipts in sliding sync for initial rooms we should
always include our own receipts in the room (even if they don't match
any timeline events).
Reviewable commit-by-commit.
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
This supersedes #17503, given the per-connection state is being heavily
rewritten it felt easier to recreate the PR on top of that work.
This correctly handles the case of timeline limits going up and down.
This does not handle changes in `required_state`, but that can be done
as a separate PR.
Based on #17575.
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Previously, we just had very basic partial room exclusion based on
whether we were lazy-loading room members. Now with this PR, we added
`must_await_full_state(...)` with rules to check if we have a we're only
requesting `required_state` which is completely satisfied even with
partial state.
Partially-stated rooms should have all state events except for remote
membership events so if we require a remote membership event anywhere,
then we need to return `True`.
This triggers the client to start a new sliding sync connection. If we
don't do this and the client asks for the full range of rooms, we end up
sending down all rooms and their state from scratch (which can be very
slow)
This causes things like
https://github.com/element-hq/element-x-ios/issues/3115 after we restart
the server
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>