collabvm3-protocol/SPEC.md
2025-01-10 14:14:41 -05:00

Flow/protocol spec (TEMP)

Connected

Once connected to the VM WebSocket or WebTransport instance, the server will send the following messages wrapped as ServerMessage:

  • Server hello, which contains the protocol version.

Upon receiving the server hello, the client will send a hello message with its own protocol version. If that version is too low or otherwise incompatible with the server, the server will simply close the connection.
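The version check in the hello exchange could look roughly like the sketch below. The spec does not say how versions are encoded or what counts as "incompatible", so this assumes plain integer versions and treats anything outside a supported range as incompatible. The constants and the names `isCompatibleVersion` and `handleClientHello` are hypothetical.

```typescript
// Hypothetical values: the spec does not state actual version numbers.
const SERVER_PROTOCOL_VERSION = 3;
const MIN_SUPPORTED_VERSION = 2;

// Assumption: versions are plain integers, and "incompatible" means
// "outside the range this server supports".
function isCompatibleVersion(clientVersion: number): boolean {
  return (
    clientVersion >= MIN_SUPPORTED_VERSION &&
    clientVersion <= SERVER_PROTOCOL_VERSION
  );
}

type HandshakeOutcome = "send-init-data" | "close-connection";

// Mirrors the flow described above: an acceptable hello leads to the
// initialization data, anything else leads to a plain connection close.
function handleClientHello(clientVersion: number): HandshakeOutcome {
  return isCompatibleVersion(clientVersion)
    ? "send-init-data"
    : "close-connection";
}
```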

Otherwise, the server will send initialization data:

  • Chat history
  • User list (including role colors)
  • Your own user
  • Vote states (if votes are currently active)
  • Turn state (if applicable)
  • A list of Remoting display formats that the VM can use
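As a rough illustration, a client might collect the initialization data into a structure like the one below. Every field and type name here is hypothetical; the real layout is whatever the protocol definitions specify. The optional fields correspond to the items above that are only sent conditionally.

```typescript
// All names are illustrative assumptions, not the actual schema.
interface InitUser {
  name: string;
  roleColor: string;
}

interface InitializationData {
  chatHistory: string[];
  users: InitUser[];
  self: InitUser;
  voteStates?: string[]; // only present if votes are currently active
  turnState?: string; // only present if applicable
  displayFormats: string[];
}

// Returns the names of the always-sent parts that have not arrived yet.
function missingRequiredParts(data: Partial<InitializationData>): string[] {
  const missing: string[] = [];
  if (data.chatHistory === undefined) missing.push("chatHistory");
  if (data.users === undefined) missing.push("users");
  if (data.self === undefined) missing.push("self");
  if (data.displayFormats === undefined) missing.push("displayFormats");
  return missing;
}
```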

Normal usage

The client is expected to send a SelectFormat message (wrapped in ClientMessage) with the format they want to use within 10 seconds.

Otherwise, the server disconnects with a Disconnect message and closes the WebSocket connection.

A client is free to select any format that the server itself has advertised to the client.
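A client could pick its format along the lines of the sketch below. The enum names are the ones this spec lists; representing them as strings, the preference-order approach, and the helper names `chooseFormat` and `isWithinDeadline` are assumptions. The spec also does not say exactly when the 10-second window starts, so the sketch assumes it starts when the initialization data arrives.

```typescript
// Format names as listed in this spec; the string encoding is an assumption.
type DisplayFormat =
  | "LEGACY_UGLY_BAD_UGLY_JPEG"
  | "STREAM_H264"
  | "STREAM_H265";

// 10 seconds, per the SelectFormat requirement above.
const SELECT_FORMAT_DEADLINE_MS = 10000;

// Returns the first preferred format that the server actually advertised,
// or null if none of the client's preferences were offered.
function chooseFormat(
  advertised: DisplayFormat[],
  preferred: DisplayFormat[]
): DisplayFormat | null {
  for (const candidate of preferred) {
    if (advertised.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return null;
}

// Assumption: the deadline is measured from receipt of the init data.
function isWithinDeadline(initReceivedAtMs: number, nowMs: number): boolean {
  return nowMs - initReceivedAtMs < SELECT_FORMAT_DEADLINE_MS;
}
```

Only ever choosing from the advertised list keeps the client within what the server has offered.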

Once the client selects a format, it will start receiving Remoting display messages in that format. The currently defined formats are:

| Format enum | Description |
| --- | --- |
| LEGACY_UGLY_BAD_UGLY_JPEG | Legacy JPEG bandwidth-waste. |
| STREAM_H264 | An H.264 video stream. |
| STREAM_H265 | An H.265 (HEVC) video stream. |

If the client selects a video stream format, then the server will not send display size messages.

This is because the intraframes that are sent by the server upon the screen being resized or a user joining will contain the required parameters to size the display.
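In client code, this rule might reduce to a small predicate like the one below. The helper names are hypothetical, and the sketch assumes that the non-video format (JPEG) is the one that does receive display size messages, which the text implies but does not state outright.

```typescript
// Format names as listed in this spec; the string encoding is an assumption.
type RemotingFormat =
  | "LEGACY_UGLY_BAD_UGLY_JPEG"
  | "STREAM_H264"
  | "STREAM_H265";

function isVideoStreamFormat(format: RemotingFormat): boolean {
  return format === "STREAM_H264" || format === "STREAM_H265";
}

// Video streams carry their sizing parameters in the intraframes, so a
// client on those formats should not wait for display size messages.
// Assumption: non-video formats do receive them.
function expectsDisplaySizeMessages(format: RemotingFormat): boolean {
  return !isVideoStreamFormat(format);
}
```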

View mode

The client does not need to send a SelectFormat message. If it does send one, the server ignores it, since handling it would defeat the purpose of view mode.

The client must send screenshot requests to see the display, since Remoting messages are not sent in this mode.

This is useful for utility bots: they can hold a connection open, request a screenshot, and save the response to disk as-is. They do not need to implement JPEG decoding, H.264 decoding, or anything else.
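A bot's screenshot handling could be as small as the sketch below, which treats the response payload as opaque bytes and hands it to a save callback unchanged. The names `makeScreenshotSaver` and `SaveFn`, the file naming scheme, and the `.bin` extension are all hypothetical; the spec does not describe the screenshot response format.

```typescript
// Hypothetical callback that persists bytes, e.g. by writing a file.
type SaveFn = (fileName: string, bytes: Uint8Array) => void;

// Builds a handler that saves each screenshot response without decoding it.
function makeScreenshotSaver(save: SaveFn, prefix: string) {
  let counter = 0;
  return function onScreenshotResponse(payload: Uint8Array): string {
    counter += 1;
    // The payload format is not specified here, so use a neutral extension.
    const fileName = prefix + "-" + counter + ".bin";
    save(fileName, payload);
    return fileName;
  };
}
```

Passing the save step in as a callback keeps the handler independent of any particular filesystem or storage API.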

If the user is kicked (or banned)

The server will send a Disconnect message with an appropriate reason and then close the connection. Otherwise, if the connection closes, it is assumed that the user can simply reconnect.
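One way a client might apply this rule is sketched below. It assumes the client records whether a Disconnect message arrived before the close, and that any such message is a signal not to reconnect automatically. The spec does not list the possible reason values, so the `ConnectionEnd` shape and the `mayReconnectImmediately` helper are hypothetical.

```typescript
// Hypothetical client-side record of how a connection ended.
interface ConnectionEnd {
  // Reason carried by a Disconnect message, or null if the connection
  // closed without one.
  disconnectReason: string | null;
}

// Assumption: a close that was not preceded by a Disconnect message is an
// ordinary drop, so the client may simply reconnect.
function mayReconnectImmediately(end: ConnectionEnd): boolean {
  return end.disconnectReason === null;
}
```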