hang

Hang is a real-time conferencing protocol built on top of moq-lite. A room consists of multiple participants who publish media tracks. Changes, such as a participant joining or leaving or a media track being added or removed, are delivered live.

Terminology

Hang is built on top of moq-lite [moql] and uses much of the same terminology. A quick recap:

Broadcast: A collection of Tracks from a single publisher.
Track: A series of Groups, each of which can be delivered and decoded out-of-order.
Group: A series of Frames, each of which must be delivered and decoded in-order.
Frame: A sized payload of bytes representing a single moment in time.

Hang introduces additional terminology:

Room: A collection of participants, publishing under a common prefix.
Participant: A moq-lite broadcaster that may produce any number of media tracks.
Catalog: A JSON document that describes each available media track, supporting live updates.
Container: A tiny header in front of each media payload containing the timestamp.

Discovery

The first requirement for a real-time conferencing application is to discover other participants in the same room. Hang does this using moq-lite's ANNOUNCE capabilities.

A room is identified by a path. Every participant in the room MUST publish a broadcast whose path starts with the room path. The broadcast path SHOULD end with the .hang suffix.

For example:

/room123/alice.hang
/room123/bob.hang
/room456/zoe.hang
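
As a minimal sketch of the naming rule above, the following Python helpers build a broadcast path from a room path and recover a participant name from a broadcast path. The function names, and the choice to treat the text before .hang as the participant name, are illustrative assumptions and not part of hang.

```python
from typing import Optional

HANG_SUFFIX = ".hang"


def broadcast_path(room: str, name: str) -> str:
    """Join a room path and a participant name into a broadcast path."""
    # Normalize so the room prefix ends with exactly one slash.
    prefix = room if room.endswith("/") else room + "/"
    return f"{prefix}{name}{HANG_SUFFIX}"


def participant_name(room: str, path: str) -> Optional[str]:
    """Return the participant name if `path` is in `room`, else None.

    Assumption: broadcasts without the recommended .hang suffix are skipped,
    even though the suffix is only a SHOULD.
    """
    prefix = room if room.endswith("/") else room + "/"
    if not path.startswith(prefix) or not path.endswith(HANG_SUFFIX):
        return None
    name = path[len(prefix):-len(HANG_SUFFIX)]
    return name or None
```

With these helpers, `broadcast_path("/room123", "alice")` yields `/room123/alice.hang`, and a path from another room is rejected.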

A participant issues an ANNOUNCE_PLEASE message to discover the other participants in the same room. The server (relay) then responds with an ANNOUNCE message for each matching broadcast, including the participant's own.

For example:

ANNOUNCE_PLEASE prefix=/room123/
ANNOUNCE suffix=alice.hang active=true
ANNOUNCE suffix=bob.hang   active=true

If a publisher no longer wants to participate, or is disconnected, its broadcast is unannounced. Publishers and subscribers SHOULD terminate any subscriptions to a participant once it is unannounced.

ANNOUNCE suffix=alice.hang active=false
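
The announce flow above can be sketched as a small roster that applies each ANNOUNCE update in order. The `Roster` class and the returned action strings are hypothetical; how a client actually opens or closes subscriptions is up to the moq-lite library in use.

```python
class Roster:
    """Tracks which broadcasts in a room are currently announced."""

    def __init__(self) -> None:
        self.active = set()  # suffixes with active=true

    def on_announce(self, suffix: str, active: bool) -> str:
        """Apply one ANNOUNCE update and report what the caller should do."""
        if active:
            if suffix in self.active:
                return "ignore"  # duplicate announcement
            self.active.add(suffix)
            return "subscribe"
        if suffix in self.active:
            self.active.remove(suffix)
            return "unsubscribe"  # participant left or was disconnected
        return "ignore"  # unannounce for a broadcast we never saw
```

A client would typically skip its own suffix before acting on "subscribe", since the relay also announces the participant's own broadcast.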

Catalog

The catalog describes the available media tracks for a single participant. It's a JSON document that extends the W3C WebCodecs specification.

The catalog is published as a catalog.json track within the broadcast so it can be updated live as the participant's media tracks change. A participant MAY forgo publishing a catalog if it does not wish to publish any media tracks now or in the future.

The catalog track consists of multiple groups, one for each update. Each group contains a single frame with UTF-8 JSON.

A publisher MUST NOT write multiple frames to a group until a future specification defines a delta-encoding mechanism (most likely via JSON Patch).
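
A minimal sketch of the one-frame-per-group rule, assuming frames are available as plain byte strings. The function names are illustrative, and rejecting a multi-frame group on the receiving side is one possible policy rather than something this document mandates.

```python
import json
from typing import List


def encode_catalog_frame(catalog: dict) -> bytes:
    """Serialize a catalog into the single UTF-8 JSON frame of a new group."""
    return json.dumps(catalog).encode("utf-8")


def decode_catalog_group(frames: List[bytes]) -> dict:
    """Parse one catalog group, rejecting groups with more than one frame."""
    if len(frames) != 1:
        # No delta encoding is defined yet, so extra frames have no meaning.
        raise ValueError(f"expected exactly 1 frame, got {len(frames)}")
    return json.loads(frames[0].decode("utf-8"))
```

Each catalog update is then published by opening a new group and writing the output of `encode_catalog_frame` as its only frame.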

Root

The root of the catalog is a JSON document with the following schema:

type Catalog = {
	"audio": AudioSchema | undefined,
	"video": VideoSchema | undefined,
	// ... any custom fields ...
}

Additional fields MAY be added based on the application. The catalog SHOULD be mostly static, delegating any dynamic content to other tracks.

For example, a "chat" section should include the name of a chat track, not individual chat messages. This way catalog updates are rare, and a client that does not need chat MAY choose not to subscribe to that track.

This specification currently only defines audio and video tracks.
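
As a sketch of how a client might read the root document, the following function separates the sections defined here from application-specific fields. The function name and the shape of the "chat" field in the usage note are assumptions for illustration only.

```python
import json
from typing import Dict, Tuple

KNOWN_SECTIONS = ("audio", "video")


def split_catalog(raw: str) -> Tuple[Dict, Dict]:
    """Split a catalog into (known sections, custom fields)."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("catalog root must be a JSON object")
    # A missing section maps to None, mirroring `| undefined` in the schema.
    known = {key: doc.get(key) for key in KNOWN_SECTIONS}
    # Everything else is application-defined and passed through untouched.
    custom = {key: value for key, value in doc.items() if key not in KNOWN_SECTIONS}
    return known, custom
```

Given a catalog with an "audio" section and a hypothetical `"chat": {"track": "chat"}` field, `known["video"]` is `None` and the chat entry ends up in `custom`.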

Video

A video track contains the necessary information to decode a video stream.

type VideoSchema = {
	"renditions": Map<TrackName, VideoDecoderConfig>,
	"priority": u8,
	"display": {
		"width": number,
		"height": number,
	} | undefined,
	"rotation": number | undefined,
	"flip": boolean | undefined,
}

The renditions field contains a map of track names to video decoder configurations. See the WebCodecs specification for specifics and registered codecs. Any Uint8Array fields are hex-encoded as a string.

For example:

{
	"renditions": {
		"720p": {
			"codec": "avc1.64001f",
			"codedWidth": 1280,
			"codedHeight": 720,
			"bitrate": 6000000,
			"framerate": 30.0
		},
		"480p": {
			"codec": "avc1.64001e",
			"codedWidth": 848,
			"codedHeight": 480,
			"bitrate": 2000000,
			"framerate": 30.0
		}
	},
	"priority": 2,
	"display": {
		"width": 1280,
		"height": 720
	},
	"rotation": 0,
	"flip": false
}
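
The example above has no Uint8Array fields. When a rendition does carry one, such as the WebCodecs `description`, the hex encoding can be handled as in this sketch. The helper names are hypothetical, and the byte values in the usage note are arbitrary.

```python
from typing import Optional


def encode_description(data: bytes) -> str:
    """Hex-encode a Uint8Array field for inclusion in the catalog JSON."""
    return data.hex()


def decode_description(config: dict) -> Optional[bytes]:
    """Return the decoded `description` bytes, or None if the field is absent."""
    hex_string = config.get("description")
    if hex_string is None:
        return None
    # bytes.fromhex raises ValueError on malformed input.
    return bytes.fromhex(hex_string)
```

For instance, `encode_description(b"\x01\x64\x00\x1f")` produces the string `0164001f`, and decoding that string returns the original four bytes.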

Audio

An audio track contains the necessary information to decode an audio stream.

type AudioSchema = {
	"renditions": Map<TrackName, AudioDecoderConfig>,
	"priority": u8,
}

The renditions field contains a map of track names to audio decoder configurations. See the WebCodecs specification for specifics and registered codecs. Any Uint8Array fields are hex-encoded as a string.

For example:

{
	"renditions": {
		"stereo": {
			"codec": "opus",
			"sampleRate": 48000,
			"numberOfChannels": 2,
			"bitrate": 128000
		},
		"mono": {
			"codec": "opus",
			"sampleRate": 48000,
			"numberOfChannels": 1,
			"bitrate": 64000
		}
	},
	"priority": 1
}
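
Because renditions are keyed by track name, a subscriber can choose one before subscribing. The sketch below picks the highest-bitrate rendition that fits a budget; this selection policy and the `pick_rendition` name are assumptions, since this document does not define how a client chooses between renditions.

```python
from typing import Optional


def pick_rendition(renditions: dict, max_bitrate: int) -> Optional[str]:
    """Return the track name of the best rendition within `max_bitrate`."""
    best_name, best_rate = None, -1
    for name, config in renditions.items():
        rate = config.get("bitrate")
        if rate is None or rate > max_bitrate:
            continue  # unknown bitrate or over budget
        if rate > best_rate:
            best_name, best_rate = name, rate
    return best_name
```

Applied to the example above, a budget of 100000 bits per second selects "mono", while 200000 selects "stereo".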

Container

Audio and video tracks use a lightweight container to encapsulate the media payload.

Each moq-lite group MUST start with a keyframe. If a codec does not support delta frames (e.g. audio), then a group MAY consist of multiple keyframes. Otherwise, a group MUST consist of a single keyframe followed by zero or more delta frames.
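
One way to satisfy the grouping rule is to start a new group at every keyframe, as in the sketch below. The `(is_keyframe, payload)` tuple representation is an assumption for illustration. For a codec without delta frames this yields one frame per group, which is permitted but not required.

```python
from typing import Iterable, List, Tuple


def split_into_groups(frames: Iterable[Tuple[bool, bytes]]) -> List[List[bytes]]:
    """Partition frames into groups that each begin with a keyframe."""
    groups: List[List[bytes]] = []
    for is_keyframe, payload in frames:
        if is_keyframe:
            groups.append([payload])  # a keyframe always opens a new group
        elif not groups:
            raise ValueError("stream must start with a keyframe")
        else:
            groups[-1].append(payload)  # delta frames join the open group
    return groups
```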

Each frame starts with a timestamp in microseconds, encoded as a QUIC variable-length integer (62-bit max). The remainder of the payload is codec specific; see the WebCodecs specification for specifics.
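
The frame layout can be sketched as follows, using the variable-length integer encoding from QUIC (RFC 9000, Section 16), where the two most significant bits of the first byte select a 1, 2, 4, or 8 byte length. The function names are illustrative and not taken from any hang implementation.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("varint exceeds 62 bits")


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of `data`."""
    if not data:
        raise ValueError("empty buffer")
    length = 1 << (data[0] >> 6)  # prefix 00, 01, 10, 11 -> 1, 2, 4, 8 bytes
    if len(data) < length:
        raise ValueError("truncated varint")
    mask = (1 << (8 * length - 2)) - 1  # drop the two length bits
    return int.from_bytes(data[:length], "big") & mask, length


def encode_frame(timestamp_us: int, payload: bytes) -> bytes:
    """Prefix a codec payload with its timestamp in microseconds."""
    return encode_varint(timestamp_us) + payload


def decode_frame(frame: bytes) -> Tuple[int, bytes]:
    """Split a frame into (timestamp in microseconds, codec payload)."""
    timestamp_us, consumed = decode_varint(frame)
    return timestamp_us, frame[consumed:]
```

A timestamp of one second (1000000 microseconds) occupies four bytes, so the container overhead stays small for typical call durations.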

For example, H.264 with no description field would be Annex B encoded, while H.264 with a description field would be AVCC encoded.

Security Considerations

TODO Security

IANA Considerations

This document has no IANA actions.
