WebRTC Conference » History » Version 34

Dan Pascu, 07/07/2017 10:07 PM

h1. SylkServer WebRTC Video Conference

https://webrtc-test.sipthor.net

h2. Design
Two types of conferences are supported: ad-hoc conferences and moderated conferences.

h3. Ad-hoc conferences

An ad-hoc conference is a conference where all participants have the same status and no one controls what the other participants are doing. The participants are rendered in a matrix of up to 3x3, depending on how many participants are in the room. The layout switches automatically for everybody as participants join or leave.

The conference room has a fixed total bitrate configured by the server, which can be specified per room or globally with the max_bitrate setting in webrtcgateway.ini (see below). This bitrate is shared by all participants in the room: the more participants are in the room, the less bitrate each participant uses for the video stream they send, keeping the total room usage constant at the value configured by max_bitrate. The bitrate adjustment per participant is done automatically by sylkserver as participants join or leave the room, by dividing the available bitrate among the participants. As a result, each participant sends a fraction of max_bitrate (determined by the number of participants in the room) and always receives a combined total of max_bitrate from all the other participants, no matter how many participants are in the room. The formula to compute the bitrate per participant is shown below:
<pre>
participant_send_bitrate = max_bitrate / max(number_of_participants - 1, 1)
</pre>
Using this formula we can make sure that each participant always receives max_bitrate traffic in incoming video streams, independent of the number of participants. The traffic sent/received by each party can be expressed as follows (considering N to be the number of participants and N>1):

<pre>
participant_sent_traffic     = max_bitrate / (N - 1)
participant_received_traffic = max_bitrate

sylkserver_sent_traffic      = max_bitrate * N            (participant_received_traffic * N)
sylkserver_received_traffic  = max_bitrate * N / (N - 1)  (participant_sent_traffic     * N)
</pre>
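As a rough illustration of the formulas above, the following Python sketch computes the per-participant and server-side traffic for a given room size. The function name @room_traffic@ and the dictionary keys are chosen here for illustration only and are not taken from the sylkserver code.

```python
def room_traffic(max_bitrate, participants):
    """Per-participant and server traffic for an ad-hoc room, in bits/s."""
    # With a single participant nobody else is sending, so the divisor is clamped to 1.
    send = max_bitrate / max(participants - 1, 1)
    # Each participant receives the streams of the other (participants - 1) senders.
    receive = send * (participants - 1)
    return {
        "participant_sent": send,
        "participant_received": receive,
        "server_sent": receive * participants,
        "server_received": send * participants,
    }


traffic = room_traffic(2016000, 7)
print(traffic)
```

With the default max_bitrate of 2016000 and 7 participants, each participant sends 336000 bits/s, which is the per-participant value quoted in the measurements later in this document.
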
h3. Moderated conferences

A moderated conference is a conference where a moderator decides the flow of the conference. The moderator is the first participant to join the conference. The moderator can see a list of all the participants, can select 1 or 2 of them to be the active speakers, and can mute other participants (audio and/or video). The moderator can also change the active speakers at any time.

The other participants will see the selected active speakers in full-sized video and the remaining participants as thumbnails. They will not be able to choose which participant to watch: the conference view in their browser is controlled by the moderator, who decides which active speakers everybody else sees on their screen in full-sized video.

The active speakers selected by the moderator will have their bitrate set to either max_bitrate (for 1 active speaker) or max_bitrate/2 (for 2 active speakers), while everybody else will have their bitrate set to a low value (64Kb/s), just enough to have them represented in small thumbnails on the other participants' screens.
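
The bitrate assignment described above could be sketched in Python as follows. This is only an illustration of the rule; @moderated_bitrates@, @THUMBNAIL_BITRATE@ and the participant names are hypothetical and do not come from the sylkserver code.

```python
THUMBNAIL_BITRATE = 64000  # bits/s, the low value quoted above for non-speakers


def moderated_bitrates(max_bitrate, participants, active_speakers):
    """Map each participant to a send bitrate limit in bits/s."""
    if not 1 <= len(active_speakers) <= 2:
        raise ValueError("a moderated conference has 1 or 2 active speakers")
    # The room bitrate is split evenly between the active speakers.
    speaker_bitrate = max_bitrate // len(active_speakers)
    return {
        name: speaker_bitrate if name in active_speakers else THUMBNAIL_BITRATE
        for name in participants
    }


limits = moderated_bitrates(
    2016000, ["alice", "bob", "carol", "dave"], {"alice", "bob"}
)
print(limits)
```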
h2. Features

h3. Ad-hoc conferences

Ad-hoc conferences are best suited for conversations with family/friends, since bandwidth/bitrate is managed automatically and no dedicated person is needed to control the flow of the conference. However, they can also be used for any other video conference that involves a free-flowing type of discussion where any participant can jump into the conversation at any time.

h3. Moderated conferences

Moderated conferences are best suited for a business environment, where participants have to make some sort of presentation in front of the other participants and a moderator is assigned to control the flow of the conference and give the microphone to the appropriate participant, while the others just watch the active speaker. They can also be used for a conference with 2 active participants who are having a public debate on a subject, while every other participant just watches and possibly asks questions.

h2. Configuration

Sylkserver allows the maximum bitrate and video codec to be configured, globally or per room, with the following settings in the webrtcgateway.ini file:
<pre>
; Maximum video bitrate allowed per sender in a room in bits/s. This value is
; applied to any room that doesn't define its own. The value is any integer
; number between 64000 and 4194304. Default value is 2016000 (~2Mb/s).
; max_bitrate = 2016000

; The video codec to be used by all participants in a room. This value is
; applied to any room that doesn't define its own.
; Possible values are: h264, vp8 and vp9. Default is vp9.
; video_codec = vp9
</pre>
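To show how the documented ranges and defaults fit together, here is a minimal Python sketch that reads the two settings with the standard configparser module and validates them. The section name @General@, the sample values and the choice to raise an error on invalid values are assumptions made for this example; they do not describe how sylkserver itself parses webrtcgateway.ini.

```python
import configparser

# Hypothetical file content; the section name is an assumption for this sketch.
SAMPLE = """
[General]
max_bitrate = 1024000
video_codec = h264
"""

MIN_BITRATE, MAX_BITRATE, DEFAULT_BITRATE = 64000, 4194304, 2016000
CODECS, DEFAULT_CODEC = ("h264", "vp8", "vp9"), "vp9"


def load_video_settings(text, section="General"):
    """Return (max_bitrate, video_codec), applying the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Lines starting with ';' are comments, so commented-out settings use the fallback.
    bitrate = parser.getint(section, "max_bitrate", fallback=DEFAULT_BITRATE)
    codec = parser.get(section, "video_codec", fallback=DEFAULT_CODEC).lower()
    if not MIN_BITRATE <= bitrate <= MAX_BITRATE:
        raise ValueError("max_bitrate must be between 64000 and 4194304")
    if codec not in CODECS:
        raise ValueError("video_codec must be one of h264, vp8 or vp9")
    return bitrate, codec


print(load_video_settings(SAMPLE))
```
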
h2. Client support
h2. Things that were explored

In order to implement bandwidth management and CPU load optimizations we explored a couple of things, some of which proved fruitful, while others were abandoned or proved not very helpful for our goal.

The original idea we started with was to have each client send two video streams, one low resolution and one high resolution, and let the other participants switch between them based on their need (use the high resolution video if the participant was viewed in full, or the low resolution video if the participant was displayed as a thumbnail). As we progressed, we quickly discovered that this setup was a lot more complicated to manage than we had anticipated. Every participant would open 2 sessions to the conference room just to publish their low and high resolution streams, which made them appear duplicated in the conference. Special means needed to be employed to associate two such distinct sessions coming from the same device and present them as a single entity. This had to be done in each client, which meant that older clients would not be able to deal with this setup and would automatically display every participant duplicated.

In addition, this setup would increase the upload bandwidth of each participant 1.5 times, going against the idea of reducing the used bandwidth.

The advantages of this model were the reduced download bandwidth and the reduced CPU utilization that resulted from only having to process one high resolution video stream while all the other video streams would be low resolution. These advantages were overshadowed by the higher upload bandwidth being used, by the more complicated room management required to deal with devices connecting twice per participant, and by the inability to have older devices join such a conference room.

While we were working on this we also ran into a technical limitation in Firefox, which was unable to provide 2 video streams of different resolutions at the same time. When we tried to obtain 2 video streams, one low resolution and one high resolution, the moment we requested the second stream with a different resolution, the first stream's resolution was updated to match the second and we ended up with 2 streams with the same resolution. This was a limitation in Firefox that we couldn't overcome, so at this point, in addition to the issues mentioned above, we were also facing the prospect of dropping Firefox support and having our solution work only with Chrome.

While we were contemplating our choices we discovered a mechanism by which a WebRTC client can be constrained to limit its sending bandwidth, and which can be employed dynamically during a call to make the device's sending bitrate high or low as desired without any need to renegotiate the session. This mechanism uses REMB packets, which are control packets sent through RTCP that make a browser adjust its send bitrate on the fly as requested. The good news was that both Chrome and Firefox supported this. This bit of information changed everything: we realized we could use it to build a better solution, one that was a lot less complicated and more effective.
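
For readers unfamiliar with REMB, the Python sketch below builds such a packet by hand. It follows the layout described in the REMB Internet-Draft (an RTCP payload-specific feedback message with PT=206 and FMT=15, where the bitrate is encoded as an 18-bit mantissa and a 6-bit exponent). The function names are invented for this example, and the code is not part of sylkserver; it only illustrates what travels on the wire.

```python
import struct


def encode_remb(sender_ssrc, bitrate_bps, media_ssrcs):
    """Build an RTCP REMB feedback packet (PT=206, FMT=15)."""
    # The bitrate is carried as an 18-bit mantissa scaled by a 6-bit exponent.
    exponent = 0
    mantissa = bitrate_bps
    while mantissa >= 1 << 18:
        mantissa >>= 1
        exponent += 1
    # Sender SSRC, media source SSRC (always 0 for REMB), then the "REMB" marker.
    body = struct.pack("!II4s", sender_ssrc, 0, b"REMB")
    body += struct.pack(
        "!I", (len(media_ssrcs) << 24) | (exponent << 18) | mantissa
    )
    body += b"".join(struct.pack("!I", ssrc) for ssrc in media_ssrcs)
    # Header: V=2, P=0, FMT=15 gives 0x8F; length is in 32-bit words minus one.
    header = struct.pack("!BBH", 0x8F, 206, (4 + len(body)) // 4 - 1)
    return header + body


def decode_remb_bitrate(packet):
    """Extract the requested bitrate in bits/s from a REMB packet."""
    (word,) = struct.unpack("!I", packet[16:20])
    exponent = (word >> 18) & 0x3F
    mantissa = word & 0x3FFFF
    return mantissa << exponent


packet = encode_remb(sender_ssrc=1, bitrate_bps=336000, media_ssrcs=[0x1234])
print(len(packet), decode_remb_bitrate(packet))
```

Because the mantissa has only 18 bits, bitrates above 262143 bits/s are rounded down to a multiple of a power of two; values such as 336000 and 2016000 happen to survive the round trip unchanged.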
At the same time we realized that the initial model used by the WebRTC client, where in a conference room the client would display one participant in full and the others as thumbnails, and let the user switch which participant to view in full by clicking on a thumbnail, was not very useful for a large category of uses, namely users having a group video chat with friends/family. In this case the user should not be expected to click a thumbnail to switch to another participant, nor be limited to seeing only one participant at a time.

As a result we decided to give up on the original idea of 2 streams of different resolutions per participant and completely change our model. This led to the 2 models mentioned before: the ad-hoc conference model and the moderated conference model.

h3. The ad-hoc conference model

The ad-hoc conference model is meant to be used for a group chat with friends/family, where one expects to see all the other participants on the screen at the same time and any participant can jump into the conversation at any time. In this model we decided to display all participants in a matrix, so everyone is visible at all times. Initially the matrix is just 1x1 when there are just 1 or 2 people in the room, but it can grow up to a 3x3 matrix that can accommodate up to 10 participants (9+yourself). This model fit well with the idea of using REMB to limit the send bitrate, because the more participants are on screen, the smaller their video is, which aligns perfectly with the idea of having a constant room bitrate shared by all participants: the more participants, the lower their bitrate and the smaller their video frame on screen, which compensates for the reduced quality of their video stream.
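
One possible way to pick the matrix size is sketched below in Python. The text above only states the end points (1x1 for 1 or 2 people, at most 3x3); the assumption that the matrix is the smallest square that fits all remote participants, as well as the names @grid_size@ and @MAX_GRID@, are made up for this example.

```python
import math

MAX_GRID = 3  # the layout tops out at a 3x3 matrix


def grid_size(participants_in_room):
    """Side length of the square matrix showing everyone except yourself."""
    remote = max(participants_in_room - 1, 1)
    # Smallest integer side whose square is at least the number of remote videos.
    side = math.isqrt(remote - 1) + 1
    return min(MAX_GRID, side)


for n in (1, 2, 3, 5, 6, 10):
    print(n, grid_size(n))
```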
In order to compare the bandwidth used by this model and the original model we attempted (the one with 2 video streams per participant, one low resolution and one high resolution), let's consider the bitrate used by an HD stream (1280x720) playing at 30fps. This bitrate is ~2.0-2.4Mb/s; let's call it B. We found that for a thumbnail-sized video stream of 180x120 pixels at 30fps the bitrate requirement was still very high, in the range of B/3 to B/2. As a result, in the original model each participant had to send B for the high resolution stream plus B/3 to B/2 for the thumbnail stream, that is anywhere between roughly 1.33*B and 1.5*B. At the same time, because only one participant was big on screen and all others were thumbnails, each participant would receive B + (N-1)*B/2 = B*(N+1)/2, where N is the number of participants. In the ad-hoc conference model, as mentioned before, each participant receives B and sends B/(N-1).

In order to compare these numbers, let's consider B = 2Mb/s and N = 9.

In the original model, each participant would have sent 1.5*2 = 3Mb/s and would have received 2*(9+1)/2 = 10Mb/s.
In the ad-hoc conference model, with B being set as the room maximum bitrate, each participant would send 2/(9-1) = 0.25Mb/s and would receive 2Mb/s.

These numbers show that the ad-hoc conference model with controlled bitrate per participant is a lot more competitive in bandwidth management than the original model we started with, being 5-12 times more efficient in the amount of data sent/received.
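
The comparison can be reproduced with a few lines of Python. The function names and the @thumb_fraction@ parameter (the thumbnail bitrate as a fraction of B, set to the upper bound of 1/2 used in the worked numbers) are illustrative choices for this sketch.

```python
def original_model(b, n, thumb_fraction=0.5):
    """(sent, received) per participant in the two-stream model, in Mb/s."""
    send = b + b * thumb_fraction
    receive = b + (n - 1) * b * thumb_fraction
    return send, receive


def adhoc_model(b, n):
    """(sent, received) per participant in the ad-hoc model, in Mb/s."""
    return b / (n - 1), b


B, N = 2.0, 9
orig_send, orig_recv = original_model(B, N)
adhoc_send, adhoc_recv = adhoc_model(B, N)

print(orig_send, orig_recv)      # 3.0 10.0
print(adhoc_send, adhoc_recv)    # 0.25 2.0
print(orig_send / adhoc_send, orig_recv / adhoc_recv)  # 12.0 5.0
```

The last line gives the two ends of the 5-12 times range: upload improves by a factor of 12 and download by a factor of 5.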
In addition, the ad-hoc conference also provides a much better user experience by allowing all participants to be visible on screen at once.

Another thing we noticed with Chrome, while using VP8 as a codec, was that with more than 3 participants in a room Chrome started to dynamically adjust the resolution of the video being sent, fluctuating between HD and VGA resolution depending on the bitrate it was allowed to use and the amount of movement in the encoded video stream. This was an added bonus, because with more participants in a room, and thus a lower bitrate per participant, we expected Chrome to do this more often, so we could achieve not only improved network bandwidth usage but also lower CPU usage.

Unfortunately Firefox did not have this behavior. Firefox would maintain the resolution requested when the stream started for the whole duration of the call, regardless of the bitrate limitation imposed on it. To compensate for this we tried to request resolution adjustments based on the number of participants in the room, in order to reduce the CPU usage as the number of participants increases. Unfortunately this did not prove successful, because it does not yield reliable results. Sometimes Firefox will switch resolutions without a problem; other times the camera will attempt to switch resolutions and will not reopen at the new resolution, which results in the video stream no longer being sent (it freezes on the last frame before the resolution change was attempted). This outcome seems random and we could not determine what causes it or how to fix it. It is also worth mentioning that we noticed this problem with Firefox running on OSX on a Macbook Pro with a built-in camera. We do not know if a similar problem exists with external cameras or on other operating systems (Linux or Windows).

Still, the idea of exploring this feature remains open, because explicitly requesting the resolution would be a much better solution than Chrome's automatic resolution adjustment, as it would yield more reliable and consistent results. Chrome switches resolution based on factors other than just bitrate and doesn't seem to do it often enough to be effective. In addition, we have found that with VP9 as a codec Chrome would not lower the resolution of a video stream even when the bitrate is as low as 256Kb/s.
h3. The moderated conference model

Since the ad-hoc model is not best suited for every application, we also considered the moderated conference model. In this model a moderator controls the flow of the conference. The moderator is the first participant that joins the conference. The moderator is able to see a list of all the participants, decide who is the active speaker and mute audio/video per participant when needed. In this model only 1 or at most 2 participants can be active speakers at a time, and who they are is decided by the moderator. Participants cannot select which other participants they see on screen. This is decided by the moderator, who selects the active participants that will be shown on everyone's screens, while all others are shown as thumbnails.

With 1 active speaker the conference is suitable for cases where some people need to give a speech or show a presentation for others to watch. In this case the moderator simply switches the active participant, giving the next speaker their stage time.

With 2 active speakers at the same time, the conference can be used, for example, for a public debate on a subject, where the active speakers debate the subject while the rest of the participants just watch the debate, or ask questions if needed.

In this model, each active speaker will have their bitrate limited to max_bitrate / number_of_active_speakers, while everyone else will have a very low bitrate value (64Kb/s) so they can be displayed as thumbnails.

Considering B the bitrate for an HD stream @30fps, N the number of participants in the conference and AS the number of active speakers:

Each active speaker will send B/AS
Everyone else will send a constant 64Kb/s
Everyone in the room will receive B + (N-AS)*64Kb/s

For B=2Mb/s, N=10, AS=2 we have:
Each active speaker sends 1Mb/s
Everyone else sends 64Kb/s = 0.064Mb/s
Everyone in the room will receive 2Mb/s + (10-2) * 0.064Mb/s = 2.512Mb/s

As can be seen, these numbers show that the moderated conference model is also a lot more efficient than the original model with 2 streams per participant.
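
The worked numbers above can be checked with the short Python sketch below, which also evaluates the download formula of the original model, B*(N+1)/2, for the same room so the two can be compared. Function and constant names are chosen for this example only, and all values are in bits/s.

```python
THUMBNAIL_BITRATE = 64_000  # bits/s sent by every non-speaker


def moderated_traffic(b, n, active_speakers):
    """Return (speaker_send, other_send, received) in bits/s."""
    speaker_send = b // active_speakers
    received = b + (n - active_speakers) * THUMBNAIL_BITRATE
    return speaker_send, THUMBNAIL_BITRATE, received


def original_received(b, n):
    """Download per participant in the abandoned two-stream model."""
    return b * (n + 1) // 2


speaker, other, received = moderated_traffic(2_000_000, 10, 2)
print(speaker, other, received)           # 1000000 64000 2512000
print(original_received(2_000_000, 10))   # 11000000
```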
h3. Mobile device considerations

Because mobile devices have both more limited resources and more limited screen space available, we are considering the following technique for small mobile devices:

For both ad-hoc and moderated conferences, the mobile client will only display 1 or at most 2 participants in full view. For a moderated conference they are already decided by the moderator, while for an ad-hoc conference the user can select 1-2 of the participants to be seen. For the other participants the device will pause their video streams and not show thumbnails for them, but instead show them as static icons or just display them in a list. By doing this, the mobile device not only prevents screen clutter, allowing for a more efficient use of the limited screen space, but by pausing the other participants' video streams it will also dramatically reduce its CPU usage, because it will not need to receive and decode their video streams just to display them as thumbnails.

By using this technique, a mobile device will only have to deal with decoding and displaying 1 or at most 2 video streams, which is well within the device's processing capabilities, regardless of how many participants are in the conference room.
h2. Measurements
These load measurements were done on a Macbook Pro 15" with a 2.3GHz Intel Core i7 CPU, with 7 participants in the room, each using 336Kb/s. The measurements show the CPU usage of the Firefox web browser under these conditions, for the specified video codecs and resolutions, which are used by all participants:
<pre>
 * H264/VGA - 150% CPU
 * H264/HD  - 250% CPU
 * VP9/VGA  - 220% CPU
 * VP9/HD   - 350% CPU
</pre>
As far as CPU utilization goes, the most efficient codec is H264 (presumably because it has hardware accelerated support on a lot of devices), followed by VP9, with VP8 last.

In a conference with 2 participants both sending HD video (1280x720 @30fps), on the same laptop mentioned above, we noticed the following CPU load values in Firefox:
<pre>
 * VP8  - 130% CPU
 * VP9  - 100-110% CPU
 * H264 - 50-70% CPU
</pre>
h2. Conclusions

We consider that the ad-hoc and moderated conference models offer much better results than the original two-streams-per-participant idea. Not only do they offer a better and more natural user interface, they also allow for more control from the server, which can decide both the codec to be used and the bitrate limit per room, thus controlling the quality of the call in a single place.

For now we consider a room with a 2Mb/s bitrate limit using VP9 to be the best compromise between quality and resources used. For the moment we cannot recommend H264 despite the huge improvement it would provide, especially for mobile clients, because we have found compatibility issues where a mobile client would display a green screen for any incoming H264 video stream.
h2. Remaining tasks

 * sylkserver: control and feedback interface for moderator
 * janus: patch to request full frames when a paused video is resumed
 * Rebuild mobile version
h2. Software that was modified

In order to implement the bandwidth management and CPU load optimizations the following software was modified:

# sylkserver https://github.com/AGProjects/sylkserver
# sylk-webrtc https://github.com/AGProjects/sylk-webrtc
# sylkrtc.js https://github.com/AGProjects/sylkrtc.js
# python-application https://github.com/AGProjects/python-application
# python-sipsimple https://github.com/AGProjects/python-sipsimple