Revision 28 (Dan Pascu, 07/07/2017 07:53 PM) → Revision 29/59 (Dan Pascu, 07/07/2017 09:23 PM)

h1. SylkServer WebRTC Video Conference 

 https://webrtc-test.sipthor.net 


 h2. Design 

 Two types of conferences are being supported: ad-hoc conferences and moderated conferences. 

 h3. Ad-hoc conferences 

 An ad-hoc conference is a conference where all participants have the same status and no one controls what the other participants are doing. The participants are rendered in a matrix of up to 3x3, depending on how many participants are in the room. The layout switches automatically for everybody as participants join or leave. 

 The conference room has a fixed total bitrate configured on the server, which can be specified per room or globally with the max_bitrate setting in webrtcgateway.ini (see below). This bitrate is shared by all participants in the room: the more participants there are, the less bitrate each of them uses for the video stream they send, which keeps the total room usage constant at the value configured by max_bitrate. The bitrate adjustment per participant is done automatically by sylkserver as participants join or leave the room, by dividing the available bitrate among the participants. As a result, each participant sends a fraction of max_bitrate (determined by the number of participants in the room) and always receives a combined total of max_bitrate from all the other participants, no matter how many participants are in the room. The formula to compute the bitrate per participant is shown below: 

 <pre> 
 participant_send_bitrate = max_bitrate / max(number_of_participants - 1, 1) 
 </pre> 

 Using this formula we can make sure that each participant always receives max_bitrate of incoming video traffic, independent of the number of participants. The traffic sent/received by each party can be expressed as follows (with N being the number of participants and N>1): 

 <pre> 
 participant_sent_traffic       = max_bitrate / (N - 1) 
 participant_received_traffic = max_bitrate 

 sylkserver_sent_traffic        = max_bitrate * N              (participant_received_traffic * N) 
 sylkserver_received_traffic    = max_bitrate * N / (N - 1)    (participant_sent_traffic       * N) 
 </pre> 
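The formulas above can be checked with a few lines of Python. This is only an illustrative sketch, not code taken from sylkserver; the function name @adhoc_room_traffic@ and the dictionary keys are made up for this example.

```python
def adhoc_room_traffic(max_bitrate, participants):
    """Return sent/received traffic (bits/s) for an ad-hoc room.

    Hypothetical helper that mirrors the formulas in this section.
    """
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    # Each participant sends max_bitrate / max(N - 1, 1)
    sent = max_bitrate / max(participants - 1, 1)
    # Each participant receives one stream from every other participant
    received = sent * (participants - 1)
    return {
        "participant_sent": sent,
        "participant_received": received,
        "sylkserver_sent": received * participants,
        "sylkserver_received": sent * participants,
    }


if __name__ == "__main__":
    for n in (2, 4, 7):
        print(n, adhoc_room_traffic(2016000, n))
```

With the default max_bitrate of 2016000 and 7 participants, each participant sends 336000 bits/s, which is the per-participant figure used in the Measurements section.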

 h3. Moderated conferences 

 A moderated conference is a conference where a moderator can decide the flow of the conference. The moderator is the first participant to join the conference. The moderator has the ability to see a list with all the participants, can select 1 or 2 of them to be the active speakers and also has the ability to mute other participants (audio and/or video). The moderator can also change the active speakers at any time. 

 The other participants will see the selected active speakers in full-sized video and everyone else as thumbnails. They will not be able to choose which participant to watch: the conference view in their browser is controlled by the moderator, who decides which active speakers everybody else sees in full-sized video. 

 The active speakers selected by the moderator will have their bitrate set to either max_bitrate (for 1 active speaker) or max_bitrate/2 (for 2 active speakers), while everybody else will have their bitrate set to a low value (64kb/s), just enough to have them represented in small thumbnails on the other participants' screens. 
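The bitrate assignment described above could be sketched as follows. This is a minimal illustration, not the sylkserver implementation: the function name, the participant identifiers, and the reading of 64kb/s as 64000 bits/s are assumptions.

```python
THUMBNAIL_BITRATE = 64000  # assumed value in bits/s for the "64kb/s" mentioned above


def moderated_bitrates(max_bitrate, participants, active_speakers):
    """Map every participant to a send bitrate in bits/s (hypothetical helper)."""
    active = [p for p in participants if p in set(active_speakers)]
    if not 1 <= len(active) <= 2:
        raise ValueError("the moderator selects 1 or 2 active speakers")
    # max_bitrate for 1 active speaker, max_bitrate/2 for 2 active speakers
    share = max_bitrate // len(active)
    return {p: (share if p in active else THUMBNAIL_BITRATE) for p in participants}


if __name__ == "__main__":
    room = ["alice", "bob", "carol", "dave"]
    print(moderated_bitrates(2016000, room, ["alice"]))
    print(moderated_bitrates(2016000, room, ["alice", "bob"]))
```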

 h2. Features 

 h3. Ad-hoc conferences 

 Ad-hoc conferences are best suited for conversations with family/friends, since bandwidth/bitrate is managed automatically and does not involve a dedicated person to control the flow of the conference. However they can also be used for any other video conferences that imply a free-flowing type of discussion where any participant can jump into the conversation at any time. 

 h3. Moderated conferences 

 Moderated conferences are best suited for a business environment, where participants have to make some sort of presentation in front of the other participants and a moderator is assigned to control the flow of the conference and give the microphone to the appropriate participant, while the others just watch the active speaker. They can also be used for a conference with 2 active participants who are having a public debate on a subject, while every other participant just watches and occasionally asks questions. 

 h2. Configuration 

 Sylkserver allows the maximum bitrate and video codec to be configured, globally or per room, with the following settings in the webrtcgateway.ini file: 

 <pre> 
 ; Maximum video bitrate allowed per sender in a room in bits/s. This value is 
 ; applied to any room that doesn't define its own. The value is any integer 
 ; number between 64000 and 4194304. Default value is 2016000 (~2Mb/s). 
 ; max_bitrate = 2016000 

 ; The video codec to be used by all participants in a room. This value is 
 ; applied to any room that doesn't define its own. 
 ; Possible values are: h264, vp8 and vp9. Default is vp9. 
 ; video_codec = vp9 
 </pre> 
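As an illustration of the value ranges and defaults documented above, the settings could be read and validated with Python's standard @configparser@ module. This is a sketch only: the section name @General@ and the function name are assumptions, and sylkserver's own configuration loading may work differently.

```python
import configparser

MIN_BITRATE, MAX_BITRATE, DEFAULT_BITRATE = 64000, 4194304, 2016000
VIDEO_CODECS = ("h264", "vp8", "vp9")
DEFAULT_CODEC = "vp9"


def load_video_settings(text, section="General"):
    """Return (max_bitrate, video_codec) from ini text.

    The section name is hypothetical; missing options fall back to the
    documented defaults.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    bitrate = parser.getint(section, "max_bitrate", fallback=DEFAULT_BITRATE)
    codec = parser.get(section, "video_codec", fallback=DEFAULT_CODEC).lower()
    if not MIN_BITRATE <= bitrate <= MAX_BITRATE:
        raise ValueError("max_bitrate must be between 64000 and 4194304")
    if codec not in VIDEO_CODECS:
        raise ValueError("video_codec must be one of h264, vp8, vp9")
    return bitrate, codec


if __name__ == "__main__":
    sample = "[General]\nmax_bitrate = 1024000\n; video_codec = vp9\n"
    print(load_video_settings(sample))
```

Lines starting with @;@ are treated as comments, so a commented-out setting falls back to its default.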


 h2. Client support 

 h2. Things that were explored 

 In order to implement bandwidth management and CPU load optimizations we have explored a couple of things, some of which proved fruitful, while others were abandoned or proved to be not very helpful for our goal. 

 The original idea we started with was to have each client send two video streams, one low resolution and one high resolution, and let the other participants switch between them based on their need (use the high resolution video if the participant was viewed in full, or the low resolution video if he was displayed as a thumbnail). As we progressed, we quickly discovered that the logistics of this setup were a lot more complicated than we had anticipated. Every participant would open 2 sessions to the conference room just to publish their low and high resolution streams, which made them appear duplicated in the conference. Special means needed to be employed to associate two such distinct sessions coming from the same device and present them as a single entity. This had to be done in each client, which meant that older clients would not be able to deal with this setup and would automatically display every participant duplicated. 

 In addition this setup would increase the upload bandwidth of each participant 1.5 times, going against the idea of reducing the used bandwidth. 

 The advantages of this model were the reduced download bandwidth and reduced CPU utilization that resulted from only having to process one high resolution video stream, while all the other video streams would be low resolution. These advantages were overshadowed by the higher upload bandwidth being used, by the more complicated room management required to deal with devices connecting twice per participant in the room, and by the inability of older devices to join such a conference room. 

 While we were working on this we also ran into a technical limitation in Firefox, which was unable to provide 2 video streams of different resolutions at the same time. When we tried to obtain 2 video streams, one low resolution and one high resolution, the moment we requested the second stream with a different resolution, the first stream's resolution was updated to match the second and we ended up with 2 streams of the same resolution. This was a limitation in Firefox that we couldn't overcome, so at this point, in addition to the issues mentioned above with this model, we were also facing the prospect of dropping Firefox support and having our solution work only with Chrome. 

 While we were contemplating our choices here we discovered that there was a mechanism by which a WebRTC client could be constrained to limit its sending bandwidth, and that this mechanism could be employed dynamically during a call to make the device's sending bitrate high or low as desired, without any need to renegotiate the session. This mechanism uses REMB packets, which are control packets sent through RTCP that make a browser adjust its send bitrate on the fly as requested. The good news was that both Chrome and Firefox supported this. This bit of information changed everything and we realized we could use it to build a better solution, which was a lot less complicated and more effective. 
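For reference, a REMB message carries the requested bitrate as a 6-bit exponent and an 18-bit mantissa, as described in the REMB Internet-Draft (draft-alvestrand-rmcat-remb). The sketch below only illustrates that encoding; the function names are made up for this example, and in practice the media gateway builds and sends these packets, not application code.

```python
MANTISSA_BITS = 18
MAX_EXPONENT = 63  # the exponent field is 6 bits wide


def remb_encode_bitrate(bitrate):
    """Split a bitrate (bits/s) into (exponent, mantissa), rounding down."""
    if bitrate < 0:
        raise ValueError("bitrate must not be negative")
    exponent, mantissa = 0, bitrate
    # Shift right until the value fits in the 18-bit mantissa field
    while mantissa >= (1 << MANTISSA_BITS):
        mantissa >>= 1
        exponent += 1
    if exponent > MAX_EXPONENT:
        raise ValueError("bitrate too large for a REMB message")
    return exponent, mantissa


def remb_decode_bitrate(exponent, mantissa):
    """Rebuild the bitrate: mantissa * 2^exponent."""
    return mantissa << exponent


if __name__ == "__main__":
    for rate in (64000, 336000, 2016000):
        exp, man = remb_encode_bitrate(rate)
        print(rate, exp, man, remb_decode_bitrate(exp, man))
```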

 At the same time we realized that the initial model that the webrtc client used, where in a conference room the client would display one participant in full and the others as thumbnails, and then let the user switch which participant to view by clicking on a thumbnail to display that participant in full, was not very useful for a large category of uses, namely users having a group video chat with friends/family. In this case the user is not expected to click a thumbnail to switch to another participant in the call and only be able to see one participant at a time. 

 As a result we decided to give up on the original idea of 2 streams of different resolutions per participant and completely change our model. This led to the 2 models mentioned before: the ad-hoc conference model and the moderated conference model. 

 h3. The ad-hoc conference model 

 The ad-hoc conference model is meant for group chats with friends/family, where one expects to see all the other participants on the screen at the same time and any participant can jump into the conversation at any time. In this model we decided to display all participants in a matrix, so everyone is visible at all times. Initially the matrix is just 1x1, when there are only 1 or 2 people in the room, but it can grow up to a 3x3 matrix that can accommodate up to 10 participants (9 + yourself). This model fits well with the idea of using REMB to limit the send bitrate: the more participants there are on screen, the smaller each video is, which aligns perfectly with the idea of having a constant room bitrate shared by all participants. The more participants, the lower their bitrate and the smaller their video frame on screen, which compensates for the reduced quality of their video stream. 

 In order to compare the bandwidth used by this model and the original model we attempted (the one with 2 video streams per participant, one lowres and one hires), let's consider the bitrate used by an HD stream (1280x720) playing at 30fps. This bitrate is ~2.0-2.4Mb/s; let's call it B. We have found that for a thumbnail sized video stream of 180x120 pixels at 30fps, the bitrate requirement was still very high, in the range of B/3 to B/2. As a result, in the original model each participant had to send both streams, i.e. anywhere between B + B/3 (about 1.3*B) and B + B/2 = 1.5*B. At the same time, because only one participant was big on screen and all others were thumbnails, each participant would receive B + (N-1)*B/2 = B*(N+1)/2, where N is the number of participants. In the ad-hoc conference model, as mentioned before, each participant receives B and sends B/(N-1). 

 In order to compare these numbers, let's consider B = 2Mb/s and N = 9. 

 In the original model, each participant would have sent 1.5*2 = 3Mb/s and would have received 2*(9+1)/2 = 10Mb/s. 
 In the ad-hoc conference model, with B being set as the room maximum bitrate, each participant would send 2/(9-1) = 0.25Mb/s and would receive 2Mb/s. 

 These numbers show that the ad-hoc conference model with controlled bitrate per participant is a lot more efficient at bandwidth management than the original model we started with: each participant receives 5 times less data (2Mb/s vs. 10Mb/s) and sends 12 times less (0.25Mb/s vs. 3Mb/s). 
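The comparison can be reproduced with a short Python sketch. The function names are made up for this example, and the thumbnail stream is taken at the upper end of its range (B/2), as in the numbers above.

```python
def original_model(b, n, thumb_fraction=0.5):
    """(sent, received) per participant in the original 2-stream model."""
    sent = b + b * thumb_fraction                 # hires stream + lowres stream
    received = b + (n - 1) * b * thumb_fraction   # B + (N-1)*B/2
    return sent, received


def adhoc_model(b, n):
    """(sent, received) per participant in the ad-hoc model, for N > 1."""
    return b / (n - 1), b


if __name__ == "__main__":
    b, n = 2, 9  # B in Mb/s, N participants
    old_sent, old_recv = original_model(b, n)
    new_sent, new_recv = adhoc_model(b, n)
    print("original:", old_sent, old_recv)
    print("ad-hoc:  ", new_sent, new_recv)
    print("ratios:  ", old_sent / new_sent, old_recv / new_recv)
```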

 In addition the ad-hoc conference also provides a much better user experience allowing all participants to be visible on screen at once. 

 Another thing we noticed with Chrome, while using VP8 as a codec, was that with more than 3 participants in a room, Chrome started to dynamically adjust the resolution of the video being sent, fluctuating between HD and VGA resolution depending on the bitrate it was allowed to use and the amount of movement in the encoded video stream. This was an added bonus: with more participants in a room, and thus a lower bitrate per participant, we expected Chrome to do this more often, so we could achieve not only improved network bandwidth usage, but also lower CPU usage. 

 Unfortunately Firefox did not have this behavior. Firefox would maintain the resolution originally requested when the stream started for the whole duration of the call, regardless of the bitrate limitation being imposed on it. In order to compensate for this we tried to request resolution adjustments based on the number of participants in the room, to reduce the CPU usage as the number of participants increases. Unfortunately this did not yield reliable results. Sometimes Firefox will switch resolutions without a problem; other times the camera will attempt to switch resolutions and will not reopen at the new resolution, which results in the video stream no longer being sent (it freezes on the last frame before the resolution change was attempted). This result seems random and we could not determine what causes it or how to fix it. It is also worth mentioning that this is a problem we noticed with Firefox running on OSX on a MacBook Pro with a built-in camera. We do not know if a similar problem exists for external cameras or on different operating systems (Linux or Windows). 

 The idea of exploring this feature is still open, because when it works it is a much better solution than Chrome's automatic resolution adjustment, yielding more reliable and consistent results. Chrome switches resolution based on other factors than just bitrate and it doesn't seem to do it often enough to be effective. In addition we have found that with VP9 as a codec, Chrome would not lower the resolution of a video stream even when the bitrate is as low as 256Kb/s. 

 h3. The moderated conference model anticipated 

 h2. Measurements 

 These load measurements were done on a MacBook Pro 15" with a 2.3GHz Intel Core i7 CPU, with 7 participants in the room, each using 336Kb/s. The measurements show the CPU usage of the Firefox web browser under these conditions, for the specified video codecs and resolutions, which are used by all participants: 

 <pre> 
  * H264/VGA - 150% CPU 
  * H264/HD    - 250% CPU 
  * VP9/VGA    - 220% CPU 
  * VP9/HD     - 350% CPU 
 </pre> 


 h2. Remaining tasks 

  * sylkserver: control interface for moderator 
  * janus: patch to request full frames when a paused video is resumed 
  * Rebuild mobile version 

 h2. Conclusions 

 h2. Software that was modified 

 In order to implement the bandwidth management and CPU load optimizations the following software was modified: 

 # sylkserver https://github.com/AGProjects/sylkserver 
 # sylk-webrtc https://github.com/AGProjects/sylk-webrtc 
 # sylkrtc.js https://github.com/AGProjects/sylkrtc.js 
 # python-application https://github.com/AGProjects/python-application 
 # python-sipsimple https://github.com/AGProjects/python-sipsimple