Video Conference Infrastructure Requirements

By Stan Baldwin
Updated: February 21, 2011

Before gathering up microphones, video cameras, audio mixers and other bits of hardware to plug into your network, it is wise to look at the capabilities of that network. As in most things related to the internet, more bandwidth is always nice. Excess bandwidth can even mask shortcomings in other areas. For a good quality video conference using the SIP or H.323 protocols, a symmetrical bandwidth of 1.5 to 3.0 Mb/sec will do nicely. Most internet connections, especially residential ones, are asymmetrical: the download speed is considerably faster than the upload speed. If a symmetrical connection is not available, the frame rate of the video is likely to be affected.
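As a rough illustration of why the asymmetry matters, the check below treats a two-way call as limited by the slower direction of the link. It is only a sketch: the function names are hypothetical, the 1500 kbps default is simply the low end of the 1.5 to 3.0 Mb/sec range mentioned above, and the line speeds in the example are made up.

```python
# Hypothetical helpers; the names and the default target are illustrative only.

def usable_symmetric_kbps(download_kbps: float, upload_kbps: float) -> float:
    """A two-way call is limited by the slower direction of the link."""
    return min(download_kbps, upload_kbps)


def meets_target(download_kbps: float, upload_kbps: float,
                 target_kbps: float = 1500.0) -> bool:
    """True if both directions can carry the target rate."""
    return usable_symmetric_kbps(download_kbps, upload_kbps) >= target_kbps


if __name__ == "__main__":
    # Made-up figures for an asymmetrical residential line.
    print(meets_target(download_kbps=20000, upload_kbps=1000))
    # Made-up figures for a symmetrical business line.
    print(meets_target(download_kbps=3000, upload_kbps=3000))
```

A fast download speed does not help here: the first line fails the check because its upload side cannot carry the target rate.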

Frame Rate and Resolution

A typical video conference uses 384 kbps to 768 kbps. This “pipe” must deliver a number of frames (pictures) per second at a particular resolution, along with the connection overhead required for internet transmission.

The higher the resolution and the greater the frame rate, the more bandwidth is required. Some numbers for reference: typical resolution specifications are 640x480 pixels for webcams, 720x480 for NTSC cameras, and 720x576 for PAL cameras. Video looks best at 25-30 frames per second, but can run as slowly as 7.5 frames per second and still be tolerable in some situations. The needs of a specific video conferencing setup will depend upon how that system will be used.
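To get a feel for how resolution and frame rate translate into bandwidth, the sketch below estimates the uncompressed bit rate of each source and compares it with the size of the pipe. It rests on assumptions not stated in the article: 12 bits per pixel (uncompressed YUV 4:2:0 video), 1 kbps taken as 1000 bits/sec, nominal frame rates of 30 for webcam and NTSC and 25 for PAL, and no allowance for connection overhead. The function names are hypothetical.

```python
# Assumption: uncompressed YUV 4:2:0 video, which averages 12 bits per pixel.
BITS_PER_PIXEL = 12


def raw_bitrate_bps(width: int, height: int, fps: float,
                    bits_per_pixel: int = BITS_PER_PIXEL) -> float:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def required_compression_ratio(width: int, height: int, fps: float,
                               pipe_kbps: float) -> float:
    """How many times smaller the stream must become to fit the pipe.

    Ignores connection overhead, so the real requirement is somewhat higher.
    """
    return raw_bitrate_bps(width, height, fps) / (pipe_kbps * 1000)


if __name__ == "__main__":
    # (label, width, height, assumed nominal frames per second)
    sources = [
        ("webcam", 640, 480, 30),
        ("NTSC", 720, 480, 30),
        ("PAL", 720, 576, 25),
    ]
    for label, width, height, fps in sources:
        for pipe_kbps in (384, 768):
            ratio = required_compression_ratio(width, height, fps, pipe_kbps)
            print(f"{label} {width}x{height} @ {fps} fps into "
                  f"{pipe_kbps} kbps: about {ratio:.0f}:1")
```

Under these assumptions, a 640x480 webcam at 30 frames per second produces 110,592,000 bits/sec uncompressed, about 144 times what a 768 kbps pipe can carry. Dropping the frame rate or the resolution lowers that ratio proportionally.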

Codec for Video Conferencing

A codec (encoding/decoding algorithm) compresses the data at the transmission end and reconstitutes it at the receiving end. H.264/AVC is a popular industry standard codec which handles these tasks. Coding and decoding are computationally intensive and can easily stress the resources of a PC if, as is usual in low-end video communication systems, the PC is running the codec software. Hardware encoders are much faster than the software versions and can make a noticeable quality difference in video conferences, especially across long-distance or multi-hop connections.

It might seem strange, but the audio part of the infrastructure needed to fully support video conferencing is possibly more important than the video. The reason for this is simple and, when you think about it, obvious. If, during a video conference, the picture goes away or becomes unstable, participants can simply drop back to a “conference call” and continue. However, if the sound goes away or becomes unintelligible, there isn't much point in continuing the electronic connection.

Microphones

The human voice spends most of its time around 2 kHz, with a functional maximum frequency of about 7 kHz, so microphones offering frequency response from 20 Hz to 20 kHz are not worth the extra cost. But this doesn't mean cheap mics are the way to go either. If separate microphones and speakers are used (rather than headsets), the outgoing voice signal can be picked up by the microphone on the other end and come back slightly delayed (thanks to latency) from the original signal. The tell-tale echo means the system you're connected to is not using echo suppression or noise canceling microphones, or has the microphone and speaker too close together.

Two people using headsets and webcams can carry on a video conversation across the internet or PSTN with good reliability. Voice and video will serve the purpose, but are likely to have quality issues. As soon as the conference becomes more complex, involving more people and possibly multiple locations, the technical issues increase rapidly. The basic advice is: ensure adequate symmetrical bandwidth and use dedicated video conferencing hardware. Appropriate equipment will handle the codec functions, include sophisticated microphones intended for video conferences, and feature cameras which can be adjusted for color and brightness, and physically controlled to optimize the quality of the picture as well as the content of the frame.

A final piece of advice: when setting up a video conference, have people at each end-point check the quality of the video and voice. With their feedback you can make sure the system is going to deliver the desired experience before the participants “go live”.
