WebRTC Signaling Explained: Making Video Calls Seamless

What is Signaling in WebRTC🤔?

Signaling is the first step in WebRTC in which peers exchange necessary information to establish peer-to-peer connections.

Why is Signaling required?

Picture two strangers, say a gentleman and a lady, who don't know each other. To start a conversation, they need to know at least something about each other: a name, what they do, and so on.

Similarly, one device has no information about the other device, such as where it sits in the network or what data the two will exchange (audio, video, or any other media).

Both devices therefore need an entity that each of them already knows how to reach. That entity is the signaling server, which helps establish the connection between the two devices.
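To make the idea of an entity known to both devices a bit more concrete, here is a minimal in-memory sketch. It is not a real server: the class name `SignalingRelay`, the peer ids, and the message shape are all made up for illustration, and a real setup would carry the same messages over something like WebSockets.

```typescript
// Hypothetical in-memory signaling relay. Every name here is illustrative.
type SignalMessage = { from: string; to: string; payload: string };
type InboxHandler = (message: SignalMessage) => void;

class SignalingRelay {
  private handlers = new Map<string, InboxHandler>();

  // A peer registers under an id that the other peer already knows.
  join(peerId: string, onMessage: InboxHandler): void {
    this.handlers.set(peerId, onMessage);
  }

  // Forward a message to its addressee; returns false if nobody is listening.
  send(message: SignalMessage): boolean {
    const handler = this.handlers.get(message.to);
    if (!handler) return false;
    handler(message);
    return true;
  }
}

const relay = new SignalingRelay();
const ladyInbox: SignalMessage[] = [];
relay.join("lady", (message) => { ladyInbox.push(message); });
relay.join("gentleman", () => {});

// The gentleman cannot reach the lady directly, but both can reach the relay.
const delivered = relay.send({ from: "gentleman", to: "lady", payload: "here are my details" });
console.log(delivered, ladyInbox.length);
```

The relay never looks inside `payload`. It only needs to know who the message is for.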

Let's have an overview with a silly example:

Let's say a gentleman and a lady want to do a video chat📳. Neither of their devices knows where the other device is on the internet.

Passing these details to the other peer is where the signaling server comes into the picture.

As you can see from the above diagram, the signaling server sends details to the lady and the gentleman. What exactly are these details, and how are they sent?

These details include the IP address, port, media type, codec, and plenty more computer gibberish that I don't understand.

These details are sent in a specific format known as SDP (Session Description Protocol).

What is this SDP?

SDP is a text-based format that acts as a blueprint for a multimedia session (defined in RFC 8866). It essentially describes the technical details required for the communication to take place.

SDP is a key component of WebRTC signaling. We discussed WebRTC in more detail in a previous blog post "Just Curious about WebRTC...": https://blog.denilgabani.com/just-curious-about-webrtc

An SDP message is made up of key/value pairs and contains a list of “media sections”. The SDP that the two WebRTC agents exchange contains details like:

- The IPs and ports that the agent is reachable on (candidates).
- The number of audio and video tracks the agent wishes to send.
- The audio and video codecs each agent supports.
- The values used while connecting (the ICE username fragment and password, ice-ufrag/ice-pwd).
- The values used while securing (certificate fingerprint).
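To see the key/value lines and media sections in action, here is a rough sketch of a parser. The SDP sample is hand-written and heavily shortened (a real browser offer has many more lines), and `parseSdp` is a hypothetical helper, not part of any WebRTC API. It keeps only the session-level lines, the `m=` lines, and the `a=` attributes of each media section, and skips everything else.

```typescript
interface MediaSection {
  kind: string;         // "audio", "video", ...
  port: number;
  codecs: string[];     // codec names taken from a=rtpmap lines
  attributes: string[]; // raw a= values belonging to this section
}

interface ParsedSdp {
  sessionLines: string[];
  media: MediaSection[];
}

function parseSdp(sdp: string): ParsedSdp {
  const parsed: ParsedSdp = { sessionLines: [], media: [] };
  let current: MediaSection | undefined;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length < 2 || line[1] !== "=") continue; // every SDP line is <key>=<value>
    const key = line[0];
    const value = line.slice(2);
    if (key === "m") {
      // m=<media> <port> <proto> <fmt> ... starts a new media section
      const [kind = "", port = "0"] = value.split(" ");
      current = { kind, port: Number(port), codecs: [], attributes: [] };
      parsed.media.push(current);
    } else if (current === undefined) {
      parsed.sessionLines.push(line); // everything before the first m= line
    } else if (key === "a") {
      current.attributes.push(value);
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const match = /^rtpmap:\d+ ([^/]+)\//.exec(value);
      if (match && match[1]) current.codecs.push(match[1]);
    }
  }
  return parsed;
}

// Hand-written, shortened sample; the values are made up for illustration.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=ice-ufrag:EsAw",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=ice-ufrag:EsAw",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const parsedSample = parseSdp(sampleSdp);
console.log(parsedSample.media.map((section) => `${section.kind}: ${section.codecs.join(", ")}`));
// prints [ 'audio: opus', 'video: VP8' ]
```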

In what sequence is this SDP message being exchanged?

1. Offer-SDP
- One WebRTC agent (the gentleman) initiates the connection by creating an SDP offer with the RTCPeerConnection.createOffer() method and storing it as his local description with setLocalDescription().
- This offer-SDP describes the gentleman's media capabilities, including supported codecs, bandwidth limitations (if any), and preferred connection methods.
2. Signaling
- The offer-SDP is then sent to the other WebRTC agent (the lady) through the signaling server.
3. Answer-SDP
- The lady receives the offer-SDP, applies it with setRemoteDescription(), and checks whether she can accommodate the proposed media session.
- If she can, she creates an answer-SDP with the RTCPeerConnection.createAnswer() method. The answer-SDP states how she responds to the offer, for example which of the offered codecs she accepts and any bandwidth limitations.
4. Signaling
- The answer-SDP is sent back to the gentleman through the same signaling server.
5. (Possible) Renegotiation
- In some cases, the gentleman needs to send a revised offer-SDP later on. This renegotiation can occur if, for example, a track is added or removed during the call. The process then returns to step 2 (Signaling) with the revised offer.
6. WebRTC Connection Established
- Once both parties have exchanged SDP messages and agreed on the session parameters, a WebRTC connection is established, and media data can start flowing between the peers.
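The exchange above can be sketched in code. To keep the example runnable outside a browser, it is written against a small `PeerLike` interface that I made up; it borrows the four method names from RTCPeerConnection (createOffer, createAnswer, setLocalDescription, setRemoteDescription), but `FakePeer` is only a stand-in that produces fake SDP strings, and the `send` callbacks take the place of the signaling server.

```typescript
interface SdpMessage { type: "offer" | "answer"; sdp: string }

// Assumption: a trimmed-down stand-in for the parts of RTCPeerConnection used here.
interface PeerLike {
  createOffer(): Promise<SdpMessage>;
  createAnswer(): Promise<SdpMessage>;
  setLocalDescription(description: SdpMessage): Promise<void>;
  setRemoteDescription(description: SdpMessage): Promise<void>;
}

type SendFn = (description: SdpMessage) => void | Promise<void>;

// Caller (the gentleman): create the offer, keep it locally, send it out.
async function makeOffer(peer: PeerLike, send: SendFn): Promise<void> {
  const offer = await peer.createOffer();
  await peer.setLocalDescription(offer);
  await send(offer);
}

// Callee (the lady): apply the offer, create the answer, send it back.
async function acceptOffer(peer: PeerLike, offer: SdpMessage, send: SendFn): Promise<void> {
  await peer.setRemoteDescription(offer);
  const answer = await peer.createAnswer();
  await peer.setLocalDescription(answer);
  await send(answer);
}

// Caller again: apply the answer that came back.
async function acceptAnswer(peer: PeerLike, answer: SdpMessage): Promise<void> {
  await peer.setRemoteDescription(answer);
}

// Fake peer for illustration only; it records descriptions instead of doing real media work.
class FakePeer implements PeerLike {
  label: string;
  local: SdpMessage | undefined = undefined;
  remote: SdpMessage | undefined = undefined;
  constructor(label: string) { this.label = label; }
  async createOffer(): Promise<SdpMessage> {
    return { type: "offer", sdp: `fake offer from ${this.label}` };
  }
  async createAnswer(): Promise<SdpMessage> {
    if (this.remote?.type !== "offer") throw new Error("no offer to answer");
    return { type: "answer", sdp: `fake answer from ${this.label}` };
  }
  async setLocalDescription(description: SdpMessage): Promise<void> { this.local = description; }
  async setRemoteDescription(description: SdpMessage): Promise<void> { this.remote = description; }
}

async function runHandshake(): Promise<{ gentleman: FakePeer; lady: FakePeer }> {
  const gentleman = new FakePeer("gentleman");
  const lady = new FakePeer("lady");
  // The nested callbacks play the role of the signaling server.
  await makeOffer(gentleman, (offer) =>
    acceptOffer(lady, offer, (answer) => acceptAnswer(gentleman, answer)));
  return { gentleman, lady };
}
```

After `runHandshake()` resolves, each side holds its own description as local and the other side's as remote, which is the state a real pair of peers reaches before media can flow.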

Check Out the SDP Demo

Do you want to see how SDP looks and how it works? Check out a demo created by divanov11:
https://divanov11.github.io/WebRTC-Simple-SDP-Handshake-Demo/

Don't worry; we don't have to manually paste this SDP offer. We'll handle it through code. Just have some patience.

Now as you can see in this demo, the connection is established after the SDP answer is pasted. However, there is a catchhhhhh.

Peer devices often sit behind firewalls, corporate networks, or private networks, which makes it hard to find a public IP address that other peers can connect to.

To tackle this situation, an ICE (Interactive Connectivity Establishment) candidate exchange is required.

What exactly is this ICE (Interactive Connectivity Establishment)?

ICE is crucial for traversing NATs (Network Address Translators) and establishing direct peer-to-peer connections whenever possible.

ICE allows peers to discover their public IP addresses and ports by gathering ICE candidates, which include host addresses, server reflexive addresses obtained through STUN (Session Traversal Utilities for NAT), and relayed addresses obtained through TURN (Traversal Using Relays around NAT).

ICE agents on each peer attempt to establish connectivity using these candidates.

During the signaling process, each peer either includes its gathered ICE candidates in its SDP messages or sends them to the other peer separately as they are discovered (known as trickle ICE).
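For a feel of what a candidate looks like, here is a small sketch that parses candidate lines and orders them by priority. The three sample lines are made up (the addresses come from ranges reserved for documentation), and `parseCandidate` is a hypothetical helper. Keep in mind that a real ICE agent does much more: it pairs local and remote candidates and runs connectivity checks on the pairs.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface CandidateInfo {
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType; // host, server reflexive (STUN), peer reflexive, or relayed (TURN)
}

function parseCandidate(line: string): CandidateInfo | null {
  // candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
  const pattern = /^(?:a=)?candidate:\S+ \d+ (\S+) (\d+) (\S+) (\d+) typ (host|srflx|prflx|relay)\b/;
  const match = pattern.exec(line.trim());
  if (!match) return null;
  const [, transport = "", priority = "0", address = "", port = "0", type = "host"] = match;
  return {
    transport,
    priority: Number(priority),
    address,
    port: Number(port),
    type: type as CandidateType,
  };
}

// Made-up sample candidates, deliberately listed out of order.
const sampleCandidates = [
  "candidate:3 1 udp 41885439 198.51.100.20 50000 typ relay raddr 203.0.113.7 rport 46154",
  "candidate:1 1 udp 2122260223 192.168.1.5 46154 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154",
];

const byPriority = sampleCandidates
  .map((line) => parseCandidate(line))
  .filter((candidate): candidate is CandidateInfo => candidate !== null)
  .sort((a, b) => b.priority - a.priority);

console.log(byPriority.map((candidate) => `${candidate.type} ${candidate.address}:${candidate.port}`));
// prints [ 'host 192.168.1.5:46154', 'srflx 203.0.113.7:46154', 'relay 198.51.100.20:50000' ]
```

In this sample the host candidate has the highest priority and the relayed one the lowest, which matches the idea of going direct whenever possible and relaying only as a fallback.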

What is STUN (Session Traversal Utilities for NAT)?

STUN servers play a key role in ICE by helping peers discover their public IP addresses and ports as seen from the server's perspective.

Peers send STUN requests to STUN servers, which respond with the public IP address and port that the request appeared to come from.

This information is crucial for determining if direct peer-to-peer communication is possible or if a relay (TURN) server is needed.
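If you are curious what such a request and response look like on the wire, here is a rough sketch. It only builds the 20-byte header of a STUN Binding request and decodes the value of an IPv4 XOR-MAPPED-ADDRESS attribute (the part of the response carrying the public address). It does not send anything over the network, and the function names are my own. The transaction ID is fixed here to keep the example repeatable; a real client would use random bytes.

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442; // fixed value defined by the STUN specification
const STUN_BINDING_REQUEST = 0x0001;

function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, STUN_BINDING_REQUEST); // message type (big-endian by default)
  view.setUint16(2, 0);                    // attribute length: no attributes
  view.setUint32(4, STUN_MAGIC_COOKIE);    // magic cookie
  message.set(transactionId, 8);           // 12-byte transaction ID
  return message;
}

// Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
function decodeXorMappedAddressV4(value: Uint8Array): { address: string; port: number } {
  if (value.length !== 8 || value[1] !== 0x01) throw new Error("expected an 8-byte IPv4 value");
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the top 16 bits of the cookie, the address with the whole cookie.
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const raw = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const address = [raw >>> 24, (raw >>> 16) & 0xff, (raw >>> 8) & 0xff, raw & 0xff].join(".");
  return { address, port };
}

// Fixed transaction ID for the demo only.
const bindingRequest = buildBindingRequest(new Uint8Array(12).fill(7));
console.log(bindingRequest.length); // prints 20

// Hand-encoded attribute value for the made-up address 203.0.113.7:46154.
const xorValue = new Uint8Array([0x00, 0x01, 0x95, 0x58, 0xea, 0x12, 0xd5, 0x45]);
console.log(decodeXorMappedAddressV4(xorValue));
// prints { address: '203.0.113.7', port: 46154 }
```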

What is TURN (Traversal Using Relays around NAT)?

In cases where direct peer-to-peer communication is not feasible due to firewall or NAT restrictions, peers can use TURN servers.

TURN servers act as intermediaries, relaying media streams between peers when direct communication is not possible.

Peers exchange relayed addresses obtained from the TURN server via SDP.
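In practice, a peer learns about STUN and TURN servers through its configuration. Below is a minimal sketch of building such a configuration. The field names (`iceServers`, `urls`, `username`, `credential`, `iceTransportPolicy`) mirror the ones RTCPeerConnection accepts, but the helper `buildPeerConfig`, the server URLs, and the credentials are all placeholders I made up.

```typescript
interface IceServer { urls: string; username?: string; credential?: string }
interface PeerConfig { iceServers: IceServer[]; iceTransportPolicy: "all" | "relay" }
interface TurnAccount { url: string; username: string; credential: string }

function buildPeerConfig(stunUrl: string, turn?: TurnAccount, relayOnly = false): PeerConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    // TURN servers normally require credentials, unlike plain STUN servers.
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  // "relay" forces all traffic through TURN; "all" lets ICE try direct paths first.
  return { iceServers, iceTransportPolicy: relayOnly ? "relay" : "all" };
}

// Placeholder servers and credentials, for illustration only.
const peerConfig = buildPeerConfig(
  "stun:stun.example.org:3478",
  { url: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-pass" },
);
console.log(peerConfig.iceServers.length, peerConfig.iceTransportPolicy);
// prints 2 all
```

In a browser, an object of this shape could be passed to `new RTCPeerConnection(peerConfig)`. Setting `relayOnly` to true is mostly useful for checking that the TURN server actually works.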

These were too many whats and hows for me. Let's see one last sequence diagram to understand this all at once.

In the above diagram, TURN is not shown, because the assumption is that the network is not very restrictive.

TURN (Traversal Using Relays around NAT) servers come into the picture in WebRTC communication when direct peer-to-peer communication is not possible due to network restrictions, such as symmetric NAT or strict firewalls. In that situation they act as intermediaries and relay the media streams between the peers.


Yeah! That's it for today. Next, I want to build a simple WebRTC room where people can join video calls, and I'm thinking about making that the next blog post. Less theory, more code!!

If you find something to correct or something useful to mention, feel free to comment💬 under the blog.

To stay updated and learn together, follow me on Denil Gabani.

denilgabani