Understanding the WebRTC Protocol

What is WebRTC?

WebRTC (Web Real-Time Communication) is one of the most popular technologies web apps and websites use to capture and stream audio-video content. It is also widely used to exchange data peer-to-peer without any intermediaries.

WebRTC is essentially a set of browser APIs and network protocols, standardized by the W3C and the IETF, that enables data exchange and peer-to-peer conferencing without any plug-in installation or third-party software. These standards, APIs, and protocols are designed to work in tandem.

Compared to its predecessors, Flash being the most notable of them, WebRTC provides a more secure, low-latency, and frictionless experience that can be easily integrated into any digital ecosystem.

Simply put, WebRTC facilitates audio-video communication within webpages without you having to install any plug-ins.

A brief history of WebRTC

It all started when Google bought Global IP Solutions, a VoIP software firm that had already created many of the elements necessary for RTC, such as codecs and echo cancellation techniques. Google made this tech open-source and collaborated with the IETF (Internet Engineering Task Force) and W3C (World Wide Web Consortium) to create consensus within the industry.

In 2011, Google released the open-source project we know as WebRTC. Post-release, continuous work has been invested in updating the project and standardizing associated protocols in the IETF, as well as the required browser APIs in the W3C.

By 2016, around 2 billion browsers were WebRTC-enabled. Its popularity took off, especially during the COVID-19 pandemic, with people working from home and requiring seamless, instant connectivity via the internet.

In 2021, WebRTC became an official standard: the W3C advanced the specification from Candidate Recommendation to Recommendation, and the IETF published the associated protocol RFCs. By now, it is a widely used and highly recommended technology for audio-video communication and streaming.

Core features of WebRTC

The technology is completely open-source and free to use.
It comes embedded in browsers but allows customization for your projects.
The protocol is constantly evolving. Each updated WebRTC version seeks to be more user-centric, provide advanced features and be more reliable.
It is widely used in other open-source projects as well as in commercial products.
It facilitates chat, video calling, and peer-to-peer file sharing within browsers.
WebRTC is known for low bandwidth consumption and latency.
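It secures data in transit from source to destination. Media and data channels are always encrypted (using DTLS and SRTP), and the media capture APIs are only available to pages served from a secure origin such as HTTPS or localhost.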

Common use cases of WebRTC

Peer-to-peer audio, video & screen sharing: This was the original purpose of WebRTC, namely peer-to-peer communication over the internet via audio and video calls. Its applications have since expanded to include text chat, screen sharing, file sharing, and the like. Popular tools like Skype, Google Meet, Slack, and Microsoft Teams use WebRTC. The protocol has also found popularity in EdTech and healthcare circles, especially in the post-COVID period.
File sharing: We’ve mentioned file sharing previously, but it deserves a standalone point. WebRTC is often used to share files in different formats, even outside audio-video connectivity. Think of WebTorrent, and you’ll know the kind of file-sharing software WebRTC can build.
IoT (Internet of Things): IoT devices use sensors to facilitate information exchange with other connected devices within a network. WebRTC helps with this data exchange (audio-video streaming) in real time. The applications here are endless in the real world – drones, nanny cams, doorbells, baby monitors, home cameras, etc.
Real-time language processing: Live closed captions, automatic translations, and transcriptions are all central to language processing. By combining the Web Speech API with WebRTC data channels, transcripts can be generated and sent across platforms in real time. You’ll see much of this in YouTube and Google Meet when you enable the closed caption option. Similarly, WebRTC is exceptionally useful in enabling other language processing functions.

In everyday usage, you’ll see WebRTC pop up in the following scenarios (and more):

Voice or video calling, whether one-on-one or in groups;
Watch parties, like on Netflix;
Live shopping on eCommerce and retail sites;
Video communication on telehealth apps and sites between doctors and patients;
Online classes in the EdTech sector;
Virtual events such as webinars, large meetings, and online events;
Low latency broadcasting of live events such as sports;
Interactive sessions with large audiences at industry-best latency levels;
Online gaming in which visuals are rendered in the cloud and sent out to the gamers;
Virtual spaces like the Metaverse, in which your avatars are rendered in 2D or 3D within a digital environment.

Key components of WebRTC

WebRTC works via multiple interworking APIs, including MediaStream, RTCPeerConnection, and RTCDataChannel.

MediaStream: This interface represents media captured from local input devices such as cameras and microphones. An app can record a stream, send it to a peer, adjust its resolution or quality, and display it to the participants in a session. Any WebRTC-based app must first ask the user for access by calling navigator.mediaDevices.getUserMedia(), passing constraints that state whether it needs audio, video, or both.

You need a web server to test this method on a web page. Opening the HTML file directly in a browser usually won’t work, because browsers only expose cameras and microphones to pages loaded from a secure origin (HTTPS or localhost). A simple Node.js server running on your own machine is enough.
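To make the permission request concrete, here is a minimal sketch of the getUserMedia() call described above. The helper names (buildCaptureConstraints, captureLocalMedia) and the reduced interface shapes are assumptions made for illustration so the snippet also compiles outside a browser; with the DOM typings you would call navigator.mediaDevices.getUserMedia() directly.

```typescript
// Reduced shapes covering only what this sketch touches (assumption: the
// real browser types are MediaStreamConstraints and MediaDevices).
interface CaptureConstraints {
  audio: boolean;
  video: boolean;
}

interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: CaptureConstraints): Promise<TStream>;
}

// Hypothetical helper: builds the constraints object passed to getUserMedia().
function buildCaptureConstraints(wantAudio: boolean, wantVideo: boolean): CaptureConstraints {
  if (!wantAudio && !wantVideo) {
    // Browsers reject a request that asks for neither audio nor video.
    throw new Error("Request audio, video, or both");
  }
  return { audio: wantAudio, video: wantVideo };
}

// Hypothetical helper: triggers the permission prompt and resolves with the
// captured stream. The returned promise rejects if the request is empty,
// the user denies access, or no matching device is available.
async function captureLocalMedia<TStream>(
  devices: MediaDevicesLike<TStream>,
  wantAudio: boolean,
  wantVideo: boolean,
): Promise<TStream> {
  const constraints = buildCaptureConstraints(wantAudio, wantVideo);
  return devices.getUserMedia(constraints);
}
```

In a page, the resolved stream is typically assigned to the srcObject property of a video element to show a local preview.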

RTCPeerConnection: This interface is the main entry point to the WebRTC API. It lets you set up a connection to a remote peer, attach your local media tracks to it, and receive the tracks that the peer sends in return.
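As a rough illustration of how two peers agree on a session, the sketch below walks through one offer/answer round using the createOffer(), createAnswer(), setLocalDescription(), and setRemoteDescription() methods that RTCPeerConnection exposes. The PeerLike interface and the negotiate function are hypothetical names introduced here, and the sketch assumes both peers are reachable from the same code. In a real app the descriptions travel over a signaling channel of your choice, and ICE candidates must be exchanged as well; both are left out.

```typescript
// Reduced shape covering only the four negotiation methods used below.
// TDesc stands in for the session description object (type plus SDP).
interface PeerLike<TDesc> {
  createOffer(): Promise<TDesc>;
  createAnswer(): Promise<TDesc>;
  setLocalDescription(description: TDesc): Promise<void>;
  setRemoteDescription(description: TDesc): Promise<void>;
}

// Hypothetical helper: runs one offer/answer round between two peers.
async function negotiate<TDesc>(
  caller: PeerLike<TDesc>,
  callee: PeerLike<TDesc>,
): Promise<void> {
  // 1. The caller describes the session it wants and applies it locally.
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);

  // 2. The offer reaches the callee (over signaling in a real app).
  await callee.setRemoteDescription(offer);

  // 3. The callee responds with what it can support and applies it locally.
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);

  // 4. The answer travels back and the caller applies it.
  await caller.setRemoteDescription(answer);
}
```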

Generally, if your browser wants to connect to another browser, it first has to find out where that other browser is located on the network. This location takes the form of an IP address and port number; think of it as the other browser’s address.

Likewise, your device’s IP address tells the devices you are connecting to where to send their data, so that both sides can exchange it directly. RTCPeerConnection builds the connection on top of these addresses.
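These addresses are exchanged as ICE candidate lines. The sketch below pulls the IP address and port out of one such line, assuming the standard "candidate:" attribute layout from the ICE SDP specification. The parseIceCandidate function and the ParsedCandidate shape are hypothetical, and the sample address in the usage note comes from a documentation-only range. In a browser the RTCIceCandidate object already exposes these fields, so the parser is only meant to show what the string contains.

```typescript
// Fields of interest from a candidate line. Layout assumed:
// candidate:<foundation> <component> <protocol> <priority> <address> <port> typ <type> ...
interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  address: string;
  port: number;
  type: string;
}

// Hypothetical helper: splits a candidate line into its leading fields.
function parseIceCandidate(line: string): ParsedCandidate {
  const fields = line
    .trim()
    .replace(/^a=/, "") // tolerate the SDP attribute prefix
    .replace(/^candidate:/, "")
    .split(/\s+/);

  if (fields.length < 8 || fields[6] !== "typ") {
    throw new Error("Not an ICE candidate line: " + line);
  }

  const port = Number(fields[5]);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error("Invalid port in candidate: " + fields[5]);
  }

  return {
    foundation: fields[0],
    component: Number(fields[1]),
    protocol: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: port,
    type: fields[7], // e.g. host, srflx, or relay
  };
}
```

For example, parsing "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154" yields the address 203.0.113.7 and the port 46154.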

RTCDataChannel: This API offers a transport mechanism for web browsers to exchange arbitrary data peer-to-peer. It uses SCTP (Stream Control Transmission Protocol) to carry the data over an established peer connection.

Each call to createDataChannel() creates a new data channel within the existing SCTP association.
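A minimal sketch of opening a data channel and exchanging text over it might look like the following. The openTextChannel function and the reduced DataChannelLike and DataChannelHost shapes are assumptions made for illustration; in a browser you would call createDataChannel() on an RTCPeerConnection and work with the RTCDataChannel it returns.

```typescript
// Reduced shapes covering only the members this sketch touches.
interface DataChannelLike {
  onopen: (() => void) | null;
  onmessage: ((event: { data: unknown }) => void) | null;
  send(data: string): void;
}

interface DataChannelHost {
  createDataChannel(label: string): DataChannelLike;
}

// Hypothetical helper: creates a labeled channel, sends a greeting once the
// channel is open, and forwards incoming text messages to onText.
function openTextChannel(
  connection: DataChannelHost,
  label: string,
  greeting: string,
  onText: (text: string) => void,
): DataChannelLike {
  const channel = connection.createDataChannel(label);

  // Sending before the channel is open fails, so wait for the open event.
  channel.onopen = () => {
    channel.send(greeting);
  };

  // Data channels can also carry binary data; this sketch ignores it.
  channel.onmessage = (event) => {
    if (typeof event.data === "string") {
      onText(event.data);
    }
  };

  return channel;
}
```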

A few other WebRTC interfaces are:

RTCDataChannelEvent: Represents the event that fires on an RTCPeerConnection when the remote peer opens an RTCDataChannel on it.
RTCSessionDescription: Describes the parameters of a WebRTC session. Each RTCSessionDescription comprises a description type, indicating which part of the offer/answer negotiation it describes, and the session’s SDP descriptor.
RTCStatsReport: Details the statistics for a connection or for a specific track on the connection. Get the report by calling RTCPeerConnection.getStats(). For information on using WebRTC statistics, see the WebRTC Statistics API.
RTCIceCandidate: Represents a candidate Interactive Connectivity Establishment (ICE) configuration, that is, an address, port, and transport combination that may be used to establish an RTCPeerConnection.
RTCIceTransport: Offers information about an ICE transport.
RTCPeerConnectionIceEvent: Represents events that occur in relation to ICE candidates. The target of these events is usually an RTCPeerConnection.
RTCRtpSender: This interface controls the data encoding and transmission for a MediaStreamTrack on an RTCPeerConnection.
RTCRtpReceiver: This interface controls receiving and decoding data for a MediaStreamTrack on an RTCPeerConnection.
RTCTrackEvent: This interface represents the track event, which signals that a new incoming MediaStreamTrack has been created and that its RTCRtpReceiver has been added to the RTCPeerConnection.
RTCSctpTransport: This interface provides information describing a Stream Control Transmission Protocol (SCTP) transport. It also gives access to the underlying Datagram Transport Layer Security (DTLS) transport, over which the SCTP packets for an RTCPeerConnection's data channels are exchanged.
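As one example of putting these interfaces to work, the sketch below condenses the report returned by RTCPeerConnection.getStats() into a few numbers about incoming media. It assumes the report contains entries of type "inbound-rtp" with bytesReceived, packetsReceived, and packetsLost fields, as defined in the WebRTC Statistics API. The summarizeInbound function and the reduced interface shapes are hypothetical names used for illustration.

```typescript
// Reduced shapes: a stats entry is an object with a type plus other fields,
// and a report is anything that can iterate its entries with forEach.
interface StatEntry {
  type: string;
  [key: string]: unknown;
}

interface StatsReportLike {
  forEach(callback: (entry: StatEntry) => void): void;
}

interface InboundSummary {
  bytesReceived: number;
  packetsReceived: number;
  packetsLost: number;
  lossRatio: number; // lost / (received + lost), or 0 when there is no traffic
}

// Treats missing or non-numeric fields as zero.
function asNumber(value: unknown): number {
  return typeof value === "number" ? value : 0;
}

// Hypothetical helper: totals the inbound RTP counters across all streams.
function summarizeInbound(report: StatsReportLike): InboundSummary {
  let bytesReceived = 0;
  let packetsReceived = 0;
  let packetsLost = 0;

  report.forEach((entry) => {
    if (entry.type !== "inbound-rtp") {
      return; // skip outbound, candidate-pair, codec, and other entries
    }
    bytesReceived += asNumber(entry["bytesReceived"]);
    packetsReceived += asNumber(entry["packetsReceived"]);
    packetsLost += asNumber(entry["packetsLost"]);
  });

  const totalPackets = packetsReceived + packetsLost;
  const lossRatio = totalPackets === 0 ? 0 : packetsLost / totalPackets;
  return { bytesReceived, packetsReceived, packetsLost, lossRatio };
}
```

Calling this on a timer and comparing successive summaries gives a rough view of throughput and packet loss during a call.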

Why WebRTC works

Here’s the crux: WebRTC replaces long development cycles in C/C++ with a JavaScript API. Building real-time communication natively in C/C++ costs more time and effort, whereas WebRTC comes with a JS API layer you can use directly in the browser. This simplifies the development and implementation of real-time communication in any digital environment.

Now, behind the scenes, WebRTC is implemented in C/C++, but as a developer, you won’t need to go that deep to build applications.

Additionally, WebRTC is supported by all modern browsers, including Chrome, Safari, Firefox, and Edge. It can also be extracted and integrated into embedded devices or apps that do not use a browser at all.
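Even with broad browser support, an app may want to confirm that the APIs are present before relying on them. The sketch below is one way to do that check; detectWebRtcSupport and the GlobalLike shape are hypothetical names, and the function takes the global object as a parameter (window in a page) so that it can also be exercised outside a browser.

```typescript
// Reduced view of the global object: only the members being probed.
interface GlobalLike {
  RTCPeerConnection?: unknown;
  navigator?: {
    mediaDevices?: {
      getUserMedia?: unknown;
    };
  };
}

interface WebRtcSupport {
  peerConnection: boolean; // can peer connections be created?
  mediaCapture: boolean; // can the camera or microphone be requested?
}

// Hypothetical helper: reports which parts of WebRTC the environment exposes.
function detectWebRtcSupport(scope: GlobalLike): WebRtcSupport {
  const devices = scope.navigator ? scope.navigator.mediaDevices : undefined;
  return {
    peerConnection: typeof scope.RTCPeerConnection === "function",
    mediaCapture: !!devices && typeof devices.getUserMedia === "function",
  };
}
```

A page could use the result to show a fallback message instead of failing when a call is started.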

Wrapping up

WebRTC is quickly becoming a frequent choice for any app that requires audio-video communication and data exchange. For example, Neverinstall uses WebRTC to facilitate application streaming through a user’s browser. This is largely because WebRTC can stream at lower latency than other protocols like HLS and RTMP.

Naturally, we’ve picked the technology that allows for the least possible lag while streaming fully functional desktop experiences via the browser. That means your experience on Neverinstall workspaces will be nearly identical to using your regular device for your day-to-day work.