What Is Real-Time Communication Infrastructure

Real-time communication infrastructure, often abbreviated as RTC infrastructure, is the backend layer that makes live messaging, voice, and video possible inside applications. It handles the transport, delivery, storage, and synchronization of communication data across devices and networks in real time.

In simple terms, it is the system underneath the chat window and the video call — the servers, protocols, and services that determine whether communication is fast, reliable, compliant, and scalable.

At QuickBlox, we build and operate real-time communication infrastructure for development teams across telehealth, enterprise, and digital health. This page covers what that infrastructure actually consists of, why the architectural decisions around it matter more than most teams expect, and what separates infrastructure that holds up in production from infrastructure that needs to be replaced when the product scales.

Real-Time Communication (RTC) Infrastructure Explained

Most applications that handle real-time communication don’t expose their infrastructure to developers or users. It operates invisibly: messages arrive instantly, video calls connect, notifications land on locked screens. The complexity underneath that experience is easy to underestimate until something goes wrong. Developers and architects often refer to this layer simply as RTC infrastructure, a shorthand for the full stack of servers, protocols, and services that communication-dependent applications rely on.

Real-time communication infrastructure is not a single system. It is a collection of interconnected layers, each handling a specific part of the communication stack. Understanding what those layers are and how they interact is the foundation of any sound architectural decision about how to build or procure communication capability.

Infrastructure layer	What it handles
Messaging infrastructure	Real-time message delivery, ordering, threading, conversation state, and persistent message storage
Signalling servers	Coordination layer that establishes connections between devices — negotiating how a call or session begins before media flows
Media servers	Processing and routing of audio and video streams for multi-party calls, recording, and transcription
TURN servers	Relay infrastructure that routes media traffic when a direct peer-to-peer connection cannot be established — typically due to firewalls or NAT
Push notification infrastructure	Delivery of alerts to devices when users are not active in the application
File and media storage	Secure storage, retrieval, and management of files, images, voice notes, and video shared within communication sessions
Data storage and message history	Persistent storage of conversation history, session metadata, and audit logs — with retention policies appropriate to the deployment context
Presence and user state	Real-time tracking of which users are online, typing, or available — the signals that make communication feel live

Each layer is a distinct engineering problem. Building one well is achievable. Building all of them well, maintaining them as the underlying protocols evolve, and keeping them compliant in regulated environments is the undertaking that most development teams decide not to own internally.

What Real-Time Communication Infrastructure Is Not

The term gets used loosely enough that it’s worth being precise about what real-time communication infrastructure is not — both to clarify the concept and to distinguish it from related things it’s often conflated with.

Not a communication API

A communication API is an interface your application calls to trigger communication events. The infrastructure is what the API sits in front of — the servers, protocols, and services that actually handle those events. The API is the access layer. The infrastructure is what it accesses. See What Is a Chat API? →

Not a communication SDK

An SDK is a client-side implementation layer — the libraries your application integrates to interact with the communication infrastructure from a user’s device. The SDK is how developers build on top of the infrastructure. It is not the infrastructure itself. See Chat API vs Messaging SDK: What’s the Difference? →

Not a chat app or video conferencing platform

WhatsApp, Slack, Zoom, and Teams are finished products built on communication infrastructure. They are the application layer — the user-facing product experience. Communication infrastructure is what sits underneath products like these, handling the transport, delivery, and storage that makes the product experience possible.

Not a single system

Real-time communication infrastructure is a collection of interconnected layers — messaging servers, signalling infrastructure, media servers, TURN servers, push notification delivery, file storage, and presence management — each handling a specific part of the communication stack. Treating it as a single component to be selected or replaced is one of the more common architectural misconceptions development teams encounter early in the build process.

Why Infrastructure Decisions Are Architectural Decisions

The choice of communication infrastructure is not a procurement decision that can be revisited easily once the product is built. It shapes the application’s architecture in ways that become increasingly expensive to change as the platform grows.

Decision area	Why it matters	What goes wrong when it’s made late
Identity and user model	Communication infrastructure maintains its own user representation — it needs to map cleanly to your application’s identity system	Misalignment between communication identity and application identity creates integration overhead that compounds as the platform scales
Data residency and compliance	Where messages, recordings, and audit logs live is determined by the infrastructure, not the application	Changing data residency after meaningful usage has accumulated requires migrating data across infrastructure boundaries — operationally complex and carrying its own compliance risk
Scalability profile	Different infrastructure components scale differently — what works at ten thousand concurrent users may not work at one hundred thousand	Scalability constraints are properties of early architectural choices, not problems that can be solved by adding capacity later
Vendor dependencies	SDKs integrate into client applications, APIs integrate into backend systems, and data accumulates in the infrastructure’s storage layer	Changing providers after meaningful usage has accumulated is significantly more complex than the initial integration

The Protocols Behind Real-Time Communication Infrastructure

Real-time communication infrastructure is built on a small set of open standards. Understanding them matters not because development teams need to implement them directly — that’s what the infrastructure layer handles — but because the architectural decisions those protocols shape have real downstream consequences for capability, compliance, and interoperability.

Protocol	What it handles	Where it sits
WebSockets	Persistent bidirectional connections between client and server — the foundation of real-time messaging	Client-to-server messaging layer
WebRTC	Real-time audio and video between browsers and mobile applications — peer-to-peer media exchange, encryption, adaptive quality	Video and voice communication layer
XMPP	Real-time messaging, presence, and multi-user conversations — the underlying messaging protocol QuickBlox uses	Messaging and presence layer
SRTP / DTLS	Encryption of media streams and data channels — mandatory in WebRTC implementations, not optional	Media security layer

WebSockets provide persistent, bidirectional connections between client applications and servers, which makes them the foundation of real-time messaging. In the HTTP request-response model, the server can only reply to a request the client has already made; a WebSocket connection stays open, so the server can push messages to clients as they arrive. That persistent connection is the difference between messaging that feels instant and messaging that feels like email.
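As a small illustration of how that persistent connection begins: a WebSocket starts as an ordinary HTTP request asking to be upgraded, and the server confirms the switch by returning a value derived from the client's key. The sketch below computes that value with Python's standard library, following the opening handshake in RFC 6455. The function names are ours, chosen for illustration; a production server would normally rely on an established WebSocket library rather than hand-rolling this step.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a Sec-WebSocket-Key header."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(client_key: str) -> str:
    """Build the HTTP 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept_key(client_key)}\r\n"
        "\r\n"
    )
```

After this response, the same TCP connection carries WebSocket frames in both directions until either side closes it.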

WebRTC is the open standard that handles real-time audio and video between browsers and mobile applications — peer-to-peer media exchange, encryption, and adaptive quality adjustment. It is the backbone of most production video implementations. One thing worth understanding about WebRTC in practice: it handles the media well, but it doesn’t handle everything. Session management, recording, group calls at scale, and TURN relay infrastructure all require additional components on top of the WebRTC layer. Teams that discover this mid-build tend to discover it expensively.

XMPP (Extensible Messaging and Presence Protocol) is the open standard QuickBlox uses as its underlying messaging protocol. The choice matters: XMPP was designed specifically for real-time messaging and presence — it handles message delivery, multi-user conversations, and online state natively rather than as bolt-on features. It is extensible, well-documented, and has been battle-tested across large-scale deployments for decades. When developers build on QuickBlox, they inherit that foundation rather than something assembled from general-purpose HTTP infrastructure.
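To make "natively" concrete: in XMPP, a chat message and a presence update are both small XML stanzas defined by the protocol itself. The sketch below builds one of each with Python's standard library. The addresses are placeholders on example.com and the helper names are ours; this shows the general stanza shape from the XMPP specifications, not the QuickBlox SDK or its API.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def chat_message(sender: str, recipient: str, text: str) -> str:
    """Build a one-to-one chat <message/> stanza."""
    msg = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    ET.SubElement(msg, "body").text = text
    return ET.tostring(msg, encoding="unicode")


def presence(sender: str, show: Optional[str] = None) -> str:
    """Build a <presence/> stanza; no <show/> child means plainly available."""
    pres = ET.Element("presence", {"from": sender})
    if show is not None:
        # Standard values include "away", "chat", "dnd", and "xa".
        ET.SubElement(pres, "show").text = show
    return ET.tostring(pres, encoding="unicode")


if __name__ == "__main__":
    print(chat_message("alice@example.com", "bob@example.com", "Are you free?"))
    print(presence("alice@example.com", "away"))
```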

SRTP and DTLS handle encryption of media streams and data channels in WebRTC implementations. Media is encrypted by default — not as an optional configuration, not as a compliance add-on. That default matters particularly in healthcare and enterprise environments where the assumption of unencrypted media, even briefly, creates exposure.

The Components in Detail

Messaging infrastructure

The messaging layer handles more than moving text between users. In production, it manages message ordering across unreliable networks, conversation threading across multiple participants, delivery confirmation and read state, offline queuing for users who aren’t connected, and synchronisation of message history across every device a user has. The persistent storage layer underneath it determines how long message history is retained, who can access it, and what audit trail exists for compliance purposes.
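One piece of that work, ordering across unreliable networks, can be sketched in a few lines. The example assumes the server stamps each message with a per-conversation sequence number starting at 1; that numbering scheme and the class names are assumptions made for illustration, not a description of any particular product. The buffer holds back messages that arrive early and drops duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    seq: int   # per-conversation sequence number (assumed to be server-assigned)
    text: str


@dataclass
class ReorderBuffer:
    next_seq: int = 1
    pending: dict = field(default_factory=dict)

    def receive(self, msg: Message) -> list:
        """Accept a message and return the messages now deliverable in order."""
        if msg.seq < self.next_seq or msg.seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[msg.seq] = msg
        ready = []
        # Release the longest run of consecutive messages starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

If message 2 arrives before message 1, nothing is shown until message 1 lands, at which point both are released together in the right order.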

Signalling servers

Before a video or voice call can begin, the two devices involved need to agree on how to connect: what media formats they support, what network paths are available, and how to reach each other. Signalling servers coordinate this negotiation. They don’t carry media, which flows separately once the connection is established, but without them real-time calls can’t begin. In WebRTC implementations, signalling is handled through a server that exchanges Session Description Protocol (SDP) offers and answers, together with candidate network addresses, between participants before the peer-to-peer connection is established.
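The division of labour here can be sketched with an in-memory relay. The class below is a hypothetical stand-in for a signalling server: it forwards messages between peers in a room and never looks inside them. The room and peer names are invented, and the SDP strings in the usage note are placeholders rather than real session descriptions.

```python
from collections import defaultdict, deque


class SignallingRelay:
    """Forwards opaque signalling messages between peers in the same room."""

    def __init__(self):
        self.rooms = defaultdict(set)      # room id -> peer ids
        self.inboxes = defaultdict(deque)  # peer id -> queued messages

    def join(self, room: str, peer: str) -> None:
        self.rooms[room].add(peer)

    def send(self, room: str, sender: str, kind: str, sdp: str) -> None:
        # The relay only routes the payload; it does not parse or alter it.
        for peer in self.rooms[room]:
            if peer != sender:
                self.inboxes[peer].append({"from": sender, "type": kind, "sdp": sdp})

    def poll(self, peer: str):
        """Return the next queued message for a peer, or None if there is none."""
        inbox = self.inboxes[peer]
        return inbox.popleft() if inbox else None
```

In use, one peer sends an offer with relay.send("room-1", "alice", "offer", "<placeholder offer SDP>"), the other retrieves it with relay.poll("bob") and replies with an "answer" message. Only after that exchange would the two clients attempt to connect and start sending media, which never passes through the relay.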

Media servers

Peer-to-peer WebRTC works well for one-to-one calls. For group calls, recording, transcription, and simulcast — sending different quality streams to participants on different network conditions — a media server is required. Media servers receive streams from participants and redistribute them, handling the bandwidth management and encoding decisions that peer-to-peer connections can’t support at scale. QuickBlox provides media server infrastructure for group video sessions, recording, and AI-assisted transcription workflows.
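A quick count shows why peer-to-peer stops scaling. The sketch compares how many streams each client handles in a full mesh against a setup where a forwarding media server (often called an SFU, a term not used above) fans streams out. It assumes one stream per participant and ignores simulcast, where a client may upload several encodings of the same stream.

```python
def mesh_streams(participants: int) -> dict:
    """Full mesh: each client sends its stream separately to every other client."""
    n = participants
    return {"uplinks_per_client": n - 1, "downlinks_per_client": n - 1}


def forwarded_streams(participants: int) -> dict:
    """Forwarding media server: each client uploads once, the server fans out."""
    n = participants
    return {"uplinks_per_client": 1, "downlinks_per_client": n - 1}


if __name__ == "__main__":
    for n in (2, 6, 12):
        print(n, mesh_streams(n), forwarded_streams(n))
```

For a two-person call the two approaches cost each client the same. At twelve participants, a mesh client uploads eleven copies of its stream, while a client behind a forwarding server still uploads one.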

TURN servers

When two devices try to establish a direct peer-to-peer connection, network conditions sometimes prevent it — corporate firewalls, strict NAT configurations, or mobile network restrictions can block direct media paths. TURN (Traversal Using Relays around NAT) servers act as relay points, routing media traffic between devices that can’t connect directly. Without TURN server infrastructure, a meaningful percentage of video calls fail to connect in real-world network environments. QuickBlox operates TURN server infrastructure as part of the video communication stack — ensuring calls connect reliably across the network conditions real users actually have.
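The fallback to a relay can be illustrated with ICE candidates, the address entries WebRTC clients exchange during signalling. Each candidate carries a type: host for a local address, srflx for an address discovered through the NAT, and relay for an address on a TURN server. The sketch parses the leading fields of a candidate line and picks the highest-priority candidate among the types that are reachable. That selection rule is a deliberate simplification; real ICE runs connectivity checks on candidate pairs. The addresses come from documentation ranges and the priorities are only typical-looking values.

```python
EXAMPLE_CANDIDATES = [
    "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
    "candidate:2 1 udp 1686052607 203.0.113.5 46154 typ srflx",
    "candidate:3 1 udp 41885439 198.51.100.7 3478 typ relay",
]


def parse_candidate(line: str) -> dict:
    """Parse the leading fields of an ICE candidate line."""
    fields = line.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError(f"not an ICE candidate: {line!r}")
    return {
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],  # host, srflx, prflx, or relay
    }


def pick_candidate(lines, reachable_types):
    """Simplified choice: highest-priority candidate whose type is reachable."""
    usable = [c for c in map(parse_candidate, lines) if c["type"] in reachable_types]
    return max(usable, key=lambda c: c["priority"]) if usable else None
```

On an open network, pick_candidate(EXAMPLE_CANDIDATES, {"host", "srflx", "relay"}) returns the host candidate. Behind a firewall that blocks direct paths, only {"relay"} is reachable and the TURN address is chosen. With no relay candidate at all, the function returns None, which corresponds to a call that cannot connect.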

Push notification infrastructure

Real-time messaging only reaches users who are active in the application. Push notifications are what reach everyone else — alerting users to new messages, incoming calls, or events that require their attention when the app is in the background or the device is locked. Push notification infrastructure integrates with platform-specific delivery systems (APNs for iOS, FCM for Android) and handles the reliability, delivery confirmation, and payload management that makes notifications land consistently across devices and operating system versions.
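A minimal sketch of that routing decision follows. The payload shapes match the basic alert format APNs accepts and the message body used by the FCM HTTP v1 API; the user record layout, the function names, and the rule "skip push when the user is online" are assumptions made for illustration. No network calls are made, so the sketch only shows what would be handed to each delivery system.

```python
def apns_payload(title: str, body: str) -> dict:
    """Basic APNs alert payload."""
    return {"aps": {"alert": {"title": title, "body": body}}}


def fcm_message(token: str, title: str, body: str) -> dict:
    """Basic FCM HTTP v1 message body addressed to one device token."""
    return {"message": {"token": token, "notification": {"title": title, "body": body}}}


def route_notification(user: dict, title: str, body: str) -> list:
    """Return (service, device token, payload) tuples for an offline user."""
    if user["online"]:
        return []  # assumed: the live connection already delivered the message
    deliveries = []
    for device in user["devices"]:
        if device["platform"] == "ios":
            deliveries.append(("apns", device["token"], apns_payload(title, body)))
        elif device["platform"] == "android":
            deliveries.append(("fcm", device["token"], fcm_message(device["token"], title, body)))
    return deliveries
```

A user with one iPhone and one Android tablet, both in the background, would produce two deliveries: one payload for APNs and one message for FCM.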

File and media storage

Users share more than text. Images, documents, voice notes, video clips, and attachments are a standard part of modern communication — and each one requires secure upload, storage, retrieval, and access control. In regulated environments, file storage carries the same compliance obligations as message storage: encryption at rest, access logging, retention policies, and BAA coverage where protected health information may be involved.
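One common way to enforce access control on stored files is a signed, expiring download token. The sketch below is a hypothetical scheme built on Python's standard hmac module, not a description of how any particular provider does it. It assumes file and user identifiers never contain a colon, since a colon is used as the field separator.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_download(secret: bytes, file_id: str, user_id: str, expires_at: int) -> str:
    """Return a hex signature binding a file to a user and an expiry time."""
    message = f"{file_id}:{user_id}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_download(secret: bytes, file_id: str, user_id: str, expires_at: int,
                    signature: str, now: Optional[int] = None) -> bool:
    """Accept the token only if it has not expired and the signature matches."""
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    expected = sign_download(secret, file_id, user_id, expires_at)
    # compare_digest avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```

A token issued for one user fails verification if another user presents it, and every token stops working once its expiry time has passed. Each verification is also a natural point at which to write an access log entry.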

Presence and user state

Presence is the infrastructure that makes communication feel live — the online indicator, the typing signal, the last-seen timestamp. It requires real-time state management across all connected clients, with low-latency updates that reflect actual user activity rather than polling at intervals. In healthcare applications, presence has clinical implications: knowing whether a clinician is available before a patient initiates a consultation changes the workflow design of the entire communication layer.
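A minimal presence tracker might look like the sketch below. Connect and disconnect events update state immediately, and a time-to-live on the last heartbeat covers connections that drop without saying goodbye. The class, the 30-second default, and the idea of passing the current time in explicitly (which keeps the example deterministic) are all assumptions made for illustration.

```python
class PresenceTracker:
    """Tracks which users are online from heartbeats and explicit disconnects."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.last_seen = {}  # user id -> time of the most recent heartbeat

    def heartbeat(self, user_id: str, now: float) -> None:
        self.last_seen[user_id] = now

    def disconnect(self, user_id: str) -> None:
        # A clean disconnect takes effect at once, without waiting for the TTL.
        self.last_seen.pop(user_id, None)

    def is_online(self, user_id: str, now: float) -> bool:
        seen = self.last_seen.get(user_id)
        return seen is not None and now - seen <= self.ttl

    def online_users(self, now: float) -> list:
        return sorted(u for u in self.last_seen if self.is_online(u, now))
```

In the clinical scenario above, a patient-facing app could check is_online for a clinician before offering the option to start a consultation.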

Managed Infrastructure vs Building Your Own

The honest version of this decision looks different from how it’s usually framed.

	Building from scratch	Managed infrastructure
Control	Full control over every infrastructure layer	Control over the product experience built on top — less control over the infrastructure underneath
Maintenance burden	Owned entirely by your team — protocols evolve, platforms update, vulnerabilities surface	Carried by the provider — your team inherits updates through SDK and API versions
Compliance ownership	Your team designs, implements, and maintains compliance across every layer	Provider handles infrastructure compliance — your team verifies coverage and configures appropriately
Time to production	Months to years depending on scope and team size	Weeks for initial integration — faster path to a working implementation
Best fit	Teams building communication as the core product differentiator, with engineering capacity to own the infrastructure long term	Teams where communication is a capability the product needs rather than the thing that makes the product distinctive

Building real-time communication infrastructure from scratch is not primarily a cost decision. It is a commitment decision. Every infrastructure layer described earlier on this page is a distinct engineering problem that needs to be solved once, then maintained indefinitely. WebRTC implementations change as browsers update. XMPP extensions evolve. Push notification delivery requirements shift when Apple or Google changes their platform policies. Security vulnerabilities surface in media server software. TURN server configurations that work today need updating when network environments change.

None of that work is visible to users. None of it ships features. All of it needs to happen regardless of what else the development team is building.

The teams we see attempt to build communication infrastructure from scratch tend to fall into one of two groups. The first underestimates the scope, ships something that works at pilot scale, and hits a wall when the product grows. The second has the engineering capacity to build it properly — and eventually asks whether that capacity would have been better spent on the product itself.

Procuring managed communication infrastructure shifts that maintenance burden to the provider. The trade-off is real: less direct control over the infrastructure layer, a dependency on the provider’s reliability and roadmap, and the commercial relationship that comes with it. For teams where communication is a capability the product needs rather than the thing that makes the product distinctive, that trade-off is usually the right one. For teams building communication as the core product — a messaging platform, a video conferencing tool — the calculus is different.

The question worth answering before evaluating providers: which parts of the communication stack does your team actually need to own, and which parts are you owning by default because you haven’t asked the question yet?

Compliance and the Infrastructure Layer

Here is the compliance gap that surfaces most consistently in production deployments and that costs the most to fix after the fact: teams verify that their application is HIPAA compliant without verifying that the communication infrastructure underneath it carries the same coverage.

It’s an easy mistake to make. The application goes through legal review. The hosting environment is verified. The BAA gets signed. But the messaging processing layer — where messages are actually handled and routed — sits in infrastructure that isn’t covered. The media server that processes video streams isn’t covered. The file storage where attachments land isn’t covered. The audit log that would be needed if something went wrong doesn’t exist in a form that satisfies a compliance review.

None of that is visible during normal operation. It surfaces during audits, during incidents, and during the due diligence process when a healthcare organization evaluates the platform before signing a contract. By then, the cost of remediation — re-architecting a messaging layer, renegotiating vendor agreements, rebuilding audit infrastructure — is significantly higher than it would have been at the selection stage.

The practical implication: compliance needs to be verified at the infrastructure layer specifically, not inferred from the application’s overall compliance posture. Which components touch protected health information? Which of those are covered under a BAA? At which plan tier? Those questions need answers before the integration is built, not after it is running in production.
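Those questions can be turned into a simple inventory check. The sketch below lists every component that touches protected health information and flags those without BAA coverage. The component names and coverage flags are invented sample data, not a statement about any vendor; the real values have to come from contracts and vendor documentation.

```python
def uncovered_components(components: list) -> list:
    """Return names of components that touch PHI but lack BAA coverage."""
    return sorted(
        c["name"] for c in components if c["touches_phi"] and not c["baa_covered"]
    )


# Hypothetical inventory for illustration only.
SAMPLE_INVENTORY = [
    {"name": "application hosting", "touches_phi": True, "baa_covered": True},
    {"name": "message routing", "touches_phi": True, "baa_covered": False},
    {"name": "media server", "touches_phi": True, "baa_covered": False},
    {"name": "file storage", "touches_phi": True, "baa_covered": True},
    {"name": "marketing site", "touches_phi": False, "baa_covered": False},
]

if __name__ == "__main__":
    for name in uncovered_components(SAMPLE_INVENTORY):
        print("needs BAA coverage:", name)
```

An empty result is the target state before the integration is built. Anything the function returns is a gap to close at the selection stage.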

For a full treatment of compliance requirements in the context of messaging infrastructure, see What Is a HIPAA Compliant Chat API?

For video infrastructure compliance, see What Is HIPAA-Compliant Video Conferencing?

AI Agents and Communication Infrastructure

AI agents are increasingly operating within the same infrastructure layer as messaging and video — not as a separate system bolted on top, but as a component of the communication stack itself.

The infrastructure decision and the AI agent decision are not independent. AI agents that share the same backend infrastructure as chat and video — same identity model, same compliance coverage, same audit trail — integrate more cleanly and are easier to govern than agents assembled on top of infrastructure that wasn’t designed to support them. That shared foundation is what separates AI communication capability that works in production from AI capability that works in a demo.

For a broader introduction, see What Is an AI Agent? For a healthcare-specific example, see Healthcare AI Agent.

The QuickBlox Perspective

Real-time communication infrastructure is one of the few areas where the gap between what works in a prototype and what holds up in production is genuinely large — and where the cost of discovering that gap late is genuinely high.

The teams that navigate this well share a consistent characteristic: they treat communication infrastructure as an architectural decision rather than a feature selection. They define their compliance requirements before evaluating providers. They model their scalability needs against production projections rather than pilot numbers. They think about the full stack — messaging, video, push notifications, file storage, media servers, TURN servers — rather than the two or three capabilities that are visible in the initial use case.

What changes when you think about it that way: the shortlist looks different. The evaluation criteria look different. And the integration decisions get made in the right order — compliance architecture before feature selection, infrastructure before application, scale requirements before vendor commitment.

QuickBlox provides managed real-time communication infrastructure covering the full stack: messaging, video, push notifications, file and media storage, media servers, and TURN server infrastructure — with HIPAA-compliant hosting, flexible deployment options including private cloud and on-premise, and a single BAA covering the complete communication layer.

Explore QuickBlox Chat API, QuickBlox Video Calling API, or browse the full QuickBlox API and SDK documentation to see how the full infrastructure stack fits your specific deployment.

Common Questions About Real-Time Communication Infrastructure

What is RTC infrastructure?

RTC infrastructure is shorthand for real-time communication infrastructure — the backend systems that power live messaging, voice, and video inside applications. It covers everything from messaging servers and signalling infrastructure to media servers, TURN servers, push notifications, file storage, and presence management. The abbreviation is commonly used by developers and architects when discussing the communication layer of an application stack.

What components make up real-time communication infrastructure?

Most production communication infrastructure consists of messaging infrastructure, signalling servers, media servers, TURN servers, push notification delivery, file storage, data persistence, and presence management.

What is the difference between a signalling server and a media server?

A signalling server coordinates how a call begins, negotiating connection parameters between devices before media flows. A media server processes and routes the actual audio and video streams once the call is underway, handling group sessions, recording, and transcription. Both are required for production video infrastructure; they handle different stages of the same call.

What is a TURN server and why is it needed?

TURN (Traversal Using Relays around NAT) servers relay media traffic between devices that can’t establish a direct peer-to-peer connection, typically because of firewalls or restrictive network configurations. Without TURN server infrastructure, a meaningful percentage of video calls fail to connect in real-world network environments. TURN servers are a standard component of production WebRTC deployments.

What protocols does real-time communication infrastructure use?

The primary protocols are WebSockets for real-time messaging, WebRTC for audio and video communication, XMPP for messaging and presence, and SRTP/DTLS for media encryption. These open standards handle the fundamental problems of real-time data exchange across networks and form the foundation of most production communication infrastructure.

Does real-time communication infrastructure need to be HIPAA compliant?

In healthcare deployments, yes — and the compliance obligation extends to every component of the infrastructure that touches protected health information, not just the application layer above it. Messages, recordings, file attachments, and audit logs all carry compliance obligations. BAA coverage needs to extend to the infrastructure layer specifically.

Should I build real-time communication infrastructure or use a managed provider?

The decision turns on how much of the infrastructure stack your team wants to own and maintain over the long term. Building from scratch provides maximum control but requires ongoing engineering commitment across every infrastructure layer. Managed infrastructure shifts that burden to the provider while retaining the ability to build the product experience on top. For most teams where communication is a capability rather than the core product differentiator, managed infrastructure is the more practical path.
