Best Practices for Securing RTP Media Signaled with SIP

Neustar, Inc.

jon.peterson@team.neustar

Cisco

rlb@ipv.sx

Vigil Security, LLC

housley@vigilsec.com

Keywords: SIP, RTP, security

Although the Session Initiation Protocol (SIP) includes a suite of security services that has been expanded by numerous specifications over the years, there is no single place that explains how to use SIP to establish confidential media sessions. Additionally, existing mechanisms have some feature gaps that need to be identified and resolved in order for them to address the pervasive monitoring threat model. This specification describes best practices for negotiating confidential media with SIP, including a comprehensive protection solution that binds the media layer to SIP-layer identities.

Introduction

The Session Initiation Protocol (SIP) includes a suite of security services, such as Digest Authentication for authenticating entities with a shared secret, TLS for transport security, and (optionally) S/MIME for body security. SIP is frequently used to establish media sessions, in particular audio or audiovisual sessions, which have their own security mechanisms available, such as the Secure Real-time Transport Protocol (SRTP). However, binding security at the media layer to security at the SIP layer, so as to assure that protection is in place all the way up the stack, relies on a great many external security mechanisms and practices. This document explains their optimal use as a best practice.

Revelations about widespread pervasive monitoring of the Internet have led to a greater desire to protect Internet communications. To maximize the use of security features, especially media confidentiality, opportunistic measures serve as a stopgap when a full suite of services cannot be negotiated all the way up the stack. Opportunistic media security for SIP is described in , which builds on the prior efforts of . With opportunistic encryption, there is an attempt to negotiate the use of encryption, but if the negotiation fails, then cleartext is used. Opportunistic encryption approaches typically have no integrity protection for the keying material.

This document contains the SIP Best-practice Recommendations Against Network Dangers to privacY (SIPBRANDY) profile of Secure Telephone Identity Revisited (STIR) for media confidentiality. The profile provides a comprehensive security solution for SIP media that includes integrity protection for keying material and offers application-layer assurance that media confidentiality is in place.
Various specifications that User Agents (UAs) must implement to support media confidentiality are given in the sections below; a summary of the best current practices appears in .

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Security at the SIP and SDP Layer

There are two approaches to providing confidentiality for media sessions set up with SIP: comprehensive protection and opportunistic security (as defined in ). This document only addresses comprehensive protection.

Comprehensive protection for media sessions established by SIP requires the interaction of three protocols: the Session Initiation Protocol (SIP), the Session Description Protocol (SDP), and the Real-time Transport Protocol (RTP), in particular its secure profile SRTP. Broadly, SIP is responsible for providing integrity protection for the media keying attributes conveyed by SDP. Those attributes in turn identify the keys used by endpoints in the RTP media session(s) that SDP negotiates. In that way, once SIP and SDP have exchanged the necessary information to initiate a session, media endpoints will have a strong assurance that the keys they exchange have not been tampered with by third parties and that end-to-end confidentiality is available. Note that this framework does not apply to keys that also require confidentiality protection in the signaling layer, such as the SDP "k=" line, which MUST NOT be used in conjunction with this profile.

To establish the identity of the endpoints of a SIP session, this specification uses STIR. The STIR Identity header field has been designed to prevent a class of impersonation attacks that are commonly used in robocalling, voicemail hacking, and related threats. STIR generates a signature over certain features of SIP requests. These include header field values that contain an identity for the originator of the request (such as the From header field or the P-Asserted-Identity header field) and, if present, the media keys in SDP.
As currently defined, STIR provides a signature over the "a=fingerprint" attribute, which is a fingerprint of the key used by DTLS-SRTP. Consequently, STIR offers comprehensive protection for SIP sessions, in concert with SDP and SRTP, only when DTLS-SRTP is the media security service. The underlying Personal Assertion Token (PASSporT) object used by STIR is extensible, however, and it would be possible to provide signatures over other SDP attributes that contain alternate keying material. A profile for using STIR to provide media confidentiality is given in .
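To make the link between SDP and the STIR signature concrete, the sketch below collects every "a=fingerprint" attribute from an SDP body and shapes it into a PASSporT "mky" claim value. Each entry is an object with "alg" (the hash function) and "dig" (the fingerprint). This is a sketch under stated assumptions, not a definitive implementation. The function names are hypothetical. Removing duplicate entries and sorting by "alg" and then "dig" are assumptions of this sketch that should be checked against the PASSporT specification.

```python
import re

# Matches "a=fingerprint:<hash-function> <hex:hex:...>" at session or media level.
_FP_RE = re.compile(r"^a=fingerprint:(\S+)\s+([0-9A-Fa-f:]+)\s*$")

def sdp_fingerprints(sdp: str) -> list[tuple[str, str]]:
    """Return (hash-function, fingerprint) pairs for each a=fingerprint line."""
    pairs = []
    for line in sdp.splitlines():          # handles CRLF line endings too
        m = _FP_RE.match(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

def mky_claim(sdp: str) -> list[dict[str, str]]:
    """Build a hypothetical "mky" value: one {"alg", "dig"} object per
    distinct fingerprint, sorted by alg then dig (assumed ordering)."""
    unique = sorted(set(sdp_fingerprints(sdp)))
    return [{"alg": alg, "dig": dig} for alg, dig in unique]
```

Because the claim is derived from all fingerprint attributes, an SDP body with several "m=" lines and several fingerprints still yields a single claim that the signature covers.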

STIR Profile for Endpoint Authentication and Verification Services

STIR defines the Identity header field for SIP, which provides a cryptographic attestation of the source of communications. This document includes a profile of STIR, called the SIPBRANDY profile, in which the STIR verification service acts in concert with an SRTP media endpoint. Together they ensure that the key fingerprints given in SDP match the keys exchanged to establish DTLS-SRTP. To satisfy this condition, the verification service function is implemented in the SIP User Agent Server (UAS), which is composed with the media endpoint.

If the STIR authentication service or verification service functions are implemented at an intermediary rather than an endpoint, the intermediary could act as a man in the middle and alter key fingerprints. Because this attack is not in STIR's core threat model, which focuses on impersonation rather than man-in-the-middle attacks, STIR offers no specific protections against such interference. The SIPBRANDY profile for media confidentiality therefore shifts these responsibilities from intermediaries to the endpoints. While intermediaries MAY provide the verification service function of STIR for SIPBRANDY transactions, the verification needs to be repeated at the endpoint to obtain end-to-end assurance. Intermediaries supporting this specification MUST NOT block or otherwise redirect calls if they do not trust the signing credential. The SIPBRANDY profile is based on an end-to-end trust model, so it is up to the endpoints, not intermediaries, to determine whether they support signing credentials.

In order to be compliant with best practices for SIP media confidentiality with comprehensive protection, UA implementations MUST implement both the authentication service and verification service roles described in .
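The endpoint check described above can be sketched as follows. The verification service, composed with the media endpoint, hashes the certificate that the peer presented in the DTLS handshake and compares the result with the "a=fingerprint" values from the signed SDP. This is a minimal sketch under these assumptions: the DER-encoded peer certificate is obtained from the DTLS stack (not shown), and only the listed hash functions are handled. The helper names are hypothetical.

```python
import hashlib
import hmac

# SDP hash-function tokens mapped to hashlib names (subset chosen for this sketch).
_HASHES = {"sha-1": "sha1", "sha-256": "sha256", "sha-384": "sha384", "sha-512": "sha512"}

def fingerprint_of(cert_der: bytes, hash_func: str) -> str:
    """Format a certificate digest as SDP does: uppercase hex bytes joined by colons."""
    digest = hashlib.new(_HASHES[hash_func.lower()], cert_der).digest()
    return ":".join(f"{b:02X}" for b in digest)

def dtls_cert_matches(cert_der: bytes, sdp_fps: list[tuple[str, str]]) -> bool:
    """True if the DTLS peer certificate matches at least one signed a=fingerprint."""
    for hash_func, expected in sdp_fps:
        if hash_func.lower() not in _HASHES:
            continue  # unsupported hash function: this attribute cannot be checked
        actual = fingerprint_of(cert_der, hash_func)
        if hmac.compare_digest(actual, expected.upper()):
            return True
    return False
```

If this check fails, the keys in the media session are not the ones bound to the signed SDP, and the endpoint cannot claim comprehensive protection.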
STIR authentication services MUST signal their compliance with this specification by adding the "msec" claim defined in this specification to the PASSporT payload. Implementations MUST provide key fingerprints in SDP and the appropriate signatures over them as specified in . When generating either an offer or an answer , compliant implementations MUST include an "a=fingerprint" attribute containing the fingerprint of an appropriate key (see ).
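The sketch below shows how an authentication service in the UA might assemble the part of a SIPBRANDY PASSporT that gets signed: the base64url-encoded header and payload joined by a dot. It rests on several assumptions. "msec" is carried as the PASSporT "ppt" value, following the extension registry mentioned in the IANA section. The identities are SIP URIs rather than telephone numbers. The x5u URL and all example values are hypothetical. The ES256 signing step and the construction of the Identity header field are deliberately omitted.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def canonical_json(obj) -> bytes:
    """Serialize with keys in lexicographic order and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def msec_signing_input(orig_uri: str, dest_uri: str, iat: int,
                       mky: list, x5u: str) -> str:
    """Return the "header.payload" string that an ES256 signer would sign."""
    header = {"alg": "ES256", "ppt": "msec", "typ": "passport", "x5u": x5u}
    payload = {
        "dest": {"uri": [dest_uri]},  # destination identities are given as an array
        "iat": iat,                   # issued-at time in seconds since the epoch
        "mky": mky,                   # fingerprints of the DTLS-SRTP keys
        "orig": {"uri": orig_uri},    # originating identity
    }
    return b64url(canonical_json(header)) + "." + b64url(canonical_json(payload))

# Hypothetical values; the "dig" value is a truncated placeholder fingerprint.
signing_input = msec_signing_input(
    "sip:alice@example.com", "sip:bob@example.com", 1600000000,
    [{"alg": "sha-256", "dig": "4A:AD:B9"}], "https://cert.example.org/alice.cer")
```

Canonical serialization keeps the signed bytes reproducible, so a verifier that rebuilds the header and payload from the request can check the signature over exactly the same input.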

Credentials

In order to implement the authentication service function in the UA, SIP endpoints will need to acquire the credentials needed to sign for their own identity. That identity is typically carried in the From header field of a SIP request. It contains either a greenfield SIP URI (e.g., "sip:alice@example.com") or a telephone number, which can appear in a variety of ways (e.g., "sip:+17004561212@example.com;user=phone"). contains guidance for separating the two and determining what sort of credential is needed to sign for each.

To date, few commercial certification authorities (CAs) issue certificates for SIP URIs or telephone numbers. Although work on systems for this purpose is ongoing (such as ), it is not yet mature enough to be recommended as a best practice. This is one reason why STIR permits intermediaries to act as an authentication service on behalf of an entire domain, just as in SIP a proxy server can provide domain-level SIP service. CAs could offer proof-of-possession certificates for SIP, similar to those used for email, for either greenfield identifiers or telephone numbers; however, this specification does not require their use.

For users who do not possess such certificates, DTLS-SRTP permits the use of self-signed public keys. The SIPBRANDY profile employs the more relaxed authority requirements of to allow the use of self-signed public keys for authentication services that are composed with UAs. It does so by generating a certificate (per the guidance in ) with a subject corresponding to the user's identity. To obtain comprehensive protection with a self-signed certificate, some out-of-band verification is needed as well. Relying parties could use such a credential for trust on first use (see ).
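As a rough illustration of separating greenfield SIP URIs from telephone numbers before choosing a credential, the sketch below applies a deliberately simplified, hypothetical heuristic. A "tel:" URI, or a SIP URI carrying ";user=phone", is treated as a telephone number and normalized to a leading "+" plus digits. Any other SIP URI is treated as greenfield. This is not the full procedure in the guidance referenced above. It assumes a bare URI without angle brackets or header-field parameters.

```python
def _normalize_tn(number: str) -> str:
    """Keep a leading "+" and the digits; drop visual separators such as "-" or "."."""
    digits = "".join(ch for ch in number if ch in "0123456789")
    return ("+" + digits) if number.startswith("+") else digits

def classify_identity(uri: str) -> tuple[str, str]:
    """Classify a From URI as ("tn", number) or ("uri", uri) (simplified heuristic)."""
    scheme, _, rest = uri.partition(":")
    scheme = scheme.lower()
    if scheme == "tel":
        return ("tn", _normalize_tn(rest.split(";", 1)[0]))
    if scheme not in ("sip", "sips"):
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    user, at, hostpart = rest.partition("@")
    if not at:                       # no user part, e.g. "sip:example.com"
        return ("uri", uri)
    params = [p.strip().lower() for p in hostpart.split(";")[1:]]
    if "user=phone" in params:
        # Drop user-part parameters (e.g. ";phone-context=...") before normalizing.
        return ("tn", _normalize_tn(user.split(";", 1)[0]))
    return ("uri", uri)
```

For example, `classify_identity("sip:+17004561212@example.com;user=phone")` yields `("tn", "+17004561212")`, which would call for a telephone-number credential rather than one issued for a SIP URI.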
Note that relying parties SHOULD NOT use certificate revocation mechanisms or real-time certificate verification systems for self-signed certificates, as they will not increase confidence in the certificate. Users who wish to remain anonymous can instead generate self-signed certificates as described in . Generally speaking, without access to out-of-band information about which certificates were issued to whom, it will be very difficult for relying parties to ascertain whether or not the signer of a SIP request is genuinely an "endpoint". Even the term "endpoint" is a problematic one, as SIP UAs can be composed in a variety of architectures and may not be devices under direct user control. While it is possible that techniques based on certificate transparency or similar practices could help UAs to recognize one another's certificates, those operational systems will need to ramp up with the CAs that issue credentials to end-user devices going forward.
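The trust-on-first-use approach mentioned above can be sketched as a small pin store. On first contact, the store remembers the fingerprint of a peer's self-signed certificate. Later contacts are reported as a match or a mismatch. Consistent with the text, the store performs no revocation or real-time certificate lookups. It also never pins the one-time anonymous identity. The class and its string results are hypothetical. Identities are compared as exact strings, and SIP URI normalization is out of scope.

```python
ANONYMOUS_URI = "sip:anonymous@anonymous.invalid"

class TofuStore:
    """Trust-on-first-use pins for self-signed certificate fingerprints."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def check(self, identity: str, fingerprint: str) -> str:
        """Return "new", "match", "mismatch", or "anonymous" (never pinned)."""
        fingerprint = fingerprint.upper()
        if identity == ANONYMOUS_URI:
            return "anonymous"             # one-time certificates must not be pinned
        pinned = self._pins.get(identity)
        if pinned is None:
            self._pins[identity] = fingerprint
            return "new"                   # first contact: pin (ideally after out-of-band check)
        return "match" if pinned == fingerprint else "mismatch"
```

How to act on a "mismatch" (warn the user, refuse the call, or re-pin after out-of-band confirmation) is left to local policy in this sketch.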

Anonymous Communications

In some cases, the identity of the initiator of a SIP session may be withheld due to user or provider policy. Following the recommendations of , this may involve using an identity such as "anonymous@anonymous.invalid" in the identity fields of a SIP request. does not currently permit authentication services to sign for requests that supply this identity. It does, however, permit signing for valid domains, such as "anonymous@example.com", as a way of implementing an anonymization service as specified in .

Even for anonymous sessions, providing media confidentiality and partial SDP integrity is still desirable. One-time self-signed certificates for anonymous communications SHOULD include a subjectAltName of "sip:anonymous@anonymous.invalid". After a session is terminated, the certificate SHOULD be discarded, and a new one, with fresh keying material, SHOULD be generated before each future anonymous call. As with self-signed certificates, relying parties SHOULD NOT use certificate revocation mechanisms or real-time certificate verification systems for anonymous certificates, as they will not increase confidence in the certificate.

Note that when using one-time anonymous self-signed certificates, any man in the middle could strip the Identity header field and replace it with one signed by its own one-time certificate, changing the "mky" parameters of PASSporT and any "a=fingerprint" attributes in SDP as it chooses. The signature therefore only protects against non-Identity-aware entities that might modify SDP without altering the PASSporT conveyed in the Identity header field.
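The lifecycle rules above (one credential per anonymous call, discarded at termination, fresh keying material for the next call) can be sketched as follows. This is a sketch under loud assumptions. Random bytes from `secrets.token_bytes` stand in for real key generation and self-signed X.509 certificate creation, which would need a certificate library. Dropping the Python reference only discards the credential logically and does not securely erase memory. All class and method names are hypothetical.

```python
import secrets
from dataclasses import dataclass

ANON_SAN = "sip:anonymous@anonymous.invalid"

@dataclass(frozen=True)
class OneTimeCredential:
    subject_alt_name: str
    key_material: bytes   # placeholder for a real private key (assumption)

class AnonymousCredentialManager:
    """Issue a fresh one-time credential per anonymous call and discard it afterwards."""

    def __init__(self) -> None:
        self._active: dict[str, OneTimeCredential] = {}

    def credential_for_call(self, call_id: str) -> OneTimeCredential:
        cred = self._active.get(call_id)
        if cred is None:
            # Fresh keying material for every new anonymous call.
            cred = OneTimeCredential(ANON_SAN, secrets.token_bytes(32))
            self._active[call_id] = cred
        return cred

    def call_terminated(self, call_id: str) -> None:
        self._active.pop(call_id, None)   # discard; never reuse for a later call

    def active_calls(self) -> int:
        return len(self._active)
```

Keying the store by call ID lets every request within one anonymous session reuse the same credential, while guaranteeing that a new call never does.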

Connected Identity Usage

STIR provides integrity protection for the fingerprint attributes in SIP request bodies but not in SIP responses. Therefore, when a session is established, any SDP body carried by a 200-class response in the backwards direction will not be protected by an authentication service and cannot be verified. Sending a secured SDP body in the backwards direction thus requires an extra round trip, typically a request sent in the backwards direction.

explored the problem of providing "connected identity" to implementations of (which is obsoleted by ); uses a provisional or mid-dialog UPDATE request in the backwards (reverse) direction to convey an Identity header field for the recipient of an INVITE. The procedures in are largely compatible with the revision of the Identity header in . However, the following points need to be considered:

The UPDATE carrying signed SDP with a fingerprint in the backwards direction needs to be sent during dialog establishment, following the receipt of a Provisional Response Acknowledgement (PRACK) after a provisional 1xx response.
For use with this SIPBRANDY profile for media confidentiality, the UAS that responds to the INVITE request needs to act as an authentication service for the UPDATE sent in the backwards direction.
Per the text in regarding the receipt at a User Agent Client (UAC) of error code 428, 436, 437, or 438 in response to a mid-dialog request, it is RECOMMENDED that the dialog be treated as terminated. However, allows the retransmission of requests with repairable error conditions. In particular, an authentication service might retry a mid-dialog request rather than treating the dialog as terminated, although only one such retry is permitted.
Note that the examples in are based on and will not match signatures using .
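The error-handling rule in the list above can be sketched as a small decision function. It covers the UAS, acting as authentication service, after it receives a response to its backwards UPDATE. That UPDATE is sent during dialog establishment, after a reliable provisional response has been acknowledged with PRACK; this sequencing is assumed and not modeled. The status-code names in the comment are those of the SIP Identity specification as we understand it. The action strings and the treatment of other failures as local policy are hypothetical choices of this sketch.

```python
# Identity-related response codes: 428 Use Identity Header, 436 Bad Identity Info,
# 437 Unsupported Credential, 438 Invalid Identity Header.
IDENTITY_ERRORS = frozenset({428, 436, 437, 438})
MAX_RETRIES = 1  # only one retry of the mid-dialog request is permitted

def next_action(status: int, retries_so_far: int) -> str:
    """Decide what to do after a response to the backwards UPDATE."""
    if status < 200:
        return "wait"                         # provisional response: keep waiting
    if 200 <= status < 300:
        return "connected-identity-established"
    if status in IDENTITY_ERRORS:
        # Repairable identity error: retry once, then treat the dialog as terminated.
        return "retry" if retries_so_far < MAX_RETRIES else "terminate"
    return "local-policy"                     # other failures are outside this sketch
```

Capping retries at one avoids loops between two UAs that disagree about a credential.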

Future work may be done to revise for STIR; that work should take into account any impacts on the SIPBRANDY profile described in this document. The use of has some further interactions with Interactive Connectivity Establishment (ICE) ; see .

Authorization Decisions

grants STIR verification services a great deal of latitude when making authorization decisions based on the presence of the Identity header field. It is largely a matter of local policy whether an endpoint rejects a call based on the absence of an Identity header field, or even on the presence of a header that fails an integrity check against the request. For this SIPBRANDY profile of STIR, however, the rule is stricter. Consider a compliant verification service that receives a dialog-forming SIP request containing an Identity header field with a PASSporT type of "msec". After validating the request per the steps described in , it MUST reject the request with the appropriate status code per if any part of that validation fails. If the request is valid and the terminating user accepts it, the terminating UA MUST follow the steps in to act as an authentication service. It sends a signed request with the "msec" PASSporT type in its Identity header field, in order to enable end-to-end bidirectional confidentiality.

For the purposes of this profile, authentication services can use the "msec" PASSporT type in one of two ways: as a mandatory request for media security or as a merely opportunistic request for media security. Any verification service that receives an Identity header field with an unrecognized PASSporT type will simply ignore that Identity header field. An authentication service therefore learns whether the terminating side supports "msec" from whether its UA receives a signed request in the backwards direction per . If no such request is received, the UA may do one of two things. It may shut down the dialog, if its policy requires the terminating side to support "msec" for this dialog. Alternatively, if policy permits (e.g., after explicit acceptance by the user), it may allow the dialog to continue without media security.
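Both sides of the authorization logic above can be summarized in a sketch. On the terminating side, an "msec" PASSporT that fails validation leads to rejection; a valid one leads to accepting the call and signing in the backwards direction. On the originating side, the outcome depends on whether a signed backwards request arrived and on local policy. The function names, result strings, and the notion of a "window" for the backwards request are hypothetical. The specification does not say how long to wait. The failure status code is taken as an input chosen by the validation procedure, not computed here.

```python
from enum import Enum
from typing import Optional, Tuple

class Outcome(Enum):
    CONTINUE_SECURE = "continue-secure"
    CONTINUE_INSECURE = "continue-insecure"
    TERMINATE = "terminate"

def verify_incoming(ppt: Optional[str],
                    failure_status: Optional[int]) -> Tuple[str, Optional[int]]:
    """Terminating side. failure_status is None if PASSporT validation succeeded."""
    if ppt != "msec":
        return ("local-policy", None)         # ordinary STIR latitude applies
    if failure_status is not None:
        return ("reject", failure_status)     # MUST reject on any validation failure
    return ("accept-and-sign-backwards", None)  # if the user accepts the call

def originating_outcome(signed_backwards: bool, policy_requires_msec: bool,
                        user_accepts_insecure: bool) -> Outcome:
    """Originating side, once the (hypothetical) window for a signed backwards request has passed."""
    if signed_backwards:
        return Outcome.CONTINUE_SECURE
    if policy_requires_msec or not user_accepts_insecure:
        return Outcome.TERMINATE
    return Outcome.CONTINUE_INSECURE
```

Continuing without media security is treated as the exception: it requires both that policy does not mandate "msec" and that the user explicitly accepts.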

Media Security Protocols

As there are several ways to negotiate media security with SDP, any of which might be used with either opportunistic or comprehensive protection, further guidance to implementers is needed. In , opportunistic approaches considered include DTLS-SRTP, security descriptions, and ZRTP. Support for DTLS-SRTP is REQUIRED by this specification. The "mky" claim of PASSporT provides integrity protection for "a=fingerprint" attributes in SDP, including cases where multiple "a=fingerprint" attributes appear in the same SDP.
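On the verifying side, the integrity protection described above implies a comparison between the signed "mky" claim and the fingerprint attributes actually present in the SDP of the same request. The sketch below performs that comparison. It assumes the SDP fingerprints have already been parsed into (hash-function, fingerprint) pairs. Requiring exact set equality, so that every SDP fingerprint is covered and no extra one is claimed, is our assumption. So is the case normalization (lowercase hash-function tokens, uppercase hex). The function name is hypothetical.

```python
def mky_matches_sdp(mky: list, sdp_fingerprints: list) -> bool:
    """Check a received PASSporT "mky" claim against the SDP a=fingerprint pairs.

    mky: list of {"alg": ..., "dig": ...} objects from the PASSporT payload.
    sdp_fingerprints: (hash-function, fingerprint) pairs parsed from the SDP.
    """
    try:
        claimed = {(e["alg"].lower(), e["dig"].upper()) for e in mky}
    except (KeyError, TypeError, AttributeError):
        return False                  # malformed claim
    present = {(alg.lower(), dig.upper()) for alg, dig in sdp_fingerprints}
    # Every SDP fingerprint must be covered, and nothing extra may be claimed.
    return bool(present) and claimed == present
```

A failed comparison is one of the validation failures that, for an "msec" PASSporT, leads the verification service to reject the request.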

Relayed Media and Conferencing

Providing end-to-end media confidentiality for SIP is complicated by the presence of many forms of media relays. While many media relays merely proxy media to a destination, others present themselves as media endpoints and terminate security associations before re-originating media to its destination.

Centralized conference bridges are one type of entity that typically terminates a media session in order to mux media from multiple sources and then re-originate the muxed media to conference participants. In many such implementations, only hop-by-hop media confidentiality is possible. Work is ongoing to specify a means to encrypt both (1) the hop-by-hop media between a UA and a centralized server and (2) the end-to-end media between UAs, but it is not sufficiently mature at this time to become a best practice. Those protocols are expected to identify their own best-practice recommendations as they mature.

Another class of entities that might relay SIP media are Back-to-Back User Agents (B2BUAs). If B2BUAs follow the guidance in , they may be able to act as media relays while still permitting end-to-end confidentiality between UAs.

Ultimately, if an endpoint can decrypt media it receives, then that endpoint can forward the decrypted media without the knowledge or consent of the media's originator. No media confidentiality mechanism can protect against such relayed disclosures. Nor can it protect against a legitimate endpoint that decrypts media and records a copy to be sent elsewhere (see ).

ICE and Connected Identity

Providing confidentiality for media with comprehensive protection requires careful timing of when media streams should be sent and when a user interface should signify that confidentiality is in place.

In order to best enable end-to-end connectivity between UAs and to avoid media relays as much as possible, implementations of this specification MUST support ICE . To speed up call establishment, it is RECOMMENDED that implementations support Trickle ICE . In the comprehensive protection case, the use of connected identity with ICE implies that the answer containing the key fingerprints, and thus the STIR signature, will not arrive in any earlier SDP body. Instead, it arrives in an UPDATE sent in the backwards direction, following a provisional response and a PRACK. In this case, the media keys are considered exchanged only when that UPDATE is received.

Similarly, in order to prevent, or at least mitigate, the denial-of-service attack described in , this specification incorporates best practices for ensuring that recipients of media flows have consented to receive such flows. Implementations of this specification MUST implement the Session Traversal Utilities for NAT (STUN) usage for consent freshness defined in .
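The consent freshness requirement above can be sketched as a per-path tracker. It records when a STUN consent check last succeeded and stops media transmission once consent has expired. The 30-second expiry and the roughly 5-second check interval, randomized between 0.8 and 1.2 times that value, reflect our understanding of the STUN consent freshness specification. Treat them as assumptions to verify. Time is passed in explicitly (for example, from a monotonic clock) so the logic stays testable. The class is hypothetical and omits sending the STUN Binding requests themselves.

```python
import random

CONSENT_TIMEOUT = 30.0   # seconds without a successful consent check (assumed value)
CHECK_INTERVAL = 5.0     # nominal seconds between consent checks (assumed value)

class ConsentTracker:
    """Track STUN consent freshness for one media path (sketch)."""

    def __init__(self, now: float) -> None:
        self._last_success = now      # ICE connectivity check counts as initial consent

    def on_consent_response(self, now: float) -> None:
        """Record a successful STUN consent check response."""
        self._last_success = now

    def may_send(self, now: float) -> bool:
        """Media may be sent only while consent has not expired."""
        return now - self._last_success < CONSENT_TIMEOUT

    @staticmethod
    def next_check_delay(rng: random.Random) -> float:
        """Randomized delay before the next consent check, to avoid synchronization."""
        return CHECK_INTERVAL * rng.uniform(0.8, 1.2)
```

Gating transmission on `may_send` keeps a UA from flooding a destination that never agreed to receive media. This is the denial-of-service concern the consent mechanism addresses.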

Best Current Practices

The following are the best practices for SIP UAs to provide media confidentiality for SIP sessions.

Implementations MUST support the SIPBRANDY profile as defined in and signal such support in PASSporT via the "msec" header element.
Implementations MUST follow the authorization decision behavior described in .
Implementations MUST support DTLS-SRTP for management of keys, as described in .
Implementations MUST support ICE and the STUN consent freshness mechanism, as specified in .

IANA Considerations

This specification defines a new value for the "Personal Assertion Token (PASSporT) Extensions" registry called "msec". IANA has added the entry to the registry with a value pointing to this document.

Security Considerations

This document describes the security features that provide confidentiality, integrity, and authentication for media sessions established with SIP.

References

Normative References

Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol

Session Description Protocol (SDP) Offer/Answer Procedures for Interactive Connectivity Establishment (ICE)

A Session Initiation Protocol (SIP) Usage for Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (Trickle ICE)

Informative References

Acknowledgements

We thank , , , and for contributions to this problem statement and framework. We thank and for their careful review.