Ubiquitous worldwide broadband Internet access as well as the coming of age of VoIP technology have made Voice-over-IP an increasingly attractive and useful network application. Currently the “human-readable” Session Initiation Protocol (SIP), which is based on a simple HTTP-like request/response exchange, is steadily gaining ground on the considerably more complex ASN.1-encoded H.323 multimedia ITU-T standard introduced by the telecom industry some years ago. Unfortunately, little attention has been given to the security aspects of running a phone connection over the public Internet. This paper gives a comparative overview of the security mechanisms recommended by the SIP standard and presents a practical SIP implementation realized at the Zürcher Hochschule Winterthur (ZHW), based on S/MIME authentication and encryption of the session initiation and subsequent protection of the media channels using the Secure Real-time Transport Protocol (SRTP).

1 The Session Initiation Protocol (SIP)

Due to its simple and fast session setup mechanism, the Session Initiation Protocol (SIP) [Ro02] has quickly made large inroads into the Voice-over-IP (VoIP) market previously dominated by implementations adhering to the rather complex H.323 ITU-T Internet telephony standard. Whereas H.323 closely models a traditional ISDN Layer 3 call setup and uses ASN.1-coded binary messages for signalling, SIP is based on an HTTP-like request/response transaction model using human-readable ASCII messages with a syntax nearly identical to HTTP/1.1 [Fi99]. Figure 10 depicts an example of a SIP INVITE request which includes all the information required to set up an audio connection.

1.1 Example SIP Session

Figure 1 shows a typical SIP message exchange scenario between two users, Alice and Bob, belonging to the domains atlanta.com and biloxi.com, respectively.
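To make the human-readable, HTTP-like message format concrete, the following minimal sketch assembles and parses a bare INVITE from Alice to Bob. It is an illustration only: the function names, the host name pc33.atlanta.com, and the tag, branch and Call-ID values are placeholder assumptions, and the SDP body that a real INVITE would carry to describe the audio session is omitted.

```python
def build_invite(caller, callee, call_id, branch):
    """Assemble a minimal SIP INVITE as CRLF-separated ASCII text (no SDP body)."""
    lines = [
        f"INVITE {callee} SIP/2.0",                             # request line, as in HTTP
        f"Via: SIP/2.0/UDP pc33.atlanta.com;branch={branch}",   # placeholder host
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag=1928301774",                     # placeholder tag
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # An empty line separates the headers from the (here empty) body
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(message):
    """Split a SIP request into method, request URI, version and a header dict."""
    head, _, _body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    method, request_uri, version = start_line.split(" ")
    headers = {}
    for line in header_lines:
        # Only the first colon separates name and value; SIP URIs contain colons too
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers


invite = build_invite("sip:alice@atlanta.com", "sip:bob@biloxi.com",
                      "a84b4c76e66710", "z9hG4bK776asdhds")
method, request_uri, version, headers = parse_request(invite)
```

Because the message is plain text, a few string operations are enough to read it, in contrast to the ASN.1 decoder that H.323 signalling requires.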
SIP user identification is based on a special type of Uniform Resource Identifier (URI) called a SIP URI, whose form is similar to an email address. In our example Alice’s SIP URI is assumed to be sip:alice@atlanta.com and Bob’s sip:bob@biloxi.com. In order to establish a multimedia connection over the Internet between Alice’s and Bob’s User Agents (UAs), which can be either hardware SIP phones or PC-based softphones, Bob’s SIP URI must first be resolved into the IP address of the UA under which Bob is currently registered. SIP address resolution and routing is usually not done by the UA itself but delegated to the proxy server responsible for the domain the UA is attached to. In our example the atlanta.com proxy will make a DNS lookup to determine the proxy server of the biloxi.com domain on behalf of Alice’s user agent. The SIP INVITE request originating from Alice’s UA is then forwarded via the atlanta.com proxy to the biloxi.com proxy, which with the help of a location service determines the current whereabouts of Bob’s user agent. Both the informational Ringing message and the OK message, which is issued when Bob accepts the call, take the return path via the proxy server hops, whereas the ACK message and the payload packets of the ensuing multimedia session use the direct path between the two user agents.

398 Andreas Steffen, Daniel Kaufmann und Andreas Stricker

Figure 1: Session Initiation between two User Agents

1.2 The SIP Trapezoid

Thus the typical message flow during a SIP session takes on the form of a trapezoid as shown in Figure 2. From the point of view of network security this means that both the individual proxy hops, on a hop-by-hop basis, and the direct path between the user agents must be secured.
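The trapezoid can be sketched as a small routing table that maps each message of the example session to the path it takes and then collects the distinct links that need protection. This is a simplified model under stated assumptions: the node labels and the helper names path_for and links_to_secure are hypothetical, and the status codes 180 and 200 are the standard SIP codes for Ringing and OK.

```python
# Messages relayed through both proxies versus those sent end to end
VIA_PROXIES = {"INVITE", "180 Ringing", "200 OK"}
DIRECT = {"ACK", "RTP media"}


def path_for(message, src_ua, dst_ua, src_proxy, dst_proxy):
    """Return the list of nodes a message traverses in the example scenario."""
    if message in VIA_PROXIES:
        return [src_ua, src_proxy, dst_proxy, dst_ua]
    if message in DIRECT:
        return [src_ua, dst_ua]
    raise ValueError(f"unknown message type: {message}")


def links_to_secure(flows):
    """Collect each undirected link used by any flow, in order of first use."""
    seen = []
    for path in flows:
        for pair in zip(path, path[1:]):
            link = frozenset(pair)  # direction does not matter for this purpose
            if link not in seen:
                seen.append(link)
    return seen


A, B = "alice-ua", "bob-ua"
PA, PB = "atlanta.com proxy", "biloxi.com proxy"
flows = [
    path_for("INVITE", A, B, PA, PB),
    path_for("180 Ringing", B, A, PB, PA),  # responses retrace the proxy hops
    path_for("200 OK", B, A, PB, PA),
    path_for("ACK", A, B, PA, PB),
    path_for("RTP media", A, B, PA, PB),
]
secured = links_to_secure(flows)
```

The result contains four links: the three proxy hops forming the top and sides of the trapezoid, and the direct UA-to-UA link forming its base.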
SIP session management messages are usually embedded into UDP datagrams but can also be transported over a TCP stream if the SIP message size approaches or exceeds the Maximum Transmission Unit (MTU) or if the underlying security mechanism requires a TCP connection. The Real-time Transport Protocol (RTP) [Sch03] employed in media sessions, on the other hand, exclusively uses unreliable UDP datagrams to transport real-time audio and video packets over the Internet. This means that any security mechanism employed to encrypt and authenticate multimedia streams must support UDP as a transport protocol. This requirement excludes certain popular security solutions such as the TCP-based Transport Layer Security (TLS) [DA99]. The following chapter gives an overview of the security mechanisms that can be selected to ensure data integrity and confidentiality for both the SIP-based session management and the real-time transmission of multimedia payloads.
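The transport choice for signalling described above can be sketched as a small decision function. The thresholds are an assumption taken from our reading of the SIP specification (switch to TCP when a request is within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown); the function name and parameters are hypothetical.

```python
MEDIA_TRANSPORT = "UDP"  # RTP media streams are carried in UDP datagrams


def choose_sip_transport(message_size, path_mtu=None, security_needs_tcp=False):
    """Pick the transport for one SIP message of message_size bytes."""
    if security_needs_tcp:
        # e.g. a TCP-based security mechanism such as TLS on this hop
        return "TCP"
    if path_mtu is None:
        # Path MTU unknown: assumed conservative limit of 1300 bytes
        return "TCP" if message_size > 1300 else "UDP"
    # Path MTU known: assumed safety margin of 200 bytes
    return "TCP" if message_size > path_mtu - 200 else "UDP"
```

For example, a 500-byte INVITE would go over UDP, whereas the same request on a hop protected by TLS would have to use TCP regardless of its size.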
[1] Mark Handley et al. SDP: Session Description Protocol. RFC, 1998.
[2] Christopher Allen et al. The TLS Protocol Version 1.0. RFC, 1999.
[3] Hugo Krawczyk et al. A Security Architecture for the Internet Protocol. IBM Syst. J., 1999.
[4] Henning Schulzrinne et al. RTP Profile for Audio and Video Conferences with Minimal Control. RFC, 2003.
[5] R. Housley. Cryptographic Message Syntax. RFC, 1999.
[6] Henning Schulzrinne et al. RTP: A Transport Protocol for Real-Time Applications. RFC, 1996.
[7] Michael Elkins et al. MIME Security with Pretty Good Privacy (PGP). RFC, 1996.
[8] Roy T. Fielding et al. Hypertext Transfer Protocol - HTTP/1.1. RFC, 1997.
[9] Dan Harkins et al. The Internet Key Exchange (IKE). RFC, 1998.
[10] Burton S. Kaliski et al. PKCS #7: Cryptographic Message Syntax Version 1.5. RFC, 1998.
[11] Blake Ramsdell et al. S/MIME Version 3 Message Specification. RFC, 1999.
[12] Mats Näslund et al. The Secure Real-time Transport Protocol (SRTP). RFC, 2004.
[13] Sandy Murphy et al. Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted. RFC, 1995.
[14] G. G. Stokes. "J." The New Yale Book of Quotations, 1890.
[15] Hugo Krawczyk et al. HMAC: Keyed-Hashing for Message Authentication. RFC, 1997.
[16] Lawrence C. Stewart et al. HTTP Authentication: Basic and Digest Access Authentication. 1999.