X, there are types, functions, macros and header class as follows:
sip_X_t is the structure used to store parsed header,
SIP_X_INIT() initializes a static instance of sip_X_t,
sip_X_init() initializes a dynamic instance of sip_X_t,
sip_is_X() tests if header object is instance of header X,
sip_X_make() creates a header X object by decoding given string,
sip_X_format() creates a header X object by decoding given printf() list,
sip_X_dup() duplicates (deeply copies) the header X,
sip_X_copy() copies the header X,
sip_hclass_t sip_X_class[] contains the header class for header X.

In addition to this interface, the SIP parser documentation contains description of the functionality required when a parser is extended by a new header. It is possible to add new headers to the SIP parser or extend the definition of existing ones.
In the case of SIP such a parser is very efficient. The parser can choose between different forms based on each token, as SIP syntax is carefully designed so that it requires only minimal scan-ahead. It is also easy to extend a recursive-descent parser via a standard API, unlike, for instance, an LALR parser generated by Bison.
The abstract message module msg contains a high-level parser engine that drives the parsing process and invokes the SIP parser for each header. As there is no framing between SIP messages, the parser considers any received data, be it a UDP datagram or a TCP stream, as a message stream, which may consist of one or more SIP messages. The parser works by first separating the stream into fragments, then building a complete message based on the parsing result. After a message is completed, it can be given to the message stream customer (typically a protocol state machine). The parser continues processing the stream and feeding the messages to the protocol engine until the end of the stream is reached.
For each message, the parser starts by separating the first fragment, which is either a request or status line. After the first line has been processed, the parser engine continues by separating the headers one by one from the message. After the parser encounters an empty line separating the headers and the message body (payload), it invokes a function parsing the separator and payload fragment(s). When the message is complete, the parser can hand the message over to the protocol engine. Then it is ready to start again with the first fragment of the next message.
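The separation step described above can be sketched in a few lines of C. This is only an illustration, not the msg module's code: the names frag_kind, struct frag and split_message() are hypothetical. The sketch also assumes the buffer holds a single message, so everything after the empty line is treated as one payload fragment. A real parser has to determine where the payload ends before it can start on the next message in the stream.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment kinds; the msg module's real types differ. */
enum frag_kind { FRAG_FIRST_LINE, FRAG_HEADER, FRAG_SEPARATOR, FRAG_PAYLOAD };

struct frag {
  enum frag_kind kind;
  char const *data;  /* start of the fragment within the buffer */
  size_t len;        /* length in bytes, line terminator included */
};

/* Length of the line at s including its LF, or 0 if the line is incomplete. */
static size_t line_length(char const *s, size_t n)
{
  char const *nl = memchr(s, '\n', n);
  return nl ? (size_t)(nl - s) + 1 : 0;
}

static int is_empty_line(char const *s, size_t len)
{
  return (len == 1 && s[0] == '\n') ||
         (len == 2 && s[0] == '\r' && s[1] == '\n');
}

/* Split one message into at most max fragments; returns the fragment count. */
size_t split_message(char const *buf, size_t n, struct frag *out, size_t max)
{
  size_t used = 0, count = 0;

  while (used < n && count < max) {
    size_t len = line_length(buf + used, n - used);
    if (len == 0)
      break;                          /* incomplete line: wait for more data */

    out[count].data = buf + used;

    if (count == 0) {
      out[count].kind = FRAG_FIRST_LINE;   /* request or status line */
    } else if (is_empty_line(buf + used, len)) {
      out[count].kind = FRAG_SEPARATOR;
      out[count].len = len;
      used += len;
      count++;
      if (used < n && count < max) {  /* the rest is payload (assumption) */
        out[count].kind = FRAG_PAYLOAD;
        out[count].data = buf + used;
        out[count].len = n - used;
        count++;
      }
      return count;
    } else {
      out[count].kind = FRAG_HEADER;
      /* A line starting with SP or HT continues the previous header. */
      while (used + len < n &&
             (buf[used + len] == ' ' || buf[used + len] == '\t')) {
        size_t more = line_length(buf + used + len, n - used - len);
        if (more == 0)
          break;
        len += more;
      }
    }
    out[count].len = len;
    used += len;
    count++;
  }
  return count;
}
```

Each fragment only records a pointer and a length into the original buffer, so no data is copied while the message is being separated.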
Figure: Separating byte stream to messages
When the parsing process has completed, the request or status line, each header, the separator and the payload are each in their own fragment structure. The fragments form a doubly linked list known as the fragment chain, as shown in the above figure. The buffers for the message, the fragment chain, and various other bookkeeping data are held by the generic message type, msg_t, defined in <msg.h>. The internal structure of msg_t is known only within the msg module and is hidden from other modules.
The abstract message module msg also drives the reverse process, invoking the encoding method of each fragment so that the whole outgoing SIP message is encoded properly.
The parser passes the fragment contents to a parsing function immediately after it has separated a fragment from the message. The parsing function is defined by the header class. The header class is either determined by the fragment position (first line, separator line or payload), or it is found from the hash table using the header name as key. There is also a special header class for unknown headers, that is, headers with a name that is not recognized by the parser.
For instance, the From header has the following syntax:
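As a rough illustration of the lookup described above, the following sketch maps a header name to a header class through a small hash table and falls back to an "unknown" class. All names here (struct hclass, find_hclass(), the table layout) are assumptions made for the example; the real header class carries the parsing and encoding functions among other members, and the real hash table is organized differently.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16  /* must stay larger than the number of classes */

/* Hypothetical, heavily simplified header class. */
struct hclass {
  char const *name;   /* canonical header name */
  char compact;       /* compact one-letter form, 0 if none */
};

static struct hclass const from_class    = { "From", 'f' };
static struct hclass const via_class     = { "Via",  'v' };
static struct hclass const cseq_class    = { "CSeq", 0 };
static struct hclass const unknown_class = { "unknown", 0 };

static struct hclass const *table[TABLE_SIZE];
static struct hclass const *compact_forms[26];

/* Case-insensitive hash of a header name. */
static unsigned hash_name(char const *s, size_t len)
{
  unsigned h = 0;
  size_t i;
  for (i = 0; i < len; i++)
    h = h * 31 + (unsigned)tolower((unsigned char)s[i]);
  return h % TABLE_SIZE;
}

/* Compare the first len bytes of a with the whole of b, ignoring case. */
static int same_name(char const *a, size_t len, char const *b)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (b[i] == '\0' ||
        tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
      return 0;
  return b[len] == '\0';
}

static void add_class(struct hclass const *hc)
{
  unsigned i = hash_name(hc->name, strlen(hc->name));
  while (table[i])                    /* linear probing on collision */
    i = (i + 1) % TABLE_SIZE;
  table[i] = hc;
  if (hc->compact)
    compact_forms[hc->compact - 'a'] = hc;
}

void hclass_table_init(void)
{
  memset(table, 0, sizeof table);
  memset(compact_forms, 0, sizeof compact_forms);
  add_class(&from_class);
  add_class(&via_class);
  add_class(&cseq_class);
}

/* Find the class for a header line such as "Via: SIP/2.0/UDP host". */
struct hclass const *find_hclass(char const *line)
{
  size_t len = strcspn(line, ": \t");   /* header name ends here */
  unsigned i, probes;

  if (len == 1) {                       /* compact form such as "f" or "v" */
    int c = tolower((unsigned char)line[0]);
    if (c >= 'a' && c <= 'z' && compact_forms[c - 'a'])
      return compact_forms[c - 'a'];
  }
  i = hash_name(line, len);
  for (probes = 0; probes < TABLE_SIZE && table[i]; probes++) {
    if (same_name(line, len, table[i]->name))
      return table[i];
    i = (i + 1) % TABLE_SIZE;
  }
  return &unknown_class;                /* not recognized by the parser */
}
```

After hclass_table_init(), find_hclass("f: ...") and find_hclass("FROM: ...") both yield the From class, while a name such as "X-Custom" yields the unknown class.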
    from         = ("From" | "f") ":" ( name-addr | addr-spec ) *( ";" addr-params )
    name-addr    = [ display-name ] "<" addr-spec ">"
    addr-spec    = SIP-URL | URI
    display-name = *token | quoted-string
    addr-params  = *( tag-param | generic-param )
    tag-param    = "tag" "=" ( token | quoted-string )
When a From header is parsed, the header parser function sip_from_d() separates the display-name, addr-spec and each parameter in the addr-params list. The parsing result is assigned to a sip_from_t structure, which is defined as follows:
    typedef struct sip_addr_s {
      sip_common_t       a_common[1];
      sip_unknown_t     *a_next;
      char const        *a_display;
      url_t              a_url[1];
      sip_param_t const *a_params;
      char const        *a_tag;
    } sip_from_t;
The string containing the display-name is put into the a_display field, the URL contents can be found in the a_url field, and the list of addr-params parameters is put in the a_params array. If there is a tag-param present, a pointer to the parameter value is assigned to the a_tag field.
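To make the field assignment more concrete, here is a minimal sketch of a decoder that splits a From header value in place. It is not sip_from_d(): the names struct from and from_decode() are hypothetical, the URL is kept as a plain string instead of being parsed into a url_t, the parameter name "tag" is matched case-sensitively, and quoted parameter values containing ";" are not handled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8

/* Hypothetical, simplified counterpart of sip_from_t. */
struct from {
  char const *display;                 /* display-name, NULL if absent */
  char const *url;                     /* URL as an unparsed string */
  char const *params[MAX_PARAMS + 1];  /* NULL-terminated parameter list */
  char const *tag;                     /* value of the tag parameter */
};

static char *skip_ws(char *s)
{
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}

/* Overwrite whitespace immediately before end with NUL bytes. */
static void rtrim(char *start, char *end)
{
  while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
    *--end = '\0';
}

/* Decode a From header value in place; returns 0 on success, -1 on error. */
int from_decode(struct from *f, char *s)
{
  char *p, *lt, *rest;
  size_t n = 0;

  memset(f, 0, sizeof *f);
  s = skip_ws(s);

  p = s;
  if (*p == '"') {                     /* skip a quoted display-name */
    for (p++; *p && *p != '"'; p++)
      if (*p == '\\' && p[1])
        p++;
    if (*p != '"')
      return -1;
    p++;
  }

  lt = strchr(p, '<');
  if (lt) {                            /* name-addr form */
    char *gt = strchr(lt, '>');
    if (!gt || gt == lt + 1)
      return -1;
    *lt = '\0';
    *gt = '\0';
    rtrim(s, lt);
    if (*s)
      f->display = s;
    f->url = lt + 1;
    rest = skip_ws(gt + 1);
  } else {                             /* addr-spec form: URL ends at ";" */
    rest = s + strcspn(s, ";");
    if (rest == s)
      return -1;
    f->url = s;
    rtrim(s, rest);
  }

  while (*rest == ';') {               /* split the parameter list */
    char *param;
    *rest++ = '\0';                    /* terminates the previous item */
    param = skip_ws(rest);
    rest = param + strcspn(param, ";");
    if (n == MAX_PARAMS)
      return -1;
    f->params[n++] = param;
    if (strncmp(param, "tag=", 4) == 0)
      f->tag = param + 4;
  }
  return *rest ? -1 : 0;
}
```

The decoder writes NUL bytes into the caller's buffer and stores pointers into it, so the resulting strings stay valid only as long as that buffer does.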
In other words, a single message is represented by two types: the first (msg_t) is private to the msg module and inaccessible to an application programmer, the second (sip_t) is a public structure.
The sip_t structure is defined as follows:
    typedef struct sip_s {
      msg_common_t        sip_common[1];     // Used with recursive inclusion
      msg_pub_t          *sip_next;          // Ditto
      void               *sip_user;          // Application data
      unsigned            sip_size;
      int                 sip_flags;
      sip_error_t        *sip_error;         // Erroneous headers
      sip_request_t      *sip_request;       // Request line
      sip_status_t       *sip_status;        // Status line
      sip_via_t          *sip_via;           // Via (v)
      sip_route_t        *sip_route;         // Route
      sip_record_route_t *sip_record_route;  // Record-Route
      sip_max_forwards_t *sip_max_forwards;  // Max-Forwards
      ...
    } sip_t;
As you can see above, the public sip_t structure begins with the common header members that are also found at the beginning of a header structure. The sip_size member indicates the size of the structure; the application can extend the parser and the sip_t structure beyond the original size. The sip_flags member contains various flags used during the parsing and printing process. They are documented in <msg.h>. These boilerplate members are followed by the pointers to various message elements and headers.
    BYE sip:joe@example.com SIP/2.0
    Via: SIP/2.0/UDP sip.example.edu;branch=d7f2e89c.74a72681
    Via: SIP/2.0/UDP pc104.example.edu:1030;maddr=110.213.33.19
    From: Bobby Brown <sip:bb@example-email.address.hidden>;tag=77241a86
    To: Joe User <sip:joe@example-email.address.hidden>;tag=7c6276c1
    Call-ID: 4c4e911b@pc104.example.edu
    CSeq: 2
The figure below shows the layout of the BYE message above after parsing:
Figure: BYE message and its representation in C
The leftmost box represents the message of type msg_t. The next box from the left represents the sip_t structure, which contains pointers to the header objects. The next column contains the header objects. There is one header object for each message fragment. The rightmost box represents the I/O buffer used when the message was received. Note that the I/O buffer may be non-contiguous and composed of many separate memory areas.
The message object has a link to the public message structure (m_object), to the doubly linked fragment chain (m_frags) and to the I/O buffer (m_buffer). The public message structure contains pointers to the headers according to their type. If there are multiple headers of the same type (like the two Via headers in the above message), the headers are put into a singly linked list.
Each fragment has pointers to the succeeding and preceding fragments. It also contains a pointer to the corresponding data within the I/O buffer and its length.
The main purpose of the fragment chain is to preserve the original order of the headers. If there were a third Via header after CSeq in the message, the fragment representing it would come after the CSeq header in the fragment chain but after the second Via in the header list.
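The difference between the two orderings can be shown with a small sketch. The structures and the function message_append() below are hypothetical stand-ins, not the definitions used by the msg module: each header carries one pair of links for the fragment chain and a separate link for the list of headers of the same type.

```c
#include <stddef.h>

/* Hypothetical header object with both kinds of links. */
struct header {
  struct header *succ, *prev;  /* fragment chain: order within the message */
  struct header *next;         /* next header of the same type */
  char const *name;
};

/* Hypothetical message object with two per-type header slots. */
struct message {
  struct header *chain_head, *chain_tail;  /* fragment chain */
  struct header *via;                      /* first Via header */
  struct header *cseq;                     /* CSeq header */
};

/* Append h to the fragment chain and to the per-type list rooted at *slot. */
void message_append(struct message *m, struct header *h, struct header **slot)
{
  /* The fragment chain keeps the order in which headers arrived. */
  h->succ = NULL;
  h->prev = m->chain_tail;
  if (m->chain_tail)
    m->chain_tail->succ = h;
  else
    m->chain_head = h;
  m->chain_tail = h;

  /* The per-type list groups headers of the same type together. */
  h->next = NULL;
  while (*slot)
    slot = &(*slot)->next;
  *slot = h;
}
```

Appending Via, Via, CSeq and then a third Via gives the chain Via, Via, CSeq, Via, while the list starting from the via member contains the three Via headers back to back.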