Sofia URL Module

1.12.1

Module Meta Information

The Sofia url module contains macros and functions for using the URL datatype and for parsing and printing URLs.

Contact:
Pekka Pessi <Pekka.Pessi@nokia-email.address.hidden>
Status:
Core library
License:
LGPL

Using URL Library

The URL library provides the URL datatype and the helper functions related to it. It includes a URL parser, which splits a URL into its components and stores them in the url_t structure.

Note:
Please note that we use the terms URL and URI interchangeably.
The formal URI syntax is defined in RFC 2396.

URLs consist of a subset of printable ASCII (ECMA-5) characters. The subset excludes space, characters commonly used as delimiters in text-based protocols, such as < > # % and " (double quote), and the so-called unwise characters, whose positions are reserved for national extensions in ECMA-5. In US-ASCII, the unwise characters are: { } | \ ^ [ ] `

There are also nine characters that can have special syntactic meaning in some parts of the URI. These reserved characters are used to separate syntactical parts of the URLs from each other. The reserved characters are as follows: : @ / ; ? & = + and $.
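The character classes described above can be sketched as a small classifier. This is only an illustration: the names uri_char_class and classify_uri_char are hypothetical and are not part of the URL library, and the character sets are copied from the text above.

```c
#include <string.h>

/* Hypothetical names for illustration; not part of the Sofia URL library. */
enum uri_char_class {
  URI_CHAR_EXCLUDED,  /* outside the subset: space, delimiters, unwise chars */
  URI_CHAR_RESERVED,  /* may carry syntactic meaning: : @ / ; ? & = + $ */
  URI_CHAR_PLAIN      /* any other printable ASCII character */
};

static enum uri_char_class classify_uri_char(unsigned char c)
{
  /* Control characters, space, DEL and non-ASCII bytes are never allowed. */
  if (c <= 0x20 || c >= 0x7f)
    return URI_CHAR_EXCLUDED;
  /* Delimiters and unwise characters listed in the text.
     Note that '%' still appears in URLs, but only to introduce a hex escape. */
  if (strchr("<>#%\"{}|\\^[]`", c) != NULL)
    return URI_CHAR_EXCLUDED;
  /* The nine reserved characters. */
  if (strchr(":@/;?&=+$", c) != NULL)
    return URI_CHAR_RESERVED;
  return URI_CHAR_PLAIN;
}
```

A caller would typically hex-encode any excluded character, and any reserved character that must not act as a separator in the field being written.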

The URL library understands two alternative URL syntaxes. First, the basic syntax used by, e.g., ftp:, http: and rtsp: URLs:

scheme ":" ["//" [ user [":" password ] "@"] host [":" port ] ] ["/" path ] ["?" query ] ["#" fragment ]

Alternatively, the syntax used by mailto:, sip:, im:, tel: and pres: URLs:

scheme ":" [ [ user [":" password ] "@"] host [":" port ] ] [";" params ] ["?" query ] ["#" fragment ]

Note that "*" is also considered to be a valid URL (with type url_any).

For example:

http://example.org:7100/cgi-bin/query?key=90786
ftp://user:pass@ftp.example.com/pub/
sip:user:pass@example.com;user=ip
tel:+358718008000

Converting a String to url_t

The decoding function url_d() takes a string and splits it into parts as shown above. The substrings are stored into the url_t structure. When decoding, the hex encoding using % is removed if the encoded character can syntactically be part of the field. For instance, "%41" is decoded as "A" in the user part, but "%40" (@) is left as is. (This is called canonization of the URL fields.)

For example, when we parse the URL below

sip:joe%2Euser@example%2Ecom;method=%4D%45%53%53%41%47%45?body=CANNED%20MSG
the components are NUL-terminated, canonized and assigned to the structure as follows:
 url_type = url_sip
 url_root = 0 
 url_scheme = "sip"
 url_user = "joe.user"
 url_password = NULL
 url_host = "example.com"
 url_port = NULL
 url_path = NULL
 url_params = "method=MESSAGE"
 url_headers = "body=CANNED%20MSG"
 url_fragment = NULL
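The canonization step can be sketched in a few lines of standalone C. This is not the library's implementation. It assumes a single, simplified rule: a %XX escape is decoded only when the resulting character is an RFC 2396 "unreserved" character (alphanumerics and - _ . ! ~ * ' ( )), whereas the library decides per field. The names hex_value, is_unreserved and canonize_field are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of the Sofia URL library. */
static int hex_value(unsigned char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Assumption: only RFC 2396 "unreserved" characters are safe to decode. */
static int is_unreserved(unsigned char c)
{
  return c != 0 && (isalnum(c) || strchr("-_.!~*'()", c) != NULL);
}

/* Decode safe %XX escapes in place and return s. */
static char *canonize_field(char *s)
{
  char *in = s, *out = s;

  while (*in) {
    int hi, lo;
    if (in[0] == '%' &&
        (hi = hex_value((unsigned char)in[1])) >= 0 &&
        (lo = hex_value((unsigned char)in[2])) >= 0 &&
        is_unreserved((unsigned char)(hi * 16 + lo))) {
      *out++ = (char)(hi * 16 + lo);  /* e.g. "%2E" becomes "." */
      in += 3;
    } else {
      *out++ = *in++;                 /* keep literal text and unsafe escapes */
    }
  }
  *out = '\0';
  return s;
}
```

Under this rule "joe%2Euser" becomes "joe.user" and "%4D%45%53%53%41%47%45" becomes "MESSAGE", while "%40" (@) and "%20" (space) are left encoded, which matches the field values listed above.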

Other functions parsing URLs are as follows:

Converting url_t to a String

The url_e() function encodes the URL; in other words, it joins the substrings in url_t into the provided buffer.
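The joining step can be illustrated with a standalone sketch for the second (sip:-style) syntax. The structure demo_url is a simplified stand-in whose members loosely mirror the url_t fields shown earlier; it is not the library's url_t. The function demo_url_join and the macros OPT and SEP are likewise hypothetical. The sketch also assumes the snprintf() convention of returning the length that the complete string requires.

```c
#include <stdio.h>

/* Simplified stand-in for url_t; NOT the library's structure. */
struct demo_url {
  const char *scheme, *user, *password, *host, *port;
  const char *params, *headers, *fragment;
};

#define OPT(x)    ((x) ? (x) : "")  /* the value, or nothing if absent */
#define SEP(s, x) ((x) ? (s) : "")  /* the separator, only if the value exists */

/* Join the parts into buf; returns the length needed, like snprintf(). */
static int demo_url_join(char *buf, size_t size, const struct demo_url *u)
{
  /* A password is only meaningful together with a user part. */
  const char *pw = u->user ? u->password : NULL;

  return snprintf(buf, size,
                  "%s%s" "%s%s%s%s" "%s" "%s%s" "%s%s" "%s%s" "%s%s",
                  OPT(u->scheme), SEP(":", u->scheme),
                  OPT(u->user), SEP(":", pw), OPT(pw), SEP("@", u->user),
                  OPT(u->host),
                  SEP(":", u->port), OPT(u->port),
                  SEP(";", u->params), OPT(u->params),
                  SEP("?", u->headers), OPT(u->headers),
                  SEP("#", u->fragment), OPT(u->fragment));
}
```

If the return value is greater than or equal to the buffer size, the output was truncated and the caller can retry with a larger buffer.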

Functions and Macros in URL Module

The url parsing, printing, copying and access functions are defined in the url.h include file:

In addition to the basic URL structure, url_t, the library interface provides a union type, url_string_t, for passing unparsed strings instead of parsed URLs as function arguments:

For printf()-style formatting, macros URL_PRINT_FORMAT and URL_PRINT_ARGS() are provided.
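The general idea of pairing a format macro with an argument macro can be sketched as follows. The definitions of URL_PRINT_FORMAT and URL_PRINT_ARGS() are not shown in this document, so the sketch uses its own hypothetical macros, DEMO_PRINT_FORMAT and DEMO_PRINT_ARGS(), over a hypothetical three-field structure. It only shows how such a pair is spliced into a printf()-style call.

```c
#include <stdio.h>

/* Hypothetical three-field structure and macros, for illustration only. */
struct demo_addr {
  const char *scheme, *host, *port;
};

/* One %s for each argument produced by DEMO_PRINT_ARGS(). */
#define DEMO_PRINT_FORMAT "%s%s%s%s%s"
#define DEMO_PRINT_ARGS(a)            \
  (a)->scheme ? (a)->scheme : "",     \
  (a)->scheme ? ":" : "",             \
  (a)->host ? (a)->host : "",         \
  (a)->port ? ":" : "",               \
  (a)->port ? (a)->port : ""

/* The format macro is concatenated with the caller's own format string. */
static int demo_describe(char *buf, size_t size, const struct demo_addr *a)
{
  return snprintf(buf, size, "target=" DEMO_PRINT_FORMAT, DEMO_PRINT_ARGS(a));
}
```

Because the format macro is a string literal, it concatenates with the surrounding text at compile time, so a URL can be embedded in a log line without first encoding it into a temporary buffer.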


Sofia-SIP 1.12.1 - Copyright (C) 2006 Nokia Corporation. All rights reserved. Licensed under the terms of the GNU Lesser General Public License.