URLs consist of a subset of printable ASCII (ECMA-5) characters. The subset excludes the space character and characters commonly used as delimiters in text-based protocols, such as < > # % and " (double quote), as well as the so-called unwise characters, whose positions are reserved for national extensions in ECMA-5. In US-ASCII, those characters are: { } | \ ^ [ ] `
There are also nine characters that can have a special syntactic meaning in some parts of a URL. These reserved characters separate the syntactic parts of a URL from each other. The reserved characters are: : @ / ; ? & = + and $.
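The character classes described above can be sketched as a small classifier. This is only an illustration built from the lists in the text; the names url_char_class and classify_url_char() are hypothetical and are not part of the URL library.

```c
#include <string.h>

/* Hypothetical classification that follows the character lists in the text. */
enum url_char_class {
  URL_CHAR_EXCLUDED,  /* control, space, non-ASCII, delimiters, unwise */
  URL_CHAR_RESERVED,  /* : @ / ; ? & = + $ */
  URL_CHAR_PLAIN      /* every other printable ASCII character */
};

enum url_char_class classify_url_char(unsigned char c)
{
  /* Control characters, space, DEL and anything beyond 7-bit ASCII. */
  if (c <= 0x20 || c >= 0x7f)
    return URL_CHAR_EXCLUDED;

  /* Delimiters < > # % " followed by the unwise characters.
   * c is never 0 here, so strchr() cannot match the terminating NUL. */
  if (strchr("<>#%\"{}|\\^[]`", c) != NULL)
    return URL_CHAR_EXCLUDED;

  /* The nine reserved characters. */
  if (strchr(":@/;?&=+$", c) != NULL)
    return URL_CHAR_RESERVED;

  return URL_CHAR_PLAIN;
}
```

An excluded character can still be carried in a URL if it is written as a %XX escape, which is why % itself is on the excluded list.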
The URL library understands two alternative URL syntaxes. First, the basic syntax used by, e.g., ftp:, http: and rtsp: URLs:
scheme ":" ["//" [ user [":" password ] "@"] host [":" port ] ] ["/" path ] ["?" query ] ["#" fragment ]
Alternatively, the syntax used by mailto:, sip:, im:, tel:, and pres: URLs:
scheme ":" [ [ user [":" password ] "@"] host [":" port ] ] [";" params ] ["?" query ] ["#" fragment ]
Note that "*" is also considered to be a valid URL (with type url_any).
For example:
http://example.org:7100/cgi-bin/query?key=90786
ftp://user:pass@ftp.example.com/pub/
sip:user:pass@example.com;user=ip
tel:+358718008000
For example, parsing the URL below
sip:joe%2Euser@example%2Ecom;method=%4D%45%53%53%41%47%45?body=CANNED%20MSG
yields the following url_t fields:
url_type = url_sip
url_root = 0
url_scheme = "sip"
url_user = "joe.user"
url_password = NULL
url_host = "example.com"
url_port = NULL
url_path = NULL
url_params = "method=MESSAGE"
url_headers = "body=CANNED%20MSG"
url_fragment = NULL
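In the parsed result, %2E and the escaped letters are decoded, while %20 stays escaped. One way to get that behavior is to decode only escapes that stand for characters that are safe to write literally. The sketch below is an assumption inferred from the example, not the library's implementation; hex_value(), is_unreserved() and unescape_unreserved() are hypothetical names, and the set of "safe" punctuation is likewise assumed.

```c
#include <ctype.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Assumed "safe to unescape" set: letters, digits and - _ . ! ~ * ' ( ) */
int is_unreserved(int c)
{
  return c != 0 && (isalnum(c) || strchr("-_.!~*'()", c) != NULL);
}

/* Decode %XX escapes in place when they encode an unreserved character;
 * leave every other escape untouched. Returns s. */
char *unescape_unreserved(char *s)
{
  char *in = s, *out = s;

  while (*in) {
    int hi = -1, lo = -1;

    if (in[0] == '%') {
      hi = hex_value((unsigned char)in[1]);
      if (hi >= 0)  /* only look at in[2] when in[1] was a hex digit */
        lo = hex_value((unsigned char)in[2]);
    }

    if (hi >= 0 && lo >= 0 && is_unreserved(hi * 16 + lo)) {
      *out++ = (char)(hi * 16 + lo);
      in += 3;
    } else {
      *out++ = *in++;
    }
  }
  *out = '\0';
  return s;
}
```

Applied to a writable copy of "joe%2Euser" this gives "joe.user", while "body=CANNED%20MSG" is returned unchanged because a space is not in the unreserved set.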
Other functions parsing URLs are as follows:
In addition to the basic URL structure, url_t, the library interface provides a union type url_string_t for passing unparsed strings instead of parsed URLs as function arguments:
For printf()-style formatting, macros URL_PRINT_FORMAT and URL_PRINT_ARGS() are provided.
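A format/arguments macro pair of this kind is typically used by pasting the format macro into a string literal and letting the arguments macro expand into the matching argument list. The sketch below shows the general idiom on a much-reduced, hypothetical structure; struct mini_url, MINI_URL_PRINT_FORMAT and MINI_URL_PRINT_ARGS() are illustrative names only and do not reproduce the library's own definitions.

```c
#include <stdio.h>

/* Hypothetical, much-reduced stand-in for a parsed URL. */
struct mini_url {
  const char *scheme;  /* never NULL */
  const char *user;    /* NULL if absent */
  const char *host;    /* NULL if absent */
};

/* One %s per printed piece: scheme, ":", user, "@", host. */
#define MINI_URL_PRINT_FORMAT "%s%s%s%s%s"

/* Expands to exactly five arguments, substituting "" for absent parts,
 * so the argument count always matches MINI_URL_PRINT_FORMAT. */
#define MINI_URL_PRINT_ARGS(u) \
  (u)->scheme, ":", \
  (u)->user ? (u)->user : "", \
  (u)->user ? "@" : "", \
  (u)->host ? (u)->host : ""
```

With a struct mini_url u holding "sip", "joe" and "example.com", the call printf("Contact: " MINI_URL_PRINT_FORMAT "\n", MINI_URL_PRINT_ARGS(&u)); prints the URL in the middle of an ordinary format string without first building a temporary buffer.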