Text.ParserCombinators.Parsec.Rfc2822
Portability: portable
Stability: provisional
Maintainer: simons@cryp.to
Description |
This module provides parsers for the grammar defined in
RFC2822, "Internet Message Format",
http://www.faqs.org/rfcs/rfc2822.html.
Please note: The module is not particularly well tested.
|
|
Synopsis |
|
maybeOption :: GenParser tok st a -> GenParser tok st (Maybe a)
unfold :: CharParser a b -> CharParser a b
header :: String -> CharParser a b -> CharParser a b
obs_header :: String -> CharParser a b -> CharParser a b
no_ws_ctl :: CharParser a Char
text :: CharParser a Char
specials :: CharParser a Char
quoted_pair :: CharParser a String
fws :: CharParser a String
ctext :: CharParser a Char
comment :: CharParser a String
cfws :: CharParser a String
atext :: CharParser a Char
atom :: CharParser a String
dot_atom :: CharParser a String
dot_atom_text :: CharParser a String
qtext :: CharParser a Char
qcontent :: CharParser a String
quoted_string :: CharParser a String
word :: CharParser a String
phrase :: CharParser a [String]
utext :: CharParser a Char
unstructured :: CharParser a String
date_time :: CharParser a CalendarTime
day_of_week :: CharParser a Day
day_name :: CharParser a Day
date :: CharParser a (Int, Month, Int)
year :: CharParser a Int
month :: CharParser a Month
month_name :: CharParser a Month
day :: CharParser a Int
time :: CharParser a (TimeDiff, Int)
time_of_day :: CharParser a TimeDiff
hour :: CharParser a Int
minute :: CharParser a Int
second :: CharParser a Int
zone :: CharParser a Int
data NameAddr = NameAddr {}
address :: CharParser a [NameAddr]
mailbox :: CharParser a NameAddr
name_addr :: CharParser a NameAddr
angle_addr :: CharParser a String
group :: CharParser a [NameAddr]
display_name :: CharParser a String
mailbox_list :: CharParser a [NameAddr]
address_list :: CharParser a [NameAddr]
addr_spec :: CharParser a String
local_part :: CharParser a String
domain :: CharParser a String
domain_literal :: CharParser a String
dcontent :: CharParser a String
dtext :: CharParser a Char
data GenericMessage a = Message [Field] a
type Message = GenericMessage String
message :: CharParser a Message
body :: CharParser a String
fields :: CharParser a [Field]
orig_date :: CharParser a CalendarTime
from :: CharParser a [NameAddr]
sender :: CharParser a NameAddr
reply_to :: CharParser a [NameAddr]
to :: CharParser a [NameAddr]
cc :: CharParser a [NameAddr]
bcc :: CharParser a [NameAddr]
message_id :: CharParser a String
in_reply_to :: CharParser a [String]
references :: CharParser a [String]
msg_id :: CharParser a String
id_left :: CharParser a String
id_right :: CharParser a String
no_fold_quote :: CharParser a String
no_fold_literal :: CharParser a String
subject :: CharParser a String
comments :: CharParser a String
keywords :: CharParser a [[String]]
resent_date :: CharParser a CalendarTime
resent_from :: CharParser a [NameAddr]
resent_sender :: CharParser a NameAddr
resent_to :: CharParser a [NameAddr]
resent_cc :: CharParser a [NameAddr]
resent_bcc :: CharParser a [NameAddr]
resent_msg_id :: CharParser a String
return_path :: CharParser a String
path :: CharParser a String
received :: CharParser a ([(String, String)], CalendarTime)
name_val_list :: CharParser a [(String, String)]
name_val_pair :: CharParser a (String, String)
item_name :: CharParser a String
item_value :: CharParser a String
optional_field :: CharParser a (String, String)
field_name :: CharParser a String
ftext :: CharParser a Char
obs_qp :: CharParser a String
obs_text :: CharParser a String
obs_char :: CharParser a Char
obs_utext :: CharParser a String
obs_phrase :: CharParser a [String]
obs_phrase_list :: CharParser a [String]
obs_fws :: CharParser a String
obs_day_of_week :: CharParser a Day
obs_year :: CharParser a Int
obs_month :: CharParser a Month
obs_day :: CharParser a Int
obs_hour :: CharParser a Int
obs_minute :: CharParser a Int
obs_second :: CharParser a Int
obs_zone :: CharParser a Int
obs_angle_addr :: CharParser a String
obs_route :: CharParser a [String]
obs_domain_list :: CharParser a [String]
obs_local_part :: CharParser a String
obs_domain :: CharParser a String
obs_mbox_list :: CharParser a [NameAddr]
obs_addr_list :: CharParser a [NameAddr]
obs_fields :: GenParser Char a [Field]
obs_orig_date :: CharParser a CalendarTime
obs_from :: CharParser a [NameAddr]
obs_sender :: CharParser a NameAddr
obs_reply_to :: CharParser a [NameAddr]
obs_to :: CharParser a [NameAddr]
obs_cc :: CharParser a [NameAddr]
obs_bcc :: CharParser a [NameAddr]
obs_message_id :: CharParser a String
obs_in_reply_to :: CharParser a [String]
obs_references :: CharParser a [String]
obs_id_left :: CharParser a String
obs_id_right :: CharParser a String
obs_subject :: CharParser a String
obs_comments :: CharParser a String
obs_keywords :: CharParser a [String]
obs_resent_from :: CharParser a [NameAddr]
obs_resent_send :: CharParser a NameAddr
obs_resent_date :: CharParser a CalendarTime
obs_resent_to :: CharParser a [NameAddr]
obs_resent_cc :: CharParser a [NameAddr]
obs_resent_bcc :: CharParser a [NameAddr]
obs_resent_mid :: CharParser a String
obs_resent_reply :: CharParser a [NameAddr]
obs_return :: CharParser a [Char]
obs_received :: CharParser a [(String, String)]
obs_path :: CharParser a String
obs_optional :: CharParser a (String, String)
|
|
|
Useful parser combinators
|
|
|
Return Nothing if the given parser doesn't match. This
combinator is included in the latest parsec distribution as
optionMaybe, but ghc-6.6.1 apparently doesn't have it.
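The behaviour described above can be sketched in plain Parsec as follows. This is an illustration only, not necessarily the module's own definition; the names maybeOption' and signedDigits are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical re-implementation for illustration: run p and wrap its
-- result in Just; if p fails without consuming input, yield Nothing.
maybeOption' :: Parser a -> Parser (Maybe a)
maybeOption' p = option Nothing (fmap Just p)

-- Example use: an optional sign followed by one or more digits.
signedDigits :: Parser (Maybe Char, String)
signedDigits = do
  sign <- maybeOption' (oneOf "+-")
  ds   <- many1 digit
  return (sign, ds)
```

With these definitions, parse signedDigits "" "-42" yields Right (Just '-',"42"), and the input "42" yields Right (Nothing,"42").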
|
|
|
unfold = between (optional cfws) (optional cfws)
|
|
|
Construct a parser for a message header line from the
header's name and a parser for the body.
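A rough sketch of such a combinator in plain Parsec is shown here. It assumes that header names are matched case-insensitively and that a line ends in CRLF; caseInsensitive, headerLine, and restOfLine are hypothetical names, and the module's actual definition may differ.

```haskell
import Data.Char (toLower, toUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Match a string regardless of letter case (hypothetical helper).
caseInsensitive :: String -> Parser String
caseInsensitive = mapM (\c -> char (toLower c) <|> char (toUpper c))

-- Sketch of a header-line combinator: "<name>:" <body> CRLF.
headerLine :: String -> Parser a -> Parser a
headerLine name body =
  between (try (caseInsensitive name) >> char ':') crlf body

-- Example body parser: skip leading spaces, then take the rest of the line.
restOfLine :: Parser String
restOfLine = skipMany (char ' ') >> many (noneOf "\r\n")
```

For instance, parse (headerLine "Subject" restOfLine) "" "subject: hello\r\n" yields Right "hello".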
|
|
|
Like header, but allows the obsolete white-space rules.
|
|
Primitive Tokens (section 3.2.1)
|
|
|
Match any US-ASCII non-whitespace control character.
|
|
|
Match any US-ASCII character except for \r and \n.
|
|
|
Match any of the RFC's "special" characters: ()<>[]:;@,.\".
|
|
Quoted characters (section 3.2.2)
|
|
|
Match a "quoted pair". All characters matched by text may be
quoted. Note that the parser returns both characters, the
backslash and the actual content.
|
|
Folding white space and comments (section 3.2.3)
|
|
|
Match "folding whitespace". That is any combination of wsp and
crlf followed by wsp.
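As an illustration, the FWS rule from RFC 2822 section 3.2.3, FWS = [*WSP CRLF] 1*WSP, can be sketched in plain Parsec like this. The names wsp and foldingWs are hypothetical, and the obsolete folding syntax is left out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- WSP: a space or a horizontal tab.
wsp :: Parser Char
wsp = oneOf " \t"

-- Folding white space: an optional "*WSP CRLF" prefix followed by at
-- least one WSP. The matched characters are returned verbatim.
foldingWs :: Parser String
foldingWs = do
  folded <- option "" (try (do ws <- many wsp
                               _  <- string "\r\n"
                               return (ws ++ "\r\n")))
  rest <- many1 wsp
  return (folded ++ rest)
```

The try is needed because the optional prefix may consume white space before discovering that no CRLF follows.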
|
|
|
Match any non-whitespace, non-control character except for "(",
")", and "\". This is used to describe the legal content of
comments.
Note: This parser accepts 8-bit characters, even though this is
not legal according to the RFC. Unfortunately, 8-bit content in
comments has become fairly common in the real world, so we'll just
accept the fact.
|
|
|
Match a "comment". That is any combination of ctext,
quoted_pairs, and fws between parentheses. Comments may nest.
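Nesting falls out naturally when the comment parser refers to itself. The following is a simplified sketch in plain Parsec, assuming that folding white space is reduced to ordinary characters; commentSketch is a hypothetical name, not the module's implementation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified comment parser: plain characters, backslash-quoted
-- characters, and nested comments between parentheses.
commentSketch :: Parser String
commentSketch = do
  _     <- char '('
  parts <- many (plain <|> quoted <|> commentSketch)
  _     <- char ')'
  return ("(" ++ concat parts ++ ")")
  where
    -- Any run of characters that needs no special treatment.
    plain  = many1 (noneOf "()\\\r\n")
    -- A backslash followed by any character, returned as both characters.
    quoted = do
      _ <- char '\\'
      c <- anyChar
      return ['\\', c]
```

A balanced input such as "(a (nested \\) one) b)" is returned verbatim, while an unbalanced one such as "(a (b)" is rejected.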
|
|
|
Match any combination of fws and comments.
|
|
Atom (section 3.2.4)
|
|
|
Match any US-ASCII character except for control characters,
specials, or space. atom and dot_atom are made up of this.
|
|
|
Match one or more atext characters and skip any preceding or
trailing cfws.
|
|
|
Match dot_atom_text and skip any preceding or trailing cfws.
|
|
|
Match two or more atexts interspersed by dots.
|
|
Quoted strings (section 3.2.5)
|
|
|
Match any non-whitespace, non-control US-ASCII character except
for "\" and """.
|
|
|
Match either qtext or quoted_pair.
|
|
|
Match any number of qcontent between double quotes. Any cfws
preceding or following the quoted string is skipped automatically.
|
|
Miscellaneous tokens (section 3.2.6)
|
|
|
Match either atom or quoted_string.
|
|
|
Match either one or more words or an obs_phrase.
|
|
|
Match any non-whitespace, non-control US-ASCII character except
for "\" and """.
|
|
|
Match any number of utext tokens.
"Unstructured text" is used in free text fields such as subject.
Please note that any comments or whitespace that prefaces or
follows the actual utext is included in the returned string.
|
|
Date and Time Specification (section 3.3)
|
|
|
Parse a date and time specification of the form
Thu, 19 Dec 2002 20:35:46 +0200
where the weekday specification "Thu," is optional. The parser
returns a CalendarTime, which is set to the appropriate values.
Note, though, that not all fields of CalendarTime will
necessarily be set correctly! When no weekday has been
provided, the parser will set this field to Monday, regardless
of whether the day actually is a Monday or not. Similarly, the day
of the year will always be returned as 0. The timezone name will
always be empty: "".
Nor will the date_time parser perform any consistency checking.
It will accept
40 Apr 2002 13:12 +0100
as a perfectly valid date.
In order to get all fields set to meaningful values, and in order
to verify the date's consistency, you will have to feed it into any
of the conversion routines provided in System.Time, such as
toClockTime. (When doing this, keep in mind that most functions
return local time. This will not necessarily be the time you're
expecting.)
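The text above recommends System.Time for consistency checking. As an alternative sketch, and purely as an assumption of this example, the calendar check alone can also be done with Data.Time.Calendar from the time package once the parsed month has been converted to a number from 1 to 12. The name validDate is hypothetical.

```haskell
import Data.Time.Calendar (Day, fromGregorianValid)

-- Hypothetical helper: check a (year, month number, day) triple for
-- calendar consistency. Month numbers run from 1 (January) to 12.
-- Returns Nothing when the date does not exist.
validDate :: Integer -> Int -> Int -> Maybe Day
validDate = fromGregorianValid
```

Here validDate 2002 4 40 returns Nothing, so the "40 Apr 2002" example would be caught at this stage even though the parser accepts it.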
|
|
|
This parser will match a day_name, optionally wrapped in folding
whitespace, or an obs_day_of_week and return its Day value.
|
|
|
This parser will match the abbreviated weekday names ("Mon", "Tue", ...)
and return the appropriate Day value.
|
|
|
This parser will match a date of the form "dd mon yyyy" (for example
"19 Dec 2002") and return a triple of the form (Int,Month,Int) -
corresponding to (year,month,day).
|
|
|
This parser will match a four-digit number and return its integer
value. No range checking is performed.
|
|
|
This parser will match a month_name, optionally wrapped in
folding whitespace, or an obs_month and return its Month
value.
|
|
|
This parser will match the abbreviated month names ("Jan", "Feb", ...)
and return the appropriate Month value.
|
|
|
Match either an obs_day, or a one or two digit number and return it.
|
|
|
This parser will match a time_of_day specification followed by a
zone. It returns the tuple (TimeDiff,Int) corresponding to the
return values of either parser.
|
|
|
This parser will match a time-of-day specification of "hh:mm" or
"hh:mm:ss" and return the corresponding time as a TimeDiff.
|
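The time-of-day grammar can be sketched in plain Parsec as shown here. To stay self-contained, this sketch returns a plain (hours, minutes, seconds) tuple instead of a TimeDiff; twoDigits and timeOfDaySketch are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Exactly two decimal digits, read as an Int. No range checking.
twoDigits :: Parser Int
twoDigits = fmap read (count 2 digit)

-- "hh:mm" or "hh:mm:ss", returned as (hours, minutes, seconds).
-- A missing seconds part defaults to 0.
timeOfDaySketch :: Parser (Int, Int, Int)
timeOfDaySketch = do
  h <- twoDigits
  _ <- char ':'
  m <- twoDigits
  s <- option 0 (char ':' >> twoDigits)
  return (h, m, s)
```

The input "20:35:46" yields Right (20,35,46) and "13:12" yields Right (13,12,0). Since no range checking takes place, "25:99" is accepted as well.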
|
|
This parser will match a two-digit number and return its integer
value. No range checking is performed.
|
|
|
This parser will match a two-digit number and return its integer
value. No range checking is performed.
|
|
|
This parser will match a two-digit number and return its integer
value. No range checking is performed.
|
|
|
This parser will match a timezone specification of the form
"+hhmm" or "-hhmm" and return the zone's offset to UTC in
seconds as an integer. obs_zone is matched as well.
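The conversion from "+hhmm" to seconds is simple arithmetic: hours times 3600 plus minutes times 60, with the sign applied to the sum. A minimal sketch in plain Parsec follows; it ignores the obsolete zone names, and zoneSketch is a hypothetical name.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- "+hhmm" or "-hhmm" converted to an offset from UTC in seconds.
zoneSketch :: Parser Int
zoneSketch = do
  sign <- (char '+' >> return 1) <|> (char '-' >> return (-1))
  hh   <- fmap read (count 2 digit)
  mm   <- fmap read (count 2 digit)
  return (sign * (hh * 3600 + mm * 60))
```

For example, "+0200" yields Right 7200 and "-0130" yields Right (-5400).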
|
|
Address Specification (section 3.4)
|
|
|
A NameAddr is composed of an optional real name and a mandatory
e-mail address.
| Constructors | |
|
|
|
Parse a single mailbox or an address group and return the
address(es).
|
|
|
Parse a name_addr or an addr_spec and return the
address.
|
|
|
Parse an angle_addr, optionally prefaced with a display_name,
and return the address.
|
|
|
Parse an angle_addr or an obs_angle_addr and return the address.
|
|
|
Parse a "group" of addresses. That is a display_name, followed
by a colon, optionally followed by a mailbox_list, followed by a
semicolon. The found address(es) are returned, which may be none.
Here is an example:
parse group "" "my group: user1@example.org, user2@example.org;"
This input comes out as:
Right ["user1@example.org","user2@example.org"]
|
|
|
Parse and return a phrase.
|
|
|
Parse a list of mailbox addresses, every two addresses being
separated by a comma, and return the list of found address(es).
|
|
|
Parse a list of addresses, every two addresses being
separated by a comma, and return the list of found address(es).
|
|
Addr-spec specification (section 3.4.1)
|
|
|
Parse an "address specification". That is a local_part, followed
by an "@" character, followed by a domain. Return the complete
address as String, ignoring any whitespace or any comments.
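A much-reduced sketch of this structure in plain Parsec is given below. It assumes that both sides are simple dot-atoms and ignores comments, folding white space, quoted strings, and domain literals. The names atextChar, dotAtomText, and addrSpecSketch are hypothetical.

```haskell
import Data.Char (isAlphaNum, isAscii)
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- atext per RFC 2822: ASCII letters, digits, and a fixed set of symbols.
atextChar :: Parser Char
atextChar = satisfy (\c -> isAscii c && isAlphaNum c)
        <|> oneOf "!#$%&'*+-/=?^_`{|}~"

-- One or more atext runs separated by single dots.
dotAtomText :: Parser String
dotAtomText = do
  parts <- sepBy1 (many1 atextChar) (char '.')
  return (intercalate "." parts)

-- local part, "@", domain; returned as one string.
addrSpecSketch :: Parser String
addrSpecSketch = do
  l <- dotAtomText
  _ <- char '@'
  d <- dotAtomText
  return (l ++ "@" ++ d)
```

With this sketch, "simons@example.org" parses to Right "simons@example.org", while an input without an "@" is rejected.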
|
|
|
Parse and return a "local part" of an addr_spec. That is either
a dot_atom or a quoted_string.
|
|
|
Parse and return a "domain part" of an addr_spec. That is either
a dot_atom or a domain_literal.
|
|
|
Parse a "domain literal". That is a "[" character, followed by
any amount of dcontent, followed by a terminating "]"
character. The complete string is returned verbatim.
|
|
|
Parse and return any characters that are legal in a
domain_literal. That is dtext or a quoted_pair.
|
|
|
Parse and return any ASCII characters except "[", "]", and
"\".
|
|
Overall message syntax (section 3.5)
|
|
|
This data type represents a parsed Internet Message as defined in
this RFC. It consists of an arbitrary number of header lines,
represented in the Field data type, and a message body, which may
be empty.
| Constructors | |
|
|
|
|
|
Parse a complete message as defined by this RFC and break it down
into the separate header fields and the message body. Header lines
that contain syntax errors will not cause the parser to abort.
Rather, these headers will appear as OptionalFields (which are
unparsed) in the resulting Message. A message must be really,
really badly broken for this parser to fail.
This behaviour was chosen because it is impossible to predict what
the user of this module considers to be a fatal error;
traditionally, parsers are very forgiving when it comes to Internet
messages.
If you want to implement a really strict parser, you'll have to put
the appropriate parser together yourself. You'll find that this is
rather easy to do. Refer to the fields parser for further details.
|
|
|
This parser will return a message body as specified by this RFC;
that is basically any number of text characters, which may be
divided into separate lines by crlf.
|
|
Field definitions (section 3.6)
|
|
|
This data type represents any of the header fields defined in this
RFC. Each of the various constructors contains the return value
of the corresponding parser.
| Constructors | |
|
|
|
This parser will parse an arbitrary number of header fields as
defined in this RFC. For each field, an appropriate Field value
is created, all of them making up the Field list that this parser
returns.
If you look at the implementation of this parser, you will find
that it uses Parsec's try modifier around all of the fields.
The idea behind this is that fields that contain syntax errors
fall back to the catch-all optional_field. Thus, this parser will
hardly ever return a syntax error, which conforms with the idea
that any message that can possibly be accepted should be.
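The fallback idea can be illustrated with a reduced sketch in plain Parsec. The field type and all parser names here (FieldSketch, strictMessageId, catchAll, fieldsSketch) are hypothetical and far simpler than the module's own definitions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, much-reduced field type for this sketch.
data FieldSketch
  = MessageIdField String      -- a "Message-Id:" line that parsed strictly
  | OtherField String String   -- any other line: (name, raw text)
  deriving (Eq, Show)

-- Strict parser: "Message-Id:" followed by "<...>" and CRLF.
strictMessageId :: Parser FieldSketch
strictMessageId = do
  _ <- string "Message-Id:"
  skipMany (char ' ')
  i <- between (char '<') (char '>') (many1 (noneOf "<>\r\n"))
  _ <- crlf
  return (MessageIdField i)

-- Catch-all parser: "<name>:" followed by anything up to CRLF.
catchAll :: Parser FieldSketch
catchAll = do
  n <- many1 (noneOf ":\r\n")
  _ <- char ':'
  v <- many (noneOf "\r\n")
  _ <- crlf
  return (OtherField n v)

-- try rewinds the input when the strict parser fails midway, so the
-- same line can still be picked up by the catch-all parser.
fieldsSketch :: Parser [FieldSketch]
fieldsSketch = many (try strictMessageId <|> catchAll)
```

Given the three lines "Message-Id: <a@b>", "Message-Id: broken", and "X-Foo: bar" (each terminated by CRLF), the result is [MessageIdField "a@b", OtherField "Message-Id" " broken", OtherField "X-Foo" " bar"]. Without the try, the second line would make the whole parse fail, because the strict parser has already consumed input when it hits the error.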
|
|
The origination date field (section 3.6.1)
|
|
|
Parse a "Date:" header line and return the date it contains as a
CalendarTime.
|
|
Originator fields (section 3.6.2)
|
|
|
Parse a "From:" header line and return the mailbox_list
address(es) contained in it.
|
|
|
Parse a "Sender:" header line and return the mailbox address
contained in it.
|
|
|
Parse a "Reply-To:" header line and return the address_list
address(es) contained in it.
|
|
Destination address fields (section 3.6.3)
|
|
|
Parse a "To:" header line and return the address_list
address(es) contained in it.
|
|
|
Parse a "Cc:" header line and return the address_list
address(es) contained in it.
|
|
|
Parse a "Bcc:" header line and return the address_list
address(es) contained in it.
|
|
Identification fields (section 3.6.4)
|
|
|
Parse a "Message-Id:" header line and return the msg_id
contained in it.
|
|
|
Parse an "In-Reply-To:" header line and return the list of
msg_ids contained in it.
|
|
|
Parse a "References:" header line and return the list of
msg_ids contained in it.
|
|
|
Parse a "message ID" and return it. A message ID is almost
identical to an angle_addr, but with stricter rules about folding
and whitespace.
|
|
|
Parse a "left ID" part of a msg_id. This is almost identical to
the local_part of an e-mail address, but with stricter rules
about folding and whitespace.
|
|
|
Parse a "right ID" part of a msg_id. This is almost identical to
the domain of an e-mail address, but with stricter rules about
folding and whitespace.
|
|
|
Parse one or more occurrences of qtext or quoted_pair and
return the concatenated string. This makes up the id_left of a
msg_id.
|
|
|
Parse one or more occurrences of dtext or quoted_pair and
return the concatenated string. This makes up the id_right of a
msg_id.
|
|
Informational fields (section 3.6.5)
|
|
|
Parse a "Subject:" header line and return its contents verbatim.
|
|
|
Parse a "Comments:" header line and return its contents verbatim.
|
|
|
Parse a "Keywords:" header line and return the list of phrases
found. Please note that each phrase is again a list of atoms, as
returned by the phrase parser.
|
|
Resent fields (section 3.6.6)
|
|
|
Parse a "Resent-Date:" header line and return the date it
contains as a CalendarTime.
|
|
|
Parse a "Resent-From:" header line and return the mailbox_list
address(es) contained in it.
|
|
|
Parse a "Resent-Sender:" header line and return the mailbox
address contained in it.
|
|
|
Parse a "Resent-To:" header line and return the address_list
address(es) contained in it.
|
|
|
Parse a "Resent-Cc:" header line and return the address_list
address(es) contained in it.
|
|
|
Parse a "Resent-Bcc:" header line and return the address_list
address(es) contained in it. (This list may be empty.)
|
|
|
Parse a "Resent-Message-ID:" header line and return the msg_id
contained in it.
|
|
Trace fields (section 3.6.7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Optional fields (section 3.6.8)
|
|
|
Parse an arbitrary header field and return a tuple containing the
field_name and unstructured text of the header. The name will
not contain the terminating colon.
|
|
|
Parse and return an arbitrary header field name. That is one or
more ftext characters.
|
|
|
Match and return any ASCII character except for control
characters, whitespace, and ":".
|
|
Miscellaneous obsolete tokens (section 4.1)
|
|
|
Match the obsolete "quoted pair" syntax, which - unlike
quoted_pair - allowed any ASCII character to be specified when
quoted. The parser will return both the backslash and the actual
character.
|
|
|
Match the obsolete "text" syntax, which - unlike text - allowed
"carriage returns" and "linefeeds". This is really weird; you
had better consult the RFC for details. The parser will return the
complete string, including those special characters.
|
|
|
Match and return the obsolete "char" syntax, which - unlike
character - did not allow "carriage return" and "linefeed".
|
|
|
Match and return the obsolete "utext" syntax, which is identical
to obs_text.
|
|
|
Match the obsolete "phrase" syntax, which - unlike phrase -
allows dots between tokens.
|
|
|
Match a "phrase list" syntax and return the list of Strings
that make up the phrase. In contrast to a phrase, the
obs_phrase_list separates the individual words by commas. This
syntax is - as you will have guessed - obsolete.
|
|
Obsolete folding white space (section 4.2)
|
|
|
Parse and return an "obsolete fws" token. That is at least one
wsp character, followed by an arbitrary number (including zero)
of crlf followed by at least one more wsp character.
|
|
Obsolete Date and Time (section 4.3)
|
|
|
Parse a day_name but allow for the obsolete folding syntax.
|
|
|
Parse a year but allow for a two-digit number (obsolete) and the
obsolete folding syntax.
|
|
|
Parse a month_name but allow for the obsolete folding syntax.
|
|
|
Parse a day but allow for the obsolete folding syntax.
|
|
|
Parse an hour but allow for the obsolete folding syntax.
|
|
|
Parse a minute but allow for the obsolete folding syntax.
|
|
|
Parse a second but allow for the obsolete folding syntax.
|
|
|
Match the obsolete zone names and return the appropriate offset.
|
|
Obsolete Addressing (section 4.4)
|
|
|
This parser will match the "obsolete angle address" syntax. This
construct used to be known as a "route address" in earlier RFCs.
There are two differences between this construct and the
angle_addr. First, as usual, the obsolete form allows for
more liberal insertion of folding whitespace and comments.
Second, and more importantly, angle addresses used to allow the
(optional) specification of a "route". The newer version does not.
Such a routing address looks like this:
<@example1.org,@example2.org:simons@example.org>
The parser will return a tuple that - in case of the above address -
looks like this:
(["example1.org","example2.org"],"simons@example.org")
The first part contains a list of hosts that constitute the route
part. This list may be empty! The second part of the tuple is the
actual addr_spec address.
|
|
|
This parser parses the "route" part of obs_angle_addr and
returns the list of Strings that make up this route. Relies on
obs_domain_list for the actual parsing.
|
|
|
This parser parses a list of domain names, each of them prefaced
with an "at". Multiple names are separated by a comma. The list of
domains is returned - and may be empty.
|
|
|
Parse the obsolete syntax of a local_part, which allowed for
more liberal insertion of folding whitespace and comments. The
actual string is returned.
|
|
|
Parse the obsolete syntax of a domain, which allowed for more
liberal insertion of folding whitespace and comments. The actual
string is returned.
|
|
|
This parser will match the obsolete syntax for a mailbox_list.
This one is quite weird: An obs_mbox_list contains an arbitrary
number of mailboxes (including none), which are separated by
commas. But you may have multiple consecutive commas without giving
a mailbox. You may also have a valid obs_mbox_list that
contains no mailbox at all. On the other hand, you must have
at least one comma.
So, this input is perfectly valid:
","
But this one is - contrary to all intuition - not:
"simons@example.org"
Strange, isn't it?
|
|
|
This parser is identical to obs_mbox_list but parses a list of
addresses rather than mailboxes. The main difference is that an
address may contain groups. Please note that as of now, the
parser will return a simple list of addresses; the grouping
information is lost.
|
|
Obsolete header fields (section 4.5)
|
|
|
|
Obsolete origination date field (section 4.5.1)
|
|
|
Parse a date header line but allow for the obsolete
folding syntax.
|
|
Obsolete originator fields (section 4.5.2)
|
|
|
Parse a from header line but allow for the obsolete
folding syntax.
|
|
|
Parse a sender header line but allow for the obsolete
folding syntax.
|
|
|
Parse a reply_to header line but allow for the obsolete
folding syntax.
|
|
Obsolete destination address fields (section 4.5.3)
|
|
|
Parse a to header line but allow for the obsolete
folding syntax.
|
|
|
Parse a cc header line but allow for the obsolete
folding syntax.
|
|
|
Parse a bcc header line but allow for the obsolete
folding syntax.
|
|
Obsolete identification fields (section 4.5.4)
|
|
|
Parse a message_id header line but allow for the obsolete
folding syntax.
|
|
|
Parse an in_reply_to header line but allow for the obsolete
folding and the obsolete phrase syntax.
|
|
|
Parse a references header line but allow for the obsolete
folding and the obsolete phrase syntax.
|
|
|
Parses the "left part" of a message ID, but allows the obsolete
syntax, which is identical to a local_part.
|
|
|
Parses the "right part" of a message ID, but allows the obsolete
syntax, which is identical to a domain.
|
|
Obsolete informational fields (section 4.5.5)
|
|
|
Parse a subject header line but allow for the obsolete
folding syntax.
|
|
|
Parse a comments header line but allow for the obsolete
folding syntax.
|
|
|
Parse a keywords header line but allow for the obsolete
folding syntax. Also, this parser accepts obs_phrase_list.
|
|
Obsolete resent fields (section 4.5.6)
|
|
|
Parse a resent_from header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_sender header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_date header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_to header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_cc header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_bcc header line but allow for the obsolete
folding syntax.
|
|
|
Parse a resent_msg_id header line but allow for the obsolete
folding syntax.
|
|
|
Parse a Resent-Reply-To header line but allow for the
obsolete folding syntax.
|
|
Obsolete trace fields (section 4.5.7)
|
|
|
|
|
|
|
Match obs_angle_addr.
|
|
|
This parser is identical to optional_field but allows the more
liberal line-folding syntax between the "field_name" and the "field
text".
|
|
Produced by Haddock version 2.6.1 |