Component "netstring"
Netchannels

The idea of Netchannels is to wrap the I/O functions for channels into classes. There is one basic type for input channels, and one for output channels:

    class type in_obj_channel = object
      (* Fundamental methods: *)
      method input : string -> int -> int -> int
      method close_in : unit -> unit
      method pos_in : int
      (* Derived methods: *)
      method really_input : string -> int -> int -> unit
      method input_char : unit -> char
      method input_line : unit -> string
      method input_byte : unit -> int
      (* Transitional method: *)
      method _rep_in : [ `Chan of in_channel | `Other ]
    end

    class type out_obj_channel = object
      (* Fundamental methods: *)
      method output : string -> int -> int -> int
      method close_out : unit -> unit
      method pos_out : int
      method flush : unit -> unit
      (* Derived methods: *)
      method really_output : string -> int -> int -> unit
      method output_char : char -> unit
      method output_string : string -> unit
      method output_byte : int -> unit
      method output_buffer : Buffer.t -> unit
      method output_channel : ?len:int -> in_obj_channel -> unit
      (* Transitional method: *)
      method _rep_out : [ `Chan of out_channel | `Other ]
    end

Why are these class types useful? Because they hide the real source that input data comes from, and the real destination that output data is sent to. This is best illustrated by counterexamples, which can be found in the O'Caml standard library. The module Lexing provides three functions that create lexers:

    val from_channel : Pervasives.in_channel -> lexbuf
    val from_string : string -> lexbuf
    val from_function : (string -> int -> int) -> lexbuf

This is a bad interface because you need a new function for every type of data source you want to support.
A similar example can be found in Printf:

    val fprintf : out_channel -> ('a, out_channel, unit) format -> 'a
    val printf : ('a, out_channel, unit) format -> 'a
    val eprintf : ('a, out_channel, unit) format -> 'a
    val sprintf : ('a, unit, string) format -> 'a
    val bprintf : Buffer.t -> ('a, Buffer.t, unit) format -> 'a
    val kprintf : (string -> string) -> ('a, unit, string) format -> 'a

You can print to channels, strings, buffers, and functions, and every type of destination needs a function of its own. As Netstring has lots of parsers and printers, its interface would be much bigger than it is if we had adopted the same style.

The solution is to hide the real source or destination, and to define the interface that all such generic sources and destinations must have. The results are the class types in_obj_channel and out_obj_channel shown above.

Programming with in_obj_channel

For example, let us program a function that reads a data source line by line and returns the sum of all lines, each of which must be an integer number. The argument ch is an open in_obj_channel, and the return value is the sum:

    let sum_up ch =
      let sum = ref 0 in
      try
        while true do
          let line = ch # input_line() in
          sum := !sum + int_of_string line
        done;
        assert false
      with
        End_of_file -> !sum
    ;;

The interesting point is that the data source can be anything: a channel, a string, or any other class that implements the class type in_obj_channel [1].

This expression opens the file "data" and returns the sum of the numbers it contains:

    let ch = new input_channel (open_in "data") in
    sum_up ch

The class input_channel is an implementation of the type in_obj_channel where every method of the class simply calls the corresponding function of Pervasives. (By the way, it would be a good idea to close the channel afterwards: ch # close_in(). We will discuss that below.)
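The same decoupling can be sketched without Ocamlnet, using only the OCaml standard library. In this sketch, line_source, string_source, and channel_source are hypothetical names made up for illustration; they are not part of Netchannels. The class type keeps only the two methods sum_up needs, which is enough to show how O'Caml's structural object typing lets one function consume any object that has the right methods.

```ocaml
(* Hypothetical, stdlib-only sketch of the Netchannels idea; the names
   below are illustrative stand-ins, not the Netchannels API. *)

(* The subset of in_obj_channel that sum_up needs. *)
class type line_source = object
  method input_line : unit -> string
  method close_in : unit -> unit
end

(* Reads newline-separated lines from a string and raises End_of_file
   at the end, like Stdlib.input_line. *)
class string_source (s : string) = object
  val mutable pos = 0
  method input_line () =
    let len = String.length s in
    if pos >= len then raise End_of_file;
    let stop =
      match String.index_from_opt s pos '\n' with
      | Some i -> i
      | None -> len
    in
    let line = String.sub s pos (stop - pos) in
    pos <- stop + 1;
    line
  method close_in () = ()
end

(* Delegates every method to a standard library in_channel. *)
class channel_source (ic : in_channel) = object
  method input_line () = input_line ic
  method close_in () = close_in ic
end

(* Accepts any object that has at least the methods of line_source. *)
let sum_up (ch : #line_source) =
  let sum = ref 0 in
  try
    while true do
      let line = ch#input_line () in
      sum := !sum + int_of_string line
    done;
    assert false
  with End_of_file -> !sum

let () =
  (* The caller chooses the source; sum_up itself stays unchanged. *)
  Printf.printf "%d\n" (sum_up (new string_source "1\n2\n3\n4"))
```

Replacing `new string_source ...` with `new channel_source (open_in "data")` switches to a file without touching sum_up; in the real library, input_string and input_channel play these two roles.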
This expression sums up the contents of a constant string:

    let s = "1\n2\n3\n4" in
    let ch = new input_string s in
    sum_up ch

The class input_string is an implementation of the type in_obj_channel that reads from a string as if it were a channel. The effect of using the Netchannels module is that the same implementation of sum_up can be used to read from multiple data sources: it is sufficient to call the function with different implementations of in_obj_channel.

The details of in_obj_channel

The properties of any class that implements in_obj_channel can be summarized as follows:
Programming with out_obj_channel

The following function outputs the numbers of an int list sequentially on the passed netchannel:

    let print_int_list ch l =
      List.iter
        (fun n ->
           ch # output_string (string_of_int n);
           ch # output_char '\n';
        )
        l;
      ch # flush()
    ;;

The following statements write the output into a file:

    let ch = new output_channel (open_out "data") in
    print_int_list ch [1;2;3]

And these statements write the output into a buffer:

    let b = Buffer.create 16 in
    let ch = new output_buffer b in
    print_int_list ch [1;2;3]

Again, the caller of the function print_int_list determines the type of the output destination, and you do not need several functions for several types of destination.

The details of out_obj_channel

The properties of any class that implements out_obj_channel can be summarized as follows:
How to close channels

As channels may use file descriptors for their implementation, it is very important that all open channels are closed after they have been used; otherwise the process will eventually run out of file descriptors. The simple way,

    let ch = new <channel_class> args ... in
    ... do something ...
    ch # close_in()    (* or ch # close_out() *)

is dangerous because an exception may be raised between channel creation and the "close" invocation. An elegant solution is to use with_in_obj_channel and with_out_obj_channel, as in:

    with_in_obj_channel               (* or with_out_obj_channel *)
      (new <channel_class> ...)
      (fun ch ->
         ... do something ...
      )

This programming idiom ensures that the channel is always closed after usage, even in the case of exceptions. Complete sample code:

    let sum = with_in_obj_channel (new input_channel (open_in "data")) sum_up ;;

    with_out_obj_channel
      (new output_channel (open_out "data"))
      (fun ch -> print_int_list ch [1;2;3])
    ;;

Examples: HTML parsing and printing

In the Netstring library there are lots of parsers and printers that accept netchannels as data sources and destinations, respectively. One of them is the Nethtml module, which provides an HTML parser and printer. A few code snippets showing how to call them, just to get used to netchannels:

    let html_document =
      with_in_obj_channel
        (new input_channel (open_in "myfile.html"))
        Nethtml.parse
    ;;

    with_out_obj_channel
      (new output_channel (open_out "otherfile.html"))
      (fun ch -> Nethtml.write ch html_document)
    ;;

Transactional output channels

Sometimes you do not want generated output to be sent directly to the underlying file descriptor, but rather buffered until you know that everything worked fine. Imagine you program a network service, and you want to return the result only when the computations are successful, and an error message otherwise. One way to achieve this effect is to program a buffer manually:

    let network_service ch =
      try
        let b = Buffer.create 16 in
        let ch' = new output_buffer b in
        ... computations, write results into ch' ...
        ch' # close_out();
        ch # output_buffer b
      with
        error ->
          ... write error message to ch ...
    ;;

There is a better way to do this: transactional output channels. This type of netchannel provides a buffer for all written data, like the above example, and the data are copied to the real destination only when they are explicitly committed. Alternatively, you can roll back the channel, i.e. delete the internal buffer. The signature of the type trans_out_obj_channel is:

    class type trans_out_obj_channel = object
      inherit out_obj_channel
      method commit_work : unit -> unit
      method rollback_work : unit -> unit
    end

Such channels have the same methods as out_obj_channel plus commit_work and rollback_work. There are two implementations, one keeping the buffer in memory, and the other using a temporary file:

    let ch' = new buffered_trans_channel ch

And:

    let ch' = new tempfile_trans_channel ch

In the latter case, there are optional arguments specifying where the temporary file is created. Now the network service would look like this (ch' is bound outside the try block so that the error handler can still use it):

    let network_service transaction_provider ch =
      let ch' = transaction_provider ch in
      try
        ... computations, write results into ch' ...
        ch' # commit_work();
        ch' # close_out()      (* implies ch # close_out() *)
      with
        error ->
          ch' # rollback_work();
          ... write error message to ch' ...
          ch' # commit_work();
          ch' # close_out()    (* implies ch # close_out() *)
    ;;

You can program this function without specifying which of the two implementations is used. Just call this function as

    network_service (new buffered_trans_channel) ch

or

    network_service (new tempfile_trans_channel) ch

to determine the type of transaction buffer.

Some details:
Pipes and filters

The class pipe is an in_obj_channel and an out_obj_channel at the same time (i.e. the class has the type io_obj_channel). A pipe has two endpoints, one for reading and one for writing (similar to the pipes provided by the operating system). Unlike OS pipes, however, you cannot read and write at the same time, so there must be an internal buffer storing the data that have been written but not yet read.

How can such a construction be useful? Imagine you have two routines that run alternately: one is capable of writing into netchannels, and the other can read from a netchannel. Pipes are the missing communication link in this situation, because the writer routine can output into the pipe, and the reader routine can read from the buffer of the pipe. In the following example, the writer outputs the numbers from 1 to 100, and the reader sums them up:

    let pipe = new pipe() ;;
    let k = ref 1 ;;
    let writer() =
      if !k <= 100 then (
        pipe # output_string (string_of_int !k);
        incr k;
        if !k > 100 then pipe # close_out() else pipe # output_char '\n';
      )
    ;;
    let sum = ref 0 ;;
    let reader() =
      let line = pipe # input_line() in
      sum := !sum + int_of_string line
    ;;
    try
      while true do
        writer();
        reader()
      done
    with
      End_of_file -> ()
    ;;

The writer function prints the numbers into the pipe, and the reader function reads them in. By closing only the output end of the pipe, the writer signals the end of the stream, and the input_line method raises the exception End_of_file.

Of course, this example is very simple. What happens when more data is written into the pipe than is read? The internal buffer grows. What happens when the reader tries to read more data than is available? The input methods signal this by raising the special exception Buffer_underrun. Unfortunately, handling this exception can be very complicated, as the reader must be able to deal with partial reads. This could be solved by using the Netstream module.
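To make the End_of_file and Buffer_underrun behaviour concrete, here is a small stdlib-only model of a pipe. It is a sketch under stated assumptions, not the Netchannels pipe class: it supports only output_string, output_char, close_out, and input_line, and its Buffer_underrun exception merely mirrors the name used in the text. In this model, input_line raises Buffer_underrun when no complete line is buffered and the writing end is still open, and End_of_file once all data has been consumed after close_out.

```ocaml
(* Hypothetical stdlib-only model of a pipe; not the Netchannels class. *)
exception Buffer_underrun

class pipe () = object (self)
  (* Everything written so far; never compacted, for simplicity. *)
  val buf = Buffer.create 64
  (* Position of the first unread byte in buf. *)
  val mutable rd = 0
  (* Set once the writing end has been closed. *)
  val mutable closed = false

  method output_string s =
    if closed then failwith "pipe: output end is closed";
    Buffer.add_string buf s

  method output_char c = self#output_string (String.make 1 c)

  method close_out () = closed <- true

  (* Returns the next complete line. A final unterminated line is only
     returned after close_out, because more bytes might still follow. *)
  method input_line () =
    let data = Buffer.contents buf in
    let len = String.length data in
    match String.index_from_opt data rd '\n' with
    | Some i ->
        let line = String.sub data rd (i - rd) in
        rd <- i + 1;
        line
    | None when closed && rd < len ->
        let line = String.sub data rd (len - rd) in
        rd <- len;
        line
    | None when closed -> raise End_of_file
    | None -> raise Buffer_underrun
end

(* The writer/reader loop from the text, run against the model. *)
let sum_pipe () =
  let p = new pipe () in
  let k = ref 1 in
  let writer () =
    if !k <= 100 then begin
      p#output_string (string_of_int !k);
      incr k;
      if !k > 100 then p#close_out () else p#output_char '\n'
    end
  in
  let sum = ref 0 in
  let reader () = sum := !sum + int_of_string (p#input_line ()) in
  (try
     while true do
       writer ();
       reader ()
     done
   with End_of_file -> ());
  !sum

let () = Printf.printf "sum = %d\n" (sum_pipe ())  (* prints: sum = 5050 *)
```

If the writer had output only "12" without a newline, a call to input_line would raise Buffer_underrun; the reader would then have to retry later, which is exactly the partial-read handling the text describes as complicated.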
A netstream is another extension of in_obj_channel that allows one to look ahead, i.e. you can look at the bytes that will be read next, and use this information to decide whether enough data are available or not. Netstreams are explained in another chapter of this manual.

Pipes have another feature that makes them useful even for "normal" programming. You can specify a conversion function that is called when data is transferred from the writing end to the reading end of the pipe. The module Netencoding.Base64 defines such a pipe that converts data: the class encoding_pipe automatically encodes all bytes written into it by the Base64 scheme:

    let pipe = new Netencoding.Base64.encoding_pipe() ;;
    pipe # output_string "Hello World";
    pipe # close_out()
    ;;
    let s = pipe # input_line() ;;

s now has the value "SGVsbG8gV29ybGQ=", the encoded form of the input. This kind of pipe has the same interface as the basic pipe class, and the same usage problems. Fortunately, the Netstring library has another facility that simplifies the use of pipes, namely filters.

There are two kinds of filters: the class output_filter redirects data written to an out_obj_channel through a pipe, and the class input_filter arranges that data read from an in_obj_channel flows through a pipe. An example makes that clearer. Imagine you have a function write_results that writes the results of a computation into an out_obj_channel. Normally, this channel is simply a file:

    with_out_obj_channel (new output_channel (open_out "results")) write_results

Now suppose you want the file to be Base64-encoded.
This can be arranged by calling write_results differently:

    let pipe = new Netencoding.Base64.encoding_pipe() in
    with_out_obj_channel
      (new output_channel (open_out "results"))
      (fun ch ->
         let ch' = new output_filter pipe ch in
         write_results ch';
         ch' # close_out()
      )

Now any invocation of an output method of ch' actually prints into the filter, which redirects the data through the pipe, thus encoding them, and finally passes the encoded data to the underlying channel ch. Note that you must close ch' to ensure that all data are filtered; it is not sufficient to flush the output.

It is important to understand why filters must be closed to work properly. The problem is that the Base64 encoding converts groups of three bytes into groups of four characters. Because not every string to convert has a length that is a multiple of three, there are special rules for handling the one or two remaining bytes at the end. The pipe must know the end of the input data in order to apply these rules correctly. If you only flush the filter, the remaining bytes simply stay in the internal buffer, because more bytes might follow. By closing the filter, you indicate that the definite end has been reached, and the special rules for trailing data are applied. Many conversions have similar problems, and because of this it is good advice to always close output filters after usage.

There is not only the class output_filter but also input_filter. This class can be used to perform conversions while reading from a file. Note that you often do not need to close input filters, because input channels can signal the end by raising End_of_file, so the mentioned problems usually do not occur.

There are a number of predefined conversion pipes:
Defining your own netchannel classes

As subtyping and inheritance are orthogonal in O'Caml, you can simply create your own netchannels by defining classes that match the in_obj_channel or out_obj_channel types:

    class my_in_channel : in_obj_channel =
    object (self)
      method input s pos len = ...
      method close_in() = ...
      method pos_in = ...
      method really_input s pos len = ...
      method input_char() = ...
      method input_line() = ...
      method input_byte() = ...
      method _rep_in = ...
    end

Of course, this is non-trivial, especially for the in_obj_channel case. Fortunately, the Netchannels module includes a "construction kit" that allows one to define a channel class from only a few methods [2]. A closer look at in_obj_channel and out_obj_channel shows that some methods can be derived from more fundamental methods. The following class types include only the fundamental methods:

    class type raw_in_channel = object
      method input : string -> int -> int -> int
      method close_in : unit -> unit
      method pos_in : int
    end

    class type raw_out_channel = object
      method output : string -> int -> int -> int
      method close_out : unit -> unit
      method pos_out : int
      method flush : unit -> unit
    end

Basically, it is sufficient to define only these methods, and to add the missing methods by inheriting from augment_raw_in_channel or augment_raw_out_channel:

    class my_in_channel : in_obj_channel =
    object (self)
      inherit augment_raw_in_channel
      method input s pos len = ...
      method close_in() = ...
      method pos_in = ...
    end

Note that this does not add any buffering to the channel. For example, really_input calls input as often as necessary to read the demanded string. This can make I/O quite slow, so it is recommended to add a buffering mechanism to self-defined classes. Again, there are some helper classes simplifying this task:

    class my_raw_in_channel : raw_in_channel =
    object (self)
      method input s pos len = ...
      method close_in() = ...
      method pos_in = ...
    end

    class my_in_channel : in_obj_channel =
    object (self)
      inherit buffered_raw_in_channel (new my_raw_in_channel)
      inherit augment_raw_in_channel
    end

It is important to turn the raw_in_channel into a buffered channel first, and then to augment the class by the missing methods; otherwise the latter methods would bypass the buffer. This works for output channels in the same way.

There is still a problem with my_in_channel, and it has to do with the poor implementation of input_line found in augment_raw_in_channel. This method reads character by character until it finds the '\n' ending the line. Of course, this algorithm can be improved when a buffer is present. To get a better algorithm that is aware of the buffer, override input_line:

    class my_raw_in_channel : raw_in_channel =
    object (self)
      method input s pos len = ...
      method close_in() = ...
      method pos_in = ...
    end

    class my_in_channel : in_obj_channel =
    object (self)
      inherit buffered_raw_in_channel (new my_raw_in_channel)
      inherit augment_raw_in_channel
      method input_line = self # enhanced_input_line
    end

enhanced_input_line is a private method defined by buffered_raw_in_channel that is intended to override the poor definition in augment_raw_in_channel when it is possible to do so. There is no corresponding problem with buffered output channels.

Making netchannels from file descriptors and sockets

The classes input_descr and output_descr create raw netchannels from file descriptors (Unix.file_descr). These netchannels are unbuffered, but you can apply the "construction kit" presented in the last section to add buffering if needed.

The class socket_descr has a special role. The idea is to create socket_descr objects for connected sockets. These objects are raw input and raw output netchannels at the same time (the socket is bidirectional). The input and output methods are unbuffered, and it is possible to serve both directions of the socket with only one object.
The special effect of such an object lies in how it is closed: if only one of close_in and close_out is called, the object merely shuts down the corresponding direction of the socket (the read or the write endpoint, respectively). Once both sides have been closed, the socket itself is closed.

FAQ