The getopt program will fetch the options from the command line and check that the required options are present. I'll implement this in several different ways to demonstrate some programming techniques.
The goal is to recognise the following options.
-v, --verbose: This is optional.
--width width: This is required. It sets the width.
--height height: This is required. It sets the height.
-h: This prints a help message.
files: The remaining arguments are file names.
Our little program will print the file names. If the verbose option is given then it will also print the width and height. The usage will be:
    Usage: [-h] [-v|--verbose] [--width width] [--height height] files
This first version, getopt1.sml, is in a mostly-functional style. The deviation is in the use of an exception to abort the program with an error message.
The first part of the program has some type definitions for documentation.[1]
    (*  The options will be returned as a list of pairs
        of name and value. We need to use an option type for
        the value so that we can distinguish between a missing
        value and an empty value.
    *)
    type Option = string * string option

    (*  The result from the command line parsing will
        be a list of file names and a set of options.
    *)
    type CmdLine = (Option list) * (string list)

    (*  This exception will bomb with a usage message. *)
    exception Usage of string
I've defined Option to be a pair of a string for the name of the option and an optional string for its value. The name will be an internal canonical name. The CmdLine type is to describe the result from parse_cmdline, namely a list of options and a list of files.
I've defined an exception Usage which carries a message. I use this to abort the program. The exception propagates out of the main function, where a handler catches it and prints the message on stdErr. The handler then returns the failure code so that the program exits with an exit code of 1.
The next section of the program scans the arguments.
    fun parse_cmdline argv : CmdLine =
    let
        fun loop [] opts = (opts, [])       (* no more args *)

        |   loop ("-h"::rest) opts =
                loop rest (("help", NONE) :: opts)

        |   loop ("-v"::rest) opts =
                loop rest (("verbose", NONE) :: opts)

        |   loop ("--verbose"::rest) opts =
                loop rest (("verbose", NONE) :: opts)

        |   loop ("--width"::rest) opts =
                get_value "width" rest opts

        |   loop ("--height"::rest) opts =
                get_value "height" rest opts

        |   loop (arg::rest) opts =
        (
            if String.sub(arg, 0) = #"-"
            then
                raise Usage (concat["The option ", arg,
                                    " is unrecognised."])
            else
                (opts, arg::rest)           (* the final result *)
        )

        and get_value name [] opts =
        (
            raise Usage (concat[
                "The value for the option ", name, " is missing."])
        )

        |   get_value name (v::rest) opts =
        (
            loop rest ((name, SOME v) :: opts)
        )
    in
        loop argv []
    end
The parse_cmdline function scans the arguments in an inner loop. I've got a type constraint on the expression pattern parse_cmdline argv to indicate that its resulting value is of the type CmdLine. Although this is not strictly necessary it aids readability. It can also make it easier to find the location of type errors by putting in explicit points where the type is known. (See Appendix B for a discussion on type errors). If you wanted a type constraint on the argv argument then you would need to put it in parentheses i.e. (argv: string list).
I've used literal strings in the binding patterns for conciseness. So for example the second variant of the loop function says that if the argument list starts with a "-h" then continue looping over the rest of the arguments with the options table in opts augmented with the pair ("help", NONE).
The first variant catches the case of running out of arguments while scanning for options. In this case I return the options that I have and the list of files is empty.
To handle an option which requires a value I've used a separate function, get_value. It looks at the first of the rest of the arguments. If the rest of the arguments are empty then the value is missing. If present then I add it to the opts table and continue the loop. Note that with an option type, a value that is present is tagged with the SOME data constructor. This algorithm will treat the case of --width --height as a width option having the value --height.
The get_value function must be joined to the loop function with the and keyword to make the forward reference to get_value in loop legal. The two are mutually recursive functions.
The last variant of the loop function catches all the arguments that don't match any of the preceding option strings. Remember that in cases the variants are matched in order from first to last. An identifier in a binding pattern, here arg, will match any value. I need to check if the value starts with a hyphen in which case it is an unrecognised option. I've used the String.sub function which subscripts a string to get a character. The first character is at index 0. The #"-" notation is the hyphen character. If the argument does not start with a hyphen then I return the final result which is the options table and the rest of the arguments, not forgetting that arg is one of them.
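One caveat with String.sub(arg, 0) is that it raises the Subscript exception when arg is the empty string, which can happen if the shell passes "" as an argument. Here is a minimal sketch of a test that avoids this, using String.isPrefix from the Basis library. The helper name is_option is hypothetical and is not part of getopt1.sml.

```sml
(* Hypothetical helper: true if the argument looks like an option.
   String.isPrefix returns false for the empty string, whereas
   String.sub(arg, 0) would raise Subscript. *)
fun is_option (arg: string) : bool = String.isPrefix "-" arg

val r1 = is_option "--width"    (* an option *)
val r2 = is_option "file.txt"   (* a file name *)
val r3 = is_option ""           (* empty argument, no exception *)
```

With this helper the last variant of loop could test is_option arg instead of subscripting the string, at the cost of treating an empty argument as a file name.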
I've used parentheses to bracket the body of each variant although in this code they are redundant. I find that I get fewer syntax surprises this way as the code gets more complex. Imagine a body containing a case expression!
The next section of the program has some utility functions to deal with option tables.
    and find_option opts name : (string option) option =
    (
        case List.find (fn (n, v) => n = name) opts of
          NONE        => NONE
        | SOME (n, v) => SOME v
    )

    and has_option opts name = (find_option opts name) <> NONE

    and require_option opts name and_value : string =
    (
        case find_option opts name of
          NONE => raise Usage (concat[
                    "The option '", name, "' is missing."])

        | SOME NONE =>      (* found but has no value *)
        (
            if and_value
            then
                raise Usage (concat[
                    "The option '", name, "' is missing a value."])
            else
                ""
        )

        | SOME (SOME v) => v        (* found and has a value *)
    )
The find_option searches the table for a given name. I've used the List.find function which finds the first entry in the option list that satisfies the predicate function which is the first argument to List.find. Remember that each member of the list is a pair of name and value. So the argument to the predicate is a pair, matching with (n, v). The predicate tests if the name field is the same as the supplied name.
The option tables have a subtle property. I built them in reverse by pushing new options onto the front of the list. So if there are duplicate options then the first one found will be the last on the command line. I either should not rely on this or I should document it loudly.
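The effect of building the table in reverse can be seen in a small stand-alone sketch. The list below is what I would expect loop to build for the hypothetical command line --width 10 --width 20. The find_option here is a copy of the search above without the type constraint, so that the fragment can be run on its own.

```sml
(* Options are pushed onto the front of the list, so the
   later --width 20 comes first in the table. *)
val opts : (string * string option) list =
    [("width", SOME "20"), ("width", SOME "10")]

(* Same search as find_option: the first match in the list wins. *)
fun find_option opts name =
    case List.find (fn (n, _) => n = name) opts of
      NONE        => NONE
    | SOME (_, v) => SOME v

(* The value found is the one given last on the command line. *)
val found = find_option opts "width"
```

Here found is SOME (SOME "20"), so a later duplicate silently overrides an earlier one.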
The result from the List.find will be of the type Option option. That is it will be NONE if the option was not found or else some name-value pair, SOME (n, v). I've decided that I only want to return the value. But I have to indicate if the value was found or not in the options table so I wrap it in another level of option.
The has_option just tests if find_option returns a non-NONE value. The equality and inequality (<>) operators are available for the type T option if they are available for a type T.
The require_option function checks that an option is present in the table. If the and_value flag is true then I also require it to have a value. If it has a value then I return it. Because every if expression must have both a then and an else part I can't avoid covering the case of an option not having a value and not needing to, even though I don't use this case in the program. Better safe than sorry.
The final part of the program is the main function. It should be fairly straightforward.
    fun main(arg0, argv) =
    let
        val (opts, files) = parse_cmdline argv

        val width  = require_option opts "width"  true
        val height = require_option opts "height" true

        fun show_stuff() =
        (
            print "The files are";
            app (fn f => (print " "; print f)) files;
            print ".\n";

            if has_option opts "verbose"
            then
                print(concat[
                    "The width is ",  width,  ".\n",
                    "The height is ", height, ".\n"
                    ])
            else
                ()
        )
    in
        if has_option opts "help"
        then
            print "some helpful blurb\n"
        else
            show_stuff();

        OS.Process.success
    end
    handle Usage msg =>
    (
        TextIO.output(TextIO.stdErr, concat[msg,
            "\nUsage: [-h] [-v|--verbose] [--width width]",
            " [--height height] files\n"]);
        OS.Process.failure
    )
Observe again how all if expressions must have both a then and an else part and each part must return a value. Since the print function has unit as a return value then the else part must too. The () is the notation for the one and only value of the unit type. You can interpret it as "do nothing".
In a real program of course I would call some function to do the work of the program and pass it the options and the file names. But the rest of the code of the program may be quite large and it will only refer to the options in a few places scattered throughout the program. It would be awkward to pass the options table all the way through the program just to be read in a few places. The program would quickly become difficult to read. Instead I will cheat and put the options into a global table. Since they are used read-only and are set before the body of the program is run this won't break referential transparency. The program will still be as good as pure.
The way to put values into global variables in SML is to use global reference values. Reference values emulate the variables of imperative programs. But we don't have to get our hands dirty dealing with them. The SML/NJ utility library includes a hash table module that uses reference values internally to store its contents imperatively. In the next section I show how to set up one of those.
The SML/NJ utility library defines a generic hash table using imperative storage so that you can update its contents. (See Chapter 5). The table is generic over the key type. You need to supply a specification of the key type and its properties to make an instance of the table type. Then you can create values of the table type. All of the values have the same key type but each can have a different content type, since the table is polymorphic in this type. But all entries in a particular table have the same content type. SML does not do dynamic typing or subtyping.
The generic hash table is defined by this functor from the hash-table-fn.sml file in the SML/NJ library.
    functor HashTableFn (Key : HASH_KEY) : MONO_HASH_TABLE
The functor takes a Key structure as an argument and produces a hash table structure. The HASH_KEY signature describes what the Key structure must tell the functor. Observe that in signatures, functions are described as a value of a function type.
    signature HASH_KEY =
    sig
        type hash_key

        val hashVal : hash_key -> word
          (* Compute an unsigned integer from a hash key. *)

        val sameKey : (hash_key * hash_key) -> bool
          (* Return true if two keys are the same.
           * NOTE: if sameKey(h1, h2), then it must be the
           * case that (hashVal h1 = hashVal h2).
           *)
    end (* HASH_KEY *)
For the option table I want strings for keys. So I've defined a string table key with the following structure. The hash function comes from another library module in the hash-string.sml file.
    structure STRT_key =
    struct
        type hash_key = string
        val hashVal = HashString.hashString
        fun sameKey (s1, s2) = (s1 = s2)
    end
Now I can assemble these to make a module I call STRT that implements a hash table from strings to some content type. I've also defined a useful exception that will be used later for when table lookups fail.
    structure STRT = HashTableFn(STRT_key)

    exception NotFound
This structure conforms to the MONO_HASH_TABLE signature. Here MONO means it is monomorphic in the key type. This signature describes all of the types and values (including functions) that the hash table structure makes public. Here is a part of this signature containing the features that I use often.
    signature MONO_HASH_TABLE =
    sig
        structure Key : HASH_KEY

        type 'a hash_table

        val mkTable: (int * exn) -> 'a hash_table
          (* Create a new table; the int is a size hint
           * and the exception is to be raised by find.
           *)

        val insert: 'a hash_table -> (Key.hash_key * 'a) -> unit
          (* Insert an item. If the key already has an item
           * associated with it, then the old item is
           * discarded.
           *)

        val lookup: 'a hash_table -> Key.hash_key -> 'a
          (* Find an item, the table's exception is raised
           * if the item doesn't exist
           *)

        val find: 'a hash_table -> Key.hash_key -> 'a option
          (* Look for an item, return NONE if the item
           * doesn't exist
           *)

        val listItemsi: 'a hash_table -> (Key.hash_key * 'a) list
          (* Return a list of the items (and their keys) in
           * the table
           *)
    end
This shows that the table structure exports a copy of the Key structure that defined it. This is good practice as it can be useful to get access to the hash function of the table.
So now I have the type 'a STRT.hash_table which maps from string keys to some content type represented by the type variable 'a. I can create a table from strings to strings like this.
    type OptionTable = string STRT.hash_table

    val option_tbl: OptionTable = STRT.mkTable(101, NotFound)
The type constraint on the table value settles the type of the table immediately to save the compiler and the reader having to figure it out.
With these hash table tools I can go on to write a neater getopt program, called getopt2.sml. I'm in the habit of putting useful things like the string table structure into a common module which can be used throughout a project. I put global variables like the option table into their own separate module. These would normally go into separate files. In the source code for this program I've put them all in the same file. Here is the common module which exports all of its declarations.
    structure Common =
    struct

    (*-------------------------------------------------*)
    (*  A hash table with string keys. *)

        structure STRT_key =
        struct
            type hash_key = string
            val hashVal = HashString.hashString
            fun sameKey (s1, s2) = (s1 = s2)
        end

        structure STRT = HashTableFn(STRT_key)

        exception NotFound

    (*-------------------------------------------------*)

    end
Then I define a signature for the global module to constrain what it exports. It's got a basic API for setting and testing options. In keeping with the previous getopt program an option value is an optional string so that I can tell the difference between a missing option value and an empty option value.
    signature GLOBAL =
    sig
        type Option = string option

        (*  Add an option to the table silently overriding
            an existing entry.
        *)
        val addOption: (string * Option) -> unit

        (*  Test if an option is in the table. *)
        val hasOption: string -> bool

        (*  Get the value of an option if it exists. *)
        val getOption: string -> Option option
    end
Next I define the global module. The open declaration imports everything from Common and makes its names directly visible. Note that there must be a definition for every name declared in the GLOBAL signature so the Option type must be defined again.
    structure Global: GLOBAL =
    struct
        open Common

    (*-------------------------------------------------*)
    (*  The option table. *)

        type Option = string option

        type OptionTable = Option STRT.hash_table

        val option_tbl: OptionTable = STRT.mkTable(20, NotFound)

        fun addOption arg = STRT.insert option_tbl arg

        fun hasOption name = STRT.find option_tbl name <> NONE

        fun getOption name = STRT.find option_tbl name

    (*-------------------------------------------------*)

    end
The option table is a value in the structure. This value will be created when the module is compiled into the heap as I described in the section called Assembling the Hello World Program. The value comes from the mkTable function. It will end up in the exported heap file. When defining addOption I made the argument type match the argument to the STRT.insert function. This avoids unpacking and repacking the contents as it passes from function to function.
I could have abbreviated the definitions of addOption and getOption further by taking advantage of currying but I think that this obscures the code a bit for no real gain.
    val addOption = STRT.insert option_tbl

    val getOption = STRT.find option_tbl
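To show the two styles side by side without needing the hash table library, here is a small sketch using only the Basis library. The lookup function and the sizes list are hypothetical stand-ins for STRT.find and the option table.

```sml
(* A curried function of two arguments, modelled on STRT.find. *)
fun lookup (tbl: (string * int) list) (name: string) : int option =
    case List.find (fn (n, _) => n = name) tbl of
      NONE        => NONE
    | SOME (_, v) => SOME v

val sizes = [("width", 640), ("height", 480)]

(* Long form: the argument is named explicitly. *)
fun getSize1 name = lookup sizes name

(* Short form: partial application of the curried function. *)
val getSize2 = lookup sizes
```

Both getSize1 and getSize2 have the type string -> int option and behave the same way. The short form works here because the table has a fully known type; the choice between them is a matter of readability.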
Finally the main program is rewritten to eliminate all mention of a table of options.
    structure Main=
    struct

        (*  This exception will bomb with a usage message. *)
        exception Usage of string

        fun parse_cmdline argv : string list =
        let
            fun loop [] = []                (* no more arguments *)

            |   loop ("-h"::rest)        = add ("help", NONE) rest
            |   loop ("-v"::rest)        = add ("verbose", NONE) rest
            |   loop ("--verbose"::rest) = add ("verbose", NONE) rest
            |   loop ("--width"::rest)   = get_value "width" rest
            |   loop ("--height"::rest)  = get_value "height" rest

            |   loop (arg::rest) =
            (
                if String.sub(arg, 0) = #"-"
                then
                    raise Usage (concat[
                        "The option ", arg, " is unrecognised."])
                else
                    arg::rest               (* the final result *)
            )

            and get_value name [] =
            (
                raise Usage (concat["The value for the option ",
                                    name, " is missing."])
            )

            |   get_value name (v::rest) = add (name, SOME v) rest

            and add pair rest =
            (
                Global.addOption pair;
                loop rest
            )
        in
            loop argv
        end

        fun require_option name and_value : string =
        (
            case Global.getOption name of
              NONE => raise Usage (concat[
                        "The option '", name, "' is missing."])

            | SOME NONE =>      (* found but no value *)
            (
                if and_value
                then
                    raise Usage (concat["The option '", name,
                                        "' is missing a value."])
                else
                    ""
            )

            | SOME (SOME v) => v        (* found with a value *)
        )

        fun main(arg0, argv) =
        let
            val files = parse_cmdline argv

            val width  = require_option "width"  true
            val height = require_option "height" true

            fun show_stuff() =
            (
                print "The files are";
                app (fn f => (print " "; print f)) files;
                print ".\n";

                if Global.hasOption "verbose"
                then
                    print(concat[
                        "The width is ",  width,  ".\n",
                        "The height is ", height, ".\n"
                        ])
                else
                    ()
            )
        in
            if Global.hasOption "help"
            then
                print "some helpful blurb\n"
            else
                show_stuff();

            OS.Process.success
        end
        handle Usage msg =>
        (
            TextIO.output(TextIO.stdErr, concat[msg,
                "\nUsage: [-h] [-v|--verbose] [--width width]",
                " [--height height] files\n"]);
            OS.Process.failure
        )

        val _ = SMLofNJ.exportFn("getopt2", main)
    end
Since I am now using modules from the SML/NJ utility library I must mention the library in the CM file for the program. Here is the getopt2.cm file. It has the path to a CM file for the library which was created when SML/NJ was installed.
    group is
        getopt2.sml
        /src/smlnj/current/lib/smlnj-lib.cm
An alternative to having the table in the heap is to have it built on demand when the program runs. A convenient way to do this in SML/NJ is described in the section called Lazy Suspensions in Chapter 4.
The getopt programs I've done so far implement a simple command line syntax. This next one, called getopt3.sml, does the full GNU-style syntax with short and long options etc. It will report its usage as:
    Usage: getopt
      -v  --verbose        Select verbose output
          --width=width    The width in pixels
          --height=height  The height in pixels
      -h  --help           Show this message.
I've written this program using the GetOpt structure in the SML/NJ utility library. This structure is rather under-documented and not that easy to figure out. You can find its signature in the getopt-sig.sml file. When you use this module to parse the command line you get back a list of options and files similar to my first getopt program. But I will then transfer them to a global option table as in the second getopt program.
I start by building an Option module that contains the command line parsing and can deliver the values of the options imperatively. The API for this module is specified in the OPTION signature. I've put in an alias of G for the GetOpt structure to save on typing.
    structure Option: OPTION =
    struct
        structure G = GetOpt
The interface to the GetOpt structure revolves around a single type to represent all of the possible options. This should be a datatype to be useful. I start by defining the Option type. I keep the width and height as strings for simplicity but in a real program you would probably use integers with Int.fromString to do the conversion.
        (*  This represents an option found on the command line. *)
        datatype Option =
                Verbose
            |   Help
            |   Width  of string
            |   Height of string
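As a sketch of the Int.fromString conversion mentioned above, here is one way to turn an option value into a positive integer. The helper name toPixels and the rule that the value must be positive are my assumptions and are not part of getopt3.sml.

```sml
(* Hypothetical helper: convert an option value to a positive int.
   Int.fromString returns NONE if the string does not start with
   an integer. Note that it only reads a leading integer, so
   trailing characters after the digits are ignored. *)
fun toPixels (s: string) : int option =
    case Int.fromString s of
      SOME n => if n > 0 then SOME n else NONE
    | NONE   => NONE

val good = toPixels "640"
val bad  = toPixels "abc"
```

A real program would probably raise Usage when the result is NONE so that the user gets a proper error message.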
An option is described by the following record type in GetOpt.
    type 'a opt_descr = {
        short : string,
        long  : string list,
        desc  : 'a arg_descr,
        help  : string
      }
      (* Description of a single option *)
The short field contains the single letter version of the option. If you have more than one letter then they are treated as synonyms. The long field contains the long version of the option as a list of strings. Again if you have more than one then they are treated as synonyms.
The desc field describes properties of the option's value and how to map it to the representation type (my Option). The value of this field is of the following datatype from GetOpt.
    datatype 'a arg_descr
      = NoArg  of unit -> 'a
      | ReqArg of (string -> 'a) * string
      | OptArg of (string option -> 'a) * string
The 'a type variable is a place holder for the representation type. If the option takes no value then supply NoArg. You must include with it a function that returns a representation value for the option. If the option requires a value then supply ReqArg along with a function to convert the value to the representation type and a description of the value for the usage message. If the option's value is optional then supply OptArg along with a conversion function and a description as for ReqArg. Note that for OptArg the conversion function is passed a string option type to tell whether the value is available or not.
Here is my code for part of the option description list.
        fun NoArg opt = G.NoArg (fn () => opt)

        fun ReqArg opt descr = G.ReqArg (opt, descr)

        val options: (Option G.opt_descr) list = [
            {short = "v", long = ["verbose"],
             desc  = NoArg Verbose,
             help  = "Select verbose output"
            },

            {short = "", long = ["width"],
             desc  = ReqArg Width "width",
             help  = "The width in pixels"
            },
I've defined two helper functions to make it easier to write the option descriptions. My NoArg function takes a value of the representation type and wraps it into a conversion function for the GetOpt.NoArg data constructor.
My ReqArg does a similar thing but here the first argument opt is the conversion function. The data constructors Width and Height in the Option type can be used as functions to construct values of type Option from the types that they tag. For example the Width data constructor behaves as a function with the type string -> Option which is just the type needed for GetOpt.ReqArg.
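The point about data constructors behaving as functions can be checked with a small stand-alone sketch. It repeats the Option datatype so that it can be run on its own; the names mk and widths are just for illustration.

```sml
datatype Option =
        Verbose
    |   Help
    |   Width  of string
    |   Height of string

(* Width can be bound like any function of type string -> Option. *)
val mk : string -> Option = Width

(* It can also be passed to a higher-order function such as map. *)
val widths = map Width ["10", "20"]
```

Constructors without an argument, such as Verbose, are plain values of type Option, which is why my NoArg helper has to wrap them in a function.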
The command line is parsed using the GetOpt.getOpt function. Its signature is
    datatype 'a arg_order
      = RequireOrder
      | Permute
      | ReturnInOrder of string -> 'a
      (* What to do with options following non-options:
       * RequireOrder: no processing after first non-option
       * Permute: freely intersperse options and non-options
       * ReturnInOrder: wrap non-options into options
       *)

    val getOpt : {
        argOrder : 'a arg_order,
        options  : 'a opt_descr list,
        errFn    : string -> unit
      } -> string list -> ('a list * string list)
      (* takes as argument an arg_order to specify the
       * non-options handling, a list of option descriptions
       * and a command line containing the options and
       * arguments, and returns a list of (options,
       * non-options)
       *)
The first argument is a record of details about the options. I'll just use RequireOrder for the ordering control. The options field is my list of option descriptions. For an error function I just need something to print a string to stdErr.
        fun toErr msg = TextIO.output(TextIO.stdErr, msg)
The second argument is the argv list. The result is a list of representation values and the remaining arguments, the file names. Here is my code to parse the command line.
        val opt_tbl: (Option list) ref = ref []

        fun parseCmdLine argv =
        let
            val (opts, files) =
                G.getOpt {
                    argOrder = G.RequireOrder,
                    options  = options,
                    errFn    = toErr
                    } argv
        in
            opt_tbl := opts;
            files
        end
When I get back the list of options I assign it to the imperative variable opt_tbl. The variable must have an initial value which is constructed by the ref data constructor from an empty list.
Then I can write some accessor functions to get option information from the table.
        fun hasVerbose() =
        (
            List.exists (fn opt => opt = Verbose) (!opt_tbl)
        )

        fun hasHelp() =
        (
            List.exists (fn opt => opt = Help) (!opt_tbl)
        )

        fun getWidth() =
        let
            val opt_width = List.find
                        (fn Width _ => true | _ => false)
                        (!opt_tbl)
        in
            case opt_width of
              NONE           => NONE
            | SOME(Width w)  => SOME w
            | _              => raise Fail "Option,getWidth"
        end
The ! operator dereferences the imperative variable. It is like the * operator in C. The operator precedence rules require the parentheses around the dereference. I've used List.exists to check for the presence of the simple options in the table.
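The three operations on reference values can be tried in isolation. This is a minimal sketch with a hypothetical variable tbl standing in for opt_tbl.

```sml
(* A reference cell holding a list of strings, initially empty. *)
val tbl : string list ref = ref []

(* Read the contents with the ! operator. *)
val v0 = !tbl

(* Assignment with := replaces the contents and returns unit. *)
val _ = tbl := ["verbose"]

(* Reading again sees the new contents. The parentheses are
   needed when the dereference is an argument to a function. *)
val v1 = !tbl
val n  = length (!tbl)
```

After this runs v0 is the empty list, v1 is ["verbose"] and n is 1.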
To get the width value out I need List.find to return the entry from the list. The predicate needs some more elaborate binding patterns in order to recognise the Width tag and ignore the string value with an underscore. The case expression must cover all possibilities or else you will get messy warnings from the compiler which should be avoided at all costs or some day you will miss a genuine error among the warnings. In this example I need to cover the find returning a non-Width entry even though that is impossible for the predicate I've used. The Fail exception is a built-in exception in the Basis library that you can use to signal an internal fatal error. Use it only for impossible conditions like this. The GetOpt implementation also uses it for impossible conditions.
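An alternative that avoids the impossible case altogether is to walk the list directly instead of using List.find. This is only a sketch of a different technique, not the code in getopt3.sml. The function name firstWidth is hypothetical and the Option datatype is repeated so that the fragment stands alone.

```sml
datatype Option =
        Verbose
    |   Help
    |   Width  of string
    |   Height of string

(* Return the value of the first Width entry, if any. Every
   clause is reachable so no catch-all Fail case is needed. *)
fun firstWidth []             = NONE
|   firstWidth (Width w :: _) = SOME w
|   firstWidth (_ :: rest)    = firstWidth rest

val w1 = firstWidth [Verbose, Width "10", Width "20"]
val w2 = firstWidth [Help]
```

The trade-off is that each get* function then carries its own little loop rather than reusing a library function.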
Finally I rewrite the require_option function to check that the option was supplied on the command line. It works for all of the get* functions.
    fun require_option func name : string =
    (
        case func() of
          NONE => raise Usage (concat[
                    "The option '", name, "' is missing."])

        | SOME v => v
    )
Then my main function becomes:
    fun main(arg0, argv) =
    let
        val files = Option.parseCmdLine argv

        val width  = require_option Option.getWidth  "width"
        val height = require_option Option.getHeight "height"

        fun show_stuff() =
        (
            print "The files are";
            app (fn f => (print " "; print f)) files;
            print ".\n";

            if Option.hasVerbose()
            then
                print(concat[
                    "The width is ",  width,  ".\n",
                    "The height is ", height, ".\n"
                    ])
            else
                ()
        )
    in
        if Option.hasHelp()
        then
            print "some helpful blurb\n"
        else
            show_stuff();

        OS.Process.success
    end
    handle Usage msg =>
    (
        toErr msg; toErr "\n";
        toErr(Option.usage()); toErr "\n";
        OS.Process.failure
    )
[1] My naming convention dresses up public names in mixed case and uses lower case with underscores for private names. Types start with an uppercase.