「48時間でSchemeを書こう」の版間の差分

削除された内容 追加された内容
→‎最初の一歩: 表の和訳以外訳出
→‎構文解析: 翻訳準備
159 行
# <code>getLine</code>はコンソールから一行読み込み、文字列として返すIOアクションです。名前を入力を促し、名前を読み、コマンドライン引数の代わりにそれを出力するようにプログラムを変更しなさい。
 
== Parsing構文解析 ==
 
=== Writing a Simple Parser ===
 
Now, let's try writing a very simple parser. We'll be using the [http://www.cs.uu.nl/~daan/download/parsec/parsec.html Parsec] library, which comes with [http://www.haskell.org/ghc GHC] (needs to install libghc6-parsec-dev on Ubuntu) but may need to be downloaded separately if you're using another compiler.
 
Start by adding this line to the import section:
 
import Text.ParserCombinators.Parsec hiding (spaces)
 
This makes the Parsec library functions available to us, except the <code>spaces</code> function, whose name conflicts with a function that we'll be defining later.
 
Now, we'll define a parser that recognizes one of the symbols allowed in Scheme identifiers:
 
symbol :: Parser Char
symbol = oneOf "!#$%&amp;|*+-/:&lt;=&gt;?@^_~"
 
This is another example of a monad: in this case, the "extra information" that is being hidden is all the info about position in the input stream, backtracking record, first and follow sets, etc. Parsec takes care of all of that for us. We need only use the Parsec library function [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#oneOf oneOf], and it'll recognize a single one of any of the characters in the string passed to it. Parsec provides a number of pre-built parsers: for example, [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#letter letter] and [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#digit digit] are library functions. And as you're about to see, you can compose primitive parsers into more sophisticated productions.
 
Let's define a function to call our parser and handle any possible errors:
 
readExpr :: String -&gt; String
readExpr input = case parse symbol "lisp" input of
Left err -&gt; "No match: " ++ show err
Right val -&gt; "Found value"
 
As you can see from the type signature, <code>readExpr</code> is a function (-&gt;) from a String to a String. We name the parameter <code>input</code>, and pass it, along with the <code>symbol</code> action we defined above and the name of the parser ("lisp"), to the Parsec function [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#parse parse].
 
<code>parse</code> can return either the parsed value or an error, so we need to handle the error case. Following typical Haskell convention, Parsec returns an [http://www.haskell.org/onlinereport/standard-prelude.html#$tEither Either] data type, using the <code>Left</code> constructor to indicate an error and the <code>Right</code> one for a normal value.
 
We use a <code>case...of</code> construction to match the result of <code>parse</code> against these alternatives. If we get a Left value (error), then we bind the error itself to <code>err</code> and return "No match" with the string representation of the error. If we get a Right value, we bind it to <code>val</code>, ignore it, and return the string "Found value".
 
The <code>case...of</code> construction is an example of pattern matching, which we will see in much greater detail [[../Evaluation, Part 1#Beginnings of an evaluator: Primitives|later on]].
 
Finally, we need to change our main function to call <code>readExpr</code> and print out the result (need to add <code>import System</code> in the beginning of the file now):
 
main :: IO ()
main = do args &lt;- getArgs
putStrLn <span class="changed_code">(readExpr (args !! 0))</span>
 
To compile and run this, you need to specify <code>-package parsec</code> on the command line, or else there will be link errors. For example:
 
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.1.hs listing3.1.hs]
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser $
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser a
No match: "lisp" (line 1, column 1):
unexpected "a"
 
=== Whitespace ===
 
Next, we'll add a series of improvements to our parser that'll let it recognize progressively more complicated expressions. The current parser chokes if there's whitespace preceding our symbol:
 
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " %"
No match: "lisp" (line 1, column 1):
unexpected " "
 
Let's fix that, so that we ignore whitespace.
 
First, lets define a parser that recognizes any number of whitespace characters. Incidentally, this is why we included the <code>hiding (spaces)</code> clause when we imported Parsec: there's already a function "[http://www.cs.uu.nl/~daan/download/parsec/parsec.html#spaces spaces]" in that library, but it doesn't quite do what we want it to. (For that matter, there's also a parser called [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#lexeme lexeme] that does exactly what we want, but we'll ignore that for pedagogical purposes.)
 
spaces :: Parser ()
spaces = skipMany1 space
 
Just as functions can be passed to functions, so can actions. Here we pass the Parser action [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#space space] to the Parser action [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#skipMany1 skipMany1], to get a Parser that will recognize one or more spaces.
 
Now, let's edit our parse function so that it uses this new parser. Changes are in red:
 
readExpr input = case parse <span style="color:red">(spaces &gt;&gt; symbol)</span> "lisp" input of
Left err -&gt; "No match: " ++ show err
Right val -&gt; "Found value"
 
We touched briefly on the &gt;&gt; ("bind") operator in lesson 2, where we mentioned that it was used behind the scenes to combine the lines of a do-block. Here, we use it explicitly to combine our whitespace and symbol parsers. However, bind has completely different semantics in the Parser and IO monads. In the Parser monad, bind means "Attempt to match the first parser, then attempt to match the second with the remaining input, and fail if either fails." In general, bind will have wildly different effects in different monads; it's intended as a general way to structure computations, and so needs to be general enough to accommodate all the different types of computations. Read the documentation for the monad to figure out precisely what it does.
 
Compile and run this code. Note that since we defined <code>spaces</code> in terms of <code>skipMany1</code>, it will no longer recognize a plain old single character. Instead you ''have to'' precede a symbol with some whitespace. We'll see how this is useful shortly:
 
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.2.hs listing3.2.hs]
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " %"
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser %
No match: "lisp" (line 1, column 1):
unexpected "%"
expecting space
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " abc"
No match: "lisp" (line 1, column 4):
unexpected "a"
expecting space
 
=== Return Values ===
 
Right now, the parser doesn't ''do'' much of anything - it just tells us whether a given string can be recognized or not. Generally, we want something more out of our parsers: we want them to convert the input into a data structure that we can traverse easily. In this section, we learn how to define a data type, and how to modify our parser so that it returns this data type.
 
First, we need to define a data type that can hold any Lisp value:
 
data LispVal = Atom String
| List [LispVal]
| DottedList [LispVal] LispVal
| Number Integer
| String String
| Bool Bool
 
This is an example of an ''algebraic data type''<nowiki>: it defines a set of possible values that a variable of type LispVal can hold. Each alternative (called a </nowiki>''constructor'' and separated by <code>|</code>) contains a tag for the constructor along with the type of data that that constructor can hold. In this example, a <code>LispVal</code> can be:
 
# An <code>Atom</code>, which stores a String naming the atom
# A <code>List</code>, which stores a list of other LispVals (Haskell lists are denoted by brackets); also called a ''proper'' list
# A <code>DottedList</code>, representing the Scheme form <code>(a b . c)</code>; also called an ''improper'' list. This stores a list of all elements but the last, and then stores the last element as another field
# A <code>Number</code>, containing a Haskell Integer
# A <code>String</code>, containing a Haskell String
# A <code>Bool</code>, containing a Haskell boolean value
 
Constructors and types have different namespaces, so you can have both a constructor named String and a type named String. Both types and constructor tags always begin with capital letters.
 
Next, let's add a few more parsing functions to create values of these types. A string is a double quote mark, followed by any number of non-quote characters, followed by a closing quote mark:
 
parseString :: Parser LispVal
parseString = do char '"'
x &lt;- many (noneOf "\"")
char '"'
return $ String x
 
We're back to using the do-notation instead of the &gt;&gt; operator. This is because we'll be retrieving the value of our parse (returned by [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#many many] ([http://www.cs.uu.nl/~daan/download/parsec/parsec.html#noneOf noneOf] "\"")) and manipulating it, interleaving some other parse operations in the meantime. In general, use &gt;&gt; if the actions don't return a value, &gt;&gt;= if you'll be immediately passing that value into the next action, and do-notation otherwise.
 
Once we've finished the parse and have the Haskell String returned from <code>many</code>, we apply the String constructor (from our LispVal data type) to turn it into a LispVal. Every constructor in an algebraic data type also acts like a function that turns its arguments into a value of its type. It also serves as a pattern that can be used in the left-hand side of a pattern-matching expression; we saw an example of this in [[#Writing a Simple Parser|Lesson 3.1]] when we matched our parser result against the two constructors in the <code>Either</code> data type.
 
We then apply the built-in function [http://www.haskell.org/onlinereport/standard-prelude.html#$tMonad return] to lift our LispVal into the Parser monad. Remember, each line of a do-block must have the same type, but the result of our String constructor is just a plain old LispVal. Return lets us wrap that up in a Parser action that consumes no input but returns it as the inner value. Thus, the whole parseString action will have type Parser LispVal.
 
The <code>$</code> operator is infix function application: it's the same as if we'd written <code>return (String x)</code>, but <code>$</code> is right-associative, letting us eliminate some parentheses. Since <code>$</code> is an operator, you can do anything with it that you'd normally do to a function: pass it around, partially apply it, etc. In this respect, it functions like the Lisp function [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.4 apply].
 
Now let's move on to Scheme variables. An [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-5.html#%_sec_2.1 atom] is a letter or symbol, followed by any number of letters, digits, or symbols:
 
parseAtom :: Parser LispVal
parseAtom = do first &lt;- letter &lt;|&gt; symbol
rest &lt;- many (letter &lt;|&gt; digit &lt;|&gt; symbol)
let atom = first:rest
return $ case atom of
"#t" -&gt; Bool True
"#f" -&gt; Bool False
_ -&gt; Atom atom
 
Here, we introduce another Parsec combinator, the choice operator [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#or &lt;|&gt;]. This tries the first parser, then if it fails, tries the second. If either succeeds, then it returns the value returned by that parser. The first parser must fail before it consumes any input: we'll see later how to implement backtracking.
 
Once we've read the first character and the rest of the atom, we need to put them together. The "let" statement defines a new variable <code>atom</code>. We use the list cons operator <code>:</code> for this. Instead of <code>:</code>, we could have used the concatenation operator <code>++</code> like this <code>[first]++rest</code>; recall that <code>first</code> is just a single character, so we convert it into a singleton list by putting brackets around it.
 
Then we use a case expression to determine which <code>LispVal</code> to create and return, matching against the literal strings for true and false. The underscore <code>_</code> alternative is a readability trick: case blocks continue until a <code>_</code> case (or fail any case which also causes the failure of the whole <code>case</code> expression), think of <code>_</code> as a ''wildcard''. So if the code falls through to the <code>_</code> case, it always matches, and returns the value of <code>atom</code>.
 
Finally, we create one more parser, for numbers. This shows one more way of dealing with monadic values:
 
parseNumber :: Parser LispVal
parseNumber = liftM (Number . read) $ many1 digit
 
It's easiest to read this backwards, since both function application (<code>$</code>) and function composition (<code>.</code>) associate to the right. The parsec combinator [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#many1 many1] matches one or more of its argument, so here we're matching one or more digits. We'd like to construct a number <code>LispVal</code> from the resulting string, but we have a few type mismatches. First, we use the built-in function [http://www.haskell.org/onlinereport/standard-prelude.html#$vread read] to convert that string into a number. Then we pass the result to <code>Number</code> to get a <code>LispVal</code>. The function composition operator <code>.</code> creates a function that applies its right argument and then passes the result to the left argument, so we use that to combine the two function applications.
 
Unfortunately, the result of <code>many1 digit</code> is actually a <code>Parser String</code>, so our combined <code>Number . read</code> still can't operate on it. We need a way to tell it to just operate on the value inside the monad, giving us back a <code>Parser LispVal</code>. The standard function <code>liftM</code> does exactly that, so we apply <code>liftM</code> to our <code>Number . read</code> function, and then apply the result of that to our parser.
 
We also have to import the Monad module up at the top of our program to get access to <code>liftM</code>:
 
import Control.Monad
 
This style of programming - relying heavily on function composition, function application, and passing functions to functions - is very common in Haskell code. It often lets you express very complicated algorithms in a single line, breaking down intermediate steps into other functions that can be combined in various ways. Unfortunately, it means that you often have to read Haskell code from right-to-left and keep careful track of the types. We'll be seeing many more examples throughout the rest of the tutorial, so hopefully you'll get pretty comfortable with it.
 
Let's create a parser that accepts either a string, a number, or an atom:
 
parseExpr :: Parser LispVal
parseExpr = parseAtom
&lt;|&gt; parseString
&lt;|&gt; parseNumber
 
And edit readExpr so it calls our new parser:
 
readExpr :: String -&gt; String
readExpr input = case parse <span style="color:red">parseExpr</span> "lisp" input of
Left err -&gt; "No match: " ++ show err
Right _ -&gt; "Found value"
 
Compile and run this code, and you'll notice that it accepts any number, string, or symbol, but not other strings:
 
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [.../code/listing3.3.hs listing3.3.hs]
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "\"this is a string\""
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser 25
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser symbol
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser (symbol)
bash: syntax error near unexpected token `symbol'
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(symbol)"
No match: "lisp" (line 1, column 1):
unexpected "("
expecting letter, "\"" or digit
 
練習問題
# Rewrite parseNumber using
## do-notation
## explicit sequencing with the [http://www.haskell.org/onlinereport/standard-prelude.html#tMonad &gt;&gt;=] operator
# Our strings aren't quite [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.3.5 R5RS compliant], because they don't support escaping of internal quotes within the string. Change parseString so that \" gives a literal quote character instead of terminating the string. You may want to replace <code>noneOf "\""</code> with a new parser action that accepts ''either'' a non-quote character ''or'' a backslash followed by a quote mark.
# Modify the previous exercise to support \n, \r, \t, \\, and any other desired escape characters
# Change parseNumber to support the [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.2.4 Scheme standard for different bases]. You may find the [http://www.haskell.org/onlinereport/numeric.html#sect14 readOct and readHex] functions useful.
# Add a Character constructor to LispVal, and create a parser for [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.3.4 character literals] as described in R5RS.
# Add a Float constructor to LispVal, and support R5RS syntax for [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.2.4 decimals]. The Haskell function [http://www.haskell.org/onlinereport/numeric.html#sect14 readFloat] may be useful.
# Add data types and parsers to support the [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.2.1 full numeric tower] of Scheme numeric types. Haskell has built-in types to represent many of these; check the [http://www.haskell.org/onlinereport/standard-prelude.html#$tNum Prelude]. For the others, you can define compound types that represent eg. a Rational as a numerator and denominator, or a Complex as a real and imaginary part (each itself a Real number).
 
=== Recursive Parsers: Adding lists, dotted lists, and quoted datums ===
 
Next, we add a few more parser actions to our interpreter. Start with the parenthesized lists that make Lisp famous:
 
parseList :: Parser LispVal
parseList = liftM List $ sepBy parseExpr spaces
 
This works analogously to <code>parseNumber</code>, first parsing a series of expressions separated by whitespace (<code>sepBy parseExpr spaces</code>) and then apply the List constructor to it within the Parser monad. Note too that we can pass <code>parseExpr</code> to [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#sepBy sepBy], even though it's an action we wrote ourselves.
 
The dotted-list parser is somewhat more complex, but still uses only concepts that we're already familiar with:
 
parseDottedList :: Parser LispVal
parseDottedList = do
head &lt;- endBy parseExpr spaces
tail &lt;- char '.' &gt;&gt; spaces &gt;&gt; parseExpr
return $ DottedList head tail
 
Note how we can sequence together a series of Parser actions with <code>&gt;&gt;</code> and then use the whole sequence on the right hand side of a do-statement. The expression <code>char '.' &gt;&gt; spaces</code> returns a <code>Parser ()</code>, then combining that with <code>parseExpr</code> gives a <code>Parser LispVal</code>, exactly the type we need for the do-block.
 
Next, let's add support for the single-quote syntactic sugar of Scheme:
 
parseQuoted :: Parser LispVal
parseQuoted = do
<nowiki>char '\''
x &lt;- parseExpr
return $ List [Atom "quote", x]</nowiki>
 
Most of this is fairly familiar stuff: it reads a single quote character, reads an expression and binds it to <code>x</code>, and then returns <code>(quote x)</code>, to use Scheme notation. The <code>Atom</code> constructor works like an ordinary function: you pass it the String you're encapsulating, and it gives you back a LispVal. You can do anything with this LispVal that you normally could, like put it in a list.
 
Finally, edit our definition of parseExpr to include our new parsers:
 
parseExpr :: Parser LispVal
parseExpr = parseAtom
&lt;|&gt; parseString
&lt;|&gt; parseNumber
<span style="color:red">&lt;|&gt; parseQuoted
&lt;|&gt; do char '('
x &lt;- try parseList &lt;|&gt; parseDottedList
char ')'
return x</span>
 
This illustrates one last feature of Parsec: backtracking. <code>parseList</code> and <code>parseDottedList</code> recognize identical strings up to the dot; this breaks the requirement that a choice alternative may not consume any input before failing. The [http://www.cs.uu.nl/~daan/download/parsec/parsec.html#try try] combinator attempts to run the specified parser, but if it fails, it backs up to the previous state. This lets you use it in a choice alternative without interfering with the other alternative.
 
Compile and run this code:
 
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.4.hs listing3.4.hs]
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a test)"
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a (nested) test)"
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a (dotted . list) test)"
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a '(quoted (dotted . list)) test)"
Found value
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a '(imbalanced parens)"
No match: "lisp" (line 1, column 24):
unexpected end of input
expecting space or ")"
 
Note that by referring to parseExpr within our parsers, we can nest them arbitrarily deep. Thus, we get a full Lisp reader with only a few definitions. That's the power of recursion.
 
練習問題
# Add support for the [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-7.html#%_sec_4.2.6 backquote] syntactic sugar: the Scheme standard details what it should expand into (quasiquote/unquote).
# Add support for [http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html#%_sec_6.3.6 vectors]. The Haskell representation is up to you: GHC does have an [http://www.haskell.org/ghc/docs/latest/html/libraries/array-0.3.0.0/Data-Array.html Array] data type, but it can be difficult to use. Strictly speaking, a vector should have constant-time indexing and updating, but destructive update in a purely functional language is difficult. You may have a better idea how to do this after the section on set!, later in this tutorial.
# Instead of using the try combinator, left-factor the grammar so that the common subsequence is its own parser. You should end up with a parser that matches a string of expressions, and one that matches either nothing or a dot and a single expressions. Combining the return values of these into either a List or a DottedList is left as a (somewhat tricky) exercise for the reader: you may want to break it out into another helper function.
 
== Evaluation, Part 1 ==
== Error Checking and Exceptions ==