Simpler is Better

February 26, 2010

Still double posting from the new blog.

I was in the middle of writing a post about non-recursive enumerators (see my last post) when I realized it was too much of an uphill battle. While they gave the promise of uniting both request and response bodies under a single interface that allowed easy generation of lazy bytestrings, they were going to be too complicated.

So I’ve broken down and admitted that I will have to have two separate interfaces for these things. It makes sense after all: the requirements for a server spitting out a response body are quite different than an application reading a request body.

So I think it’s safe to declare the winner on the response body side to be the recursive enumerator (henceforth known as enumerator). In addition, to deal with a number of compile-time issues, I’ve added a newtype, so that the definition of Enumerator is:

newtype Enumerator = Enumerator { runEnumerator :: forall a.
              (a -> B.ByteString -> IO (Either a a))
                 -> a
                 -> IO (Either a a)
}

I know it looks scary, but it’s not really that bad: the first argument is an iteratee and the second is the initial seed. Don’t worry, the wai repository has some good examples of how to use it in the Network.Wai.Enumerator module. You can also check out the wai-extra repository.
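To make the shape of that newtype a bit more concrete, here is a minimal sketch. The helpers fromChunks, countBytes, collectChunks and exampleBody are hypothetical names made up for illustration and are not part of WAI. It shows what the forall buys: one Enumerator value can be run with iteratees that use different seed types.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- Same definition as in the post.
newtype Enumerator = Enumerator { runEnumerator :: forall a.
              (a -> B.ByteString -> IO (Either a a))
                 -> a
                 -> IO (Either a a)
}

-- Hypothetical helper: an enumerator that feeds a fixed list of chunks.
fromChunks :: [B.ByteString] -> Enumerator
fromChunks chunks = Enumerator (go chunks)
  where
    go :: [B.ByteString]
       -> (b -> B.ByteString -> IO (Either b b))
       -> b
       -> IO (Either b b)
    go [] _ seed = return (Right seed)        -- all input consumed
    go (c:cs) iter seed = do
      res <- iter seed c
      case res of
        Left seed'  -> return (Left seed')    -- iteratee asked to stop
        Right seed' -> go cs iter seed'       -- keep going with the new seed

-- An iteratee whose seed is an Int: total number of bytes seen.
countBytes :: Enumerator -> IO Int
countBytes e = either id id <$> runEnumerator e step 0
  where step n chunk = return (Right (n + B.length chunk))

-- An iteratee whose seed is a list: every chunk, in order.
collectChunks :: Enumerator -> IO [B.ByteString]
collectChunks e = reverse . either id id <$> runEnumerator e step []
  where step acc chunk = return (Right (chunk : acc))

exampleBody :: Enumerator
exampleBody = fromChunks (map B8.pack ["hello, ", "wai"])
```

Both countBytes and collectChunks can be applied to the same exampleBody, which would not typecheck if the seed type were fixed when the enumerator was built.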

Without rehashing the discussion from the last post, I mentioned what I called a “Source”, but complained that it required the use of a MVar. I also alluded to an alternate definition that did away with the MVar, instead allowing explicit state passing. That’s what’s currently in the WAI. The definition is:

data Source = forall a. Source a (a -> IO (Maybe (B.ByteString, a)))

The Source constructor has two pieces: the “a” and that ugly function. The “a” is the initial state of the Source. One example (used by both SimpleServer and CGI in wai-extra) for that “a” is the value of the Content-Length header.

The second piece (the ugly function) takes some state, and then gives you the next piece of the request body and a new state, if they’re available. Back to the previous example, if given a value of 0, it knows that the request body has been entirely consumed and returns Nothing. Otherwise, it takes another chunk off the request body, and returns that chunk with the remaining length.
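Here is a rough sketch of that Content-Length example. It is an assumption-heavy toy: instead of a socket it reads from an in-memory bytestring, so the state is a pair of (bytes remaining, unread input), and contentLengthSource, drain and exampleSource are hypothetical names, not functions from wai or wai-extra.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- Same definition as in the post.
data Source = forall a. Source a (a -> IO (Maybe (B.ByteString, a)))

-- Hypothetical: a Source over an in-memory body. The state is
-- (bytes remaining according to Content-Length, unread input).
contentLengthSource :: Int -> Int -> B.ByteString -> Source
contentLengthSource chunkSize contentLength input =
    Source (contentLength, input) next
  where
    next :: (Int, B.ByteString)
         -> IO (Maybe (B.ByteString, (Int, B.ByteString)))
    next (remaining, unread)
      -- Nothing left to read: the body has been entirely consumed.
      | remaining <= 0 || B.null unread = return Nothing
      | otherwise = do
          let size          = min (max 1 chunkSize) remaining
              (chunk, rest) = B.splitAt size unread
          return (Just (chunk, (remaining - B.length chunk, rest)))

-- The application side: keep asking for chunks, threading the state through.
drain :: Source -> IO [B.ByteString]
drain (Source start next) = go start
  where
    go state = do
      step <- next state
      case step of
        Nothing              -> return []
        Just (chunk, state') -> (chunk :) <$> go state'

-- Content-Length is 10, so the trailing "d!" is never handed out.
exampleSource :: Source
exampleSource = contentLengthSource 4 10 (B8.pack "hello world!")
```

Note that the application never needs to know what the state type is; it just passes back whatever it was given.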

I think this is by far the simplest approach to program to. I was reluctant to introduce it since it involves two different interfaces, but at this point I see no better alternative. If anyone is actually interested in why I’m rejecting non-recursive enumerators, feel free to ask.

At this point, I think the WAI is ready to be released. I’ll wait a week for comments, and then put it on Hackage.

Four HTTP Request Body Interfaces (repost)

February 21, 2010

Looks like Planet Haskell is still linking to my old blog, so here’s a repost from the new blog:

Sorry for the long delay since my last post, I’ve actually been working on a project recently. It should be released last week, but since it’s a web application programmed in Haskell, it’s given me a chance to do some real-world testing of WAI, wai-extra (my collection of handlers and middlewares) and Yesod (my web framework). None of these have been released yet, though that date is fast approaching.

Anyway, back to the topic at hand: request body interface. I’m going to skip over the response body for now because, frankly, I think it’s less controversial: enumerators seem to be a good fit. What flavor of enumerator could be a good question, but I’d rather figure out what works best on the request side and then choose something that matches nicely.

I’ve been evaluating the choices in order to decide what to use in the WAI. In order to get a good comparison of the options, let’s start off by stating our goals:

  • Performant. The main goal of the WAI is not user-friendliness, but to be the most efficient abstraction over different servers possible.
  • Safe. You’ll see below some examples of being unsafe.
  • Deterministic. We want to make sure that we are never forced to use more than a certain amount of memory.
  • Early termination. We shouldn’t be forced to read the entire contents of a long body, as this could open up DoS attacks.
  • Simple. Although user-friendliness isn’t the first goal, it’s still something to consider.
  • Convertible. In particular, many people will be most comfortable (application-side) using cursors and lazy bytestrings, so we’d like an interface that can be converted to those.

One other point: we’re not going to bother considering anything but bytestrings here. I think the reasons for this are obvious.

Lazy bytestring

This is the approach currently used by Hack, which is pretty well used and accepted by the community (including myself).

Pros

  • Allows both the server and client to write simple code.
  • Lots of tools to support it in standard libraries.
  • Mostly space efficient.

Cons

  • I said mostly space efficient, because you only get the space efficiency if you use lazy I/O. Lazy I/O is also known as unsafeInterleaveIO. Remember that concern about safety I mentioned above? This is it.
  • Besides that, lazy I/O is non-deterministic.

In fact, avoiding lazy I/O is the main impetus for writing the WAI. I don’t consider this a possible solution.

Source

The inspiration for this approach is- frankly- every imperative IO library on the planet. Think of Handle: you have functions to open the handle, close it, test if there’s more data (isEOF) and to get more data. In our case, there’s no need for the first two (the server performs them before and after calling the application, respectively), so we can actually get away with this definition:

type Source = IO (Maybe ByteString) -- a strict bytestring

Each time you call Source, it will return the next bytestring in the request, until the end, where a Nothing is returned.
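As a minimal sketch of what this looks like from both sides, assume the server already has the body as a list of chunks. The names mkSource, consume and exampleChunks are hypothetical. Notice that the server side has to hide its position in an IORef, which is the state-keeping concern raised in the cons below.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

type Source = IO (Maybe S.ByteString) -- a strict bytestring

-- Hypothetical server side: wrap a list of chunks in a Source.
-- The position in the body lives in an IORef.
mkSource :: [S.ByteString] -> IO Source
mkSource chunks = do
  ref <- newIORef chunks
  return $ do
    remaining <- readIORef ref
    case remaining of
      []     -> return Nothing                        -- end of body
      (c:cs) -> writeIORef ref cs >> return (Just c)  -- hand out next chunk

-- Hypothetical application side: call the Source until it returns Nothing.
consume :: Source -> IO S.ByteString
consume src = S.concat <$> go
  where
    go = do
      next <- src
      case next of
        Nothing -> return []
        Just c  -> (c :) <$> go

exampleChunks :: [S.ByteString]
exampleChunks = map S8.pack ["name=", "michael"]
```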

Pros

  • Simple and standard.
  • Deterministic.
  • Space efficient.

Cons

  • This makes the server the callee, not the caller. In general, it is more difficult to write callees, though in the particular case of a server I’m not certain how much more difficult it really is.
  • This provides no mechanism for the server to keep state (eg, bytes read so far).

Overall, this is a pretty good approach nonetheless. Also, at the cost of complicating things a bit, we could redefine Source as:

type Source a = a -> IO (Maybe (a, ByteString))

This would solve the second of the problems above by forcing the application to thread the state through.

Recursive enumerator

The idea for the recursive enumerator comes from a few sources, but I’ll cite Hyena for the moment. The idea takes a little bit of time to wrap your mind around, and it doesn’t help that there are many slightly different definitions of enumerators and iteratees. Here I will present a very specialized version of an enumerator, which should hopefully be easier to follow.

You might be wondering: what’s a recursive enumerator? Just ignore the word for now, it will make sense when we discuss the non-recursive variant below.

Anyway, let’s dive right in:

-- Note: this is a strict byte string
type Enumerator a = (a -> ByteString -> IO (Either a a))
                  -> a
                  -> IO (Either a a)

I apologize in advance for having slightly complicated this type from its usual form by making the return type IO (Either a a) instead of IO a, but it has some real world uses. I know it’s possible to achieve the same result with the latter definition, but it’s slightly more work. I’m not opposed to switching back to the usual IO a form if there’s enough desire.

So what exactly does this mean? An Enumerator is a data producer. When you call the enumerator, it’s going to start handing off one bytestring at a time to the iteratee.

The iteratee is the first argument to the enumerator. It is a data consumer. To put it more directly: the application will be writing an iteratee which receives the raw request body and generates something with it, most likely a list of POST parameters.

So what’s that a? It has a few names: accumulator, seed, or state. That’s the way the iteratee is able to keep track of what it’s doing. Each step along the way, the enumerator will collect the result of the iteratee and pass it in next time around.

And finally, what’s going on with that Either? That’s what allows us to have early termination. If the iteratee returns a Left value, it’s a signal to the enumerator to stop processing data. A Right means to keep going. Similarly, when the enumerator finishes, it returns a Left to indicate that the iteratee requested early termination, and Right to indicate that all input was consumed.
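Here is a small sketch of the Left/Right protocol in action, using the type synonym from this post. The names enumList, limitBytes and exampleChunks are hypothetical, and the enumerator is backed by a plain list rather than a real request body.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

type Iteratee a = a -> S.ByteString -> IO (Either a a)
type Enumerator a = Iteratee a -> a -> IO (Either a a)

-- Hypothetical enumerator: feeds each chunk of a list to the iteratee.
-- It calls itself after each call to the iteratee.
enumList :: [S.ByteString] -> Enumerator a
enumList [] _ seed = return (Right seed)     -- all input was consumed
enumList (c:cs) iter seed = do
  res <- iter seed c
  case res of
    Left done  -> return (Left done)         -- early termination requested
    Right next -> enumList cs iter next      -- pass the new seed along

-- Hypothetical iteratee: counts bytes, and bails out with Left as soon
-- as the running total exceeds the limit.
limitBytes :: Int -> Iteratee Int
limitBytes limit seen chunk =
  let seen' = seen + S.length chunk
  in return (if seen' > limit then Left seen' else Right seen')

exampleChunks :: [S.ByteString]
exampleChunks = map S8.pack ["abc", "def", "ghi"]
```

With a generous limit, enumList exampleChunks (limitBytes 100) 0 finishes with a Right. With a limit of 4, the iteratee returns Left on the second chunk and the third chunk is never handed over, which is exactly the behavior we want against overly long bodies.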

To give a motivating example, here’s a function that converts an enumerator into a lazy bytestring. Two things: firstly, this function is not written efficiently, it’s meant to be easy to follow. More importantly, this lazy bytestring is not exactly lazy: the entire value must be read into memory. If we were to convert this in reality to a lazy bytestring, we would want to use lazy IO to reduce the memory footprint. However, as Nicolas Pouillard pointed out to me, the only way to do this involves forkIO.

import Network.Wai
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Control.Applicative

type Iteratee a = a -> S.ByteString -> IO (Either a a)

toLBS :: Enumerator [S.ByteString] -> IO L.ByteString
toLBS e = L.fromChunks . reverse . either id id <$> e iter [] where
    iter :: Iteratee [S.ByteString]
    iter bs b = return $ Right $ b : bs

As this post is already longer than I’d hoped for, I’ll skip an explanation and go straight to the pros/cons:

Pros

  • Space efficient and deterministic.
  • Server is the caller, makes it easier to write.
  • No need for IORef/MVar at all.

Cons

  • Application is the callee, which is more difficult to write. However, this can be mitigated by having a single package which does POST parsing from an enumerator.
  • Cannot be (simply) translated into a source or lazy bytestring. Unless someone can show otherwise, you need to start the enumerator in a separate thread and then use MVars or Chans to pass the information back. On top of that, you then need to be certain to use up all input, or else you will have a permanently locked thread.

While I think this is a great interface for the response body, and I’ve already implemented working code on top of this, I’m beginning to think we should reconsider going this route.

Non-recursive enumerator

The inspiration for this approach comes directly from a paper by Oleg. I found it easier to understand what was going on once I specialized the types Oleg presents, so I will be doing the same here. I will also do a little bit of renaming, so apologies in advance.

The basic distinction between this and a recursive enumerator is that the latter calls itself after calling the iteratee, while the former is given a function to call.

I’m not going to go into a full discussion of this here, but I hope to make another post soon explaining exactly what’s going on (and perhaps deal with some of the cons).

type Enumerator a = RecEnumerator a -> RecEnumerator a
type RecEnumerator a = Iteratee a -> a -> IO (Either a a)
type Iteratee a = a -> B.ByteString -> IO (Either a a)

Pros

  • Allows creation of the source (Oleg calls it a cursor) interface- and thus lazy byte string- without forkIO.
  • Space efficient and deterministic.

Cons

  • I think it’s significantly more complicated than the other approaches, though that could just be the novelty of it.
  • It still requires use of an IORef/MVar to track state. I have an idea of how to implement this without that, but it is significantly more complex.

Conclusion

Well, the conclusion for me is I’m beginning to lean back towards the Source interface. It’s especially tempting to try out the source variant I mention, since that would eliminate the need for IORef/MVar. I’d be interested to hear what others have to say though.

New blog address

January 10, 2010

Hey all,

I’ve got a new blog address at:

http://www.snoyman.com/blog/

If you’ve been reading this from Planet Haskell, I’ve requested of them to update the feed address, so in a few days (hopefully) you’ll be getting all my updates.

mod_rewrite, subfolders and trailing slashes

December 30, 2009

I’m posting this because I spent many hours debugging this issue and couldn’t find a single site on the internet that provided a solution. I’m not thrilled with the answer I have, so if anyone knows of a better way to do this, I’m all ears.

The goal

I’m currently starting to host multiple web applications on a single domain in separate subfolders. For example, wordify should be served from http://www.snoyman.com/wordify/. Since I’m using Apache on my host, I need to use mod_rewrite to accomplish this. Anyone who has ever dealt with mod_rewrite probably has the same fear of it that I do. However, I’m purposely not going to go into a mod_rewrite rant.

Anyway, the mod_rewrite needs to make an internal redirect (meaning not changing the user’s URL via a 301 redirect) to the CGI script. For example, I want a request to http://www.snoyman.com/wordify/toword/57/ to be treated by Apache as http://www.snoyman.com/wordify/dispatch.cgi/toword/57/. No problem.

The problem

My initial solution gave a problem when people went to http://www.snoyman.com/wordify. In particular, it would redirect them to a URL based on the absolute filename on the system of the request (eg, http://www.snoyman.com/f5/snoyman/public/…). My temporary solution was to set up a 301 redirect from that ugly URL to the correct location, but that’s not an efficient solution since it requires a whole extra round trip for the user to get to the page.

It turns out that mod_rewrite does funny stuff with converting URLs to pathnames, and there’s no way to specify which type of value you would like to deal with.

The solution

Below is my new .htaccess file. Here’s what you need to know to follow this and adapt it to your own purposes:

  • The root of my website is /f5/snoyman/public
  • All of the wordify files are kept in /f5/snoyman/public/wordify
  • The CGI program for generating the content is called dispatch.cgi

Options +ExecCGI
AddHandler cgi-script .cgi

Options +FollowSymlinks

RewriteEngine On
RewriteRule ^/f5/snoyman/public/wordify$ /wordify/ [R=301,S=1]
RewriteCond $1 !^dispatch.cgi
RewriteRule ^(.*) dispatch.cgi/$1 [L]

A few notes:

  • The first two lines are purely for CGI purposes. You might not need them, or may wish to use different file extensions.
  • The follow symlinks is not needed for the rewrite purposes.
  • The first RewriteRule line does the trailing-slash addition. It specifies R=301 so that the user receives a 301 permanent redirect. The S=1 is a skip option so that the following rule is not executed.
  • The RewriteCond applies to the second RewriteRule, and makes sure we don’t end up with an infinite redirect loop. Remember, mod_rewrite will reloop through all your rules each time one of the rules makes a change to the URL.
  • I’m not sure exactly what the L option does, but it definitely doesn’t guarantee that a rule is the last one executed. That’s why we need the S on the first rule.

It’s ugly, but it works. The only downside I’d like to address is to disallow access from http://www.snoyman.com/wordify/dispatch.cgi/…, but I’m not too worried about that. Following good RESTful principles would make information available at precisely one URL, but the dispatch.cgi is unlikely to be stumbled upon by mistake.

Yesod RESTful web framework sample

December 21, 2009

Update: the example should now be working (as of 2009-12-22 18:10 UTC). Thanks to Chris and Felipe for the bug reports below.

View the sample being discussed here. For full effect, try with and without Javascript.

I’ve been working for a while on a web framework, previously under the name “restful”, but more recently renamed to Yesod. In future posts, I hope to give a more general overview of the features of this framework, but for now I’m just interested in showing a single code sample.

In order to get a nice test suite going, I’ve been racking up simple example web apps. While trying to think of one, I ran across a post on Happstack, which incidentally had a great sample app: factorials. I’m not trying to compare features of Yesod against Happstack here, just give proper attribution for the idea.

The code is available as part of my github repo. I’ve also converted the code to HTML. The file is well enough documented; the rest of this post will try to point out the features of Yesod that make this demonstration notable.

Multiple representations

This is probably the most important piece. Every web framework that exists can generate an HTML page. The vast majority can also generate JSON. Most of them know to set the content-type header correctly (I hope). Yesod, however, takes the same data and can give it different representations.

The trick is in the Yesod.Rep module, in the HasReps typeclass. Any instance of this typeclass can specify multiple renderings of itself. For example, HtmlObject has both HTML and JSON representations (more are possible, but probably unnecessary). You can wrap an HtmlObject with a TemplateFile and then have the data displayed nicely with a HStringTemplate template. To top it all off: HtmlObject handles all the entity encodings for you, so no more cross-site scripting attacks (exaggeration, I know).

Simplified routes

I was always annoyed when using Django that I specified my routes using regexs. There’s no need. I’ve never seen a webapp that did something beyond breaking up pieces across slashes and routing based on that. To get really fancy, you can accept only digits for one of the path pieces.

If you look in the code, you’ll see what looks like a quasi-quoted YAML file. Well, that’s exactly what it is. Yesod includes some Template Haskell to use this YAML file to generate a completely compile-time checked set of routes. It guarantees:

  • No overlapping routes exist.
  • Within each route, there are no duplicate handlers for each verb (request method).
  • Each specified handler takes the right arguments. For example, the resource path “/user/#userid/variable/$varname/” would require a function that takes an Int (for the #userid) and String (for the $varname).

There is also a version of the TH function which does not check for overlapping patterns.

Swappable backends

This example uses the hack-handler-simpleserver so it can be easily tested on a local system without running a web server. However, swap that for hack-handler-cgi, and you’ve got a CGI program. In fact, it will work with any Hack handler.

Various features

There’s a bunch of features in use here under the surface, such as automatic URL cleanup (trailing slashes and the like), JSON-P support, etc. There’s even more power not being used: OpenID authentication, client-side encrypted session data, request method override, etc. These will all be documented before release.

Conclusion

Yesod has been in development for quite a while now (over a year I believe). It’s the core for a few of my sites (photoblog is the largest), and is rapidly approaching its first release. It’s been on hold while some of its underlying libraries matured (failure, attempt and data-object). However, if you’re interested in building Ajax sites following RESTful principles, it could very well be the framework you’re looking for.

data-object family

December 17, 2009

A big thanks to Nicolas Pouillard, who co-authored data-object (as well as some of the underlying libraries like attempt) for coming up with many of the great ideas here.

Introduction

Before you get worried, this has nothing to do with object-oriented programming. The term “object” here refers to a JSON object, which basically means a data type which can represent three things:

  • Scalars
  • Sequences (or lists)
  • Mappings (or dictionaries)

This format happens to be an incredibly useful thing, and the goal of data-object is to provide the Object data type in one place where other libraries can use it, and thus easily exchange data with other libraries. So far, this library has been used for:

  • data-object-json: a wrapper around json-b for JSON parsing/emitting
  • data-object-yaml: a binding to the libyaml C library. (Note: the C source code is included in the package, so you don’t need to have it installed separately on your system.)
  • json2yaml: a simple utility program for converting JSON to YAML files (I was shocked that I couldn’t find something like this elsewhere).
  • It is also playing a prominent role in the Yesod web framework to provide such features as automatic string escaping, JSON output and interfacing with HStringTemplate

Hopefully that gives you an idea that this library is useful. Before rolling your own data type to do basically the same thing, please consider using this library instead.

Overview of design choices

The datatype itself is incredibly simple; the important points are what go along with it.

  • The Object datatype is polymorphic in both the key and value. You can make String->String objects, Int->String, or anything else you like.
  • This library depends on convertible-text, which provides generic conversion type classes.
  • There is a template haskell function included to automatically generate a number of instances.
  • There are three specific aliases provided for Object in their own modules: TextObject, StringObject and ScalarObject.
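To give a feel for the shape being described, here is a rough sketch of a three-constructor type that is polymorphic in both key and value. Treat it as an illustration under assumptions rather than the library's source: the constructor names, the StringObject alias as written here, and the helpers scalars, mapKeysValues and person are all my own guesses or inventions for this example.

```haskell
-- Assumed shape: scalars, sequences and mappings, polymorphic in key and value.
data Object key val
  = Scalar val
  | Sequence [Object key val]
  | Mapping [(key, Object key val)]
  deriving (Show, Eq)

-- Assumed alias, in the spirit of the ones listed above.
type StringObject = Object String String

-- Hypothetical helper: collect every scalar, depth first.
scalars :: Object key val -> [val]
scalars (Scalar v)    = [v]
scalars (Sequence os) = concatMap scalars os
scalars (Mapping ps)  = concatMap (scalars . snd) ps

-- Hypothetical helper: convert keys and values in one pass, which is the
-- kind of thing generic conversion type classes would drive.
mapKeysValues :: (k1 -> k2) -> (v1 -> v2) -> Object k1 v1 -> Object k2 v2
mapKeysValues _ g (Scalar v)    = Scalar (g v)
mapKeysValues f g (Sequence os) = Sequence (map (mapKeysValues f g) os)
mapKeysValues f g (Mapping ps)  =
  Mapping [(f k, mapKeysValues f g o) | (k, o) <- ps]

person :: StringObject
person = Mapping
  [ ("name", Scalar "Alice")
  , ("langs", Sequence [Scalar "haskell", Scalar "yaml"])
  ]
```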

What, no code samples?

Sorry, not this time. If you want to see example code that uses the data-object library, I recommend data-object-json and json2yaml (data-object-yaml has a lot of C library cruft).

Also, this library is still young, so I’m very much open to suggestions.

Two language extensions

December 7, 2009

Below are my ideas for two language extensions which I think add a lot to the Haskell language without adding too much ambiguity. At least, I haven’t found any issues with the ideas so far, but I’m sure plenty of other people will be able to ;).

AutomaticClassSynonyms

Let’s say I’ve got the Failure class, and I’d like to define a MonadFailure class- simply for convenience- which is a subclass of both Failure and Monad. Well, defining the class is easy:

class (Monad m, Failure m) => MonadFailure m

I believe that this should automatically make anything which is an instance of both Monad and Failure an instance of MonadFailure, since the definition of MonadFailure is completely empty. I look at class instances as needing to address two issues:

  • Existence: is there some instance which makes sense?
  • Uniqueness: of those instances which make sense, which one should I use?

Here, there is no room for ambiguity: there exists an instance which makes sense (eg, instance MonadFailure Maybe), and there is precisely one instance which makes sense. There is no alternative way to define this instance.

Therefore, I think that in this case we should not complain if we have two instances for the same data type, since we know that the instances will be identical. That would make this extension work very nicely with existing code. It also adds no new syntax.
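For comparison, here is a sketch of how this can be approximated in GHC today with a single catch-all instance, at the cost of two extensions. The Failure class below is a simplified stand-in that I am assuming for the example (it is not the class from the failure package), and safeDiv is a made-up function.

```haskell
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- Simplified stand-in for a Failure class (an assumption for this sketch).
class Failure m where
  failure :: String -> m a

instance Failure Maybe where
  failure _ = Nothing

-- The empty class synonym from the post...
class (Monad m, Failure m) => MonadFailure m

-- ...and the one catch-all instance the proposed extension would imply.
-- Writing any other MonadFailure instance would now overlap with this one.
instance (Monad m, Failure m) => MonadFailure m

-- Code can ask for the single combined constraint.
safeDiv :: MonadFailure m => Int -> Int -> m Int
safeDiv _ 0 = failure "divide by zero"
safeDiv x y = return (x `div` y)
```

Since Maybe is already both a Monad and a Failure, safeDiv 6 3 :: Maybe Int works without ever mentioning MonadFailure Maybe. The difference from the proposal is that here the class author must write the instance by hand and turn on UndecidableInstances.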

What I didn’t say

I specifically do not think this extension should make automatic instances of classes which have default definitions for all their functions. The first example that comes to mind is Exception: even though both fromException and toException have default definitions, I think the user should still have to explicitly instantiate Exception, even if a type is already an instance of Typeable and Show.

SubClassOverloading

This extension is a bit more complicated. For motivation, let’s look at the interaction between Monad and Applicative. For most cases, a Monad can define an Applicative instance as such:

instance Functor MyMonad where
  fmap = liftM
instance Applicative MyMonad where
  pure = return
  (<*>) = ap

Well, that’s irritating! Instead of just writing a five line Monad instance, I have to write five extra boilerplate lines.
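To see the boilerplate on a concrete type, here is a self-contained sketch with a tiny made-up monad called Counter (the type and tick are hypothetical). One assumption worth flagging: it is written for a GHC where Applicative is a superclass of Monad, so pure is defined directly and return is left to its default, rather than writing pure = return as above.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical monad: threads an Int counter through a computation.
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }

-- The boilerplate: both instances are derived entirely from the Monad one.
instance Functor Counter where
  fmap = liftM

instance Applicative Counter where
  pure x = Counter (\n -> (x, n))
  (<*>) = ap

-- The only instance with any real content.
instance Monad Counter where
  Counter m >>= f = Counter $ \n ->
    let (a, n') = m n
    in runCounter (f a) n'

-- Return the current counter value and increment it.
tick :: Counter Int
tick = Counter (\n -> (n, n + 1))
```

With those instances in place, Applicative style works as expected: runCounter ((,) <$> tick <*> tick) 5 threads the counter through both ticks.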

As a separate issue, Applicative is not defined as a superclass of Monad, and therefore I cannot treat all Monads as Applicatives. But we can’t add that superclass requirement without breaking existing code.

So I say we allow the definition of Monad as such:

class Applicative m => Monad m where
  fail :: String -> m a -- or we could just take this out...
  (>>=) :: m a -> (a -> m b) -> m b -- the same
  return :: a -> m a -- also the same
  fmap = liftM -- a default definition for a superclass function
  pure = return
  (<*>) = ap

And suddenly all Monads are Applicative! Since every function in the Functor and Applicative classes is given a default definition in Monad, they can be automatically derived.

But what if you want to define a special version of fmap? Simple: do it like always! The definition in Monad is merely the default; if the compiler finds a separate instance for your data type, it uses that instead. This way, old code still works without a hitch.

The downside

The only downside I can see is that suddenly you’ll have instances of classes where before there were none. Not that having Applicative instances in and of itself is a downside, but there might be cases where it would define inappropriate instances (not that I can think of any off-hand). On the other hand, this would be mitigated slightly by the requirement of the type-class author to explicitly turn on this flag.

Let the beatings begin

Well, this is my first time suggesting any changes to Haskell, so I expect to be thoroughly scolded for my preposterous, heretical notions. Even if these suggestions are lacking, however, I hope we eventually get something which allows these kinds of features in Haskell.

String-like

December 4, 2009

While working on a type-safe method for embedding HTML fragments, I was reminded of some of my annoyances with my web-encodings package. In particular, I hated how I was doing all of these automatic conversions from and to lazy bytestrings, strict bytestrings and strings (ie [Char]). It’s always caused me a few headaches:

  • Often times I need to explicitly set types with a type signature.
  • I know that I’m needlessly wasting cycles.
  • There is more than one way to convert between a String and a ByteString; in particular, Latin-1 encodings (ie, Data.ByteString.Char8.pack) versus UTF-8.

I decided that it would be a good idea to provide these functions for all string-like data types. In addition to strings, strict bytestrings and lazy bytestrings, I also want to support strict and lazy text. My first idea was to provide five different modules in web-encodings. I did not relish the thought of writing it, much less maintaining it.

class StringLike

Then I had an idea. When doing html escaping, for example, all I really need to do is call “concatMap escapeHtmlChar”, where escapeHtmlChar might look like:

escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
...
escapeHtmlChar c = [c]

I could obviously write 5 versions of the escapeHtml function, each calling a specialized version of concatMap. In fact, it’s very simple to do so: all five data types involved provide a concatMap function. I might need a little tweaking for packing at some points, but it’s very simple.

But of course I still didn’t want to have five functions. So I decided to create the “StringLike” typeclass. It looks something like this:

class StringLike a where
    head :: a -> Char
    tail :: a -> a
    lengthLT :: Int -> a -> Bool
    concatMap :: (Char -> String) -> a -> a
    ... (many more functions)

As simple as this looks, there are a few things to note:

  • The basic type is always a Char. This means that we are treating bytestrings as if they are encoded in Latin-1.
  • Based on a suggestion by Daniel Fischer, there is no length function. Instead, there are length comparison functions, which is probably what’s needed in general.
  • There’s a fine line of when to use String and when to use the type itself. For example, I think the first argument to concatMap should be a function returning a String, not the specific type. tail should most definitely return the type itself. But there are some corner cases, such as the isPrefixOf function.
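Here is a cut-down sketch of the idea, with only the concatMap method and two instances (String and strict ByteString). It is not the real StringLike class: the instances are my own guess at how they might look, only the two escapes spelled out earlier are handled, and the bytestring instance goes through Char8, so it treats the bytes as Latin-1 as noted above.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding (concatMap)
import qualified Prelude as P
import qualified Data.ByteString.Char8 as S8

-- One method of the class, enough to write escapeHtml once.
class StringLike a where
  concatMap :: (Char -> String) -> a -> a

-- Plain strings just reuse the list version.
instance StringLike [Char] where
  concatMap = P.concatMap

-- Strict bytestrings: pack each replacement String as we go.
instance StringLike S8.ByteString where
  concatMap f = S8.concatMap (S8.pack . f)

-- Only the two cases shown in the post; the rest are left out here too.
escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
escapeHtmlChar c = [c]

-- A single definition that works for every instance.
escapeHtml :: StringLike a => a -> a
escapeHtml = concatMap escapeHtmlChar
```

The same escapeHtml now applies to "<b>" as a String and to S8.pack "a<b" as a ByteString, with the result type following the argument type.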

You can see the whole StringLike typeclass on github.

The ugly

Well, since my functions (encodeHtml, decodeUrl, etc) are still dealing with type classes instead of concrete values, I might still need an occasional type signature to get it to work. However, since there’s only one type involved, it should be much easier. For example, stringing together a number of these functions is completely unambiguous.

Also, I’ve lost the ability to pattern match strings. Instead, I must manually check the length and use head and tail functions. This is made most clear by the decodeUrl function. I have a feeling view patterns might be of assistance here, but I haven’t looked into it yet.

Useful?

I’m curious if the community would find this useful as a standalone package. If I were to release it, it would probably be two modules:

  • Data.StringLike would simply be the basic operations any string-like type should provide.
  • Data.StringLike.Extra would be higher-level functions built on top of this. Most likely, it would all go in a typeclass so individual types could provide more efficient versions of specific functions.

Look forward to hearing some opinions on this.

From Dreamhost to NearlyFreeSpeech.net

November 29, 2009

Well, I’d toyed around with the idea for quite a while, but when SSH went down for a few days at Dreamhost, I decided it was time to finally make the switch.

The easy part was moving off static sites and getting my personally hosted blog onto wordpress.com. Then of course is the real challenge: a Haskell site.

First shot: compile on the server

Since NearlyFreeSpeech.net (henceforth NFS) claims support for GHC, I thought I’d try out compiling on their servers. My first issue was that they only have 6.8.3, whereas some of my libraries require 6.10. I e-mailed them about this, and they let me know that their unstable server had it available. Switching over was painless.

However, I had a few problems with this approach:

  1. cabal-install would not link due to memory constraints. I can manually install all the libraries I need, but that’s a real pain.
  2. All the files I ended up using used up 250MB. NFS charges per megabyte of storage, so I didn’t feel like wasting time.
  3. As usual with shared hosting, compiling is slow. It’s not as horrible as Dreamhost, where they kill the compiles regularly, but still not as nice as using my shiny new system at home.

Upload binaries

So I can of course just compile my binaries locally and upload them, right? Well, I don’t happen to run a FreeBSD box. I’ve been itching to try out VirtualBox for a while though, and this seemed like a good time to do it.

I know what you’re thinking: I’m a masochist, and this is overkill for this kind of project. However, setting this up didn’t actually require too much work, and it’s a much more durable solution than trying to compile binaries on some flaky shared host (looking at you again Dreamhost).

Anyway, the process was very straightforward:

  1. Download the FreeBSD 7.2 ISO (NFS beta realm runs 7.2).
  2. Install VirtualBox locally.
  3. Install FreeBSD. It’s not too complicated. But make sure you set aside enough hard disk and RAM.
  4. Update ports collection. This was the worst part for me, since I’ve never used FreeBSD before. Also, I tried a selective update at first: bad idea. Just update the whole thing.
  5. Install ghc. Basically, “cd /usr/ports/lang/ghc && make install”. It takes a *long* time as it installs everything.
  6. Compile the binaries inside FreeBSD. I used git/ssh to transfer the projects over to the virtual machine, which was very convenient.
  7. Upload binaries and call it a day.

Other notes

I have opted not to use NFS for my large static file hosting. I’m using the Amazon S3 service, which so far I’ve found to be much faster than Dreamhost. It’s a little tricky to get started though. I was able to sync all my photos using s3sync.

Also, don’t forget to strip your binaries before uploading them; it can save a lot of time.

NFS only supports CGI. This is fine for my purposes, but others may not be so happy with it.

Finally, NFS recently started charging $0.01/day for dynamic sites. It’s only $3.65 a year, but if you’re like me and like to have lots of different sites running, it might add up. I’ll probably just end up running services under the same domain name instead of separate subdomains.

Conclusion

Well, I won’t give my full stamp of approval on this yet, but so far I’m impressed. I’ll try to post some follow-up on this in the future.

Introduction to attempt error reporting library

October 25, 2009

I’ve just released the attempt package on hackage. It is meant to address the issue of error handling, which is currently rather ad-hoc in Haskell. It’s my hope that by putting it in its own package, we can start to standardize on one approach and get some nice composable error handling between packages.

The library is built on extensible exceptions to give users the ability to return more complex exception values than is afforded by either a Maybe or (Either String). It’s similar to an (Either SomeException), but provides many class instances, a monad transformer and helper functions.

Below is an HTML version of the literate Haskell example file that is in the attempt repository. Hopefully it will give you a running start on how to use it.

This library should be considered unstable, in that the API is still open to change. As such, I’d appreciate any feedback people have.


This file is an example of how to use the attempt library, as literate
Haskell. We’ll start off with some import statements.

> {-# LANGUAGE DeriveDataTypeable #-}
> {-# LANGUAGE ExistentialQuantification #-}
> import Data.Attempt
> import Control.Monad.Attempt
> import qualified Data.Attempt.Helper as A
> import System.Environment (getArgs)
> import Safe (readMay)
> import Data.Generics
> import qualified Control.Exception as E

We’re going to deal with a very simplistic example. Let’s say you have some
text files that need processing. The files are each three lines long. The
first and last line are integers; the second is a mathematical operator (one
of +, -, * and /). Your goal with each file is to simply perform the
mathematical operator on the two numbers. Let’s start with the Operator data
type.

> data Operator = Add | Sub | Mul | Div
> instance Read Operator where
>   readsPrec _ "+" = [(Add, "")]
>   readsPrec _ "-" = [(Sub, "")]
>   readsPrec _ "*" = [(Mul, "")]
>   readsPrec _ "/" = [(Div, "")]
>   readsPrec _ _ = []
>
> toFunc :: Operator -> Int -> Int -> Int
> toFunc Add = (+)
> toFunc Sub = (-)
> toFunc Mul = (*)
> toFunc Div = div

Nothing special here (besides some sloppy programming). Let’s go ahead and
write the first version of our process function.

> process1 :: FilePath -> IO Int
> process1 filePath = do
>   contents <- readFile filePath -- IO may fail for some reason
>   let [num1S, opS, num2S] = lines contents -- maybe there aren't 3 lines?
>       num1 = read num1S -- read might fail
>       op   = read opS   -- read might fail
>       num2 = read num2S -- read might fail
>   return $ toFunc op num1 num2

If you test this function out on a valid file, it works just fine. But what
happens when you call it with invalid data? In fact, there are five things
which could go wrong that I’d be interested in dealing with in the above code.
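
To make the "read might fail" comments concrete, here is a minimal sketch using only base. The names readMaybeStrict and readThrows are hypothetical helpers written for this illustration; readMaybeStrict only mirrors the idea behind Safe.readMay and is not its actual implementation.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical helper: a total alternative to Prelude.read.
-- It succeeds only when the whole string parses with nothing left over.
readMaybeStrict :: Read a => String -> Maybe a
readMaybeStrict s = case reads s of
  [(x, "")] -> Just x
  _         -> Nothing

-- Hypothetical helper: reports whether forcing (read s :: Int) throws.
-- Prelude.read raises an exception on bad input, and only once the
-- value is actually demanded, which is why evaluate is needed here.
readThrows :: String -> IO Bool
readThrows s = do
  r <- try (evaluate (read s :: Int)) :: IO (Either SomeException Int)
  return (either (const True) (const False) r)
```

The point of the comparison: with Prelude.read the failure shows up as a runtime exception somewhere downstream, while the Maybe version forces the caller to deal with it on the spot.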

So now we need some way to deal with these issues. There’s a few standard ones
in the Haskell toolbelt:

  1. Wrap the response in Maybe. Disadvantage: can’t give any indication what the
    error was.
  2. Wrap the response in an Either String. Disadvantage: error type is simply a
    string, which isn’t necessarily very informative. Also, Either is not defined
    by the standard library to be a Monad, making this type of processing clumsy.
  3. Wrap in a more exotic Either SomeException or some such. Disadvantage:
    still not a Monad.
  4. Declare your own error type. Disadvantage: ad-hoc, and makes it very
    difficult to compose different libraries together.
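
As a rough illustration of option 3, here is a sketch of carrying a custom exception inside Either SomeException and recovering it later. It assumes a recent GHC, where base does provide a Monad instance for Either e and Typeable instances are automatic; neither was true when this post was written. NotInt, parseInt, addStrings and explain are all made-up names for this example.

```haskell
import Control.Exception (Exception, SomeException, fromException, toException)

-- A custom exception type carrying the offending input.
data NotInt = NotInt String deriving Show
instance Exception NotInt

-- Parse an Int, reporting failure as a SomeException.
parseInt :: String -> Either SomeException Int
parseInt s = case reads s of
  [(n, "")] -> Right n
  _         -> Left (toException (NotInt s))

-- Failures short-circuit: the first Left is returned as-is.
addStrings :: String -> String -> Either SomeException Int
addStrings a b = do
  x <- parseInt a
  y <- parseInt b
  return (x + y)

-- The caller can recover the specific exception type if it cares,
-- or fall back to the generic Show output otherwise.
explain :: SomeException -> String
explain e = case fromException e of
  Just (NotInt s) -> "not an integer: " ++ s
  Nothing         -> "some other failure: " ++ show e
```

This is what extensible exceptions give you: the function signature stays the same no matter how many different error types end up inside.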

In steps the attempt library. It’s essentially option 4 wrapped in a library
for general consumption. Features include:

  1. Uses extensible exceptions so you can report whatever information you want.
  2. Exceptions are not explicitly typed, so you don’t need to wrap insanely
    long function signatures to explain what exceptions you might be throwing.
  3. Defines all the standard instances you want, including providing a monad
    transformer.

Let’s transform the above example to use the attempt library in its most basic
form:

> data ProcessError = NotThreeLines String | NotInt String | NotOperator String
>   deriving (Show, Typeable)
> instance E.Exception ProcessError
>
> process2 :: FilePath -> IO (Attempt Int)
> process2 filePath =
>   E.handle (\e -> return $ Failure (e :: E.IOException)) $ do
>       contents <- readFile filePath
>       return $ case lines contents of
>           [num1S, opS, num2S] ->
>               case readMay num1S of
>                   Just num1 ->
>                       case readMay opS of
>                           Just op ->
>                               case readMay num2S of
>                                   Just num2 -> Success $ toFunc op num1 num2
>                                   Nothing -> Failure $ NotInt num2S
>                           Nothing -> Failure $ NotOperator opS
>                   Nothing -> Failure $ NotInt num1S
>           _ -> Failure $ NotThreeLines contents

If you run these on the sample files in the input directory, you’ll see that
we’re getting the right result; the program is not erroring out, simply
returning a failure message. However, this wasn’t very satisfactory with all of
those nested case statements. Let’s use two facts to our advantage:

  1. Attempt is a Monad.
  2. There is a Data.Attempt.Helper module which provides a special read
    function.

> data ProcessErrorWrapper =
>   forall e. E.Exception e => BadIntWrapper e
>   | forall e. E.Exception e => BadOperatorWrapper e
>   deriving (Typeable)
> instance Show ProcessErrorWrapper where
>   show (BadIntWrapper e) = "BadInt: " ++ show e
>   show (BadOperatorWrapper e) = "BadOperator: " ++ show e
> instance E.Exception ProcessErrorWrapper
>
> process3 :: FilePath -> IO (Attempt Int)
> process3 filePath =
>   E.handle (\e -> return $ Failure (e :: E.IOException)) $ do
>       contents <- readFile filePath
>       return $ case lines contents of
>           [num1S, opS, num2S] -> do
>               num1 <- wrapFailure BadIntWrapper $ A.read num1S
>               op   <- wrapFailure BadOperatorWrapper $ A.read opS
>               num2 <- wrapFailure BadIntWrapper $ A.read num2S
>               return $ toFunc op num1 num2
>           _ -> Failure $ NotThreeLines contents

That certainly cleaned stuff up. The special read function works just as you
would expect: if the read succeeds, it returns a Success value. Otherwise,
it returns a Failure.

But what’s going on with that wrapFailure stuff? This is just to clean up the
output. The read function will return an exception of type “CouldNotRead”,
which lets you know that you failed a read attempt, but doesn’t let you know
what you were trying to read. Wrapping it in BadIntWrapper or
BadOperatorWrapper records which kind of value failed to parse.
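
To show the idea behind wrapping a failure, here is a self-contained sketch that does not use the attempt package at all. The Attempt type below is a local stand-in whose shape I am assuming for illustration, and wrapFailure', safeRead, BadInt and describe are hypothetical, simplified names; the library's real signatures are more general. It assumes a recent GHC where Typeable instances are automatic.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}
import Control.Exception (Exception)

-- Local stand-in for the library's Attempt type (assumed shape).
data Attempt a = Success a | forall e. Exception e => Failure e

-- A read failure that only knows the raw input, not what was expected.
data CouldNotRead = CouldNotRead String deriving Show
instance Exception CouldNotRead

-- A wrapper that adds the missing context around any inner exception.
data BadInt = forall e. Exception e => BadInt e
instance Show BadInt where
  show (BadInt e) = "BadInt: " ++ show e
instance Exception BadInt

-- Simplified wrapFailure: leave successes alone, rewrap failures.
wrapFailure' :: Exception e'
             => (forall e. Exception e => e -> e') -> Attempt a -> Attempt a
wrapFailure' _ (Success a) = Success a
wrapFailure' f (Failure e) = Failure (f e)

-- Simplified safe read: the whole string must parse.
safeRead :: Read a => String -> Attempt a
safeRead s = case reads s of
  [(x, "")] -> Success x
  _         -> Failure (CouldNotRead s)

-- Render an Attempt for printing.
describe :: Show a => Attempt a -> String
describe (Success a) = "Success " ++ show a
describe (Failure e) = "Failure " ++ show e
```

With these definitions, describe (wrapFailure' BadInt (safeRead "abc" :: Attempt Int)) mentions both the wrapper and the original CouldNotRead failure, which is the cleanup the wrapping is meant to provide.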

So far, so good. But that “case lines contents” bit is still a little
annoying. Let’s get rid of it.

> process4 :: FilePath -> IO (Attempt Int)
> process4 filePath =
>   E.handle (\e -> return $ Failure (e :: E.IOException)) $ do
>       contents <- readFile filePath
>       return $ do
>           let contents' = lines contents
>           [num1S, opS, num2S] <-
>               A.assert (length contents' == 3)
>                        contents'
>                        (NotThreeLines contents)
>           num1 <- wrapFailure BadIntWrapper $ A.read num1S
>           op   <- wrapFailure BadOperatorWrapper $ A.read opS
>           num2 <- wrapFailure BadIntWrapper $ A.read num2S
>           return $ toFunc op num1 num2

There’s unfortunately no simple way to catch pattern match fails, but an
assertion works almost as well. The only thing which is still a bit irksome is
the whole exception handling business. Let’s be rid of that next.
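
The assertion trick can be sketched with plain Either SomeException as a stand-in for Attempt. Here assertE, NotThreeLines and middleLine are hypothetical names for this example, not the library's API. One wrinkle on a recent GHC: a refutable pattern in a do block needs a MonadFail instance, which Either does not have, so the sketch uses a lazy pattern after the length check.

```haskell
import Control.Exception (Exception, SomeException, toException)

-- Error reported when the input does not have exactly three lines.
data NotThreeLines = NotThreeLines String deriving Show
instance Exception NotThreeLines

-- Hypothetical assert helper: return the value when the condition
-- holds, otherwise fail with the supplied exception.
assertE :: Exception e => Bool -> a -> e -> Either SomeException a
assertE ok val err
  | ok        = Right val
  | otherwise = Left (toException err)

-- Pull out the middle line, failing cleanly on malformed input.
middleLine :: String -> Either SomeException String
middleLine contents = do
  let ls = lines contents
  -- The lazy pattern is safe here because the assertion has already
  -- guaranteed that ls has exactly three elements.
  ~[_, op, _] <- assertE (length ls == 3) ls (NotThreeLines contents)
  return op
```

The design point is the same as in process4: check the shape first, so the pattern match that follows can never fail at runtime.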

> process5 :: FilePath -> AttemptT IO Int
> process5 filePath = do
>   contents <- A.readFile filePath
>   let contents' = lines contents
>   [num1S, opS, num2S] <-
>       A.assert (length contents' == 3)
>                contents'
>                (NotThreeLines contents)
>   num1 <- wrapFailure BadIntWrapper $ A.read num1S
>   op   <- wrapFailure BadOperatorWrapper $ A.read opS
>   num2 <- wrapFailure BadIntWrapper $ A.read num2S
>   return $ toFunc op num1 num2

There’s a built-in readFile function that takes care of all that error-handling
garbage for you. If you compare this version of the function to the first, you
should notice that it’s very similar. You can avoid a lot of the common
sources of runtime errors by simply replacing unsafe functions (Prelude.read)
with safe ones (Data.Attempt.Helper.read).

However, there’s still one other difference between process5 and process2-4:
the return type. process2-4 return (IO (Attempt Int)), while process5 returns
an (AttemptT IO Int). This is the monad transformer version of Attempt; read
the documentation for more details. To get back to the same old return type as
before:

> process6 :: FilePath -> IO (Attempt Int)
> process6 = runAttemptT . process5
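
For a rough picture of what a transformer like this does under the hood, here is a hand-rolled sketch using only base. AttemptT', runAttemptT', safeReadFile and lineCount are hypothetical names, and Either SomeException stands in for Attempt; this is not the library's actual definition.

```haskell
import Control.Exception (IOException, SomeException, toException, try)

-- A computation in m that may end in a failure instead of a value.
newtype AttemptT' m a =
  AttemptT' { runAttemptT' :: m (Either SomeException a) }

instance Functor m => Functor (AttemptT' m) where
  fmap f (AttemptT' m) = AttemptT' (fmap (fmap f) m)

instance Monad m => Applicative (AttemptT' m) where
  pure = AttemptT' . return . Right
  AttemptT' mf <*> AttemptT' mx = AttemptT' $ do
    ef <- mf
    case ef of
      Left e  -> return (Left e)
      Right f -> fmap (fmap f) mx

instance Monad m => Monad (AttemptT' m) where
  -- Run the first action; stop at the first failure.
  AttemptT' m >>= k = AttemptT' $ do
    ea <- m
    case ea of
      Left e  -> return (Left e)
      Right a -> runAttemptT' (k a)

-- Turn an IOException from readFile into a failure value.
safeReadFile :: FilePath -> AttemptT' IO String
safeReadFile path = AttemptT' $ do
  r <- try (readFile path) :: IO (Either IOException String)
  return (either (Left . toException) Right r)

-- Example use: no explicit exception handling at the call site.
lineCount :: FilePath -> AttemptT' IO Int
lineCount path = do
  contents <- safeReadFile path
  return (length (lines contents))
```

Unwrapping with runAttemptT' plays the same role as runAttemptT in process6: it hands back an ordinary IO action whose result says whether things worked.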

Below is a simple main function for testing out these various functions. Try
them out on the files in the input directory. Also, to simulate an IO error,
call them on a nonexistent file.

> main = do
>   args <- getArgs
>   if length args /= 2
>       then error "Usage: Example.lhs <process> <file path>"
>       else return ()
>   let [processNum, filePath] = args
>   case processNum of
>       "1" -> process1 filePath >>= print
>       "2" -> process2 filePath >>= print
>       "3" -> process3 filePath >>= print
>       "4" -> process4 filePath >>= print
>       "5" -> runAttemptT (process5 filePath) >>= print
>       "6" -> process6 filePath >>= print
>       x -> error $ "Invalid process function: " ++ x
