Yesod RESTful web framework sample

December 21, 2009

Update: the example should now be working (as of 2009-12-22 18:10 UTC). Thanks to Chris and Felipe for the bug reports below.

View the sample being discussed here. For full effect, try with and without Javascript.

I’ve been working for a while on a web framework, previously under the name “restful”, but more recently renamed to Yesod. In future posts, I hope to give a more general overview of the features of this framework, but for now I’m just interested in showing a single code sample.

In order to get a nice test suite going, I’ve been racking up simple example web apps. While trying to think of one, I ran across a post on Happstack, which incidentally had a great sample app: factorials. I’m not trying to compare features of Yesod against Happstack here, just give proper attribution for the idea.

The code is available as part of my github repo. I’ve also converted the code to HTML. The file is well enough documented; the rest of this post will try to point out the features of Yesod that make this demonstration notable.

Multiple representations

This is probably the most important piece. Every web framework that exists can generate an HTML page. The vast majority can also generate JSON. Most of them know to set the content-type header correctly (I hope). Yesod, however, takes the same data and can give it different representations.

The trick is in the Yesod.Rep module, in the HasReps typeclass. Any instance of this typeclass can specify multiple renderings of itself. For example, HtmlObject has both HTML and JSON representations (more are possible, but probably unnecessary). You can wrap an HtmlObject with a TemplateFile and then have the data displayed nicely with an HStringTemplate template. To top it all off: HtmlObject handles all the entity encodings for you, so no more cross-site scripting attacks (exaggeration, I know).
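To make the idea concrete, here is a minimal sketch of "one value, several representations". It is not the Yesod.Rep API: the names ContentType, HasReps (as defined here), reps, chooseRep and Greeting are all made up for illustration, and show stands in for a real JSON string encoder.

```haskell
-- Hypothetical sketch only; none of these definitions come from Yesod.
data ContentType = TypeHtml | TypeJson
    deriving (Eq, Show)

class HasReps a where
    -- | Every rendering this value supports, most preferred first.
    reps :: a -> [(ContentType, String)]

newtype Greeting = Greeting String

instance HasReps Greeting where
    reps (Greeting name) =
        [ (TypeHtml, "<p>Hello, " ++ escapeHtml name ++ "</p>")
          -- 'show' is a stand-in for a proper JSON string encoder.
        , (TypeJson, "{\"hello\":" ++ show name ++ "}")
        ]

-- | Minimal entity encoding so the HTML rendering is safe to embed.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- | Pick the first representation the client accepts, falling back
-- to the value's own preferred representation.
chooseRep :: HasReps a => [ContentType] -> a -> Maybe (ContentType, String)
chooseRep accepted x =
    case [r | ct <- accepted, r@(ct', _) <- reps x, ct == ct'] of
        (r : _) -> Just r
        []      -> case reps x of
                     (r : _) -> Just r
                     []      -> Nothing
```

The handler only ever builds a Greeting; whether the client sees HTML or JSON is decided afterwards from the list of accepted content types.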

Simplified routes

I was always annoyed when using Django that I specified my routes using regexes. There’s no need. I’ve never seen a webapp that did anything beyond breaking up pieces across slashes and routing based on that. To get really fancy, you can accept only digits for one of the path pieces.

If you look in the code, you’ll see what looks like a quasi-quoted YAML file. Well, that’s exactly what it is. Yesod includes some Template Haskell to use this YAML file to generate a completely compile-time checked set of routes. It guarantees:

  • No overlapping routes exist.
  • Within each route, there are no duplicate handlers for any verb (request method).
  • Each specified handler takes the right arguments. For example, the resource path “/user/#userid/variable/$varname/” would require a function that takes an Int (for the #userid) and String (for the $varname).

There is also a version of the TH function which does not check for overlapping patterns.
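As a rough illustration of what the typed path pieces buy you, here is the dispatch for that one example route written out by hand. This is only a sketch of the idea, not what the Template Haskell generates; pathPieces, intPiece, userVariable and dispatch are hypothetical names.

```haskell
import Data.Char (isDigit)

-- | Split a request path on slashes, dropping empty pieces, so that
-- leading and trailing slashes do not matter.
pathPieces :: String -> [String]
pathPieces = filter (not . null) . splitOn
  where
    splitOn s = case break (== '/') s of
        (piece, [])       -> [piece]
        (piece, _ : rest) -> piece : splitOn rest

-- | Parse a piece that must consist only of digits (the "#userid" kind).
intPiece :: String -> Maybe Int
intPiece p
    | not (null p) && all isDigit p = Just (read p)
    | otherwise                     = Nothing

-- | Hypothetical handler for "/user/#userid/variable/$varname/".
-- Its argument types mirror the pieces of the route pattern.
userVariable :: Int -> String -> String
userVariable userId varName =
    "user " ++ show userId ++ ", variable " ++ varName

-- | Hand-written dispatch for that single route.
dispatch :: String -> Maybe String
dispatch path = case pathPieces path of
    ["user", uid, "variable", var] -> do
        n <- intPiece uid
        return (userVariable n var)
    _ -> Nothing
```

If userVariable took a String where the route says #userid, this hand-written version would simply fail to compile, which is the same guarantee the generated routes aim for.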

Swappable backends

This example uses the hack-handler-simpleserver so it can be easily tested on a local system without running a web server. However, swap that for hack-handler-cgi, and you’ve got a CGI program. In fact, it will work with any Hack handler.

Various features

There’s a bunch of features in use here under the surface, such as automatic URL cleanup (trailing slashes and the like), JSON-P support, etc. There’s even more power not being used: OpenID authentication, client-side encrypted session data, request method override, etc. These will all be documented before release.

Conclusion

Yesod has been in development for quite a while now (over a year I believe). It’s the core for a few of my sites (photoblog is the largest), and is rapidly approaching its first release. It’s been on hold while some of its underlying libraries matured (failure, attempt and data-object). However, if you’re interested in building Ajax sites following RESTful principles, it could very well be the framework you’re looking for.

Two language extensions

December 7, 2009

Below are my ideas for two language extensions which I think add a lot to the Haskell language without adding too much ambiguity. At least, I haven’t found any issues with the ideas so far, but I’m sure plenty of other people will be able to ;).

AutomaticClassSynonyms

Let’s say I’ve got the Failure class, and I’d like to define a MonadFailure class, simply for convenience, which is a subclass of both Failure and Monad. Well, defining the class is easy:

class (Monad m, Failure m) => MonadFailure m

I believe that this should automatically make anything which is an instance of both Monad and Failure an instance of MonadFailure, since the definition of MonadFailure is completely empty. I look at class instances as needing to address two issues:

  • Existence: is there some instance which makes sense?
  • Uniqueness: of those instances which make sense, which one should I use?

Here, there is no room for ambiguity: there exists an instance which makes sense (eg, instance MonadFailure Maybe), and there is precisely one instance which makes sense. There is no alternative way to define this instance.

Therefore, I think that in this case we should not complain if we have two instances for the same data type, since we know that the instances will be identical. That would make this extension work very nicely with existing code. It also adds no new syntax.
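For comparison, this is roughly what has to be written by hand today to get the same effect. The sketch assumes a simplified, single-parameter Failure class (as in the class declaration above, not necessarily the one in the failure package), and safeDiv is a made-up example function. The blanket instance needs two GHC extensions.

```haskell
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- A simplified stand-in for the Failure class.
class Failure m where
    failure :: String -> m a

instance Failure Maybe where
    failure _ = Nothing

-- The empty "synonym" class from the text.
class (Monad m, Failure m) => MonadFailure m

-- The blanket instance that the proposed extension would make implicit.
instance (Monad m, Failure m) => MonadFailure m

-- Uses both superclasses through the single MonadFailure constraint.
safeDiv :: MonadFailure m => Int -> Int -> m Int
safeDiv _ 0 = failure "division by zero"
safeDiv x y = return (x `div` y)
```

With the blanket instance in place, Maybe is a MonadFailure without anyone writing instance MonadFailure Maybe.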

What I didn’t say

I specifically do not think this extension should make automatic instances of classes which have default definitions for all their functions. The first example that comes to mind is Exception: even though both fromException and toException have default definitions, I think the user should still have to explicitly instantiate Exception, even if a type is already an instance of Typeable and Show.

SubClassOverloading

This extension is a bit more complicated. For motivation, let’s look at the interaction between Monad and Applicative. For most cases, a Monad can define an Applicative instance as such:

instance Functor MyMonad where
  fmap = liftM
instance Applicative MyMonad where
  pure = return
  (<*>) = ap

Well, that’s irritating! Instead of just writing a five line Monad instance, I have to write five extra boilerplate lines.

As a separate issue, Applicative is not defined as a superclass of Monad, and therefore I cannot treat all Monads as Applicatives. But we can’t add that superclass requirement without breaking existing code.

So I say we allow the definition of Monad as such:

class Applicative m => Monad m where
  fail :: String -> m a -- or we could just take this out...
  (>>=) :: m a -> (a -> m b) -> m b -- the same
  return :: a -> m a -- also the same
  fmap = liftM -- a default definition for a superclass function
  pure = return
  (<*>) = ap

And suddenly all Monads are Applicative! Since every function in the Functor and Applicative classes is given a default definition in Monad, they can be automatically derived.

But what if you want to define a special version of fmap? Simple: do it like always! The definition in Monad is merely the default; if the compiler finds a separate instance for your data type, it uses that instead. This way, old code still works without a hitch.

The downside

The only downside I can see is that suddenly you’ll have instances of classes where before there were none. Not that having Applicative instances in and of itself is a downside, but there might be cases where it would define inappropriate instances (not that I can think of any off-hand). On the other hand, this would be mitigated slightly by the requirement of the type-class author to explicitly turn on this flag.

Let the beatings begin

Well, this is my first time suggesting any changes to Haskell, so I expect to be thoroughly scolded for my preposterous, heretical notions. Even if these suggestions are lacking, however, I hope we eventually get something which allows these kinds of features in Haskell.

String-like

December 4, 2009

While working on a type-safe method for embedding HTML fragments, I was reminded of some of my annoyances with my web-encodings package. In particular, I hated how I was doing all of these automatic conversions from and to lazy bytestrings, strict bytestrings and strings (ie [Char]). It’s always caused me a few headaches:

  • Often times I need to explicitly set types with a type signature.
  • I know that I’m needlessly wasting cycles.
  • There is more than one way to convert between a String and a ByteString; in particular, Latin-1 encodings (ie, Data.ByteString.Char8.pack) versus UTF-8.

I decided that it would be a good idea to provide these functions for all string-like data types. In addition to strings, strict bytestrings and lazy bytestrings, I also want to support strict and lazy text. My first idea was to provide five different modules in web-encodings. I did not relish the thought of writing it, much less maintaining it.

class StringLike

Then I had an idea. When doing html escaping, for example, all I really need to do is call “concatMap escapeHtmlChar”, where escapeHtmlChar might look like:

escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
...
escapeHtmlChar c = [c]

I could obviously write 5 versions of the escapeHtml function, each calling a specialized version of concatMap. In fact, it’s very simple to do so: all five data types involved provide a concatMap function. I might need a little tweaking for packing at some points, but it’s very simple.

But of course I still didn’t want to have five functions. So I decided to create the “StringLike” typeclass. It looks something like this:

class StringLike a where
    head :: a -> Char
    tail :: a -> a
    lengthLT :: Int -> a -> Bool
    concatMap :: (Char -> String) -> a -> a
    -- ... (many more functions)

As simple as this looks, there are a few things to note:

  • The basic type is always a Char. This means that we are treating bytestrings as if they are encoded in Latin-1.
  • Based on a suggestion by Daniel Fischer, there is no length function. Instead, there are length comparison functions, which is probably what’s needed in general.
  • There’s a fine line of when to use String and when to use the type itself. For example, I think the first argument to concatMap should be a function returning a String, not the specific type. tail should most definitely return the type itself. But there are some corner cases, such as the isPrefixOf function.

You can see the whole StringLike typeclass on github.
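As a small, self-contained sketch of the approach, here is a cut-down class with only concatMap, instances for String and strict bytestrings, and a single escapeHtml that works on both. The '&' case in escapeHtmlChar is my assumption about what the elided cases contain, and the bytestring instance packs via Char8, so it treats bytes as Latin-1, as noted above.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding (concatMap)
import qualified Data.ByteString.Char8 as S8
import qualified Data.List as L

-- | A cut-down StringLike with a single method.
class StringLike a where
    concatMap :: (Char -> String) -> a -> a

instance StringLike String where
    concatMap = L.concatMap

instance StringLike S8.ByteString where
    -- Pack each replacement String before concatenating.
    concatMap f = S8.concatMap (S8.pack . f)

escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
escapeHtmlChar '&' = "&amp;"  -- assumed; the original list is elided
escapeHtmlChar c   = [c]

-- | One definition, usable at every StringLike type.
escapeHtml :: StringLike a => a -> a
escapeHtml = concatMap escapeHtmlChar
```

Adding lazy bytestrings or text would mean one more instance each, not one more copy of escapeHtml.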

The ugly

Well, since my functions (encodeHtml, decodeUrl, etc) are still dealing with type classes instead of concrete values, I might still need an occasional type signature to get it to work. However, since there’s only one type involved, it should be much easier. For example, stringing together a number of these functions is completely unambiguous.

Also, I’ve lost the ability to pattern match strings. Instead, I must manually check the length and use head and tail functions. This is made most clear by the decodeUrl function. I have a feeling view patterns might be of assistance here, but I haven’t looked into it yet.
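For what it’s worth, here is an untested-in-anger sketch of how view patterns might recover something close to pattern matching. It assumes the class offers uncons, cons and empty (hypothetical method names, not necessarily what the real StringLike provides) and shows a simplified decodeUrl that only handles '+' and %XX escapes.

```haskell
{-# LANGUAGE FlexibleInstances, ViewPatterns #-}
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical methods; only what this example needs.
class StringLike a where
    uncons :: a -> Maybe (Char, a)
    cons   :: Char -> a -> a
    empty  :: a

instance StringLike String where
    uncons []       = Nothing
    uncons (c : cs) = Just (c, cs)
    cons  = (:)
    empty = []

-- | Simplified URL decoding: '+' becomes a space and "%XX" becomes
-- the character with that hex code. Malformed escapes pass through.
decodeUrl :: StringLike a => a -> a
decodeUrl (uncons -> Just ('%', uncons -> Just (h, uncons -> Just (l, rest))))
    | isHexDigit h && isHexDigit l =
        cons (chr (16 * digitToInt h + digitToInt l)) (decodeUrl rest)
decodeUrl (uncons -> Just ('+', rest)) = cons ' ' (decodeUrl rest)
decodeUrl (uncons -> Just (c, rest))   = cons c (decodeUrl rest)
decodeUrl _                            = empty
```

Each clause reads much like the (c:rest) patterns it replaces, without any explicit length checks or calls to head and tail.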

Useful?

I’m curious if the community would find this useful as a standalone package. If I were to release it, it would probably be two modules:

  • Data.StringLike would simply be the basic operations any string-like type should provide.
  • Data.StringLike.Extra would be higher-level functions built on top of this. Most likely, it would all go in a typeclass so individual types could provide more efficient versions of specific functions.

Look forward to hearing some opinions on this.

From Dreamhost to NearlyFreeSpeech.net

November 29, 2009

Well, I’d toyed around with the idea for quite a while, but when SSH went down for a few days at Dreamhost, I decided it was time to finally make the switch.

The easy part was moving off static sites and getting my personally hosted blog onto wordpress.com. Then of course is the real challenge: a Haskell site.

First shot: compile on the server

Since NearlyFreeSpeech.net (henceforth NFS) claims support for GHC, I thought I’d try out compiling on their servers. My first issue was that they only have 6.8.3, whereas some of my libraries require 6.10. I e-mailed them about this, and they let me know that their unstable server had it available. Switching over was painless.

However, I had a few problems with this approach:

  1. cabal-install would not link due to memory constraints. I can manually install all the libraries I need, but that’s a real pain.
  2. The files I ended up needing took up 250MB. NFS charges per megabyte of storage, so I didn’t feel like wasting time.
  3. As usual with shared hosting, compiling is slow. It’s not as horrible as Dreamhost, where they kill the compiles regularly, but still not as nice as using my shiny new system at home.

Upload binaries

So I can of course just compile my binaries locally and upload them, right? Well, I don’t happen to run a FreeBSD box. I’ve been itching to try out VirtualBox for a while though, and this seemed like a good time to do it.

I know what you’re thinking: I’m a masochist, and this is overkill for this kind of project. However, setting this up didn’t actually require too much work, and it’s a much more durable solution than trying to compile binaries on some flaky shared host (looking at you again Dreamhost).

Anyway, the process was very straightforward:

  1. Download the FreeBSD 7.2 ISO (NFS beta realm runs 7.2).
  2. Install VirtualBox locally.
  3. Install FreeBSD. It’s not too complicated. But make sure you set aside enough hard disk and RAM.
  4. Update ports collection. This was the worst part for me, since I’ve never used FreeBSD before. Also, I tried a selective update at first: bad idea. Just update the whole thing.
  5. Install ghc. Basically, “cd /usr/ports/lang/ghc && make install”. It takes a *long* time as it installs everything.
  6. Compile the binaries inside FreeBSD. I used git/ssh to transfer the projects over to the virtual machine, which was very convenient.
  7. Upload binaries and call it a day.

Other notes

I have opted not to use NFS for my large static file hosting. I’m using the Amazon S3 service, which so far I’ve found to be much faster than Dreamhost. It’s a little tricky to get started though. I was able to sync all my photos using s3sync.

Also, don’t forget to strip your binaries before uploading them; it can save a lot of time.

NFS only supports CGI. This is fine for my purposes, but others may not be so happy with it.

Finally, NFS recently started charging $0.01/day for dynamic sites. It’s only $3.65 a year, but if you’re like me and like to have lots of different sites running, it might add up. I’ll probably just end up running services under the same domain name instead of separate subdomains.

Conclusion

Well, I won’t give my full stamp of approval on this yet, but so far I’m impressed. I’ll try to post some follow-up on this in the future.

Restful, data-object changes

September 16, 2009

It’s been a while since I’ve posted. Mostly, I’ve been refactoring my restful library and writing some code that actually uses it. That’s usually the best way to get a better API after all.

I doubt these projects will really interest too many people, but here they are in case you are interested in some real-world Hack and Restful code:

  • review-minder is used to keep track of information I’ve learned (let’s say vocabulary words) and remind me to review them at certain intervals. For coolness, those intervals happen to be the Fibonacci sequence.
  • photoblog is the software I use for running my son’s photo blog.

Underlying things about both of these programs:

  • They use my yaml library, and thus also data-object. I switched from JSON because YAML is more amenable to version control software.
  • They have Ajax interfaces based on jquery. Photoblog is in particular interesting: it has a javascript-disabled interface available, and uses jquery-history for the dynamic interface (you know, that stuff after the # in the URL).

I made some updates to data-object recently which I consider to be questionable, so I’m not sure it will last. I ran into some Haskell brick-walls when trying to make these changes; hopefully the next post will describe the problem, my current solution, and what I wish Haskell would let me do.

Final note: restful is nowhere near API stable, which is why I haven’t released it to Hackage. If anyone is interested in using it, or has suggestions, please send them along. I’m currently not rushing this project, so that I end up with a nice, clean API.

hack-handler-webkit

July 8, 2009

So I had the idea the other day to make it possible to turn Hack web applications into standalone GUIs. I prefer writing web apps to desktop apps with, for example, GTK, so being able to run arbitrary web apps as if they are simply desktop ones would be a big boon for me. Plus, it means you can live the dream of writing an application once and having it run client-server and locally.

I ripped off the sample for the GTK Webkit port, removed some of the features (I decided I didn’t want back/forward buttons or an address bar), wrote some FFI code, combined it all with hack-handler-simpleserver, and created hack-handler-webkit.

Caveats:

  • The code is not incredibly beautiful, especially since I’ve never written FFI code before (hack-handler-fastcgi was just a ripoff of the original fastcgi package).
  • I’ve only tested it on Ubuntu Jaunty. It doesn’t really do much checking, just uses pkg-config to check for the existence of the webkit-1.0 package. I doubt this will work for other distributions.
  • As stated above, it uses the GTK port. I would like to get this working for Windows and Mac as well, without using the GTK port. If anyone would like to fork this on github and add that functionality, it would be much appreciated.

If you’re running Ubuntu and want to give this a try, do the following:

  1. apt-get the correct stuff. I think you’ll just need libwebkit-dev (ie, apt-get install libwebkit-dev).
  2. Download hack-handler-webkit from github and install it. It’s not on Hackage yet for obvious reasons.
  3. I branched the hack-samples package I spoke about last week to use this webkit backend. Go ahead and try that out.

Comments, suggestions and bug reports are much appreciated!

Hack Introduction

June 28, 2009

There’s been some noise, and confusion, recently about hack. Hopefully this post can address some of the issues.

What it is

Hack is a webserver interface. This means, it defines a protocol for allowing web applications to talk to different web servers. For example, I can write a web application to use the Hack protocol and then easily switch backends from CGI to FastCGI to Happstack.

Hack is authored by Jinjing Wang.

What it isn’t

  • A web server. This is just a protocol for talking to web servers (see handlers later on)
  • A framework. If you’re looking for a Rails replacement, you’re looking in the wrong place. However, if you want to write a Rails replacement, I would recommend Hack as a good base for it.
  • A coffee maker.

Architecture

The architecture is very simple. Hack defines the following:

Env

The Env data type is essentially the request object. It has the query string, the POST body, HTTP headers, etc. Notice I said query string and not GET parameters. In an effort to keep the protocol as lightweight as possible, Hack does no query string processing, POST parameter processing, cookie handling, etc. The application must handle it all.

That said, there are a few options:

  • Write all the processing code yourself.
  • Use my web-encodings package, which handles processing of those fields.
  • Use a hack frontend library (see below).
  • Use a framework. None are available right now, but I’m working on a Restful front controller. That’s what I currently use for a few sites.

Response

Response is simply the output of an application for a single request. It is the status code, HTTP headers and body. Remember, we’re talking low level here: you don’t have any high level templates or Haskell-to-Javascript converters at this level. That’s where a framework would come in.

Application

An application is just an “Env -> IO Response”. It takes a single request and generates a response. As a little piece of advice, if you want to have long-running processes (like with FastCGI) and don’t want to have to reload your data every time, use currying! (Hopefully, my next post will be a sample Hack application which will do just that. I apologize for the lack of examples here, but I’m trying to just give an overview.)
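To give a flavour of the currying trick, here is a sketch using simplified stand-ins for Hack’s types. The Env and Response records below are assumptions with only a few fields, not the real definitions, and lookupApp and mkApp are hypothetical names.

```haskell
-- Simplified stand-ins; the real Env and Response carry more fields.
data Env = Env
    { pathInfo    :: String
    , queryString :: String
    }

data Response = Response
    { status  :: Int
    , headers :: [(String, String)]
    , body    :: String
    }

type Application = Env -> IO Response

-- | An application that needs some expensive-to-load data. Partially
-- applying it to that data yields a plain Application.
lookupApp :: [(String, String)] -> Application
lookupApp table env =
    return $ case lookup (pathInfo env) table of
        Just page -> Response 200 [("Content-Type", "text/plain")] page
        Nothing   -> Response 404 [("Content-Type", "text/plain")] "not found"

-- | Load the data once, then hand back an Application closed over it.
mkApp :: IO Application
mkApp = do
    -- Imagine reading a file or opening a database here.
    let table = [("/", "home"), ("/about", "about this site")]
    return (lookupApp table)
```

A long-running handler would call mkApp once at startup and then reuse the resulting Application for every request, so the table is never reloaded.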

Middleware

Some tasks are going to be performed by many applications, and thus it would be a waste to force each application to reimplement that functionality. For example, do you want to have to write gzip compression into every application you write? I thought not. Therefore, middleware just takes an existing application and wraps it with extra functionality. Two notes:

  1. You can use multiple middlewares at once. I use, for example, cleanpath, clientsession, gzip and jsonp.
  2. The order in which you apply these matters. (Again, hopefully more details on this in the next post.)
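A middleware is then just a function from Application to Application. The sketch below again uses simplified stand-in types; cleanPath here is a toy re-implementation written for this example, not the real cleanpath middleware, and wrapBody exists only to make the ordering visible.

```haskell
-- Simplified stand-ins for Hack's types.
data Env = Env
    { requestMethod :: String
    , pathInfo      :: String
    }

data Response = Response
    { status :: Int
    , body   :: String
    }

type Application = Env -> IO Response
type Middleware  = Application -> Application

-- | Toy application: echo the path it was asked for.
echoPath :: Application
echoPath env = return (Response 200 (pathInfo env))

-- | Rewrites the request: strip trailing slashes before the
-- application sees the path (a bare "/" is left alone).
cleanPath :: Middleware
cleanPath app env = app env { pathInfo = stripSlash (pathInfo env) }
  where
    stripSlash p = case reverse p of
        ('/' : rest@(_ : _)) -> stripSlash (reverse rest)
        _                    -> p

-- | Rewrites the response: surround the body with the given strings.
wrapBody :: String -> String -> Middleware
wrapBody open close app env = do
    res <- app env
    return res { body = open ++ body res ++ close }

-- | Middlewares compose like ordinary functions; the leftmost one
-- is the outermost wrapper.
stacked :: Application
stacked = cleanPath . wrapBody "[" "]" . wrapBody "(" ")" $ echoPath
```

Swapping the two wrapBody calls changes the body from "[(...)]" to "([...])", which is the sense in which the order of application matters.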

Handler

A handler is simply a function with the type signature “Application -> IO ()” (or something similar enough). Basically, it’s what “runs” your application. Jinjing has written a number of handlers, but I’m not very familiar with those. I’ve written three which I use on a regular basis, so I’ll describe them here.
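To show the shape of a handler without any real server involved, here is a toy one that feeds the application a single fixed request and prints the result in a format loosely modelled on CGI output. The types are simplified stand-ins, and renderResponse and runOnce are hypothetical names, not part of any Hack package.

```haskell
-- Simplified stand-ins for Hack's types.
data Env = Env
    { requestMethod :: String
    , pathInfo      :: String
    }

data Response = Response
    { status  :: Int
    , headers :: [(String, String)]
    , body    :: String
    }

type Application = Env -> IO Response

-- | Render a response as text: a status line, one line per header,
-- a blank line, then the body.
renderResponse :: Response -> String
renderResponse res =
    unlines (statusLine : headerLines) ++ "\n" ++ body res
  where
    statusLine  = "Status: " ++ show (status res)
    headerLines = [k ++ ": " ++ v | (k, v) <- headers res]

-- | A toy handler: run the application on one fixed request and
-- print what it returned. Partially applied to an Env, it has the
-- "Application -> IO ()" shape described above.
runOnce :: Env -> Application -> IO ()
runOnce env app = app env >>= putStr . renderResponse
```

A real handler differs mainly in where the Env comes from (environment variables, a socket) and where the rendered response goes.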

hack-handler-cgi

Run your application as a regular old CGI application. If you don’t know about CGI, you probably should do a little more research into web programming before attempting Hack.

hack-handler-fastcgi

Simply wraps up hack-handler-cgi with the FastCGI C library, in the same way that the fastcgi package wraps up cgi.

hack-handler-simpleserver

This is a little standalone HTTP server that I wrote. It is not meant to be production quality. I only use this for debugging purposes (ie, so I don’t have to set up Apache on my local system). Caveat emptor.

Frontend

I wrote a monadcgi frontend for kicks, and now looking at Hackage I see Jinjing also wrote one for happstack. Not being familiar with that package or Happstack, I’ll just address the monadcgi one.

Basically, there has been a CGI library around for a while that defines a CGI monad. There are two problems with this:

  1. Some people (including me) think that the approach chosen for the library is too “object oriented”.
  2. If you write code for this library, you’re stuck with CGI (or FastCGI with the fastcgi package).

Using the monadcgi frontend for Hack, you can take any application written for the old CGI monad and make it work with any Hack handler.

Conclusion

Hack is in its infancy right now; don’t let the large number of Hack packages on Hackage make you think otherwise. Nonetheless, some of us are using it in production settings now with great success. The documentation is lacking, but on the other hand, Hack is so incredibly simple that it doesn’t really need documentation. In any event, I hope to rectify the documentation issue with some code samples soon.

Also, I’d like to address some potential criticism: Hack does not solve many problems. I’ve heard that people are concerned with leaving file handles open, database locking, etc. These are real issues that plague us all in web development. However, this is not Hack’s concern. Hack simply lets your application talk to a handler. Period. You still need to figure out if you want to use HSP or the html library, if you’ll use jquery or HJScript, or if you’ll go the HDBC, Takusen or happstack-state route.

No. Hack ignores all these issues, and hopefully will allow people around the Haskell community to begin to standardize our web development practices in at least one arena.

Filename encoding issues

June 11, 2009

The Problems

Music Collection

My wife has a large collection of Hebrew music. Since we imported it from some ancient MP3 CDs (I think it was burned on an old OS 9 Mac or something like that), we’ve always had filename and tag character encoding issues, so that the titles come out looking like àøé÷ àééðùèééï, éöç÷ ÷ìôèø. I keep saying I’ll get around to fixing it…

Photo Collection

The other day, our landlord got a new Windows XP system to replace his Windows 98 one. He had a large collection of photos on it, many with Hebrew titles. I wanted to just transfer the files across the network to his Windows computer, but I couldn’t get them to talk. Instead of debugging that, I just used secure shell to copy the files directly to my Linux system, from which I intended to burn a CD. Unfortunately, when I got to my computer, I saw that all his files had an “Invalid encoding” message.

Explanation

Linux (or at least my Ubuntu system, I can’t speak authoritatively here) stores filenames in UTF-8 character encoding. Many legacy systems, like Windows 98, stored filenames in language-specific character encodings. In the case of Hebrew, it’s called WINDOWS-1255. This is a single-byte character set, meaning the first 128 possible values are the same as ASCII, and the next 128 are language-specific. Unfortunately, there are many encodings like this, and there is no way to tell them apart without outside information. The most common of these is Latin-1, which includes a lot of vowels with funny marks over them (see the music collection sample above).

So, when importing the music collection, my Linux box attempted to convert from the legacy character encoding to UTF-8. (I actually don’t remember at which point this conversion happened, it could have been earlier. It’s irrelevant in any event.) Unfortunately, it didn’t know it was dealing with Hebrew, and so took a guess that it was Latin-1. Since, for example, the Hebrew letter Alef has a hex code of E0 in Windows-1255, which is à in Latin-1, all of the Hebrew looks like I fell asleep doing my Spanish homework.

With the photo collection, the secure shell transfer never attempted to do the Latin-1 to UTF-8 conversion, and thus the files on the Linux box showed up with the original Windows-1255 encoding. This is actually slightly easier to deal with.
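The mix-up can be reproduced in a few lines of pure Haskell. This is only a toy model of the two decodings, separate from the iconv-based code further down: it assumes the Windows-1255 layout in which bytes 0xE0 through 0xFA are the Hebrew letters Alef through Tav (U+05D0 through U+05EA), and it ignores every other non-ASCII byte. The function names are made up for this example.

```haskell
import Data.Char (chr, ord)

-- | Decode one Windows-1255 byte. Only ASCII and the Hebrew letters
-- (0xE0 to 0xFA) are handled; anything else is rejected.
decode1255 :: Int -> Maybe Char
decode1255 b
    | b >= 0 && b < 0x80     = Just (chr b)
    | b >= 0xE0 && b <= 0xFA = Just (chr (0x05D0 + b - 0xE0))
    | otherwise              = Nothing

-- | Latin-1 maps every byte to the code point with the same number.
decodeLatin1 :: Int -> Char
decodeLatin1 = chr

-- | Undo the wrong guess: turn each character back into its Latin-1
-- byte, then decode that byte as Windows-1255 instead.
fixMojibake :: String -> Maybe String
fixMojibake = mapM (decode1255 . ord)
```

Byte E0 comes out as à under decodeLatin1 and as Alef under decode1255, and fixMojibake applies that correction to a whole garbled name.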

The Solution

Below is the code I used to fix this whole thing up. I’ll appreciate any critiques that are available. I’m not sure if this is a common problem for people or not; if people want it, I’ll package this up and put it on Hackage.

The basic code flow is: for each file in the source directory, convert the directory and file name to UTF-8 encoding, create the destination directory, and create a hard link. One caveat: if you specify that you want to convert back to Latin-1 first (which was necessary for the music collection), the conversion goes from UTF-8 to Latin-1, and then from your specified encoding (mine was Windows-1255) back to UTF-8. If you do not want that step (as in the photo collection), it only does the second conversion.

Additionally, it seems that Haskell, or at least the directory package, does not properly convert Strings to UTF-8 when making system calls. Thus I have an ugly function (utf8StringHack) to address this. I hope that in the future this won’t be necessary.

The Code

import System.Directory
import Codec.Text.IConv
import qualified Data.ByteString.Lazy as B
import Data.ByteString.Class
import qualified System.UTF8IO as U
import Control.Monad
import Data.List
import System.Posix.Files
import System.Environment

usage :: String
usage = "<convert to latin-1 first> <source encoding> <input dir> " ++
        "<output dir>"

main :: IO ()
main = do
    args <- getArgs
    when (length args /= 4) $ error usage
    let [toLatin1Str, encoding, input, output] = args
    -- convert the string version to a Bool version
    -- this variable specifies whether we need to convert from
    -- UTF-8 to Latin-1 first (see comments below)
    let toLatin1 = case toLatin1Str of
                    ('y':_) -> True
                    ('Y':_) -> True
                    _ -> False
    allFiles <- getTree input
    mapM_ (fixFile toLatin1 encoding input output) allFiles

-- | Convert the filename encoding of a single file.
--
-- Creates necessary directories and uses hard links.
fixFile :: Bool -- ^ whether to first convert to Latin-1 from UTF-8
        -> String -- ^ encoding
        -> FilePath -- ^ top of source directory
        -> FilePath -- ^ top of destination directory
        -> [String] -- ^ subpath of the file to fix
        -> IO ()
fixFile toLatin1 encoding input output path = do
    -- Fix the encoding of the subpath.
    let path' = map (convertName toLatin1 encoding) path
    -- The name of the directory which must be created.
    let destdir = utf8StringHack
                $ output ++ "/" ++ intercalate "/" (init path')
    -- The ultimate file destination.
    let destfile = utf8StringHack
                 $ output ++ "/" ++ intercalate "/" path'
    -- And the current filename, in all its badly-encoded glory.
    let srcfile = input ++ "/" ++ intercalate "/" path
    createDirectoryIfMissing True destdir
    createLink srcfile destfile

-- | I hope that this function will not be necessary in the future.
-- This takes a sequence of Unicode characters, encodes them to bytes
-- using UTF-8 encoding, and then puts those bytes back into a string.
--
-- This is needed for passing off to the System.Directory calls
-- like createDirectoryIfMissing and createLink.
--
-- In theory, all functions touching the outside world could properly
-- do the character encoding/decoding themselves.
utf8StringHack :: String -> String
utf8StringHack = map (toEnum . fromIntegral) . B.unpack . toLazyByteString

-- | Simply determine if the filename begins with a period.
notHidden :: String -> Bool
notHidden ('.':_) = False
notHidden _ = True

-- | Get all of the files in the given path.
getTree :: FilePath -> IO [[String]]
getTree f = getTree' f []

getTree' :: FilePath -- ^ containing path for the directory currently worked on
         -> [String] -- ^ current subpath
         -> IO [[String]]
getTree' dir prev = do
    -- Immediate children.
    contents <- getDirectoryContents dir
    -- Unhidden children.
    let contents' = filter notHidden contents
    -- Generate the full path for a file here.
    let addDir :: String -> String
        addDir s = dir ++ "/" ++ s
    files <- filterM (doesFileExist . addDir) contents'
    dirs <- filterM (doesDirectoryExist . addDir) contents'
    -- Tack the current filename onto the running subpath.
    let files' = map ((++) prev . return) files
    -- Recursive part.
    dirs' <- mapM helper dirs
    -- Stick together current files and files in subdirs.
    return $! files' ++ concat dirs'
    where
        -- Recursively call getTree' for a subdir here.
        helper :: FilePath -> IO [[String]]
        helper dirPart = do
            let dir' = dir ++ "/" ++ dirPart ++ "/"
                prev' = prev ++ [dirPart]
            getTree' dir' prev'

-- | Convert an incorrectly encoded file name to a proper Unicode string.
--
-- Often times the filename will be incorrectly translated at some point
-- from Latin-1 to UTF-8. This is all well and good- if you're dealing
-- with LATIN1. Otherwise, you now need to do two things: convert from
-- UTF-8 to LATIN1 to undo the incorrect conversion, and convert from
-- your real encoding to UTF-8. That is the purpose of the first parameter.
convertName :: Bool -- ^ convert from UTF-8 to Latin-1 first
            -> String -- ^ character encoding of the filename
            -> String -- ^ incorrectly encoded filename
            -> String -- ^ corrected filename
convertName toLatin1 encoding =
    fromLazyByteString .
    convert encoding "UTF-8" .
    (if toLatin1 then convert "UTF-8" "LATIN1" else id) .
    B.pack .
    map (toEnum . fromEnum)

Functors and Monads (containers)

June 2, 2009

Introduction

In a philosophy class in college, I remember learning the idea that in order to understand something, you can’t simply study it in isolation. For example, if you study the heart, you’ll understand the function of the valves, what causes the muscle to relax and contract, and so on, but you’ll have no idea what a heart is and what its purpose is. In order to learn that, you need to study the human body.

I’m going to take the same approach to Monads and Functors (and hopefully later Applicative). I hope this doesn’t turn into another “Monads are Burritos”. I’ll avoid type signatures and the like as much as possible so as not to obscure the main point.

Containers

Two scary concepts in Haskell, Functors and Monads, both deal with containers. These aren’t as limited as containers in a language like Java. There, containers are linked lists, arrays, maps, sets: some way of managing collections of stuff. In Haskell, we use containers to represent actions, logging, and (our topic today) things which might exist. I’m speaking about Maybe.

I’ve chosen Maybe since I find it easy to think about. You either have Just a result, or Nothing. To further simplify, we’ll only deal with Maybe Int; a number that might exist.

How can such a thing occur? Let’s say these numbers represent the price for an item, given in cents. We might have a getPrice function which takes an Item as an argument. If a Toothbrush costs $1.25, then the function would return Just 125.

On the other hand, say you don’t have a price for Toothpaste. Then the function would return Nothing. (Think for a moment how you’d deal with this in Java. Should a 0 represent unknown? What if you have a buy-one-get-one-free sale? Use -1? You’ll have to remember to check for it everywhere. But I digress.)
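Here is a minimal sketch of the two cases just described. The Item type and its constructors are made up for illustration; the text only tells us that getPrice takes an Item and gives back a number that might exist.

```haskell
-- Hypothetical item type; only Toothbrush and Toothpaste come from the text.
data Item = Toothbrush | Toothpaste
    deriving (Show, Eq)

-- Price in cents, if we have one on record.
getPrice :: Item -> Maybe Int
getPrice Toothbrush = Just 125  -- $1.25
getPrice Toothpaste = Nothing   -- no price known
```

Unlike a magic 0 or -1, the Nothing case cannot be mistaken for a real price: anyone who wants the Int has to say what happens when it is missing.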

Monads

Anyway, let’s now say that you have another function to add tax onto the purchase price. It doesn’t deal with Maybes; it adds 5% to an Int. But our getPrice function returns a Maybe Int. How can we stick these two functions together (i.e., compose them)?

This is where we can use the trusty old Monad. Let’s start with do notation:

getPriceWithTax item = do
    price <- getPrice item
    return $ addTax price

That’s a little verbose for something so simple. Let’s drop the do:

getPriceWithTax item = getPrice item >>= return . addTax

Great… except why the return? We have two types at play: Int and Maybe Int. When dealing with one-argument functions, you get four possible function signatures:

  1. Int -> Int
  2. Int -> Maybe Int
  3. Maybe Int -> Int
  4. Maybe Int -> Maybe Int

The third option is unwrapping a contained value; it is not a topic for Monads or Functors, so ignore it for now. (If you care, look up fromJust.) The first option is a regular old function, like addTax. Option 2 is like getPrice: it wraps its result in a container. (We’ll address option 4 later.)
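To make the four shapes concrete, here is one small function of each kind. This is only a sketch: the 5% rounding rule, priceByNumber, priceOrZero and taxed are assumptions invented for illustration, and option 4 is written out by hand rather than with any library helper.

```haskell
import Data.Maybe (fromMaybe)

-- 1. Int -> Int: a regular old function.
--    Assumed rule: add 5%, rounding the tax down to whole cents.
addTax :: Int -> Int
addTax cents = cents + (cents * 5) `div` 100

-- 2. Int -> Maybe Int: wraps its result in a container.
--    Hypothetical lookup of a price by item number.
priceByNumber :: Int -> Maybe Int
priceByNumber 1 = Just 125
priceByNumber _ = Nothing

-- 3. Maybe Int -> Int: unwraps, so it must invent a value for Nothing.
priceOrZero :: Maybe Int -> Int
priceOrZero = fromMaybe 0

-- 4. Maybe Int -> Maybe Int: stays inside the container the whole time.
taxed :: Maybe Int -> Maybe Int
taxed Nothing      = Nothing
taxed (Just cents) = Just (addTax cents)
```

Note that priceOrZero uses fromMaybe rather than fromJust, since fromJust crashes when handed Nothing.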

Composition

In order to compose two functions, the output of one must be the same type as the input of the other. With containers, in general we can only add a container, not take one away. So in our getPriceWithTax function above, we have getPrice returning a Maybe Int, and addTax taking a plain Int.

To make them compatible, one of them will have to change. Guess which? That’s right, we need to make addTax take a Maybe Int instead of an Int. Also, since we can’t remove the container in the middle, we end up returning a Maybe Int as well, which leads us to option 4 above.

Creating an option 4 function is where the Monadic bind function (>>=) comes in handy. It converts an uncontained -> contained function (option 2) into a contained -> contained one (option 4). This gives us back composability! But wait: addTax returns an uncontained value. No problem: we just add a return to make addTax return contained values.
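Spelled out step by step, with the types written down, the conversion looks roughly like this. The names addTaxWrapped and addTaxContained are invented for the sketch, and the body of addTax (5%, rounded down) is an assumption.

```haskell
-- Option 1. Assumed rule: add 5%, rounding the tax down to whole cents.
addTax :: Int -> Int
addTax cents = cents + (cents * 5) `div` 100

-- Adding return wraps the result: option 1 becomes option 2.
addTaxWrapped :: Int -> Maybe Int
addTaxWrapped = return . addTax

-- Handing that to bind gives a function on contained values: option 4.
addTaxContained :: Maybe Int -> Maybe Int
addTaxContained price = price >>= addTaxWrapped
```

With this, addTaxContained can consume whatever getPrice produces, and a Nothing simply passes through untouched.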

Functors FTW

You might be thinking that it’s kind of silly to do business this way, and I’ll agree with you. What we really want is a way to convert totally uncontained functions (option 1) to totally contained ones (option 4). This is what Functors are for. A Functor has a single function, fmap, which does exactly that. So our code becomes:

getPriceWithTax3 item = addTax `fmap` getPrice item

Monads are still good

Now, there are times (many of them) when you’ll need the full power of Monads to deal with things. I hope to address that in a future post. But for now, the moral is: if you are using return with Monadic bind, you might want to consider fmap instead.
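As a quick sanity check of that moral, here are the bind-plus-return version and the fmap version side by side on the running example. The Item type, the body of addTax and the names withBind, withFmap and agree are assumptions made up for this sketch.

```haskell
-- Hypothetical item type and prices for the running example.
data Item = Toothbrush | Toothpaste
    deriving (Show, Eq)

getPrice :: Item -> Maybe Int
getPrice Toothbrush = Just 125
getPrice Toothpaste = Nothing

-- Assumed rule: add 5%, rounding the tax down to whole cents.
addTax :: Int -> Int
addTax cents = cents + (cents * 5) `div` 100

-- return with Monadic bind...
withBind :: Item -> Maybe Int
withBind item = getPrice item >>= return . addTax

-- ...and the same thing with fmap.
withFmap :: Item -> Maybe Int
withFmap item = addTax `fmap` getPrice item

-- True when the two versions give the same answer for every item.
agree :: Bool
agree = all (\item -> withBind item == withFmap item) [Toothbrush, Toothpaste]
```

The two versions are interchangeable here, so the shorter one wins. If you prefer an operator, <$> is the infix spelling of fmap.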

Wordify: RESTful Haskell web apps

May 20, 2009

Here’s an incredibly simplistic Haskell RESTful web application. If you look at the code, Web.Restful is the beginning of a proper RESTful framework, built on a bunch of smaller libraries I’ve been uploading to Hackage recently. Once I’ve stabilized it a bit, I’ll release it as its own library. For now, it lives in wordify-web and some personal site.

Anyway, here’s wordify!