
String-like

December 4, 2009

While working on a type-safe method for embedding HTML fragments, I was reminded of some of my annoyances with my web-encodings package. In particular, I hated how I was doing all of these automatic conversions to and from lazy bytestrings, strict bytestrings and strings (i.e., [Char]). This has always caused me a few headaches:

  • I often need to pin down types explicitly with a type signature.
  • I know that I’m needlessly wasting cycles.
  • There is more than one way to convert between a String and a ByteString; in particular, a Latin-1 encoding (e.g., Data.ByteString.Char8.pack, which keeps only the low 8 bits of each Char) versus UTF-8.
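
To make that last point concrete, here is a small sketch comparing the bytes the two conversions produce. It assumes only the bytestring package. The names latin1Bytes and utf8Bytes are invented for this illustration, and the UTF-8 path goes through Data.ByteString.Builder.stringUtf8, which is just one of several ways to do it.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Hypothetical helper: bytes produced by Char8.pack, which truncates
-- every Char to its low 8 bits (Latin-1 for code points below 256).
latin1Bytes :: String -> [Word8]
latin1Bytes = S.unpack . S8.pack

-- Hypothetical helper: bytes produced by UTF-8 encoding the same String.
utf8Bytes :: String -> [Word8]
utf8Bytes = L.unpack . B.toLazyByteString . B.stringUtf8

main :: IO ()
main = do
    print (latin1Bytes "caf\233")  -- [99,97,102,233]
    print (utf8Bytes "caf\233")    -- [99,97,102,195,169]
    print (latin1Bytes "\8364")    -- [172]  (the euro sign is silently truncated)
    print (utf8Bytes "\8364")      -- [226,130,172]
```

The two results agree only for ASCII input, which is why an implicit conversion that picks one of them behind your back is uncomfortable.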

I decided that it would be a good idea to provide these functions for all string-like data types. In addition to strings, strict bytestrings and lazy bytestrings, I also want to support strict and lazy text. My first idea was to provide five different modules in web-encodings. I did not relish the thought of writing them, much less maintaining them.

class StringLike

Then I had an idea. When doing HTML escaping, for example, all I really need to do is call “concatMap escapeHtmlChar”, where escapeHtmlChar might look like:

escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
...
escapeHtmlChar c = [c]

I could obviously write five versions of the escapeHtml function, each calling a specialized version of concatMap. In fact, it’s very simple to do so: all five data types involved provide a concatMap function. The only tweak needed is to pack the String returned by escapeHtmlChar into the target type where necessary.
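
As a rough sketch of what that duplication would look like, here are three of the five copies, using only base and bytestring (the text versions would follow the same pattern with Data.Text.concatMap and Data.Text.pack). The function names and the short list of escaped characters are illustrative assumptions, not the contents of web-encodings.

```haskell
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy.Char8 as L8

-- Illustrative subset of escapes; a real version would cover more characters.
escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
escapeHtmlChar '&' = "&amp;"
escapeHtmlChar c   = [c]

-- String: the Prelude's concatMap already has the right shape.
escapeHtmlString :: String -> String
escapeHtmlString = concatMap escapeHtmlChar

-- Strict bytestring: the mapped function must return a ByteString, so pack.
escapeHtmlStrict :: S8.ByteString -> S8.ByteString
escapeHtmlStrict = S8.concatMap (S8.pack . escapeHtmlChar)

-- Lazy bytestring: same idea with the lazy Char8 module.
escapeHtmlLazy :: L8.ByteString -> L8.ByteString
escapeHtmlLazy = L8.concatMap (L8.pack . escapeHtmlChar)

main :: IO ()
main = do
    putStrLn (escapeHtmlString "1 < 2 & 3 > 2")
    S8.putStrLn (escapeHtmlStrict (S8.pack "1 < 2 & 3 > 2"))
    L8.putStrLn (escapeHtmlLazy (L8.pack "1 < 2 & 3 > 2"))
```

Each copy is a one-liner, but every new function (and every bug fix) has to be repeated once per type.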

But of course I still didn’t want to have five functions. So I decided to create the “StringLike” typeclass. It looks something like this:

class StringLike a where
    head :: a -> Char
    tail :: a -> a
    lengthLT :: Int -> a -> Bool
    concatMap :: (Char -> String) -> a -> a
    ... (many more functions)

As simple as this looks, there are a few things to note:

  • The element type is always Char. This means that we are treating bytestrings as if they are encoded in Latin-1.
  • Based on a suggestion by Daniel Fischer, there is no length function. Instead, there are length comparison functions, which is probably what’s needed in general.
  • There’s a fine line between when to use String and when to use the type itself. For example, I think the first argument to concatMap should be a function returning a String, not the specific type. tail should most definitely return the type itself. But there are some corner cases, such as the isPrefixOf function.
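
To show how the pieces fit together, here is a minimal sketch with just the four methods listed above and instances for String and strict bytestrings. Several things here are assumptions for illustration: the instance bodies, the reading of lengthLT n s as "s is shorter than n", and the three-character escapeHtmlChar. The real class on github may differ.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding (concatMap, head, tail)
import qualified Data.ByteString.Char8 as S8
import qualified Data.List as L

class StringLike a where
    head      :: a -> Char
    tail      :: a -> a
    lengthLT  :: Int -> a -> Bool   -- assumed meaning: length s < n
    concatMap :: (Char -> String) -> a -> a

instance StringLike [Char] where
    head (c:_) = c
    head []    = error "StringLike.head: empty string"
    tail (_:cs) = cs
    tail []     = error "StringLike.tail: empty string"
    -- Inspects at most n elements instead of computing the full length.
    lengthLT n s = n > 0 && null (drop (n - 1) s)
    concatMap = L.concatMap

instance StringLike S8.ByteString where
    head = S8.head
    tail = S8.tail
    lengthLT n s = S8.length s < n   -- length is O(1) for strict bytestrings
    concatMap f = S8.concatMap (S8.pack . f)

-- Illustrative subset of escapes.
escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
escapeHtmlChar '&' = "&amp;"
escapeHtmlChar c   = [c]

-- One definition, usable at every instance.
escapeHtml :: StringLike a => a -> a
escapeHtml = concatMap escapeHtmlChar

main :: IO ()
main = do
    putStrLn (escapeHtml "a < b")
    S8.putStrLn (escapeHtml (S8.pack "a < b"))
    print (lengthLT 3 "ab")
```

Because the first argument of concatMap always returns a String, escapeHtmlChar is written once and each instance decides how to pack the result.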

You can see the whole StringLike typeclass on github.

The ugly

Well, since my functions (encodeHtml, decodeUrl, etc.) are still polymorphic over a type class instead of working on a concrete type, I might still need an occasional type signature to get things to compile. However, since there’s only one type involved, it should be much easier. For example, stringing together a number of these functions is completely unambiguous.

Also, I’ve lost the ability to pattern match strings. Instead, I must manually check the length and use head and tail functions. This is made most clear by the decodeUrl function. I have a feeling view patterns might be of assistance here, but I haven’t looked into it yet.
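
For what it is worth, here is a sketch of how view patterns could recover something close to pattern matching. It assumes a hypothetical extra method, unconsChar, which is not part of the class shown above, and decodeUrlSketch is a simplified stand-in rather than the real decodeUrl (it returns a String to keep the example short).

```haskell
{-# LANGUAGE FlexibleInstances, ViewPatterns #-}
import qualified Data.ByteString.Char8 as S8
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical class: split off the first character, if there is one.
class Uncons a where
    unconsChar :: a -> Maybe (Char, a)

instance Uncons [Char] where
    unconsChar []     = Nothing
    unconsChar (c:cs) = Just (c, cs)

instance Uncons S8.ByteString where
    unconsChar = S8.uncons

-- Simplified percent-decoding: "%XY" becomes one character, '+' becomes a space.
decodeUrlSketch :: Uncons a => a -> String
decodeUrlSketch (unconsChar -> Just ('%', unconsChar -> Just (hi, unconsChar -> Just (lo, rest))))
    | isHexDigit hi && isHexDigit lo =
        chr (16 * digitToInt hi + digitToInt lo) : decodeUrlSketch rest
decodeUrlSketch (unconsChar -> Just ('+', rest)) = ' ' : decodeUrlSketch rest
decodeUrlSketch (unconsChar -> Just (c, rest))   = c : decodeUrlSketch rest
decodeUrlSketch _                                = []

main :: IO ()
main = do
    putStrLn (decodeUrlSketch "a%20b+c")
    putStrLn (decodeUrlSketch (S8.pack "a%20b+c"))
```

The clauses read almost like ordinary list patterns, and no explicit length check is needed because a failed unconsChar simply falls through to the next clause.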

Useful?

I’m curious if the community would find this useful as a standalone package. If I were to release it, it would probably be two modules:

  • Data.StringLike would simply be the basic operations any string-like type should provide.
  • Data.StringLike.Extra would be higher-level functions built on top of this. Most likely, it would all go in a typeclass so individual types could provide more efficient versions of specific functions.
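
A sketch of what that second typeclass might look like, using default methods so that most instances are empty and a type overrides only what it can do faster. Everything here is hypothetical: the class name StringLikeExtra, the cut-down core class, and the bytestring shortcut are illustrations of the idea, not a planned API.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding (concatMap)
import qualified Data.ByteString.Char8 as S8
import qualified Data.List as L

-- Cut-down core class: only the method this sketch needs.
class StringLike a where
    concatMap :: (Char -> String) -> a -> a

instance StringLike [Char] where
    concatMap = L.concatMap

instance StringLike S8.ByteString where
    concatMap f = S8.concatMap (S8.pack . f)

-- Illustrative subset of escapes.
escapeHtmlChar :: Char -> String
escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
escapeHtmlChar '&' = "&amp;"
escapeHtmlChar c   = [c]

-- Higher-level functions live in a second class with default definitions.
class StringLike a => StringLikeExtra a where
    escapeHtml :: a -> a
    escapeHtml = concatMap escapeHtmlChar

-- String just takes the default.
instance StringLikeExtra [Char]

-- Strict bytestrings override it: return the input untouched when
-- nothing needs escaping, avoiding a rebuild of the whole string.
instance StringLikeExtra S8.ByteString where
    escapeHtml s
        | S8.any (`elem` "<>&") s = concatMap escapeHtmlChar s
        | otherwise               = s

main :: IO ()
main = do
    putStrLn (escapeHtml "a<b")
    S8.putStrLn (escapeHtml (S8.pack "no markup here"))
```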

I look forward to hearing some opinions on this.
