String-like

While working on a type-safe method for embedding HTML fragments, I was reminded of some of my annoyances with my web-encodings package. In particular, I hated how I was doing all of these automatic conversions from and to lazy bytestrings, strict bytestrings and strings (ie [Char]). It’s always caused me a few headaches:

  • Often times I need to explicitly set types with a type signature.
  • I know that I’m needlessly wasting cycles.
  • There is more than one way to convert between a String and a ByteString; in particular, Latin-1 encodings (ie, Data.ByteString.Char8.pack) versus UTF-8.

I decided that it would be a good idea to provide these functions for all string-like data types. In addition to strings, strict bytestrings and lazy bytestrings, I also want to support strict and lazy text. My first idea was to provide five different modules in web-encodings. I did not relish the thought of writing it, much less maintaining it.

class StringLike

Then I had an idea. When doing html escaping, for example, all I really need to do is call “concatMap escapeHtmlChar”, where escapeHtmlChar might look like:

escapeHtmlChar '<' = "&lt;"
escapeHtmlChar '>' = "&gt;"
...
escapeHtmlChar c = [c]

I could obviously write 5 versions of the escapeHtml function, each calling a specialized version of concatMap. In fact, it’s very simple to do so: all five data types involved provide a concatMap function. I might need a little tweaking for packing at some points, but it’s very simple.

But of course I still didn’t want to have five functions. So I decided to create the “StringLike” typeclass. It looks something like this:

class StringLike a where
    head :: a -> Char
    tail :: a -> a
    lengthLT :: Int -> a -> Bool
    concatMap :: (Char -> String) -> a -> a
    ... (many more function)

As simple as this looks, there are a few things to note:

  • The basic type is always a Char. This means that we are treating bytestrings as if they are encoded in Latin-1.
  • Based on a suggestion by Daniel Fischer, there is no length function. Instead, there are length comparison functions, which is probably what’s needed in general.
  • There’s a fine line of when to use String and when to use the type itself. For example, I think the first argument to concatMap should be a function returning a String, not the specific type. tail should most definitely return the type itself. But there are some corner cases, such as the isPrefixOf function.

You can see the whole StringLike typeclass on github.

The ugly

Well, since my functions (encodeHtml, decodeUrl, etc) are still dealing with type classes instead of concrete values, I might still need an occasional type signature to get it to work. However, since there’s only one type involved, it should be much easier. For example, stringing together a number of these functions is completely unambiguous.

Also, I’ve lost the ability to pattern match strings. Instead, I must manually check the length and use head and tail functions. This is made most clear by the decodeUrl function. I have a feeling view patterns might be of assistance here, but I haven’t looked into it yet.

Useful?

I’m curious if the community would find this useful as a standalone package. If I were to release it, it would probably be two modules:

  • Data.StringLike would simply be the basic operations any string-like type should provide.
  • Data.StringLike.Extra would be higher-level functions built on top of this. Most likely, it would all go in a typeclass so individual types could provide more efficient versions of specific functions.

Look forward to hearing some opinions on this.

About these ads

12 Responses to “String-like”

  1. Neil Brown Says:

    It looks to me like, apart from pack, you have really created a ListLike type-class. If you instead had a type-class (I can’t remember the associated type syntax off the top of my head):

    class ListLike a where
    type Elem a

    You could capture everything in your class except pack, which is perhaps the function you really want a String-related type-class (and unpack — show is not a substitute). Although a ListLike type-class should probably have a length function.

    • Michael Snoyman Says:

      Before generalizing the concept, I’d like to develop it more to make sure I’ve got all the functionality I want. But I like the idea a lot.

      As far as the pack and unpack functions: I had actually been considering going in the opposite direction: not providing pack at all. The more I think about it, however, I think pack needs to stay, and unpack is probably a welcome addition.

      Just for the record, I only put in the typeclass currently exactly what I needed in order to get web-encodings ported. If I make this its own package, I will definitely be adding more functions. Most likely I’ll be trying to target the text package API, but I’ll be happy for any recommendations people have.

  2. PermEA Says:

    May be you do not need it, yet string-like class will be incomplete without equivalents for map, mapM , takeWhile, dropWhile etc
    Some kind of tokenizing|lexing facility can be useful as well.

  3. Neil Mitchell Says:

    I’ve already got a StringLike which I use in the darcs version of TagSoup, amazing the similarities with names and domains!

    http://community.haskell.org/~ndm/darcs/tagsoup/Text/StringLike.hs

    • Michael Snoyman Says:

      I agree, that is quite amazing. I suppose if two people independently come up with exactly the same idea, it must mean it’s worth something (or we’re both idiots).

      The basic difference I see is that I was trying to model directly the underlying functions, even unsafe head, tail, etc. I also tried to keep the names the same, assuming people would use qualified imports. I see it as a bit of an open question which approach is truly better.

      • Neil Mitchell Says:

        I originally kept the names the same, but then changed my mind – I’m writing an optimiser on top of it, and having name clashes was just annoying with no real benefit (since I’m the only user). I also didn’t include unsafe head/tail as I intend the optimiser to magic those operations (or probably unsafeIndex) out of thin air later in the process, so it wasn’t necessary).

      • Michael Snoyman Says:

        @Neil,

        Are you planning on making this optimiser and associated library available as a separate package?

  4. Brent Yorgey Says:

    Are you aware of the ListLike package on Hackage? I haven’t looked too carefully so I’m not sure of the differences or whether this would be appropriate for your use. Perhaps StringLike could just be a small subclass of ListLike?

    • Michael Snoyman Says:

      @Brent,

      After a quick look at ListLike, I get the feeling it’s got too much power for my purposes, but I’m not certain. I will definitely look into it more, but I’m not sure if it would actually provide much benefit to use it as a basis for StringLike.

      • John Lato Says:

        Did you notice that the ListLike package also provides a “StringLike” type?

        I’ve actually felt that the StringLike in ListLike is not powerful enough, since most comparisons require unpacking to a List.

        I would welcome a more fully-developed StringLike type as well.

  5. Kevin Jardine Says:

    Hmmm – I’m not sure I understand. Are you saying that StringLike does not support UTF8? In which case, how useful would it be for web applications?

    • Michael Snoyman Says:

      @Kevin

      What I’m saying is that, when dealing with ByteStrings, it assumes a Latin-1 encoding. For the specific cases of HTML and percent escaping, this is the right behavior for UTF-8 as well. This would of course cause problems with UCS-2 encoding, for example.

      However, when dealing with String and Text (lazy or strict), these functions do not need to deal with encoding issues at all.

      In short: it does “the right thing” for web applications.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

%d bloggers like this: