Aligning data from separate files using Haskell

Posted by itkovian on Fri, 05/30/2008 - 23:51

If you need to align the data from multiple files fast, Haskell is the way to go. For example, suppose we have a number of files with an equal number of lines, each line containing a number. How can we easily create a single file where each line contains the corresponding items from each of the original files, separated by a separator of our choice?

One way to do it is like this.

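    module Main (main) where

    import Control.Monad (mapM)
    import Data.List (transpose, intersperse)
    import System.Environment (getArgs)
    import System.IO

    main :: IO ()
    main = do
      -- Arguments: a header line, a label placed in front of every
      -- output line, then the files to align.
      header:lookups:files <- getArgs
      handles <- mapM (\x -> openFile x ReadMode) files
      contents <- mapM hGetContents handles
      -- Split each file into lines, transpose so each row holds the n-th
      -- line of every file, prepend the label and join the fields with ':'.
      let output = map (concat . (\x -> intersperse ":" (lookups:x))) $ transpose $ map lines contents
      putStrLn header
      mapM_ putStrLn output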

First of all, I think it is good practice to import only the things you need, hence, e.g., import Control.Monad(mapM). Yet I can't seem to import IOMode from System.IO, so there I imported the whole module.
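For the record, IOMode can be imported selectively. Most likely the snag is that listing only the type name, as in System.IO (IOMode), brings in the type but not its constructors, so ReadMode stays out of scope. Here is a minimal sketch that names the constructor explicitly; lineCounts is a hypothetical helper added only to give the program something to print.

```haskell
module Main (main) where

import System.Environment (getArgs)
-- Naming the constructor in parentheses imports it along with the type;
-- System.IO (IOMode) alone would import the type but not ReadMode.
import System.IO (IOMode (ReadMode), hGetContents, openFile)

-- Hypothetical helper: the number of lines in each file's contents.
lineCounts :: [String] -> [Int]
lineCounts = map (length . lines)

main :: IO ()
main = do
  files <- getArgs
  handles <- mapM (\f -> openFile f ReadMode) files
  contents <- mapM hGetContents handles
  -- Print one line count per file given on the command line.
  mapM_ print (lineCounts contents)
```

Writing IOMode (..) instead would import all four constructors (ReadMode, WriteMode, AppendMode and ReadWriteMode).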
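Second, every statement inside a do block must be an action in the same monad, IO in this case, although their result types can differ (getArgs :: IO [String], whereas putStrLn header :: IO ()). The type of the last statement is the type of the whole block, so since we want main :: IO (), we end with mapM_ rather than mapM (mapM_ :: (Monad m) => (a -> m b) -> [a] -> m (), where m becomes IO).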
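To illustrate the point about do blocks, here is a small sketch (the labelled helper is hypothetical, not part of the original program): the statements have different result types, but they are all IO actions, and only the last one has to match main :: IO ().

```haskell
module Main (main) where

import System.Environment (getArgs)

-- Hypothetical helper: number each argument, e.g. "1: foo".
labelled :: [String] -> [String]
labelled = zipWith (\i s -> show i ++ ": " ++ s) [1 :: Int ..]

main :: IO ()
main = do
  -- Each statement is an IO action, but the result types differ.
  args <- getArgs                                  -- getArgs :: IO [String]
  putStrLn (show (length args) ++ " argument(s)")  -- IO ()
  -- The last statement determines the type of the whole block:
  -- mapM_ putStrLn (labelled args) :: IO (), matching main :: IO ().
  -- mapM putStrLn (labelled args) would have type IO [()] and be
  -- rejected by the type signature above.
  mapM_ putStrLn (labelled args)
```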

Essentially, we handle all files at once by mapping IO actions over the list of files with mapM (mapM :: (Monad m) => (a -> m b) -> [a] -> m [b], where once again m becomes IO, as we're executing actions in main :: IO ()). First we open every file, which gives a list of handles. Then we read the contents of each file lazily using hGetContents :: Handle -> IO String; mapping it over the handles gives an action of type IO [String]. Since we bind its result to the name contents using <-, contents :: [String]: a list of Strings, each holding the content of one file.

From there, pure functions get the data into the shape we like. We split the contents of each file with lines, which gives one list of lines per file, and transpose the result, so that each inner list now holds the n-th line of every file. We then put the second command-line argument (lookups) in front of each row, inject the separator using intersperse and concatenate the result. Finally, we need to show the result. We write everything to stdout using putStrLn :: String -> IO (). Because we want a header in our new file, we print it first; then we print each line of the new file.
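The pure part of that pipeline can be pulled out into its own function, which makes it easy to try in GHCi without any files. A minimal sketch, assuming a hypothetical name alignContents and taking the separator as a parameter instead of hard-coding ":":

```haskell
module Main (main) where

import Data.List (intersperse, transpose)

-- Hypothetical pure core of the program: given a separator, a label and
-- the full contents of each file, build the output lines.
alignContents :: String -> String -> [String] -> [String]
alignContents sep label contents =
  map (concat . intersperse sep . (label :)) rows
  where
    -- map lines gives one list of lines per file; transpose regroups
    -- them into one list per line number, holding the n-th line of
    -- every file.
    rows = transpose (map lines contents)

main :: IO ()
main = mapM_ putStrLn (alignContents ":" "id" ["1\n2\n3", "4\n5\n6"])
-- prints id:1:4, id:2:5 and id:3:6 on separate lines
```

Since Data.List.intercalate sep is defined as concat . intersperse sep, it could replace that part of the composition. Note also that transpose does not complain when the files have different numbers of lines: shorter files simply drop out of the later rows, so the equal-length assumption in the problem statement matters.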

Admittedly, I used to have a hard time grokking this, but since I worked through Write Yourself a Scheme in 48 Hours, things have picked up.

Paste

What's wrong with 'paste'?

Posted by Michiel R. (not verified) on Wed, 07/09/2008 - 11:26

Nothing is wrong with paste. I just didn't know about it ;-)

Posted by itkovian on Mon, 08/04/2008 - 15:26