Archive of May 2008

Another coding folly

Posted on 2008-05-08 19:25:22

A few weeks ago, I fixed a problem with the same scheduler service I talked about earlier: one of the plugins written is used to extract data from the database and re-pack it in such a way that another (external) program can read it and make pretty management graphs from it. The plugin works by executing a (huge) view on the database, exporting the result to a CSV file, and then running a tool on the CSV files so that another file is generated which can be read from the management program.

Anyway, what the plugin first did:

  1. Execute the view
  2. Read the results into a data table
  3. Write the data table to disk as CSV file

This is fine and dandy, if the data set you read from the database is fairly small. In this case, it had to execute at least 4 views that were around 250MB when exported to CSV file. I don’t quite know exactly how much memory that would take up when it’s stored as DataTable in memory, but I think that’d be at least twice as much.

With the lazy garbage collection in .NET, you’re running into a bit of a memory problem if you execute four such views consecutively. And indeed: the plugin took 2.5 to 3 hours to export the data of several views. Most of the time was spent swapping memory around: the server on which this is running only has 512MB RAM.

My solution to this was: don’t store it in a data table, but write the rows directly to the CSV file as they come in from the database.

The result: the plugin now executes in 30-40 minutes, a speedup of 4.5 to 5, just by changing the way data was read!

Posted in: Work