Wednesday, February 20, 2008

HTML Monkey

Hello fellow nerds!

Question: If you had a bunch of HTML data in a semi-regular format (Exhibit A: Software Engineering Dissertations) and you wanted to jam all of it into a SQL database that has already been created, what would be the most efficient way to generate the INSERT statements and avoid the mind-numbing manual work?

I've done things like this in the past with Java/Perl/Awk regexps, but they've never really worked perfectly due to the irregular structure. Any better ideas?

1 comment:

  1. I always hack together some ad-hoc mess to do this kind of thing. I know you hate Python, but it has some nice features for dealing with strings, and sometimes learning a new language can make a super boring job kind of fun. If you want to go really nuts, you could check out Boost's Spirit.
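
For what it's worth, here is a rough sketch of the Python route, using only the standard library's html.parser instead of regexps. Everything about the data is an assumption: it pretends each entry is an <li> that reads "Title, Author, Year", and the table name (dissertations) and column names are made up, so they would have to be adapted to the real page and schema. Entries that don't fit the pattern are set aside for hand-fixing rather than guessed at.

```python
from html.parser import HTMLParser


class ListItemText(HTMLParser):
    """Collect the visible text of every <li> element, one string per item."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished <li> texts
        self._parts = None   # text pieces of the <li> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._parts is not None:
            # Collapse runs of whitespace left over from the markup.
            self.items.append(" ".join("".join(self._parts).split()))
            self._parts = None


def sql_quote(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_inserts(html, table="dissertations"):
    """Return (insert_statements, rejected_items) for the given HTML."""
    parser = ListItemText()
    parser.feed(html)
    parser.close()
    statements, rejects = [], []
    for item in parser.items:
        # Assumed layout: "Title, Author, Year". Split from the right so
        # commas inside the title survive.
        fields = [f.strip() for f in item.rsplit(",", 2)]
        if len(fields) == 3 and fields[2].isdigit():
            title, author, year = fields
            statements.append(
                f"INSERT INTO {table} (title, author, year) "
                f"VALUES ({sql_quote(title)}, {sql_quote(author)}, {year});"
            )
        else:
            rejects.append(item)  # irregular entry: fix by hand
    return statements, rejects
```

Calling to_inserts(page_source) gives back the statements plus a list of the entries that didn't parse, which is usually the part the regexps silently got wrong. If the script can connect to the database directly, parameterized queries would be safer than building SQL strings like this.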