Hello fellow nerds!
Question: If you had a bunch of HTML data in a semi-regular format (Exhibit A: Software Engineering Dissertations) and you wanted to jam all that information into a SQL database that has already been created, how would you most efficiently generate the INSERT statements and avoid any mind-numbing work?
I've done things like this in the past with Java/Perl/Awk regexps, but they've never really worked perfectly due to the irregular structure. Any better ideas?
I always hack out some bit of ad-hoc mess to do this kind of thing. I know you hate Python, but it has some nice features for dealing with strings, and sometimes learning a new language can make a super boring job kind of fun. If you want to go really nuts, you could check out Boost's Spirit, the C++ parser library.
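For what it's worth, here is a rough sketch of the Python route using only the standard library. It is not a finished tool. Everything specific in it is an assumption, since the real page layout isn't shown here: it pretends each dissertation is an `<li>` that reads "Author, Title, Year", and the table name `dissertations` with columns `author`, `title`, `year` is made up. sqlite3 stands in for whatever database you actually have. The idea is to let an HTML parser strip the markup, pass values as query parameters rather than pasting them into INSERT strings, and collect anything that does not fit the pattern for a manual look instead of guessing.

```python
import sqlite3
from html.parser import HTMLParser


class ListItemText(HTMLParser):
    """Collect the plain text of every <li>, ignoring nested markup."""

    def __init__(self):
        super().__init__()
        self.items = []    # finished <li> texts
        self._buf = None   # text chunks of the <li> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()  # tolerate a previous <li> that was never closed
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _flush(self):
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.items.append(text)
            self._buf = None


def parse_entry(text):
    """Split the assumed 'Author, Title, Year' layout; None if it does not fit."""
    head, last_comma, year = text.rpartition(",")
    author, first_comma, title = head.partition(",")
    year = year.strip()
    if not (last_comma and first_comma and year.isdigit()):
        return None
    return author.strip(), title.strip(), int(year)


def load(html, conn):
    """Insert every entry that parses; return the ones that need a human."""
    parser = ListItemText()
    parser.feed(html)
    parser.close()
    rows, rejects = [], []
    for text in parser.items:
        row = parse_entry(text)
        if row is None:
            rejects.append(text)
        else:
            rows.append(row)
    # Hypothetical table and column names; the driver handles quoting.
    conn.executemany(
        "INSERT INTO dissertations (author, title, year) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return rejects


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the existing database
    conn.execute(
        "CREATE TABLE dissertations (author TEXT, title TEXT, year INTEGER)"
    )
    page = "<ul><li>J. Smith, <i>Testing, Formally</i>, 2004<li>A. Jones: draft</ul>"
    print("needs manual review:", load(page, conn))
```

Splitting on the first and last comma keeps titles that contain commas in one piece. The `?` placeholders are sqlite3's style; other Python database drivers may use a different marker such as `%s`.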