I was doing a program today for a cron job we have at work. Basicly, the cron job needs to import some data from a source MSSQL server into our own MySQL server for data analysis and root cause analysis.
I ended up learning quite a few new things when I had this job done.
Firstly, the program needs to read a set of text files to configure itself. There is the source odbc connection string from one text file and destination connection string from another. Then there is a database mapping csv text file which has a line format of [tablename],[config_file] which may contain multiple entires for multiple tables. Then there will need to be a configuration file for each of these table as specified in the database mapping text file. Each of these configuration file will have a line format of [source column name],[destination column name],[data type].
That sums up all the configuration files needed for the program to function. Now it comes to the logic of the program. The program will work out a drop table create table script and run it to create the destination tables. As it does a fresh import instead of delta update import, the drop table part is thus incorporated. And, the program will then work out the extraction query for the source database and insert statement for destination database. It will then execute the select statment on the source database and iterate through all records from the odbc data reader. For each of the record, it will then build a insert statement and execute it on the destination database. And this loops until all tables specified in the database mapping text file is finished.
The program is not very scalable but would work for them to transfer only 180MB of data daily. It takes about 20 minutes to transfer 180MB of data which is a shame. But this can be vastly improved if multi-threading is implemented.
I will post the source code up a little later.