
A JDBC Application to Copy Data Across Databases


Requirement


The goal is to copy data stored in one data source to another data source that has the same schema structure, using JDBC. Whatever data is present in the selected tables of the origin data source should be copied to the destination data source.

Challenges


Different JDBC drivers require different driver classes and URL formats. Although java.sql.DriverManager can select the correct driver by inspecting the URL, the required driver must still be on the classpath. Furthermore, DriverManager cannot load Sun's JdbcOdbcDriver when ODBC data sources such as MS Access database files are accessed, so this driver must be loaded explicitly at application startup. There may also be additional configuration requirements, as with MS Access, which needs a matching DSN configuration.
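
A minimal sketch of loading the bridge driver explicitly at startup (assuming a JRE that still ships sun.jdbc.odbc, i.e. Java 7 or earlier):

try {
    // the JDBC-ODBC bridge is loaded explicitly before the first getConnection() call
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
} catch (ClassNotFoundException e) {
    throw new IllegalStateException("JDBC-ODBC bridge driver not available", e);
}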

As for JDBC URLs, they are not fully standardized across drivers, so the user needs to know how to construct the correct URL for each driver.
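
For illustration, typical URL forms for the kinds of data sources used here (host names, database names and the DSN name below are placeholders):

// MySQL
String mysqlUrl = "jdbc:mysql://localhost:3306/mydb";
// Microsoft SQL Server
String mssqlUrl = "jdbc:sqlserver://localhost:1433;databaseName=mydb";
// ODBC data source (e.g. an MS Access DSN) through the JDBC-ODBC bridge
String odbcUrl  = "jdbc:odbc:MyAccessDsn";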

Different SQL data sources require different syntaxes. For example, MySQL uses back-quotes to escape column names containing spaces, while SQL Server uses square brackets. One solution is to leave the choice of quote characters to the user by putting them into the application properties.
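
A possible properties entry for this, here for a MySQL source and an SQL Server destination (the key names are illustrative, not fixed by any standard):

src.column.quote.start=`
src.column.quote.end=`
dst.column.quote.start=[
dst.column.quote.end=]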

While it is good practice to specify a fetch size, both to reduce network bandwidth usage and to lower the memory footprint of the application, the JdbcOdbcDriver throws an exception on this call instead of simply ignoring the hint.
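
One way to keep a single code path is to treat the fetch size as a best-effort hint and swallow the failure for drivers that reject it; a minimal sketch (selectStatement and batchSize stand for the corresponding objects in the copy loop shown later):

try {
    selectStatement.setFetchSize(batchSize); // hint only; most drivers honor or ignore it
} catch (SQLException e) {
    // the JDBC-ODBC bridge rejects the call; fall back to the driver's default fetch size
}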

Data sources may use and require different charsets, which must be known and specified by the user and are sometimes hard to query. For example, for MS Access there is no easy way to learn the internal charset of an *.mdb file. Specifying the wrong charset may yield garbled column names, and the incorrectly constructed SQL statements that result will break execution of the application.
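
With the JDBC-ODBC bridge, the charset can be passed as a connection property; a sketch assuming the bridge-specific charSet property and an example encoding (the value must match whatever the .mdb file actually uses):

Properties connectionProps = new Properties();
connectionProps.put("charSet", "windows-1254"); // assumed encoding; a wrong value garbles column names
Connection accessConnection = DriverManager.getConnection("jdbc:odbc:MyAccessDsn", connectionProps);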


Implementation


Database-specific parameters are read from a properties file so that no source code change is needed when a data source changes.
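
A hypothetical excerpt from such a properties file (the key names and values are illustrative only, not taken from the actual application):

src.jdbc.url=jdbc:odbc:MyAccessDsn
dst.jdbc.url=jdbc:mysql://localhost:3306/targetdb
dst.jdbc.user=copyuser
dst.jdbc.password=secret
tables=customers,orders,order_items
batch.size=500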

The names of the tables to copy are read from the properties file as a comma-separated list, and each table's schema is then determined using ResultSetMetaData. This makes it easy to copy any number of tables without extra effort. However, this approach requires that the source and destination databases share exactly the same table schemas.
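
A sketch of reading that list, assuming a copy.properties file and the illustrative key names above; copyTable is a hypothetical method wrapping the metadata and copy steps described below:

Properties config = new Properties();
try (FileInputStream in = new FileInputStream("copy.properties")) {
    config.load(in);
}
for (String tableName : config.getProperty("tables").split(",")) {
    copyTable(tableName.trim()); // hypothetical helper: build queries from metadata, then copy rows
}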

For each table name, a query is sent to the origin database to retrieve the table metadata. Using this metadata, the corresponding select and insert queries are constructed as in the code listing below.

ResultSet rs = statement.executeQuery("SELECT * FROM " + tableName + " WHERE 1<>1"); // returns no rows; only the metadata is needed
ResultSetMetaData rsm = rs.getMetaData();
List<String> attributeList = new ArrayList<String>();

StringBuilder insertQueryBuilder = new StringBuilder("INSERT INTO " + tableName + " (");
StringBuilder selectQueryBuilder = new StringBuilder("SELECT ");

for (int i = 1; i <= rsm.getColumnCount(); i++) {
    String columnName = rsm.getColumnName(i);

    // skip auto-increment columns; the destination database generates these values
    if (rsm.isAutoIncrement(i))
        continue;


    // quote characters are constants read from the application properties (see the Challenges section)
    insertQueryBuilder.append(DST_COLUMN_QUOTE_START).append(columnName).append(DST_COLUMN_QUOTE_END).append(LIST_SEPARATOR);
    selectQueryBuilder.append(SRC_COLUMN_QUOTE_START).append(columnName).append(SRC_COLUMN_QUOTE_END).append(LIST_SEPARATOR);

    attributeList.add(columnName);
}

// insert
insertQueryBuilder.replace(insertQueryBuilder.length() - 1, insertQueryBuilder.length(), ")");//replace last comma
insertQueryBuilder.append(" VALUES(");
for (int i = 1; i <= attributeList.size(); i++)
    insertQueryBuilder.append("?" + LIST_SEPARATOR);
insertQueryBuilder.replace(insertQueryBuilder.length() - 1, insertQueryBuilder.length(), ")"); //replace last comma

//select
selectQueryBuilder.replace(selectQueryBuilder.length() - 1, selectQueryBuilder.length(), " "); //replace last comma
selectQueryBuilder.append(" FROM ").append(tableName);


Data for each table is copied in the order specified in the properties file. Rows are fetched and inserted in batches of the configured batch size. After all rows are inserted, the transaction is committed and the connection is set back to auto-commit mode.

destinationConnection.setAutoCommit(false);
PreparedStatement preparedStatementSelect = sourceConnection.prepareStatement(selectStatements.get(tableName));
PreparedStatement preparedStatementInsert = destinationConnection.prepareStatement(insertStatements.get(tableName));
List<String> attributeList = attributeListMap.get(tableName);

// note: with the JdbcOdbcDriver, setting the fetch size throws an exception instead of being ignored
preparedStatementSelect.setFetchSize(batchSize);
ResultSet rs = preparedStatementSelect.executeQuery();

preparedStatementInsert.clearParameters();
preparedStatementInsert.clearBatch();
preparedStatementInsert.clearWarnings();
int insertCount = 0;

while (rs.next()) {


    for (int i = 1; i <= attributeList.size(); i++) {
        Object obj = rs.getObject(i);
        preparedStatementInsert.setObject(i, obj);
    }

    insertCount++;
    preparedStatementInsert.addBatch();

    if (insertCount % batchSize == 0) {
        preparedStatementInsert.executeBatch();
        preparedStatementInsert.clearBatch();
    }

}

// ensure that all batches are executed
preparedStatementInsert.executeBatch();
preparedStatementInsert.clearBatch();


destinationConnection.commit();
destinationConnection.setAutoCommit(true); // restore auto-commit mode after the copy


Conclusion


This application addresses a simple scenario in which data must be copied across different data sources. It shows both the strengths and the weaknesses of JDBC: metadata access, batch processing, and prepared statements are strengths, while the incompatibilities between driver implementations are a weakness.

The whole application can be downloaded from this link.
