Tuesday 23 November 2010

CSV unification

Unifier on GitHub


A simple problem

Given an Excel spreadsheet containing three sheets, all with approximately the same rows but with differing columns, produce a unified CSV file. If a sheet does not contain a row then insert blank columns. The unique key for each sheet is column 2, called ID; move this column to column 1. No doubt you can see all the issues, as I can in retrospect, but after a week and a bit the job is done. High test coverage enabled me to refactor and add features until the code became a completely different beast.
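To make the problem statement concrete, here is a minimal sketch of the merge, not the Unifier code itself. It assumes each sheet is already parsed into a list of rows with the header row first, that every row is as wide as its header, and that column names do not clash between sheets. The class and method names (UnifySketch, unify, without) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnifySketch {

    /** Merges sheets on keyName; the key becomes column 1 of the result. */
    static List<List<String>> unify(List<List<List<String>>> sheets, String keyName) {
        List<String> header = new ArrayList<>();
        header.add(keyName);
        Set<String> keys = new LinkedHashSet<>();           // keys in first-seen order
        List<Map<String, List<String>>> indexed = new ArrayList<>();
        List<Integer> widths = new ArrayList<>();            // non-key columns per sheet

        for (List<List<String>> sheet : sheets) {
            List<String> sheetHeader = sheet.get(0);
            int keyIndex = sheetHeader.indexOf(keyName);
            if (keyIndex < 0) {
                throw new IllegalArgumentException("No column named " + keyName);
            }
            header.addAll(without(sheetHeader, keyIndex));
            widths.add(sheetHeader.size() - 1);
            Map<String, List<String>> byKey = new LinkedHashMap<>();
            for (List<String> row : sheet.subList(1, sheet.size())) {
                String key = row.get(keyIndex);
                keys.add(key);
                byKey.put(key, without(row, keyIndex));
            }
            indexed.add(byKey);
        }

        List<List<String>> unified = new ArrayList<>();
        unified.add(header);
        for (String key : keys) {
            List<String> row = new ArrayList<>();
            row.add(key);
            for (int i = 0; i < indexed.size(); i++) {
                List<String> cells = indexed.get(i).get(key);
                if (cells == null) {
                    // This sheet has no such row: pad with blank columns.
                    cells = Collections.nCopies(widths.get(i), "");
                }
                row.addAll(cells);
            }
            unified.add(row);
        }
        return unified;
    }

    /** Returns a copy of row with the cell at index removed. */
    static List<String> without(List<String> row, int index) {
        List<String> copy = new ArrayList<>(row);
        copy.remove(index);
        return copy;
    }
}
```

The real job has more wrinkles than this (duplicate keys, clashing column names, ragged rows), which is where the week and a bit went.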

Lessons learned

hashCode() for enum constants

The members of an enum inherit their hashCode() method from Object, so its value can vary between JVM invocations. The ordinal, by contrast, is fixed by the declaration order of the constants. Hence I used
result = prime * result + unificationOption.ordinal();
in CsvTable.hashCode().
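A small self-contained sketch of the same idea follows. The Option enum and Table class are hypothetical stand-ins for UnificationOptions and CsvTable, not the real classes. Note the trade-off: the hash stays stable between runs only as long as the enum constants are not reordered.

```java
public class StableHashExample {

    // Hypothetical stand-in for the post's UnificationOptions enum.
    enum Option { LOG, THROW }

    // Hypothetical value class with one String field and one enum field.
    static final class Table {
        private final String name;
        private final Option option;

        Table(String name, Option option) {
            this.name = name;
            this.option = option;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Table)) return false;
            Table other = (Table) o;
            return name.equals(other.name) && option == other.option;
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            // String.hashCode() is specified by its Javadoc, so it is stable.
            result = prime * result + name.hashCode();
            // ordinal() is the declaration position, unlike the identity-based
            // Enum.hashCode(), so it is the same on every JVM run.
            result = prime * result + option.ordinal();
            return result;
        }
    }

    public static void main(String[] args) {
        // Prints the same number on every run.
        System.out.println(new Table("sheet1", Option.LOG).hashCode());
    }
}
```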

Bridging methods

During the compilation of generic code Java quietly generates what are called bridging methods (more commonly, bridge methods). These are visible to Cobertura but not to you, the coder, so Cobertura tries to tell you that you have not exercised these methods by marking the class definition line as not covered. Using the following code to print out all methods
for (Method m : CsvTable.class.getMethods()) {
  System.out.println(m.toGenericString());
}
I discovered the generated bridging methods
public net.pizey.csv.CsvRecord get(java.lang.Object)
public java.lang.Object get(java.lang.Object)

public net.pizey.csv.CsvRecord put(java.lang.String,net.pizey.csv.CsvRecord)
public java.lang.Object put(java.lang.Object,java.lang.Object)

public net.pizey.csv.CsvRecord remove(java.lang.Object)
public java.lang.Object remove(java.lang.Object)
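In each pair above, the Object-typed version is the compiler-generated one. Rather than spotting them by eye, java.lang.reflect.Method.isBridge() can pick them out. The sketch below assumes a hypothetical StringTable class standing in for CsvTable; it is not part of Unifier.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class BridgeMethodFinder {

    // Hypothetical stand-in for CsvTable: overriding generic methods with
    // concrete types makes javac emit Object-typed bridge methods.
    static class StringTable extends HashMap<String, String> {
        private static final long serialVersionUID = 1L;

        @Override
        public String put(String key, String value) {
            return super.put(key, value);
        }

        @Override
        public String get(Object key) {
            return super.get(key);
        }
    }

    /** Returns the compiler-generated bridge methods declared by type. */
    static List<Method> bridgeMethods(Class<?> type) {
        List<Method> found = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (Method m : bridgeMethods(StringTable.class)) {
            System.out.println(m.toGenericString());
        }
    }
}
```

For StringTable this finds two bridges, put(Object, Object) and get(Object), both returning Object.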
The put(Object key, Object value) bridge cannot be called directly, as it is masked by the generic version, so we have to invoke it reflectively.
public void testBridgingPut() throws Exception {
  CsvTable t = new CsvTable("src/test/resources/sheet2.csv", UnificationOptions.LOG);
  Object o = t.get((Object) "1");
  // Look up the bridge by its erased signature; getMethod and invoke throw checked exceptions.
  Method method = t.getClass().getMethod("put", new Class[] { Object.class, Object.class });
  method.invoke(t, "jj", o);
}
Finally I had to deal with the generated bridging methods for clone. I did this by not supplying a generic clone method but using the pre-generics signature.
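The effect can be seen in a small sketch, assuming that "generic clone method" here means a clone() with a covariant return type. The Covariant and Plain classes are hypothetical and only illustrate the difference; they are not taken from Unifier.

```java
import java.lang.reflect.Method;

public class CloneBridgeExample {

    // Covariant return type: javac adds a bridge method "Object clone()"
    // that delegates to this one.
    static class Covariant implements Cloneable {
        @Override
        public Covariant clone() {
            try {
                return (Covariant) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Pre-Java-5 signature: this is the only clone() in the class file,
    // so there is nothing extra for a coverage tool to report.
    static class Plain implements Cloneable {
        @Override
        public Object clone() {
            try {
                return super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Counts the bridge methods declared directly on type. */
    static int countBridges(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Covariant bridges: " + countBridges(Covariant.class));
        System.out.println("Plain bridges: " + countBridges(Plain.class));
    }
}
```

The cost of the plain signature is that callers have to cast the result of clone() themselves.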

I have come to enjoy Java 6; however, generics do feel like an expensive compromise.

Tools and sources

The CSV parser was written by WilliamC and incorporated in a CSV importer in 2000 by MylesC. I started from that framework, but as it was written before Java Collections, let alone Java 6, not a lot remains of the original.
The tools used: GitHub, Hudson CI, Maven, Eclipse and Cobertura.

Sharing and Deployment

The code is at https://github.com/timp21337/unifier.

To include the jar in Maven:
<dependency>
  <groupId>net.pizey.csv</groupId>
  <artifactId>unifier</artifactId>
  <version>1.0</version>
</dependency>
Deployed to a Maven repository at http://pizey.net/maven2/