Wednesday 21 November 2012

A Security Policy Model for Clinical Information Systems

A Security Policy Model for Clinical Information Systems appeared at the 1996 IEEE Symposium on Security and Privacy. It presents the BMA policy model to the computer security community in a format comparable to policies such as Bell-LaPadula and Clark-Wilson. It had some influence on later US health privacy legislation (the Kennedy-Kassebaum Bill, now HIPAA).

A classic paper by Ross Anderson found here, more properly from Ross' own site.

These are just notes and pickings. Though all nine Principles are given.

The paper sets out a model that consciously parallels the Bell LaPadula model for military systems and the Clark - Wilson Model for the world of banking.

Extract from the Hippocratic oath

Whatsoever I shall see or hear in the course of my dealings with men, if it be what should not be published abroad, I will never divulge, holding such things to be holy secrets.

Though this is a different translation to that on Wikipedia.

Consent must be informed and voluntary.
The main new threat comes from abuse by insiders.

Unnoticed failures from data corruption are also new, non-digital information stores tend to fail completely.

The ill-considered aggregation of systems has grown apace, culminating in cloud service provision.

BMA recommends a callback authentication protocol to defend against private investigators impersonating other health professionals.

Principle 1

Each identifiable clinical record shall be marked with an acccess control list naming the people or groups of people who may read it and append data to it. The system shall prevent anyone not on the access control list from accessing the record in any way.

Immediate bad smell: The system shall prevent. This suggests a failure mode where access is allowed. The requirement needs to be framed in the opposite sense: The system shall enable anyone on the access control list from accessing the record. There will be no other mechanism of access.

Patients must be informed of a care team's access control policy when they first enrol, and have the opportunity to restrict access further if they wish.

Principle 2

A clinician may open a record with herself and the patient on the access control list. Where a patient has been referred, she may open a record with herself, the patient and the referring clinician(s) on the access control list.
Apart from the patient himself, only clinicians may have access to his records.

This seems incredibly strong and calls for encrypted data storage.

Principle 3

One of the clinicians on the access control list must be marked as being responsible. Only she may alter the access control list, and she may only add other health care professionals to it.

This may be one principle but these are three separate requirements:

  1. Write access to the ACL is controlled by an entry in the ACL.
  2. Only one member of an ACL may have write access
  3. Only health care professionals may be added.
I can't see one of these standing up. One of the default roles will have to be super user, of course it will.

Where access has been granted to administrators, as in the USA, the result has been abuse.

Confused reference to UK safe havens and their role in difusing dispute.

German and Canadian examples of hospital billing data: aggregation is done prior to billing government.

All legal outputs, required by police, social workers etc, should be produced on paper.

Principle 4

The responsible clinician must notify the patient of the names on his record's access control list when it is opened, of all subsequent additions, and whenever responsibility is transferred. His consent must also be obtained, except in emergency or in the case of statutory exemptions.

Principle 5

No one shall have the ability to delete the clinical information until the appropriate time period has expired.

There is always a need to correct data. Current version, like Wikipedia, would need links to previous versions.

Patient consent is not immutable, but rather a continuing dialogue between the patient and the clinician.

Principle 6

All accesses to clinical records shall be marked on the record with the subject's name, as well as the data and time. An audit trail must also be kept of all deletions.

Remember here subject means data accessor, not patient.

Principle 7

Information derived from record A may be appended to record B if and only if B's access control list is contained in A's.

If there exist two records referring to a patient, with different ACLs, where one record is more restricted than the other, should the existence of the restricted record be shown to users of the less restricted record?

We expect that clinicians will decide in favour of discrete flags that indicate only the presence of hidden information.

Principle 8

There shall be effective measures to prevent the aggregation of personal health information. In particular, patients must receive special notification if any person whom it is proposed to add to their access control list already has access to personal health information on a large number of people.
In this policy model, the primary control is notification, and the secondary control is to keep a list somewhere of who has accessed what record outside their own team.

Principle 9

Computer systems that handle personal health information shall have a subsystem that enforces the above principles in an effective way. Its effectiveness shall be subject to evaluation by independent experts.

White hat job creation scheme?

Ideally a non-NHS accreditation and monitoring organisation. A Trusted Computer Base (TCB) must not itself leak personal data.

X.509 is probably not enough.

Information access audit must be effective, that is it must be quite likely that an intruder will be caught.

Our approach has been to provide two auditors, both of whom have an interest in detecting abuse and acting upon it.

Firstly the patient and secondly an audit office eg the GMC.

'show me the records of all females aged 35 with 2 daughters aged 13 and 15 both of whom suffer from eczema'
Suppose that I walk into a hospital and claim that my demons are bothering me. When asked my name I reply 'John Major'. May the psychiatrist get the prime ministers record and append a diagnosis of schizophrenia?

Can we trust the patient to self identify? How do we treat patients anonymously? What about people who claim identity change?

Re the unified patient record:

The onus is on proposers of such systems to provide a clear statement of the expected health benefits, and to analyse the threats, the cost of added countermeasures and the likely effects of the residual risk.

Wednesday 7 November 2012

A Binary Chop Java method to discover memory limits

If you want to discover exactly how many Objects you can store in your JVM with its current memory configuration this little functional style method might help.

  public void testChop() { 
    // -Xmx=256m
    //assertEquals(33554433, between(1, Integer.MAX_VALUE));
    // -Xmx=1966m
      assertEquals(343535293, between(1, Integer.MAX_VALUE));
  }
 
  public int between(int from, int to) {
    if ((to - from) <= 1) 
      return (to);
    int i = from + ((to - from) /2);
    try { 
      Object[] a = new Object[i];
      a[1]="1";
      a=null;
      return (between(i, to));
    } catch (OutOfMemoryError e) {
      return (between(from, i));
    }      
  }

Another thing to note is the speed that this runs at: memory allocation can be very slow.

Obviously, in hindsight, this routine will always take 30 steps.

Tuesday 30 October 2012

Implementing Java interfaces: when is it OK to add extra methods?

I have just found myself offended by a usage and I want to write out why.


public interface Stack {
  public int size();
  public boolean isEmpty();
  public Object top()  throws StackEmptyException;    
  public void push (Object element);
  public Object pop()  throws StackEmptyException;   
}

public class ArrayStack implements Stack {

  ......

  public Object atPosition(int i) throws StackOutOfScopeException {
    if (i > tos)
      throw new StackOutOfScopeException("Attempt to pass top of stack");
    else
      return S[i - 1];
  }
}

My claim is that the atPosition should either be in the interface or not in the class, or ArrayStack should be called AugmentedArrayStack or that an intermediate interface called AugmentedStack which extends Stack is needed.

My expectation is that if a class implements only one interface and that interface is named for a Noun and does not have a pertinent name (eg ...Mixin) then the class will only implement methods declared in that interface (and possibly override those inherited from Object).

Sunday 21 October 2012

Preparation for Software Testing Course

  • The Impossibility of Complete Testing by Cem Kaner

    Bad software © 1997

    Article 2B October 30, 1998 of the UCC became UCITA which has only been adopted by two states (passed in 2000).

  • Exhausting Your Test Options by Douglas Hoffman

    By exhaustive testing a bug was caught, which would not have been spotted by inspection of edge cases.

  • The Patriot Missile Failure by Douglas N. Arnold

    Ironically, the fact that the bad time calculation had been improved in some parts of the code, but not all, contributed to the problem, since it meant that the inaccuracies did not cancel.

  • A bug and a Crash by James Gleick

    Fortunately, he points out, really important software has a reliability of 99.9999999 percent. At least, until it doesn't.

  • Other Papers from Testing Education

Case Study

Based upon The art of the elevator pitch describe your testing project.

Unifier

  • Open Source utility aimed at developers and office users.
  • A command line utility for the common problem of unifying spreadsheet sheets into one sheet.
  • A pure java command line tool.
  • A small utility with only 16 classes.
  • Code coverage testing has already been completed through test driven development and retrofitting to 100&percent;, however there are is a need to extend testing for unexpected and illegal input.

Melati

  • An Open Source ORM framework aimed at developers and office users.
  • A DSL for writing database backed systems, with especial focus on web pages.
  • A pure java stack on top of any JDBC capable database, typically used with a template engine.
  • 4792 files, 180k lines of code, 8 developers over 12 years.
  • Code coverage testing has already been completed through retrofitting. Still some benefits to be gained: caching, speed tests, technology changes.

Tuesday 16 October 2012

The truth about blogging the truth about health research: blogged

I attended a quick, lunchtime, course entitled Blogging the truth about health research.

Sadly @cebmblog could not attend, but ably deputised by @peterjgill.

This course was a quick tour of why the social web is important for primary care researchers.

Blogging helps to formulate your own thoughts.

It enables you to gain a style understandable by non-academics.

Blogging bypasses peer review, which is great for junior academics.

By informing others of your views before you meet in person blogging can enable the most productive use of face to face time.

Case study Rosiglitazone: the debunk message did not get out.

Sadly the lack of debunking led to a number of deaths before the drug stopped being widely used.

Case study: twitter as a collaborative tool

Autism brain scan: 20 adults with vs 20 without. 90% accuracy!!

This paper was summarised in a Wellcome Trust press release.

@bengoldacre tweeted that he could not find the original paper.

@cebmblog tweeted the original and blogged it. This was picked up by the Guardian and led to the press release being altered.

Twitter enables you to participate in the hive mind.

Trial by twitter: can be a bit traumatic.

PLOS - Measuring the impact of research

Before the web there were only slow, late methods of measuring research's impact

  • peer review
  • citation counting

PLOS - altmetrics

PLOS enables a number of measures of a papers impact. Altmetics are more nuanced and detailed

  • more complex
  • measure aggregate impact of research
  • fast and open
  • track impact outside of academia

Rather preaching to the choir, as far as this blogger is concerned, but a very good session, which gave us all something to think about.

Adding launcher icon for STS to Ubuntu 12.04

Googling for ubuntu add menu icon is an unfruitful experience, as Ubuntu keep changing the default windowing system.

I eventually had more luck with unity sts launcher

With the help of a colleague I arrived at:

  1. Create a file called ~/.local/share/applications/sts.desktop
    [Desktop Entry]
    Name=STS
    GenericName=Spring Tool Suite
    X-GNOME-FullName=Spring Tool Suite
    Comment=Java Editor from SpringSource
    Exec=/home/timp/bin/sts
    Icon=/home/timp/springsource/sts-3.1.0.RELEASE/icon.xpm
    Terminal=false
    Type=Application
    Categories=GNOME;
    StartupNotify=true
    X-Ubuntu-Gettext-Domain=sts
    
    
  2. Invoke by searching for sts in Unity search ("Unity Dash Home")
  3. When application is invoked right click on menu icon and click Lock to Launcher

Note that full paths are required for files referred to in the .desktop content model; tilde form file names do not work.

Tuesday 18 September 2012

Creating a Bare Git Repository and Polling it

What we are trying to do is create a database on one machine, dump it in a way which is most diff friendly and commit the dump to a change control system (git).

On another machine we want to poll git and when there is a change rebuild the database.

Creating a Bare Repository

A bare repository is one which does not have a working copy checked out.

A bare repository is how you implement a system similar to a CVS or SVN repository: a hub repository.


sudo su
apt-get install git-core
cd /srv
mkdir git
mkdir git/repository.git
cd git/repository.git
git init --bare

cd ..
chown -R jenkins:tomcat7 repository.git

Done.

To create the database on another machine first time through:

git clone timp@app4.wwarn.ox.ac.uk:/srv/chassis-data/git/repository.git
cd repository 
echo 'create database r2;' |mysql -u root

mysql -u root r2 < repository.sql

Now we can poll for changes:

git fetch > build_log.txt 2>&1
if [ -s build_log.txt ]
then
   git pull
   echo 'drop database r2;' | mysql -u root
   echo 'create database r2;' | mysql -u root
   mysql -u root r2 < repository.sql
fi

Then just pop it into Jenkins

Monday 17 September 2012

Ensuring Jenkins can delete files created by Tomcat7

The problem is that Jenkins is not root. In the past the work Jenkins is doing would be done by root invoked from cron.

Our scenario is to build and deploy to Tomcat7 and then invoke the deployed web service from Jenkins. The web service downloads a lot of files to a cache. Once a week we want to clear the cache, from Jenkins, but the files in the cache are owned by Tomcat7.

The solution is to put Jenkins and Tomcat7 into the same group.

Edit the tomcat startup script /etc/init.d/tomcat7 and change the umask value to 002.

start)
if [ -z "$JAVA_HOME" ]; then
log_failure_msg "no JDK found - please set JAVA_HOME"
exit 1
fi
if [ ! -d "$CATALINA_BASE/conf" ]; then
log_failure_msg "invalid CATALINA_BASE: $CATALINA_BASE"
exit 1
fi
log_daemon_msg "Starting $DESC" "$NAME"
if start-stop-daemon --test --start --pidfile "$CATALINA_PID" \
--user $TOMCAT6_USER --exec "$JAVA_HOME/bin/java" > /dev/null; then
# Regenerate POLICY_CACHE file
umask 002
echo "// AUTO-GENERATED FILE from /etc/tomcat6/policy.d/" > "$POLICY_CACHE"
echo "" >> "$POLICY_CACHE"
cat $CATALINA_BASE/conf/policy.d/*.policy >> "$POLICY_CACHE"

Make Jenkins a member of the tomcat7 group.


usermod -G tomcat7 jenkins

Restart tomcat7 and Jenkins


/etc/init.d/tomcat7 restart
/etc/init.d/jenkins restart

change premisions on existing cache


chmod -R g+w .

Run Jenkins manually:

The job should succeed next weekend.

Friday 14 September 2012

Rebuilding MySQL databases without filling your disk

To reclaim disk space used by an old MySQL database it is not sufficient to drop the database. This is a famous 2003 bug. It is unbelievable that an infrastructure element as crucial and ubiquitous as MySQL has any outstanding bugs, let alone loads.

Assuming you are using InnoDb tables (the only sane choice if you want normal SQL functionality, such as transactions) then by default the data will be stored all on one big file. You will not be allowed to move the data unless you configure appArmor.

Another approach is to dump, drop, delete and recreate your databases.

Machine spec

The machine is a VM running Ubuntu

Linux version 3.2.0-23-generic (buildd@crested) (gcc version 4.6.3 (Ubuntu/Linar
o 4.6.3-1ubuntu4) ) #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012

The default MySQL setup is not great so setup some more sensible ones. Additionally lest put databases in separate files:

[mysqld]

innodb_file_per_table

I am, writing this from the position of recovering from a filled disk, with a single innodata1 file. I have more space on another partition but you cannot just use a symlink because of this bug. So the first we need to get to a working position, DON'T


cd /var/lib/mysql
rm ibdata1
rm ib_logfile0
rm ib_logfile1

So lets move our modified /etc/mysql/my.cnf out of the way and reinstall mysql.


mv /etc/mysql/my.cnf . 
apt-get remove --purge mysql-server
apt-get install mysql-common
apt-get install mysql-server

However this still left stuff hanging around, so:


rm -rf /etc/mysql
rm -rf /var/lib/mysql

Then reinstall and copy our saved config back:


apt-get install mysql-server
/etc/init.d/mysql stop
cp ~/my.cnf /etc/mysql/
/etc/init.d/mysql start

FAT CHANCE.


apt-get install apparmor-utils
aa-complain /usr/sbin/mysqld

NOPE


apt-get remove --purge apparmor
/etc/init.d/mysql stop
cp ~/my.cnf /etc/mysql/
/etc/init.d/mysql start


/etc/init.d/mysql stop
rm /var/lib/mysql/ib_logfile0
rm /var/lib/mysql/ib_logfile1
/etc/init.d/mysql start

Setup mysql user


create user 'repo_builder'@'localhost' identified by 'scrt';
grant all privileges on *.* to 'repo_builder'@'localhost' with grant option;

create database chassisPruned;
create database curated_files;

Restart Jenkins then kick off the job


/etc/init.d/jenkins start

Just need to enable this process to be repeated before each build...

Tuesday 11 September 2012

HTC Desire Android will not power off - SOLVED

My HTC Desire lost the ability to power off. Holding down the off button would send it to sleep, but the power down menu no longer appeared.

Initial googling suggested that this was due to an update, but as my wife and i have the same model on the same network I thought that unlikely.

I found the answer here Lost power menu (Android Revolution HD™ 6.1.1), it is described as an odd bug in 6.1: the issue with this can be resolved by activating secure credentials. go to settings, security and check use secure credentials and you'll be able to access your power menu again

Goto Settings/Security/Use secure credentials
(This will only be available if you have set a password for the credential store)
Check Use secure credentials.
Magic: power down now works.
you can uncheck it again and it continues to work.

Tuesday 7 August 2012

Trusting a secured Maven repository

There are two parts to this

Set up ~/.m2/settings.xml

<settings 
    xmlns="http://maven.apache.org/SETTINGS/1.1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd"> 
  <servers> 
    <server> 
       <id>wwarn-read</id> 
       <username>mavenuser</username>
       <password>*****</password>
    </server>
<!-- for deployment -->
    <server>
      <id>wwarn</id>
      <username>user</username>
      <privateKey>/home/user/.ssh/id_rsa</privateKey>
      <passphrase>mypassphrase</passphrase>
      <directoryPermissions>775</directoryPermissions>
      <filePermissions>664</filePermissions>
    </server>
  </servers> 
  <profiles> 
    <profile> 
      <id>wwarn</id> 
      <activation> 
        <activeByDefault>true</activeByDefault> 
      </activation> 
      <repositories> 
        <repository> 
          <id>wwarn-read</id> 
          <url>https://maven.wwarn.org/</url> 
        </repository> 
      </repositories> 
    </profile> 
  </profiles>
</settings>

See http://maven.apache.org/guides/mini/guide-deployment-security-settings.html

Set up self-signed cert from maven.wwarn.org

The default password really is changeit. I would advise not changing it :)


sudo su
scp maven.wwarn.org:/etc/ssl/certs/server.crt .
cp /etc/ssl/certs/java/cacerts /etc/ssl/certs/java/cacerts.$$.bak
# The default password is changeit
# If this is the second time through
keytool -delete -alias wwarn-maven-repo -file server.crt \
  -keystore /etc/ssl/certs/java/cacerts
keytool -import -alias wwarn-maven-repo -file server.crt \ 
  -keystore /etc/ssl/certs/java/cacerts

Or on windows something like


keytool.exe -import -file c:\users\richardc\Desktop\stuff\server.crt -alias maven.wwarn.org -keystore "C:\Program Files\Java\jdk1.6.0_21\jre\lib\security\cacerts"

Monday 9 July 2012

Tools for Total Test Coverage in Java

100 percent test coverage is the starting point, not the impossible end point.

For interpreted languages every line must be exercised by the tests or you will discover a syntax error caused by a typo during a demo. The tests take the place of the compiler in other languages.

For compiled languages the main, or original, purpose of tests is to ensure against regressions: new code breaking old code.

A relatively new purpose for tests has been as the medium of Test Driven Development. The tests are written before the code, as scaffolding, and high test coverage is a side effect.

Perfectly reasonable unit tests in Java can lead to less than complete coverage. Tests which assert that the right thing happens under normal circumstances are of great value in ensuring backwards compatibility. The value of the remaining tests can be questioned: why test exception handling? This question should perhaps be reversed: if it is not worth testing then why is it in the code?

So lets take for granted that a high percentage test coverage is normal. I want to argue that it is worthwhile putting in the effort to get from high to total test coverage.

Reasons to achieve total coverage

Martin Fowler in his article on Test Coverage sets up a straw man that is really a rant against Goodhart's Law : writing tests to hit a percentage is not the reason to write tests. By achieving 100 per cent coverage you finesse the whole 'acceptable percentage' argument.

In my experience there are bugs in uncovered code. Uncovered code is likely to be code that has never been run. How often do programmers write code which runs as expected first time?

The final push to one hundred per cent can be seen as a separate review stage of the project, highlighting problems with algorithms or the remnants of ideas which were abandoned during development. This is particularly valuable in one person projects, where changing to the review perspective can be difficult.

A practical benefit to the programmer of Total Test Coverage, if you are guided by a test coverage tool such as Cobertura and JaCoCo, is that you do not keep returning to the same nearly-covered class, to remember you have tested it to the current limits of your tools. Removing the need to remember that there is a good reason why a particular class is not fully covered is a significant benefit.

One such limit in Java has been with the language since its creation. It is caused by exception handling forced on the developer by language classes which throw checked exceptions (IOException I'm looking at you) or by checked exceptions in an API (eg SQLException in the JDBC API).

A silly snag is the private constructor pattern. This involves writing a constructor with private access for static classes to ensure that its methods are only accessed statically. An unwanted side effect of this approach is that the constructor is never called and so shows up as uncovered in our coverage report. There is a mechanism for instantiating these objects in tests, which is so ugly it would never be used in live code, is shown.

Another limit to complete coverage I have discovered was introduced with Java Generics: inaccessible generated bridging methods.

Here I present techniques for addressing these issues to remove the barriers to complete coverage.

Instantiating static classes with private constructors

We do not care about the non-coverage of constructors which have private access, we just don't want them to appear on the report and we do not want to clutter the code with annotations to instruct the coverage tool to ignore the method.

  @Test
  public void testPrivateConstructors() throws Exception {
    excercisePrivateConstuctor(StaticUtils.class);
  }

  public static void excercisePrivateConstuctor(final Class<?> clazz)
      throws NoSuchMethodException, InvocationTargetException,
      InstantiationException, IllegalAccessException {
    assertTrue("There must be only one constructor", clazz.getDeclaredConstructors().length == 1);
    final Constructor<?> constructor = clazz.getDeclaredConstructor();
    assertTrue("The constructor is accessible", constructor.isAccessible() == false);
    assertTrue("The constructor is not private", Modifier.isPrivate(constructor.getModifiers()));
    constructor.setAccessible(true);
    constructor.newInstance();
    constructor.setAccessible(false);
    for (final Method method : clazz.getMethods()) {
      if (method.getDeclaringClass().equals(clazz)) {
        assertTrue("There exists a non-static method:" + method,
            Modifier.isStatic(method.getModifiers()));
      }
    }
  }

Calling hidden generated methods by introspection

During the development of a CSV Unifier I discovered that the bridging methods generated by Java to enable generics are shadowed and hence not available to the programmer other than by introspection.

My model of a CSV sheet implements the Java generic interface Map, ie a sheet is a Map<String, CsvRecord> and a CsvRecord is a Map<String, CsvField>.

The generic interface method Map.put

    V put(K key, V value);

The implementation which, due to generic type erasure, has the same type as the non-generic form.

  @Override
  public CsvField put(String key, CsvField field) {
    if (!field.getColumn().getName().equals(key))
      throw new CsvInvalidKeyException(
          "Key (" + key + ") not equal to " +
          "Field column name (" + 
          field.getColumn().getName() + ")");
    return addField(field);
  }

Bridging methods

During the compilation of generic code Java quietly generates what are called bridging methods these are visible to Cobertura but not to you, the coder, so Cobertura tries to tell you that you have not exercised these methods by marking the class definition line as not covered. The following code will print out all methods including generated bridging methods

for (Method m : CsvTable.class.getMethods()) {
  System.out.println(m.toGenericString());
}
public net.pizey.csv.CsvRecord get(java.lang.Object)
public java.lang.Object get(java.lang.Object)
public net.pizey.csv.CsvRecord put(java.lang.String,net.pizey.csv.CsvRecord)
public java.lang.Object put(java.lang.Object,java.lang.Object)
public net.pizey.csv.CsvRecord remove(java.lang.Object)
public java.lang.Object remove(java.lang.Object)

The put(Object key, Object value) method cannot be accessed normally as it is masked by our generic version.

We can however invoke it using introspection:

public void testBridgingPut() {
  CsvTable t = new CsvTable("src/test/resources/sheet2.csv", 
                 UnificationOptions.LOG);
  Object o = t.get((Object) "1");
  Method method = t.getClass().getMethod("put", 
                    new Class[] { Object.class, Object.class });
  method.invoke(t, "jj", o);
}

Another approach to bridging methods is drastic: do not implement the generic version. The interface Clonable is not a generic interface, but it can be applied to generic classes. The generic version of clone() for the CsvRecord class would return a generic class CsvRecord

  @Override
  public CsvRecord clone() {
    return this.clone(this.getTable());
  }

Unfortunately this would generate a hidden, non-generic, bridging method:

  @Override
  public Object clone() {
    return this.clone(this.getTable());
  }

This bridging method could only be accessed by introspection. In this case I recommend writing the non-generic version in your code, avoiding it being generated for you.

JDBC Exception handling sniper

All calls to the JDBC API throw SQLException. Databases are expensive to mock and best practice is to test against a pure java database such as HSQLDB or Apache Derby. However this will lead either to untested catch blocks or declaring SQLException in your method calls.

To address this I wrote a JDBC Error Injector which enables the programmer to target a particular exception handling block and exercise it.

How it works

Each interface within the API has a decorator whose constructor takes  an instance. Any of the methods which return another instance of the API  will now return a decorated instance.

The decorated instance can be told to throw an Exception, either  whenever it is called or after being called a number of times. This enables you to cover cases that would otherwise be impossible to  cover without a custom mock.

Using a decorated driver

You can decorate any JDBC Driver in your tests such that it returns  a ThrowingConnection  instead of a Connection,  then use the ThrowingConnection  as you would the Connection

package org.melati.poem.dbms.test.sql;

import java.sql.Driver;

import org.hsqldb.jdbcDriver;
import org.melati.poem.dbms.test.sql.ThrowingDriver;

/**
 * A decorated Hsqldb jdbcDriver.
 */
public class HsqldbThrowingJdbcDriver 
    extends ThrowingDriver 
    implements Driver {

  public HsqldbThrowingJdbcDriver() {
    super(new jdbcDriver());
  }
}

The package provides the following classes.

package org.melati.poem.dbms.test.sql;

import java.sql.Driver;

/**
 * A {@link Driver} decorated to throw an SQLException on command.
 *
 */
public class ThrowingDriver 
    extends ThrowingDriverVariant 
    implements Driver {

  /**
   * Constructor.
   * @param d the driver to decorate
   */
  public ThrowingDriver(Driver d) {
    it = d;
  }
}

The code is complicated by conditional inclusion of JDBC3 or JDBC4 code dependant upon JDK version:

package org.melati.poem.dbms.test.sql;

public abstract class ThrowingDriverVariant extends ThrowingDriverJdbc4 {

}
public abstract class ThrowingDriverJdbc4 
    extends ThrowingDriverJdbc3 
    implements Driver {
}

For each method which returns an object from within the API, ie a JDBC object, the shouldThrow method is called to check whether this method should throw an exception on this invocation.

package org.melati.poem.dbms.test.sql;

import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;

/**
 * The JDBC3 members of a {@link Driver}, decorated to throw 
 * an SQLException on command.
 * 
 * @author timp
 * @since  5 Feb 2008
 *
 */
public abstract class ThrowingDriverJdbc3 
    extends Thrower 
    implements Driver {

  Driver it = null;


  /**
   * {@inheritDoc}
   * @see java.sql.Driver#acceptsURL(java.lang.String)
   */
  public boolean acceptsURL(String url) throws SQLException {
    if (shouldThrow(this.getClass().getInterfaces()[0], "acceptsURL"))
      throw new SQLException("Driver bombed");
    return it.acceptsURL(url);
  }

  /**
   * Return the decorated Connection.
   * {@inheritDoc}
   * @see java.sql.Driver#connect
   */
  public Connection connect(String url, Properties info) 
      throws SQLException {
    if (shouldThrow(this.getClass().getInterfaces()[0], "connect"))
      throw new SQLException("Driver bombed");
    return new ThrowingConnection(it.connect(url, info));
  }

  /** 
   * {@inheritDoc}
   * @see java.sql.Driver#getMajorVersion()
   */
  public int getMajorVersion() {
    return it.getMajorVersion();
  }

  /** 
   * {@inheritDoc}
   * @see java.sql.Driver#getMinorVersion()
   */
  public int getMinorVersion() {
    return it.getMinorVersion();
  }

  /** 
   * {@inheritDoc}
   * @see java.sql.Driver#jdbcCompliant()
   */
  public boolean jdbcCompliant() {
    return it.jdbcCompliant();
  }

  /** 
   * {@inheritDoc}
   * @see java.sql.Driver#getPropertyInfo
   */
  public DriverPropertyInfo[] getPropertyInfo(
      String url, Properties info) throws SQLException {
    if (shouldThrow(this.getClass().getInterfaces()[0], "getPropertyInfo"))
      throw new SQLException("Driver bombed");
    return it.getPropertyInfo(url, info);
  }  
}

Each JDBC Class is decorated by extending the Thrower class:

package org.melati.poem.dbms.test.sql;

import java.util.Hashtable;


/**
 * A class which can throw on demand.
 * 
 * @author timp
 * @since 10 Feb 2007
 *
 */
public abstract class Thrower {
  
  static Hashtable throwers = new Hashtable();

  protected Thrower() {}
  
  /**
   * Tell named method to start throwing exceptions.
   * @param i Interface class object
   * @param methodName name in class.methodName format
   */
  public static void startThrowing(Class i, String methodName) {
    String fullName = i.getName() + "." + methodName;
    throwers.put(fullName, new Integer(1));
  }
  /**
   * Tell named method to start throwing exceptions.
   * @param i Interface class object
   * @param methodName name in class.methodName format
   */
  public static void startThrowingAfter(
      Class i, String methodName, int goes) {
    String fullName = i.getName() + "." + methodName;
    throwers.put(fullName, new Integer(1 + goes));
  }
  /**
   * Tell named method to stop throwing exceptions.
   * @param i Interface class object
   * @param methodName name in class.methodName format
   */
  public static void stopThrowing(Class i, String methodName) {
    String fullName = i.getName() + "." + methodName;
    throwers.put(fullName, new Integer(0));
  }
  /**
   * Check whether method should throw, 
   * called once for every method invocation.
   * @param i Interface class object
   * @param methodName name in class.methodName format
   * @return whether method named should throw exception
   */
  public static boolean shouldThrow(Class i, String methodName) {
    String fullName = i.getName() + "." + methodName;
    if (throwers.get(fullName) == null) 
      throwers.put(fullName, new Integer(0));
    int toGo = ((Integer)throwers.get(fullName)).intValue(); 
    if (toGo == 0)  
      return false;
    else { 
      toGo = toGo - 1;
      throwers.put(fullName, new Integer(toGo));
      return toGo == 0 ? true : false;
    }
  }

}

Simple Example

You notice that there is uncovered Exception handling associated  with failure of ResultSet.close() during database initialisation, tested here in the method getDb().

  public void testConnect() {
    ThrowingResultSet.startThrowing(ResultSet.class, "close");     
    try { 
      getDb();
      fail("Should have blown up");
    } catch (SQLSeriousPoemException e) {
      assertEquals("ResultSet bombed",
                   e.innermostException().getMessage());
    }
    ThrowingResultSet.stopThrowing(ResultSet.class, "close");
  }

Sub-classed Test Example

The test (DatabaseTest) is written to test the functionality in the normal way, then subclassed with a ThrowingConnection  to test the exception handling.

public class org.melati.poem.test.throwing.DatabaseTest 
     extends org.melati.poem.test.DatabaseTest {

  private static Database db;

  // connect using throwing driver
  private static Database getDb() {
    db = new PoemDatabase();
    db.connect("m2", "org.melati.poem.dbms.test.HsqldbThrower", 
            "jdbc:hsqldb:mem:m2", 
            "sa", 
            "",
            4);
    assertEquals("org.melati.poem.PoemDatabase",
                 db.getClass().getName());
    assertEquals(4, db.getFreeTransactionsCount());
    return db;
  }

  public void testFirstObject() {
    ThrowingResultSet.startThrowing(ResultSet.class, "next");
    try { 
      super.testFirstObject();
      fail("Should have bombed");
    } catch (SQLSeriousPoemException e) { 
      assertEquals("ResultSet bombed",
                   e.innermostException().getMessage());
    }
    ThrowingResultSet.stopThrowing(ResultSet.class, "next");
  }
}

Throw on Third Call Example

The exception handling you want to excercise is actually the third call  to that method in your test's trajectory to the method under test.

public void testGetObjectInt() {
  ThrowingConnection.startThrowingAfter(
      Connection.class,"prepareStatement", 2);
  try { 
    super.testGetObjectInt();
    fail("Should have blown up");
  } catch (SimplePrepareFailedPoemException e) { 
    assertEquals("Connection bombed", 
                 e.innermostException().getMessage());
  } finally { 
    ThrowingConnection.stopThrowing(Connection.class, 
                                    "prepareStatement");
  }
}

I hope that this is of use, or helps you scratch that itch and that you too can achieve total test coverage.

Thursday 21 June 2012

Debian setup

# used by jenkins
apt-get install daemon
#apt-get install jenkins
apt-get install tomcat6
apt-get install tomcat6-admin
apt-get install curl
apt-get install r-base
apt-get install r-cran-rmysql

apt-get install texlive-xetex
apt-get install texlive-science
apt-get install texlive-latex-extra
apt-get install subversion
apt-get install cvs
apt-get install git
apt-get install wkhtmltopdf
apt-get install okular

apt-get install emacs23-nox

apt-get install jedit

apt-get install git-svn

apt-get install mailutils # smarthost

apt-get install graphviz


R
> install.packages("Rserve")
> install.packages("car") # may not be used
> install.packages("AER") #used by ParasiteClearance
> install.packages("DBI") #used by clinRepRep
> install.packages("RMySQL") #used by clinRepRep
>

Others currently installed on app-dev:
Cairo                   R graphics device using cairo graphics library
                        for creating high-quality bitmap (PNG, JPEG,
                        TIFF), vector (PDF, SVG, PostScript) and
                        display (X11 and Win32) output.
car                     Companion to Applied Regression
Formula                 Extended Model Formulas
lmtest                  Testing Linear Regression Models
sandwich                Robust Covariance Matrix Estimators
strucchange             Testing, Monitoring, and Dating Structural
                        Changes
zoo                     S3 Infrastructure for Regular and Irregular
                        Time Series (Z's ordered observations)


cairoDevice             Cairo-based cross-platform antialiased graphics
                        device driver.
Rserve                  Binary R server

chmod o+w /var/lib/tomcat6/webapps/ cp  /home/timp/.m2/repository/org/springframework/spring-instrument-tomcat/3.0.5.RELEASE/spring-instrument-tomcat-3.0.5.RELEASE.jar /usr/share/tomcat6/lib/
Ensure /etc/tomcat6/tomcat6-users.xml contains
<role rolename="poweruser" />
   <role rolename="poweruserplus" />
   <role rolename="probeuser" />
 
   <user username="admin" password="" roles="manager,admin" />

Get scrollbars in Eclipse:


 sudo su -c 'echo export LIBOVERLAY_SCROLLBAR=0 > /etc/X11/Xsession.d/80overlayscrollbars'

Get innotop working


perl -MCPAN -e shell
CPAN> install Term::ReadKey 

Ensure Rserve and Jenkins start at reboot


cd /etc/init.d
update-rc.d jenkins defaults
update-rc.d Rserve defaults

Set up networking.

Set up GMail as smart host for postfix.

Send me the IP address on boot.

Saturday 9 June 2012

Siemens S16-39 washing machine: code F18 - fixed

Our Siemens S16-39 washing machine started to display F18 and beeping (not the normal 'finished' beep).

We switched it off. Googling suggested that the impeller was broken/unable to move.

Switching back on made a straining noise - switched it off again.

I turned the large pipe (front right bottom) through ninety degrees.

Switched back on, turned to Empty. No bad noise and emptying started.

When empty I placed a towel under outlet and completely undid the large pipe.

There was human hair, a shirt stiffener and piece of slate, removed these and all well!

Friday 1 June 2012

Representing empty CSV columns in a database

The particular case I want to nail down is a CSV file where field values are not quoted:

key,name,options,
1,one,,

The example can be represented as a three field table with non-nullable fields key and name and a nullable field options (yes I know about the trailing comma - they do occur in the wild).

For these purposes the data can be divided into Strings and others, as an empty string ("") is not a legal value for any other type.

The case is straight forward for XML. Paired tags with no content represent a zero length string. Unpaired tags (empty elements) or missing tags represent null.

For CSV the situation is more confused. When importing CSV files there are three possibilities for a column: it is not present, empty or filled with a value.

If the column is missing then the value in the database should be null.

If the column is present and empty and the column type is not String then the value should be Null; for a String column the value might be either an empty string or null: we have to decide.

Both string treatments can be argued for:

  1. A present, empty column, represented by two consecutive commas, is just the limiting case of a normal string and so represents a string of zero length. This ensures that no null value will ever enter that column.
  2. String handling should be consistent with the handling of other types where an empty column represents null, hence an empty string is an illegal value in a non-nullable column.

The problem with position 1. is that there is no way to represent null,
the problem with position 2. is that there is no way to represent the empty string.

Our hand is forced, as it happens, by the behaviour of MySQL, which in characteristic fashion chooses the wrong option.

MySQL chooses option 2, so if we wish to gain nulls we must either hand craft the import or post-process the column.

I have chosen to post process all nullable string columns to convert empty strings to null.

I do not believe there is a use case for storing zero length strings in nullable database columns.

Thursday 17 May 2012

Networking, and hence sshd, does not start until after login on Ubuntu

I updated my Ubuntu system and restarted it, as prompted, and left the office.

When I got home I could not ssh into my work machine.

This was because I had not logged into the machine and so networking and hence sshd had not started.

To get networking started before login add the following to /etc/network/interfaces

auto eth0
iface eth0 inet dhcp

Wednesday 2 May 2012

Setup Postfix to use Gmail as a smarthost

If you wanted Jenkins to send emails via gmail for instance.


 sudo apt-get install mailutils 

 echo timp.well.ox.ac.uk > /etc/mailname
 emacs /etc/postfix/main.cf

smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
html_directory = /usr/share/doc/postfix/html

 emacs /etc/postfix/sasl_passwd # create password file

smtp.gmail.com GMAIL_USER_NAME@gmail.com:GMAIL_USER_PASSWORD

 postmap hash:/etc/postfix/sasl_passwd
 /etc/init.d/postfix reload

Send me the DHCP allocated IP address on boot

Occasionally I reboot, occasionally I get allocated a new IP address.

Create /etc/init.d/sendIpToTimP :

#!/bin/sh                                                                                                                                      
ifconfig eth0 \ 
 |grep "inet addr" \
 |cut -f2 -d":" \
 |cut -f1 -d" " \ 
 | mail timp@paneris.org -s "Work IP address" 


 update-rc.d sendIpToTimP defaults

Thursday 19 April 2012

Configuring MySQL

The defaults that MySQL comes with are probably not what you want.

We got caught by this at work and followed the instructions given in How to calculate a good InnoDB log file size

Edit /etc/mysql/my.cnf to reflect:
query_cache_limit = 4M
query_cache_size  = 32M
table_cache       = 768

innodb_open_files = 512

innodb_buffer_pool_size = 512M

innodb_additional_mem_pool_size = 512M
innodb_log_file_size = 128M
innodb_log_buffer_size = 8M
innodb_thread_concurrency = 8
innodb_concurrency_tickets = 500
innodb_lock_wait_timeout=200
innodb_autoinc_lock_mode= 2
innodb_commit_concurrency=4
innodb_flush_log_at_trx_commit=2
innodb_support_xa=false
innodb_checksums=0
innodb_doublewrite=0
innodb_max_dirty_pages_pct=15

Discover the location of datadir typically datadir = /var/lib/mysql

service mysql stop
mv /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile0_old
mv /var/lib/mysql/ib_logfile1 /var/lib/mysql/ib_logfile1_old
service mysql start

Thursday 9 February 2012

Jenkins, Github and Blogger make sharing workable

I was refactoring a project, which would entail deleting a little R script which I could see myself wanting to use again.

Step one: create a new repository on GitHub

Step two: Checkout the repository locally, add the code, test, commit.

Step three: Create a new project in your Jenkins CI installation (if you do not have Jenkins, go sign up for a free account at CloudBees)

Write up at Blogger - here we are!

Friday 20 January 2012

MySQL VIEW error "Table doesn't exist"

mysql> select table_name
    -> from information_schema.tables
    -> where table_type='VIEW';
+---------------+
| table_name    |
+---------------+
| ClinicalDrugs |
| Studies       |
| StudySites    |
| pkAnalytes    |
| pkSamples     |
| pkdetails     |
+---------------+
6 rows in set (0.00 sec)
mysql> select * from chassisDb.pkDetails;
ERROR 1146 (42S02): Table 'chassisDb.pkDetails' doesn't exist
The listing gives the named view but we are told it does not exist. I was also able to provoke
mysql> select * from viewname;
ERROR 1356 (HY000): View 'dbname.viewname' references invalid table(s) or 
column(s) or function(s) or definer/invoker of view lack rights to use them

If you try to access the view from JDBC then you will see

java.sql.SQLException: View 'chassisDb.Studies' references invalid table(s) or 
column(s) or function(s) or definer/invoker of view lack rights to use them
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)

This is caused by MySQL's weird, newish, security defaults for the CREATE VIEW command. 

create view Studies as 
select * from Entry, Study where Study.EntryId = Entry.Id;
is translated, after defaults are applied, to
CREATE VIEW SQL SECURITY DEFINER Studies AS 
SELECT  Entry.Id, Study.Id FROM Entry, Study WHERE Study.EntryId = Entry.Id;
If you now try to use this view as any user other than the definer of the view you will get the error above. What you wanted, and have to write explicitly to avoid the default, is
CREATE VIEW SQL SECURITY INVOKER Studies AS 
SELECT  Entry.Id, Study.Id FROM Entry, Study WHERE Study.EntryId = Entry.Id;

This page in the hope that it helps!

Thursday 19 January 2012

Simple MySQL reporting database

We have created an XML based repository of clinical trials study data.

To report upon this we create an isomorphic MySQL database using HyperJAXB. Our main configuration of hyperjaxb is to use an information preserving naming convention plugin.

The builds are under Continuous Integration. The database is created during the course of the Maven build, after this views are created and the database dumped.

mysqldump chassisDb  --complete-insert --skip-opt --add-drop-table  > $file

The dump file is then copied to an Amazon Web Services T1 instance, which only has MySQL installed. Two users exist: reader and uploader.

These two users have usernames and passwords configured in .my.cnf in their home directories containing

[client]
user=user
password=password

uploader has a cron job configured to update the database from any uploaded files:

mysql chassisDb < chassisDb_20120119.sql 

The database can now be accessed using the Pentaho reporting tool.