Friday 29 July 2011

Data Sharing: The Quality Dilemma

Posted - Matt Beare: Earlier this year the ESDIN project concluded, a collaborative project "Underpinning the European Spatial Data Infrastructure with a best practice Network" that has occupied much of my time since 2008.

Throughout this period, the research, the meeting of people, the development of new ideas and the application of best practice have afforded me the opportunity to learn about the spatial data sharing aims of INSPIRE and the needs of the community it seeks to bring together. Importantly, it has taught me to look upon INSPIRE not as an end goal, but as a facilitator, a means to a greater good. A greater good that will be different for everyone, which means everyone needs to work out what it means to them or their business or their nation or our environment.

So at this year's INSPIRE conference I was encouraged to see many seeking to do just this, contemplating "going beyond INSPIRE" (an often-used phrase during the conference). In doing so, the conference stimulated debate around whether too much prescription will stifle innovation, or whether specification and compliance are necessary to ensure data is combinable, accessible and usable.

Recent blogs, such as those from Rob Dunfey and Don Murray, have continued the dialogue and offered further observations on this matter and the need to be practical.

I can empathise with both sides of the debate, and in my own presentation on "Driving Government Efficiency with Improved Location-based Data", I condensed the data sharing challenges that INSPIRE encourages us to address into five points:
  1. Facilitate collaboration (bringing people together)
  2. Increase availability (publicising what data exists and how to get it)
  3. Ease accessibility (use the web for what it's good at, accessing information quickly)
  4. Ensure compatibility (ensuring common exchange and knowledge of what the data means)
  5. Improve quality (understanding fitness for purpose)
Ultimately I feel all are important if the full potential of data sharing is to be realised, but I also understand that there are benefits to be had in approaching the challenges in sequence.

I think most would agree that INSPIRE is succeeding in the first of these, mobilising governments, organisations and individuals to engage with each other and seize the opportunity to achieve, quite simply, what they have long needed to achieve: to share data more intelligently in order to better accomplish existing and future business needs.

We now have the prospect of succeeding in the second and third of these, but as one side of the debate suggested, only if we don't get too bogged down with the specifics of the fourth. That's not to say that data doesn't need to be compatible and combinable, but in the first instance just get the data out there. This in itself gave rise to an interesting discussion around being swamped in unintelligible data versus having too little intelligible information. In reality we need both: the innovators amongst us will do wonders with the former, whilst decision makers and policy makers need the latter (and more of it).

So on to the fifth challenge – quality – is this another obstacle to data availability, or is it crucial to making good decisions and achieving business objectives? Again the answer is both. The conference plenaries gave insight into the viewpoints this poses, with quotes like "undocumented data quality leads to unmeasured uncertainty" (Leen Hordijk, Joint Research Centre) and "accessibility is more important than data quality" (Ed Parsons, Google).

The concern for many is that the fear of having data categorised as "bad quality" will mean that data providers withhold data, which runs counter to the aspirations of INSPIRE and other directives, such as the PSI Directive, that seek the effective re-use of public sector information.

But what is quality? ISO 9000 defines it as the "Degree to which a set of inherent characteristics fulfils requirements".

So for the user scenario that no one knows about yet, there are no requirements, and therefore quality has no immediate importance. But as soon as the use case becomes known, the question of "is the data fit for that purpose?" becomes pertinent.
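
To make that concrete, here is a minimal sketch of how reported quality might be compared against the requirements of a known use case. The quality measures, field names and thresholds are hypothetical, chosen for illustration rather than drawn from any particular specification:

    # Hypothetical quality measures, as a data provider might document them.
    reported_quality = {
        "positional_accuracy_m": 2.5,    # RMS positional error in metres
        "attribute_completeness": 0.97,  # proportion of features with mandatory attributes filled
        "currency_days": 90,             # age of the most recent update
    }

    # Hypothetical requirements for one specific use case.
    use_case_requirements = {
        "positional_accuracy_m": 5.0,    # must be 5 m or better
        "attribute_completeness": 0.95,  # at least 95% complete
        "currency_days": 365,            # no older than a year
    }

    def fit_for_purpose(reported, required):
        """Return a list of the requirements the dataset fails to meet."""
        failures = []
        # Lower is better for accuracy and currency; higher is better for completeness.
        if reported["positional_accuracy_m"] > required["positional_accuracy_m"]:
            failures.append("positional accuracy")
        if reported["attribute_completeness"] < required["attribute_completeness"]:
            failures.append("attribute completeness")
        if reported["currency_days"] > required["currency_days"]:
            failures.append("currency")
        return failures

    failures = fit_for_purpose(reported_quality, use_case_requirements)
    print("Fit for purpose" if not failures else "Not fit for purpose: " + ", ".join(failures))

The same dataset could pass for one use case and fail for another; it is the requirements, not the data alone, that determine fitness for purpose.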

Any application, analysis or decision using data, without knowledge of its suitability to the business purpose at hand, will carry an unknown element of risk. This risk needs to be balanced against the value of the application or the decision being made, which relates to the presentation by Roger Longhorn (Compass Informatics) on assessing the value of geo data, in which he asks whether the value of data is determined by the value of the services it supports or of the decisions being made. Here value and quality become inextricably linked: one will value, and should demand, quality if the services provided or the decisions being made are of value.

So "quality is important", but it's not until you know what business purpose it fulfils that it really comes of value. It is then that data providers need to be able to react to these new user needs, as fresh customer-supplier relationships are formed, and provide information and assurances on the quality of data for those known purposes.

That's why here at 1Spatial we make quality our business, providing automated data validation services and improvement processes, enabling data providers and custodians of data to rapidly respond to new and changing user requirements.
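
As an illustration only, and not a description of any particular product, an automated validation pass might express such requirements as simple rules applied to every feature, producing a report that a data custodian can act on as user needs change. The features and rules below are invented for the example:

    # Illustrative rule-based validation: each rule pairs a description with a
    # check applied to every feature; violations are collected into a report.
    features = [
        {"id": 1, "name": "High Street", "road_class": "A", "length_m": 420.0},
        {"id": 2, "name": "",            "road_class": "Z", "length_m": -5.0},
    ]

    rules = [
        ("name must not be empty",       lambda f: bool(f["name"].strip())),
        ("road_class must be A, B or C", lambda f: f["road_class"] in {"A", "B", "C"}),
        ("length must be positive",      lambda f: f["length_m"] > 0),
    ]

    def validate(features, rules):
        """Return (feature id, failed rule) pairs for every violation found."""
        report = []
        for feature in features:
            for description, check in rules:
                if not check(feature):
                    report.append((feature["id"], description))
        return report

    for feature_id, problem in validate(features, rules):
        print(f"feature {feature_id}: {problem}")

Because the rules are declared separately from the data, they can be revised whenever a new user requirement emerges, and the same data re-assessed against the new definition of "fit for purpose".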

So, if the services you provide and the business decisions you make are of value, then value the data that underpins the services and informs the decisions, and demand knowledge of its quality (specific to your requirements), enabling you to trust the data, manage the risk and share with confidence.

I'd like to know what you think. How important is data quality to you? Is it seen as just a technical exercise or is your data a corporate asset that is relevant and valued by your business?