Friday, 29 July 2011

Data Sharing: The Quality Dilemma

Posted - Matt Beare: Earlier this year the ESDIN project concluded: a collaborative project, "Underpinning the European Spatial Data Infrastructure with a best practice Network", that has occupied much of my time since 2008.

Throughout this period, the research, the meetings with people, the development of new ideas and the application of best practice have afforded me the opportunity to learn about the spatial data sharing aims of INSPIRE and the needs of the community it seeks to bring together. Importantly, it has taught me to look upon INSPIRE not as an end goal, but as a facilitator, a means to a greater good. That greater good will be different for everyone, which means everyone needs to work out what it means to them, their business, their nation or our environment.

So at this year's INSPIRE conference I was encouraged to see many seeking to do just this, contemplating "going beyond INSPIRE" (an often-used phrase during the conference). In doing so, the conference stimulated debate around whether too much prescription will stifle innovation, or whether specification and compliance are necessary to ensure data is combinable, accessible and usable.

Recent blogs, such as those from Rob Dunfey and Don Murray, have continued the dialogue and offered further observations on this matter and the need to be practical.

I can empathise with both sides of the debate, and in my own presentation on "Driving Government Efficiency with Improved Location-based Data" I condensed the data sharing challenges that INSPIRE encourages us to address into five points:
  1. Facilitate collaboration (bringing people together)
  2. Increase availability (publicising what data exists and how to get it)
  3. Ease accessibility (use the web for what it's good at, accessing information quickly)
  4. Ensure compatibility (ensuring common exchange and knowledge of what the data means)
  5. Improve quality (understanding fitness for purpose)
Ultimately I feel all are important if the full potential of data sharing is to be realised, but I also understand that there are benefits to be had in approaching the challenges in sequence.

I think most would agree that INSPIRE is succeeding in the first of these, mobilising governments, organisations and individuals to engage with each other and seize the opportunity to, quite simply, achieve what they have long needed to achieve: to share data more intelligently in order to better accomplish existing and future business needs.

We now have the prospect of succeeding in the second and third of these but, as one side of the debate suggested, only if we don't get too bogged down in the specifics of the fourth. That's not to say that data doesn't need to be compatible and combinable, but in the first instance just get the data out there. This in itself gave rise to an interesting discussion around being swamped in unintelligible data versus having too little intelligible information. In reality we need both: the innovators amongst us will do wonders with the former, whilst decision makers and policy makers need the latter (and more of it).

So on to the fifth challenge, quality: is this another obstacle to data availability, or is it crucial to making good decisions and achieving business objectives? Again the answer is both. The conference plenaries gave insight into the range of viewpoints, with quotes like "undocumented data quality leads to unmeasured uncertainty" (Leen Hordijk, Joint Research Centre) and "accessibility is more important than data quality" (Ed Parsons, Google).

The concern for many is that the fear of having data categorised as "bad quality" will mean that data providers may withhold data, which runs counter to the aspirations of INSPIRE and other directives, like PSI, that seek the effective re-use of public sector information.

But what is quality? ISO 9000 defines it as the "Degree to which a set of inherent characteristics fulfils requirements".

So for the user scenario that no one knows about yet, there are no requirements, and therefore quality has no immediate importance. But as soon as a use case becomes known, the question "is the data fit for that purpose?" becomes pertinent.
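To make that definition concrete, here is a minimal, hypothetical sketch (the dataset characteristics, thresholds and function names are invented for illustration, not taken from any INSPIRE specification): the same dataset can satisfy one set of requirements and fail another, and "quality" only becomes meaningful once those requirements exist.

```python
# Hypothetical example: "quality" only becomes meaningful once requirements exist.
# The documented characteristics of a dataset are fixed; whether it is "good
# quality" depends entirely on the use case being asked of it.

dataset_characteristics = {
    "positional_accuracy_m": 2.5,  # documented accuracy in metres
    "completeness_pct": 97.0,      # percentage of real-world features captured
}

def fit_for_purpose(characteristics, requirements):
    """Return (is_fit, failures): fit only if every stated requirement is met."""
    failures = []
    for name, (kind, threshold) in requirements.items():
        value = characteristics[name]
        ok = value <= threshold if kind == "max" else value >= threshold
        if not ok:
            failures.append(f"{name}={value} does not meet {kind} {threshold}")
    return not failures, failures

# Two hypothetical use cases with different requirements for the same data.
regional_planning = {"positional_accuracy_m": ("max", 10.0), "completeness_pct": ("min", 90.0)}
utility_excavation = {"positional_accuracy_m": ("max", 1.0), "completeness_pct": ("min", 99.0)}

print(fit_for_purpose(dataset_characteristics, regional_planning))   # (True, [])
print(fit_for_purpose(dataset_characteristics, utility_excavation))  # (False, [...])
```

The point is simply that "good" and "bad" are properties of the pairing of data and requirements, not of the data alone.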

Any application, analysis or decision using data, without knowledge of its suitability to the business purpose at hand, will carry an unknown element of risk. This risk needs to be balanced against the value of the application or the decision being made, which relates to the presentation by Roger Longhorn (Compass Informatics) on assessing the value of geo data, in which he asks whether the value of data is determined by the value of the services it supports or of the decisions being made. Here value and quality become inextricably linked: one will, and should, demand quality if the services provided or the decisions being made are of value.

So "quality is important", but it's not until you know what business purpose it fulfils that it really comes of value. It is then that data providers need to be able to react to these new user needs, as fresh customer-supplier relationships are formed, and provide information and assurances on the quality of data for those known purposes.

That's why here at 1Spatial we make quality our business, providing automated data validation services and improvement processes, enabling data providers and custodians of data to rapidly respond to new and changing user requirements.
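As a rough illustration of the general idea (the features, rules and names below are invented, and this is a sketch rather than a description of our actual implementation), consider rule-based validation in which each rule reports the features that break it, so that a new user requirement can be met simply by adding a rule:

```python
# Hypothetical sketch of rule-based data validation: run declarative rules over
# feature records and report violations, so custodians can respond to new or
# changing requirements by adding rules rather than reworking the pipeline.

features = [
    {"id": 1, "type": "building", "height_m": 12.0, "name": "Town Hall"},
    {"id": 2, "type": "building", "height_m": -3.0, "name": ""},   # breaks both rules
    {"id": 3, "type": "road",     "height_m": None, "name": "High Street"},
]

rules = {
    "buildings must have a positive height":
        lambda f: f["type"] != "building" or (f["height_m"] or 0) > 0,
    "buildings must have a non-empty name":
        lambda f: f["type"] != "building" or f.get("name", "").strip() != "",
}

def validate(features, rules):
    """Apply every rule to every feature and collect the ids that violate it."""
    return {description: [f["id"] for f in features if not rule(f)]
            for description, rule in rules.items()}

for description, offending_ids in validate(features, rules).items():
    print(f"{description}: {'OK' if not offending_ids else offending_ids}")
```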

So, if the services you provide and the business decisions you make are of value, then value the data that underpins the services and informs the decisions, and demand knowledge of its quality (specific to your requirements), enabling you to trust the data, manage the risk and share with confidence.

I'd like to know what you think. How important is data quality to you? Is it seen as just a technical exercise or is your data a corporate asset that is relevant and valued by your business?

1 comment:

  1. Matt - thanks for your post. I was directed to this site by a 19 August article in Vector 1 by Jeff Thurston.

    I wrote a comment there. We tend to divide participants into providers and users, but in my experience everyone is both. The key, it seems to me, is to get a standard approach to updating metadata at every stage, a metadata continuum. With ESRI and other organizations placing huge quantities of data online for free, I am reminded of Winston Churchill's comment that "A lie is halfway round the world before the truth gets its pants on"!

    I absolutely agree that "undocumented data quality leads to unmeasured uncertainty" (not sure that "unmeasured uncertainty" isn't a redundant statement) - I quite like "Geography without Geodesy is a Felony"!

    I really think that the tough part is to get schools (as in grade schools), technical colleges and universities to provide a basic level of curriculum in all disciplines that have a need for surveying and mapping, and indeed perhaps some that don't, at first blush, appear to. For instance, I think lawyers should be given some grounding in the principles of good mapping and how it affects boundary mapping and ownership in both surface and subsurface contexts, as well as the law of the sea....

    A second line of attack needs to be in the software. The metadata capture needs to be a continuum that is not application dependent. The language should be common, and the software needs to automatically capture changes and add them to a common record of the spatial provenance. I am not at all knowledgeable about how to use the cloud for this, and in my mind that may add another layer of complexity to a problem that might be better solved at a simpler level first.
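    To illustrate the kind of continuum I mean, here is a rough, hypothetical sketch (the field names and record structure are mine, not any OGC or INSPIRE schema): each operation appends a lineage entry rather than overwriting what came before, so provenance travels with the data across applications.

```python
# Hypothetical sketch of a metadata continuum: each processing step appends a
# provenance record, so lineage accumulates rather than being overwritten.
# Field names are illustrative only, not an OGC or INSPIRE metadata schema.

from datetime import datetime, timezone

def record_step(metadata, operator, application, operation, quality_note=""):
    """Append one lineage entry describing who did what, where, and to what effect."""
    metadata.setdefault("lineage", []).append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "application": application,
        "operation": operation,
        "quality_note": quality_note,
    })
    return metadata

metadata = {"dataset": "parcel_boundaries", "lineage": []}
record_step(metadata, "survey team", "field capture", "initial survey",
            "positional accuracy approx. 0.1 m")
record_step(metadata, "GIS analyst", "desktop GIS", "reprojected to WGS84",
            "reprojection may degrade the stated accuracy")

for step in metadata["lineage"]:
    print(step["timestamp"], "|", step["application"], "|", step["operation"])
```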

    Is there an OGC or INSPIRE standard for software that creates a standard approach for automatic capture and ongoing building of a metadata continuum through a given operational workflow, across several users/suppliers and across application boundaries and transfers? In another discussion on LinkedIn, I got the impression that everyone loves applications like FME, but I was not sure of the level of discernment among non-specialists using these types of applications with respect to their impact on spatial quality. In other words, ease of use can be counter-productive to the delivery of known quality.

    I like the approach I took at Devon, which was to evaluate as much incoming data as I was allowed to see before it was used, and then use that information to review datasets on a regular basis and raise alerts when data had changed without documentation. Somewhat manual, though, and therefore somewhat expensive, but not compared to a catastrophe caused by failing to understand quality.
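    As a rough sketch of that kind of check (the file names and metadata convention are invented for illustration), the idea is to fingerprint a dataset at each documented review and raise an alert whenever the content changes without a matching update to its documentation:

```python
# Hypothetical sketch: detect datasets that have changed since their last
# documented review by comparing a stored fingerprint with the current one.

import hashlib
import json
import pathlib

def fingerprint(data_path):
    """Hash the dataset file so later, undocumented changes can be detected."""
    return hashlib.sha256(pathlib.Path(data_path).read_bytes()).hexdigest()

def check_for_undocumented_change(data_path, metadata_path):
    """Alert if the data changed but the metadata's recorded fingerprint did not."""
    metadata = json.loads(pathlib.Path(metadata_path).read_text())
    if fingerprint(data_path) != metadata.get("last_reviewed_fingerprint"):
        print(f"ALERT: {data_path} has changed since its last documented review")
    else:
        print(f"{data_path}: no undocumented change detected")

# Example usage (assumes these illustrative files exist):
# check_for_undocumented_change("wells.csv", "wells.metadata.json")
```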

    There seems to me to be a need for all data types to have a standard definition of data quality on, say, a 1-10 scale, with a specific description of what each level means. This could be a pretty big project in one industry, let alone across several industries and government applications. As you say, getting people to actually use it is another matter!

    Hope this is helpful. I may be thinking about quality in a different context than you intended. I also may have missed some work that has already been done in this area. The O&G industry sometimes lags a lot of other areas, and/or our needs vary from other industries.
