Wednesday 13 April 2011

Here come the DIPs

Posted - Bob Chell: Three GeoDATA events down (London, Birmingham and Leeds) - review and retrospective time.

To begin with, I am presenting a subject that I am passionate about, but one that, to a certain degree, people have always been happy to ignore or pretend is not there - Data Quality. However, the reaction during my presentation, and more importantly afterwards in conversations with delegates, shows that in the spatial world this is right at the top of their to-do lists.

In the end, two of the most important assets of any organisation are its staff and its data. As well as the focus on efficiency, people are being encouraged to make their data publicly available. This is a strategic goal, but with it comes a set of responsibilities, because people will be accountable for that data. To become more efficient, organisations are also looking to consolidate systems and streamline how they work.

The nature of the spatial data that I deal with joins up many different types of systems, managed by and for all sorts of personas. This means that most of my clients work in data-intensive businesses. More and more, spatial data accuracy, spatial data cleansing and spatial data quality health checks are being mentioned in publications that are not focussed on users of geospatial technology - take a look in any 2011 issue of Government Computing and you'll see this. Businesses are drawing a direct line from poor-quality spatial data to missing revenue or lost efficiency - missing or inaccurate address records, for example, resulting in lost income tax or missed New Homes Bonus payments.

As soon as people have made this connection between the data and its business value, they start to see some tangible quick wins. Typically, an organisation might have 100,000 address records in a single system. When you go through the numbers, even if that organisation can make data quality improvements of just 2 or 3 per cent, that still adds up to thousands of improved records, which you can put a financial value on. It then becomes possible to decide whether the benefit of improving the data outweighs the cost of the improvement process itself.
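To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. The per-record value and cleansing cost are purely hypothetical figures chosen for illustration, not numbers from any client:

# Back-of-the-envelope value of a data quality improvement.
# The per-record value and cost figures are illustrative assumptions only.
total_records = 100000      # address records held in a single system
improvement_rate = 0.03     # 3 per cent of records corrected
value_per_record = 25.0     # assumed benefit per corrected record, in pounds
cleansing_cost = 40000.0    # assumed cost of the improvement exercise, in pounds

corrected = total_records * improvement_rate    # 3,000 records
benefit = corrected * value_per_record          # 75,000 pounds
print("Corrected records: %.0f" % corrected)
print("Estimated benefit: %.0f against a cost of %.0f" % (benefit, cleansing_cost))
print("Worth doing" if benefit > cleansing_cost else "Not worth doing")

The exact numbers matter less than the shape of the calculation: once a value per corrected record is agreed, the cost/benefit question becomes simple arithmetic.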

However, the data-intensive nature of the spatial world means that, without some form of automation, it could take years for a business to get an understanding of the quality of its data. Unless these tasks are automated in a lean and efficient way, the business case for improving the data is not always there.

I tried to make the GeoDATA events educational, and took the opportunity to talk people through getting organised and setting up a Data Quality Process. It is a data-driven process rather than a product-driven one. It shares key concepts with many other Data Quality Processes, but that is because there are key things you must do to ensure everything works efficiently. We encourage people to take a positive approach to this.
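For anyone who missed the talk, the loop below is my own simplified sketch of what a data-driven quality process looks like, written in Python purely for illustration - it is not 1Spatial's formal methodology and it is not how Radius Studio is configured. The idea is simply to measure conformance against a set of rules, apply automated fixes, and measure again:

# A simplified, hypothetical data quality loop: assess, fix, re-assess.
def assess(records, rules):
    """Count how many records fail each named rule."""
    return {name: sum(1 for r in records if not check(r))
            for name, check in rules.items()}

def improve(records, rules, fixes, passes=3):
    """Repeat the assess/fix cycle until the data is clean or the passes run out."""
    for _ in range(passes):
        report = assess(records, rules)
        print("Non-conformances:", report)
        if not any(report.values()):
            break
        for fix in fixes:
            records = [fix(r) for r in records]
    return records

# Toy, made-up records to show the loop converging.
records = [{"postcode": " st16 2lp "}, {"postcode": "ST17 4AH"}]
rules = {"postcode tidy": lambda r: r["postcode"] == r["postcode"].strip().upper()}
fixes = [lambda r: {"postcode": r["postcode"].strip().upper()}]
improve(records, rules, fixes)

Whatever the automated fixes cannot repair stays in the report, which is exactly the list you want for manual follow-up.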

The GeoDATA events continue in Dublin and Edinburgh, so now that I know people like what we are saying at 1Spatial, I need to get our marketing team to print some more A5 flyers with nothing more than the Data Quality Improvement Cycle on them. People can really relate to it, and it offers genuine guidance on how to start their own programmes of work.

Monday 4 April 2011

What’s in an Address?

Posted - Chris Wright: Recently I have been working with one of our customers, Staffordshire County Council (SCC), on address matching. The area of addressing is one not traditionally ‘addressed’ by 1Spatial, as in some cases there is no geocode and hence the data is deemed to be non-spatial. However, our Radius Studio solution is just as proficient with non-spatial data as it is with spatial data.
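To give a flavour of what matching free-text addresses involves when there is no geocode to lean on, here is a generic sketch in Python. Radius Studio expresses this sort of logic through its own rules and matching engine, so treat the code as the idea rather than the product:

# Generic illustration of fuzzy address comparison, not Radius Studio syntax.
import re
from difflib import SequenceMatcher

def normalise(address):
    """Uppercase, strip punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    return re.sub(r"\s+", " ", cleaned).strip()

def similarity(a, b):
    """Return a 0-1 similarity score between two address strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

print(similarity("12, High St., Stafford", "12 HIGH STREET STAFFORD"))   # roughly 0.9

In practice a threshold on that score, combined with exact checks on individual components such as the postcode, decides whether a pair of records counts as a match.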


SCC needed to provide accurate address locations to the emergency services, enabling them to carry out safety checks on homes within their areas in a co-ordinated manner. This was all part of their Data Quality Mission Statement. However, no specific details about levels of data quality were available, so we helped SCC conduct an initial baseline assessment of their data. This assessment confirmed that the data was of varying quality - much of the information did not appear to meet any standard address format such as BS7666. The BS7666 standard then gave us the basis for building a useful catalogue of rules for addressing this area.
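The rules themselves live in Radius Studio, but to show the kind of structural checks a BS7666-flavoured catalogue contains, here is a hypothetical, cut-down version in Python. The field names and the simplified postcode pattern are my own illustrations, not the SCC rule set:

# Hypothetical structural checks in the spirit of BS7666; not the real catalogue.
import re

POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")   # simplified pattern

rules = {
    "has building number or name": lambda r: bool(r.get("paon") or r.get("building_name")),
    "has street": lambda r: bool(r.get("street")),
    "has town": lambda r: bool(r.get("town")),
    "postcode well formed": lambda r: bool(POSTCODE.match(r.get("postcode", "").upper())),
}

def baseline(records):
    """The baseline assessment: count failures per rule across the whole data set."""
    return {name: sum(1 for r in records if not check(r)) for name, check in rules.items()}

Counting failures per rule is what turns ‘varying quality’ into numbers that can be reported, tracked and prioritised.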

After the data quality baseline assessment was completed, we helped SCC move to the next stage and supported them in putting a Data Quality Management process in place. This stage of the exercise meant we could:
  • Retain the original address data entries
  • Automate the validation and correction of as many errors in the data as possible
  • Find the exact/best match to trusted national address datasets
  • Add value to the data by adding a Spatial Reference
  • Provide an indication of ‘confidence’ level of match
The whole process validated around 77,000 addresses against 3,500,000 national address records; the match-and-enrich step is sketched below.
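Reduced to its essentials, and with the data structures, similarity measure and scoring all standing in as illustrative assumptions rather than the configuration used for SCC, that step looks something like this:

# Rough sketch of matching local addresses to a trusted national data set.
from difflib import SequenceMatcher

def best_match(local_address, national_records):
    """Return the closest national record and a 0-1 confidence score."""
    def score(rec):
        return SequenceMatcher(None, local_address.upper(), rec["address"].upper()).ratio()
    match = max(national_records, key=score)
    return match, score(match)

def enrich(local_records, national_records):
    for rec in local_records:
        match, confidence = best_match(rec["address"], national_records)
        yield {
            "original": rec["address"],          # original entry retained
            "matched": match["address"],         # best national match
            "easting": match["easting"],         # spatial reference added
            "northing": match["northing"],
            "confidence": round(confidence, 2),  # indication of match quality
        }

A brute-force comparison like this would never cope with 77,000 records against 3,500,000 candidates; in practice the candidate set is narrowed first, for example by postcode or street, before anything is scored.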
Working through the exercise allowed us to realise a number of benefits for the customer.
  • Data Conformance - Using the rule-based methodology we were able to identify errors within the data, including syntax/typing errors, invalid characters, invalid postcodes and redundant records.
  • Data Reconciliation - Using the action capabilities within Studio we were able to fix common problems in the data automatically, such as replacing or removing invalid syntax; a couple of these actions are sketched below.
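To illustrate what those automated actions amount to, here they are expressed as plain Python functions. In Radius Studio they are configured as rules and actions rather than hand-coded, and the character sets and formats below are assumptions for the sake of the example:

# Illustrative reconciliation actions, not the configuration used for SCC.
import re

def fix_address_syntax(record):
    """Remove invalid characters and collapse repeated whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9,'&/ ]", " ", record["address"])
    record["address"] = re.sub(r"\s+", " ", cleaned).strip()
    return record

def fix_postcode(record):
    """Uppercase the postcode and restore the space before the final three characters."""
    pc = record.get("postcode", "").replace(" ", "").upper()
    record["postcode"] = pc[:-3] + " " + pc[-3:] if len(pc) >= 5 else pc
    return record

def drop_redundant(records):
    """Remove duplicate records that share the same address and postcode."""
    seen, kept = set(), []
    for r in records:
        key = (r["address"], r.get("postcode"))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept

Records that cannot be repaired automatically are left flagged for manual review, in keeping with correcting as many errors as possible rather than promising to fix them all.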
I have just started to look at the new National Address Gazetteer (NAG), which will replace the National Land and Property Gazetteer (NLPG) and Address Layer 2 in the fullness of time. I’ll post more on this in the near future.