Data Quality and Local Search – Quid Est Veritas?
Hope all of you had a Happy Easter.
Many interesting things continue to happen in the world of geospatial, but over the past few weeks my attention has been on the basics. After all, Geography has now been discovered by, well, just about everyone, and spatial data and analysis are as common as Starbucks shops (or maybe as common as Starbucks shops were two years ago). However, before we hold a victory celebration, maybe we need to go back and look at …data quality.
As those of you who have been reading this blog for a long time know, I wrote a series of blogs on this topic in 2007. Now, almost two years later and starting on my second hundred blogs (yep, the last one was number 100), I am going to take a new look with slightly different focus.
I started thinking about this based on an article sent to me by Duane Marble on the problems the LA Times found in a crime mapping system used by the Los Angeles Police Department. I presume they spent over $300K on the system to help them understand the pattern of crime in Los Angeles. Unfortunately, two inept contractors apparently never tested the crime mapping system, and neither did the LAPD (of course, it is possible that LAPD actually stands for Lost All Positional Data).
Through the sharp eyes of the LA Times reporters, it was revealed that the most dangerous place to be in Los Angeles was a block-face near the Times Building, City Hall and the new LAPD headquarters. Oh, wait, the reporters discovered that this actually was a relatively safe location and the crime wave at City Hall (oops, I meant near City Hall – sorry Mayor Villaraigosa) was just a geocoding error that neither the system provider nor the data provider (Psomas) ever checked.
Apparently crimes in Los Angeles were being mapped in Lancaster (way far away), Catalina Island or allocated to a block near City Hall when the geocoding process did not provide a valid result. In order to correct geocodes that cannot be mapped correctly due to some mismatch, the data gurus responsible for the project have now decided to take advantage of the distance decay factor and to map them at the lat/lon of 0,0 in the Atlantic Ocean, due south of Accra, Ghana, in the Gulf of Guinea. Perhaps they were hoping these crimes would be claimed by the Somali pirates, but of course, they placed the crimes on the wrong side of the continent for that to happen.
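Dumping every failed geocode at (0,0) just trades a visible error for an invisible one. A safer pattern is to flag unmatched records explicitly and keep them off the map entirely. Here is a minimal sketch of that idea in Python; the field names, the 80-point match-score threshold, and the sample records are my own illustrative assumptions, not anything from the actual LAPD system:

```python
# Sketch: flag failed geocodes instead of dumping them at lat/lon (0,0).
# Field names and the 80-point match threshold are illustrative assumptions.

MATCH_THRESHOLD = 80  # minimum geocoder match score we are willing to trust

def classify_geocode(record):
    """Return a copy of the record tagged as 'matched' or 'unmatched'.

    Unmatched records get lat/lon set to None so they can never be drawn
    on a map as a phantom crime wave at City Hall -- or in the Atlantic.
    """
    score = record.get("match_score", 0)
    lat, lon = record.get("lat"), record.get("lon")
    if score >= MATCH_THRESHOLD and lat is not None and lon is not None:
        return {**record, "status": "matched"}
    return {**record, "lat": None, "lon": None, "status": "unmatched"}

crimes = [
    {"id": 1, "address": "100 W 1st St", "lat": 34.05, "lon": -118.24, "match_score": 95},
    {"id": 2, "address": "UNKNOWN", "lat": 0.0, "lon": 0.0, "match_score": 10},
]
cleaned = [classify_geocode(c) for c in crimes]
unmatched = [c for c in cleaned if c["status"] == "unmatched"]
```

The point of the design is that the unmatched pile is a visible work queue for a human to fix, rather than a sentinel coordinate that quietly pollutes the map.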
I am sure placing the crimes off the coast of Africa is not a public relations coup for the LAPD or pleasing to the residents of western Africa. Moreover, I am sure it is inconvenient for the LAPD to find the erroneous data whenever they want to have a quick look. Perhaps there is some contractor wisdom here? Perhaps it is “If you cannot see the error, it must not be there.” (No, I did not steal the quote from Johnnie Cochran or OJ)
Seems to me that this was another good use for that Homeland Security money. Perhaps the geocoding errors were actually paid for by Homeland Security to scare off potential terrorists who might have been planning an attack on City Hall. I can hear the terrorists now: “It’s just too dangerous in downtown Los Angeles. Let’s go to the OC.”
Geocoding errors and data quality seem to plague most mapping applications, but are especially pernicious in Local Search, store finders, brand finders and other systems that pretend to deliver consumers to buying opportunities. Open almost any Internet mapping application and look at some areas you know, and you will find map errors, geocoding errors and other data quality errors. How can it be so bad?
Since I used Starbucks as one of the examples in my original articles on this topic, I took a look at the current Starbucks website and noticed that their store locator contact information now lists the nearest cross street. That seems like a good idea. Imagine the increase in mapping accuracy if the officers reporting crimes in Los Angeles entered the closest cross streets as well. Yep, it would be an additional step, but it should improve the quality of the mapping. Wouldn’t it?
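A cross street is useful precisely because it gives you a second, independent location to check the first one against: if the geocoded address lands miles from the reported intersection, something is wrong. A minimal sketch of that sanity check, assuming we already have coordinates for both points (the sample coordinates and the 500-meter tolerance are my own illustrative choices):

```python
# Sketch: use the nearest cross street as a sanity check on a geocode.
# If the geocoded point falls too far from the reported intersection,
# flag the record for human review. The coordinates and the 500 m
# tolerance below are illustrative assumptions.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def plausible(point, cross_street_point, tolerance_m=500):
    """True if the geocoded point lies within tolerance of the intersection."""
    return haversine_m(*point, *cross_street_point) <= tolerance_m

# A point in downtown LA versus one mis-geocoded to Lancaster, ~70 km away:
downtown = (34.0522, -118.2437)
lancaster = (34.6868, -118.1542)
intersection = (34.0510, -118.2450)  # hypothetical nearest cross street
```

With data like this, `plausible(downtown, intersection)` passes while `plausible(lancaster, intersection)` fails, which is exactly the kind of automatic check that would have caught a Lancaster crime plotted at City Hall.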
So, I queried Starbucks for stores in Laguna Hills and got this map.
You will notice that the listings include a cross street as a store identifier, followed by an address and contact information. Now, most people would think that Starbucks would know where its shops are located, but my experience tells me this is an unlikely scenario. In fact, it appears that neither Starbucks nor Microsoft knows all the locations. Nor does Google. Let’s look into this next time, because the continuation of the data quality problem is both an embarrassment and an opportunity.