Exploring Local
Mike Dobson of TeleMapics on Local Search and All Things Geospatial

Gate Keepers, Digital Gazetteers and Folksonomies – Part Five

March 17th, 2009 by MDob

Last time I ended up with this –

Well, I think we have some of the thought behind Google’s strategy, but now we should turn to how Google is determining the accuracy of its UGC, its fitness for use, updating these data and committing to collecting and maintaining them in the long run. Let’s look at these issues next time and hopefully start weaving this back to “authoritativeness” in mapping.

So, how is Google determining the accuracy of its UGC?

Measuring the accuracy of spatial data, not to mention the accuracy of spatial data that has been mapped, is a difficult process. Van Oort discusses many of the problems with measuring accuracy here and in his PhD dissertation. Although van Oort is much more careful with his definitions than I will be in this blog, he focuses on error (the differences between the original data, the collected data and the representation of those data), vagueness (e.g. rules about data collection), and ambiguity (disagreements about what constitutes the truth) as components of spatial accuracy. Certainly a mix of these components can be found in Google’s Map Maker data. Measuring this error may be very difficult.

For example, one could reasonably argue that the map data that have been collected for Google by people using Map Maker are newer and possibly more comprehensive than many of the “authoritative” sources in the countries where the data have been collected. In fact, these data may be the standard for comparison. Hmm, what to do?

While I suspect that Google has created a number of functions to test the data they receive against the imagery they provide, it does little good to speculate, because I just do not have a way to evaluate their data. So, let’s turn to another example that, I think, probably bears on all data collected through UGC.

During preparation for my presentation on Spatial Data Quality and Neogeography for an online series produced by David Unwin, I became familiar with the work of Muki Haklay. His presentation can be found at the link above and you may want to check out his excellent blog.

Muki and his partners have spent a great deal of time examining the correspondence between the OpenStreetMap database of the UK and the Meridian database produced by the Ordnance Survey. In his comprehensive study “How good is OpenStreetMap information?” Dr. Haklay concludes that OSM information is “… fairly accurate: on average within 6 metres of the position recorded by the OS, and with approximately 80% overlap of the motorway objects between the two datasets.” He continued by noting that in four years OSM had captured 29% of the area of England. Another of his findings was that OSM’s data were more comprehensive in urban areas than in rural areas.
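For readers who like to see the mechanics, here is a minimal sketch (in Python, assuming the shapely library) of the buffer-overlap idea behind comparisons like this one: buffer the reference geometry by a tolerance and measure how much of the test geometry falls inside that corridor. This is not Dr. Haklay’s actual procedure; the function name, coordinates and tolerance are mine, invented purely for illustration.

```python
# A minimal sketch of a buffer-overlap comparison between a reference road
# geometry and a crowdsourced one. The geometries below are invented; a real
# comparison would use projected coordinates in metres over whole datasets.
from shapely.geometry import LineString

def overlap_share(reference: LineString, test: LineString, tolerance_m: float) -> float:
    """Fraction of the test line's length lying within tolerance_m of the reference line."""
    corridor = reference.buffer(tolerance_m)      # tolerance corridor around the reference road
    inside = test.intersection(corridor).length   # length of the test line inside the corridor
    return inside / test.length if test.length else 0.0

# Hypothetical stand-ins for an OS Meridian motorway segment and its OSM counterpart.
os_segment = LineString([(0, 0), (500, 10), (1000, 0)])
osm_segment = LineString([(0, 4), (500, 14), (1000, 5)])

print(f"Share of the OSM segment within 6 m of the OS segment: "
      f"{overlap_share(os_segment, osm_segment, 6.0):.0%}")
```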

While the urban/rural road/population dichotomy is a difficult issue for TeleAtlas and Navteq (see images below), it may not be as much of a difficulty for Google, even though the accuracy of its Map Maker data will decline as some function of the distance between population centers.

The distribution of road lane miles and population for the United States

The distribution of paved roads by type worldwide

I think the need for uniform coverage may be less of a problem for Google because it seems reasonable to assume that the majority of its revenues from Local Search result from searches in urban areas, with revenue contributions related to the size of the population. If UGC is similarly skewed toward population centers, then UGC might give Google better data where it needs it most and less relevant data where it needs it least.

In other words, the shortcomings of UGC for navigation companies may not be limitations for Google, but perhaps we need to look at the fitness-for-use issue. Most companies that are collecting mapping data on the scale of Google are doing so to create navigation databases. To create navigation databases, these companies need to compile and update the types of information shown in the following illustration. (By the way, this image was “borrowed” from the Ertico site on GDF and is buried somewhere within that site. I noticed it years ago and use it here without attribution, only because I do not know who originally created it, although I suspect it was a Navteq employee.)

Generalized schema of layers in a map/navigation database

As noted, compiling the layers of data shown in the illustration is of interest to those companies creating map databases for navigational uses. Google, however, does not need navigation-quality data to meet its goals. Although it is interested in routing, it may need to collect only limited data to meet this need (connectivity, directionality, turn restrictions, etc.). Similarly, it benefits from some path information, but does not need the detail required by a Navteq or TeleAtlas.
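To give a feel for how thin that routing requirement could be compared with a full navigation database, here is a toy sketch of just those three attributes: connectivity, directionality and turn restrictions. Every name, edge and restriction in it is hypothetical; a real navigation layer also carries lane counts, speed categories, time-dependent restrictions and much more.

```python
# A toy, hypothetical sketch of the routing attributes mentioned above:
# connectivity (which segments meet at which nodes), directionality (one-way
# flags expressed as directed edges), and turn restrictions (banned
# node-to-node movements).
from collections import defaultdict

# Directed edges: (from_node, to_node). A two-way street appears in both
# directions; a one-way street appears once.
edges = [("A", "B"), ("B", "A"),   # two-way segment
         ("B", "C"),               # one-way segment B -> C
         ("C", "D"), ("D", "C")]

# Turn restriction: arriving at B from A, you may not continue to C.
banned_turns = {("A", "B", "C")}

graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)

def allowed_moves(prev, node):
    """Successors of `node` reachable without violating a turn restriction."""
    return [nxt for nxt in graph[node] if (prev, node, nxt) not in banned_turns]

print(allowed_moves("A", "B"))  # ['A'] -- the through movement to C is banned
```

The only point of the sketch is that connectivity, one-way flags and banned turns can be represented with very little structure compared with the full stack of layers in the illustration above.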

While Google is interested in some geopolitical data, this interest appears important only to the extent that it helps the company deliver customers to addresses. Since most non-address data elements in the geopolitical layer are not visible, they are not easily gathered by UGC (at least, not without potentially infringing a copyright or two), and this may be why geopolitical data are missing from so many of the maps created in Map Maker. It would appear that Google is interested in collecting POIs, but perhaps not concerned with the elements of the cartography layer. Of course, many of these “map” elements are not visible, or perhaps only marginally visible, from streets or roads and are another set of features not easily gathered by UGC. Similarly, the layers Special Features and Customer Specific Features (both the bane of most mapping companies) are likely not required by Google, since these do not fall within its own needs for advercartography (and I use the term respectfully).

Now let’s get to how Google might approach map updating and committing to spatial data maintenance in the long run.

Wait, this thing is getting too long again.

Let me leave you with a tasty morsel that might keep you interested in where I am going with this series. Muki’s study of OSM’s UK database revealed that “Of the users, 26 contributed over 50% of the data, and 92 contributed 80% of it. There are many hundreds of users who only contributed a few nodes.” (p. 19). Perhaps more interesting is Muki Haklay’s finding that collaboration in the project was low, with 51.3% of the features in OSM having been mapped by a single contributor (not that one person mapped them all, but that each of those features was touched by only one person). Let’s be clear here: this means that collaboration was low. Dr. Haklay noted that the content for 89.5% of the total area of the database was covered by three or fewer users. Wow, is this really the reward of crowdsourcing? You may want to take a look at Muki’s slideshow on this topic at http://povesham.wordpress.com/tag/openstreetmap/.
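If you want to play with numbers like these yourself, here is a small, hypothetical sketch of how such concentration figures can be derived from per-contributor edit counts. The counts below are invented, not Haklay’s data, and the function name is mine.

```python
# A hedged sketch: given per-contributor edit counts, find the smallest number
# of contributors whose edits account for a given share of the total.
def contributors_for_share(edit_counts, share):
    """Smallest number of contributors whose edits cover `share` of the total."""
    counts = sorted(edit_counts, reverse=True)  # biggest contributors first
    total = sum(counts)
    running, n = 0, 0
    for c in counts:
        running += c
        n += 1
        if running >= share * total:
            return n
    return n

edits = [5000, 3000, 1200, 400, 200, 90, 50, 30, 20, 10]  # hypothetical per-user node counts
print(contributors_for_share(edits, 0.50))  # contributors needed for 50% of the data
print(contributors_for_share(edits, 0.80))  # contributors needed for 80% of the data
```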

Next time let’s think about what these kinds of numbers mean for compilation and maintenance of spatial databases over time. Maybe that will help us predict possible strategies for dealing with these issues.

By the way, thanks to Duane Marble, I saw a reference to the new and emerging field of Neocartography. If it is as promising as neoneurosurgery, I’m in. Maybe it’s that all these “neo” people have never been to a library and do not know that books actually exist on subjects like GIS, geography and cartography. But then, I guess that does not fit the “It’s about me” mold. Anyway, I am likely to blog about Neocartography when I finish with Gate Keepers, which I hope to do next time, maybe the time after. Stay tuned.


Posted in Authority and mapping, Data Sources, geographical gazetteers, Geospatial, Google, Google Map Maker, Local Search, map updating, Mapping, Mike Dobson, Navteq, place based advertising, routing and navigation, TeleAtlas, User Generated Content

