Exploring Local
Mike Dobson of TeleMapics on Local Search and All Things Geospatial

Google And Map Updating – Part II

January 14th, 2010 by MDob

Last time, I stuck the term “Ground-truth surrogate” into the list of terms that I felt we needed to define in order to move forward with our exploration of Google and map database updating. The definition of this term is – “Reference source of sufficient completeness and/or accuracy that may be substituted for field verification when measuring the completeness and/or accuracy of map database features, attributes or properties.”
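To make the idea concrete, here is a minimal sketch (mine, not anything Google has published) of how completeness might be scored against a ground-truth surrogate. The midpoint matching, the haversine distance and the 15-meter tolerance are all illustrative assumptions for the example.

```python
# Minimal sketch (illustrative, not Google's method): scoring the completeness of a
# compiled road layer against a ground-truth surrogate. Segment midpoints, the
# haversine matcher, and the 15 m tolerance are assumptions made for this example.
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def completeness(compiled_midpoints, reference_midpoints, tolerance_m=15.0):
    """Share of reference features that have a counterpart in the compiled database."""
    if not reference_midpoints:
        return 1.0
    matched = sum(
        1 for ref in reference_midpoints
        if any(haversine_m(ref, comp) <= tolerance_m for comp in compiled_midpoints)
    )
    return matched / len(reference_midpoints)

# Example: two of three reference road midpoints are found in the compiled layer.
reference = [(40.0000, -105.0000), (40.0010, -105.0010), (40.0100, -105.0100)]
compiled = [(40.00001, -105.00002), (40.00101, -105.00099)]
print(completeness(compiled, reference))  # ~0.67
```

The same framing works for accuracy: instead of asking whether a counterpart exists, you measure how far each matched pair is apart.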

The problem for producers of map databases is that no available source of information is comprehensive enough to serve as the catch-all for verifying the completeness and/or accuracy of all features, attributes and properties across a compiled map database – other than the earth itself. In turn, the traditional approach to verifying a compilation of information assembled for mapping, navigation and route guidance is to field check the quality of its content and usability. While field checking is undertaken to reveal specific errors and misinterpretations in the database, such efforts also reveal the shortcomings of the sources, sensors and methods used to create the database.

I think we need to evaluate the techniques and data sources that were used to compile the Google-Mapbase. More specifically, I will discuss their limitations, as this will help us understand what Google will need to do to remedy these shortcomings. (For those of you who have joined this blog recently, we covered the sources used to create the Google base here.)

What are the weaknesses of the techniques used by Google to create the Google-Mapbase?

Limitations of Satellite and other aerial imagery

Lack of uniformity – collected using variable specifications and imaged with platforms exhibiting different imagery characteristics.
Not up-to-date – collected at different points in time.
Obscurations and obstructions preclude accurate mapping – coverage used by Google can be obscured by clouds, and road detail is sometimes difficult to sense (manually or automatically) because obstructions in the line of sight (including leaf-on photography) hinder object discovery and identification.
Spatial resolution inadequacies
Spectral resolution inadequacies
Heterogeneous approach – the imagery varies in quality, specification and currency.

Take a look at some of these problems – all can be found on Google Maps.
Three image sources in the same scene

Just where does Route 281 end? I guess Route 281 is just one of those roads that starts and stops a lot.

And underneath the clouds?

Street View
Not up-to-date – two generations of sensors and specifications.
Not comprehensive – favors high-population density areas and major cities.
Resolution inconsistencies – imagery characteristics vary between early sensors and current sensors.
Cultural rejection – not all areas may be accessible to the Street View platforms (cars, bikes, etc.). See this article and this for potential restrictions.

Conflation
We discussed some of the limitations of conflation in our last blog. Suffice it to say, conflating data often produces errors of one sort or another (a short sketch of why appears after this list). Common problems include:
Lack of homogeneity in data quality across the database
Data sets vary in data content
Data sets vary in data quality
Data sets vary in precision of measurement
Mixing unique data sets potentially produces errors
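To illustrate the kind of error I mean, here is a small conflation sketch. It is an assumption for the purposes of discussion, not Google's pipeline; the projected coordinates, the nearest-feature matching and the 25-meter tolerance are all made up for the example.

```python
# Minimal conflation sketch (an assumption, not Google's actual pipeline): attach
# attributes from a second source to a geometry layer by nearest-feature matching.
# Coordinates are assumed to already be projected to meters; the 25 m tolerance is
# illustrative. Differing positional precision between sources is what produces the
# conflation errors listed above: true matches are missed or wrong pairs are made.
from math import hypot

def conflate(geometry_points, attribute_records, tolerance_m=25.0):
    """Return (matched pairs, unmatched attribute records).

    geometry_points: list of (x, y) road nodes from the base geometry layer
    attribute_records: list of (name, (x, y)) features from the secondary source
    """
    matched, unmatched = [], []
    for name, pt in attribute_records:
        nearest = min(geometry_points, key=lambda g: hypot(g[0] - pt[0], g[1] - pt[1]))
        if hypot(nearest[0] - pt[0], nearest[1] - pt[1]) <= tolerance_m:
            matched.append((nearest, name))
        else:
            unmatched.append(name)  # a positional mismatch leaves a hole in the merged database
    return matched, unmatched

# A 40 m offset between sources (e.g., differing precision) fails the match:
geometry = [(0.0, 0.0), (100.0, 0.0)]
records = [("Main St", (5.0, 3.0)), ("Route 281", (140.0, 0.0))]
print(conflate(geometry, records))  # Main St matches; Route 281 is left unmatched
```

Loosen the tolerance and the second problem appears instead: features begin pairing with the wrong neighbors.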

User Generated Content Related Compilation Processes

UGC is an uncontrolled ground-truth surrogate. There are at least three sources of UGC used by Google: contributed edits to Google Maps, contributed maps to Google Map Maker, and probe data from persons using Google’s navigation app on cell phones.

In general, the people who contribute UGC exhibit self-selection (they update what interests them), limited spatial focus/interest, and often provide map edits for self-benefit. While none of these factors are unexpected, they do make it difficult to direct the data collectors to resolve specific problems by geographic areas.

In addition, user priorities may lead to unreliability, the usability of the contributed data may be low, and there may be biases in the responses. With enough contributors these limitations may become moot. However, there are some important questions that we cannot answer about this process. One is, “How many updates does Google receive from users of its maps?” Another is, “What is the spatial distribution of these users and their edits?”

Probe data can provide different benefits to map revision efforts, since mining the probe data provides a considerable amount of detail on the current configuration of roadways, their geometries, their flow speed and directionality, etc. However, once again, the user is self-directed and their probe-paths reflect self-focus/interest. Of course, the value of probe data will increase as the number of users of Android phones and Google’s navigation application increases, especially if that distribution is widespread in terms of spatial coverage. At present, the benefit to Google of probe data is likely to be insignificant for map updating.
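For readers who want something concrete, here is a hypothetical sketch of the kind of mining I have in mind. It is not Google's implementation; the field names, the snap-to-segment step and the crude heading test are all assumptions.

```python
# Hypothetical sketch of mining probe data (not Google's implementation): GPS probe
# fixes are bucketed by the road segment they snap to, then reduced to an average
# speed and a directionality guess. The field names and the snap step are assumed.
from collections import defaultdict

def summarize_probes(probe_fixes):
    """probe_fixes: list of dicts with 'segment_id', 'speed_kph', 'heading_deg'."""
    by_segment = defaultdict(list)
    for fix in probe_fixes:
        by_segment[fix["segment_id"]].append(fix)

    summary = {}
    for seg_id, fixes in by_segment.items():
        avg_speed = sum(f["speed_kph"] for f in fixes) / len(fixes)
        # Crude directionality test: do all fixes head within 90 degrees of the first one?
        ref = fixes[0]["heading_deg"]
        same_way = sum(1 for f in fixes
                       if abs((f["heading_deg"] - ref + 180) % 360 - 180) < 90)
        summary[seg_id] = {"fixes": len(fixes),
                           "avg_speed_kph": round(avg_speed, 1),
                           "likely_one_way": same_way == len(fixes)}
    return summary

probes = [
    {"segment_id": "A", "speed_kph": 52, "heading_deg": 88},
    {"segment_id": "A", "speed_kph": 47, "heading_deg": 92},
    {"segment_id": "A", "speed_kph": 50, "heading_deg": 271},  # opposite direction -> two-way
]
print(summarize_probes(probes))
```

The obvious catch, as noted above, is coverage: segments that attract no probes get no summary at all.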

Improving the Google-Mapbase

So, with all of these potentially flawed data gathering techniques, what is Google going to do to improve the quality of its Google-Mapbase?

Let’s take a short step back. NAVTEQ, TeleAtlas and Google used many of the same techniques to create their map/navigation/route guidance capable databases. Google created its database faster than its two competitors by taking advantage of tools and capabilities that did not exist when NAVTEQ and TeleAtlas started compiling map databases for navigation over twenty-five years ago.

NAVTEQ and TeleAtlas have had a longer period to refine their databases. In general, the belief in the industry is that NAVTEQ, who uses fieldwork both in the collection and evaluation of their data, has slightly better data quality than TeleAtlas in the United States and Europe. One surrogate for this ranking is the number of adoptions of one database or the other by manufacturers of in-car navigation systems, a field dominated by NAVTEQ.

Now, let’s turn to how Google, a newcomer, might catch up with the market leaders. I suppose we could make the argument that Street View is a form of field collection and that its results could be used as a form of field verification. I suspect that Street View will be part of the solution, but the significant event is something different. (At this point, I want to thank Allan Snell, who sent me this link to Tim O’Reilly’s blog on the new Nexus-One phone from Google, as this article helped crystallize my thinking on the map updating topic.)

Let’s spend a second contemplating the source of Google’s success. In the article linked to above, O’Reilly states his belief that “Collective intelligence is the secret sauce of Web 2.0…” He includes a telling quote from Peter Norvig, Google’s Chief Scientist, who once told Tim “We don’t have better algorithms. We just have more data.” Change that quote to “We don’t have better algorithms. We just have more map-related data.” Now you have the difference between Google’s approach to map updating, as contrasted with that of the other players in the mapping marketplace.

Let’s take a look.

Here is the layer cake that describes how Google created its Mapbase. As you may remember, I postulated that Google started its mapping efforts with satellite and other aerial imagery, added in what it learned from Street View and UGC, did some data mining to fill in the gaps, and conflated new data onto its geometry layer to create the Google-Mapbase.

Google's map compilation process

Next, I present a figure that shows how Google will update the Google-Mapbase. (The colors used to identify a process are the same above and below, but their different positions show the change in the importance of a category in the map revision process.) Note that UGC will drive the map update process. It is my belief that UGC will be used to help Google understand the quality of its data and the spatial distribution of that quality. UGC will fundamentally influence all other map update processes for Google.

The process Google will use to update it map base

Now let’s look at an illustration of how Google will use UGC to direct their update process.

How UGC will direct Google's map update process

Click here for a PDF of this image (which you can zoom to your heart’s content). As noted above, UGC will become the bandleader indicating the what and where of data gathering efforts needed to improve the Google-Mapbase.
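Here is one way to read the “bandleader” idea in code. This is my speculation, not a documented Google system; the tile keys, the signal names and the weights are illustrative assumptions.

```python
# Hedged sketch of the "bandleader" idea (my reading, not a known Google system):
# UGC signals per map tile -- user edit reports, Map Maker contributions, and
# disagreement between probe traces and the existing geometry -- are combined into
# a score that ranks where imagery, Street View, or field effort should go next.
# The tile keys, signal names, and weights are all illustrative assumptions.

def rank_tiles_for_update(tile_signals, weights=None):
    """tile_signals: {tile_id: {'edit_reports': int, 'mapmaker_edits': int,
                                'probe_mismatch_km': float}}"""
    weights = weights or {"edit_reports": 1.0, "mapmaker_edits": 0.5, "probe_mismatch_km": 2.0}
    scores = {
        tile: sum(weights[k] * v for k, v in signals.items())
        for tile, signals in tile_signals.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tiles = {
    "tile_urban_1": {"edit_reports": 40, "mapmaker_edits": 12, "probe_mismatch_km": 3.5},
    "tile_rural_7": {"edit_reports": 2,  "mapmaker_edits": 0,  "probe_mismatch_km": 0.2},
}
print(rank_tiles_for_update(tiles))  # the urban tile outranks the rural one
```

The point of the sketch is simply that UGC supplies the ranking, while the other collection methods supply the fixes.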

Finally, the following illustration shows why Google could dominate the map database market.

User Generated Content will unlock a wealth of rich map updating information for Google.

It is this UGC Tower of Power that will allow Google to progress in map updating and quality faster than most people suspect. The key here is that Google has more map data, collected by more people on the ground, than any existing map database company could possibly gather by any method. I realize that TeleAtlas will want to argue about this, but Google either leads them now or soon will.
One more thing about the Tower of Power – remember back in grade school there was always some “brain” in the class who seemed to know more about everything than anyone else? Well, perhaps, you also remember those days when “braino” crapped out and the teacher began posing questions to the entire group about the subject.

It was at this point that the magic started, as a number of people each seemed to know a slightly different piece of information that, when combined, accurately described the issue of interest. Google realizes that our actions online can become a repository capable of reflecting the knowledge that we have about things that interest us in the real world. In turn, the collective intelligence of all of its users can provide valuable, authenticated, spatial information to help in updating the Google-Mapbase. (Now you know why I don’t like the term “Volunteered Geographic Information” – much of the information we will provide Google will not be what we intended to volunteer!)
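As a toy illustration of how many partial answers could be combined into an authenticated edit, consider the sketch below. It is an assumption on my part, not a documented Google process, and the thresholds are arbitrary.

```python
# Toy consensus sketch of the "many partial answers" idea (an assumption, not a
# documented Google process): independent user reports about the same road feature
# are tallied, and an edit is accepted only when enough contributors agree.
from collections import Counter

def consensus_edit(reports, min_reports=3, min_agreement=0.7):
    """reports: list of proposed values for one attribute, e.g. a street name."""
    if len(reports) < min_reports:
        return None  # not enough collective signal yet
    value, count = Counter(reports).most_common(1)[0]
    return value if count / len(reports) >= min_agreement else None

print(consensus_edit(["Oak Ave", "Oak Ave", "Oak Avenue", "Oak Ave"]))  # "Oak Ave"
print(consensus_edit(["Oak Ave", "Elm St"]))                            # None, too few reports
```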

How much improvement will occur and how long it will take to realize the benefits of “collective intelligence” are separate issues. Let’s talk about the details behind these diagrams next time. (I may be on the road next week (talk about waiting for last-minute contracts), but I will do my best to get the next blog out to you while I am traveling.)

Oh, there is another fly in the ointment for Google. It seems that Microsoft is proposing to create a new, comprehensive imagery base of the United States at a resolution of 1 foot. It is attempting to pull in states as its partners and will allow them to use the data for an extremely attractive one-time price. Now why, do you suppose, would Microsoft be interested in having a homogeneous, current, large-scale imagery base of the United States? Hmmmm. Maybe because no one else has one! At least, not yet.

If you have not taken a look at Microsoft’s new Bing mapping package, it has some great features. Perhaps of more importance, their imagery base in Europe is much better than that currently provided by Google. Maybe something fun is shaping up?




