Blame it on …Google!
(Editor’s note: I usually print my blogs and edit them before they are posted. However, I do not have access to a printer while on the road in Europe – so please forgive the typos I did not catch. I started sketching out this blog in Vienna last week. I kept cycling back to Google and their mapping program. I wrote some notes in the middle of the night and have put this blog together in Munich, where I had to pay 18€ for the Internet connection that I used to send it to you. I hope you enjoy it.)
Over here in Europe you do not hear about Google maps every day and I guess I am having some sort of withdrawal. Just in the nick of time my friend Duane Marble sent me a blurb from the Las Vegas Sun indicating that Google had misnamed Henderson, Nevada, calling it Rochester. One of the comments on the article said the location was neither of those names and should be called Hidden Valley Ranch. Oh, Google, how could you have gone so wrong so quickly?
If Google wants to be in the mapping business, it better develop a thick skin and hire an authoritative spokesperson to handle the public relations fiascos that are ahead. As many of you know, I was Chief Cartographer at Rand McNally for over a decade and read and responded to many of the consumer letters we received about real or perceived errors on the company’s cartographic products. Ouch!
From a cartographer’s perspective I am interested in examining the compilation process behind the new Google-Base (my term for their navigable map database available for the United States), since modeling their compilation process may tell us a lot about the integrity of the Google-Base and the problems that may lie ahead of Google with respect to map updating. As you know, Google is incredibly secretive. For that reason, I propose a model of the Google Map Compilation Process that I am developing just for the fun of it.
Let’s start this examination by looking at Figure 1. I do not have any inside information on what Google is actually doing in map compilation to support the Google-Base, but I suspect that they are following the path shown in the illustration. My intuition and experience tell me that the boxes in the illustration are in the correct sequence. In addition, note that each successive stage is used to augment the information provided in the previous stage and to correct any errors that are resolvable using their algorithmic approach to map compilation.
Algorithms? Yep, you are not going to find any warehouses full of people digitizing maps for Google – at least that’s the corporate story. I suspect, however, that they are creating an increasingly large stable of the type of non-algorithmic map editors known as humans. However, the question that surrounds Figure 1 is “Has Google managed to create a virtuous map compilation cycle that is as accurate and automated as is practicable at this time?” It is going to take a lot of noodling to come to a conclusion on this question and we won’t get to the total answer today. However, let’s make a start.
Most map compilation processes are similar to cutting trees, then producing lumber, followed by storing and aging the lumber in warehouses prior to distribution. Unfortunately, spatial data should be regarded as a perishable product that starts aging once it is collected. The faster a company can get spatial data into the distribution channel in the form of products, the better for all concerned, especially the end-user of these data.
As an example, many Personal Navigation Devices are provisioned with a new extract from a navigation database when the PND model is created, and the map database is never updated again. Paper maps are even worse, as the data is usually collected six months before the product hits the distribution chain and the maps sit in a rack, sometimes for years, before anyone buys them.
Due to problems of integrating navigation data with other data used in an application, some navigation system providers release updates only quarterly and others even less frequently. It has long been the goal of NAVTEQ to find a way to provide real-time data to its customers, so that they could have the latest updates as soon as the new updates were committed to the master database. NAVTEQ has had difficulty in achieving this goal, as its customers apparently do not see the value, or perhaps reward, of updating the data in their products that frequently.
Now, stage left, Google enters the scene. From a simplistic point of view, Google can be thought of as a company that has realized NAVTEQ’s goal. Google serves the best data they have from the “cloud” and can refresh it, augment it, correct and extend it whenever they feel a need to do so. Unlike NAVTEQ, Google controls its distribution mechanism. I think Google may be the first provider of navigable map databases, or even map databases, to completely control its own distribution network. While this may not sound the least bit interesting to you, it changes the game for everyone!
In a sense, Google is a new paradigm in mapping and navigation, since the company has created a new notion about how to update maps and has implemented new and unique technologies to support their vision. To Google, mapping and navigation are all about selling advertising and, as we have seen from their most recent financials, selling ads is big business. This is important because Google has, presumably, decided to spend tons of money getting the map compilation process just right.
Now Back to the Model
So, let’s return to general comments on Figure 1. While the use of high resolution aerial imagery (including satellite) is the first step in Google’s compilation process, the data derived from this process will require enhancement and Street View is the likely source for augmenting and extending these data, at least where Street View coverage exists.
Next, it seems likely that Google must turn to sources that it regards as “authoritative” to fill in the areas where its imagery base is lacking (in terms of coverage, detail and content). It is likely that the preferred sources are national mapping agencies (e.g. the U.S. Geological Survey) or other federal agencies (e.g. the U.S. Bureau of the Census) that create map databases as part of their mandate. Google also solicits data from state and local governments, which are widely regarded as the authorities on data that is more local than national in scope and scale.
As you can imagine, the data from the sources used by Google exhibit a wide variety of accuracy, comprehensiveness and age. Note that access to some of these data is based on collaborative agreements, while other relationships are based on contractual licenses from vendors specializing in specific types of data (e.g. parcel boundaries and related attributes). I suspect that in some countries Google has or will have to create joint ventures, under the guidance provided by national governments, to create map databases (not to mention navigable map databases).
Google’s compilation process ingests the data from these various sources using a process called conflation, which is optimized/trained to match road and street geometries and other positional information between the Google-Base and the data source, as a precursor to evaluating, assigning and transferring detailed attributes (e.g. road class, street address, boundaries, road attributes, navigation attributes, etc.) to Google’s road geometry.
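The heart of that conflation step – match geometries between two datasets, then transfer attributes onto the preferred geometry – can be sketched in a few lines. This is strictly a toy illustration under my own assumptions (a crude vertex-to-vertex Hausdorff-style distance, flat dictionaries, a made-up tolerance), not Google’s actual pipeline:

```python
import math

def point_dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def directed_hausdorff(a, b):
    """Crude directed Hausdorff distance between polylines, vertex-to-vertex."""
    return max(min(point_dist(p, q) for q in b) for p in a)

def conflate(base_segments, source_segments, tolerance=0.001):
    """For each base segment, find the closest source segment within
    tolerance and copy its attributes onto the base geometry."""
    for seg in base_segments:
        best, best_d = None, tolerance
        for src in source_segments:
            d = max(directed_hausdorff(seg["geom"], src["geom"]),
                    directed_hausdorff(src["geom"], seg["geom"]))
            if d <= best_d:
                best, best_d = src, d
        if best is not None:
            seg["attrs"].update(best["attrs"])
    return base_segments

# A base segment with geometry but no attributes, and a nearly
# coincident source segment carrying a street name and road class.
base = [{"geom": [(0, 0), (1, 0)], "attrs": {}}]
src = [{"geom": [(0, 0.0002), (1, 0.0002)],
        "attrs": {"name": "Main St", "road_class": 5}}]
print(conflate(base, src)[0]["attrs"])  # → {'name': 'Main St', 'road_class': 5}
```

Real conflation engines must cope with split/merged segments, differing generalization levels and conflicting attribute values, which is exactly where the “evaluating” part of the process earns its keep.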
After the conflation process has completed, Google likely evaluates the match between its store of user generated content and the Google-Base that has resulted from the first three stages of the process shown in Figure 1. All of these inputs (suggested corrections, Google Map Maker, probe data from Android phones, etc.) get mixed into the Google-Base subject to some set of quality assurance procedures and fed into applications that require mapping, routing and navigation.
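One plausible shape for that quality-assurance gate is to hold a user-suggested correction until enough independent reports agree on the same value. The threshold, the keying and the data layout below are my own illustrative assumptions, not anything Google has disclosed:

```python
from collections import defaultdict

def apply_ugc(base, reports, min_reports=3):
    """Toy QA gate: apply a user-suggested attribute change only once
    at least min_reports distinct users have proposed the same value."""
    votes = defaultdict(set)
    for r in reports:
        votes[(r["segment_id"], r["field"], r["value"])].add(r["user"])
    for (seg_id, field, value), users in votes.items():
        if len(users) >= min_reports:
            base[seg_id][field] = value
    return base

# Three independent users report the same naming error.
base = {"s1": {"name": "Rochester"}}
reports = [{"segment_id": "s1", "field": "name",
            "value": "Henderson", "user": u}
           for u in ("a", "b", "c")]
print(apply_ugc(base, reports))  # → {'s1': {'name': 'Henderson'}}
```

A single unconfirmed report would leave the Google-Base untouched – which is the point, and also why low-traffic areas with few contributors are slow to heal.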
So, let’s start again at the top with a little more detail –
I have reasoned that Google must be leveraging its image database to create the geometry for the road networks in its map database. As far as we know, Google has not yet launched its own satellite for imaging and must be relying on suppliers (via licenses) or public domain imagery to create the street and road geometry for its map database. Those of you who have spent some time with Google Maps and Google Earth realize the enormous variability in the quality (accuracy and resolution) and the age of the imagery used by Google to represent geographies across the world. What this means from a compilation perspective is that Google cannot create all of the road geometry it needs (coverage) at the level of detail it requires from satellite or other aerial imagery everywhere it wants to offer maps of its own.
You can detect a limited set of cartographic attributes from satellite and other aerial imagery, if the imagery has a high enough resolution (and control). Road geometry may be a given, but street names, highway designations, road classes and other important information cannot be extracted from these sources. As a consequence, it is likely that Google is using Street View to provide some portion of the “missing attributes”, at least in the locations where they have Street View coverage available.
Another reason that a Street View-like process is beneficial is that the satellite or aerial imagery (e.g. rectified orthophoto quadrangles or the like) used by Google (and everyone else) is out of date by the time the images are initially processed. I am not picking on Google, but the datedness of imagery reflects the nature of its acquisition process and the delays between mission planning, sensor tuning, image collection, image processing and the availability of the product to the user community (i.e. Google). The optimal solution is to own your own satellite and task it with imaging a specific area to remedy a conflict in your data sources. However, even the CIA can’t afford to do that for everyday problems and neither can Google.
Next, imagery is not always of the desired quality. While leaf-off and clear days are the optimal times to image, image capture does not always occur under these conditions. When the images are non-optimal (leaf on, cloudy), you cannot always see the roads, determine road width, see that a link is a cul-de-sac and does not connect to the street on the other side of the leafy tree, or notice a variety of critical attributes that are occluded in the imagery. What is unknown here is whether a Street View-like process can add value to a map update process once the initial data have been collected and require revalidation.
I suspect Google is applying image processing and image recognition software to extract street and road names from signage (and other road furniture), as well as other attributes (address ranges, directionality of the street, turn restrictions, etc.), from Street View imagery. Of course, the coverage from Street View is not comprehensive in a spatial sense, so the map data that Google acquires directly needs to be augmented by conflating the attributes of other data sources to the geometry captured from its image base and Street View. It is important to note that the Street View data is also out of date as some function of collection date and the elapsed processing time that occurs before it is available for map compilation and augmentation.
Street View may capture data from roads/streets and other visible sources that are not available from the “imagery base” described in stage 1 (e.g. streets and roads not visible in the imagery or constructed after the imagery was “flown”). As a consequence, the location information that is gathered by the Street View instrumentation can be used to accurately position the geometry of these “new” streets and roads to match the data extracted from the Google image base (i.e. the gantry that is used to collect the Street View image and its associated instrumentation allow the capture of GPS, cell tower strength, Wi-Fi network strength and other location information from its three laser scanners). Street View is a powerful addition to Google’s satellite and other aerial imagery, since it can be used to provide many of the attributes required to create a navigable map database.
Given that address information is not always shown on street signs, or even on building surfaces, Street View is likely an inadequate source for providing a comprehensive address framework that would allow for geocoding. In general, geocoding can be thought of as a process for creating a map coordinate pair (e.g. longitude, latitude) from a postal address, allowing the address to be plotted in the correct location on a map – and a route to be calculated to the correct location of a destination identified by an address. The Google compilation process depends on conflation to provide the bulk of the addressing information they use, whether those data originate with public sources or from licensed parcel information. It is my belief that the quality of address information will plague Google and the problem will become more concerning as they expand their Google-Base outside of the United States.
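The workhorse technique behind address geocoding from street centerline files is linear interpolation along a segment’s address range – the approach long associated with TIGER-style data. A minimal sketch, with an invented segment near Henderson, Nevada (the coordinates, range and field names are illustrative, not from any real file):

```python
def interpolate_address(house_no, seg):
    """Estimate (lon, lat) for a house number by interpolating linearly
    along a street segment's address range."""
    lo, hi = seg["from_no"], seg["to_no"]
    if not (lo <= house_no <= hi):
        raise ValueError("house number outside segment range")
    t = (house_no - lo) / (hi - lo) if hi != lo else 0.0
    (x0, y0), (x1, y1) = seg["start"], seg["end"]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# Hypothetical segment: house numbers 100-198 along a short east-west block.
segment = {"from_no": 100, "to_no": 198,
           "start": (-115.10, 36.03), "end": (-115.09, 36.03)}
lon, lat = interpolate_address(150, segment)
```

The weakness is obvious: interpolation assumes addresses are spread evenly along the block, which they rarely are – hence the appeal of licensed parcel data, which pins each address to an actual lot.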
But There’s More
The skeptics among you have realized that there is an enormous amount of map data that is not visible and cannot be sensed with satellites, Street View or any other imagery-based sensor now known. For example, it is not possible to fully determine city boundaries from street signs. It is not possible to determine neighborhood boundaries from street signs, or census tracts, property parcels, vanity addresses (One Miracle Mile) and a host of information that is required to produce not only a map data base, but one that can be used for purposes of navigation.
Of course, that is the reason that Google uses conflation and has a variety of agreements with data sources it considers reliable. However, if these sources are not reliable, Google is betting that its use of User Generated Content will catch these errors and help them find the right solution. Hmmm. I think Google is on the right path here, but suspect that they will experience lots of heartbreak before they understand how to manage UGC to their benefit.
While Google is now basking in the pleasure of having created a navigable map database of the United States, they have also crossed the Rubicon and will have to support updating the Google-Base. I think they will find being in the game as a provider is much more difficult than being in the game as a licensee.
One of the aspects of map updating as practiced by paper map publishers was to update maps where the most people were likely to travel. Why is that? Well, where people travel is where your errors are going to be noticed. While this is humorous in some ways, it is instructional in that the databases created by paper map publishers did not have the accuracy or currentness requirements necessary for creating navigable map databases. Faced with these stringent requirements, NAVTEQ and TeleAtlas concluded that they could not create navigable map databases using only data sources that relied on imagery and conflation. This is why NAVTEQ and TeleAtlas have field research teams. While we can argue that Street View and UGC are the field research teams of Google, it appears that both of these sources are biased by population.
More to the point, did you know that Feature Class 5 roads (as defined by NAVTEQ), which are neighborhood streets and local roads in the countryside, make up an astounding 77% to 80% of the road network in Europe and about the same percentage in the United States? Do you suppose those Street View vans are being driven on all of these road miles? Do you suppose that the federal and local data used by Google to conflate these streets and roads to the Google geometry layer is up to date? Do you suppose that the majority of those contributing UGC are rural residents or, more likely, urbanites?
Now the clinker! Know where most accidents with fatalities occur? Yep, those roads that most sources ignore – Feature Class 5 roads.
Has Google done a better job than those who came before? Maybe.
Has Google done a better job than NAVTEQ or TeleAtlas? Don’t know. However, Google apparently believes that its data is better than that of NAVTEQ or TeleAtlas since it has spent more creating these data than it could ever have spent licensing them from TA or NT.
However, since all three companies create navigation databases that are used to provide real-time maneuvers/directions to people driving moving vehicles, we hope they are getting this right. How to determine the superiority of one solution over another seems like an interesting project – I’ll have to think about how to do that in a satisfactory manner.
Finally, let’s add one more concern and call it a day. When you combine the reservations described above with the notion that map sources used for conflation are not updated daily, and in most cases not even annually, you just might begin to suspect that conflation may not be able to solve all map updating problems for Google. Consider this – at least historically, the U.S. Census Bureau has not updated the TIGER files until it began the cycle of preparing for the next census. Similarly, some USGS quadrangles have never had a complete revision since they were originally created – in the 19th century. If your sources are not up to date, how can your conflation-based map be up to date? Hmmm.
Well, now I’d like to go back to my vacation. However, before I can do that I need to chair a session on the application of 3D Road Geometry to advanced vehicle applications at the TeleMatics Update Conference in Munich. It sounds like an exciting conference and I’ll fill you in when it’s over. Also, I plan to continue my examination of Google and map updating in the near future.
Until next time – Mike