Google Map Maker Goes Crowdsourced in the United States on “Judgment Day”
Last Tuesday was a scary day for me. After listening to a Discovery Channel show about how the Pacific Coast of the United States was next in line for a catastrophic earthquake, I decided to pay my California earthquake insurance, which was billed at $666. I hoped this was not an omen. Later in the day, I was on the road to Santa Clara, California to attend the Where2.0 Conference. As many of you know, last Tuesday, April 19, 2011, was “Judgment Day”, the day that Skynet, the AI-based U.S. military defense system, went live (at least according to the television series based on the Terminator movies, which for some of you is perhaps more authoritative than the Discovery Channel).
While driving, I kept waiting for my XM Satellite Radio to announce that the Where2.0 meeting had been declared illegal by Skynet, followed by a message that my GPS signal was now being managed by Skynet and that I needed to turn around and go home, since knowing where I was going was no longer my concern. Thankfully, Skynet did not go live. Instead, Tuesday, April 19, 2011 was the day that Google announced it was opening up its U.S. mapbase to crowdsourcing through Google Map Maker – perhaps not as shocking as a Skynet stand-up, but maybe more important.
And the Crowd Went Wild
Apparently Google’s announcement was cause for celebration among those who would now be allowed to donate their time and industrious endeavor correcting and augmenting maps for Google, while allowing Google to own the copyright in and to the contributed geographic information, as well as to sell the rights to use it through the company’s new Google Earth Builder product. Pretty cool, huh? Or is this the reason that many people contribute to OpenStreetMap and ignore Google’s efforts?
Capitalism aside, the nagging question is “Why has it taken Google so long to add active crowdsourcing as a tool in its map compilation efforts?” (Note – for those of you who do not read this blog regularly, I regard “active” crowdsourcing as a situation where the person contributing map data has to take an active role in the process, such as using his or her computer to enter data they have endeavored to collect. “Passive” crowdsourcing involves the use of probes, such as Personal Navigation Devices that record the user’s path as they drive and require no specific, dedicated effort on the part of the user.)
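To make the active/passive distinction concrete, here is a minimal sketch of the two kinds of contribution records; the names and fields are my own illustration, not any vendor’s actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ActiveEdit:
    """An 'active' contribution: a user deliberately submits a map change."""
    contributor_id: str
    feature_id: str   # e.g. a street segment
    attribute: str    # e.g. "name", "one_way"
    new_value: str

@dataclass
class PassiveTrace:
    """A 'passive' contribution: a probe (e.g. a PND) logs positions as it moves."""
    device_id: str
    points: List[Tuple[float, float]]  # (lat, lon) samples; no dedicated user effort

# An active edit encodes intent and attributes directly...
edit = ActiveEdit("user42", "segment_1001", "name", "Elm Street")

# ...while a passive trace must be map-matched and aggregated before it
# tells you anything about geometry or travel speeds.
trace = PassiveTrace("pnd_7", [(37.35, -121.95), (37.36, -121.96)])

print(edit.attribute, "=", edit.new_value)
print(len(trace.points), "probe points")
```

The asymmetry is the point: an active edit arrives already interpreted, while passive data is cheap to collect but expensive to turn into map content.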
It was quite clear that Google was not going to be able to reach parity with, much less pull away from, Navteq in terms of map database quality using its standard approach to data fusion. Those of you who have followed my blogs on this topic will remember that in the summer of 2010 I addressed the value of local knowledge in map updating in a multi-part series titled “Better Maps Through Local Thinking” (the concluding article is here – it has some good stuff in it if you have not read it before). You may also remember that in January of 2010 I wrote another series on Google and map updating, and Part II of the series has some useful illustrations supporting my contention that Google needed to turn to crowdsourcing to improve its data fusion process. Another blog from January 2010 titled “More on Google’s User Generated Content Tower of Power” predicted that Google would eventually need to find a way to provide meaningful incentives to prompt its map data gatherers to continue to provide “free” updates for the company’s spatial databases.
You know, Google could make this a lot easier on itself by talking to me about its plans and asking for advice, but since that seems unlikely, I am going to tell you the changes that I think you will see Google undertake to improve the quality of its map database in the United States now that it has been opened to crowdsourcing – but first, a little “color” commentary on my point of view. (Of course, if Apple wants to ask me for some advice on strategy, I would be willing to help out – I guess Google’s form of capitalism is infectious.)
1. Regular readers of this blog know that I am a fan of crowdsourced spatial data. However, my position on crowdsourcing is that it is just another of the numerous tools available for collecting spatial data. Similar to other tools, crowdsourcing has benefits and weaknesses. Whether Google can harness the power of crowdsourcing in a manner that improves the quality of the data in its U.S. database will depend on how the company implements and evolves the crowdsourcing platform (currently known as Google Map Maker).
2. I may be the only one alive who thinks the Google announcement was interesting because it marked a capitulation by Google in the map compilation wars. Yep, relying only on data fusion was not going to get Google to the accuracy level required for success in the navigation and advertising markets. The error correction process implemented by Google to remedy inadequacies in its map database was so inconsistent that users were losing faith in the system’s ability to recognize and retain corrected spatial information. Even when user outrage over a particularly inept gaffe forced Google to abandon its algorithms and manually change the data, the company would eventually revert to an algorithmically generated, but inaccurate, depiction of the situation, because it had ingested a source it considered more authoritative than the data provided by the users on the ground who had asked it to correct the original error.
The most recent case of this that I know of, documented by Mike Blumenthal in his blog, involves the trials and tribulations of the towns known as the Yorks of Maine in their effort to remain on Google’s map of the United States. Another situation was sent to me by Duane Marble. The article noted that Google had bowed to public pressure from the government of Brazil and agreed to remap its depiction of Rio, which it seems to have represented as one large shanty town with few other attractions or neighborhoods. With the decision to accommodate crowdsourcing as a compilation tool, it would seem that Google has finally come to understand that map editing is something that cannot yet be solved by algorithm – or at least not using the approach it has adopted.
As a consequence of dissatisfaction with its past attempts at a data fusion approach to map compilation, Google has now decided to open the doors to crowdsourcing and hopes that somehow spatial accuracy will result from listening to millions of opinions about the current state of geography on the ground. How many opinions? Well, we really don’t know, but at the recent Where2.0 Conference, Marissa Mayer, the Google VP for Local, showed a slide indicating that Google users around the world spend one million hours browsing geo-content each and every day. The only other data point that I have was the one mentioned in this blog in February 2010, in which Google indicated that every hour of each day it receives over 10,000 corrections or additions to Google Maps. I guess we can conclude, without much difficulty, that Google should have a significant user base willing to provide corrections, updates and augmentations of its U.S. map base.
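Taking that reported figure at face value, the scale of the correction stream is easy to work out:

```python
# Back-of-the-envelope scale of the correction stream, using the reported
# figure of 10,000 corrections or additions per hour at face value.
corrections_per_hour = 10_000
per_day = corrections_per_hour * 24
per_year = per_day * 365

print(f"{per_day:,} per day")    # 240,000 per day
print(f"{per_year:,} per year")  # 87,600,000 per year
```

Even if only a small fraction of those submissions are usable, that is a torrent of raw input compared with any traditional field-survey operation.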
However, numbers alone do not tell the tale on the utility of crowdsourced systems. What are some of the concerns for Google in its use of crowdsourced data for map compilation?
1. While crowdsourced data is thought to add “local” knowledge to street-level mapping, there is no systemic limitation on people contributing change information for areas far from their home location. For example, many OSM contributors in the UK focus on digitizing streets from satellite imagery across the country, including distant areas with which they have no familiarity and whose street and road attributes they do not know. In effect, it is unclear whether the “local” information contributed by crowdsourcing actually comes from local people who might know what is happening on the ground in a given location.
2. Where does all this crowdsourced knowledge actually come from? I do not doubt for a second that some of it could be gathered by field observation. But how much of it actually is? Years of scientific research suggest that most people have a relatively poor memory for spatial location and are even worse at remembering the specific positions and attributes of objects within these locations. I suspect that some crowdsourcers will pull out their Rand McNally Road Atlas or Thomas Brothers Guide to find the exact names of those streets and items they cannot remember. Oh my, did I imply copyright infringement? Maybe, maybe not, as it remains an open question, at least in the United States, whether a compilation of facts such as a map database deserves any copyright protection.
3. Some geographic data aren’t visible and will be difficult for the casual crowdsourcer to collect. For example, legal borders that are marked on maps in all their glory are often delineated on the ground only by the occasional sign along a street indicating that you are entering or leaving a legal jurisdiction. While the U.S. Census provides details on these boundaries through its yearly Boundary and Annexation Survey, sometimes these data are not publicly released for several years. Other “invisibles” include bus routes, park boundaries, property boundaries and rights of way. Perhaps Google intends to provide these data from its fusion process, but that won’t work out too well for them. So just where will the crowdsourcers get these data? See the discussion immediately above.
4. Haklay’s research into OSM’s mapping efforts in the UK indicates that there may be a socio-economic bias in the OSM crowdsourced data for the UK towards providing comprehensive coverage in more affluent places. Not everyone has access to broadband or a computer, or has the spare time to sit and digitize Google’s imagery all day, or, perhaps, the spare time and know-how to report a local map error that has caught their eye. By extension, it is likely that areas of U.S. cities with high crime rates and urban blight are largely unfamiliar to the sample population who might be willing to contribute crowdsourced data to Google. In essence, it could be difficult to manage crowdsourcing in a manner that produces comprehensive map coverage across large geographical areas. See below.
5. Next, the efficacy of crowdsourcing is some function of the geographic distribution of people willing to contribute data to the effort. The vast majority of the mileage in paved streets and roads in the United States is in rural areas, where low numbers of potential contributors may limit the comprehensiveness of the coverage that could be provided through the use of crowdsourced data. However, if Google is interested in having reasonably accurate maps only in “urban” areas, crowdsourcing could serve it well. (Indeed, I understand that a company in the Seattle area – not Microsoft – is using passive-probe data, similar to that used to collect traffic data, to build a street database of the twenty largest cities in the United States.)
6. It is also important to note that there is no focused, organizing force urging Google’s crowdsourcing contributors to complete coverage in location X by date Y. Instead, coverage grows rather like Topsy – at its own pace, where and when contributors feel interested in adding, correcting or augmenting data. Time frames and formal correction cycles are not a part of the crowdsourced world. Fixing a crucial error in your data may simply not interest the contributors you attract – at least not without incentives of some sort.
7. Recent significant research by Girres and Touya on the quality of the French OpenStreetMap Dataset raised questions on the heterogeneity of the crowdsourcing process, the scale of production and the compliance of contributors to standardized and accepted specifications. The authors concluded that OSM has great promise, but that its data was of variable quality and would remain so until the tension between “openness” and standardization of requirements was restructured with specific requirements for data entry and attribution. It is my belief that the structure of the contributed information will plague Google, unless it imposes more rigorous constraints than Map Maker has today.
8. Google representatives, as discussed in an earlier blog, have made this statement: “Carefully considering Google’s mission, guidance from authoritative references, local laws and local market expectations, we strive to provide tools that help our users explore and learn about their world, and to the extent allowed by local law, includes all points of view where there are conflicting claims.” Let’s see: “Google’s mission is to organize the world’s information and make it universally accessible and useful.” How does that part about local market expectations jibe with the mission statement? Is Google’s map comprehensiveness governed by organizing the world’s information conditioned by local market expectations? Will Google’s vetting process pay as much attention to a map change in Truth or Consequences, New Mexico as it will to one in Dallas, Texas? Google seems to have avoided proclaiming itself agnostic with respect to map changes. Reference publishers really cannot afford to do that – they need to stand for something – but in Google’s case it is not exactly clear what that might be and how it will influence their evaluation of crowdsourced map data.
9. Due to their “always editable” status, crowdsourced spatial databases are constantly changing and, as a consequence, their error signatures are considered (at least theoretically) to be self-healing over time. The reason that most crowdsourced systems update in near-real-time is to provide other contributors the opportunity to correct erroneous representations as soon as possible after these data have entered the system.
Whether the healing of crowdsourced map databases actually takes place in a uniform and helpful manner is a complex issue that involves interactions between the number of participants contributing spatial data, their intentions and motivations, their interest in contributing data over long periods of time, and the spatial distribution of these contributors required for comprehensive map coverage. In essence, it is an open question whether or not the spatial data quality of crowdsourced mapping efforts can be managed to meet specific requirements on a timely and reliable basis. Curating crowdsourced data can be especially vexing and this is likely the key problem that Google faces going forward with crowdsourced data in the United States.
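A toy model makes the “self-healing” idea concrete: each feature keeps its full edit history, and an error persists only until corrections outnumber it. The consensus rule below is my own simplification, not how any production system actually works:

```python
from collections import Counter

class Feature:
    """A toy map feature whose current value is whatever the crowd agrees on."""
    def __init__(self, name):
        self.history = [name]  # every submitted value, in order

    def submit(self, value):
        self.history.append(value)

    def current(self):
        # Naive consensus: the most frequently submitted value wins, ties
        # broken by recency. Real systems weight contributors by reputation
        # and recency rather than counting raw votes.
        counts = Counter(self.history)
        best = max(counts.values())
        candidates = [v for v in self.history if counts[v] == best]
        return candidates[-1]

street = Feature("Mian Street")  # an initial typo enters the system
street.submit("Main Street")     # two locals correct it
street.submit("Main Street")
print(street.current())          # the error "heals" once corrections outnumber it
```

Notice what the toy exposes: healing happens only where contributors actually look, which is exactly the coverage-distribution problem raised in points 4 through 6 above.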
So what, if anything, is Google going to do about these problems?
Google representatives have indicated that the company wants to use crowdsourcing as a method for harvesting map corrections, as well as for collecting data elements that you usually do not find on maps, but that people use. Their goal is to harvest local knowledge to realize these goals, and the keys to making crowdsourcing work for Google can be found in the edit and authority systems the company intends to use. However, it is my opinion that Google does not yet have its approach set up right and will need to change it over time to gain the benefits it desires. Next time, I will write about the Google edit and authority system, compare its efforts to those used by Navteq and prognosticate who should have the better data and coverage, along with a discussion of the wild cards that will make this competition quite interesting.
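By way of preview, an “authority system” can be as simple as weighting each contributor’s endorsement by a reputation score and accepting an edit once the weighted support crosses a threshold. The sketch below is entirely my own illustration – the names, weights and threshold are hypothetical, not Google’s actual mechanism:

```python
# Toy "authority system": each contributor carries a reputation weight, and a
# proposed edit is accepted only when the weighted endorsements cross a
# threshold. All values here are invented for illustration.
reputation = {"long_time_mapper": 3.0, "new_user": 1.0, "new_user2": 1.0}

def accept_edit(endorsers, threshold=2.5):
    """endorsers: list of contributor ids supporting the proposed edit."""
    weight = sum(reputation.get(user, 0.5) for user in endorsers)
    return weight >= threshold

# A single trusted contributor can push an edit through...
print(accept_edit(["long_time_mapper"]))       # True  (3.0 >= 2.5)
# ...while newcomers' edits wait for corroboration.
print(accept_edit(["new_user"]))               # False (1.0 < 2.5)
print(accept_edit(["new_user", "new_user2"]))  # False (2.0 < 2.5)
```

The design tension is obvious even at this scale: set the threshold high and good local corrections languish; set it low and vandalism sails through.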
1. Haklay, Mordechai (Muki) and Clare Ellul (forthcoming). Completeness in Volunteered Geographical Information – The Evolution of OpenStreetMap Coverage in England (2008–2009). Journal of Spatial Information Science.
2. Girres, Jean-François and Guillaume Touya (2010). Quality Assessment of the French OpenStreetMap Dataset. Transactions in GIS, 14(4), pp. 435–459.