Crowdsourcing – How Much Is Too Much?
Today’s blog deals with paradox management – at least I think it does. You know, paradoxes – like reconciling the fact of globalization of the world’s economy with the apparent “tribalization” (cultural, racial, religious and social segmentation) of the world’s countries. Of course, I am not going to try to tackle anything as momentous as the world’s economy or social evolution. Nope, I manage to get into enough trouble just thinking about maps, addresses and addressing. So for today, I am going to look at the issue of “how much is too much” in the world of crowdsourcing. Is it even possible to have too much crowdsourcing? Am I delirious? Let’s get started and see how it all rolls out.
Crowdsourcing – How much is too much?
Two of my colleagues, Dr. David Cowen (Professor Emeritus in Geography at the University of South Carolina) and Dr. Stephen Guptill (a retired Poobah from the mapping branch of the USGS), and I have been spending time working our way through a number of research topics for a project focused on providing the Geography Division of the U.S. Bureau of the Census with our perspectives on technologies and trends in the world of addresses, addressing and spatial data that might be of assistance in developing their Geographic Support System for the 2020 Decennial Census, as well as support for the annual American Community Survey. Although perhaps lesser known, the American Community Survey is an ongoing survey that provides data vital to planning economic development, as well as data that help determine how more than $400 billion in federal and state funds are distributed each year.
One of the many topics that we are examining is whether crowdsourcing/Volunteered Geographic Information could play a role in supporting the needs of the Bureau of the Census for timely and comprehensive address and map update information. As of yet, we have not progressed to the stage of conclusions, so note that the topic I address below is just a waypost on a thought process that is not yet complete. Please note that I don’t intend to discuss or infer the details of our ongoing research or any details about the Bureau of the Census, the Geography Division, their databases or their update strategies. Instead, I wanted to provide a context for my interest in and the importance with which I regard the topic that follows.
Several years ago, few knew what the term crowdsourcing meant or what it might portend. Wikipedia, and then OSM, certainly opened the eyes of many to the potential of crowdsourcing and how it could promote an “information age” that was more democratic and, perhaps, more expansive than the information compilation efforts that we had previously witnessed. My interest today is in writing about the use of crowdsourcing to build better and more comprehensive address and map databases than previously available and whether one could depend on crowdsourcing as a reliable reporting and compilation mechanism over extended periods of time.
A colleague recently tweaked my attention on an issue at the heart of crowdsourcing that I realized I had not spent enough time contemplating. (You must know that those of us who are professional consultants spend a great deal of our time contemplating the obvious, because most of the time we realize that nothing is as obvious as it first appears.)
From a practical standpoint, map-based crowdsourcing websites are built on the immortal words of Ray Kinsella. Oh come on, you remember Ray from the movie “Field of Dreams”? He’s the guy who hears a voice in his cornfield whisper “If you build it, he will come.” Well, most people misremember and hear the voice saying “If you build it, they will come.” Regardless of your memory of the quote, a belief in this method of overcoming inertia (building a web site and asking the public to populate the database) is one of the main tenets behind operating crowdsourced websites, which their supporters hope to use to remedy the problems of inadequate, publicly available databases for maps, addresses, business registries and the like.
In response to this “crowdsourcing call to action”, we have volunteers (I will call them “scouts”) who are willing to donate their time and the effort required to populate a database intended to remedy the problem of sparse, publicly available spatial databases. Scouts are the major change agents in the world of crowdsourcing and their care and well-being gate the success of all crowdsourcing efforts.
Just so you don’t forget, there is also a “mule” model of crowdsourcing, where you let a beast of burden assist you, but require nothing out of the ordinary from it, other than that the mule performs its usual duties while you track its location. Mules create the “probe data” (sometimes called floating car data or passive crowdsourced data) that can be collected and agglomerated to create map databases. Often, “probe-based” map databases are less comprehensive (in terms of spatial coverage and number of variables (attributes)) than those in which the data is collected by “scouts”. Conversely, the accuracy of probe-built databases is often higher than that of ones created by active participants because of the repetition with which the mules drive their daily routes, such as the journey to work, deliveries or errands.
The universe of mules, however, is somewhat limited by the fact that most of these contributors are actually an extension of a commercial brand. For example, those contributing to MapShare have a TomTom PND or a device that uses TomTom software (e.g. an iPhone). In the future, it may be that there will be tribes of “mules”, perhaps associated with car companies (e.g. OnStar Mules from GM), wireless providers (e.g. Verizon mules) or platform providers (e.g. Android Mules or the now patented iMules). In some sense, this future fragmentation of the “probe-based” market may limit the usefulness of mules unless the size of the probe fleet is extremely large (such as all vehicles produced by Volkswagen or Ford). Smaller car companies, like Saab, for example, would not have a chance of successfully competing with the major manufacturers.
Now, you know where I am going with this, don’t you? Every opportunity taken to create a new crowdsourced map, address or business listing website dilutes the pool of potential contributors and raises questions about the size of the pool of contributors, their distribution and their motivations for contributing spatial data over extended periods of time. Even though I love maps, I am not unrealistic enough to think that our “scouts” are going to dutifully report their road data (however collected) to every website that might want it, or even to those websites deserving of it.
Should my initial premise (“build it and they will come”) be rephrased? Maybe the wording should be something like, “if everybody comes and contributes their data and updates it when the data needs to be refreshed, then this will be the mother of all crowdsourced spatial database websites.” How droll.
“Build it and they will come” could be rephrased into some other catchy slogan, but no matter how you say it, or the terms you use to convey the notion, it’s not going to happen! Nope, there just are not enough contributors to go around. Better yet, there may not be enough “authorities” to create all of these multiple representations of reality (or at least, meaningful representations of reality). Can you imagine – brand preference for crowdsourced databases? Hmmm. On the other hand, isn’t that the purpose of CloudMade (founded by founders of OSM)?
So what are the real issues here?
Well, we could spend quite a bit of time arguing the merits of these points, but the real problem is that the pool of contributors who might be willing to contribute map data is being divided among numerous needy causes (crowdsourced websites) that really need their help. Adding a third column to the chart shown above relays my current beliefs about some of the issues. Here goes.
Wow. That should start some of you moaning.
Before you get too far into torching my argument, take a second and reflect on the Google Experience (similar, but not to be confused with the Jimi Hendrix Experience). Google, for all its prowess and thoughtwatts (brainpower), has been unable to create a competitive map database, even though they combined innovative technology with wide-ranging efforts to use crowdsourcing as a data input mechanism.
Yep, that warehouse in the State of Washington full of three hundred and some would-be, should-be, but can’t-be “map compilers” (that’s not what the “Big Googie” calls them, but that’s the job these software engineers are doing for Google) is a response, in part, to the inadequacy of depending on volunteer labor to build a commercial grade, navigation quality map database, at least in the United States.
For those of you who do not know, many people in the map industry believe that Google Mapbase is 12 to 18 months (maybe more) away from being able to effectively compete with NAVTEQ or Tele Atlas on quality in the United States and further from that goal in Europe. (You did know that Google is attempting to build a European database using its standard toolkit, as well as partnerships with AND, other commercial sources and governmental sources? However, those pesky privacy laws in Europe are limiting the effectiveness of Google’s key tool, StreetView.)
On the business listing database issue, Google keeps re-upping its contract with infoUSA (and similar contracts with others) to keep its business listings database up-to-date. Yes, Google does have a crowdsourced program where it partners with business owners who can claim and update their business listing for free (how gracious). Why, those who join the program (register) can even consider advertising with Google! However, the problem with this program and others like it is that business owners simply do not have the resources to fill out a form on every website that a potential customer might use to find the goods or services they provide. Think about that – these retailers and service providers have a vested interest in the outcome of being found during a local search, but are overwhelmed by the number of websites at which they would have to register and provide their business information (even though they know this information by heart) to accomplish this goal.
Now, rethink the role of the gracious “scout” who is trying to decide where contributing their data would have the greatest impact. Most contributors are not going to take the time to provide data to every crowdsourced website that needs it. Even OSM, which I think we could agree is the most successful of the mapping-based crowdsourced websites, seems to be having problems attracting enough “scouts” in the US to create a map database that is equivalent to the one that they have been able to create for much of Europe. While we could identify a number of reasons why the take rate may be slower in the US, is it possible that one difficulty is that people in the US generally don’t think that there is a “map availability” problem? In addition, is it possible that so many potential contributors to OSM fatigued themselves supporting Google’s efforts that they simply are not interested in participating in the crowdsourced version of the movie Groundhog Day?
Another issue, and one that may be unique to the United States, is that a number of federal, state and local governments have spent a great deal of time, effort and money to create GIS databases of the areas under their jurisdiction. While some are willing to share these data with crowdsourced websites (like OSM or ESRI’s Community Maps), most of these data sources make the data available as is, without promise of future updating, since doing so would depend on budgets and other contingencies outside of their control.
The crowdsourced organizations to which these agencies contribute their data have the unenviable task of actively harmonizing the data that they import. In turn, this harmonization raises a number of issues such as: Who sets the standards for collecting crowdsourced data from an editorial perspective? Who sets the quality control standards for crowdsourced data? Who provides external guidance for crowdsourced systems? Well, in the classic model of crowdsourcing, the answers to these three questions are no one, no one and no one. Of course, that is the way crowdsourced systems are designed to function. In other words, crowdsourced systems remove the management layer and depend on the notion of random, but purposeful, editing and re-editing to correct map inaccuracies. In turn, this method can result in quick resolution of inaccurate spatial data, but often results in mismanaging the efforts of the contributors on which the system depends for map data.
In a more concrete example, who decides if Joe Scout can edit the data that was provided by an authoritative state agency? Once again, the answer is “no one.” If Joe incorrectly edits significant amounts of authoritative data supplied by a governmental agency, what incentive will the agency have to continue to contribute data? In turn, what happens when Joe Scout knows his edits are correct and an automated system beats his changes out of the database? What happens when some neophyte who really does not know what he is doing edits Joe Scout’s contributions out of the database? Do we expect that Mr. Scout is going to continue to return and correct those who have wronged him, or perhaps take action and begin contributing his efforts to some other website that seems to pay attention to his information?
Reading the comments in some of the OSM blogs doesn’t give me the warm fuzzies that all is peaceful in the editing arena in the world of crowdsourced maps. Active contributors don’t appreciate their work being replaced by that of a bonehead, even if the bonehead is an automated system that was built with the best of intentions. I suppose that I should note that people who are paid to compile maps don’t accept this kind of treatment.
Of course, in some crowdsourced systems, some editors are more “authoritative” than others. Ohh? How do you get those credentials? Wait in line, spend hours online? Sorry mate (see, I have been reading those OSM blogs) but I’m maybe just going to move to another crowdsourced mapping website! Or, maybe just start my own? But if everyone looks for greener pastures (you’re welcome, Mr. Kinsella) or starts their own map database, how will this impact the contributors and the overall quality of data in crowdsourced systems? To me, that seems to be the problem we are beginning to experience with crowdsourcing as a method of building map, address and business listing databases. Perhaps there are too many crowdsourced sites contending for the attention of the scarce resource in the system.
It is my intuition that, at least in the United States, we may move away from pure crowdsourced systems to hybrid systems in which some level of authority, greater than we have witnessed so far, will guide the data collection efforts including coverage, standards and quality control. While hybridization could take away some of the advantages of a true crowdsourced system, I think that some level of control might be required to create databases that could be relied on to provide quality data over long periods of time, such as the intercensal gaps that now have caught my attention. It is for that reason that I have become interested in ESRI’s Community Maps program, but I think it may be a little too early in the game to focus on it, so be prepared for a future blog on the topic.
I am still bullish on the use of crowdsourcing to help build map, address and business listings databases, but I am not sure what the crowdsourced model will look like in a few years’ time. For example, MapQuest and a partner organization in AOL (Patch) have bellied up to the bar to support OSM in Europe and to establish a development fund in the US to the tune of one million bucks. The fund is designed to support “improved” data over the next year (see this link for more details on the investment). Hmm, does this mean that AOL-MapQuest-Patch will be giving OSM some editorial direction? Hybridization – I hear it’s the way of the future – or at least it’s the future for crowdsourced map databases.
And now, something completely different
Say, for those of you who missed it and might be interested, I commented on the topic “What Facebook Places Really Means to LBS” in Kevin Dennehy’s column in GPS World. Speaking of Kevin, his one-day GPS Wireless 2010 meeting will be held in San Francisco next month on October 5. I will be speaking on a panel titled “New Platform and Technology Markets” and hope to see some of you there. Note that the meeting is held at the Moscone Center this year and is a prequel to the CTIA Enterprise and Applications meeting that starts on October 6 in the same venue. In addition, registration for GPS Wireless provides you access to the CTIA Exhibit Floor as well as to the CTIA Keynotes.