Comparing continental-scale vector datasets can be challenging. Azavea compared the 125 million US building footprints released by Bing with the 28 million building footprints found in OpenStreetMap (OSM) for the USA. The comparison was carried out in an effort to test VectorPipe, an open source library developed at Azavea that supports working with OSM vector data and is powered by GeoTrellis and Apache Spark. VectorPipe produces a Spark DataFrame containing columns of JTS Geometry objects, enabled by the user-defined types provided by GeoMesa. For visualization purposes, this data is converted to Vector Tiles.
This demonstration uses a couple of different building-matching algorithms to show which buildings are (1) present only in OpenStreetMap, (2) present only in Bing, or (3) present in both.
The same technique can be used to compare other large-scale polygonal vector datasets.
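The three-way classification described above can be illustrated with a minimal sketch. This is not the matching algorithm used in the demonstration, which the abstract does not specify. It assumes, purely for illustration, that each footprint is reduced to an axis-aligned bounding box and that two footprints match when their intersection over union (IoU) reaches a threshold. The names `Box`, `iou`, `classify`, and the default threshold of 0.5 are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a footprint is simplified to an axis-aligned box
# (min_x, min_y, max_x, max_y) instead of a full polygon.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter: Box = (max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def classify(osm: List[Box], bing: List[Box],
             threshold: float = 0.5) -> Dict[str, List[Box]]:
    """Greedily pair each OSM box with its best unmatched Bing box."""
    matched_bing = set()
    result: Dict[str, List[Box]] = {"osm_only": [], "bing_only": [], "both": []}
    for o in osm:
        best_j, best_score = -1, 0.0
        for j, b in enumerate(bing):
            if j in matched_bing:
                continue
            score = iou(o, b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            matched_bing.add(best_j)
            result["both"].append(o)
        else:
            result["osm_only"].append(o)
    # Any Bing box never paired with an OSM box is Bing-only.
    result["bing_only"] = [b for j, b in enumerate(bing)
                           if j not in matched_bing]
    return result


osm_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
bing_boxes = [(1, 1, 10, 10), (50, 50, 60, 60)]
groups = classify(osm_boxes, bing_boxes)
print(groups)
```

The nested loop compares every pair of footprints, which is only workable for small inputs. At the scale described here, candidate pairs would need to be narrowed first, for example by spatially partitioning both datasets so that only nearby footprints are compared.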
This talk will provide an overview of the open source tools used to generate these results and discuss other techniques for processing vector data at scale using VectorPipe.