James Brennan

A blog about statistics, data science, and remote sensing.

06 Mar 2020

Essential packages for geospatial analysis in python

Python is fast becoming the de facto standard for geospatial work – primarily because of its great ecosystem of packages, but also because, whatever you want to do, someone has probably posted an answer on Stack Overflow with Python code attached. To save you some of that trouble, below is a collection of “essential” packages for everything geospatial in Python.


Data I/O and management

  • Rasterio - Reads and writes GeoTIFF and other forms of raster datasets.
  • (…and GDAL) - Geospatial Data Abstraction Library (GDAL) provides a lower-level but more powerful reader and writer for hundreds of geospatial formats.
  • satpy - Easy to use custom readers for satellite imagery formats from sensors such as MODIS, Sentinel-2…
  • fiona - reads and writes data in *.shp, *.json and many more formats.
  • pyproj - Python interface to PROJ (cartographic projections and coordinate transformations library).
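As a quick illustration of the kind of task pyproj handles, here is a minimal sketch that reprojects one point from WGS84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857) and back again. The coordinates and variable names are illustrative assumptions, not taken from any particular dataset.

```python
from pyproj import Transformer

# always_xy=True forces (x, y) = (longitude, latitude) ordering,
# regardless of the axis order declared by the CRS itself.
to_web_mercator = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
to_lonlat = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

# Illustrative point (roughly central London); swap in your own coordinates.
lon, lat = -0.1276, 51.5072

# Forward transform: degrees -> metres in Web Mercator.
x, y = to_web_mercator.transform(lon, lat)
print(f"Web Mercator: x={x:.1f} m, y={y:.1f} m")

# Inverse transform, to check the round trip recovers the input.
lon_back, lat_back = to_lonlat.transform(x, y)
print(f"Round trip: lon={lon_back:.4f}, lat={lat_back:.4f}")
```

Building the Transformer once and reusing it is usually preferable to recreating it for every point, since construction is the expensive step.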

Visualisation packages

  • matplotlib - “publication quality” plotting and visualisation.
  • cartopy - vector and raster geospatial data map making in numerous projections.
  • datashader - graphics pipeline system for multi-TB data visualisation suitable for point and raster data.
  • geoplot - raster/vector cartographic plotting library (choropleth, cartograms!, points, kernel density etc).
  • descartes - Plot Shapely or GeoJSON-like geometric objects with matplotlib.
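To give a flavour of the plotting side, the sketch below uses matplotlib alone to draw a synthetic raster with geographic axis labels and a colourbar. The random data, the extent and the output filename are all made up for illustration; cartopy or geoplot would be the next step if you need a proper map projection.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Synthetic 50 x 100 "raster" standing in for real imagery.
rng = np.random.default_rng(0)
raster = rng.random((50, 100))

# Assumed bounds as (left, right, bottom, top) in degrees.
extent = (-10.0, 10.0, 45.0, 55.0)

fig, ax = plt.subplots(figsize=(6, 3))
image = ax.imshow(raster, extent=extent, origin="upper", cmap="viridis")
ax.set_xlabel("Longitude (degrees)")
ax.set_ylabel("Latitude (degrees)")
fig.colorbar(image, ax=ax, label="Value (arbitrary units)")

# Hypothetical output filename.
fig.savefig("synthetic_raster.png", dpi=150)
```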

Downloading data

  • sentinelsat - search and download Copernicus Sentinel satellite images from the command line or python.

  • pyModis - Collection of python scripts for downloading and mosaicing MODIS data.

“Computation stack”

  • scipy - modules for scientific computing in python: optimization, linear algebra, integration, interpolation, FFT, image processing.

  • scikit-learn (sklearn) - classification, regression and clustering machine learning tools for python.

  • scikit-image (skimage) - collection of algorithms for image processing.

  • statsmodels - package for statistical models (linear/logistic and robust regression amongst others) as well as statistical tests (e.g. ANOVA, t-tests…).

  • tensorflow - “end-to-end” deep learning library.

  • numpy - efficient multi-dimensional array storage and processing.

  • xarray - multi-dimensional labeled array library including larger than memory functionality with dask.

  • dask / dask.distributed - Data structures for parallel and out-of-core data processing in python.

  • pandas - package for data analysis of tabular-style data.

  • geopandas - extends pandas with operations on geometric types like polygons and points, and includes processing extensions for geometric join, merge etc.

  • whitebox-python - Python package built on WhiteboxTools, which provides algorithms for distance buffering, raster classification, image enhancement and spatial hydrological analysis.
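To show how a couple of these pieces fit together, here is a minimal sketch that uses numpy to compute a vegetation index (NDVI) from two synthetic reflectance bands and scipy to smooth the result. The band values, the 3 x 3 window and the 0.5 threshold are arbitrary assumptions chosen for illustration; with real data the arrays would come from something like rasterio.

```python
import numpy as np
from scipy import ndimage

# Synthetic red and near-infrared reflectance bands (illustrative values only).
rng = np.random.default_rng(42)
red = rng.uniform(0.02, 0.2, size=(100, 100))
nir = rng.uniform(0.2, 0.6, size=(100, 100))

# NDVI = (NIR - red) / (NIR + red), computed element-wise over the whole array.
ndvi = (nir - red) / (nir + red)

# 3 x 3 moving-average filter to suppress pixel-level noise;
# mode="nearest" repeats edge pixels rather than padding with zeros.
ndvi_smooth = ndimage.uniform_filter(ndvi, size=3, mode="nearest")

# Boolean mask of "vegetated" pixels under an assumed threshold of 0.5.
vegetated = ndvi_smooth > 0.5
print(f"Vegetated fraction: {vegetated.mean():.2f}")
```

The same array operations carry over to xarray and dask largely unchanged, which is what makes the stack scale to larger-than-memory imagery.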

Organising chaos

  • jupyter - Interactive computing for python (and many other languages) in the web browser – “create and share documents that contain live code, equations, visualizations and narrative text”.

  • pangeo - Combinations of python geospatial packages suitable for HPC and cloud computing.

  • luigi - workflow management package that can track dependencies between different scripts/processing tasks. Scalable to HPC batch systems with b2luigi.