With ESRI’s adoption of Python as its scripting language and the proliferation of open source GIS, Python has become one of the required languages for GIS developers and hobbyists alike. What makes Python powerful is well documented throughout the web, but today I want to highlight one very important aspect of Python: Python modules.

Python modules are code that someone else has written and distributed in order to make life easier for the rest of us. You may be familiar with the standard modules that come with Python, like math or datetime, but there are many more resources out there for GIS-minded developers. I will discuss some of the modules I find essential in my work, apart from the famous ArcGISScripting module by ESRI: GDAL, numpy, NetworkX, xlrd and xlwt. Let’s dive in!

GDAL – Geospatial Data Abstraction Layer

There will come a time in every GIS professional’s career when they need to quickly access information from a random shapefile but have no access to GIS software or geoprocessing functionality (think of a laptop on the road, a remote machine not running Windows, etc.). GDAL comes to the rescue by providing exactly that functionality.

GDAL is a translator library with Python bindings that allows access to raster data using a unified abstract layer. Bundled with it is OGR, which provides similar functionality for vector data. Download it here.

A quick example of using GDAL:

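from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly
# Path to the raster file (placeholder; substitute your own)
filename = "raster.tif"
# Open the raster dataset
dataset = gdal.Open(filename, GA_ReadOnly)
# Print the projection of the data
print(dataset.GetProjection())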

Using OGR:

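from osgeo import ogr
# Path to the shapefile (placeholder; substitute your own)
shapefileName = "parcels.shp"
# Get the driver
driver = ogr.GetDriverByName('ESRI Shapefile')
# Open a shapefile (0 = read-only, 1 = update)
dataset = driver.Open(shapefileName, 0)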

numpy – Numerical Python

I cannot think of any GIS practitioner who has not had to manipulate raster data in some peculiar way, only to find that the software at hand does not allow many customized functions. If one can read raster data (which GDAL, above, helps us with), then one can treat it in Python as an algebraic matrix. numpy is the best Python package out there for this sort of situation.

numpy is a package that enables n-dimensional array manipulation in Python, an essential part of any scientific endeavor. It also provides linear algebra functionality, Fourier transforms and random number generators. Get it here.

An example of the use of numpy:

import numpy as np
# Sample IO Table data
ioSample = [[1, 2], [3, 4]]
# Turn into a numpy array
ioMatrix = np.array(ioSample)
# Find the inverse of ioMatrix
ioMatrixInv = np.linalg.inv(ioMatrix)
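The raster-as-matrix idea can be sketched the same way. The grid below is a made-up 3x3 elevation raster with -9999 as an assumed no-data value; in real work the array would more likely come from GDAL (for example via a band's ReadAsArray method). The sketch masks the no-data cells, computes the mean of the rest, and reclassifies each cell as above or below that mean.

```python
import numpy as np

# Hypothetical 3x3 elevation grid in metres; -9999 marks cells with no data.
NODATA = -9999
elevation = np.array([
    [120.0, 135.0, NODATA],
    [110.0, 150.0, 160.0],
    [NODATA, 145.0, 155.0],
])

# Mask out the no-data cells so they do not distort the statistics.
valid = elevation != NODATA
mean_elevation = elevation[valid].mean()

# Reclassify: 1 where the cell is above the mean, 0 elsewhere,
# keeping the no-data marker where there was no data.
above_mean = np.where(valid, (elevation > mean_elevation).astype(int), NODATA)

print(mean_elevation)
print(above_mean)
```

This is the kind of custom cell-by-cell rule that is awkward in many desktop tools but takes only a few lines once the raster is an array.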

NetworkX – Complex Networks Analysis

While most GIS software provides the ability to build networks, sometimes it is easier to build one quick and dirty, without involving complex GIS software. An analysis of participation by space in an experiment, for example, can easily be achieved using the simple yet powerful NetworkX module.

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is hosted by the Los Alamos National Laboratory, and sees active development (presumably sponsored somehow by Los Alamos). Download it here.

A quick sample is shown below:

import networkx as nx
# Create a graph
g = nx.Graph()
# Populate the graph
g.add_nodes_from([1, 2])
# Create edges
g.add_edge(1, 2)
# Print the neighbors of node 1 (prints [2])
print(list(g.neighbors(1)))
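As a slightly larger quick-and-dirty network, here is a sketch of a distance-weighted shortest path. The place names and segment lengths are invented for illustration; add_weighted_edges_from, shortest_path and shortest_path_length are standard NetworkX functions.

```python
import networkx as nx

# Hypothetical road segments: (from_place, to_place, length_km).
segments = [
    ("A", "B", 4.0),
    ("B", "C", 3.0),
    ("A", "C", 9.0),
    ("C", "D", 2.0),
]

g = nx.Graph()
# Each length is stored on its edge under the "weight" attribute.
g.add_weighted_edges_from(segments)

# Shortest route from A to D by total length rather than by number of hops.
route = nx.shortest_path(g, "A", "D", weight="weight")
length_km = nx.shortest_path_length(g, "A", "D", weight="weight")

print(route)
print(length_km)
```

Going through B costs 4 + 3 + 2 = 9 km, which beats the direct A to C segment followed by C to D (9 + 2 = 11 km), so the route visits all four places.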

xlrd – Excel™ File Reader

All GIS practitioners have been sent “GIS data” in an Excel file, whether a geocoding result, GPS waypoints, or something similar. While ideally whoever sent the data would be educated on why that is a bad idea, most often we have to deal with the data without any additional help. This is where xlrd comes into play, allowing you to read Excel-formatted data into Python with little effort.

xlrd is a Python module that allows one to read Excel files without the need of Microsoft Excel or Windows. It provides access to XLS files for Microsoft Office 2003 or earlier. Download it here.

A quick example that will read an XLS file and print it to screen:

import xlrd
# Open the Excel file
book = xlrd.open_workbook("excelFile.xls")
# Read the first sheet in the Excel workbook
sheet = book.sheet_by_index(0)
# Read the first row from column A to E (end_colx is exclusive, so 5 stops after E)
rowValues = sheet.row_values(0, start_colx=0, end_colx=5)
# Print the row values
for value in rowValues:
    print(value)

xlwt – Excel™ File Writer

Business requirements often call for results in an Excel file, so that some other person, in another department, can run some sort of analysis on the data. Building a distance matrix is fine, but Joe from accounting is using complicated Excel spreadsheets and does not want to bother with database connections or DBF files. This is a situation in which xlwt excels, writing data to an Excel spreadsheet without the need for Excel or any manipulation by mouse.

xlwt is a Python module that, similarly to xlrd mentioned above, allows for cross-platform Excel file creation without the need for Microsoft Office. Download it here.

An example follows:

import xlwt
# Create a new workbook
book = xlwt.Workbook()
# Add a new sheet
sheet = book.add_sheet("My Sheet")
# Write the number 5 in the first row, first column
sheet.write(0, 0, 5)
# Save the file (the file name is a placeholder)
book.save("output.xls")
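The two Excel modules pair up naturally, so here is a rough round-trip sketch: write a small table with xlwt, then read it back with xlrd. The waypoint values, sheet name and file name are made up for illustration, and the file goes to a temporary folder so nothing on your machine is overwritten.

```python
import os
import tempfile

import xlrd
import xlwt

# Hypothetical waypoint table: a header row followed by two data rows.
rows = [
    ["id", "lat", "lon"],
    [1, 34.05, -118.25],
    [2, 36.17, -115.14],
]

path = os.path.join(tempfile.mkdtemp(), "waypoints.xls")

# Write every cell of the table to a new sheet.
book = xlwt.Workbook()
sheet = book.add_sheet("Waypoints")
for row_index, row in enumerate(rows):
    for col_index, value in enumerate(row):
        sheet.write(row_index, col_index, value)
book.save(path)

# Read the sheet back, one list of values per row.
read_book = xlrd.open_workbook(path)
read_sheet = read_book.sheet_by_index(0)
read_rows = [read_sheet.row_values(i) for i in range(read_sheet.nrows)]
print(read_rows)
```

Note that xlrd hands numeric cells back as floats, so the id column comes back as 1.0 and 2.0 rather than as integers.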

These are my essential Python modules for GIS development. Notice that I left quite a few out, especially those relating to web-based GIS development. Those will appear in a future post. But please, do share your thoughts on which Python modules are essential for your GIS work.

13 Responses to “The Essential Python Modules for GIS”

  1. What about these? pyproj, shapely, psycopg2/sqlite3, geojson, lxml/elementtree, pil, pyexcelerator, scientificpython, and rpy2

    • Kurt,

      I like your suggestions. Here is why I left some of those out:

      pyproj: A very good package, but GDAL compiled with Proj4 support will actually perform transformations. Granted, not for single Points, but most people transform whole files.
      shapely: Indeed it offers some spatial analysis tools, but OGR does allow the reading/writing of geometries as well. As the analysis tools are limited, I decided not to include it.
      geojson: As mentioned in the post, I avoided web-based packages. This will be included in a future article though.
      lxml: This is covered (to the best of my knowledge) by the xml module that comes standard with Python. Different implementations, but the possibility is there to manipulate xml files through the DOM or Minidom.
      pil: GDAL can export to image formats. If you want to do image manipulation (which is quite possible), indeed a package like PIL would be nice.
      pyexcelerator: xlwt and xlrd are derived from pyexcelerator and are more actively developed.
      scipy: Most manipulations for GIS can be done using numpy, the base package of scipy. Indeed, scipy can add some functionality for you.
      rpy2: Integration with R is important for spatial statistics. Perhaps I should include this indeed, even though I see more people using numpy to do their number crunching by hand. rpy2 is good though.


    • Once upon a time, I worked in a GIS shop, and let me tell you: there was more customer data in spreadsheets than in XML or relational databases, and apparently things haven’t changed all that much.

  2. Thanks for pointing out NetworkX. Very neat.

  3. [...] The Essential Python Modules for GIS The Essential Python Modules for GIS | michalisavraam.org blog [...]

  4. the GEOS bindings, OGR and PostGIS interfaces in *GeoDjango* are worth mentioning.. they are tied to Django, but are clear and thorough

  5. [...] Modules for GIS (on Michalis Avraam’s blog) Posted on August 2, 2010 by Arne This post by Michalis Avraam features a neat selection of Python modules useful to the GIS practitioner. Having experimented [...]

  6. I am working on a large GIS project that will end up using Monte Carlo simulations (either Crystal Ball or At Risk) – to be performed by another person – unless someone knows of the same kind of analysis in GIS (which would be awesome). Anyway, I have LOTS of parcel data and I need to create a spreadsheet that includes the APN of every parcel being analysed, PLUS every parcel that borders the original parcel. Does that make sense?
    Does anyone know of a script or macro that could run this portion of my task? There is no feasible way to do this manually.
    I appreciate any help you can provide.

  7. It is very clear. I like it.

  8. Does GDAL work with ESRI GRID data?

  9. You can use shapely with shapefiles (via ogr) without problem:
    reading shapefile:
    >>> from osgeo import ogr
    >>> from shapely.wkb import loads
    >>> source = ogr.Open("testpoly.shp")
    >>> couche = source.GetLayerByName("testpoly")
    >>> for element in couche:
    ...     geom = loads(element.GetGeometryRef().ExportToWkb())
    ...     if geom.geom_type == 'Point':
    ...         print(geom.geom_type)
    ...         print(geom)
    ...     if geom.geom_type == 'LineString':
    ...         print(geom.geom_type)
    ...         print(geom)
    ...     if geom.geom_type == 'MultiLineString':
    ...         print(geom.geom_type)
    ...         print(geom)
    ...     if geom.geom_type == 'MultiPolygon':
    ...         print(geom.geom_type)
    ...         print(geom)
    ...     if geom.geom_type == 'Polygon':
    ...         print(geom.geom_type)
    ...         print(geom)

    POLYGON ((0.0909447004608295 0.8075576036866359, 0.1416359447004608 0.8014746543778801, 0.3606221198156682 0.7021198156682027, 0.2409907834101383 0.5480184331797234, 0.0868894009216590 0.5845161290322580, 0.0544470046082949 0.7426728110599077, 0.0909447004608295 0.8075576036866359))
    POLYGON ((0.2754608294930876 0.7933640552995391, 0.5370276497695853 0.8136405529953916, 0.4173963133640554 0.5926267281105990, 0.2267972350230415 0.6777880184331797, 0.2267972350230415 0.7447004608294931, 0.2754608294930876 0.7933640552995391))
    and same for writing shapefile


© 2010 Michalis Avraam Suffusion theme by Sayontan Sinha