With ESRI’s adoption of Python as its scripting language and the proliferation of open source GIS, Python has become one of the required languages for GIS developers and hobbyists alike. What makes Python powerful is well documented throughout the web, but I want to highlight one very important aspect of Python today: Python Modules.
Python Modules are code that someone else has written and distributed in order to make life easier for the rest of us. You may be familiar with the standard modules that come with Python, like math or datetime, but there are many more resources out there for GIS-minded developers. I will be discussing some of the modules I find essential in my work, apart from the famous ArcGISScripting module by ESRI: GDAL, numpy, NetworkX, xlrd and xlwt. Let’s dive in!
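To show how far the standard modules alone can take you, here is a minimal sketch that uses math and datetime to compute a great-circle distance for a time-stamped waypoint. The function name haversine_km, the waypoint dictionary and the Earth radius constant are my own assumptions for illustration, not part of any GIS package.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical GPS waypoint: a lat/lon pair stamped with a UTC time
waypoint = {
    "lat": 0.0,
    "lon": 0.0,
    "time": datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc),
}

# Distance covered by one degree of longitude along the equator
distance = haversine_km(waypoint["lat"], waypoint["lon"], 0.0, 1.0)
print(round(distance, 1), "km from waypoint logged at", waypoint["time"].isoformat())
```

The haversine formula treats the Earth as a sphere, so it is good enough for a quick check but not a replacement for a proper geodesic calculation.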
GDAL – Geospatial Data Abstraction Layer
There will come a time in every GIS professional’s career when they need to quickly pull information from a random shapefile but have no access to GIS software or geoprocessing functionality (think of a laptop on the road, a remote machine not running Windows, etc.). GDAL comes to the rescue by providing exactly that functionality.
GDAL is a translator library with Python bindings that allows access to raster data using a unified abstract layer. Bundled with it is OGR, which provides similar functionality for vector data. Download it here.
A quick example of using GDAL:
    # Recent GDAL releases expose the bindings through the osgeo package
    from osgeo import gdal

    # Open the raster dataset (filename is the path to your raster file)
    dataset = gdal.Open(filename, gdal.GA_ReadOnly)

    # Print the projection of the data
    print(dataset.GetProjection())

    from osgeo import ogr

    # Get the driver
    driver = ogr.GetDriverByName('ESRI Shapefile')

    # Open a shapefile (0 = read-only, 1 = update)
    dataset = driver.Open(shapefileName, 0)
numpy – Numerical Python
I cannot think of any GIS practitioner who has not had to manipulate raster data in some peculiar way, only to find that the software at hand does not allow many customized operations. If one can read raster data (which GDAL above helps us with), then one can work with it in Python as a matrix (in the algebraic sense). numpy is the best Python package out there for this sort of situation.
numpy is a package that enables n-dimensional array manipulation in Python, an essential part of any scientific endeavor. It also provides linear algebra functions, Fourier transforms and random number generators. Get it here.
An example of the use of numpy:
    import numpy as np

    # Sample IO Table data
    ioSample = [[1, 2], [3, 4]]

    # Turn into a numpy array
    ioMatrix = np.array(ioSample)

    # Find the inverse of ioMatrix
    ioMatrixInv = np.linalg.inv(ioMatrix)
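To make the "raster as a matrix" idea more concrete, here is a small sketch of the kind of custom operation that desktop software often makes awkward: masking out nodata cells, computing a statistic, and reclassifying. The 3x3 elevation grid, the NODATA value of -9999 and the 140 threshold are all made up for illustration; a real array would come from a raster you have read in.

```python
import numpy as np

NODATA = -9999  # assumed nodata marker for this sketch

# Hypothetical 3x3 elevation raster, as it might look once read into an array
elevation = np.array([
    [120.0, 135.0, NODATA],
    [110.0, 150.0, 160.0],
    [NODATA, 140.0, 155.0],
])

# Boolean mask of the cells that hold real data
valid = elevation != NODATA

# Mean elevation over the valid cells only
mean_elevation = elevation[valid].mean()

# Reclassify: 1 where elevation is at least 140, 0 elsewhere, nodata preserved
highland = np.where(valid, (elevation >= 140).astype(int), NODATA)

print(mean_elevation)
print(highland)
```

Everything here is done with whole-array operations rather than loops over cells, which is where numpy earns its keep on rasters of realistic size.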
NetworkX – Complex Networks Analysis
While most GIS software out there provides the ability to build networks, sometimes it is easier to build a quick and dirty network without involving complex GIS software. An analysis of participation by space in an experiment, for example, can easily be achieved using the simple yet powerful NetworkX module.
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is hosted by the Los Alamos National Laboratory, and sees active development (presumably sponsored somehow by Los Alamos). Download it here.
A quick sample is shown below:
    import networkx as nx

    # Create a graph
    g = nx.Graph()

    # Populate the graph
    g.add_node(1)
    g.add_node(2)
    g.add_node(3)

    # Create edges
    g.add_edge(1, 2)
    g.add_edge(1, 3)

    # Print the neighbors of node 1 (nodes 2 and 3)
    print(list(g.neighbors(1)))
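For something closer to a GIS task, the sketch below builds a tiny weighted graph and asks NetworkX for the shortest route between two nodes. The node names, the segment lengths and the "length" attribute name are hypothetical values chosen for this example.

```python
import networkx as nx

# Hypothetical road segments: (from_node, to_node, length in km)
segments = [
    ("A", "B", 4.0),
    ("B", "C", 3.0),
    ("A", "C", 10.0),
    ("C", "D", 2.0),
]

# Build an undirected graph, storing each length under the "length" attribute
g = nx.Graph()
g.add_weighted_edges_from(segments, weight="length")

# Shortest route from A to D by total length, and that total
route = nx.shortest_path(g, source="A", target="D", weight="length")
total = nx.shortest_path_length(g, source="A", target="D", weight="length")

print(route, total)
```

Passing weight="length" matters: without it, NetworkX counts hops instead of kilometres and would prefer the direct A to C segment.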
xlrd – Excel™ File Reader
All GIS practitioners have been sent “GIS data” in an Excel file, whether a geocoding result, GPS waypoints, or something similar. Ideally, whoever sent the data would be educated on why that is a bad idea, but most often we have to deal with the data without any additional help. This is where xlrd comes into play, allowing you to read that Excel-formatted data into Python with little effort.
xlrd is a Python module that allows one to read Excel files without the need of Microsoft Excel or Windows. It provides access to XLS files for Microsoft Office 2003 or earlier. Download it here.
A quick example that will read an XLS file and print it to screen:
    import xlrd

    # Open the Excel file
    book = xlrd.open_workbook("excelFile.xls")

    # Read the first sheet in the Excel workbook
    sheet = book.sheet_by_index(0)

    # Read the first row from column A to E (end_colx is exclusive)
    rowValues = sheet.row_values(0, start_colx=0, end_colx=5)

    # Print the row values
    for value in rowValues:
        print(value)
xlwt – Excel™ File Writer
Business requirements often call for results in an Excel file, so that some other person, in another department, can run some sort of analysis on the data. Building a distance matrix is fine, but Joe from accounting works with complicated Excel spreadsheets and does not want to bother with database connections or DBF files. This is a situation in which xlwt excels, writing data to an Excel spreadsheet without needing Excel or any manual work with the mouse.
xlwt is a Python module that, like xlrd mentioned above, allows cross-platform Excel file creation without the need for Microsoft Office. Download it here.
An example follows:
    import xlwt

    # Create a new workbook
    book = xlwt.Workbook()

    # Add a new sheet
    sheet = book.add_sheet("My Sheet")

    # Write the number 5 in the first row, first column
    sheet.write(0, 0, 5)

    # Save the file
    book.save("myExcelFile.xls")
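Going back to Joe from accounting, here is a rough sketch of how a small distance matrix could be written out with row and column labels. The site names, the distances, the helper write_distance_matrix and the output file name are all invented for this example; only the xlwt calls themselves are the same ones used above.

```python
import os
import tempfile

import xlwt

# Hypothetical site names and pairwise distances in km (made-up values)
sites = ["Depot", "Store 1", "Store 2"]
distances = [
    [0.0, 12.5, 30.1],
    [12.5, 0.0, 18.7],
    [30.1, 18.7, 0.0],
]


def write_distance_matrix(path, names, matrix):
    """Write a labelled square matrix to a single sheet of a new XLS file."""
    book = xlwt.Workbook()
    sheet = book.add_sheet("Distances")

    # Labels across the first row and down the first column
    for i, name in enumerate(names):
        sheet.write(0, i + 1, name)
        sheet.write(i + 1, 0, name)

    # Matrix values, offset by one row and one column to clear the labels
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            sheet.write(r + 1, c + 1, value)

    book.save(path)


# Write to the system temp directory so the sketch leaves no stray files behind
out_path = os.path.join(tempfile.gettempdir(), "distance_matrix.xls")
write_distance_matrix(out_path, sites, distances)
print("Wrote", out_path)
```

Each cell is written exactly once, which matters because xlwt refuses to overwrite a cell unless the sheet was created with overwriting allowed.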
These are my essential Python modules for GIS development. Notice that I left quite a few out, especially those relating to web-based GIS development. Those will appear in a future post. In the meantime, please do share your thoughts on which Python modules are essential for your GIS work.