It is often the case that the freely available data online are in Excel format. If one has Excel, then one has the ability to do some sort of basic manipulation of the files. But if Excel is not available, or your analysis software does not read Excel files, there is another way: use Python to manipulate Excel files.

excelPreviewBefore continuing, let’s discuss the basic idea of Excel. This programs stores data in what is called a Workbook. Think of this as the file on your computer (the .xls). A Workbook can then have one or more Sheets, the little tabs on the bottom left corner usually. Within each Sheet, you can find an arrangement of Cells in a matrix form, which are referenced by Column Name and Row Number (so the top left cell is A1, the one directly to its right is A2, the one below is B2, etc. The image should help shed some light if you are not familiar with the concept of spreadsheets.

In order to read Excel sheets in Python without using any Microsoft Office code, we need to use the excellent xlrd packge by Lingfo. The basic workflow is as follows:

  1. Open a workbook using book = xlrd.open_workbook(“filetoread.xls”)
  2. Access a sheet in the workbook using sheet = book.sheet_by_index(0) to open the first sheet, or if you know the name, sheet = book.sheet_by_name(“Name of Sheet”)
  3. Retrieve data using any of the multiple methods: cellValue =sheet.cell_value(rowx, colx), or to retrieve a whole column use colValues = sheet.col_values(colx, start_rowx=0, end_rowx=None) or the retrieve a whole row use rowValues = sheet.row_values(rowx, start_colx=0, end_colx=None)

The basic idea is very simple, and we can easily write a little Python script that will allow us to read Excel files without the need to use Excel itself. The following example will offer a preview of an Excel file’s first sheet using the first 5 rows and columns of a file.

import xlrd # Import the package
book = xlrd.open_workbook("sample.xls") # Open an .xls file
sheet = book.sheet_by_index(0) # Get the first sheet
for counter in range(5): # Loop for five times
# grab the current row
rowValues = sheet.row_values(counter,start_col=0, end_colx=4)
# Print the values of the row formatted to 10 characters wide
print "%-10s | %-10s | %-10s | %-10s | %-10s" % tuple(rowValues)
# Print row separator
print "-" *62

That is all there is to it. Quick, fast and no need for Excel whatsoever. Sample output is presented below:


1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 4.00 out of 5)
Loading ... Loading ...

Related posts:

  1. Manipulating Excel files using Python part 2: Writing Excel Files
  2. The Essential Python Modules for GIS
  3. Python and Geography: Input Output Models and Graph Theory
  4. Geoprocessing Iteration with Python
  5. Understanding the Geoprocessor Programming Model part 3

3 Responses to “Manipulating Excel files using Python part 1: Reading Excel Files”

  1. [...] reusable code by other users (in the form of modules) to achieve things like reading Excel files (click here for an [...]

  2. At 68 years I have developed interest in acquiring GIS skills to pursue a Peace Corps assignment. I did not realize coding is required as Accounting field of study has programs to calculate and perform functions such as income tax computation. Am I safe to state that GIS study require coding ability?
    Does a national certification examination exist to met GIS standards as like a CPA exam?
    Walter Feszchak

    • It is safe to say that some coding knowledge is needed for GIS work. Most analyses these days are not the simple point-and-click sort, but rather an amalgamation of different smaller individual pieces. Coding would allow you to perform that effortlessly for large data sets.

      As for a national certification… There is the GISP certificate by the GIS Certification Institute (and a few other attempts if I am not mistaken) but I have never seen a job posting requiring a certification. Any GIS certificate or degree is usually enough for employees to verify your knowledge.

Leave a Reply



You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

© 2010 Michalis Avraam Suffusion theme by Sayontan Sinha