It is often the case that the freely available data online are in Excel format. If one has Excel, then one has the ability to do some sort of basic manipulation of the files. But if Excel is not available, or your analysis software does not read Excel files, there is another way: use Python to manipulate Excel files.
Before continuing, let’s discuss the basic idea of Excel. This programs stores data in what is called a Workbook. Think of this as the file on your computer (the .xls). A Workbook can then have one or more Sheets, the little tabs on the bottom left corner usually. Within each Sheet, you can find an arrangement of Cells in a matrix form, which are referenced by Column Name and Row Number (so the top left cell is A1, the one directly to its right is A2, the one below is B2, etc. The image should help shed some light if you are not familiar with the concept of spreadsheets.
- Open a workbook using book = xlrd.open_workbook(“filetoread.xls”)
- Access a sheet in the workbook using sheet = book.sheet_by_index(0) to open the first sheet, or if you know the name, sheet = book.sheet_by_name(“Name of Sheet”)
- Retrieve data using any of the multiple methods: cellValue =sheet.cell_value(rowx, colx), or to retrieve a whole column use colValues = sheet.col_values(colx, start_rowx=0, end_rowx=None) or the retrieve a whole row use rowValues = sheet.row_values(rowx, start_colx=0, end_colx=None)
The basic idea is very simple, and we can easily write a little Python script that will allow us to read Excel files without the need to use Excel itself. The following example will offer a preview of an Excel file’s first sheet using the first 5 rows and columns of a file.
import xlrd # Import the package book = xlrd.open_workbook(&amp;quot;sample.xls&amp;quot;) # Open an .xls file sheet = book.sheet_by_index(0) # Get the first sheet for counter in range(5): # Loop for five times # grab the current row rowValues = sheet.row_values(counter,start_col=0, end_colx=4) # Print the values of the row formatted to 10 characters wide print &amp;quot;%-10s | %-10s | %-10s | %-10s | %-10s&amp;quot; % tuple(rowValues) # Print row separator print &amp;quot;-&amp;quot; *62
That is all there is to it. Quick, fast and no need for Excel whatsoever. Sample output is presented below: