Analysts face several problems when they have to process many data files. They must identify what the files have in common, and then automate the processing so they don’t have to run the same program, with the same parameters, once per file. In ESRI’s GIS software this is quite easy with the help of geoprocessing, either in Python or in ModelBuilder. Below is sample code that iterates over a number of datasets.
Geoprocessing in Python provides several methods that help you iterate. Depending on the dataset types you have, you can get lists of data with very little effort. The following code shows how this is done in version 9.3, although earlier versions are just as easy to use.
In order to iterate a process, we need a list of the files we will use, and then a series of commands to run on those files. There are multiple ways to build such a list through the operating system (the os and sys modules), but a shapefile is made up of several files on disk, and a geodatabase looks like a single file or folder to the OS no matter how many datasets it holds, so we need a different solution. And here lies the power of the geoprocessor. It lets us list data the way ESRI recognizes them: workspaces, (feature) datasets, feature classes, tables and rasters. So let’s begin.
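Before we do, here is a quick illustration of the shapefile problem mentioned above. The sketch below uses only the os module to group the files in a folder by base name. The helper name group_by_basename is made up for this illustration and is not part of any ESRI library. A single shapefile shows up as several directory entries (.shp, .shx, .dbf and so on), which is exactly the bookkeeping the geoprocessor spares us.

```python
import os

def group_by_basename(folder):
    """Group the file names in `folder` by base name (name without extension).

    Hypothetical helper, for illustration only: it shows that one shapefile
    appears to the operating system as several separate files.
    """
    groups = {}
    for name in sorted(os.listdir(folder)):
        if not os.path.isfile(os.path.join(folder, name)):
            continue  # skip subfolders
        base, ext = os.path.splitext(name)
        groups.setdefault(base, []).append(ext.lower())
    return groups
```

Each key in the returned dictionary is one candidate dataset, and its value is the list of extensions found for it. Even this small helper does not know which groups are really shapefiles, which is why the rest of this post lets the geoprocessor do the listing.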
Oftentimes, analysts store data in folders. If you are conducting a study in 10 different regions, it is often the case you save the data files for each region in a different folder. A folder structure like the following is common:
- \myStudy\
- \myStudy\region1\
- \myStudy\region2\
- \myStudy\region3\
- …
We therefore need a method that lets us go into the master folder (\myStudy\ in this case) and read the files in each subfolder. To achieve this, we first read the names of the subfolders. The command ListWorkspaces(wildcard, type) helps us do that: it accepts a wildcard as its first argument (any pattern using * can narrow things down) and the type of workspace we want (Access, Coverage, FileGDB, Folder, SDE or ALL). An example follows:
import arcgisscripting # Import the geoprocessor capabilities
gp = arcgisscripting.create(9.3) # Create the geoprocessor object
masterFolder = r"path\to\folder" # Define the master folder
gp.workspace = masterFolder # Define the location to run the commands on
workspaces = gp.ListWorkspaces("*", "Folder") # List all FOLDERS in master workspace
for workspace in workspaces: # Iterate through the workspaces one at a time
    print workspace # Print the workspace name
    # Do your work here
At this point, you have a Python list object that holds the names of all the subfolders you may want to use. The next step is to list all the data within each of those folders. To do that, we use one of the following commands, depending on the type of data we have: ListDatasets(wildcard, type), ListFeatureClasses(wildcard, type), ListTables(wildcard, type) or ListRasters(wildcard, type). The wildcard works as above, but the type is different: here it restricts the listing to one kind of data. For datasets, the type is feature, TIN, raster, CAD or ALL. For feature classes, the types are Annotation, arc, dimension, label, line, node, point, polygon, region, route or tic. Table types can be DBF, INFO or ALL. Similarly, raster types can be BMP, GIF, IMG, JP2, JPG, PNG, TIFF, GRID or ALL. We will continue our example by searching for feature classes (shapefiles) in the folders we identified above.
import arcgisscripting # Import the geoprocessor capabilities
gp = arcgisscripting.create(9.3) # Create the geoprocessor object
masterFolder = r"path\to\folder" # Define the master folder
gp.workspace = masterFolder # Define the location to run the commands on
workspaces = gp.ListWorkspaces("*", "Folder") # List all FOLDERS in master workspace
for workspace in workspaces: # Iterate through the workspaces one at a time
    gp.workspace = workspace # Change to the new workspace
    fcs = gp.ListFeatureClasses() # No arguments: defaults to all feature classes
    for fc in fcs: # Iterate through each feature class in the folder now
        print fc # Print the feature class name
        # Do individual file work here
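The nested loops above can also be wrapped in a small reusable function. The following is only a sketch: the name iter_feature_classes is made up here, and the function takes the geoprocessor object as an argument so that it makes no assumptions about how gp was created. It uses only gp.workspace, ListWorkspaces and ListFeatureClasses, as in the example above, and assumes (as with a 9.3 geoprocessor) that the list methods return Python lists.

```python
def iter_feature_classes(gp, master_folder, wildcard="*"):
    """Yield (workspace, feature_class) pairs for every subfolder of master_folder.

    Hypothetical helper: `gp` is a geoprocessor object supplied by the caller.
    """
    gp.workspace = master_folder
    # "or []" guards against a listing that comes back empty or as None
    workspaces = gp.ListWorkspaces("*", "Folder") or []
    for workspace in workspaces:
        gp.workspace = workspace  # change to the subfolder before listing
        for fc in gp.ListFeatureClasses(wildcard) or []:
            yield workspace, fc

# Example use with a real geoprocessor (requires ArcGIS):
#   import arcgisscripting
#   gp = arcgisscripting.create(9.3)
#   for workspace, fc in iter_feature_classes(gp, r"path\to\folder"):
#       print("%s: %s" % (workspace, fc))
```

Passing gp in, rather than creating it inside the function, also makes it possible to try the logic with a stand-in object on a machine without ArcGIS.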
As you can see, we began with a master folder and went on to reach the individual files within all of its subfolders. This takes only a few lines of code, minimizing complexity and confusion. In about ten lines of Python you can begin processing every file under the master folder quickly and easily.
While listing other types of data has not been shown, it is quite easy to do with the appropriate command (as mentioned above). Dealing with databases is similar: as you can imagine, you use the database as your master folder. Do not forget that “subfolders” in the database are called datasets, so you will need to use ListDatasets() instead of ListWorkspaces().
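As a sketch of the database case, the function below treats the geodatabase as the master workspace and each feature dataset as a “subfolder”. The function name is made up and the geoprocessor is passed in as an argument. One detail is an assumption to check against your ArcGIS version: that a feature dataset can be made the current workspace by joining its name to the geodatabase path.

```python
import os

def list_gdb_feature_classes(gp, gdb_path):
    """Return {dataset_name: [feature classes]} for a geodatabase.

    Hypothetical helper. Feature classes stored at the root of the
    geodatabase are listed under the empty string "".
    """
    gp.workspace = gdb_path
    result = {"": list(gp.ListFeatureClasses() or [])}
    datasets = gp.ListDatasets("*", "Feature") or []
    for dataset in datasets:
        # Assumption: a feature dataset can act as the current workspace
        gp.workspace = os.path.join(gdb_path, dataset)
        result[dataset] = list(gp.ListFeatureClasses() or [])
    gp.workspace = gdb_path  # reset the workspace to the geodatabase root
    return result
```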
I hope this will be useful to some of you out there.




Definitely useful! Thanx.
One question for you. I’m pretty new to geoprocessing in Python for ArcGIS and I can’t find a good reference for all the methods available to the geoprocessor. What kind of reference do you use?
Also I don’t quite understand the Geoprocessor Programming Model. Do you have any suggestion where to look for help?
Matej,
There are two types of methods available to the geoprocessing object in Python right now. There are the internal geoprocessing commands, which are what the Geoprocessor Programming Model describes, and the geoprocessing tools, which are the same tools available in ArcToolbox. Unfortunately, the only documentation I can think of for these is from ESRI.
The Geoprocessing Quick Reference guide (PDF) provides a list of all available tools and a short description of each. This is similar to running gp.Usage() on each tool returned by gp.ListTools().
For the programming model… My students find it hard to grasp too, and the best explanation I can give is as follows:
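If you want to build that list yourself, something along these lines should work. It is only a sketch: the function name tool_usages is made up, the geoprocessor is passed in as an argument, and I am assuming ListTools() returns a Python list, as it does with a 9.3 geoprocessor.

```python
def tool_usages(gp, wildcard="*"):
    """Return a dict mapping each matching tool name to its usage string.

    Hypothetical helper: `gp` is a geoprocessor object supplied by the caller.
    """
    usages = {}
    for tool in gp.ListTools(wildcard) or []:
        usages[tool] = gp.Usage(tool)  # the syntax line for this tool
    return usages
```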
The bottom left box labeled arcgisscripting defines the commands available through the geoprocessor (your gp object).
If one of the tools in the above list has an arrow connecting it to another box, it means that the tool returns an object of that type.
For example, to access the geometry of any object, you need to traverse through the following (from the left of the diagram to the right):
gp.SearchCursor() – returns a SearchCursor
SearchCursor.Next() – returns a Row
Row.GetValue() – returns a value from the row, which includes the geometry (called shape)
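Put together, that chain looks roughly like the sketch below. The function name read_values is made up, the geoprocessor is passed in as an argument, and the field name is left to the caller; for geometry it is often "Shape", but that is an assumption you should check for your own data.

```python
def read_values(gp, feature_class, field_name):
    """Collect one field's value from every row of a feature class.

    Hypothetical helper: `gp` is a geoprocessor object supplied by the caller.
    """
    values = []
    rows = gp.SearchCursor(feature_class)  # returns a SearchCursor
    row = rows.Next()                      # returns a Row, or None at the end
    while row:
        values.append(row.GetValue(field_name))  # value from the current row
        row = rows.Next()
    return values

# Example use with a real geoprocessor (requires ArcGIS):
#   shapes = read_values(gp, "roads.shp", "Shape")
```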
I hope this is slightly more helpful than the documentation out there.
Hello from Russia!
Can I quote a post in your blog with the link to you?
Hello from the USA. Feel free to do so, and thank you for visiting.