Pulling data from Excel using Python, xlrd

Excel’s data format is everywhere in the business world, and more often than not, you’ll wind up getting chunks of data from people dumped into an Excel file. Sure, CSV tends to be the “lowest common denominator” exchange format for data between software, but that’s quirky for its own reasons.

One way to pull data off Excel using Python is to use the win32 extensions and go through COM. However, this requires you to have a copy of Excel installed, and isn’t all that fast if you’re doing large batches of data grabbing.

The other way is to use the wonderful xlrd package. xlrd is great because it’s pure python, can run anywhere Python 2.1 and higher runs. It handles many of the quirks of the data format (such as the date epochs, encodings). Finally, perhaps most important for people in the business world, it’s BSD licensed, so we can include it in any products we make without running into any GPL conflicts.Of course, xlrd isn’t all powerful. It won’t handle password protected files, show charts/pictures/etc, display formulas (it will display the saved result), do any VBA-related things, display comments and hyperlinks. Also, the new Excel 2007 format isn’t supported as of this writing, but it’ll handle just about all the formats before that point.

The API for the module is pretty large, with lots of things you didn’t think you’d need, so take a look at the docs when you have time.

1
2
3
4
5
6
7
8
9
10
11
12
13
import xlrd
book = xlrd.open_workbook("data.xls") #open our xls file, there's lots of extra default options in this call, for logging etc. take a look at the docs
 
sheet = book.sheets()[0] #book.sheets() returns a list of sheet objects... alternatively...
sheet = book.sheet_by_name("qqqq") #we can pull by name
sheet = book.sheet_by_index(0) #or by the index it has in excel's sheet collection
 
r = sheet.row(0) #returns all the CELLS of row 0,
c = sheet.col_values(0) #returns all the VALUES of row 0,
 
data = [] #make a data store
for i in xrange(sheet.nrows):
  data.append(sheet.row_values(i)) #drop all the values in the rows into data

Note, there’s a difference between CELLS and VALUES. xlrd has a Cell class that encodes information about the cell in addition to its value, such as the data type (number, date, text, etc). Be careful about this when doing operations, for example:

1
2
3
 #if sheet.col(0)[1] returns a cell with 'number:2.0'
sheet.col(0)[1] + 2 # will throw a TypeError.
sheet.col(0)[1].value + 2 #will yield the expected 4.0

There’s many more goodies in the xlrd API, but even with just this little, you can pull out the tables you need with a little bit of work.

This entry was posted in Uncategorized. Bookmark the permalink.

One Response to Pulling data from Excel using Python, xlrd

  1. Clark Phillips says:

    Thanks for the example. This has gotten me started. Your example uses a .xls doc. Can we somehow open and manipulate a .xlsx doc?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>