8.4.1. Workbook __init__ Module – Wrapper for all implementations
A few Python overheads belong in the __init__ module of this package. Our goal is to make it so that only
the top-level package is imported; the individual workbook modules
are not generally expected to be used directly by an application.
    """stingray.workbook -- Opens workbooks in various formats, binds
    their associated schema, accesses them as Sheets with Rows and Cells.

    This is a kind of **Wrapper** or **Facade** that unifies
    :py:mod:`csv` and :py:mod:`xlrd`. It handles a number of file formats
    including :file:`.xlsx`, :file:`.ods`, and Numbers.
    """
In order to open files of various types, we’ll bring in a number of helpful modules.
    # cElementTree was removed in Python 3.9; ElementTree picks up
    # the C accelerator automatically when it is available.
    import xml.etree.ElementTree as dom
    from collections import defaultdict
    import zipfile
    import datetime
    from io import open
    import os.path
    import pprint
    import re
    import glob
    import logging
    import decimal

    import stingray.cell
    import stingray.sheet
    import stingray.schema.loader
We’ll explicitly import the top-level class definition for each
flavor of Workbook we can support. Because we import these here, these
classes will be available with a simple import of stingray.workbook.
    from stingray.workbook.csv import CSV_Workbook
    from stingray.workbook.xlsx import XLSX_Workbook
    from stingray.workbook.ods import ODS_Workbook
    from stingray.workbook.numbers_09 import Numbers09_Workbook
    from stingray.workbook.numbers_13 import Numbers13_Workbook
    from stingray.workbook.fixed import Fixed_Workbook
An UnknownFormat exception is raised when a workbook can’t be opened.
    class UnknownFormat( Exception ):
        """The workbook can't be opened."""
        pass
A No_Schema exception is raised if there’s a problem locating an external schema for a workbook.
    class No_Schema( Exception ):
        """A valid schema could not be loaded."""
        pass
8.4.1.1. Optional Modules
We can’t guarantee that xlrd is available. Also, old .xls files are
becoming less frequently used, so we’re making this optional.
    try:
        from stingray.workbook.xls import XLS_Workbook
    except ImportError:
        from stingray.workbook.base import Workbook
        class XLS_Workbook( Workbook ):
            """No ``xlrd`` Available."""
            def __init__( self, *args, **kw ):
                raise UnknownFormat
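The same optional-import pattern can be sketched without Stingray itself. In the sketch below, the module name hypothetical_xls_backend, the classes Base_Workbook and Optional_Workbook, and the helper try_open are stand-ins invented for illustration; none of them are part of the package. The point is only that a missing dependency does not break the import of the package, and the failure is deferred until someone tries to open a file of that format.

```python
class UnknownFormat(Exception):
    """The workbook can't be opened."""


class Base_Workbook:
    """Hypothetical stand-in for the package's Workbook base class."""
    def __init__(self, name, file_object=None):
        self.name = name
        self.file_object = file_object


try:
    # Hypothetical optional backend; assumed NOT to be installed here.
    from hypothetical_xls_backend import Optional_Workbook
except ImportError:
    class Optional_Workbook(Base_Workbook):
        """Placeholder used when the optional backend is missing."""
        def __init__(self, *args, **kw):
            raise UnknownFormat("optional backend is not installed")


def try_open(name):
    """Return a workbook, or None when the format is unsupported."""
    try:
        return Optional_Workbook(name)
    except UnknownFormat:
        return None


result = try_open("legacy.xls")
```

An application that prefers a hard failure can skip the try_open wrapper and let UnknownFormat propagate.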
8.4.1.2. Workbook Subclasses
We have a number of concrete subclasses of Workbook.
These are imported from submodules and made visible in this module.
- workbook.CSV_Workbook. This is a degenerate case, where the workbook appears to contain a single sheet. This sheet is the CSV file, accessed via the built-in csv module.
- workbook.XLS_Workbook. This is the workbook as processed by xlrd. These classes wrap xlrd classes to which the real work is delegated. This is optional: if xlrd is not installed, things will work, but these files cannot be opened.
- workbook.XLSX_Workbook. This is the workbook after unzipping and using an XML parser on the various document parts. Mostly, this is a matter of unzipping and parsing parts of the document to create a DOM which can be traversed as needed.
- workbook.Numbers09_Workbook. This handles the iWork ‘09 Numbers files with multiple workspaces and multiple tables in each workspace.
- workbook.Numbers13_Workbook. This handles the iWork ‘13 Numbers files with multiple workspaces and multiple tables in each workspace.
- workbook.Fixed_Workbook. This is actually a fairly complex case. The workbook will appear to contain a single sheet; this sheet is the fixed format file. Schema information is required up front, unlike the other formats.
Further extensions will handle various kinds of COBOL files. They’re similar to Fixed Workbooks. See The COBOL Package.
Each of these is a context manager, so we include the necessary methods.
Note that workbooks are rarely simple files. Sometimes they are ZIP archive members. Sometimes, they must be processed via gzip. Sometimes they involve Snappy compression.
In order to minimize the assumptions, we try to handle two forms of file processing:
- By name. In this case, the file name is provided. The file is opened and closed by the Workbook using the context manager interface.
- By file-like object. An open file-like object is provided. No additional context management is performed. This is appropriate when a workbook is itself a member of a larger archive.
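The two forms of file processing above can be sketched as a small context manager. This is a minimal illustration and not the package's implementation: the helper name source_file and its mode parameter are assumptions made for the example. The rule it demonstrates is the one stated above: a file opened by name is closed on exit, while a file-like object supplied by the caller is left open.

```python
import io
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def source_file(name, file_object=None, mode="rb"):
    """Yield a readable file for either calling convention."""
    if file_object is not None:
        # The caller owns this object: use it, but never close it.
        yield file_object
    else:
        # We opened the file, so we are responsible for closing it.
        with open(name, mode) as opened:
            yield opened


# By file-like object: for example, a member already read from an archive.
member = io.BytesIO(b"a,b\n1,2\n")
with source_file("member.csv", member) as source:
    first_from_object = source.readline()

# By name: the helper opens and closes the file itself.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "plain.csv")
    with open(path, "wb") as target:
        target.write(b"x,y\n3,4\n")
    with source_file(path) as source:
        first_from_name = source.readline()
    closed_after_use = source.closed
```

Leaving the caller's object open matters for archives: closing a member early could invalidate the enclosing ZIP or gzip stream that the application is still reading.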
8.4.1.3. Workbook Factory
This is the factory which creates a subclass of Workbook for a given file.
An opener Factory class. A subclass can extend this to handle other file extensions and physical formats.
    class Opener:
        """An extensible opener that examines the file extension
        and locates a proper Workbook subclass.
        """
        def __call__( self, name, file_object=None,
                schema_path='.', schema_sheet=None, **kw ):
            """Open a workbook.

            :param name: filename to open.
            :param file_object: File-like object to process. If not
                provided the named file will be opened.
            :keyword schema_path: Directory with external schema files
            :keyword schema_sheet: A sheet in an external schema workbook.
            """
            _, ext = os.path.splitext( name )
            ext = ext.lower()
            if ext == ".xls":
                return XLS_Workbook( name, file_object )
            elif ext in ( ".xlsx", ".xlsm" ):
                return XLSX_Workbook( name, file_object )
            elif ext in ( ".csv", ):
                return CSV_Workbook( name, file_object, **kw )
            elif ext in ( ".tab", ):
                return CSV_Workbook( name, file_object, delimiter='\t', **kw )
            elif ext in ( ".ods", ):
                return ODS_Workbook( name, file_object )
            elif ext in ( ".numbers", ):
                # Directory? It's Numbers13_Workbook; Zipfile? It's Numbers09_Workbook
                if os.path.isdir( name ):
                    return Numbers13_Workbook( name, file_object )
                else:
                    return Numbers09_Workbook( name, file_object )
            else:
                # Fixed format files with no specific extension
                # Ideally :file:`somefile.schema` is the file
                # and :file:`schema.csv` or :file:`schema.xlsx` can be tracked down.
                schema_pat= os.path.join(schema_path, ext[1:]+".*")
                schema_choices= glob.glob( schema_pat )
                if schema_choices:
                    # Try the first file that matches the pattern.
                    schema_name= schema_choices[0]
                    schema_wb= open_workbook( schema_name )
                    esl= stingray.schema.loader.ExternalSchemaLoader( schema_wb, schema_sheet )
                    schema= esl.schema()
                    return Fixed_Workbook( name, file_object, schema=schema )
                else:
                    raise No_Schema( schema_pat )
open_workbook(name, file_object, schema_path, schema_sheet)
Open a workbook.
For fixed format files, we attempt to track down and load the relevant schema file. The idea here is that a file’s extension can map to the schema’s filename.
somefile.schema would use a schema.csv workbook as its schema. We’ll simply try the first file that matches schema.* to see if it’s a workbook we can open.
- name – The name of the file.
- file_object – (optional) already opened file object.
- schema_path – (optional) directory containing external schema files.
- schema_sheet – (optional) name of a sheet with a schema.
This extension-based lookup is only a default. An application might have narrower and more specific rules for binding file and schema.
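The extension-to-schema lookup can be sketched on its own, outside the Opener class. The function name find_schema_file is hypothetical. Two details are assumptions added for the sketch and are not taken from the package: the candidates are sorted so that "first match" is deterministic, and the extension is passed through glob.escape so that unusual characters are matched literally.

```python
import glob
import os.path
import tempfile


def find_schema_file(name, schema_path="."):
    """Return the first file in schema_path whose base name equals
    the data file's extension, or None when there is no candidate."""
    _, ext = os.path.splitext(name)
    stem = ext.lower()[1:]
    if not stem:
        # No extension, so there is nothing to map to a schema name.
        return None
    pattern = os.path.join(schema_path, glob.escape(stem) + ".*")
    # glob.glob returns matches in arbitrary order; sort for repeatability.
    choices = sorted(glob.glob(pattern))
    return choices[0] if choices else None


# Demonstration with two candidate schema workbooks in a scratch directory.
with tempfile.TemporaryDirectory() as schema_dir:
    for candidate in ("schema.csv", "schema.xlsx"):
        open(os.path.join(schema_dir, candidate), "w").close()
    found = find_schema_file("somefile.schema", schema_dir)
    found_name = os.path.basename(found)
    missing = find_schema_file("somefile.other", schema_dir)
```

Here somefile.schema resolves to schema.csv because it sorts ahead of schema.xlsx; without the sort, which of the two is tried first would depend on the file system.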
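When creating a subclass, use the Chain of Command pattern (also known as Chain of Responsibility): handle the names you recognize and delegate everything else to the superclass. This allows a user to create subclasses to handle the various other file name extensions. Here’s an example: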
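    import os.path
    from fnmatch import fnmatch
    import stingray.schema.loader
    from stingray import workbook

    class MyOpener(workbook.Opener):
        def __call__(self, name, file_object=None, schema_path='.', schema_sheet=None, **kw ):
            if fnmatch(name, "*.dat"):
                # Application-specific rule: all .dat files share one schema workbook.
                # The loader is given an open workbook, as in Opener above.
                schema_wb= workbook.open_workbook( os.path.join(schema_path, "schemafile.csv") )
                esl= stingray.schema.loader.ExternalSchemaLoader( schema_wb, schema_sheet )
                schema= esl.schema()
                return workbook.CSV_Workbook( name, file_object, schema=schema, delimiter="|" )
            # Not ours: pass the request up the chain.
            return super().__call__(name, file_object, schema_path, schema_sheet, **kw )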
There may be application-specific rules, or command-line options, that determine the mapping between filename and physical format or between filename and schema.
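As one possible shape for such a rule, the sketch below reads filename-to-schema bindings from the command line. Everything in it is an assumption made for illustration: the --bind option, its PATTERN=SCHEMA syntax, and the helper names are not part of Stingray. An Opener subclass could consult a table like this before falling back to the superclass.

```python
import argparse
from fnmatch import fnmatch


def binding(text):
    """Convert 'PATTERN=SCHEMA' into a (pattern, schema) pair."""
    pattern, sep, schema = text.partition("=")
    if not sep or not pattern or not schema:
        raise argparse.ArgumentTypeError("expected PATTERN=SCHEMA")
    return pattern, schema


def parse_bindings(argv):
    """Collect repeated --bind options, in command-line order."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--bind", action="append", default=[], type=binding,
        metavar="PATTERN=SCHEMA",
        help="hypothetical option: use SCHEMA for files matching PATTERN")
    return parser.parse_args(argv).bind


def schema_for(name, bindings, default=None):
    """Return the schema bound to the first pattern that matches name."""
    for pattern, schema in bindings:
        if fnmatch(name, pattern):
            return schema
    return default


bindings = parse_bindings([
    "--bind", "*.dat=schemafile.csv",
    "--bind", "q?_*.txt=quarterly.xlsx",
])
```

Because the first matching pattern wins, more specific patterns should be listed before more general ones.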