The Stingray Schema-Based File Reader

The Stingray Reader tackles four fundamental issues in processing a file:

  • How are the bytes organized? What is the Physical Format?
  • How are the data objects organized? What is the Logical Layout?
  • What do the bytes mean? What is the Conceptual Content?
  • How can we assure ourselves that our applications will work with this file?

The problem is that a schema is not always bound to a given file, nor is it clearly bound to an application program. Two examples show this separation between schema and content:

  • We might have a spreadsheet where there aren’t even column titles.
  • We might have a pure data file (for example from a legacy COBOL program) which is described by a separate schema.
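The first case, a sheet without column titles, can be illustrated with a minimal sketch that keeps the schema outside the data and applies it row by row. This is not Stingray's API: `SCHEMA`, `RAW`, and `rows_with_schema` are hypothetical names, and the only library used is the standard `csv` module.

```python
import csv
import io

# Hypothetical schema kept apart from the data: ordered column names.
SCHEMA = ["name", "quantity", "price"]

# Stand-in for a headerless sheet exported as CSV.
RAW = "widget,3,2.50\ngadget,1,10.00\n"


def rows_with_schema(source, schema):
    """Yield one dict per row, pairing each cell with its schema name."""
    for row in csv.reader(source):
        if len(row) != len(schema):
            raise ValueError(
                f"expected {len(schema)} cells, got {len(row)}: {row!r}"
            )
        yield dict(zip(schema, row))


records = list(rows_with_schema(io.StringIO(RAW), SCHEMA))
```

The file itself carries no names at all; everything the application knows about a column comes from the externally supplied schema.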

One goal of good software is to cope reasonably well with variability of user-supplied inputs. Providing data by spreadsheet is often the most desirable choice for users. In some cases, it’s the only acceptable choice. Since spreadsheets are tweaked manually, they may not have a simple, fixed schema or logical layout.

A workbook (the container of individual sheets) can be encoded in any of a number of physical formats: XLS, CSV, XLSX, ODS to name a few. We would like our applications to be independent of these physical formats. We’d like to focus on the logical layout.
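One way to picture this independence is a small registry that picks a reader from the file suffix, so the application only ever sees rows. The sketch below is an assumption for illustration, not Stingray's design: `CSVRows`, `TabRows`, `READERS`, and `open_rows` are hypothetical names, and only two delimited text formats are covered because they need nothing beyond the standard library.

```python
import csv
import tempfile
from pathlib import Path


class CSVRows:
    """Hypothetical reader: yields each row of a delimited file as a list."""

    def __init__(self, path, delimiter=","):
        self.path = Path(path)
        self.delimiter = delimiter

    def rows(self):
        with self.path.open(newline="") as source:
            yield from csv.reader(source, delimiter=self.delimiter)


class TabRows(CSVRows):
    """Same logical layout, different physical format (tab-delimited)."""

    def __init__(self, path):
        super().__init__(path, delimiter="\t")


# Hypothetical registry mapping a file suffix to a reader class.
READERS = {".csv": CSVRows, ".tab": TabRows}


def open_rows(path):
    """Choose a reader from the suffix; callers never see the format."""
    path = Path(path)
    try:
        reader_class = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no reader registered for {path.suffix!r}") from None
    return reader_class(path)


# Demonstration: the same data in two physical formats.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "sample.csv"
    tab_path = Path(folder) / "sample.tab"
    csv_path.write_text("a,b\n1,2\n")
    tab_path.write_text("a\tb\n1\t2\n")
    csv_rows = list(open_rows(csv_path).rows())
    tab_rows = list(open_rows(tab_path).rows())
```

Both files produce the same list of rows, which is the property the application depends on. Supporting another format would mean adding one reader class and one registry entry.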

Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a required schema.
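A conformance check of this kind can be sketched as a function that reports every problem found in a row instead of stopping at the first. The names here (`SCHEMA`, `check_row`) and the idea of a schema as a mapping from column name to conversion function are assumptions made for illustration, not how Stingray represents a schema.

```python
# Hypothetical schema: column name -> conversion that must succeed.
SCHEMA = {"name": str, "quantity": int, "price": float}


def check_row(row, schema):
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for name, convert in schema.items():
        if name not in row:
            problems.append(f"missing column {name!r}")
            continue
        try:
            convert(row[name])
        except ValueError:
            problems.append(
                f"column {name!r}: cannot convert {row[name]!r} "
                f"with {convert.__name__}"
            )
    return problems


good = {"name": "widget", "quantity": "3", "price": "2.50"}
bad = {"name": "gadget", "quantity": "one"}

good_problems = check_row(good, SCHEMA)
bad_problems = check_row(bad, SCHEMA)
```

Collecting all problems per row suits user-supplied spreadsheets, where the person fixing the file benefits from seeing every issue in one pass.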

The TODO List

Todo

Test hashable interface of Cell

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cell.rst, line 214.)

Todo

refactor setSizeAndOffset()

Refactor setSizeAndOffset() into the Allocation class methods to remove isinstance() nonsense.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1196.)

Todo

Fix performance.

This is called once per row: it needs to be simpler and faster. Some refactoring can eliminate the if statements.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1208.)

Todo

88-level items could create boolean-valued properties.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_loader.rst, line 381.)
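To make the idea concrete: a COBOL 88-level item names a condition that is true when its parent field holds one of the listed values. A rough sketch of how that might map onto a Python property follows. It is not taken from the Stingray source; `condition`, `AccountRecord`, and the field names are hypothetical.

```python
def condition(field, *values):
    """Build a read-only boolean property.

    The property is true when the attribute named `field` holds one of `values`.
    """

    def getter(self):
        return getattr(self, field) in values

    return property(getter)


class AccountRecord:
    # Hypothetical record layout:
    #   05 STATUS-CODE PIC X.
    #      88 IS-ACTIVE VALUE 'A'.
    #      88 IS-CLOSED VALUE 'C' 'X'.
    is_active = condition("status_code", "A")
    is_closed = condition("status_code", "C", "X")

    def __init__(self, status_code):
        self.status_code = status_code


active = AccountRecord("A")
closed = AccountRecord("X")
```

Here `active.is_active` and `closed.is_closed` are both true, while `active.is_closed` is false, mirroring how the condition names would be tested in COBOL.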

Todo

Index by name and path, also.

This will eliminate some complexity in COBOL schema handling where we create a “schema dictionary” using simple names and path names.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/schema.rst, line 334.)

Todo

EBCDIC File V format with Occurs Depending On to show the combination.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol.rst, line 472.)

Todo

Test EXTERNAL, GLOBAL as Skipped Words, too.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol_loader.rst, line 997.)

Todo

Refactor workbook package

This module needs to be rebuilt into a package which imports a number of subsidiary modules. It’s too large as written.

Adding Numbers ‘13 will make this module even more monstrous. Adding future spreadsheets will only exacerbate the problem.

It should become (like cobol) a high-level package that imports top-level classes from modules within the package.

    from workbook.csv import CSV_Workbook
    from workbook.xls import XLS_Workbook
    ... etc. ...

This should make a transparent change from module to package.

The top-level definition for cobol.Workbook must be refactored into a base module that can be shared by all the modules in the package that extend this base definition.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/init.rst, line 170.)

Todo

Additional Numbers13_Workbook Feature

Translate Formula and Formula error to Text

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/numbers_13.rst, line 27.)

Todo

Refactor this, it feels clunky.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/ods.rst, line 128.)
