The Stingray Schema-Based File Reader

The Stingray Reader tackles four fundamental issues in processing a file:

  • How are the bytes organized? What is the Physical Format?
  • How are the data objects organized? What is the Logical Layout?
  • What do the bytes mean? What is the Conceptual Content?
  • How can we assure ourselves that our applications will work with this file?

The problem is that a schema is not always bound to a given file, nor is it clearly bound to an application program. Two examples illustrate this separation between schema and content:

  • We might have a spreadsheet where there aren’t even column titles.
  • We might have a pure data file (for example from a legacy COBOL program) which is described by a separate schema.
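
To make the separation concrete, here is a minimal sketch of reading a sheet with no column titles by applying a schema that lives outside the data. It uses only the standard library csv module, not the Stingray API. The names SCHEMA and rows_with_schema, and the sample data, are hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical external schema: column names and converters, kept apart from the data.
SCHEMA = [("name", str), ("quantity", int), ("price", float)]


def rows_with_schema(source, schema):
    """Yield one dict per row, pairing each cell with its schema entry by position."""
    for row in csv.reader(source):
        yield {name: convert(cell) for (name, convert), cell in zip(schema, row)}


# Sample sheet content with no header row (an assumption for this sketch).
data = io.StringIO("widget,3,1.25\ngadget,10,0.5\n")
records = list(rows_with_schema(data, SCHEMA))
```

The data carries no names at all. Everything the application knows about a column comes from the schema, which could equally well be loaded from a separate file.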

One goal of good software is to cope reasonably well with the variability of user-supplied inputs. Providing data by spreadsheet is often the most desirable choice for users. In some cases, it’s the only acceptable choice. Since spreadsheets are tweaked manually, they may not have a simple, fixed schema or logical layout.

A workbook (the container of individual sheets) can be encoded in any of a number of physical formats: XLS, CSV, XLSX, and ODS, to name a few. We would like our applications to be independent of these physical formats. We’d like to focus on the logical layout.
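
One way to picture this independence is a small registry that maps a physical format to a function producing rows, so the application only ever sees rows. The sketch below is an assumption, not the Stingray design. It uses comma-delimited and tab-delimited text as stand-ins for the real formats, and READERS, open_rows, and total_quantity are hypothetical names.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, TextIO

Row = List[str]


def csv_rows(source: TextIO) -> Iterator[Row]:
    """Rows from comma-delimited text."""
    return iter(csv.reader(source))


def tab_rows(source: TextIO) -> Iterator[Row]:
    """Rows from tab-delimited text."""
    return iter(csv.reader(source, delimiter="\t"))


# Hypothetical registry: file suffix -> function that yields rows.
READERS: Dict[str, Callable[[TextIO], Iterator[Row]]] = {
    ".csv": csv_rows,
    ".tab": tab_rows,
}


def open_rows(suffix: str, source: TextIO) -> Iterator[Row]:
    """Pick a reader by suffix; callers never touch the physical format."""
    try:
        reader = READERS[suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported physical format: {suffix}") from None
    return reader(source)


def total_quantity(rows: Iterable[Row]) -> int:
    """Application logic: depends only on the logical layout (column 1 is a count)."""
    return sum(int(row[1]) for row in rows)


from_csv = total_quantity(open_rows(".csv", io.StringIO("widget,3\ngadget,10\n")))
from_tab = total_quantity(open_rows(".tab", io.StringIO("widget\t3\ngadget\t10\n")))
```

The application function total_quantity is identical for both inputs. Only the registry knows how the bytes were organized.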

Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a required schema.
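
A conformance check can be sketched as a function that walks the rows and reports every cell that the schema rejects, instead of failing on the first one. This is an illustration under assumed names (conformance_problems, SCHEMA), not the Stingray validation code. The schema representation, a list of name and converter pairs, is also an assumption.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Schema = Sequence[Tuple[str, Callable[[str], object]]]


def conformance_problems(rows: Iterable[Sequence[str]], schema: Schema) -> List[str]:
    """Return a description of each row or cell that does not fit the schema."""
    problems: List[str] = []
    for number, row in enumerate(rows, start=1):
        if len(row) != len(schema):
            problems.append(
                f"row {number}: expected {len(schema)} cells, found {len(row)}"
            )
            continue
        for (name, convert), cell in zip(schema, row):
            try:
                convert(cell)
            except ValueError:
                problems.append(f"row {number}: {name!r} rejects {cell!r}")
    return problems


# Hypothetical required schema and sample rows.
SCHEMA = [("name", str), ("quantity", int)]
good = conformance_problems([["widget", "3"]], SCHEMA)
bad = conformance_problems([["widget", "three"], ["gadget"]], SCHEMA)
```

An empty result means the sheet fits the schema. Collecting all problems in one pass gives the person who supplied the spreadsheet a complete list to fix.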

A COBOL file parallels a workbook sheet in several ways. It also introduces some unique complications. We’d like to provide a suite of tools that work well with common spreadsheets as well as COBOL files, allowing some uniformity in processing various kinds of data.
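
The parallel can be sketched with a fixed-width record: each field sits at a known offset and size, much as each cell sits in a known column. The layout below is hypothetical, standing in for what a COBOL record description would provide, and parse_record is an illustrative name, not part of Stingray. The cp037 codec is one of Python's standard EBCDIC encodings.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, offset, size) in bytes.
LAYOUT: List[Tuple[str, int, int]] = [("NAME", 0, 8), ("QUANTITY", 8, 4)]


def parse_record(
    record: bytes, layout: List[Tuple[str, int, int]], encoding: str = "ascii"
) -> Dict[str, str]:
    """Slice a fixed-width record into named text fields, like cells in a row."""
    return {
        name: record[offset:offset + size].decode(encoding).strip()
        for name, offset, size in layout
    }


# The same logical record in two physical encodings.
ascii_record = b"WIDGET  0003"
ebcdic_record = "WIDGET  0003".encode("cp037")

ascii_fields = parse_record(ascii_record, LAYOUT)
ebcdic_fields = parse_record(ebcdic_record, LAYOUT, encoding="cp037")
```

This sketch handles only plain text fields. Packed decimal, variable-length records, and Occurs Depending On are among the complications it leaves out.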

Technology

While this is 100% Python, it’s not simply Python. The actual code is built from this document.

Warning

The Code Did Not Come First

The document doesn’t follow behind the code. The document contains the code. The code is extracted from the document. For details, see the Stingray Build section.

The TODO List

Todo

Index by name and path, also.

This will eliminate some complexity in COBOL schema handling where we create a “schema dictionary” using simple names and path names.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/schema.rst, line 346.)

Todo

EBCDIC File V format with Occurs Depending On to show the combination.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol.rst, line 489.)

Todo

Test EXTERNAL, GLOBAL as Skipped Words, too.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol_loader.rst, line 1038.)

Todo

Additional Numbers13_Workbook Feature

Translate Formula and Formula error to Text

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/numbers_13.rst, line 31.)

Todo

Refactor this, it feels clunky.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/ods.rst, line 139.)

Todo

refactor setSizeAndOffset()

Refactor setSizeAndOffset() into the Allocation class methods to remove isinstance() nonsense.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1224.)

Todo

Fix performance.

This is called once per row: it needs to be simpler and faster. Some refactoring can eliminate the if statements.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1236.)

Todo

Unit test cases for the hashable interface of Cell

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cell.rst, line 265.)

Todo

Refactor these into the schema module.

These functions are used to define schema, not process Cell objects per se.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cell.rst, line 481.)

Todo

88-level items could create boolean-valued properties.

(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_loader.rst, line 381.)

Indices and Tables