The Stingray Schema-Based File Reader
The Stingray Reader tackles four fundamental issues in processing a file:
- How are the bytes organized? What is the Physical Format?
- How are the data objects organized? What is the Logical Layout?
- What do the bytes mean? What is the Conceptual Content?
- How can we assure ourselves that our applications will work with this file?
The problem is that a schema is not always bound to a given file, nor is it clearly bound to an application program. Here are two examples of this separation between schema and content:
- We might have a spreadsheet where there aren’t even column titles.
- We might have a pure data file (for example from a legacy COBOL program) which is described by a separate schema.
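Both cases can be sketched with a schema held outside the data. The following is a minimal illustration using only the standard library, not Stingray's own API; the names FIXED_SCHEMA, SHEET_SCHEMA, parse_fixed, and parse_rows, along with the field names and sizes, are hypothetical.

```python
import csv
import io

# Hypothetical external schema for a fixed-width record: (name, offset, size).
FIXED_SCHEMA = [
    ("customer_id", 0, 5),
    ("name", 5, 10),
    ("balance", 15, 7),
]

# Hypothetical external schema for a sheet with no column titles: names by position.
SHEET_SCHEMA = ["customer_id", "name", "balance"]


def parse_fixed(record, schema=FIXED_SCHEMA):
    """Slice one fixed-width record into named fields."""
    return {name: record[offset:offset + size].strip() for name, offset, size in schema}


def parse_rows(source, schema=SHEET_SCHEMA):
    """Pair each untitled row with the column names from the external schema."""
    for row in csv.reader(source):
        yield dict(zip(schema, row))


# A 22-character record built to match the sizes in FIXED_SCHEMA.
record = "00042" + "Ada".ljust(10) + "0012550"
fixed = parse_fixed(record)

# A sheet (here, CSV text) that carries no title row at all.
rows = list(parse_rows(io.StringIO("00042,Ada,125.50\n")))
```

In both functions the data carries no names of its own; the meaning of each field comes entirely from the separate schema.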
One goal of good software is to cope reasonably well with variability of user-supplied inputs. Providing data by spreadsheet is often the most desirable choice for users. In some cases, it’s the only acceptable choice. Since spreadsheets are tweaked manually, they may not have a simple, fixed schema or logical layout.
A workbook (the container of individual sheets) can be encoded in any of a number of physical formats: XLS, CSV, XLSX, and ODS, to name a few. We would like our applications to be independent of these physical formats. We’d like to focus on the logical layout.
Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a required schema.
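As a rough illustration of what a conformance check might look like, the sketch below compares the title row of a CSV sheet against a list of required column names. It uses only the standard library and is not Stingray's schema machinery; REQUIRED_COLUMNS and check_header are hypothetical names, and the case-insensitive matching is an assumption.

```python
import csv
import io

# Hypothetical required schema: the column titles the application depends on.
REQUIRED_COLUMNS = ("customer_id", "name", "balance")


def check_header(source, required=REQUIRED_COLUMNS):
    """Return a list of problems in the title row; an empty list means it conforms."""
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    # Assumption: titles match regardless of case and surrounding spaces.
    found = [title.strip().lower() for title in header]
    return [f"missing column: {name}" for name in required if name not in found]


problems = check_header(io.StringIO("Customer_ID,Name\n1,Ada\n"))
empty = check_header(io.StringIO(""))
```

Running a check like this before processing lets an application reject a nonconforming file with a clear message instead of failing partway through.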
A COBOL file parallels a workbook sheet in several ways. It also introduces some unique complications. We’d like to provide a suite of tools that work well with common spreadsheets as well as COBOL files, allowing some uniformity in processing various kinds of data.
Technology
While this is 100% Python, it’s not simply Python. The actual code is built from this document.
Warning
The Code Did Not Come First
The document doesn’t follow behind the code. The document contains the code. The code is extracted from the document. For details, see the Stingray Build section.
Contents
- 1. Introduction
- 2. Design Considerations
- 3. The stingray Package
- 4. Cell Module – Data Element Containers and Conversions
- 5. Sheet Module – Sheet and Row Access
- 6. Schema Package – Schema and Attribute Definitions
- 7. Schema Loader Module – Load Embedded or External Schema
- 8. Workbook Package – Uniform Wrappers for Workbooks
- 9. The “Other” Modules: snappy and protobuf
- 10. The COBOL Package
- 11. The Stingray Developer’s Guide
- 12. Stingray Demo Applications
- 13. History
- 14. Testing
- 15. Stingray Build
- 16. Installation via setup.py
- 17. Licensing
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The TODO List
Todo
Index by name and path, also.
This will eliminate some complexity in COBOL schema handling where we create a “schema dictionary” using simple names and path names.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/schema.rst, line 346.)
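One possible shape for such an index is a single dictionary keyed by both the simple name and the full path. This is only a sketch of the idea in the entry above, not the existing schema code; the Attribute class, build_index function, dotted-path notation, and the first-wins rule for repeated simple names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a schema attribute."""
    name: str
    path: str


def build_index(attributes):
    """Map every path, and every simple name, to an attribute."""
    index = {}
    for attr in attributes:
        index[attr.path] = attr
        # Assumption: the first attribute wins when a simple name repeats.
        index.setdefault(attr.name, attr)
    return index


attrs = [
    Attribute("CITY", "CUSTOMER.ADDRESS.CITY"),
    Attribute("CITY", "VENDOR.ADDRESS.CITY"),
]
index = build_index(attrs)
```

The path remains the unambiguous key; the simple name is a convenience that only resolves cleanly when it is unique.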
Todo
EBCDIC File V format with Occurs Depending On to show the combination.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol.rst, line 489.)
Todo
Test EXTERNAL, GLOBAL as Skipped Words, too.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/testing/cobol_loader.rst, line 1038.)
Todo
Additional Numbers13_Workbook Feature
Translate Formula and Formula error to Text
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/numbers_13.rst, line 31.)
Todo
Refactor this, it feels clunky.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/workbook/ods.rst, line 139.)
Todo
Refactor setSizeAndOffset() into the Allocation class methods to remove isinstance() nonsense.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1224.)
Todo
Fix performance.
This is called once per row: it needs to be simpler and faster. Some refactoring can eliminate the if statements.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_defs.rst, line 1236.)
Todo
Unit test cases for the hashable interface of Cell
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cell.rst, line 265.)
Todo
Refactor these into the schema module.
These functions are used to define schema, not process Cell objects per se.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cell.rst, line 481.)
Todo
88-level items could create boolean-valued properties.
(The original entry is located in /Users/slott/Documents/Projects/Stingray-4.4/source/cobol_loader.rst, line 381.)
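To make the idea concrete: a COBOL 88-level item names a condition that is true when its parent field holds a given value, which maps naturally onto a read-only boolean property. The sketch below is one way this could look, not the loader's design; add_conditions, Record, and the condition names are hypothetical.

```python
def add_conditions(cls, field, conditions):
    """Attach one boolean property per condition name to cls.

    conditions maps a property name to the field value that makes it true.
    """
    for prop_name, value in conditions.items():
        # Bind value as a default argument so each getter keeps its own value.
        def getter(self, _value=value):
            return getattr(self, field) == _value
        setattr(cls, prop_name, property(getter))
    return cls


class Record:
    """Hypothetical record with a single status field."""
    def __init__(self, status):
        self.status = status


# Mirrors something like: 88 IS-ACTIVE VALUE 'A'.  88 IS-CLOSED VALUE 'C'.
add_conditions(Record, "status", {"is_active": "A", "is_closed": "C"})

record = Record("A")
```

Here record.is_active and record.is_closed are computed from status on each access, so they stay consistent if the field changes.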