Copyright 2006 Robert J. Glushko
Assembling a Document Model
Traversing the Component Model
Assembly using Core + Contexts
We have identified primitive and aggregate components
We've used heuristic or formal means (or both)
The methods we used and the results reflect the mixture of transactional and non-transactional documents in our context, in terms of:
Number of components
Size of components
Precision of rules for datatypes, associations, cardinality
A relational model simultaneously describes all of the associations among the components; put another way, it doesn't highlight any particular association
But when we exchange information, we do so to satisfy the requirements in some context
If there are multiple ways to interpret the content we will not achieve interoperability
Hierarchies (tree structures) provide unambiguous structures
So when we create a hierarchy from a relational model, we impose a contextual interpretation on it
Document model assembly is the process of creating a model of a document type – hierarchical and nested – by drawing on the "pool" or library of content and structural components
Assembly involves designing (or selecting a pattern for) the top level structure as an entry point and then navigating through the relationships in the conceptual model collecting the components in the order that best satisfies your requirements
Assembly order can differ whenever there is a bi-directional relationship between components – whenever two components are functionally independent, an assembly order chooses one of the relationships to enforce an interpretation on the assembled document
The direction of following the relationship determines which of the structural roles is being used
We end up with a specific context-sensitive view of the model
This is the logical basis of the document schema – all we have left to do is to encode it as an XML schema
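To make the traversal concrete, here is a minimal Python sketch. It assumes a toy component model stored as a dictionary of associations; the component names (Course, Lecture, Instructor, Topic) and the functions `assemble` and `to_xsd` are hypothetical, chosen for illustration. Choosing a root and following each bi-directional association in only one direction turns the relational model into a single nested hierarchy, which is then encoded as a bare-bones XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical component model. Each association is listed in both
# directions because the relational model does not privilege either one.
ASSOCIATIONS: Dict[str, List[str]] = {
    "Course": ["Lecture", "Instructor"],
    "Lecture": ["Course", "Topic"],
    "Instructor": ["Course"],
    "Topic": ["Lecture"],
}

def assemble(root: str, model: Dict[str, List[str]],
             visited: Optional[Set[str]] = None) -> dict:
    """Depth-first traversal from `root`. Each newly reached component is
    nested inside the component it was reached from; components already
    placed are skipped, so every bi-directional association is followed in
    one direction only."""
    if visited is None:
        visited = set()
    visited.add(root)
    children: dict = {}
    for neighbor in model[root]:
        if neighbor not in visited:
            children[neighbor] = assemble(neighbor, model, visited)
    return children

def to_xsd(root: str, children: dict) -> str:
    """Encode an assembled hierarchy as a minimal XSD: container elements get
    an xs:sequence of their children, leaf elements are typed xs:string."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")

    def declare(parent: ET.Element, name: str, kids: dict) -> None:
        element = ET.SubElement(parent, f"{{{XS}}}element", name=name)
        if not kids:
            element.set("type", "xs:string")
            return
        complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for kid_name, grandkids in kids.items():
            declare(sequence, kid_name, grandkids)

    declare(schema, root, children)
    return ET.tostring(schema, encoding="unicode")
```

With this model, `assemble("Course", ASSOCIATIONS)` yields `{"Lecture": {"Topic": {}}, "Instructor": {}}`, while `assemble("Lecture", ASSOCIATIONS)` yields `{"Course": {"Instructor": {}}, "Topic": {}}`: the same associations, two different context-sensitive hierarchies. A real schema would also need cardinalities and datatypes, which this sketch leaves out.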
The basic problem of assembly is the same for all types of documents but the solution is different at different points on the spectrum
Non-transactional / narrative / publication type documents usually have fewer content-based rules, but their assembly is often shaped by structural or presentation rules
Since transactional documents and data-intensive contexts tend to have more rules, their component models are more complex and there are more alternative document assembly models
These alternative assembly models may differ in which information from the instance they present (they may be queries or views of the instance rather than a one-to-one rendering) and in the order or structure in which they present it.
If the sequence is important, it should be a component in the model and assembled in our logical documents (e.g. SequenceNumber in our Lecture Notes example)
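A small sketch of why an explicit sequence component matters, assuming a hypothetical `LectureNote` record whose `sequence_number` field plays the role of SequenceNumber. If a view or query returns the notes in some other order (here, alphabetically), the intended order can still be recovered, because it is stored as content rather than implied by position.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LectureNote:
    sequence_number: int  # hypothetical stand-in for SequenceNumber; order is content, not position
    title: str

def in_lecture_order(notes: List[LectureNote]) -> List[str]:
    """Recover the intended order from the sequence component, whatever
    order a particular view or query delivered the notes in."""
    return [n.title for n in sorted(notes, key=lambda n: n.sequence_number)]

# A hypothetical view that presents the notes alphabetically by title.
by_title = sorted(
    [LectureNote(2, "Component Analysis"), LectureNote(1, "Document Analysis"),
     LectureNote(3, "Assembly")],
    key=lambda n: n.title,
)
```

Here `by_title` lists "Assembly" first, but `in_lecture_order(by_title)` returns `["Document Analysis", "Component Analysis", "Assembly"]`.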
The rules represented in the component model must be followed during any document model assembly:
Mandatory associations must be followed
Mandatory components must be included
Optional associations are followed if they meet the requirements for the context.
Optional components are included if they meet the requirements for the context.
Even if one role is the usual or canonical interpretation, it may not be a requirement for the context of this specific document assembly
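One way these rules might drive assembly is sketched below, assuming a hypothetical Person component whose association to Name is mandatory while its associations to Address and Photo are optional. As a simplifying assumption, a context is modeled as nothing more than the set of optional components it requires.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass(frozen=True)
class Association:
    target: str
    mandatory: bool

# Hypothetical component model (acyclic, so plain recursion terminates).
MODEL: Dict[str, List[Association]] = {
    "Person": [
        Association("Name", mandatory=True),
        Association("Address", mandatory=False),
        Association("Photo", mandatory=False),
    ],
    "Name": [],
    "Address": [],
    "Photo": [],
}

def assemble(component: str, context_needs: Set[str]) -> dict:
    """Follow every mandatory association; follow an optional association
    only when the context requires its target component."""
    tree: dict = {}
    for assoc in MODEL[component]:
        if assoc.mandatory or assoc.target in context_needs:
            tree[assoc.target] = assemble(assoc.target, context_needs)
    return tree
```

A context with no extra needs gets `{"Name": {}}`; a mailing-list context that needs `{"Address"}` gets `{"Name": {}, "Address": {}}`. Photo is never included unless some context asks for it.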
The structural depth of the document model is determined by how many associations in the component model are followed
The order in which associations are followed determines the nesting or container structure in the model
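A sketch of both points, using two hypothetical assemblies that follow the same three associations (Order-Party, Order-LineItem, LineItem-Product) in different orders. The helper names `depth` and `container_paths` are illustrative only; hierarchies are represented as nested dicts.

```python
from typing import List

def depth(tree: dict) -> int:
    """Number of nesting levels in a nested dict; a lone root counts as 1."""
    return 0 if not tree else 1 + max(depth(child) for child in tree.values())

def container_paths(tree: dict, prefix: str = "") -> List[str]:
    """Full container path of every component, e.g. 'Order/LineItem'."""
    paths: List[str] = []
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        paths.append(path)
        paths.extend(container_paths(children, path))
    return paths

# Same associations, different traversal order, different nesting.
order_view = {"Order": {"Party": {}, "LineItem": {"Product": {}}}}
party_view = {"Party": {"Order": {"LineItem": {"Product": {}}}}}
```

`depth(order_view)` is 3 while `depth(party_view)` is 4, and Product's container path changes from `Order/LineItem/Product` to `Party/Order/LineItem/Product`.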
Requirements for structural or presentation integrity may be more important than content constraints
There are conventional assembly patterns for many types of documents (perhaps these can be viewed as default requirements)
(Maler and el Andaloussi call this the "shape" of the document type)
Some document types seem naturally "flat" – just two-level "list of things" documents
Sometimes documents can be arbitrarily deep, with chapter, section, subsection, etc. divisions, but from a component type perspective this is a simple recursive structure with few or no content distinctions
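The recursive structure can be sketched with a single hypothetical `Division` type: chapters, sections, and subsections are all the same component, and the level is implied only by nesting. The `outline` helper is illustrative, not part of any standard.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Division:
    """A single component type for chapters, sections, subsections, etc.;
    the level comes from nesting, not from a distinct type."""
    title: str
    subdivisions: List["Division"] = field(default_factory=list)

def outline(div: Division, level: int = 1) -> List[Tuple[int, str]]:
    """Flatten a division tree into (level, title) pairs in document order."""
    entries = [(level, div.title)]
    for sub in div.subdivisions:
        entries.extend(outline(sub, level + 1))
    return entries

chapter = Division("Assembling a Document Model", [
    Division("Traversing the Component Model", [Division("Assembly Order")]),
    Division("Assembly using Core + Contexts"),
])
```

In a DTD the same idea is a self-referencing declaration such as `<!ELEMENT Division (Title, Division*)>`, which permits any depth without introducing new element types.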
Structural integrity – requirement to preserve some aspects of structure:
Identical page boundaries for the electronic and printed versions of documents
Chronological order for a narrative biography or history
"Putting it together" instructions (don't want to say "assembly" here) for a bicycle or piece of furniture need to follow the order in which they are most easily or safely put together.
Presentation fidelity – preserve aspects of original presentation:
An extreme requirement, but in some circumstances the law mandates reproducing a document artifact exactly as it appeared in its original printed format. For example, with International Letters of Credit and Bills of Lading, you can readily imagine a bank or customs inspector carefully comparing computer-generated and original printed documents.
This is a requirement to assemble the document model in "document order" – that is, to organize the elements so that their valid order matches the order in which they must appear in a document instance
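A minimal check for document order is sketched below. The field list for a bill of lading is hypothetical, standing in for whatever layout the printed original actually has, and `in_document_order` is an illustrative name. Elements may be omitted or repeated, but never reordered.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical field order, standing in for the layout of a printed form.
REQUIRED_ORDER: List[str] = ["Shipper", "Consignee", "NotifyParty", "Goods"]

def in_document_order(xml_text: str, required_order: List[str]) -> bool:
    """True if the root element's children follow `required_order`.
    Elements may be omitted or repeated but never reordered;
    elements not in `required_order` are rejected."""
    positions = {name: i for i, name in enumerate(required_order)}
    last = -1
    for child in ET.fromstring(xml_text):
        if child.tag not in positions or positions[child.tag] < last:
            return False
        last = positions[child.tag]
    return True
```

In an XML schema the same constraint would normally come from declaring the elements inside an `xs:sequence`; the function simply makes the rule explicit.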
In many domains, because of the rich network of associations, you can assemble a large number of different document models from the same component model
Determining how many document types to assemble is another design problem in its own right
During a document analysis phase, you will create an inventory of existing document types, but this isn't necessarily the set of logical document types you'll end up with after design.
There may be several types of documents that you want to treat as equivalent by assembling a single more general document model, or you could assemble several separate models
This decision is embodied in the implementation model (the DTD or schema) and has consequences for:
The software tools you can use for creating and manipulating documents
The authoring or document creation process
The amount of training likely to be required
The flexibility of your system or applications
The amount of validation that is possible
The amount of integration or transformation required on a one-time or recurring basis
The "component model traversal" is a rigorous approach for assembling document models
It is especially appropriate when you've been able to fully or mostly normalize the component model (because there are many constraints or rules about the components and their relationships)
But the normalized model of a complex domain may be very granular with many small groups of components
So we've developed a complementary conception of document model assembly that can improve the manageability and reuse of the model components – core and contexts
The basic idea is that we're identifying components and assembling them from "the document down" rather than from "the components up"
Even a little bit of analysis suggests that there are some components that are useful in lots of different situations or applications or documents ["person," "address," "line item," "event," ...]
As a result, many people and organizations have created models for these standard components or documents
But often each of these components takes on some additional information or structure in each of the different customization contexts
So any single set of component types that satisfies all of these contexts will contain many components that aren't needed in most of them
It seems like a good idea to create the smallest possible core components and leave room for them to be customized or contextualized by additional components
But the set of contexts that emerge is strongly shaped by the document types you are expecting to assemble, and some people object in principle to modeling shaped by implementation considerations
Furthermore, the criteria or heuristics used to decide what "goes together" are informal and don't yield consistent results
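As a rough illustration of the core + contexts idea, the sketch below uses Python dataclass inheritance: a deliberately small `CoreAddress` holds only the fields assumed to be shared, and two hypothetical contexts (shipping and tax) each add their own component. The field names and contexts are assumptions for illustration, not a published core component library, and subclassing is only one way to express contextualization.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

@dataclass
class CoreAddress:
    """Smallest set of fields assumed to be shared by every context."""
    street: str
    city: str
    postal_code: str
    country: str

@dataclass
class ShippingAddress(CoreAddress):
    """Hypothetical shipping context: adds delivery instructions."""
    delivery_instructions: str = ""

@dataclass
class TaxAddress(CoreAddress):
    """Hypothetical tax context: adds a jurisdiction code."""
    tax_jurisdiction: str = ""

CORE_FIELDS: List[str] = [f.name for f in fields(CoreAddress)]

def core_view(address: CoreAddress) -> Dict[str, str]:
    """Project any contextualized address back onto the shared core."""
    return {name: getattr(address, name) for name in CORE_FIELDS}
```

Because every context extends the same core, `core_view` gives identical results for a shipping address and a tax address that share a street, city, postal code, and country, while each context keeps its extra field out of the other's model.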
But it isn't a question of "either/or" between the traversal and the core + contexts approaches. Think of them as influences, philosophies, or approaches to document assembly that you need to balance. Thinking about modeling and document assembly in different ways can give you a deeper understanding of why and how you arrived at your model
Second report from projects
No new reading assigned