15. Document Inventory
DE + IA (INFO 243) - 7 March 2007
Bob Glushko
Plan for Today's Class
- Requirements Categories in Document-Intensive Contexts
- Recognizing Documents
- Collecting the Inventory
- Analyzing the Inventory
- Strategic and Tactical Document Analysis
Who Performs Document Analysis?
- Standard approach is facilitation by document analysis experts in face-to-face "workshops" with broad participation
- Document creators/users reach consensus with expert help, and then experts systematize it into models and schemas
- Document analysis is often carried out as a consulting engagement – with all the complications of defining the project, managing expectations and relationships, and packaging the results for effective use
Creators/Users in Document Analysis
- What will they know?
- What won't they know?
- What factors will constrain their interactions with you?
Experts / Consultants in Document Analysis
- This is YOUR role
- What will you know?
- What won't you know?
- What factors will constrain your interactions with others?
Requirements Categories in Document-Intensive Contexts
- Solution requirements – the functional, performance, quality attributes
- Information or data requirements – what information is needed, what are its datatypes, possible values – the document component model
- Document or structure requirements – how is the information organized / assembled / packaged into sets of related information – the document assembly model
- Processing and usage requirements – what relationships between documents have a business purpose, what constraints on access or presentation (or on the relationship between logical and presentation models) are mandated by business relationships – process / choreography / orchestration model or adjuncts to document models
- Presentation or syntactic requirements – how is the information presented or formatted or rendered – the physical or output model
YART from Chapter 8 of Document Engineering
- Rules that Apply to Conceptual Models
- Rules that (Can Also) Apply to Physical Models
- Syntactic
- Processing
- Presentational
- Rules that Apply to Instances or Implementations
Requirements in the Model Matrix
Context Dimensions x Rule Types
What a "Document" Is [1]
- Every major advance in transportation, communications, manufacturing, financial technology or "governance" has required new types of documents
- But the basic idea of a document has been surprisingly stable for a couple of millennia
- A document is a self-contained package of related information
- Documents organize business interactions around the information needed to carry out transactions
- Documents are the inputs and outputs of business processes
What a "Document" Is [2]
- In most Document Engineering efforts a critical step is creating a document inventory and classifying the "documents" you locate
- You need to take a very broad view about what's a document because much of what's important to analyze isn't a traditional document
- Much of what we analyze comes from people or systems or machines, and the lines between "requirements analysis," "document analysis," and "user-centered design" aren't always sharp
- You can think of what you learn from people as instances of "interview" or "observation" document types
Recognizing Documents [1]
- Documents are packages used for exchanging information.
- Packages may be:
- Paper form (printed/written, formal/informal)
- Digital form (computer files, structured/unstructured, databases)
- Exchanges may be:
- Messages (emails, EDI)
- Online or Web
- Postal, Fax
- Do you learn the same things from a printed/rendered document and its digital source?
Recognizing Documents [2]
- Sets of data in databases, spreadsheets, accounting systems
- Completed printed forms
- Job aids, "cheat sheets," sticky notes and other informal or unofficial documents
- Lots of undocumented information in people's heads that you write down after talking to them
Document Types
- Do you learn the same things from document instances and document types?
- Blank printed forms
- Web forms
- Database schemas
- Documents that describe APIs or maybe even the code that implements them
- Style sheets or templates in office applications
Finding the Right Documents for the Inventory
- Not all types of documents are equally important; is a document intrinsic to a business process or a derivative/aggregate of it?
- If there are many instances of a particular type, we might have to be concerned about representativeness and selection biases
- Don't assume that job titles and formal organizational structure reflect what people actually do
- Don't assume that the names given to documents fit the people, tasks, and organizations in which we locate them
- Regardless of its title, make sure a document is being used before you conclude it is important
Names for Document Types and Instances
- Sometimes there are rules for names of document types
- Sometimes there are rules for names of document instances
- Sometimes the names of document types or instances aren't informative
- Names are just one kind of metadata attached to document instances; there is lots more
Iteration in Document Inventory
- Identifying all the potentially relevant documents or information sources is inherently an iterative task
- Documents may refer or link to other documents
- Documents may refer to people, who can refer to other documents or people
- Developing a causal model of the domain can help identify the intrinsic documents
- Where are the "headwaters" for the information -- what events or processes cause it to be created?
- A causal analysis can suggest other correlated information "streams" that merge with the primary source you've identified
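The iterative "follow the references" idea above can be sketched as a breadth-first walk over a graph of documents and people. This is a minimal illustration, not a prescribed method: the function name `discover_sources` and all of the example data are hypothetical, and in practice the links would be recorded by hand as you read documents and interview people.

```python
from collections import deque


def discover_sources(seeds, references):
    """Breadth-first walk over 'refers to' links among documents and people.

    seeds      -- the sources you start with (hypothetical names)
    references -- dict mapping a source to the sources it refers to
    """
    found = list(seeds)   # keeps the order in which sources were discovered
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for target in references.get(current, []):
            if target not in seen:   # skip sources already in the inventory
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


# Hypothetical example data: documents and people that refer to one another.
references = {
    "purchase order": ["price list", "buyer (person)"],
    "buyer (person)": ["approval memo", "supplier contact (person)"],
    "price list": ["catalog"],
    "approval memo": ["purchase order"],   # cycles are expected
}

inventory = discover_sources(["purchase order"], references)
print(inventory)
```

Each pass through the loop corresponds to one round of inventory work: look at a source, note what it points to, and queue anything new for a later look. The `seen` set is what keeps circular references from making the task endless.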
Using Process Patterns to Find Documents: The Document Checklist
Analyzing the Document Inventory
- You need to arrange the results of your inventory so you can think about it as a whole and in parts
- What aspects of documents vary systematically across the categories in the inventory?
- What other aspects of documents vary, but not systematically across the categories?
- We need some concepts and vocabulary for answering these questions
Categories of Document Types
- There are a few hundred common types of documents used in business transactions
- But transactions are just one category of document types
- Other categories with many distinct types include:
- Software and system documentation
- Procedures, policies, laws, and regulations
- Reference books, encyclopedias, dictionaries
- Catalogs
- Organizations often use or produce multiple document types within the same category
Document "Collections" or "Chains" or "Clusters" or "Complements"
- Some sets of document types in an inventory are related to each other
- Some document types are themselves sets of documents of another type
- Other document types fit together in a kind of sequential or process relationship where information flows from one to another in the normal way in which they are used or created
- Transactional documents often come in pairs that must be correlated
- Documents can have complementary (they are useful together) or uncomplementary (they are not useful together) relationships, and the relationships aren't necessarily symmetric (depends on the perspective of the primary document for the user's activity or process)
The Document Type Spectrum
Systematic Variation in Document Types Across the Spectrum
- Instances more heterogeneous on narrative end
- Types are "broader" and more descriptive, less prescriptive on narrative end
- The set of content types within a document type is much greater on the transactional end because the leaves aren't "just text"
- More need for "metadata" augmentation of documents on narrative end, because on transactional end what would be metadata is more likely to be explicitly contained in the content already
- Presentational information more likely to be correlated with content and structure on narrative end
Organizing the Inventory
- For every document or information source you should collect:
- Name
- Source (where/who found)
- Definition
- ?
- ?
- Any metadata that helps you decide whether to analyze it
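One way to keep these items together is a small record per inventory entry. The sketch below includes only the fields named in the list above and leaves the open items open; the class name `InventoryEntry`, the metadata keys, and the screening rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    name: str
    source: str        # where it was found, or who provided it
    definition: str
    # Free-form metadata that helps decide whether to analyze the entry.
    metadata: dict = field(default_factory=dict)


def worth_analyzing(entry):
    """Hypothetical screening rule: only analyze sources that are in use."""
    return bool(entry.metadata.get("in_active_use", False))


# Hypothetical example entry.
entry = InventoryEntry(
    name="Purchase order",
    source="Procurement shared drive",
    definition="Buyer's request to a supplier for goods",
    metadata={"in_active_use": True, "instances_seen": 40},
)
print(worth_analyzing(entry))
```

Keeping metadata as an open dictionary rather than fixed fields reflects that you usually do not know in advance which attributes will matter for a given inventory.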
Sampling the Inventory
- Sample from all parts of the document type spectrum
- Sample more from heterogeneous categories
- Sample documents based on priority of requirements
- Sample based on importance or authoritativeness
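The first two sampling guidelines can be sketched as a simple allocation: every category gets at least one sample, and the remainder is split in proportion to how heterogeneous each category is. The heterogeneity scores, category names, and the function `allocate_samples` are hypothetical; priority and authoritativeness are not modeled here.

```python
def allocate_samples(heterogeneity, total):
    """Split a sampling budget across categories.

    heterogeneity -- dict of category -> judged heterogeneity score (assumed)
    total         -- total number of documents you can afford to analyze
    """
    categories = list(heterogeneity)
    if total < len(categories):
        raise ValueError("budget too small to cover every category")
    allocation = {c: 1 for c in categories}   # cover the whole spectrum first
    remaining = total - len(categories)
    weight_sum = sum(heterogeneity.values())
    for c in categories:
        # Rounding down may leave a few samples unallocated.
        allocation[c] += int(remaining * heterogeneity[c] / weight_sum)
    return allocation


# Hypothetical scores: narrative types judged most heterogeneous.
scores = {"transactional": 1, "catalogs": 2, "narrative": 4}
allocation = allocate_samples(scores, total=10)
print(allocation)
```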
Organizational Issues in Document Analysis
- Org charts can suggest business processes (and their associated documents), people who can tell us about them, and the context boundaries we can enforce
- The level at which you interact with an organization - the kinds of people you interact with - strongly shapes what you learn about it
- The concreteness of document analysis makes it more "bottom up" than business process analysis
Strategic Document Analysis
- Document Analysis IN a Strategic Effort:
- HP merges with Compaq and assesses how each side does business to decide what practices/orgs/people should be retained
- One of the last phases of efforts like these is Document Analysis to ensure that the "keeper" processes of the merging firms are effectively combined
- Document Analysis AS a Strategic Effort:
- Analyze the information creation, management, processing, and distribution activities of an enterprise or organization to support the development of a data and process dictionary, an information architecture, or an enterprise data model
- Often the foundation activity for introducing a "content management" or "knowledge management" system
Information as a Strategic Asset
- Identify "overlaps, gaps, and opportunities" in alignment of information assets with goals of the enterprise
- Eliminate redundancy, and identify information that must be collected but isn't, as well as information that might be collected
- Increase reuse
- Increase consistency
- Enable flexible creation of customized/personalized information products
- There will be lots of documents and data sets to analyze, but this kind of effort will be much less focused on these existing information artifacts than a tactical document analysis project is
Tactical Document Analysis
- Analyze the existing information used by some constrained set of processes in an enterprise so that the processes can be improved, automated, re-engineered, re-purposed
- Two most common tactical efforts:
- Document automation
- Online publishing
Document Automation
- Transforming printed transactional documents or forms into electronic versions
- The business driver is often a "request" by a dominant business to its partner to automate the exchange of transactional information in conformance with its proprietary document specifications
- This means that the real goal can be to take an existing process (often, someone else's) and encode it in electronic documents
Online Publishing
- Creating an electronic version of a printed document
- CD-based documents (late 1980s-mid 1990s)
- Web has been the dominant medium since then
- Limited number of "e-books" on various devices
- A "single-source" publishing strategy can be more strategic if it takes a comprehensive and end-to-end perspective on redesigning print-only publishing processes to take advantage of new formats and flexibility
"Myth of the Paperless Office" -- DanTech
- What were DanTech's motives for going paperless?
- What are the costs and benefits of co-locating people and the documents they create and use?
- How might a paperless office change the cost/benefit tradeoffs?
- What DanTech document types most easily became paperless? What document types resisted becoming paperless? What principles determine this?
- What happens to legacy information in the transition to a paperless office?
- How did DanTech hope to handle naming and classification of new documents? Why did the initial approach fail? Did the revised approach work any better?
- What were the benefits to DanTech of going (almost) paperless?
"Myth of the Paperless Office" -- UKCom
- What were UKCom's motives for going paperless?
- What UKCom document types most easily became paperless? What document types resisted becoming paperless?
- How did improved information access by Account Managers improve their processes? Why did they not reciprocate by sharing information with Bids and Sales?
- What's the key lesson to learn from UKCom's experience?
Readings for 12 March
- Document Engineering Chapter 9 ("Analyzing Business Processes")
- "BPM Process Patterns: Repeatable Designs for BPM Process Models" (January 2006)