Author(s):
Bob Glushko
glushko@ischool.berkeley.edu
Course: Document Engineering and Information Architecture (INFO 243)
Date: 16 April 2008
Title: Assignment 6: Document Analysis
This assignment gives you practice at document analysis and in relating document analysis to business process analysis.
This assignment is due on Tuesday April 29 (no presentation... just turn in your work). This is the second progress report for your course project. There will be one more short assignment due a week later and then your final presentation will be a week after that.
The Context
The context in which you are doing the analysis is your course project. By now you've identified the most important actors (human or computational) and the most important transactions or information exchanges in which they engage. You might have identified RosettaNet PIPs or other process patterns that suggest the key information components that will interconnect the transactions. In this assignment you will focus on specifying those information components and the documents that can be composed from them.
Analyzing Information Sources and Harvesting Components
ACTIVITY 1:
Exactly which information sources you will analyze depends on your project (and some of you have already started this analysis), but you should find at least three types of documents or information sources and analyze them in detail. Remember to have an inclusive notion of "document" type -- you might use an interview with a person to identify information components, and this counts as a type of information source.
For some of your document types, there is just one instance - the particular printed form that is currently in use. So there is only one thing to analyze. But for most of your document types, you should be able to find multiple instances (e.g., job descriptions, job applicant descriptions, graduation requirements, hotel descriptions, map APIs, etc.) and analyzing more than one of them will give you a more complete and robust model. In the ideal project you would follow the "law of diminishing returns" and continue analyzing instances of each type until you are no longer identifying new candidate components, but for this assignment you should try to analyze at least three instances of each document type (consider that in the event calendar project, the team analyzed 23 different calendars).
For each of the three types of documents or information sources, create a separate "Harvest Table" for the content components in each instance you analyze. You can use a spreadsheet, HTML table, or any other mechanism you choose to organize the information. You should have as many columns of metadata as you need to understand the semantics of your components (use the checklists in the 4/14 and 4/16 lectures and in the Bloodworth paper for guidance), but at a minimum you should record:
I usually recommend that harvesting be done at the most granular level to identify the "primitive" content components. It is hard to avoid noticing that these contents are often found in aggregate or composite structures (like Address, which is a typical aggregate of Number, Street, City, etc). You might make a note about this in your table, but don't spend a lot of time with this because normally you don't deal with aggregates until after the Consolidation activity.
And to minimize the amount of "busywork," you are hereby instructed that you DON'T need to fill out each row completely for any "horizontal" component that isn't specific to the domain of this assignment (i.e., focus on the domain-specific semantics and don't worry about full documentation for "City," "PostalCode," etc.) unless it is absolutely essential for your processes to work effectively.
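If you would rather keep your Harvest Table as structured data than as a hand-edited spreadsheet, the following is a minimal sketch of one way to do it. The column names (source, name, definition, example, note) and the sample rows are hypothetical illustrations, not the required minimum set of metadata; substitute whatever columns your project needs.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class HarvestRow:
    # All column names are hypothetical; adapt them to your own metadata.
    source: str       # which document instance the component came from
    name: str         # label exactly as it appears in the source
    definition: str   # what the component means in that source
    example: str      # a sample value
    note: str = ""    # e.g. a remark that it sits inside an aggregate


def to_csv(rows):
    """Serialize harvest rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(HarvestRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Invented sample rows for an event-calendar style project.
rows = [
    HarvestRow("Calendar A", "Start", "Time the event begins", "7:00 PM"),
    HarvestRow("Calendar A", "Where", "Venue of the event", "South Hall 202",
               "part of a location aggregate"),
]
harvest_csv = to_csv(rows)
print(harvest_csv)
```

Keeping one row per primitive component, with the aggregate recorded only as a note, matches the advice above to harvest at the most granular level.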
Consolidating Components
ACTIVITY 2: Create a "Consolidated Table of Content Components" like those in the appendices to the Modeling SylViA paper or Figure 12-13 in the Document Engineering text for each of your three (or more) document types. Revise the definitions for any of the affected components to ensure that they are effective in bounding the new equivalence classes that you're creating by merging synonyms and splitting out homonyms. For each merger or split, please write a one-sentence explanation -- e.g., point out the distinction that was explicit in the harvest table that you are ignoring for synonyms or which was implicit and which you are emphasizing for homonyms. If you can, make the very coarse distinction between components that are essential or mandatory in the model and "everything else."
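As a rough illustration of what merging synonyms and splitting homonyms looks like mechanically, here is a small sketch. The sources, component names, and consolidated names are all invented for the example; the real work is the analyst's judgment recorded in the decisions mapping, and the code only groups rows accordingly.

```python
from collections import defaultdict

# Hypothetical harvested components: (source, name as harvested, definition).
harvested = [
    ("Calendar A", "Start", "time the event begins"),
    ("Calendar B", "Begins", "time the event begins"),
    ("Calendar A", "Date", "day the event takes place"),
    ("Calendar B", "Date", "day the listing was posted"),
]

# Analyst decisions keyed by (source, harvested name).
# "Start" and "Begins" are synonyms, so they merge into one component.
# "Date" is a homonym across the two sources, so it splits into two.
decisions = {
    ("Calendar A", "Start"): "EventStartTime",
    ("Calendar B", "Begins"): "EventStartTime",
    ("Calendar A", "Date"): "EventDate",
    ("Calendar B", "Date"): "ListingPostedDate",
}


def consolidate(harvested, decisions):
    """Group harvested components under their consolidated names."""
    table = defaultdict(list)
    for source, name, _definition in harvested:
        table[decisions[(source, name)]].append((source, name))
    return dict(table)


consolidated = consolidate(harvested, decisions)
for component, origins in consolidated.items():
    print(component, "<-", origins)
```

Each entry with more than one origin is a merger, and each harvested name that appears under more than one consolidated component is a split; both need the one-sentence explanation requested above.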
Standards and Code Sets
ACTIVITY 3: Identify and analyze any existing standards, domain-specific languages, code sets, or other specifications that can suggest business rules or controls on possible values for information components. Explain what you found and your decision about whether or not to use this information in your consolidated model.
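To make the idea of a code set controlling possible values concrete, here is a minimal sketch. It assumes, purely for illustration, that one of your components is a country and that you chose ISO 3166-1 alpha-2 codes to constrain it; only three codes are listed, and the function name check_code is hypothetical.

```python
# Illustrative subset of ISO 3166-1 alpha-2 country codes (assumed choice).
COUNTRY_CODES = {"US", "CA", "MX"}


def check_code(value, code_set):
    """Return True if the value, normalized to upper case, is in the code set."""
    return value.strip().upper() in code_set


# Check a few sample values against the code set.
results = {v: check_code(v, COUNTRY_CODES) for v in ["US", "ca", "USA"]}
print(results)
```

Whether to normalize case before checking, as this sketch does, is itself a business rule you would want to state in your model.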
Identifying Additional Processes or Services
ACTIVITY 4: It is likely that as a result of this document analysis activity, you will have identified some processes or services that can make use of the information components in your domain in ways you hadn't yet considered. You should give each process or service a short name and a one-sentence description.
ACTIVITY 5: Identifying new processes usually goes hand-in-hand with revising existing ones. So you'll need to reconcile the list of processes or transactions you've developed to this point. For each of the processes that you now have left, identify an existing or potential document type that would "package" the information produced by the process. This should leave you with a coherent view of how documents and processes fit together and get you ready to assemble your documents, which you'll do in the last assignment.
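One simple way to check that your reconciled list is complete is to record each process alongside the document type that packages its output and then look for gaps. The sketch below does this; the process names and document types are hypothetical placeholders.

```python
# Hypothetical reconciled processes mapped to the document type that
# packages each one's output; None marks a process still lacking one.
process_documents = {
    "SubmitEvent": "EventSubmission",
    "ApproveEvent": "EventApprovalNotice",
    "PublishCalendar": None,
}


def unpackaged(process_documents):
    """List processes that do not yet have a packaging document type."""
    return sorted(p for p, doc in process_documents.items() if not doc)


missing = unpackaged(process_documents)
print(missing)
```

An empty result means every remaining process has a document type to assemble in the last assignment.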