Copyright © 2005 Robert J. Glushko
Records and Knowledge Management (from 13 October)
The Data that Enterprises Manage
Data Management Challenges
Enterprise Information Integration
MRP and ERP
Business Intelligence
Data Warehouses
Business Dashboards
Records may be created on any physical media including:
Paper
Film (microfilm, photographic film, x-ray)
Disk (optical, magnetic, video, audio)
Tape (magnetic, video, audio)
The method of recording may be manual, mechanical, photographic, or a combination of these technologies
"Content and process management are inextricably linked via records management" (Barbero and Douglas)
When does content become a business record?
Retention requirements
Non-retention requirements
Purging requirements and purging authority
Electronic information systems
"If records...do not show the complete names of senders, addresses, and the date of transmission, users should take reasonable steps to preserve the mail envelope, distribution lists..."
Contractor records
"Unless contract provisions explicitly define the documentation to be provided to the agency, contractors are likely to create needed documentation as private property"
EXAMPLE: Until the late 1980s, when the government acquired "software," contractors would deliver only the object code and no documentation (the fix)
Oliver North used the White House email system to plan the illegal funding to the Contras in Nicaragua that had been explicitly prohibited by Congress.
To conceal his involvement North and his secretary Fawn Hall shredded all pertinent papers and deleted all relevant e-mail.
North didn't realize that e-mail was backed up, and the e-mail was used as evidence against North.
He was convicted of accepting an illegal gratuity, aiding and abetting in the obstruction of a congressional inquiry, and DESTRUCTION OF DOCUMENTS
Personal papers and files
Documentation of policy and decision making accomplished orally or electronically
Documentation of formal meetings
EXAMPLE: Cheney's "Energy Task Force" (http://www.judicialwatch.org/5309.shtml) did not have to disclose names of participants
Drafts and working files
"With the recent tumble in stock prices ... we want to reminding [sic] you of the CSFB document retention policy... "
"That means no notes, no drafts, no valuation analysis, no copies of the roadshow, no markups, no selling memos, no IBC or EVC memos, no internal memos."
"Note that if a lawsuit is instituted, our normal document retention policy is suspended and any cleaning of files is prohibited under the CSFB guidelines (since it constitutes the destruction of evidence)"
"We strongly suggest that before you leave for the holidays you should catch up on file cleaning."
The Sarbanes-Oxley Act of 2002 was enacted to curb corrupt business activities and fraudulent accounting practices like those of Enron and WorldCom.
SOX (aka Sarbox) requires firms to implement adequate internal control structures and procedures and attest to their effectiveness.
SOX requires sufficient auditing and traceability to relate the IT systems that carry out internal controls and the financial reporting process to the firm's financial statements
SOX also requires that firms disclose "material" information about their operations and financial situation in a timely and predictable manner, with "trip wires" that trigger disclosure
So SOX is causing increased spending in document and records management, security, business process management, and document engineering as companies define, document, and automate the processes that are needed to run the company while enabling auditing and timely reporting
Standardization is underway to develop an "Extensible Business Reporting Language" (xbrl.org) and standard models for the auditing document types and their interrelationships
EXAMPLE: standard timesheet instance (http://www.gl.iphix.net/)
On 11 March 2005 someone stole a laptop from an office in Sproul Hall that contained personal information about 98,369 alumni, graduate students and past applicants
"For several years University of California systemwide policy and UC Berkeley campus policy have required that restricted information stored on portable equipment be protected to safeguard the data if the equipment is lost or stolen. Since fall 2004, the UC systemwide policy has required encryption of such portable data, and campus units are in the process of moving toward full compliance with this new policy."
(The data on the laptop was not encrypted and the office was not locked)
"Our challenge is not that we lack policies governing computer security and the safeguarding of sensitive information. Our policies are clear, and during the last fifteen months we have strengthened them. Our challenge is enforcing these policies, and specifically, rectifying the lack of clear lines of accountability, both personal and departmental."
(No university employees have been terminated or sanctioned as a result of this incident).
By coincidence, in March 2005 three SIMS students were building a "Restricted Data Identification and Registry" (RDIR) application as their final project
The students presented their work to the Data Stewardship Council on April 14, 2005
Two of the students offered to work for the university during the summer of 2005 to transition and deploy the RDIR application
Much collective knowledge is embodied in a firm's people, systems, management techniques, history, and intellectual property like patents, copyrights, trademarks, brands, etc.
Very little of this appears on the firm's balance sheet
The goal of KM can be viewed as getting the tacit parts of this "intellectual capital" to be explicit
But employees have complex motivations for complying with or not complying with this corporate goal
Sharing solutions to customer problems
Facilitating collaboration
Locating people with relevant skills
Managing unstructured content
Providing greater access to existing information
Business process modeling and management
Improved efficiency to reduce costs to taxpayers
Improved services to taxpayers
Improved traceability and justification for controversial decisions
Efficiently satisfy FOIA requests
Information technologies can solve many of the problems of content and knowledge management but can also cause them
Information technology has radically transformed the nature of business so that every enterprise of significant size, regardless of industry, must treat content and knowledge management as a critical activity
Enterprise concerns are driven by internal goals like efficiency and core competency and also shaped by external factors like competition and compliance requirements
These concerns are moving up the company hierarchy; many firms have CIOs and increasing numbers have CPOs and CKOs.
In addition to the "end-to-end" processes of authoring, management, and delivery for content, many enterprises have end-to-end data processes
Some of these data processes are separable from the content processes and others are intertwined (especially in e-commerce processes)
These internal processes also extend to other enterprises, but we'll focus on the internal ones today and deal with the external ones a week from now
Consolidation and preparation of data from one or more sources for the purposes of analysis or presentation.
Create a unified view of the {customer, supply chain, etc}
Get end-to-end visibility of business processes
Take different perspectives (from high level aggregation to resolving individual data anomalies or inconsistency)
Run the business more efficiently, make better decisions by combining and analyzing data from multiple sources
Business processes span multiple departments (or companies) and multiple business applications (run by separate departments)
These "silos" or "stovepipes" may have been created over time and not have been designed to share information with each other
Each of these systems has a specific purpose and a data model customized for that purpose - so these models may be incomplete or incompatible with respect to each other
Different systems may use different formats for nominally the same data items (Nov 14, 2002, 11/14/2002 and 11-14-02; 14/11/02 in Europe)
Furthermore, there may be significant semantic differences between data items with the same name ("shipping date" example in Edwards article)
Information can't be reliably exchanged between systems to integrate business processes or support decision making unless semantics are unambiguous
Semantic integration is the process by which this common semantic "data model" or "object model" is created
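The format differences listed above (Nov 14, 2002, 11/14/2002, 11-14-02, and 14/11/02) can be sketched as follows. This is a minimal illustration, assuming each source system declares its date format up front; the system names and the `normalize_date` helper are hypothetical. It resolves format differences only, and does nothing about semantic differences such as two systems meaning different things by "shipping date".

```python
from datetime import date, datetime

# Hypothetical per-system format declarations; the system names are illustrative.
# Note: %b matches English month abbreviations under the default C locale.
SOURCE_FORMATS = {
    "order_entry": "%b %d, %Y",  # Nov 14, 2002
    "billing": "%m/%d/%Y",       # 11/14/2002
    "shipping": "%m-%d-%y",      # 11-14-02
    "eu_sales": "%d/%m/%y",      # 14/11/02 (day first)
}


def normalize_date(raw: str, source_system: str) -> date:
    """Parse a date string using the format declared for its source system."""
    return datetime.strptime(raw, SOURCE_FORMATS[source_system]).date()


samples = [
    ("Nov 14, 2002", "order_entry"),
    ("11/14/2002", "billing"),
    ("11-14-02", "shipping"),
    ("14/11/02", "eu_sales"),
]

# All four strings collapse to one canonical ISO 8601 value.
canonical = {normalize_date(raw, system).isoformat() for raw, system in samples}
print(canonical)
```

The format has to be tied to the source system because the string alone can be ambiguous: "11/12/02" is a November date under the US convention and a December date under the European one.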
What's the most powerful semantic integration processor?
An existing customer calls a service representative to increase an order
The service representative must:
locate information about the customer
locate the existing order
determine if the order can be changed or whether a new order must be created
determine whether to accept the order based on the customer's payment history and credit
What information sources or applications must the service rep consult? How can it be done?
The need to consult multiple unintegrated applications to locate information to complete a business process
Recent study by Corizon:
66% of call center agents use three applications or more to serve customers on a typical call
27% use five or more
71% claim time is wasted on or after a call because of switching between different applications
53% admit that errors creep in when entering data into multiple systems
Portal applications replace the different interfaces to multiple systems with a single, user-friendly screen that accesses only the parts of a back-end system that the employee needs
Purpose is to create a unified experience with a "single sign-on"
You can think of this as trying to recreate something like Yahoo for the enterprise (Intranet)
Nearly every major software vendor has created an enterprise portal solution that is an attempt to "up-sell" from the application server platform
Integration "by eye" is inadequate in situations with high transaction rates or complex data, and it is necessary for the applications to share data without human intervention
This requires true semantic unification of the underlying logic and content models, which may or may not be presented to the user as a single "composite application"
Making different applications share information has long been a substantial portion of the IT activities in many companies
Integration approaches are often labeled as A2A, EAI, B2B -- but these are broad labels and integration techniques are more varied than a small set of categories implies
One alternative "solution" to this integration problem is to replace the separate applications with a single enterprise-wide application that contains all the functionality needed by the company
MRP was the big business buzzword in the 1980s
MRP systems plan production, procurement, and distribution for an enterprise
For each of the products a company manufactures there is a "recipe" or "parts list" or "bill of materials" that lists the parts or components that go into it (and how many of each).
"Shopping lists" for each production cycle are then sent to suppliers
BILL OF MATERIALS X SALES FORECAST = PARTS WE NEED
INVENTORY ON HAND = PARTS WE ALREADY HAVE
PARTS WE NEED - PARTS WE ALREADY HAVE = PARTS "SHOPPING LIST"
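The MRP arithmetic above can be sketched in a few lines. This is an illustration only: the products, quantities, and the `parts_shopping_list` function name are made up, and a real MRP system would also handle lead times, lot sizes, and multi-level bills of materials.

```python
from collections import Counter


def parts_shopping_list(bill_of_materials, sales_forecast, inventory_on_hand):
    """Return the parts to order: parts needed minus parts already on hand."""
    # BILL OF MATERIALS x SALES FORECAST = PARTS WE NEED
    parts_needed = Counter()
    for product, units in sales_forecast.items():
        for part, qty_per_unit in bill_of_materials[product].items():
            parts_needed[part] += qty_per_unit * units

    # PARTS WE NEED - PARTS WE ALREADY HAVE = PARTS "SHOPPING LIST"
    # Parts fully covered by inventory are left off the list.
    return {
        part: needed - inventory_on_hand.get(part, 0)
        for part, needed in parts_needed.items()
        if needed > inventory_on_hand.get(part, 0)
    }


# Illustrative data (hypothetical products and quantities).
bom = {
    "bicycle": {"frame": 1, "wheel": 2},
    "tricycle": {"frame": 1, "wheel": 3},
}
forecast = {"bicycle": 100, "tricycle": 20}
inventory = {"frame": 50, "wheel": 300}

shopping_list = parts_shopping_list(bom, forecast, inventory)
print(shopping_list)
```

With these numbers the firm needs 120 frames and 260 wheels; 300 wheels are already in stock, so only the 70 missing frames end up on the shopping list.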
ERP was the big business buzzword in the early 1990s
ERP is a natural evolution of MRP that connects it to other key functions of an enterprise – it is the information "backbone" or "nervous system" of a big firm
ERP interconnects internal systems for manufacturing control, production planning, inventory, and procurement (scope of MRP) with accounting, finance, and personnel
ERP systems provide step-by-step guidance for the processes that aren't automated
ERP can enable substantial efficiencies and collaboration because each function can more readily see how it affects and is affected by the other
SAP dominates the worldwide ERP market, followed by Oracle; PeopleSoft used to be #3
The end-to-end integration of ERP ties purchasing decisions to the organization(s) that makes them and imposes financial controls on all the processes
ERP eliminates waste in the production and distribution of goods and reduces excess and obsolete inventories
ERP can be used as a working model or simulation of the firm
But implementing ERP is hard and many ERP implementations fail (completely or partly)
Some implementation difficulty is architectural – ERP software evolved in the 1980s from single programs running on mainframes to distributed client-server architectures, but the various modules remained very tightly coupled to the core system
But most difficulty is intrinsic to the problems ERP tries to solve – automating and enforcing the business rules of an enterprise by integrating a company's legacy computing and computer-controlled operations
This usually involves changing business processes ... how people do their jobs ... and they resist
ERP and other enterprise systems contain the very granular and "live" operational data of the enterprise
ERP systems generate historical reports that are useful for long-term decision making, but don't enable ad hoc analysis of operations needed to make tactical decisions
So you need another set of your enterprise data organized in a data model optimized for asking questions rather than running your business
Great resource: dmreview.com
A data warehouse is a "subject-oriented, integrated, time-varying, non-volatile collection of data used in organizational decision making"
Data warehouses extract data from ERP systems and other related business software applications into a separate repository
It is common practice to "stage" data prior to merging it into a data warehouse with an "Extract, Transform, and Load" (ETL) application
Since the information won't change, denormalization to improve query performance is a common ETL process
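A minimal sketch of an ETL step that denormalizes while loading, using Python's built-in `sqlite3` module with in-memory databases standing in for the operational system and the warehouse. The table names, columns, and sample rows are all hypothetical.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create a tiny normalized operational database (illustrative data)."""
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, amount REAL);
        INSERT INTO customers VALUES (1, 'Acme', ' west '), (2, 'Globex', 'East');
        INSERT INTO orders VALUES (10, 1, '2002-11-14', 250.0),
                                  (11, 1, '2002-11-15', 100.0),
                                  (12, 2, '2002-11-15', 75.0);
    """)
    return src


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract from the source, transform in Python, load into the warehouse."""
    # Extract: read the two normalized tables separately.
    orders = source.execute(
        "SELECT order_id, customer_id, order_date, amount FROM orders ORDER BY order_id"
    ).fetchall()
    customers = {
        cid: (name, region)
        for cid, name, region in source.execute(
            "SELECT customer_id, name, region FROM customers"
        )
    }

    # Transform: copy customer attributes onto every order row (denormalization)
    # and clean up inconsistent region spellings.
    staged = []
    for order_id, customer_id, order_date, amount in orders:
        name, region = customers[customer_id]
        staged.append((order_id, order_date, amount, name, region.strip().title()))

    # Load: one wide table, so analytical queries need no joins.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS order_facts (order_id INTEGER PRIMARY KEY, "
        "order_date TEXT, amount REAL, customer_name TEXT, customer_region TEXT)"
    )
    warehouse.executemany("INSERT INTO order_facts VALUES (?, ?, ?, ?, ?)", staged)
    warehouse.commit()
    return len(staged)


source = build_source()
warehouse = sqlite3.connect(":memory:")
loaded = run_etl(source, warehouse)
west_total = warehouse.execute(
    "SELECT SUM(amount) FROM order_facts WHERE customer_region = 'West'"
).fetchone()[0]
print(loaded, west_total)
```

Repeating the customer name and region on every order row would be a liability in an operational system, where those values change. In a warehouse that is loaded once and then only queried, the redundancy buys join-free queries.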
The data model for the warehouse, designed to enable efficient ad hoc data analysis and reporting, is sometimes called a "hypercube"
A common term for the analysis done in a warehouse is online analytical processing or OLAP
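The "hypercube" idea can be sketched with plain Python: each fact is a cell addressed by its dimension values, and OLAP operations aggregate or filter along those dimensions. The dimensions, the sales figures, and the `roll_up` and `slice_cube` helpers are hypothetical; real OLAP engines precompute and index these aggregates rather than scanning a list.

```python
from collections import defaultdict

# Each fact is one cell of the cube: three dimensions plus one measure.
FACTS = [
    {"region": "West", "product": "widget", "quarter": "Q1", "sales": 100},
    {"region": "West", "product": "widget", "quarter": "Q2", "sales": 150},
    {"region": "West", "product": "gadget", "quarter": "Q1", "sales": 80},
    {"region": "East", "product": "widget", "quarter": "Q1", "sales": 120},
    {"region": "East", "product": "gadget", "quarter": "Q2", "sales": 60},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure over every dimension not listed in `dimensions`."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


def slice_cube(facts, **fixed):
    """Keep only the cells whose dimensions match the fixed values."""
    return [f for f in facts if all(f[d] == v for d, v in fixed.items())]


by_region = roll_up(FACTS, ["region"])
grand_total = roll_up(FACTS, [])
q1_by_product = roll_up(slice_cube(FACTS, quarter="Q1"), ["product"])
print(by_region, grand_total, q1_by_product)
```

Passing more dimensions to `roll_up` moves in the opposite direction, from a summary toward the individual cells, which is the "ad hoc" part of the analysis: the question determines the grouping at query time.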
The traditional ETL (Extract-Transform-Load) approach relies on proprietary ETL engines being deployed between sources and targets.
Relational databases are rapidly eliminating the ETL category by incorporating transformation functionality
So ETL is becoming ELT (Extract-Load-Transform), with all the complex processing of data occurring inside the database
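A minimal sketch of the ELT ordering, with SQLite (via Python's built-in `sqlite3`) standing in for the target database. Raw rows are loaded untouched into a staging table, and the transformation is then expressed in SQL and executed by the database itself. The table names, column names, and raw formats are assumptions made for illustration.

```python
import sqlite3


def run_elt(db: sqlite3.Connection, raw_rows) -> None:
    """Load raw rows as-is, then transform them inside the database."""
    # Extract + Load: no cleanup on the way in.
    db.execute(
        "CREATE TABLE staging_orders "
        "(order_id INTEGER, order_date TEXT, amount_cents INTEGER)"
    )
    db.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: the database rewrites MM/DD/YYYY dates as ISO 8601
    # and converts cents to currency units.
    db.execute("""
        CREATE TABLE orders_clean AS
        SELECT order_id,
               substr(order_date, 7, 4) || '-' ||
               substr(order_date, 1, 2) || '-' ||
               substr(order_date, 4, 2) AS order_date,
               amount_cents / 100.0 AS amount
        FROM staging_orders
    """)
    db.commit()


db = sqlite3.connect(":memory:")
run_elt(db, [(1, "11/14/2002", 2500), (2, "11/15/2002", 10000)])
clean_rows = db.execute(
    "SELECT order_id, order_date, amount FROM orders_clean ORDER BY order_id"
).fetchall()
print(clean_rows)
```

Compared with a separate ETL engine, nothing sits between the source and the target here: the only code outside the database is the load itself.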
A virtual warehouse is created "on demand" by centralizing and normalizing metadata about the data sources rather than the data itself.
The data is left in its original location and extracted only when needed, which makes more "real time" analysis possible
A metadata repository does not have to use XML, but XML-based solutions can leverage object databases to simplify data modeling and repository development activities and use standard XPath and XQuery in processing
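The virtual warehouse idea can be sketched as a metadata lookup followed by an on-demand fetch. In this illustration the "metadata repository" is a plain dictionary rather than XML, and two in-memory dictionaries stand in for live source systems; every name here (`CRM`, `BILLING`, `METADATA`, `virtual_view`) is hypothetical.

```python
# Hypothetical in-memory stand-ins for two live source systems.
CRM = {"C1": {"cust_name": "Acme Corp", "region": "West"}}
BILLING = {"C1": {"balance_due": 1200.0}}
SOURCES = {"crm": CRM, "billing": BILLING}

# The metadata repository: it centralizes where each logical field lives
# and what it is called there, not the data itself.
METADATA = {
    "customer_name": ("crm", "cust_name"),
    "region": ("crm", "region"),
    "balance": ("billing", "balance_due"),
}


def virtual_view(customer_id, fields):
    """Assemble a unified record by reading each field from its source on demand."""
    record = {}
    for logical_name in fields:
        source_name, source_field = METADATA[logical_name]
        record[logical_name] = SOURCES[source_name][customer_id][source_field]
    return record


before = virtual_view("C1", ["customer_name", "balance"])

# A change in the source system is visible on the very next query,
# because nothing was copied into a separate repository.
BILLING["C1"]["balance_due"] = 0.0
after = virtual_view("C1", ["customer_name", "balance"])
print(before, after)
```

The trade-off is that every query now depends on the source systems being available and fast, whereas a physical warehouse pays that cost once at load time.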
The best data warehouse design and the most clever OLAP won't help the business if the analysis can't be understood by the decision makers
"Dashboards" combine information integration with information visualization to enhance the usability of business intelligence
A dashboard provides hierarchical views appropriate to different management levels and the means to "drill down" to find details
"Information Architecture Wiki: Information Architecture Defined"
"Globalization, Localization, Internationalization and Translation"
"Multilingual Web sites: Best practice, guidelines and architectures. Volume 1, through section 4.14"
"Faceted metadata for image search and browsing"