iNaturalist System Architecture Proposal

I243 Document Engineering and Information Architecture
Prof. Bob Glushko
Final Project Proposal

Nate Agrin
Jessica Kline
Ken-ichi Ueda

Contents

  1. Introduction
  2. Document Exchange Analysis
    1. Methods
  3. Process Modeling
    1. Methods
  4. Data Modeling
    1. Methods
  5. Implementation Suggestions
    1. Methods

Introduction

In I213 (User Interface Design and Development) we are developing interfaces for iNaturalist.org, an online community where people interested in nature can record their observations, share and explore their data, and meet other naturalists. Our potential users fall into three broad categories: amateur naturalists, the general public, and researchers. Each group has its own service needs and its own ways of meeting them. While in I213 we are developing interfaces for only selected components of the application, in our Document Engineering final project we hope to model iNaturalist.org in its entirety as a service-oriented application. This work will include an analysis of existing document exchanges among potential users, process modeling, data modeling, and implementation suggestions.

Document Exchange Analysis

The data these users exchange include taxonomic data, spatial data, life history data, narrative accounts of experience, and much more. Such exchanges occur on paper, in email, on newsgroups, through web sites, in exchanges among researchers, and through highly structured information exchange architectures. For example, a layperson may want to know what kind of hummingbird he keeps seeing in his garden, and discover from a field guide that the bird's purple throat and green back suggest it is an Anna's Hummingbird. A naturalist might use a pencil and notebook to record that he saw a specific Anna's Hummingbird at the Berkeley Marina on the morning of March 18th, 2007. A researcher might query a database to retrieve the exact latitude and longitude of every Anna's Hummingbird seen in Alameda County, California, from 2000 to 2007. Beyond the amateur naturalists and laypeople we know personally and have worked with on our user interface project, we could involve several research organizations in our analysis, including scientists at the College of Natural Resources, the Museum of Vertebrate Zoology, and the California Academy of Sciences.
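The researcher's question only becomes a simple query once observations are stored as structured records rather than notebook prose. As a minimal sketch, assuming such records already exist, the filter might look like the Python below. The `Sighting` type, its field names, the `coordinates_for` helper, and the sample records (including their coordinates) are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sighting:
    """Hypothetical, simplified observation record (field names are illustrative)."""
    species: str
    county: str
    state: str
    observed_on: date
    latitude: float
    longitude: float


def coordinates_for(sightings, species, county, state, start_year, end_year):
    """Return (latitude, longitude) for each matching sighting, years inclusive."""
    return [
        (s.latitude, s.longitude)
        for s in sightings
        if s.species == species
        and s.county == county
        and s.state == state
        and start_year <= s.observed_on.year <= end_year
    ]


# Invented sample records, for illustration only.
records = [
    Sighting("Calypte anna", "Alameda", "California", date(2007, 3, 18), 37.86, -122.31),
    Sighting("Calypte anna", "Alameda", "California", date(1998, 5, 2), 37.80, -122.27),
    Sighting("Calypte costae", "Alameda", "California", date(2004, 6, 9), 37.70, -122.10),
]

print(coordinates_for(records, "Calypte anna", "Alameda", "California", 2000, 2007))
# prints [(37.86, -122.31)]
```

The 1998 record falls outside the year range and the third record is a different species, so only one coordinate pair is returned. The same question asked of paper notebooks would require reading and transcribing every entry by hand.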

Methods

Method | Deliverable
------ | -----------
Interviews? |
D-O-C-U-M-E-N-T analysis for different document types | D-O-C-U-M-E-N-T table
Document component harvesting | Document component table
Find code sets | List of applicable code sets, with justifications

Process Modeling

Every exchange observed in our analysis takes place in a context of rules, constraints, and dependencies. By abstracting the processes through which our potential users currently exchange information, we can identify common patterns, important constraints, and constraints that may be unnecessary in the new context of iNaturalist, and so design a better system than those currently in use. Potentially important process patterns include organism identification, observation data logging, and data retrieval and exploration.

Methods

Method | Deliverable
------ | -----------
Process modeling | UML sequence diagrams

Data Modeling

Modeling the data in our project requires a robust vocabulary capable of describing the 'What?', 'Where?', and 'When?' aspects of a biological observation. A standardized vocabulary that we can use may already exist; otherwise, we may need to analyze our core processes and begin developing a new vocabulary. To choose a preexisting vocabulary or develop a new one, we must consider what types of data the users of our system collect and what they expect of that data. This calls for a user-centered approach to data modeling, which may include interviewing a candidate group of users about how they currently record and handle their data. The core aspects we will need to model are: what (organisms, their taxonomic identifications, and general notes and special observations); where (latitude and longitude, a general place name, or other information relevant to location); and when (the date and time at which the observation occurred).
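As a rough illustration of the What/Where/When split, the sketch below models an observation with Python dataclasses. It is not a proposed vocabulary: the class and field names are hypothetical, and the rule that a location needs either a coordinate pair or a place name is an assumption chosen to reflect the "latitude and longitude, general place name" alternatives above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class What:
    """The organism observed (hypothetical field names)."""
    scientific_name: str                 # taxonomic identification
    common_name: Optional[str] = None
    notes: str = ""                      # general notes and special observations


@dataclass
class Where:
    """Where it was seen: coordinates, a place name, or both."""
    latitude: Optional[float] = None     # decimal degrees
    longitude: Optional[float] = None    # decimal degrees
    place_name: Optional[str] = None     # general place name

    def __post_init__(self) -> None:
        # Coordinates only make sense as a pair.
        if (self.latitude is None) != (self.longitude is None):
            raise ValueError("latitude and longitude must be given together")
        has_coords = self.latitude is not None
        if has_coords and not (-90 <= self.latitude <= 90 and -180 <= self.longitude <= 180):
            raise ValueError("coordinates out of range")
        # Assumed rule: without coordinates, a place name is required.
        if not has_coords and not self.place_name:
            raise ValueError("a location needs coordinates or a place name")


@dataclass
class Observation:
    """One biological observation: what, where, and when."""
    what: What
    where: Where
    when: datetime                       # date and time the observation occurred


# The notebook entry from the exchange example; 08:00 is a placeholder for "morning".
obs = Observation(
    what=What("Calypte anna", common_name="Anna's Hummingbird"),
    where=Where(place_name="Berkeley Marina"),
    when=datetime(2007, 3, 18, 8, 0),
)
```

Keeping the three aspects as separate types makes it easier to compare each one against an existing vocabulary, since a standard may cover one aspect well and another poorly.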

Methods

Method | Deliverable
------ | -----------
Card sorting | Vocabulary list
Data component consolidation | Consolidation table (see 4/11 lecture)
?? | Big UML diagram for data types

Implementation Suggestions

To implement our process model, we should consider using common document exchange models such as the RosettaNet PIP libraries. Although our system will have little to do with financial transactions, these libraries may contain similar patterns that we can adapt to suit our needs. Defining a biological observation vocabulary on our own may be difficult; however, preexisting vocabularies such as Darwin Core (http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome) may be adequate for our purposes. The final conceptual implementation of the process and data models should remain abstract enough to guide any kind of physical implementation. We therefore suggest that throughout our work we focus on schema development with tools such as XML, rather than on a concrete implementation.
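To give a sense of what reusing Darwin Core might look like, the sketch below serializes one observation as XML elements named after Darwin Core terms (`scientificName`, `vernacularName`, `eventDate`, `locality`, `county`, `stateProvince`) in the Darwin Core terms namespace, using Python's standard `xml.etree.ElementTree`. It only produces an instance document; it does not define or validate against a schema. The `observation` wrapper element and the `observation_to_xml` helper are hypothetical and not part of Darwin Core.

```python
import xml.etree.ElementTree as ET

# Darwin Core terms namespace; "dwc" is the conventional prefix.
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("dwc", DWC)


def observation_to_xml(record: dict) -> str:
    """Serialize a flat mapping of Darwin Core term -> value as an XML fragment."""
    root = ET.Element("observation")  # hypothetical wrapper, not a Darwin Core element
    for term, value in record.items():
        # Clark notation "{namespace}localName" places the element in the dwc namespace.
        child = ET.SubElement(root, f"{{{DWC}}}{term}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# The Berkeley Marina sighting from the exchange example.
record = {
    "scientificName": "Calypte anna",
    "vernacularName": "Anna's Hummingbird",
    "eventDate": "2007-03-18",
    "locality": "Berkeley Marina",
    "county": "Alameda",
    "stateProvince": "California",
}
print(observation_to_xml(record))
```

Because the element names come from a shared vocabulary, another system that understands Darwin Core could read this record without a custom mapping, which is the main argument for adopting an existing vocabulary over inventing one.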

Methods

Method | Deliverable
------ | -----------
Service exploration | Report of available services
Scenario revision based on services | UML sequence diagram
Whole system architecture model | Big UML model of all system components and how they interact