On DEVONthink

And now, a word about DEVONthink. Its developers describe it as a "smart information assistant", although you could think of it as a sort of MyResearchBits.

On its face, DEVONthink is a versatile database that can store and retrieve just about any type of data available: PDFs, web clippings, emails, Excel documents, bookmarks, multimedia, RSS feeds, etc. At this level, it's similar to a number of related products. The real value comes in the content analysis functions that are applied to everything you throw at it.

At the heart of every DEVONthink database lies an index of its contents, which is used for both information organization and retrieval functions. Helpfully, DEVONthink exposes much of this index, and you can see the vocabulary of a given document:

Vocabulary.png

...as well as for the entire database:

Concordance1.png

My guess is that DEVONthink utilizes vector or probabilistic models, but it would take more digging to find out.

Information Organization

On the IO side, an auto classify feature performs clustering analysis and attempts to identify the folder(s) to which the document is best suited. As long as you have a couple of similar documents in a given folder, this function works admirably. Double-clicking on a suggested folder moves the document to that location, making it easy to file your documents quickly.

Classify.png
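How auto classify works internally isn't covered here, but the general idea of matching a document to the folder with the most similar contents can be sketched in a few lines. This is a minimal illustration that assumes a plain bag-of-words model and cosine similarity; the function names, folder names, and sample texts are all made up for the example and are not DEVONthink's actual method or API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_folders(document, folders):
    """Rank folder names by how similar the document is to each folder's
    combined word counts (a crude stand-in for a cluster centroid)."""
    doc_vec = Counter(tokenize(document))
    scores = {}
    for name, texts in folders.items():
        folder_vec = Counter()
        for text in texts:
            folder_vec.update(tokenize(text))
        scores[name] = cosine(doc_vec, folder_vec)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical folders and contents, invented for illustration.
folders = {
    "Agriculture": ["potato farming in Peru", "genetic modification of crops"],
    "Sharks": ["shark pup born without a father", "shark populations in aquariums"],
}
ranked = suggest_folders("new genetically modified potato crops", folders)
print(ranked)
```

With only one or two documents per folder, the word overlap is thin, which fits the observation that the feature works best once a folder already holds a couple of similar documents.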

Replication allows you to store a document in multiple places, or replicate an entire folder. This is helpful for training the classifier as well as facilitating logical retrieval.

That said, you don't need to organize your information if you don't want to: the IR functions work as well on a flat structure as on a well-organized hierarchy.

Information Retrieval

Perhaps DEVONthink's most distinctive feature, the "See Also" bar displays a rank-weighted list of documents related to the current one. By surfacing documents you may not have thought of as relevant, it can facilitate serendipity in research.

For this document on GM potatoes, for instance, DEVONthink returns a number of related articles I've saved, including one on a Peruvian potato farmer, another on how genetic modification is transforming agriculture in Europe, and one on a certain incident in which Pringles were ruled to be potatoes:

SeeAlso.png

Another example: the previously pictured article on a fatherless baby shark is suggested as a candidate for my folder of Slaughterhouse-Five notes. No link was immediately apparent, so I glanced through those notes and found the following quote about the seven Earthling sexes:

There were five sexes on Tralfamadore, each of them performing a step necessary in the creation of a new individual. They looked identical to Billy—because their sex differences were all in the fourth dimension... The Tralfamadorians tried to give Billy clues that would help him imagine sex in the invisible dimension. They told him that there could be no Earthling babies without male homosexuals. There could be babies without female homosexuals. There couldn’t be babies without women over sixty-five years old. There could be babies without men over sixty-five. There couldn’t be babies without other babies who had lived an hour or less after birth. And so on.

While this serendipitous insight may be of limited academic value, I can say with reasonable assurance that I wouldn't have thought of the Tralfamadorians while researching virgin births in the local shark population. So I'll chalk that up as useful.
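For the curious, here is one way a "See Also" style ranking could be produced. It is a sketch under the assumption of a vector space model with TF-IDF weights and cosine similarity, which is only a guess at the kind of model involved; the document names and texts are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one TF-IDF vector (dict of word -> weight) per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for vec in counts.values():
        doc_freq.update(vec.keys())
    n = len(docs)
    return {
        name: {w: tf * math.log(n / doc_freq[w]) for w, tf in vec.items()}
        for name, vec in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def see_also(current, docs):
    """Return the other documents ranked by similarity to `current`,
    dropping any that share no weighted terms with it."""
    vectors = tfidf_vectors(docs)
    target = vectors[current]
    scored = [
        (cosine(target, vec), name)
        for name, vec in vectors.items()
        if name != current
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical mini-database, invented for illustration.
docs = {
    "gm_potatoes": "genetically modified potatoes resist blight",
    "peru_farmer": "a Peruvian farmer grows native potatoes",
    "pringles": "court rules Pringles are potatoes for tax purposes",
    "shark": "baby shark born without a father",
}
result = see_also("gm_potatoes", docs)
print(result)
```

A purely lexical model like this one would never connect sharks to Tralfamadorians unless the two documents happened to share unusual words, which is roughly what happened in the example above.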

Another valuable feature is the set of advanced search operators:

  • Strict vs fuzzy search (fuzzy search returns near-misspellings, word variants, etc)
  • Regex-style wildcards
  • Boolean operators (e.g., a AND b; a XOR b; NOT b)
  • a NEAR b
  • a BEFORE b
  • etc.

DEVONthink-Search
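To make the proximity operators in the list above concrete, here is a rough sketch of what NEAR and BEFORE might mean. The function names, the default window of 10 words, and the exact semantics are assumptions for illustration; DEVONthink's own definitions may differ.

```python
import re


def _tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def _positions(tokens, word):
    """Indexes at which `word` occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word.lower()]


def near(text, a, b, window=10):
    """True if a and b occur within `window` words of each other, in any order."""
    tokens = _tokens(text)
    return any(
        abs(i - j) <= window
        for i in _positions(tokens, a)
        for j in _positions(tokens, b)
    )


def before(text, a, b, window=10):
    """True if a occurs ahead of b, at most `window` words earlier."""
    tokens = _tokens(text)
    return any(
        0 < j - i <= window
        for i in _positions(tokens, a)
        for j in _positions(tokens, b)
    )


text = "genetic modification is transforming agriculture in Europe"
print(near(text, "agriculture", "genetic", window=5))
print(before(text, "agriculture", "genetic", window=5))
```

The only difference between the two is whether word order matters, which is why BEFORE is stricter than NEAR for the same pair of terms.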


Personal use

I already store most of my important content in DEVONthink databases. Ultimately, though, I want to roll everything into DEVONthink from the main services I use to handle links:

  • Diigo/Delicious (partially facilitated through Delicious import scripts)
  • Instapaper (partially facilitated through Instapaper RSS)
  • Twitter (my posted links—need to script this)
  • Twitter (favorited items, which usually indicate either things I want to read later or, occasionally, things I actually like; I'll need to script this at some point)

 

Limitations

Perusing the concordance surfaces some of the problems that can arise when dealing with bad document content. The highest-weighted words typically arise from problems in PDFs: when words are smashed together in a PDF, they are interpreted as a megaword that is a) extremely rare and b) extremely long (hence the high weighting), but also c) extremely useless. Searching for "the further out a point is situated" fails to retrieve the needed document, although "thefur" retrieves it instantly.

This is, of course, an issue of bad data, not a bad indexing system—but bad data is surfaced when using the tool. PerhapsasmarterIO/IRenginecould identifyjumbledwordsandparsethemintosomethingmoreuseful.
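Splitting a megaword back into real words is a well-known problem, and a simple dictionary-based approach gets surprisingly far. The sketch below uses dynamic programming to find a split in which every piece is a known word, preferring the split with the fewest pieces. The tiny vocabulary is invented for the example; a real one might come from the database's own concordance, which is an assumption on my part and not something DEVONthink is known to do.

```python
def segment(text, vocabulary, max_word_len=20):
    """Split a run-together string into known words.

    Returns a list of words, or None if no complete split exists.
    """
    text = text.lower()
    # best[i] holds a list of words covering text[:i], or None if none exists.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                # Prefer the split that uses the fewest words.
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Hypothetical vocabulary, invented for illustration.
vocabulary = {"the", "fur", "further", "out", "a", "point", "is", "situated"}
words = segment("thefurtheroutapointissituated", vocabulary)
print(words)
```

Anything that cannot be fully split comes back as None, so genuinely rare terms (names, jargon) would be left alone and not mangled.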

Another limitation occurs when dealing with long documents. Including an entire book, for instance, makes it difficult for DEVONthink to identify relevant portions of the text. Author Steven Berlin Johnson addresses this by chunking notes and quotes into smaller segments (more from him here); I usually just exclude extremely long documents from the classifier.
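Chunking of this sort is easy to script before import. The following is a minimal sketch that groups whole paragraphs into chunks of roughly a fixed number of words; the function name and the 300-word default are assumptions for illustration, not a description of Johnson's actual workflow.

```python
import re


def chunk_text(text, max_words=300):
    """Group whole paragraphs into chunks of at most `max_words` words.

    A single paragraph longer than `max_words` becomes its own chunk
    and is not split mid-paragraph.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine"
chunks = chunk_text(sample, max_words=5)
print(chunks)
```

Each chunk could then be saved as its own document, so that related-document suggestions point at a specific passage and not at an entire book.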