John Doe 2 and John Doe 3, UC Berkeley

Have you used IMG?
Yes, both have. John Doe 2 has been to two tutorials, John Doe 3 to one. Both are signed up for another one in April. John Doe 2 is very familiar with IMG in general and thinks it's a cool tool. John Doe 3 has used it less.

What is your research about?
They are focusing on fungi, so they are in the eukaryote business, and eukaryotes are currently underrepresented in IMG. John Doe 2 is interested in gene evolution and intron evolution. He is interested in looking at multiple events in the evolution of a genome. He finds IMG centered on individual genes. He likes the Vista links for viewing conservation plots for an organism of interest and others in the Vista system. James Galligan at Broad has a nice way of looking at this kind of info on a global level. Right now, seeing conservation of a whole chromosome is difficult with Vista. John Doe 3 noted that the orientation of genomics is changing away from single-gene analysis. He is interested in filamentous fungi. There aren't many fungi in IMG, but about 50 filamentous fungi have been sequenced. Also, gene structure info isn't there for eukaryotes. (Eukaryotic genes have a more complex structure than prokaryotic genes, and since most microbes are prokaryotes, IMG represents genes in a simplified way.) John Doe 3 mentioned a possible collaboration between JGI and UC. John Doe 2 and a fellow named Jason from Duke could maybe collaborate with JGI. Jason is president of BioPerl and has written a lot of tools.

How do you keep track of information about genes?
John Doe 2 works in multiple windows on the computer, with no paper notes or other references. References are all on web sites. He'll take notes in another electronic document while working in the browser. He doesn't feel he has everything he could want in IMG; it would be great to get a list of relevant PubMed publications with links.

What role does manual annotation play in your work?
John Doe 2 is just getting into annotation. He uses the Berkeley Phylogenomic Tools and is doing annotation work for Kimmen*, who is developing them. With the BPTs, he gets phylogenetic trees with annotation info in them. Both John Doe 2 and John Doe 3 are fans of phylogenomics. John Doe 2 says "the goal is to get the most accurate information possible." Simple homology-based searches aren't enough for him. He runs phylogenetic analyses and looks at PFAM and SWISS-PROT (curated databases of proteins). John Doe 3 complained about the delays in getting things changed in GenBank, because only the person who submitted the data can change it: "That isn't right." John Doe 2 would like to be able to throw in extra notes. John Doe 3 points out that it would be best to leave everything in there, the whole history. John Doe 2 thinks IMG is very useful; everything appears easy to use, especially for people without a lot of experience with bioinformatics tools.

Walking through an annotation:
John Doe 2 pulled up a gene of interest by using the existing IMG gene search. He chose his organism of interest from a pulldown menu that gave him trouble because the order was odd (alphabetical, but grouped by domain of life). He noted that it would be good to be able to annotate with ESTs as evidence. He pulled up the list of homologs, looked for the ones with high similarity scores, and selected those. He added those to his "gene cart" (a storage area in IMG for collecting genes worth exploring). Next he did a multiple alignment from the gene cart page. He said that he would then write down ranked levels of conservation, e.g., "3, 4, 5, and 6 are longer than gene 1." It would be cool to be able to export the alignment and view a tree. It would also be cool to see the neighborhood, i.e., the synteny around the gene of interest. He would want to verify that the sequence for the found match is a reasonable one for the protein he's about to annotate it as. Seeing a SWISS-PROT number helps him feel confident. The gene is also in a COG, and the description of the COG agrees with his possible annotation. The PFAM hit is there, too, so that helps. He says he would still hesitate and would want to apply evolutionary methods and see the gene on a tree for the gene family. He mentioned a seminal paper by Eisenberg that shows evolutionary methods to be superior in assigning gene function. The tools needed are ones that Kimmen has developed.
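
For reference, the alignment export John Doe 2 asked for could be as simple as writing the aligned sequences out in multi-FASTA, a common plain-text format that most tree-building tools accept. The sketch below is only an illustration: the function name `to_multi_fasta`, the gene identifiers, and the toy sequences are made up, and nothing here reflects how IMG actually stores or exports alignments.

```python
def to_multi_fasta(records, width=60):
    """Format (identifier, aligned_sequence) pairs as multi-FASTA text.

    Gap characters ('-') from the alignment are kept as-is, and each
    sequence is wrapped at `width` characters per line.
    """
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)  # header line for this sequence
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical gene cart contents after a multiple alignment.
    aligned = [("gene_1", "ATG-CCGTA"), ("gene_3", "ATGACCGTA")]
    print(to_multi_fasta(aligned), end="")
```

A file in this form can be saved to the hard drive and loaded into a separate alignment editor or tree tool.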

Which ontologies would you like to see represented?
John Doe 2 would like to see people agree on one, and GO is the most widely accepted.

What other features would you like to see?
John Doe 3 noted the similarity of the open annotation idea to an herbarium, where scientists each put a little note on a card that is kept with a sample over the years. Everyone's ideas remain together, so there is a history of thought about the sample. John Doe 2 would like to see annotations all attached to one gene entry. This solves the problem GenBank has with tracking errors. (Not everything is in one place.) The user has to be empowered by seeing the history and the sources. One can only rarely see which annotations are automated in GenBank (only for recent stuff). John Doe 2 doesn't trust other people's annotations on their own. He runs his own analysis to check. He would be very careful about making annotations or updating them. He does feel that any annotation is better than no annotation, though. John Doe 3 mentioned that he feels he needs to guard his reputation, so would be unlikely to post tentative annotations, even if marked as such. A link to the paper would be great.
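
One way to picture the herbarium-card idea is an append-only list of notes attached to a single gene entry, where each note records who wrote it, what evidence it points to, and whether it was automated or tentative. The sketch below is an assumption made for illustration only: the class names `AnnotationNote` and `GeneEntry`, their fields, and the example values are hypothetical and are not part of IMG or GenBank.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AnnotationNote:
    """One 'card': a single annotation with its attribution and source."""
    author: str
    text: str
    evidence: Optional[str] = None  # e.g. a link to a paper
    automated: bool = False         # produced by a pipeline, not a person
    tentative: bool = False         # the author is not yet sure


@dataclass
class GeneEntry:
    """A gene with every annotation ever attached to it, oldest first."""
    gene_id: str
    notes: List[AnnotationNote] = field(default_factory=list)

    def add_note(self, note: AnnotationNote) -> None:
        # Notes are only ever appended, so the history of thought is kept.
        self.notes.append(note)

    def current(self) -> Optional[AnnotationNote]:
        """Return the most recent note, or None if there are none."""
        return self.notes[-1] if self.notes else None

    def history(self, include_tentative: bool = True) -> List[AnnotationNote]:
        """Return all notes, optionally hiding the tentative ones."""
        return [n for n in self.notes if include_tentative or not n.tentative]


if __name__ == "__main__":
    entry = GeneEntry("example_gene")
    entry.add_note(AnnotationNote("pipeline", "hypothetical protein", automated=True))
    entry.add_note(AnnotationNote("reviewer_a", "speculated kinase", tentative=True))
    for note in entry.history():
        print(note.author, "|", note.text)
```

Because nothing is overwritten, a reader can see both the automated call and the later human note, which is the point John Doe 3 made about keeping everyone's ideas together.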

John Doe 2 said if you're going to empower users, a couple more tools would help. One is dot plots (a dot plot evaluates conservation between two genes of interest, shown as a plot of one gene against the other, and also shows repeats); Dotter could be thrown in. It would be great to be able to export a multiple-sequence alignment to your hard drive as a multiple FASTA file, then have tools to manipulate that alignment, like a phylogenetic tree tool. PSI-BLAST would also be great for IMG in particular.
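
As a rough illustration of the dot plot idea, the sketch below marks every pair of positions where a short window of one sequence exactly matches a window of the other. Conserved stretches appear as diagonal runs, and repeats appear as additional diagonals away from the main one. This is a minimal exact-match version with made-up function names and toy sequences; it is not how Dotter works internally, and real tools score windows rather than requiring identity.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return a matrix of booleans for two sequences.

    Cell [i][j] is True when the `window`-length substring of seq_a
    starting at i is identical to the one of seq_b starting at j.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Render the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join(
        "".join("*" if hit else "." for hit in row) for row in matrix
    )


if __name__ == "__main__":
    # Toy sequence compared against itself; the repeated "ACGT" produces
    # diagonal segments off the main diagonal.
    print(render(dot_plot("ACGTACGT", "ACGTACGT", window=3)))
```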

What info do you need to understand other people's annotations?
Really, you need the data, like a link to the paper.

How useful would it be to you to be able to make functional annotations without being able to do structural annotations?
It's great just to have functional. Structure is a nice addition but not necessary.

Would you be interested in using a collaborative annotation tool that allowed anyone who wanted to obtain a login if the system provided assessments of the validity of their annotations?
John Doe 2 says yes, there are a lot of talented people out there, as long as the data is tracked. John Doe 3 agrees, as long as it's attributed.

We asked these guys several of the questions on the "annotators" list.

Do you collaborate with other people in the annotation process?
John Doe 2 usually has someone else look at his work to verify. He thinks most users would put things into the database with less strict checking. He would trust less what other users put in the database. John Doe 3 points out "you're only as good as your reputation", so you need to be careful not to borrow bad data.

How do you communicate about the annotation?
John Doe 2 says it's usually face to face, but it could be over email and sometimes has been.

Would an online or email discussion be enough?
It would be good to include hyperlinks to all the relevant pages. Needs to be streamlined so that the other user doesn't need to duplicate the analysis.

Would you want to see tentative annotations?
Even a tentative annotation can give you some info about what's going on.

Would you want to limit info to yourself/a group?
When in a race to publish, John Doe 2 would want to limit access. John Doe 3 would want to restrict annotations that he wasn't sure of. John Doe 2 suggested having a draft version of the page, so that you could save a version of the page and send that. People probably wouldn't want to publish anything they consider tentative. "Speculated function" might be a labeling option slightly better than "hypothetical protein"?

How much do you share with other research groups?
John Doe 2 has never been asked by people from outside groups for info. John Doe 3 has collaborated with people from other groups. John Doe 3 thinks people would be more worried about putting out bad annotations than about the annotations falling into the wrong hands, unless they are planning to publish the annotations.

Do you need to track the sources of all the data?
Absolutely.

Future usability test help?
Yes, both would be willing to help.

 

*Kimmen Sjolander. Berkeley Phylogenomics Tools has links to papers, which would be great. Kimmen has great stuff and might be a good person to collaborate with. She might be willing to work with IMG developers in exchange for time on the cluster(?) (http://phylogenomics.berkeley.edu/cgi-bin/satchmo/input_satchmo.py)
