Blogs

Shazam vs. Midomi

Jessica did a wonderful job with the Shazam / Midomi demo.

For those interested, here's a good article that compares the feature sets of Shazam and Midomi...

http://www.theiphoneblog.com/2008/11/05/app-app-shazam-midomi/

Speaking of tools to summarize content

Something mentioned in passing near the end of today's lecture reminded me of something I saw demonstrated at an Apple keynote a decade or so ago. The code name was V-Twin (technically the Apple Information Access Toolkit); it allowed you to paste in text and then use a slider to whittle down the text all the way to "key terms". It even showed the results live as you moved the slider -- pretty cool.

The magic behind it? Just what we've been learning: tf, idf, dimensionality reduction...
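To make the tf and idf part concrete, here is a minimal sketch of how a "slider" could whittle text down to key terms. It is not how V-Twin actually worked (that isn't documented here). The function names `tokenize` and `key_terms`, the `keep` parameter standing in for the slider position, and the smoothed idf formula are all assumptions made for illustration, and dimensionality reduction is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def key_terms(doc, corpus, keep):
    """Return the `keep` highest-scoring tf-idf terms of `doc`.

    `keep` plays the role of the slider (hypothetical): a smaller value
    whittles the text down to fewer, more distinctive terms.
    """
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []
    tf = Counter(doc_tokens)
    corpus_tokens = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus_tokens)

    scores = {}
    for term, count in tf.items():
        # Document frequency: how many background documents contain the term.
        df = sum(1 for tokens in corpus_tokens if term in tokens)
        # Smoothed idf (an assumed variant) so unseen terms don't divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:keep]


background = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird sang",
]
print(key_terms("the cat chased the cat", background, keep=2))
```

Lowering `keep` corresponds to dragging the slider toward "key terms": common words like "the" score low because they appear in every background document, so they drop out first.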

Computer Essay Graders vs. Human Essay Graders

Whoops, meant to post this when we were talking about LSA. As I mentioned in class, my mom's occasionally graded papers for various standardized testing services. She once mentioned to me that most of the guidelines for the human readers actually focused on structure, not content: things like, "Does this essay have a topic sentence? Are there supporting details?"

Through the Google goggles (reloaded)

Google realizes that its search algorithm is far from perfect. It seems that even "politically incorrect" opinions can rise to the top of the results, outside of Google's control. If you search for words such as "jew" or "obama", you may find disturbing results and images.

"Education is not filling a bucket, but lighting a fire."

Turns out this quote seems to have been misquoted all along. It's a couple thousand years older than that...

"For the mind does not require filling like a bottle, but rather, like wood, it only requires kindling to create in it an impulse to think independently and an ardent desire for the truth." -- Plutarch, "On Listening to Lectures".

And a closer translation from Penguin Classics:

The star-crossed relationship between IO and IR

I find it ironic that while the key to good IO is separating the content from the presentation, it seems that conversely, the key to good IR is combining the content with the presentation. That explains why it's so hard and frustrating to make IO and IR work in harmony for documents in the wild - if Shakespeare were alive today, this is what he would be writing about. :-)

The Palin Indices

Sarah Palin's book, Going Rogue, came out last week, but the blogosphere quickly noticed one missing element: an index. Not willing to give up a golden opportunity to carefully evaluate the content of Ms. Palin's book, several blogs and publications created their own indexes for her masterpiece, including Huffington Post, The New Republic, and Slate.

i202 Assignment 7 Helper

http://bit.ly/202helper

Developed by Karen (2010), Julian (2011), Satish (2011)

speedi.ly, a quick classifier

Just announced: speedi.ly, a real-time text/URL classifier. The API isn't available yet, but you can play with it via the web site. It looks like it's based on vector analysis against a set of generic topic documents, and could be handy for spring semester projects. I just threw the first page of the 202 blog at it with the following result:

Calling All Categorizers

Just heard about this project, the total scope of which seems outside of 202 (but is still fascinating): the MediaBugs project (http://www.pbs.org/idealab/2009/11/how-do-we-categorize-all-journalistic...) hopes to be a fact- and system-checking process similar to bug tracking in software development.

The most 202-ish aspect of this is their call for help in categorization:
