Blogs

Shazam vs. Midomi

Jessica did a wonderful job with the Shazam / Midomi demo.

For those interested, here's a good article that compares the feature sets of Shazam and Midomi...

http://www.theiphoneblog.com/2008/11/05/app-app-shazam-midomi/

Speaking of tools to summarize content

Something mentioned in passing near the end of today's lecture reminded me of something I saw demonstrated at an Apple keynote a decade or so ago. The code name was V-Twin (technically the Apple Information Access Toolkit); it allowed you to paste in text and then use a slider to whittle down the text all the way to "key terms". It even showed the results live as you moved the slider -- pretty cool.

The magic behind it? Just what we've been learning: tf, idf, dimensionality reduction...
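To make the tf and idf part concrete, here is a minimal sketch of a "slider" that whittles text down to key terms. It is not how V-Twin actually worked (that isn't documented here), and it skips dimensionality reduction entirely. The function names, the `keep` parameter, and the sample texts are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(target, background):
    """Score each term in `target` by tf * idf.

    idf is computed over the target plus the (hypothetical) background
    documents, so a term that appears in every document scores zero.
    """
    docs = [tokenize(target)] + [tokenize(d) for d in background]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Term frequency: raw counts within the target text only.
    tf = Counter(docs[0])
    return {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}


def key_terms(target, background, keep):
    """Return the `keep` highest-scoring terms; `keep` stands in for the slider."""
    scores = tfidf_scores(target, background)
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:keep]


# Illustrative data only.
background = ["the cat sat on the mat", "the dog sat on the log"]
target = "the slider trims the text to the key terms of the text"

# Moving the "slider" from 5 terms down to 1.
for keep in (5, 3, 1):
    print(keep, key_terms(target, background, keep))
```

A word like "the" appears in every document, so its idf is zero and it falls to the bottom of the ranking no matter how often it occurs in the pasted text.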

Computer Essay Graders vs. Human Essay Graders

Whoops, meant to post this when we were talking about LSA. As I mentioned in class, my mom has occasionally graded papers for various standardized testing services. She once mentioned to me that most of the guidelines for the human readers actually focused on structure, not content: things like, "Does this essay have a topic sentence? Are there supporting details?"

Through the Google goggles (reloaded)

Google realizes that its search algorithm is far from perfect. It seems that even "politically incorrect" opinions can show up among the top results, outside of Google's control. If you search for words such as "jew" or "obama", you may find disturbing results and images.

"Education is not filling a bucket, but lighting a fire."

Turns out this quote seems to have been misquoted all along. It's a couple thousand years older than that...

"For the mind does not require filling like a bottle, but rather, like wood, it only requires kindling to create in it an impulse to think independently and an ardent desire for the truth." -- Plutarch, "On Listening to Lectures".

And a closer translation from Penguin Classics:

The star-crossed relationship between IO and IR

I find it ironic that while the key to good IO is separating the content from the presentation, it seems that conversely, the key to good IR is combining the content with the presentation. That explains why it's so hard and frustrating to make IO and IR work in harmony for documents in the wild - if Shakespeare were alive today, this is what he would be writing about. :-)

The Palin Indices

Sarah Palin's book, Going Rogue, came out last week, but the blogosphere quickly noticed one missing element: an index. Not willing to give up a golden opportunity to carefully evaluate the content of Ms. Palin's book, several blogs and publications created their own indexes for Ms. Palin's masterpiece, including Huffington Post, The New Republic, and Slate.

i202 Assignment 7 Helper

http://bit.ly/202helper

Developed by Karen (2010), Julian (2011), Satish (2011)

speedi.ly, a quick classifier

Just announced: speedi.ly, a real-time text/URL classifier. The API isn't available yet, but you can play with it via the web site. It looks like it's based on vector analysis against a set of generic topic documents, and could be handy for spring semester projects. I just threw the first page of the 202 blog at it with the following result:

Calling All Categorizers

Just heard about this project, the total scope of which seems outside of 202 (but is still fascinating): the MediaBugs project (http://www.pbs.org/idealab/2009/11/how-do-we-categorize-all-journalistic...) hopes to be a fact- and system-checking process similar to bug tracking in software development.

The most 202-ish aspect of this is their call for help in categorization:

IR for left-brain vs. right-brain people

 "One of the challenges of info pros has been to use the structured information-retrieving and -filtering tools, which really do require sequential, left-brained thinking, while simultaneously thinking creatively and intuitively about the entire spectrum of information sources and features, which requires right-brained analysis. It sort of feels like I'm trying to solve a quadratic equation while playing the piano."

Assumption of Democratic Discourse as Goal for Searching

When reading this article I had a hard time with the author's assumption that search-engine users should be engaging in "deliberative debate" by default. Many searches are conducted with the simple goal of answering a question, or getting very general information on a topic. If I search for "flowers," I'm asking about a very, very big topic. We've previously learned that users rarely pass the first page of results. In the ten hits my search engine returns, should several of them be devoted to fringe controversies involving flowers? I think not.

Taxing Tobacco

Tobacco companies are avoiding hundreds of millions of dollars a year in taxes by altering categories. Please refer to the following link for further information:

Link: http://www.huffingtonpost.com/2009/11/17/tobacco-companies-using-l_n_360...

 

Doesn't it look similar to the "potato chips" case? (reading for L2)

 

- Dhawal

"Unfriend" Wins Top Word of the Year

The New Oxford American Dictionary officials, in an armchair moment, capture some of our popular words of 2009. "Twitterisms," unsurprisingly, are a 'notable word cluster' but fail to win the top word. Sure, language isn't static, but who should be the authority to rank 'tweeting' above 'unfriend'? *grumble*

Top Word of 2009: Unfriend, But Twitterisms Abound

-joan

It looks like Hemingway won't be going to Oxford

A recent article from the BBC, "Great Writers 'Fail' Online Test" notes how famous literary pieces by the likes of Winston Churchill and Ernest Hemingway scored poorly when they were graded by computers. The article doesn't get into the complexities of analyzing the computer's grading system, but it does mention that the automatic grading system cannot pick up on human emotion and language subtleties.