Data Science and Analytics
Thought Leaders

Assignment 1

Handed out: 1/23/2012, Due: 1/24/2012

 

A. INFO296A-2: Thought Leaders in Data Science and Analytics Seminar

1. The Speaker List and Schedule are up on the website:

 http://courses.ischool.berkeley.edu/i296a-dsa/s12/

(We may make a few changes or adjustments later.)

 Firms represented by executives/researchers:

- Internet/Social Networks: Google, Facebook, Yahoo, eBay, Claritics, Salesforce.com, BlueKai Data Exchange

- Enterprise/Internet: TCS, SAP, Procter & Gamble, Deloitte, Wells Fargo, Bank of America

- In process: Microsoft, AOL, Kaiser

 

2. Catching up on Data Mining:

i) PhDs: I will assume that you have some prior background in machine learning, data mining, and/or probability, statistics, etc. If you do not, follow the Masters guidance below.

 

ii) Masters:

- For detailed learning, please consider INFO290 Data Mining and Analytics (Fri 1-4 pm):

 http://courses.ischool.berkeley.edu/i290-dma/s12/doku.php

- Review on your own the book by Tan, Steinbach, Kumar (PPTs, Weka software, etc.):

 http://www-users.cs.umn.edu/~kumar/dmbook/index.php

 

iii) Masters - Less detailed learning:

- Review on your own the book by Tan, Steinbach, Kumar (PPTs, Weka software, etc.)

 http://www-users.cs.umn.edu/~kumar/dmbook/index.php

and/or

- Attend the first half of INFO290 Data Mining and Analytics (clustering, prediction, classification, etc.)

and/or

- Contact Jimi Shanahan for assistance with basic material, code, etc.

 

3. Homework: Recall the requirement of a weekly submission, due by Tuesday, 11 pm, of a given week:

- i) Bullet points on the talk just past, and/or related to the monthly reports you will submit, building towards the logic and conclusions of the report.

- ii) Other bullet points should cover: interesting or highlight points of the lecture or discussion, and what is new. Identify the user need for analytics (in a given area) and the mix and emphasis of solutions based on: 1. "Commodity" analytics. 2. Novel combinations of "commodity" analytics. 3. Development of new theory based on need. 4. New directions and approaches. 5. New business models.

- iii) Answers to a few simple questions Jimi will ask based on Lecture 1

 

B. INFO296A-3: Advanced Project Course on Data Science and Analytics

- The objective is to develop a research paper or methodology suitable for a conference or journal

- Please email me whether you would like to be supplied with a real-world problem from Silicon Valley or would prefer to discuss a problem which you have already considered

- Topics include data/text/image/video mining, event detection,

information retrieval, relevance feedback, active learning, recommender

systems, content-based retrieval, multimedia information extraction and

retrieval, optimization, stochastic dynamic programming and

reinforcement learning

- Please submit a two-paragraph verbal description of your problem (unless you would like one from industry)

- Please submit any modeling which looks reasonable to start with, along with the associated analysis; the verbal model, the mathematical model, and the mathematical analysis must all be carefully thought out

- Please make sure that your inputs reach me by email by Tuesday night, 10 pm. We will set up a Dropbox to share research papers etc.

 

ISchool 296A

Spring 2012