Course Description

This course introduces fundamental and applied computational techniques for analyzing collaborative and collective intelligence in group behavior on the Internet. The emphasis is on data mining and knowledge discovery applied to the social interactions, signals, and data produced as byproducts of social media services such as search engines, social network sites, blogs, micro-blogs, and wikis. Course topics include, but are not limited to: web data mining, knowledge discovery on the web, web analytics, web information retrieval, ranking algorithms, recommender systems, human computation, models and theories of social networks, large-graph and link-based algorithms, social marketing, monetization of the web, and security and privacy issues related to social computing.

Course Information

Prerequisites

Basic computer science principles and skills. Python programming skills. Familiarity with basic statistics, graph theory, and linear algebra.

Requirements

The coursework consists of reading assignments, homework assignments, and a class project. The class project is the major component of the course; you are expected to prepare and present an initial proposal, give a final presentation, and submit your code and a final report.

Syllabus

| Week | Date | Topics | Tutorials |
|------|------|--------|-----------|
| 1 | 8/26 | Introduction to Social Computing | Python basics and APIs |
| 2 | 9/2/11 | Social Network Theory | R Basics: set-up, basic operations (Ritesh Agrawal) |
| 3 | 9/9/11 | Graph Theory and Mining | Web Crawler with Python: basic concepts, issues, examples (Nate Murray) |
| 4 | 9/16/11 | Community Detection (Lei Tang, Yahoo! Research) | R Advanced: packages, graphics, statistics, machine learning, data sets, etc. (Ritesh Agrawal) |
| 5 | 9/23/11 | Project Discussion | Project Discussion |
| 6 | 9/30/11 | Learning and Learning to Rank (Jean-Francois Paiement and David Grangier, ATT Labs Research) | GBDT and other learning methods using R and Python (Jean-Francois Paiement and David Grangier) |
| 7 | 10/7/11 | Sentiment Analysis and Opinion Mining (Bo Pang, Yahoo! Research) | NLTK |
| 8 | 10/14/11 | Recommender Systems, Social Recommendation, Query Recommendation | Recommender Systems |
| 9 | 10/21/11 | Social Media in Education (Bebo White, Stanford Linear Accelerator Lab) | PageRank, HITS, etc. |
| 10 | 10/28/11 | Human Computation/Crowdsourcing | Crowdsourcing/Human Computation |
| 11 | 11/4/11 | Facebook (Lars Backstrom, Facebook) | Midterm project updates |
| 12 | 11/11/11 | Public Holiday | Public Holiday |
| 13 | 11/18/11 | Q&A, cQA, DeepQA, etc. | Information Extraction |
| 14 | 11/25/11 | Public Holiday | Public Holiday |
| 15 | 12/2/11 | Social Monetization | Course Review |
| 16 | 12/9/11 | Wrap-up/Presentations | Project Presentation |

References

  1. H. Marmanis and D. Babenko, Algorithms of the Intelligent Web, 1st ed. Manning Publications, 2009.
  2. S. Alag, Collective Intelligence in Action. Manning Publications, 2008.
  3. L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Synthesis Lectures on Data Mining and Knowledge Discovery, vol. 2, pp. 1-137, Jan. 2010.
  4. D. J. Cook and L. B. Holder, Mining Graph Data, 1st ed. Wiley-Interscience, 2006. [2-hour loan from UCB Library]
  5. M. A. Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, 1st ed. O’Reilly Media, 2011. [Available via Safaribooksonline for Cal students]
  6. S. Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data, 1st ed. Morgan Kaufmann, 2002. [2-hour loan from UCB Library]
  7. T. G. Lewis, Network Science: Theory and Applications, 1st ed. Wiley, 2009. [2-hour loan from UCB Library]
  8. D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
  9. B. Pang and L. Lee, Opinion Mining and Sentiment Analysis (Foundations and Trends in Information Retrieval). Now Publishers Inc, 2008.
  10. C. M. Bishop, Pattern Recognition and Machine Learning, 1st ed., corrected 2nd printing. Springer, 2007.
  11. T. Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, 1st ed. O’Reilly Media, 2007. [Available via Safaribooksonline for Cal students]
  12. J. Adler, R in a Nutshell: A Desktop Quick Reference, 1st ed. O’Reilly Media, 2010. [Available via Safaribooksonline for Cal students]
  13. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd ed. Springer, 2011. [2-hour loan from UCB Library]

Notes

  1. O'Reilly books are freely available online to Berkeley students through Safaribooksonline. You must be connected to AirBears or to the campus VPN to have access.