IS 208

Statistical Analysis Assignment

The purpose of this assignment is to introduce you to the tools and logic of statistical analysis, specifically the use of descriptive statistics. The data consist of fifty observations taken randomly over a one-week period from the provider of an Internet search engine. The data measure five variables and are in two files on the course web page.
NOTES: The same data are in both files, one for the Excel part and one for the SPSS part; links are on the course download page. Both files are named "web_data"--the extensions are .xls and .sav, respectively. Due date: Monday, February 23, 2004.  This assignment is to be done on an individual basis.

The fields in the data set are:

Variable #  Description                  Format  Range or Key
----------  ---------------------------  ------  ------------------------------
1           Start time                   nnnn    0000-2359 ("military time")
2           Length of time connected     nn      minutes
            using home page
3           User domain                  n       1 = com
                                                 2 = edu
                                                 3 = gov
                                                 4 = mil
                                                 5 = net
                                                 9 = other
4           Has user accepted "cookie"   n       0 = no
                                                 1 = yes
5           Destination                  n       1 = Banner ad
                                                 2 = Ad #2
                                                 3 = Ad #3
                                                 4 = Ad #4
                                                 5 = Another page at this site
                                                 6 = Left this site
                                                 9 = Can’t tell from log
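
As an illustration only (the assignment itself is to be done in Excel and SPSS), the coded fields above could be decoded along the following lines. This is a minimal Python sketch: the key tables are copied from the field list, while the names DOMAINS, DESTINATIONS, and start_hour are hypothetical and do not appear in the data files.

```python
# Key tables copied from the field list above.
DOMAINS = {1: "com", 2: "edu", 3: "gov", 4: "mil", 5: "net", 9: "other"}
DESTINATIONS = {
    1: "Banner ad",
    2: "Ad #2",
    3: "Ad #3",
    4: "Ad #4",
    5: "Another page at this site",
    6: "Left this site",
    9: "Can't tell from log",
}


def start_hour(start_time: int) -> int:
    """Return the hour (0-23) of a start time stored as nnnn military time.

    Hypothetical helper: 1435 means 14:35, so the hour is 14.
    """
    hour, minute = divmod(start_time, 100)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"not a valid nnnn time: {start_time}")
    return hour
```

Counting records by start_hour is one way to approach the hour-by-hour view of usage.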

The search engine provider wants to develop a set of user profiles and summary statistics describing its users and their behavior. It believes that it can learn from an analysis of the logs and hopes this information will be useful in helping determine advertising rates and similar business issues.

Your assignment is to use Excel and SPSS to answer the following questions. While you are encouraged to attach parts of the SPSS output to document your answers, that alone is not sufficient. You are to write a brief memo to your supervisor summarizing your findings and responses. This memo should be clear, concise, easy to read, and in standard business English.

NOTES: First, use Excel for Basic Questions 1-4 and then compare its ease of use and output style to SPSS. Also, there is a second set of questions that are more "academic." You should attach an Appendix to your memo that answers these questions.

BASIC QUESTIONS:

1. What is the average length of time a user views the home page? (HINT: There may be more than one measure of "average"; use all that are appropriate.) What is the standard deviation?
2. What percentage of users is from each of the domains?
3. What percentage of users clicks through to the advertising?
4. How does usage vary by time of day, in detail? Specifically, which hour has the greatest number of hits? (Look only at the time at which the connection is made; ignore the length of time connected.)
5. How does usage vary by time of day, in general? The search service divides each day into three parts for reporting purposes:
   Overnight: Midnight – 7:59 AM
   Day: 8:00 AM – 4:59 PM
   Evening: 5:00 PM – 11:59 PM

What is the breakdown of usage by day-part? (Again, look only at the time at which the connection is made; ignore the length of time connected.)

6. Which type of user (by top-level domain) is most likely to accept a cookie? Which is least likely?
7. Which type of user is most likely to click through to an ad? Which is least likely?
8. Is the banner ad more "attractive" than the other ads? (I.e., are users more likely to click on a banner ad than on another ad?)
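
The logic behind several of the Basic Questions (measures of "average," percentage breakdowns, and the three day-parts) can be sketched as follows. This is a minimal Python illustration, not a substitute for the Excel and SPSS work the assignment requires. The function names day_part and percent_breakdown are hypothetical, and the sample values are invented for illustration; they are NOT taken from web_data.

```python
from collections import Counter
from statistics import mean, median, multimode, stdev


def day_part(start_time: int) -> str:
    """Map a start time in nnnn military format to one of the three day-parts."""
    hour = start_time // 100
    if hour < 8:       # Midnight - 7:59 AM
        return "Overnight"
    if hour < 17:      # 8:00 AM - 4:59 PM
        return "Day"
    return "Evening"   # 5:00 PM - 11:59 PM


def percent_breakdown(values):
    """Return the percentage of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented sample values for illustration only (not from web_data).
sample_starts = [30, 815, 1200, 1659, 1700, 2359]
sample_minutes = [2, 3, 3, 5, 12]

# Three common measures of "average", plus the sample (n - 1) standard deviation.
averages = {
    "mean": mean(sample_minutes),
    "median": median(sample_minutes),
    "mode": multimode(sample_minutes),
    "std_dev": stdev(sample_minutes),
}

usage_by_day_part = percent_breakdown([day_part(t) for t in sample_starts])
```

Note that the mean and median can differ noticeably when a few connections are much longer than the rest, which is why the hint in Question 1 asks for every appropriate measure.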