IS 208

Statistical Analysis Assignment

The purpose of this assignment is to introduce you to the tools and logic of statistical analysis, specifically the use of descriptive statistics. The data consist of fifty observations taken randomly over a one-week period from the provider of an Internet search engine. The data measure five variables and are in two files on the course web page.
NOTES: The same data are in both files, one for the Excel part and one for the SPSS part; links are on the course download page. Both files are named "web_data"--the extensions are .xls and .sav, respectively. Due date: Monday, February 23, 2004.  This assignment is to be done on an individual basis.

The fields in the data set are:
 
| Variable # | Description | Format | Range or Key |
|---|---|---|---|
| 1 | Start time | nnnn | 0000-2359 ("military time") |
| 2 | Length of time connected | nn | minutes using home page |
| 3 | User domain | n | 1 = com; 2 = edu; 3 = gov; 4 = mil; 5 = net; 9 = other |
| 4 | Has user accepted "cookie" | n | 0 = no; 1 = yes |
| 5 | Destination | n | 1 = Banner ad; 2 = Ad #2; 3 = Ad #3; 4 = Ad #4; 5 = Another page at this site; 6 = Left this site; 9 = Can’t tell from log |
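
The coded fields can be translated back into readable labels before any analysis. The following is a minimal Python sketch and not part of the assignment materials, which call for Excel and SPSS. The names `DOMAIN`, `COOKIE`, `DESTINATION`, and `decode_record` are hypothetical, and the sample record is made up for illustration.

```python
# Code-to-label maps copied from the variable key; all names are illustrative.
DOMAIN = {1: "com", 2: "edu", 3: "gov", 4: "mil", 5: "net", 9: "other"}
COOKIE = {0: "no", 1: "yes"}
DESTINATION = {
    1: "Banner ad",
    2: "Ad #2",
    3: "Ad #3",
    4: "Ad #4",
    5: "Another page at this site",
    6: "Left this site",
    9: "Can't tell from log",
}


def decode_record(start, minutes, domain, cookie, destination):
    """Turn one coded observation into readable labels."""
    return {
        "start_time": f"{start:04d}",  # keep leading zeros, e.g. 0915
        "minutes": minutes,
        "domain": DOMAIN[domain],
        "cookie": COOKIE[cookie],
        "destination": DESTINATION[destination],
    }


# Made-up record, not taken from web_data.
example = decode_record(915, 4, 2, 1, 5)
print(example)
```

Keeping the numeric codes in the data and decoding only for display mirrors how value labels are typically used in a statistics package.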

These data have been collected from the log files at an imaginary web site. Each time the home page is accessed, data are recorded that include the time at which the user was first sent the complete home page (reloads are ignored), the length of time that elapses between the time the home page has been sent and the user sends the command to leave (usually by clicking on an embedded link), the full address of the user (from which the top-level domain has been extracted), whether or not the user accepted the "cookie" sent by the server, and the "destination" to which the user went after leaving this page. The concept of "destination" is an evolving one; here we mean the next page or server that the user selected after downloading the home page, if it could be determined.

The search engine provider wants to develop a set of user profiles and summary statistics describing its users and their behavior. It believes that it can learn from an analysis of the logs and hopes this information will be useful in helping determine advertising rates and similar business issues.

Your assignment is to use Excel and SPSS to answer the following questions. While you are encouraged to attach parts of the SPSS output to document your answers, the output alone is not sufficient. You are to write a brief memo to your supervisor summarizing your findings and responses. This memo should be clear, concise, easy to read, and in standard business English.

NOTES: First, use Excel for Basic Questions 1-4 and then compare its ease of use and output style to SPSS. Also, there is a second set of questions that are more "academic." You should attach an Appendix to your memo that answers these questions.

BASIC QUESTIONS:

  1. What is the average length of time a user views the home page? (HINT: There may be more than one measure of "average;" use all that are appropriate.) What is the standard deviation?
  2. What percentage of users is from each of the domains?
  3. What percentage of users clicks through to the advertising?
  4. How does usage vary by time of day, in detail? Specifically, which hour has the greatest number of hits? (Look only at the time at which the connection is made; ignore connection time.)
  5. How does usage vary by time of day, in general? The search service divides each day into three parts for reporting purposes:
    1. Overnight: Midnight – 7:59 AM
    2. Day: 8:00 AM – 4:59 PM
    3. Evening: 5:00 PM – 11:59 PM

    What is the breakdown of usage by day-part? (Again, look only at time at which connection is made; ignore connection time.)

  6. Which type of user (by top-level domain) is most likely to accept a cookie? Which is least likely?
  7. Which type of user is most likely to click through to an ad? Which is least likely?
  8. Is the banner ad more "attractive" than the other ads? (I.e., are users more likely to click on a banner ad than on another ad?)
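
The hour and day-part groupings asked about in Questions 4 and 5 can be derived directly from the nnnn start time. The following is a minimal Python sketch of that logic, offered only as an illustration alongside the Excel and SPSS work. The function names `hour_of` and `day_part` are hypothetical, and the start times are made up rather than taken from the data files.

```python
from collections import Counter


def hour_of(start):
    """Hour (0-23) from an nnnn start time such as 1730."""
    return start // 100


def day_part(start):
    """Map a start time to the reporting day-part from Question 5."""
    hour = hour_of(start)
    if hour < 8:
        return "Overnight"  # 0000-0759
    if hour < 17:
        return "Day"        # 0800-1659
    return "Evening"        # 1700-2359


# Made-up start times, not taken from web_data.
sample_starts = [15, 759, 800, 1200, 1659, 1700, 2359]

# Count hits per day-part and report each as a share of all hits.
parts = Counter(day_part(s) for s in sample_starts)
total = len(sample_starts)
for name in ("Overnight", "Day", "Evening"):
    print(f"{name}: {parts[name]} hits, {100 * parts[name] / total:.1f}%")

# Count hits per clock hour for the detailed view.
hours = Counter(hour_of(s) for s in sample_starts)
print("Hits by hour:", sorted(hours.items()))
```

Integer division by 100 works because the last two digits of a military time are always minutes, so the boundaries 0759/0800 and 1659/1700 fall cleanly between hours.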

ADDITIONAL QUESTIONS:

  1. For which variables are the averages meaningful? Briefly, why?
  2. Why might you expect the mean of variable #2 to differ from its median?
  3. What additional information would you need to be able to reasonably predict the total usage of this site during the month from which the data were collected? (Think about what you have learned about statistics and sampling. Remember that we already know this was a truly random and representative sample.)
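
The measures of "average" and spread mentioned in these questions can be computed side by side. The following is a minimal Python sketch using the standard `statistics` module. The connection times are made up for illustration and are not the assignment data, and the variable names are hypothetical.

```python
import statistics

# Made-up connection times in minutes, not taken from web_data.
minutes = [1, 2, 2, 3, 4, 5, 12]

summary = {
    "mean": statistics.mean(minutes),
    "median": statistics.median(minutes),
    "mode": statistics.mode(minutes),
    # stdev() is the sample standard deviation (n - 1 denominator).
    "stdev": statistics.stdev(minutes),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Comparing the printed measures against one another is a quick way to see how a single large value affects each of them differently.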