Search Engine Usage
Mike Chen
Jennie Dal Busco
Kim Garrett
Anoop Sinha
October 31, 2000
For: IS 271
Instructor: Dr. Rashmi Sinha
Address inquiries to: Mike Chen (mikechen@eecs.berkeley.edu), Jennie Dal Busco (dalbusco@hotbot.com), Kim Garrett (kimg@sims.berkeley.edu), Anoop Sinha (aks@eecs.berkeley.edu)
School of Information Management and Systems, University of California, Berkeley, 102 South Hall, Berkeley, CA 94720
The purpose of this survey project is to further public research regarding Internet search engine usage. The goal of this survey is to determine:
We translated these research questions into a web-based survey that we deployed using the NetRaker system. We specifically targeted information technology savvy individuals in distributing the survey. The survey included:
Our results showed:
Web search engines are essential to web users because they provide an easy way to locate information within the overwhelming amount available online. A useful search engine saves users a lot of time and effort. Many search engines are now part of portals, where they serve to drive users to the portal's site, so having a popular search engine is valuable.
There is a distinct lack of quantitative research reviewing the search engine industry at large. A quick count of the major search engines shows that there are many more than 30 of them; however, a handful of them take up the lion's share of the market. It is not well known what features users look for when choosing search engines. The only research of this kind comes from NPD, a marketing research company which conducts a survey on behalf of subscribing search engines every quarter. This study, the "Search and Portal Tracking Survey", is described by self-proclaimed search engine expert Danny Sullivan as "The most extensive survey of search engine user satisfaction available."
However, aside from Danny Sullivan's "Search Engine Watch" website and the press releases of companies who are happy with their results (e.g., Google and Iwon.com), this information is restricted to the search engines which have paid for the research, with more limited access granted to the press. In short, with only one survey reviewing the industry at large, and that survey covering only a very limited "group" of search engines, very little is known about the industry. Further, given the proprietary and confidential nature of the results, even less is known by the general public.
The crucial question, then, is this: how much do we know about what people are looking for in a search engine? Reviewing the NPD survey results, we find that users are not satisfactorily queried about the various features of search engines. Is email essential? Is the ability to structure a query in various ways important? Are contests key? None of this is known, nor can it be derived directly from the NPD results.
Further, what are the most popular search engines for search? Turning to other industry overviews, we find that from both MediaMetrix and Nielsen Media's "Net Ratings" (Top 10 Web Properties, September 2000) one can pull out the search engines and come up with a reasonable estimate of the "top" search engines. The question, however, remains: why do these search engines dominate? What are people looking for in a search engine? For a new entity looking to enter the search engine field: What features are essential to the success of a search engine? What features set a "favorite" search site apart from the rest? With an eye toward improving search engines, we examine users' preferences regarding search engine features.
This survey was exploratory in nature. The primary goal was to examine user preference
for search engine features, as well as the relationship between search engine
preference and feature preference.
To enable the comparison of respondents, the survey was primarily implemented
using multiple choice or check box answers, rather than open-ended or fill-in
answers. This also reduced the burden on participants and increased the "codability"
of the results. Initial pilots were performed on paper, while the final iteration
and ultimate deployment used a web-based method. The survey instrument
was designed in several iterations. Team members drafted the initial survey,
using our own experience and other search-related survey instruments and past
research results as sources. The survey was initially piloted with 3 participants,
on paper with a team member at hand to observe and answer questions. A subsequent
iteration was piloted online, to fine-tune the survey instrument as well as
test the web-based survey for problems. Pilot participants were not participants
in the full deployment, to avoid the possibility of introducing bias in the
results.
Both the subject matter of the survey (search engine usage) and its method
of deployment (web-based) required us to limit our sample to a population
that: (1) uses the Internet; (2) uses at least one search engine; and (3)
is accessible by email.
This project had access to very few resources, especially regarding the identification
and recruitment of participants. Participants were recruited by team members
from among a population of known individuals (colleagues, friends, family),
introducing the possibility of respondent bias. Thus, the generalizability
of the survey is low.
To avoid selection bias (beyond the preexisting personal relationship), we
attempted to recruit the entire population of personally-known, electronically-accessible
individuals (i.e., friends, colleagues, whose email address was known). The
sample can be characterized as relatively technically savvy, advanced users
of the Internet. Overall, this sample has been online longer, as well as more
frequently for longer periods of time, than the population at large, as we
will see when we examine the demographic data. As a result of the nature and
extent of the sampling method, these results can only be generalized to a
limited population.
Participants were contacted by email, with a link to the web-based survey.
(See the appendix for the recruitment/instructions text.) No compensation
or disclosure agreements were associated with participation in the survey,
and participants were assured of anonymity. Upon deployment, the survey instrument
was posted for 7 days before being closed to participants.
Of a sample of approximately 100-200 individuals contacted, 64 participants responded
(a 32-64% response rate, depending upon the actual sample size). The web-based survey was "hit"
86 times, and completed successfully 64 times. For the purposes of this analysis,
we assume each participant completed the survey only once.
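The response-rate range quoted above follows directly from the uncertainty in the sample size. The sketch below reproduces that arithmetic using only the figures reported in this section; the helper name `rate` and the variable names are ours, chosen for illustration.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


completed = 64                        # surveys completed successfully
hits = 86                             # times the survey page was "hit"
sample_low, sample_high = 100, 200    # approximate bounds on the sample size

# The smaller the assumed sample, the higher the response rate.
response_rate_high = rate(completed, sample_low)
response_rate_low = rate(completed, sample_high)

# Share of survey hits that ended in a completed survey.
completion_rate = rate(completed, hits)

print(f"Response rate: {response_rate_low:.0f}%-{response_rate_high:.0f}%")
print(f"Completed surveys per hit: {completion_rate:.1f}%")
```

The completion figure treats every hit as a distinct attempt, which the survey data cannot confirm, so it is best read as a rough lower bound.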
Demographic Distribution
The demographic distribution of the respondents was very limited. Overall,
respondents were primarily male between the ages of 19 and 36, who use the
Internet an average of 4-5 hours per day and report having been online for
five years or more.
[Figures: Distribution of gender among respondents; Age distribution among respondents]
An overwhelming majority of respondents report using the web for five years
or more. We chose the annual GVU study of Internet Usage as a reasonable measure
of the population for the purposes of comparison of our sample to the broader
population. Per the 10th GVU study[1],
only 37% of GVU respondents reported being on the Internet between 4 and 6
years, with nearly half the respondents reporting being online for less than
three years. Regarding hours of Internet use, approximately 34% of GVU respondents
reported using the Internet between 1-3 hours per day, and an additional 21%
reported using the Internet between 3-5 hours per day. Only 10% reported using
the Internet more than 5 hours per day. This contrasts sharply with our respondents,
of whom almost 36% report using the Internet more than five hours per day.
Even given the age of the GVU data (now more than 2 years old), and knowing
the source of the sample, this would tend to confirm that the sample is heavily
biased toward highly technical, experienced Internet users.
[Figures: Length of time using the web; Hours per day using the Internet]
Information seeking: usage of the Internet and search engines
The majority of respondents reported using the Internet to find information
every time or most of the time. On this aspect our sample is in keeping with
the GVU study, where 70% of respondents reported using the Internet to search
for information most of the time. The high likelihood that the Internet is
a first resort in information-seeking behavior is further evidence that catering
to users' feature preferences may result in both a more successful search
engine and more successful searches.
[Figures: Use of the Internet in ... (N=64, StdDev=.71); Use of search engines]
Which search engines have you used in the last year?
Search engines used in the last year
Search Engine | Count of Responses | % of Total Responses | % of Total Respondents*
About.com | 19 | 5.2 | 29.7
AltaVista | 45 | 12.3 | 70.3
AOL.com | 3 | 0.8 | 4.7
AskJeeves | 51 | 14.0 | 79.7
Cha-Cha | 3 | 0.8 | 4.7
DirectHit | 6 | 1.6 | 9.4
Dogpile | 8 | 2.2 | 12.5
Excite | 21 | 5.8 | 32.8
Go | 6 | 1.6 | 9.4
Go2Net | 5 | 1.4 | 7.8
Google | 58 | 15.9 | 90.6
GoTo | 4 | 1.1 | 6.3
HotBot | 20 | 5.5 | 31.3
Iwon.com | 1 | 0.3 | 1.6
LookSmart | 3 | 0.8 | 4.7
Lycos | 18 | 4.9 | 28.1
MSN | 8 | 2.2 | 12.5
Netscape | 13 | 3.6 | 20.3
NorthernLight | 9 | 2.5 | 14.1
Snap | 2 | 0.5 | 3.1
WebCrawler | 7 | 1.9 | 10.9
Yahoo | 55 | 15.1 | 85.9
Total responses | 365 | 100.0 | 570.3
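Because each respondent could select several engines, the table above reports two different percentages per row. The sketch below shows how both follow from the counts, using the 365 total responses and 64 respondents given in this report; the function names are ours, and only four rows are reproduced for brevity.

```python
N_RESPONDENTS = 64       # completed surveys
TOTAL_RESPONSES = 365    # sum of the "Count of Responses" column

# A few rows copied from the table above.
counts = {"Google": 58, "Yahoo": 55, "AskJeeves": 51, "AltaVista": 45}


def pct_of_responses(count: int) -> float:
    """Share of all check marks that went to this engine."""
    return round(100 * count / TOTAL_RESPONSES, 1)


def pct_of_respondents(count: int) -> float:
    """Share of respondents who selected this engine."""
    return round(100 * count / N_RESPONDENTS, 1)


for engine, count in counts.items():
    print(engine, pct_of_responses(count), pct_of_respondents(count))

# The respondent column sums to well over 100% because answers overlap:
# on average each respondent selected 365 / 64, or about 5.7, engines.
print(round(TOTAL_RESPONSES / N_RESPONDENTS, 1))
```

This is why the final row shows 570.3% of respondents: it is the average number of engines selected per respondent, expressed as a percentage.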
Search Engine Preference
One of our hypotheses was that, of the multitude of search engines available,
a small handful held the greatest "mindshare". This hypothesis was
confirmed by our data. We asked respondents to identify their primary search
engine (Q8: "Please select a primary web search engine, one that you
would say that you primarily use when looking for information:", with multiple
check box options). Of the 22 search engines listed in the survey, in addition
to an "Other/Fill in" option, a large majority of respondents (89%) settled
on one of three engines as their primary engine: AltaVista, Google, and Yahoo!.
[Figure: Primary Search Engine]
We sought to determine what factors played into the choice of a primary search
engine, including feature preference, habit/convenience, as well as how likely
users were to maintain a preference, once identified. We found similar aggregate
patterns among search engines used in the last year, favorite search engines,
and frequently used search engines. The distribution of most popular search
engines tracked across all these dimensions.
We were unsure about the (potentially confounding) role of habit or convenience
in primary search engine choice. The potential dissonance between frequently
used/primary engine and favorite engine was of particular concern. In piloting
the survey, we discovered a number of pilot participants who responded that
their primary search engine was not their favorite search engine, for a variety
of reasons (such as habit or convenience, or other technical constraints).
We chose to ask about both frequently used and favorite search engines to
determine whether or not a confounding variable, such as habit or convenience,
was acting on the relationship. We discovered there was a high positive correlation
(depending upon search engine, in the range of .701 to 1.0, significant at
the 0.01 level, 2-tailed) between a particular search engine being identified
as frequently used and also as favorite. (Correlational Matrix) This correlation
is also visible in the similar patterns of the
frequency distribution of the two questions.
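To make the correlation above concrete, the sketch below computes a Pearson coefficient between two yes/no indicators for a single engine: whether each respondent marked it as frequently used, and whether they marked it as a favorite. The eight responses are invented purely for illustration and are not survey data, and we do not know which statistical package the original analysis used.

```python
from math import sqrt


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance
    (for example, if every respondent gave the same answer).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data: 1 = respondent selected the engine, 0 = did not.
frequently_used = [1, 1, 1, 0, 0, 1, 0, 0]
favorite = [1, 1, 0, 0, 0, 1, 0, 0]

r = pearson(frequently_used, favorite)
print(f"r = {r:.3f}")
```

A coefficient of 1.0, the top of the reported range, would mean every respondent who marked an engine as frequently used also marked it as a favorite, and vice versa.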
[Table: Which search engines tend to be favorite?]
[Table: Which search engines are used most frequently? Q7: Please select the web search engines that you use most frequently (select up to 5):]
*Respondents were permitted to report up to 5 search engines.
We were interested in exploring which features attracted users to their primary search engine. We discovered that the following features play the most significant role in primary engine choice: ease of use, ease of finding the site, accuracy, reliability, and speed. None of these is surprising, but it is interesting that "easy to find" ranks relatively high compared to other features.
Q12: What features make you use your primary search engine more frequently than others (select up to five):
Features and Primary Engine Choice
Feature | Count of Responses | % of Total Responses | % of Total Respondents*
Site is easy to use | 47 | 19.0 | 77.0
Site has accurate search results | 44 | 17.8 | 72.1
Site has timely, up-to-date content and news | | 4.9 | 19.7
Site is well-organized | 14 | 5.7 | 23.0
Site is easy to get to | 21 | 8.5 | 34.4
Security and privacy | 3 | 1.2 | 4.9
I can personalize the site, tailoring it to my interests and needs | | 0.8 | 3.3
Site has a good reputation | 9 | 3.6 | 14.8
Site looks appealing | 7 | 2.8 | 11.5
Site allows me to communicate with other people (email, messaging, chat, postcards) | | 0.8 | 3.3
Site is fun | 4 | 1.6 | 6.6
Site is reliable | 28 | 11.3 | 45.9
Site allows me to search various sources (multiple search engines, newsgroups, etc.) | 6 | 2.4 | 9.8
Site allows me to look for information in various ways (Boolean, keywords, questions) | | 5.7 | 23.0
Site is fast | 34 | 13.8 | 55.7
Total responses | 247 | 100.0 | 404.9
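Four rows in the table above list percentages but no count. Assuming those percentages share the bases implied by the complete rows (247 total responses, and about 61 respondents answering, since 47 responses correspond to 77.0%), the missing counts can be estimated as sketched below. This is our inference from the published percentages, not a figure reported by the survey, and all names in the code are illustrative.

```python
TOTAL_RESPONSES = 247

# INFERRED base: 47 responses = 77.0% of respondents, so about 61 people.
N_ANSWERING = round(47 / 0.770)

# Counts that are present in the table.
known_counts = [47, 44, 14, 21, 3, 9, 7, 4, 28, 6, 34]

# Rows with no count: (% of total responses, % of total respondents).
rows_missing_count = {
    "timely, up-to-date content and news": (4.9, 19.7),
    "can personalize the site": (0.8, 3.3),
    "communicate with other people": (0.8, 3.3),
    "look for information in various ways": (5.7, 23.0),
}


def estimate_count(pct_responses: float, pct_respondents: float) -> tuple[int, int]:
    """Estimate a row's count from each percentage column independently."""
    from_responses = round(pct_responses * TOTAL_RESPONSES / 100)
    from_respondents = round(pct_respondents * N_ANSWERING / 100)
    return from_responses, from_respondents


estimates = {name: estimate_count(*pcts) for name, pcts in rows_missing_count.items()}
for name, (a, b) in estimates.items():
    print(f"{name}: {a} (from responses), {b} (from respondents)")

# Consistency check: known counts plus estimates should equal the total.
print(sum(known_counts) + sum(a for a, _ in estimates.values()))
```

When both columns give the same estimate and the estimates add up to the stated total, the inferred counts are at least internally consistent with the rest of the table.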
Prompted by our findings regarding the primary engine attribute "easy
to find", the question of how users get to their primary search engine
became intriguing. We hypothesized that users might rely on bookmarks or default
home pages to access their primary search engine. However, it was interesting
to discover that users tended to access their primary search engine by typing
in the URL, over these other methods. This has implications for the "naming"
(URL identification) of search engines. In other words: the more memorable,
shorter, and easier to type, the better.
We were also interested in the "stickiness" of a search engine:
once a user has chosen a primary search engine, how likely is he or she to
continue to use it? We discovered that a significant proportion of respondents
(X%) reported using their primary engine for two years or more. We also discovered
a small positive correlation between the choice of primary search engine,
and when users reported first using their primary engine (0.203, Pearson correlation).
This may be expected, given (1) the "stickiness" factor, and (2)
that search engines launch (and launch advertising campaigns) at different
times.
It is interesting to note the differences in reported "first use"
among the three top primary engine subgroups: users of Yahoo!, Google, and
AltaVista. Users who reported AltaVista as their primary
engine tended to have first used the engine more than two years ago, indicating a certain
"stickiness", while primary users of Yahoo! reported first using
the engine between 1-3 years ago. Meanwhile, users who reported Google as
their primary search engine are shifted significantly toward more recent
adoption, reporting first use between 0-1 years ago. This shift in adoption
is not necessarily due to Google's relatively recent inception (Google was founded
in 1998), but may rather be a result of a marketing campaign or other 'buzz'
later in the timeline.
Stickiness by Primary Engine Subgroups
Again, in an attempt to ascertain how users perceive their primary search
engines versus other engines, and to further explore the possibility of confounding
variables, we asked users to compare their primary engine to others they’ve
used. The majority rated their primary search engine as much better than other
search engines. Overall, users rated their primary search engine as somewhat
better than other search engines they've used, which corresponds to the "stickiness"
of a primary search engine, once chosen.
We also explored primary search engine choice along other demographic
dimensions, including experience on the web, gender, and age, but discovered
no significant correlation.
Differences between primary engine groups
Questions 13-30 in our survey asked respondents to rate, on a 0-10 Likert scale, the importance
of different search engine aspects or features. The answers allowed us to
rank the features that our participants found most important. Furthermore,
we were able to divide our sample into subgroups based on primary search engine
and also gender, and get the rankings for each subgroup.
We learned that each of the different subgroups pulled from our sample has
generally similar rankings of priorities.
(The full data for the responses to these questions, including histograms,
is available at: http://www.cs.berkeley.edu/~aks/Sims271/AnoopOutput/ANOOP_OUTPUT.HTM)
The summaries of the different subgroups are below:
Top 5 Priorities of Different Primary Search Engine Users (N=64)
Feature | All Respondents: Mean | All Respondents: Std. Error | All Respondents: Std. Deviation | Google: Mean | Google: Std. Error | Google: Std. Deviation | Yahoo: Mean | Yahoo: Std. Error | Yahoo: Std. Deviation | AltaVista: Mean | AltaVista: Std. Error | AltaVista: Std. Deviation
Site has accurate search results | 9.58 | 0.0966 | 0.77 | 9.84 | 0.0652 | 0.37 | 9.33 | 0.24 | 1.03 | 9.29 | 0.42 | 1.11
Site is fast | 8.78 | 0.15 | 1.23 | 8.72 | 0.22 | 1.25 | 9.11 | 0.24 | 1.02 | 8.86 | 0.34 | 0.9
Site is reliable | 8.59 | 0.21 | 1.67 | 8.44 | 0.36 | 2.05 | 8.94 | 0.26 | 1.11 | 8.86 | 0.46 | 1.21
Site is easy to use | 7.5 | 0.31 | 2.51 | 7.19 | 0.49 | 2.75 | 8.39 | 0.44 | 1.88 | 8.71 | 0.52 | 1.38
Site is well-organized | 7.03 | 0.33 | 2.63 | 6.62 | 0.49 | 2.78 | 7.94 | 0.5 | 2.13 | 8.29 | 0.52 | 1.38
Number of Respondents | 64 | | | 32 | | | 18 | | | 7 | |
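The standard error figures in the table above are consistent with the usual relationship between standard deviation and subgroup size, SE = SD / sqrt(N). The sketch below checks this for the "Site has accurate search results" row; the small differences come from the standard deviations being rounded to two decimals in the table. The names used are ours.

```python
from math import sqrt


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean for a sample of size n."""
    return std_dev / sqrt(n)


# (std. deviation, number of respondents, std. error as printed in the table)
accuracy_row = {
    "All respondents": (0.77, 64, 0.0966),
    "Google": (0.37, 32, 0.0652),
    "Yahoo": (1.03, 18, 0.24),
    "AltaVista": (1.11, 7, 0.42),
}

for group, (sd, n, reported) in accuracy_row.items():
    computed = standard_error(sd, n)
    print(f"{group}: computed {computed:.4f}, reported {reported}")
```

Note how the AltaVista subgroup, with only 7 respondents, has a much larger standard error than its standard deviation alone would suggest, so its means should be read with particular caution.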
Respondents in all three of the top primary engine groups gave the same #1 and #2 priorities. The top three priorities
have low standard deviations, showing agreement among participants.
In total, users of these three search engines represent a large percentage of the total participants
(57/64 = 89%), and so these three groups largely define the overall rankings.
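The agreement on the top two priorities can be checked by ranking the features by mean rating within each subgroup. The sketch below does this with the subgroup means from the Top 5 table; the data structure and function name are ours, for illustration only.

```python
# Mean importance ratings (0-10) copied from the Top 5 table.
means = {
    "Site has accurate search results": {"Google": 9.84, "Yahoo": 9.33, "AltaVista": 9.29},
    "Site is fast": {"Google": 8.72, "Yahoo": 9.11, "AltaVista": 8.86},
    "Site is reliable": {"Google": 8.44, "Yahoo": 8.94, "AltaVista": 8.86},
    "Site is easy to use": {"Google": 7.19, "Yahoo": 8.39, "AltaVista": 8.71},
    "Site is well-organized": {"Google": 6.62, "Yahoo": 7.94, "AltaVista": 8.29},
}


def ranking(group: str) -> list[str]:
    """Features ordered from highest to lowest mean rating for one subgroup.

    sorted() is stable, so tied features keep their table order.
    """
    return sorted(means, key=lambda feature: means[feature][group], reverse=True)


for group in ("Google", "Yahoo", "AltaVista"):
    print(group, ranking(group)[:2])

# The 57/64 share of respondents covered by the three subgroups.
share = 100 * (32 + 18 + 7) / 64
print(f"{share:.0f}%")
```

For AltaVista users, "Site is fast" and "Site is reliable" are tied at 8.86, so the #2 position there depends on how ties are broken; here the table order is kept.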
Bottom 5 Priorities of Different Primary Search Engine Users
Feature | All Cases: Mean | All Cases: Std. Error | All Cases: Std. Deviation | Google Users: Mean | Google Users: Std. Error | Google Users: Std. Deviation | Yahoo Users: Mean | Yahoo Users: Std. Error | Yahoo Users: Std. Deviation | AltaVista Users: Mean | AltaVista Users: Std. Error | AltaVista Users: Std. Deviation
Site is fun | 1.91 | 0.3 | 2.39 | 1.75 | 0.44 | 2.5 | 2.11 | 0.6 | 2.56 | 1.57 | 0.72 | 1.9
Site allows me to communicate with other people (email, messaging, chat, postcards) | 1.5 | 0.31 | 2.51 | 0.81 | 0.23 | 1.31 | 2 | 0.49 | 2.06 | 0.57 | 0.43 | 1.13
Site makes me feel part of a community | 0.83 | 0.2 | 1.57 | 0.66 | 0.25 | 1.41 | 1.17 | 0.42 | 1.79 | 0.43 | 0.3 | 0.79
Site has contests or games I enjoy | 0.52 | 0.14 | 1.1 | 0.38 | 0.13 | 0.71 | 0.83 | 0.34 | 1.42 | 0.43 | 0.3 | 0.79
Site has prizes | 0.5 | 0.14 | 1.08 | 0.28 | 0.11 | 0.63 | 0.61 | 0.3 | 1.29 | 0.43 | 0.3 | 0.79
Number of respondents | 64 | | | 32 | | | 18 | | | 7 | |
Not surprisingly, the bottom 5 priorities of these three groups of search engine users include similar responses, with “Q21-Site has prizes”, “Q22-Site has contests or games I enjoy”, “Q19-Site makes me feel part of a community”, and “Q26-Site is fun” appearing for all three search engines. Lower-rated items also have lower standard deviations, showing general agreement about these ranks.
Top 5 Priorities of Different Genders
Feature | All Respondents: Mean | All Respondents: Std. Error | All Respondents: Std. Deviation | Male: Mean | Male: Std. Error | Male: Std. Deviation | Female: Mean | Female: Std. Error | Female: Std. Deviation
Site has accurate search results | 9.58 | 0.0966 | 0.77 | 9.5 | 0.13 | 0.85 | 9.71 | 0.13 | 0.62
Site is fast | 8.78 | 0.15 | 1.23 | 8.67 | 0.17 | 1.05 | 8.96 | 0.3 | 1.49
Site is reliable | 8.59 | 0.21 | 1.67 | 8.38 | 0.28 | 1.79 | 8.96 | 0.29 | 1.4
Site is easy to use | 7.5 | 0.31 | 2.51 | 7.07 | 0.34 | 2.14 | 8.54 | 0.43 | 2.08
Site is well-organized | 7.03 | 0.33 | 2.63 | 6.88 | 0.4 | 2.55 | 7.71 | 0.59 | 2.88
Number of respondents | 64 | | | 40 | | | 24 | |
In this exploratory survey, we sought to examine what users are seeking in
search engines, and which engines are most popular. As expected, we found
respondents cited a small handful of engines as primary, with three engines
(AltaVista, Google, and Yahoo!) holding th
[1] Available at http://www.gvu.gatech.edu/user_surveys/survey-1998-10/