Power Laws: What do the Top 100 Visited Sites of the Internet Reveal?

In his thesis “Through the Google Goggles: Sociopolitical Bias in Search Engine Design”, Alejandro M. Diaz writes:

“. . . after conducting a large-scale investigation, what Barabási instead found was that the distribution of links on the Web, rather than being “egalitarian” and “roughly equal,” actually follows a power law (i.e., Zipf or Pareto distribution). This means that a small number of pages—what are now called “hubs”—collect an enormous number of backlinks, while the vast majority of documents are linked to by few or no sites at all.”

Referencing Googlearchy, he adds that “. . . more, ‘almost all prominent sites are run by long-established interest groups, by government entities, by corporations, or by traditional media outlets.’”

Judging by the top 100 sites by traffic, according to estimates from Quantcast (see lists below), this claim doesn’t quite hold up. Just over a third of the sites in the top 100 don’t fall firmly into any of the last three categories. However, nearly half of the top 100 sites are corporate entities. Factoring in the close relationships between these companies suggests a large consolidation of corporate interest. Traditional media sites trail at 13%, and government-related sites barely make it onto the list at 2%.

The number of unique visits per month for Google is estimated at 155 million, while TypePad is at 18 million. In other words, the number 1 site probably gets around 9 times as many visitors as the number 100 site (155 / 18 ≈ 8.6). One thing is clear: traffic is heavily concentrated at the top, which is what a power law would predict.
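Two data points cannot establish a power law on their own, but they do pin down the exponent such a law would need. The sketch below is a rough illustration, not a fit: it assumes traffic follows visits ∝ rank^(-s) exactly, takes Google at rank 1 and TypePad at rank 100 as stated above, and solves for s. The function name `implied_zipf_exponent` is made up for this example.

```python
import math

def implied_zipf_exponent(top_value, bottom_value, top_rank, bottom_rank):
    """Exponent s for which value ∝ rank**(-s) passes through both points.

    Assumption: the distribution is an exact power law between the two ranks.
    """
    return math.log(top_value / bottom_value) / math.log(bottom_rank / top_rank)

# Quantcast estimates quoted in the text (unique visits per month).
google_visits = 155e6   # rank 1
typepad_visits = 18e6   # rank 100, as stated in the text

ratio = google_visits / typepad_visits
s = implied_zipf_exponent(google_visits, typepad_visits, top_rank=1, bottom_rank=100)

print(f"traffic ratio, rank 1 vs rank 100: {ratio:.1f}")
print(f"implied power-law exponent s: {s:.2f}")
```

Under these assumptions the exponent comes out a little under 0.5. A classic Zipf distribution (s = 1) would put the number 1 site at 100 times the traffic of the number 100 site, so the top of this list is flatter than that. Checking whether the full list really follows a power law would require the traffic figures for all 100 sites.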

Some questions and observations emerge from this review:
  • How much are default browser page settings responsible for tipping the stats associated with a site?
  • There are notable topic redundancies in the list. For example, no fewer than 3 sites appear to be entirely devoted to reporting weather.
  • Oreck, a vacuum cleaning products company, puzzlingly clocks in at number 37.
  • Wikipedia, operating solely on donations and grants, has a remarkable ranking at number 7, quite an accomplishment given the amount of capital behind every other site in the top 10.

Here's a breakdown by the three categories that Diaz cites above, followed by the sites that don't fit any of them.

Government Entities (2%)
city-data.com (some government-supplied data)
nih.gov

Corporations (49%)
Categorized as a publicly traded company or having a parent company that is a corporation.
google.com
facebook.com
youtube.com
yahoo.com
twitter.com
amazon.com
live.com
microsoft.com
ebay.com
ask.com
bing.com
huffingtonpost.com (AOL)
answers.com
linkedin.com
aol.com
adobe.com
reference.com (ask.com)
go.com (Disney)
paypal.com
comcast.net
walmart.com
mapquest.com
godaddy.com
oreck.com
match.com
manta.com
att.com
windows.com
photobucket.com (FOX)
flickr.com
target.com
myspace.com
apple.com
chase.com
cnet.com
wellsfargo.com
comcast.com
bankofamerica.com
inbox.com
hp.com
monster.com
usps.com
careerbuilder.com
mtv.com
fandango.com
jcpenney.com
norton.com
netflix.com
bestbuy.com

Traditional Media Outlets (13%)
msn.com
weather.com
foxnews.com
about.com (NYTimes Company)
whitepages.com
cnn.com
nytimes.com
tmz.com
yellowpages.com
people.com
merriam-webster.com
nydailynews.com
usmagazine.com

Uncategorized (36%)
Defined as non-profits, privately held companies, or sites that don’t precisely fit the established three categories above.
wikipedia.org
blogspot.com (owned by Google)
blogger.com (owned by Google)
ehow.com
wordpress.com
craigslist.org (privately held)
tumblr.com
pandora.com
imdb.com
dailymotion.com
wikia.com
yelp.com
webmd.com
legacy.com
vimeo.com
hubpages.com
metrolyrics.com
squidoo.com
reddit.com
grindtv.com
drudgereport.com
coolmath-games.com
evite.com
urbandictionary.com
wunderground.com
howstuffworks.com
chacha.com
bleacherreport.com
twitpic.com
deviantart.com
weatherbug.com
zimbio.com
cafemom.com
buycheapr.com