In January 2009, Matt Dabbs posted a list of the Top 25 Church of Christ blogs, based on the Alexa ranking of each site.
The following summer, we began working together to post an updated listing. We’ve worked together on the list ever since.
It became increasingly clear that Alexa doesn’t give reliable lists, and so in January 2010, Matt posted a list based on the average of Alexa rank and the number of Google Reader subscribers. It only makes sense that more RSS subscribers means the site is getting more traffic, but the outcome just didn’t seem right. Edward Fudge is a giant presence on the internet, and he was ranked 17. GraceConversation hadn’t had a post in 9 months and was ranked 19.
So we spent a few weeks looking for a better, fairer way to rank blogs. Eventually, we decided to go with the most solid data possible — the number of Page Views each site had in a given month. What measures traffic better than a direct count of traffic?
But there were a couple of problems. First, Al Maxey and Edward Fudge get most of their traffic via manual email. Page Views doesn’t measure the number of emails sent and read unless email is automated through WordPress or Feedburner. And there’s no way to know how many email subscribers actually read what they receive. We arbitrarily decided on a 67% read rate. (Edward and Al stay 1 and 2 even at a 1% read rate.)
Second, many readers now read through RSS (“Really Simple Syndication”) subscription, such as through Google Reader. The blogging software has no way to know how many people the RSS feed is distributed to.
Thus, we created the measure “Total Page Views” = Page Views in March (per WordPress or SiteMeter) + Posts in March X (Google Reader Subscribers + (0.67 X manual email readers)).
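For anyone who wants to run the numbers on their own blog, the measure can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name, the parameter names, and the sample figures are made up for the example, and the 0.67 default is the arbitrary read rate we chose.

```python
def total_page_views(page_views, posts, reader_subscribers,
                     manual_email_readers=0, read_rate=0.67):
    """Total Page Views for one month, per the formula in this post.

    page_views           -- monthly Page Views from WordPress or SiteMeter
    posts                -- number of posts published that month
    reader_subscribers   -- Google Reader subscriber count
    manual_email_readers -- size of a manually run email list (0 if none)
    read_rate            -- assumed share of emails actually read
    """
    per_post_audience = reader_subscribers + read_rate * manual_email_readers
    return page_views + posts * per_post_audience


# Hypothetical blog: 10,000 page views, 20 posts, 150 Google Reader
# subscribers, and a manual email list of 1,000 addresses.
example = total_page_views(
    page_views=10_000,
    posts=20,
    reader_subscribers=150,
    manual_email_readers=1_000,
)
```

Changing `read_rate` makes it easy to see how sensitive a blog's total is to the email assumption.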
And there’s a third problem. While WordPress bloggers have an excellent facility for counting Page Views, other blogging software requires you to install an add-on widget, such as SiteMeter. And so we posted notes asking readers to install SiteMeter in anticipation of the March count, and many did.
We were able to gather complete or partial data on over 40 sites. We eliminated sites without a substantial theological component.
One more note. WordPress blogs can distribute posts via Twitter, Facebook, and email, either through WordPress or Feedburner. Informal testing suggests that the WordPress Page View count includes all these feeds, other than the RSS feeds.
We posted the Top 25 list in a separate post. Here we consider the consequences of what we learned. And what we learned is that there’s not much out there that approximates Total Page Views.
Here’s a table of measures we tried and the resulting statistical correlation. “Correlation” compares how similar the two measures are. A correlation of 1.0 or -1.0 means the data sets are perfectly correlated. Correlations are considered statistically significant if they are strong enough to show a 95% confidence that the results did not happen by chance. The statistical confidence we have from any given correlation grows as the correlation (in this case, the Pearson R) approaches +/- 1.0.
The significance level (the p-value) is shown below in parentheses. Anything between .000 and .050 is statistically significant. The higher this number gets, the weaker the evidence of an actual correlation between two variables. For instance, Yahoo backlinks has a significance level of .772 with the Total variable. That means a correlation at least this strong would turn up about 77% of the time by chance alone, even if the two measures were completely unrelated.
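For readers who want to check a correlation for themselves, here is a minimal Python sketch of the Pearson R calculation, along with the t statistic commonly used to test whether an R is significant. It is an illustration only, not a record of the software used for the figures in this post; the function names and the sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of the same length, at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up scores for five sites on two different measures.
measure_a = [1, 2, 3, 4, 5]
measure_b = [2, 4, 5, 4, 5]

r = pearson_r(measure_a, measure_b)
t = t_statistic(r, len(measure_a))
```

With only five sites there are three degrees of freedom, and the t value has to exceed roughly 3.18 (two-tailed) to reach the 95% level. That is why a correlation that looks strong can still fail the significance test when few sites have data.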
Correlations for our Total Page Views measure with the online tools:
Page Views: .676 (sig = .000)
Alexa: -.132 (sig = .431)
Google Page Rank: .262 (sig = .132)
Google Reader Subscribers: -.073 (sig = .658)
Yahoo Backlinks: .053 (sig = .772)
Quantcast Unique Visitors: .902 (sig = .404)
Quantcast Rank: .873 (sig = .000)
Compete Rank: -.551 (sig = .141)
Now, the Quantcast Rank and Compete statistics had high correlations — but only 5 sites had Quantcast Ranking and only 10 had a score for Compete or Quantcast Unique Visitors. That makes them useless for our purposes until far more sites are indexed.
Interestingly, though, although Google Reader Subscribers (GRS) produces a near zero correlation, if we multiply by the number of posts for the month, we get a 0.988 correlation! That’s a remarkably good number — but once you get past the first few entries, the line isn’t smooth enough to allow for confident interpolation. That is, we can’t replace Total Page Views with GRS X Monthly Posts without shifting some bloggers far beyond their rightful placement.
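One way to see the placement problem is to rank the blogs by each measure and compare positions. The Python sketch below does that for two sets of scores. The helper names and the four sample blogs are hypothetical, and this is only an illustration of the idea, not the exact comparison we ran.

```python
def ranks(scores):
    """Map each blog name to its rank, where 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_shifts(reference, proxy):
    """Positions each blog moves when ranked by proxy instead of reference.

    A positive number means the blog falls that many places under the proxy.
    """
    reference_ranks = ranks(reference)
    proxy_ranks = ranks(proxy)
    return {name: proxy_ranks[name] - reference_ranks[name] for name in reference}


# Hypothetical scores: Total Page Views versus subscribers x monthly posts.
total_views = {"A": 9000, "B": 4000, "C": 1500, "D": 900}
subs_times_posts = {"A": 8000, "B": 1000, "C": 1200, "D": 3000}

shifts = rank_shifts(total_views, subs_times_posts)
worst_shift = max(abs(change) for change in shifts.values())
```

Two measures can correlate very highly overall, because the biggest sites dominate the calculation, while individual blogs further down still move several places.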
The Lotka Curve
In looking at the curve created by the distribution of Total Page Views, Jay realized that he’d seen that curve before, in Charles Murray’s book Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences, 800 B.C. to 1950. Murray had stumbled across the same curve when charting the impact of great scientists and artists on civilization. He later found the same curve to show up in baseball, golf, and other sports.
It’s called the Lotka Curve.
Alfred Lotka was an eminent biophysicist of the early 1900s. In the mid-1920s, he identified a pattern of publication in scientific journals. Roughly 60% of all people who publish in journals publish only one article. The percentage of people who publish two articles is much smaller, and the percentage of people who publish more articles falls rapidly as the number of articles increases. This creates an L-shaped graph of the function mapping number of publications onto the percentage of contributors at that level.
The formula for the curve varies with the data set, but this data set gives the classic formula: Total/Highest Total = 1 / (Rank squared). For example, for the third ranked blog, Total Page Views = Edward Fudge’s Total Page Views / 3², or one-ninth of the top total. Now, to the mathematically inclined, that’s just an astounding result.
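To make the rank-squared formula concrete, here is a small Python sketch that predicts a blog’s total from the top blog’s total and its rank. The function name and the top figure of 90,000 are made up for illustration. The exponent defaults to 2 because that is what fit this data set, but it is left adjustable since the formula varies with the data.

```python
def lotka_predicted_total(top_total, rank, exponent=2):
    """Total predicted for a given rank: top_total / rank ** exponent."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_total / rank ** exponent


# Hypothetical top blog with 90,000 Total Page Views.
top = 90_000
predicted = [lotka_predicted_total(top, rank) for rank in range(1, 6)]
third_place = lotka_predicted_total(top, 3)
```

The fall-off is steep: under this formula the second blog gets a quarter of the leader’s total and the fifth gets one twenty-fifth.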
The Lotka Curve has been the subject of considerable scholarly inquiry. You see, it’s not the bell curve you’d expect. But golf fans will understand. If you chart putts per hole, driving distance, or pars made by PGA golfers in a given year, you’ll get something very close to a bell curve. But if you chart career major victories, you’ll get a Lotka Curve. Lots of men have won one major. Very few have won two. And far fewer still have won 10.
Murray gives a detailed discussion in chapter 6 of his book. The gist of it is that the curve is so steep because the accomplishment being measured is so difficult. And the difficulty means that very few people can do something so very hard so many times. Moreover, success is a result of multiple factors that must be combined over several years.
When we see Edward Fudge and Al Maxey not only at the top of the heap, but in dominating positions, we have to recognize that not only do they write very well, but they’ve been writing very well for a very long time, and they’ve been writing very well, for a very long time, very consistently. And they were among the very first (along with the greatly missed, late Cecil Hook) to see the value of the internet in teaching grace to the Churches of Christ.
In short, it’s not easy to match Edward and Al. Edward has been posting since the 1990s. Al’s been publishing his Reflections since 2003. His debates go back at least 10 years. They’ve done great work for the Churches for a long time, and they richly deserve their high rankings.
Here are charts from Excel comparing Total Page Views, as we figured them, for each blog in rank order, with Alexa and Google Page Rank, the two most popular rating services. To make the numbers as comparable as possible, each set of numbers is expressed as a percentage of the maximum score in the range, except that Alexa is shown as 100% minus that percentage, because Alexa scores get lower as the ranking gets higher.
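The scaling used for the charts can be sketched in Python as follows. The function names and the sample numbers are invented for the example. The charts themselves were made in Excel, so this is just one way to reproduce the same arithmetic.

```python
def percent_of_max(values):
    """Express each value as a percentage of the largest value in the list."""
    top = max(values)
    return [100.0 * value / top for value in values]


def inverted_percent_of_max(values):
    """Same scaling, then flipped, for scores where lower means better."""
    return [100.0 - percent for percent in percent_of_max(values)]


# Hypothetical figures for three blogs.
total_views = [50_000, 25_000, 5_000]
alexa_scores = [200_000, 500_000, 1_000_000]

views_scaled = percent_of_max(total_views)
alexa_scaled = inverted_percent_of_max(alexa_scores)
```

Note that the flipped scale gives the worst Alexa score 0% and can never give the best one a full 100%, so the Alexa line is only roughly comparable to the others.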
Here’s a comparison with the Lotka Curve —
Alexa fared poorly. It didn’t even correlate with Total Page Views (R = -.286, sig = .113). Alexa is not good when compared to real data. The negative sign on the Alexa correlation is what we’d expect, however, because an Alexa number goes down as your rank improves. That is opposite the other measures. Alexa correlates best with Yahoo backlinks and also correlates significantly with Google Reader subscribers and Google Page Rank.
To be fair to the people at Alexa, their website notes that the accuracy of the ranking degrades as numbers get larger, and it’s clear from our data that the ranking has little value much beyond a score of 250,000.
Google Page Rank
Even more surprising than the Alexa results is how little correspondence there is between Google Page Rank and Total Page Views. The correlation is very poor for such a famous statistic.
Page Views is the same as Total Page Views except it excludes RSS and email readers. It has a high correlation with Total Page Views. Among the top 25 sites, no site would lose or gain more than one position if the RSS and email feeds were ignored. However, it may be that over time RSS reading of the blogs becomes more popular and so changes this outcome.
There’s a lot to criticize here. For example, no one knows how many email or RSS subscriptions via Google Reader are actually read. Google Reader is not the only RSS service. And being read and being influential are not necessarily the same thing.
Links to a blog would be a great indicator of how important someone’s work is — which is Google’s billion dollar theory. But links could be from work done 5 years ago. They don’t measure current influence. And Yahoo backlinks shows virtually no correlation at all, which is truly surprising.
Some Church of Christ forums have adopted a policy against clickable links, meaning their links to blogs don’t get measured by the services, and the forums are where a lot of influence occurs.
We are very confident that the top 25 are the top 25, and our confidence is higher toward the top of the list. Any method that doesn’t put Edward Fudge and Al Maxey at the top is fatally flawed.
None of the available online tools is nearly as accurate as Total Page Views, because they are unable to directly obtain the data that goes into that figure.
And so, which of the available online tools is the best gauge of influence? No online tool effectively measures influence or Total Page View traffic. Nor can you average several tools to get a good result (we tried).
This means we can’t update the list without obtaining the blog owners’ Page Views every few months. And it’s been a challenge to gather the data. Here’s how to make it easy for you and for us.
1. Whether you’re on WordPress, Blogger, or custom software, install SiteMeter. PLEASE. It’s free. We can then just go to your site, click the link, gather the data, and never have to bother you.
2. ClusterMaps is better than nothing, but it likely underreports by as much as 50%. SiteMeter is as accurate as a widget can be.
3. Seriously: change to WordPress. It gathers the data for you, allows you to distribute via Facebook and Twitter and email, and it’s free. You can take your old content and move it over.