Website Access Logs: Hit Counts, Website Traffic & More

Counting "hits" is a notoriously unreliable way to determine who's actually roaming around your site. Do you know who your site's visitors are?

You have your Web site up and running, and following all the advice given earlier in this space, it’s a smashing success. Wonderful. You’re getting more hits off your site than Michael Jackson got off “Thriller,” but something seems to be … well, are those hits really telling you how many actual people are seeing your site?

Counting “hits” is a notoriously unreliable way to determine who’s actually roaming around your site. Why? Dana Noonan in “Making Sense of Web Usage Statistics” says that such counts can record only who comes to the server to download a page. Thousands of users who access a Web page are actually pulling it from a local or even a browser cache, built specifically to ease traffic on servers. This screws up counts royally, since the server is never told that the page was accessed. But caches are such an efficient way of managing Net traffic that, in the words of one commentator, “without them, the Web would have crashed long ago.”

So saying your site had 10,000 “hits” yesterday does not mean that 10,000 breathing human beings decided to read through your site. Some would go so far as to say that hit counts are not only worthless, but pernicious in their deception. Jeff Goldberg of Britain’s Cranfield University, one of the harshest critics of Web stats, says they’re useful only for giving administrators a sense of the actual load on the server, and shouldn’t be used by anyone else. Salon’s Scott Rosenberg agrees: “Valuable in setting technical benchmarks for server performance, hit counts are useless as any kind of realistic measure of Web traffic.”

Explaining why hit counts are still taken seriously, Rosenberg says that “in the novel, hype-driven world of Web publishing, we need standards. Vast sums of money, both stock-generated ‘play money’ and real cash, are changing hands based on perceptions of how much traffic sites are garnering. So it’s no wonder people have begun to grab the nearest yardstick,” however bent it may be.

“You probably remember ebullient Web hucksters touting astronomical ‘hit counts’ circa 1994 and 1995,” Rosenberg says, “as in ‘We’re getting millions of hits a day!’ Gullible TV and newspaper reporters who knew nothing about the Web would say, ‘What do you mean by hits?’ and, since the answer was too technical, they’d often just equate hits with visitors — contributing to enormously inflated public notions of Web usage in those early days.”

Is there anything better? Many would say the best available analysis of who’s looking at your Web site comes from access logs. Dana Noonan has written what may well be regarded as one of the definitive, readily available online treatments of the matter; it can be found at [http://www.piperinfo.com/pl01/usage.html#logs].

Basic Vocabulary

First, let’s get the relevant terms straight: Perversely, a hit is both “the easiest to grasp and most deceptive element in a log file,” Noonan says. It’s nothing more than a record of the access of a page or file. “Hits have gotten a bad rap,” Noonan says, “mostly because of the way log analysis programs are misused. Just because each page or image file accessed is recorded as a hit in the log, doesn’t mean each page or file should be used by the log analysis program.”

Actually, a hit is “a number recorded in the log files generated by Web server programs, which send pages to your computer,” explains Rosenberg. “A ‘hit’ gets generated for every file a Web server sends out — and the typical Web page includes anywhere from a handful to dozens of separate files.”

Page views, which track only the number of HTML files a site sent out, are more closely correlated to the number of Web pages actually read, says Rosenberg. “Total page view counts for a day, week or month do tell you something about how much traffic a site has. But they don’t tell you much about how many visitors it has: 100,000 page views in a week could be 10 people each reading 10,000 pages, or 100,000 people each reading one page, or any variation in between.”
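To make the gap between hits and page views concrete, here is a minimal Python sketch. The list of requested paths is made up for illustration, and the rule that a "page" is anything ending in .html, .htm or a slash is an assumption; a real site would need its own rule.

```python
# Hypothetical sample: each entry is the path of one file the server sent out.
requested_paths = [
    "/index.html",
    "/images/logo.gif",
    "/images/banner.gif",
    "/style.css",
    "/about.html",
    "/images/logo.gif",
]

# Assumption: a "page" is any path ending in .html, .htm, or "/" (a directory).
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(paths)  # every file sent out counts as one hit
    page_views = sum(1 for p in paths if p.lower().endswith(PAGE_SUFFIXES))
    return hits, page_views


hits, page_views = count_hits_and_page_views(requested_paths)
print(f"{hits} hits, {page_views} page views")
```

Six files went out, but only two of them were pages, which is why a raw hit count overstates how much was actually read.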

So sites began tracking computers’ IP numbers, in an effort to get a sense of how many unique visitors, or actual individual people, judging by computer numbers, are visiting. This works up to a point, Rosenberg says: “If you hook up to the Net by dialing a modem into a service provider, odds are good you are assigned a ‘dynamic IP number,’ which changes each time you dial up, so you might show up as 30 different ‘unique visitors’ to a site you visited daily for a month.”
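The dynamic-IP problem Rosenberg describes can be sketched in a few lines of Python. The addresses and dates below are invented, and the comments marking which requests belong to the same dial-up user are assumptions the log itself could never confirm.

```python
# Hypothetical (ip, date) pairs pulled from an access log.
requests = [
    ("203.0.113.5", "01/Mar"),
    ("203.0.113.5", "01/Mar"),
    ("203.0.113.77", "02/Mar"),   # assumed: same dial-up user, new dynamic address
    ("198.51.100.9", "02/Mar"),
    ("203.0.113.140", "03/Mar"),  # assumed: same dial-up user again
]


def count_unique_ips(records):
    """Count distinct IP numbers, the usual stand-in for 'unique visitors'."""
    return len({ip for ip, _date in records})


print(count_unique_ips(requests), "unique IP numbers")
```

The count comes out as four "unique visitors" even though, under the assumptions in the comments, only two people were involved.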

A log file is simply a record of all activity on a given Web site. The different types of log files — access, error, referrer and agent — are discussed below.

Cookies and tokens are unique session IDs used by some commercial log analysis programs to track user activities. Generally a browser will accept these unless told not to, and they’re mostly benign, allowing users to return to a site where they left off.

The Log Types

Traffic logs: Records of who visited a specific Internet site and what they did while they were there. Intended for system administrators.

Access logs: The most important log by far. Produced by all Web servers, they record visits to Web pages. Most Web server logs are kept in or can be converted to common log file format, allowing statistics programs to analyze Web site activity. Typically, common log entries include:

  • Remote host name or IP number.
  • User-logname: often not implemented and replaced by “-”.
  • Authenticated-user: replaced by “-” if not an authenticated request.
  • Date and time.
  • Request from client.
  • HTTP status code returned to client.
  • Number of bytes sent.

Not commonly included but possible to add:

  • URL of the page that linked to the page, if sent by the client.
  • Software making the request.
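As a rough illustration of how a statistics program might pull those fields out of a log entry, here is a Python sketch. The sample lines are invented, the field names (host, logname, user and so on) are labels chosen for this example, and the pattern is a simplification that real-world logs may well defeat.

```python
import re

# Assumed layout: the seven common fields, optionally followed by the
# quoted referring URL and the quoted client software.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means nothing was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Hypothetical entries, one with and one without the two optional fields.
sample = '192.0.2.10 - - [10/Oct/1999:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
extended = sample + ' "http://www.example.com/links.html" "Mozilla/4.08"'

print(parse_log_line(sample))
print(parse_log_line(extended)["referrer"])
```

When the optional fields are missing, the sketch simply reports them as None rather than guessing.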

Common HTTP status codes include the following:

  • 200 – OK, successful transmission.
  • 302 – Redirection to new URL.
  • 304 – Use local copy from cache.
  • 4xx – Client error: syntax problem, unauthorized request, “not found.”
  • 5xx – The dreaded server error.
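A short Python sketch shows how the status codes listed above might be tallied. The category labels and the sample codes are made up for this example; anything outside the codes named above is lumped under "other" as a simplifying assumption.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to one of the categories named in the list above."""
    if code == 200:
        return "ok"
    if code == 302:
        return "redirect"
    if code == 304:
        return "cached copy"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"  # assumption: codes not listed above are not broken out


def tally_statuses(codes):
    """Count how many log entries fall into each category."""
    return Counter(status_class(code) for code in codes)


# Hypothetical status codes taken from seven log entries.
sample_codes = [200, 200, 304, 404, 500, 302, 403]
print(tally_statuses(sample_codes))
```

A rising count under "client error" or "server error" is the kind of signal an administrator would want to chase down.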

Error logs: Frequently ignored, but an important tool for Web management. The error log “tells the story of frustrated users,” Noonan says. Sometimes included under the access log.

Referrer logs: List the site that a user came from before accessing a particular page. They can reveal sites linked to your pages, but can also be easily fooled. Many commercial Web tracking services and software depend on the referrer log or something like it to generate “intelligence” of user activities for marketing and ad placement.
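For a sense of what a referrer report involves, here is a Python sketch that counts which outside sites sent visitors. The URLs and the host name www.mysite.example are placeholders, and since the referring URL is supplied by the visitor's software, the totals should be read as hints rather than facts.

```python
from collections import Counter
from urllib.parse import urlparse


def count_referring_sites(referrers, own_host):
    """Count referring host names, skipping blanks and links from your own site."""
    counts = Counter()
    for ref in referrers:
        host = urlparse(ref).hostname  # None for "-" or an empty value
        if host is None or host == own_host:
            continue
        counts[host] += 1
    return counts


# Hypothetical referrer values from five log entries.
sample_referrers = [
    "http://search.example.com/q?x=1",
    "-",
    "http://www.mysite.example/index.html",
    "http://search.example.com/other",
    "http://links.example.org/",
]

print(count_referring_sites(sample_referrers, "www.mysite.example"))
```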

Agent logs: Record the type of browser or client software used to access Web pages on a particular host. Again, they can be easily gulled and are not to be trusted.

Interlude — Counters

A quick word about counters: Popular as they are among those who don’t have access to decent log analysis programs, they not only fail to keep an accurate count of how many people “hit” your Web site, they may also bypass normal caching precautions, leading to frequent site crashes and delayed loading. Goldberg sums up the usefulness of those “You Are the XXXth Person to Visit My Site” counters by saying, “There are basically two ways to put them in your page: the wrong way and the very wrong way. The wrong way merely doesn’t work and will not be more useful than normal statistics. The very wrong way is counter-productive because it subverts the caching mechanism, which is not a good idea just to get statistics.”

The Usefulness of Access Logs

According to a paper recently prepared by the Online Computer Library Center, a nonprofit organization devoted to furthering the reliability of online information, “access logs track information such as a computer’s Internet Protocol address — or domain name, which pages a user accessed, and at what times. It is possible to tell how long a user visited a site and what path the user took through a site. One limitation on log analysis emerges when a user hits the ‘Back’ button on the browser to retrace steps. Access logs do not capture this action effectively, and thus the trail through a Web site is formed by clicking hyperlinks.”
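The kind of path and visit-length analysis described in that paper can be sketched in Python. The log entries are invented, and treating every request from one host as a single visit is a simplifying assumption; the timestamp format also assumes English month abbreviations.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "10/Oct/1999:13:55:36 -0700".
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Hypothetical (host, timestamp, page) records taken from an access log.
entries = [
    ("192.0.2.10", "10/Oct/1999:13:55:36 -0700", "/index.html"),
    ("192.0.2.10", "10/Oct/1999:13:56:10 -0700", "/products.html"),
    ("198.51.100.4", "10/Oct/1999:13:56:30 -0700", "/index.html"),
    ("192.0.2.10", "10/Oct/1999:13:59:36 -0700", "/contact.html"),
]


def summarize_visits(records):
    """Return {host: (pages in order requested, seconds from first to last request)}."""
    visits = defaultdict(list)
    for host, stamp, page in records:
        visits[host].append((datetime.strptime(stamp, TIME_FORMAT), page))
    summary = {}
    for host, events in visits.items():
        events.sort()  # order by time of request
        pages = [page for _when, page in events]
        seconds = (events[-1][0] - events[0][0]).total_seconds()
        summary[host] = (pages, seconds)
    return summary


for host, (pages, seconds) in summarize_visits(entries).items():
    print(host, " -> ".join(pages), f"({seconds:.0f} seconds)")
```

The elapsed time only runs from the first request to the last, so the time spent reading the final page never shows up, and any steps retraced with the "Back" button are missing from the path.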

Isn’t there any other way of getting good data on how popular your site actually is? In a word, no. Web “ratings services,” such as Media Metrix (which has already gobbled up competitor Relevant Knowledge), NetRatings and I/Pro are in their infancy and are still experimenting with methodologies; according to Rosenberg, any two will produce “wildly divergent ratings” for your site.

Access logs are good at revealing the number of hits vs. number of accesses, the number of United States hits vs. outside countries, and the paths of some users. Admittedly that’s not much to build a marketing strategy around, but it’s about the best you can get.
