Log File Analysis

Every visit to a web site is recorded in a server log file. The log file keeps track of each file that is requested including web pages, images, cascading style sheets, favicons, robot exclusion files and even files that are requested that do not exist. The log file isn’t accessible to web site visitors but can be downloaded by the administrator and analysed.

Statistical analysis of the web server log file using software such as Webalizer and Analog is used to examine the web site visitor traffic patterns by date or referer. It can compare the popularity of pages and illustrate trends. A webmaster can interpret the web site statistics to understand visitor behaviour and work out where to make changes to a web site, such as prioritising popular pages for development.

The web server log file will usually contain a record of the requesting IP address, request date and time, page or file requested, HTTP status code, bytes served, user agent and referer (the misspelling of the word referrer made it into the official specifications and has therefore become the industry standard).
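As a rough illustration of the fields listed above, the following sketch parses a single log line. It assumes the widely used "combined" log layout; a server may be configured to write a different format, and the sample line, IP address and user agent are invented for the example.

```python
import re

# Assumed layout: the common "combined" log format. Real servers may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '          # requesting IP address, identity, user
    r'\[(?P<timestamp>[^\]]+)\] '    # request date and time
    r'"(?P<request>[^"]*)" '         # method, page or file requested, protocol
    r'(?P<status>\d{3}) '            # HTTP status code
    r'(?P<bytes>\d+|-) '             # bytes served ("-" when nothing was sent)
    r'"(?P<referer>[^"]*)" '         # referer
    r'"(?P<user_agent>[^"]*)"'       # user agent
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

# Invented sample line for illustration only.
sample = (
    '192.0.2.1 - - [13/Oct/2006:10:15:32 +0100] '
    '"GET /logfile.php HTTP/1.1" 200 5120 '
    '"http://www.google.co.uk/search?q=quotes+for+web+design" '
    '"Mozilla/5.0 (example)"'
)
record = parse_line(sample)
print(record["ip"], record["status"], record["request"])
```

Lines that do not fit the assumed layout return None rather than raising an error, so a malformed entry does not stop the rest of the file from being read.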

The analysis of a log file provides an insight into the activity on a web site but is subject to many caveats. At a basic level you can count the number of times a page has been requested in a specific period and see whether this is more or less than in a previous period. However, some Internet Service Providers make extensive use of proxy caches and re-serve the cached file to their customers. One page request recorded by the server may therefore represent twenty or thirty requests from visitors, resulting in under-reporting. Search engines also request the page so that it can be indexed, and may revisit it time and time again to see whether it has been updated, resulting in over-reporting. Smarter search engines, however, will check the file header information before downloading and avoid copying a file that hasn’t been changed.
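The basic period-on-period count described above might be sketched as follows. The records are hypothetical and already parsed, and the crawler check is a deliberately crude assumption (a user agent containing "bot"); it does nothing to correct the under-reporting caused by proxy caches.

```python
from collections import Counter
from datetime import date

# Hypothetical, already-parsed records: (request date, path, user agent).
records = [
    (date(2006, 9, 3), "/index.php", "Mozilla/5.0"),
    (date(2006, 9, 17), "/logfile.php", "Mozilla/5.0"),
    (date(2006, 10, 2), "/logfile.php", "Mozilla/5.0"),
    (date(2006, 10, 5), "/logfile.php", "ExampleBot/1.0"),
    (date(2006, 10, 9), "/logfile.php", "Mozilla/5.0"),
]

def is_crawler(user_agent):
    # Crude heuristic assumed for illustration; real crawlers vary widely.
    return "bot" in user_agent.lower()

def count_requests(records, start, end):
    """Count requests per path between start and end (inclusive), skipping crawlers."""
    counts = Counter()
    for day, path, agent in records:
        if start <= day <= end and not is_crawler(agent):
            counts[path] += 1
    return counts

september = count_requests(records, date(2006, 9, 1), date(2006, 9, 30))
october = count_requests(records, date(2006, 10, 1), date(2006, 10, 31))

# Positive means more requests in October than in September.
change = october["/logfile.php"] - september["/logfile.php"]
print(change)
```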

When visiting a web page, the referer or referring page is the URL of the previous web page where a link was followed. Proxies and firewalls can block this information, but where it is recorded it can provide other useful information such as the search terms used on a search engine. For example, a referer of http://www.google.co.uk/search?q=quotes+for+web+design tells us that the visitor came to the web site from google.co.uk and used ‘quotes for web design’ as the keyword phrase.
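A minimal sketch of pulling the search phrase out of a referer like the one above, using Python's standard urllib.parse module. The function name search_phrase is made up for this example, and the parameter name q is an assumption that matches the example URL; other search engines may use a different query parameter.

```python
from urllib.parse import urlparse, parse_qs

def search_phrase(referer, param="q"):
    """Return (host, phrase) for a referer URL; phrase is None if param is absent."""
    parsed = urlparse(referer)
    # parse_qs decodes "+" and percent-escapes and returns a list per parameter.
    values = parse_qs(parsed.query).get(param)
    return parsed.netloc, (values[0] if values else None)

host, phrase = search_phrase(
    "http://www.google.co.uk/search?q=quotes+for+web+design"
)
print(host)    # www.google.co.uk
print(phrase)  # quotes for web design
```

A blocked or missing referer simply yields no phrase, so such requests can be counted separately rather than treated as errors.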

Although server log files do not collect user-specific information, it is sometimes possible to identify users from the data. The data can be augmented by gathering other environment information using services such as Google Analytics. All data recording activity should be disclosed in the web site privacy policy.

Raw log files are also used on a forensic level to identify unusual server activity and technical issues. However, with so many caveats to the way the log file data is recorded, we usually recommend the use of Google Analytics to any customer that wants to find out how visitors are using their web site.

http://www.quotes.co.uk/logfile.php
Last updated: Friday, 13th October 2006

All trademarks and registered trademarks shown on our web site are acknowledged
and are the property of their respective owners. © Quotes 1996-2008