Web-Scope(tm) Statistics




A Brief Note


Web-Scope(tm) statistics will soon be offered as a commercial service.

TLC Systems is a consulting firm specializing in systems architecture and design. Our clients are Fortune 500 companies, mostly in banking and brokerage. We began work on Web-Scope to showcase our design and problem-solving skills and to demonstrate our knowledge of web site and web server environments.

We wanted to show the sort of statistics a commercial site, such as one owned by a Fortune 500 company, would require.

Since then, Web-Scope has evolved into the most complete set of web statistics available, with over 40 different reports covering every aspect of web site and web server operation.

Web-Scope is the only package that delivers web statistics and performance monitoring in real time. Other statistics packages calculate their numbers once a day, once a week, or average out everything that happened since the access log was last restarted.

Web-Scope provides meaningful information in a concise and readable format. The information gathering and reporting facilities can be tailored to match the many different requirements that are evolving on the web.

TLC Systems now has the technology behind the premier information package for web sites and servers. This technology can be applied to monitor and report on any web site or web server environment.

TLC Systems will offer Web-Scope as a service. TLC will install, customize, and maintain it, and modify it to meet your individual requirements. TLC will handle all aspects of gathering, analyzing, and presenting the data, freeing you to handle the other aspects of your web site.

The programs can run locally on your server or remotely on ours. Remote operation has the advantage of minimal impact on your server environment.






Introduction


"There's information, there's misinformation, and then there's web statistics."
-- Vincent van Gui


In the next few years, web sites will begin to play an increasingly important role in businesses and in organizations. The information (not the numbers) about the site will become an important corporate resource. It will be used internally for forecasting and planning, for getting feedback and supporting customers and users, and for troubleshooting.

The information about the web site will be viewed as a database rather than a set of computer logs.

For instance, the marketing department will use the statistics and demographics to sell web-based media, the same way it markets other media.

Web statistics will also provide feedback on the effectiveness of advertising the site in other media and on changes to the site's design, and will help gather business-specific information from the site's visitors.

Information about the performance, errors, and possible problems will be needed to administer, manage, maintain, and troubleshoot web sites and web servers.


The Web-Scope(tm) package consists of three parts.

Data Capture
This portion builds additional logs that supplement the server's normal access and error logs. Web-Scope keeps as many as six of these extra logs, and their nature is determined by each customer's requirements. They gather visitor data and performance data for monitoring and troubleshooting the web site and the web server's hardware and software environment.

Data Analysis and Report Generation
This portion analyzes the data in the logs and extracts information into several different formats. Some of the analysis and report programs run every 15 minutes; the others run whenever their report page is requested.

Presentation
This portion displays the information in a web browser, as a spreadsheet (optionally as a graph), as a hard copy printout, or as text that can be moved into other client-based applications.


In addition to the reports currently displayed at the TLC Systems web site, there are six reports that are not shown because they contain data about the web server's environment.


We don't concentrate on GIFs, files, or "bytes transferred - by reversed subdomain", since this sort of data carries little meaning. Web-Scope statistics are visitor-based. That means we try to figure out things like:

How many people visited the site?

Where did they come from?

What sort of hardware and software are they using?

What did they look at?

How did they interact with the site?

Was there anything about the site that they liked or disliked?

What path did they take through the site?

Did they have any problems while visiting the site?

Did anyone try to cause harm to the site?

Are they bookmarking the site?

Are visitors coming to the site in a way that keeps them from being logged?

Who has links to the site, and to which pages?

Which search engines point to the site?

On search engines, which queries point to the site?

Has an external link to the site been removed or changed?

Has someone bootlegged part of this site, such as the images?

Can any of the statistics information be used to signal a trend?

Are there any problems with either the site or its logging mechanisms?

If the visitors are robots (Lycos, WebCrawler, etc.), what sort of information are they retrieving?


One of the key ideas of Web-Scope is the concept of seeing what's happening on the site -- right now. Most of the stats reports are recalculated every fifteen minutes. The other reports are interactive and are calculated at the moment you ask for them.

But we also try to give a sense of history -- showing figures for each hour or each day over a period of the last several days or weeks. This helps to spot trends.
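As a rough illustration of how hourly and daily history figures can be built, here is a minimal Python sketch. It assumes the timestamps come from the server's access log in the Common Log Format (for example `[29/Mar/1996:10:15:32 -0500]`); the function names are hypothetical and are not part of Web-Scope.

```python
from collections import Counter
from datetime import datetime


def parse_clf_time(stamp):
    """Parse a Common Log Format timestamp such as '[29/Mar/1996:10:15:32 -0500]'."""
    return datetime.strptime(stamp.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def hourly_counts(timestamps):
    """Count hits per hour, keyed by the start of the hour, in time order."""
    buckets = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(buckets.items()))


def daily_counts(timestamps):
    """Count hits per calendar day (in the log's own time zone), in date order."""
    buckets = Counter(ts.date() for ts in timestamps)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    stamps = ["[29/Mar/1996:10:15:32 -0500]", "[29/Mar/1996:10:59:00 -0500]",
              "[29/Mar/1996:11:01:07 -0500]", "[30/Mar/1996:09:00:00 -0500]"]
    times = [parse_clf_time(s) for s in stamps]
    for hour, count in hourly_counts(times).items():
        print(hour.strftime("%Y-%m-%d %H:00"), count)
    for day, count in daily_counts(times).items():
        print(day, count)
```

Keeping one count per hour and one per day, rather than a single running total, is what lets a report line up the last several days or weeks side by side.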


The statistics programs are parameterized, with a central parameter file.
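One way a central parameter file can work is sketched below in Python: each small program reads the same `name = value` file and falls back on defaults. The file syntax, the parameter names (`sample_size`, `refresh_minutes`, `log_dir`), and the helper functions are assumptions for illustration; the actual Web-Scope parameter file format is not described here.

```python
# Hypothetical defaults; the names are illustrative, not Web-Scope's actual parameters.
DEFAULTS = {"sample_size": "1000", "refresh_minutes": "15", "log_dir": "logs"}


def parse_params(text, defaults=DEFAULTS):
    """Parse 'name = value' lines; '#' starts a comment. Later lines override earlier ones."""
    params = dict(defaults)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        if "=" not in line:
            raise ValueError(f"parameter file line {lineno}: expected 'name = value'")
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def load_params(path):
    """Read the shared parameter file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_params(f.read())


if __name__ == "__main__":
    sample = """
    # central parameter file shared by every report program
    sample_size = 100
    log_dir = /usr/local/etc/httpd/logs   # server log directory
    """
    params = parse_params(sample)
    print(int(params["sample_size"]), int(params["refresh_minutes"]), params["log_dir"])
```

Because every program calls the same loader, changing one value (such as the sample size) changes it consistently across all the reports.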

Instead of one monolithic program, Web-Scope is a suite of small programs, each performing one function or preparing a report or set of reports. This allows Web-Scope to be tailored to each customer's requirements. The report information is presented by a set of programs that generate HTML, allowing the reports to be customized for viewing with a web browser.

We are designing an interface that will allow interactive queries against the logs, viewed as an indexed database. It will provide rapid access to any date or time period, and will allow the information to be rolled back to show an earlier date as if it were today.

Web-Scope's 'search' feature, which lets you quickly locate information across all the log files, is another example of a query function.
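A search across several log files can be as simple as a case-insensitive pattern match over each file in turn. The Python sketch below assumes plain-text logs; the function names are hypothetical, and it uses a regular expression where Web-Scope's own search may work differently.

```python
import re


def search_lines(logs, pattern):
    """Yield (log name, line number, line) for every line matching pattern.

    logs maps a log name to an iterable of text lines.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for name, lines in logs.items():
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                yield name, lineno, line.rstrip("\n")


def search_files(paths, pattern):
    """Search log files on disk; latin-1 decodes any byte, so odd entries cannot stop the scan."""
    for path in paths:
        with open(path, encoding="latin-1") as f:
            yield from search_lines({path: f}, pattern)
```

For example, `search_files(["access_log", "error_log"], r"/cgi-bin/")` (file names hypothetical) would list every request for, and every error from, a CGI program, tagged with the log and line it came from.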


Another unique feature of Web-Scope is the use of a fixed sample size.

The sample size is always a power of ten, which gives the percentage of each item without additional calculations. For instance, if an item has a count of 73 and the sample size is 1000, the percentage is 7.3%. This eliminates the clutter of extra numbers which carry little additional information.

A uniform sample size also guarantees consistency and makes it easier to spot trends or potential problems.
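The fixed-sample arithmetic can be shown in a few lines of Python. This sketch assumes the sample is simply the most recent `sample_size` log entries (how Web-Scope actually selects its sample is not stated here), requires at least 100 entries so that each count maps directly to a percentage, and uses hypothetical function names. Because the sample size is a power of ten, the percentage is the count with its decimal point shifted: in a sample of 1000, a count of 73 prints as 7.3%.

```python
from collections import Counter


def digits_of(sample_size):
    """Return k such that sample_size == 10 ** k; this sketch requires k >= 2."""
    k = len(str(sample_size)) - 1
    if sample_size != 10 ** k or k < 2:
        raise ValueError("sample size must be a power of ten, at least 100")
    return k


def sample_counts(entries, sample_size=1000):
    """Count items among the most recent sample_size entries, largest first."""
    digits_of(sample_size)  # validate the sample size
    if len(entries) < sample_size:
        raise ValueError("not enough entries for a full sample")
    return Counter(entries[-sample_size:]).most_common()


def format_percent(count, sample_size=1000):
    """Format count as a percentage; a sample of 10**k gives k - 2 decimal places."""
    decimals = digits_of(sample_size) - 2
    return f"{count * 100 / sample_size:.{decimals}f}%"


if __name__ == "__main__":
    browsers = ["older entry"] * 50 + ["Netscape"] * 730 + ["Mosaic"] * 200 + ["Lynx"] * 70
    for name, count in sample_counts(browsers):
        print(f"{count:5d} {format_percent(count):>7} {name}")
```

Tying the number of decimal places to the sample size keeps the report from printing digits the sample cannot support.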

Web-Scope does not present its reports in the form of "Netscape tables." We have found them unsuitable for this application. We have also avoided the use of brightly colored, 3-D bargraphs. While attractive, most contain little if any information. (Web-Scope data is available in spreadsheet form, allowing users to create their own graphs. This gives more flexibility than the canned, fixed-format graphics presented by other stats programs.)

Web-Scope's reports are well-formatted and concise. We try to optimize the reports for viewing with a web browser or printing for hard copy. We use preformatted text, which allows a report to be selected and copied directly from a browser screen, and then pasted into another document or an email message.

We consolidate information. For instance, instead of dozens of numbers for AOL, CompuServe, or other large sites (one for each subdomain or proxy server), there will be one total for each major domain or service.
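Consolidation amounts to mapping each visitor's host name onto its major domain before counting. The Python sketch below keeps the last two labels of a host name and then applies a small lookup table; the table entries, example host names, and function names are hypothetical, and a real version would need extra rules for country domains such as `.co.uk`, where the registered name has three labels.

```python
from collections import Counter

# Hypothetical table mapping a registered domain to the service name shown in reports.
SERVICES = {"aol.com": "America Online", "compuserve.com": "CompuServe"}


def major_domain(host):
    """Reduce a host name to its last two labels; leave numeric IP addresses alone."""
    host = host.lower().rstrip(".")
    labels = host.split(".")
    if all(label.isdigit() for label in labels):
        return host  # unresolved numeric address: nothing to consolidate
    return ".".join(labels[-2:])


def consolidate(hosts, services=SERVICES):
    """Count visits per major domain or service instead of per proxy or subdomain."""
    totals = Counter()
    for host in hosts:
        domain = major_domain(host)
        totals[services.get(domain, domain)] += 1
    return totals
```

With this approach, every AOL proxy host collapses into a single "America Online" line, while unfamiliar domains still get their own two-label total.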

Web-Scope's reports can be generated in a format that allows them to be downloaded directly into a PC spreadsheet by simply clicking on a web page's link. You can write your own macros to do custom calculations on the data, plot a graph or chart, or load the data into your own database.
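A simple way to make a report downloadable is to emit it as comma-separated values (CSV), which PC spreadsheets can open. This Python sketch uses the standard `csv` module; the `text/csv` content type and the CGI-style header helper are assumptions about how the link might be served, not a description of Web-Scope's own mechanism.

```python
import csv
import io


def report_to_csv(header, rows):
    """Render a report as CSV text; fields containing commas or quotes are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def cgi_csv_response(csv_text, filename="report.csv"):
    """Hypothetical CGI output: headers that prompt the browser to save the file."""
    return ("Content-Type: text/csv\r\n"
            f'Content-Disposition: attachment; filename="{filename}"\r\n'
            "\r\n" + csv_text)


if __name__ == "__main__":
    rows = [("CompuServe", 41), ("America Online", 37), ("Smith, Jones & Co.", 3)]
    print(cgi_csv_response(report_to_csv(["Domain", "Visits"], rows)), end="")
```

Letting the `csv` module handle quoting means a domain or organization name that contains a comma still lands in a single spreadsheet cell.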

Web-Scope can also provide statistics for monitoring the performance and well-being of the web server itself. These help to point out problems and isolate their causes. The statistics can also be used to tune the server for maximum performance.

'Reliability' is a word you don't hear very often when discussing web servers. Our stats will help to achieve web service you can depend on.

TLC Systems is working on the technology for gathering statistics about the next generation of web sites: secure servers that require authorization, sites that handle financial transactions, sites connected to other corporate resources such as databases, sites where some of the processing is done on the web server itself, and so forth. These sites will need a new generation of statistics and monitoring tools, designed to answer a whole new set of questions.

Sites that provide user interaction, such as query facilities, will also need special logging and reports.

Other factors to consider are the size of the site and its volume of traffic. Both of these will affect the methods for gathering and analyzing the statistical data. There also has to be a procedure for archiving the logs so that they can still be accessed as part of the database.

We are also working on web robots -- client programs that can do things like testing response time and availability, load testing, gathering and distributing information, checking the validity of the links on your web pages, and many other tasks.

Another area that we are working on is a statistics package for FTP sites. This will be important in tracking software distribution, upgrades, literature distribution, etc.




Tony Karp, TLC Systems Corp 
 

Our web sites:
 Techno-Impressionist gallery: http://www.techno-impressionist.com
 TLC Systems:                  http://www.tlc-systems.com

Last modified Mar 29, 1996