CONTENT DISPLAY MONITOR
RELATED APPLICATIONS
This patent arises from a divisional of U.S. patent application Ser. No. 09/490,495, which was filed on Jan. 25, 2000, and which is a continuation of U.S. patent application Ser. No. 08/707,279, now U.S. Pat. No. 6,108,637, which was filed on Sep. 3, 1996. Both U.S. patent application Ser. No. 09/490,495 and U.S. Pat. No. 6,108,637 are hereby incorporated herein by reference in their entirety.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to monitoring the display and observation of content by a computer system. The invention also relates to monitoring the display and observation at a content display site of content that is provided by a content provider site over a network to the content display site. The invention further relates to the provision of updated and/or tailored content from a content provider site to a content display site so that the content provider's current content is always displayed at the content display site.
2. Related Art
A large amount of human activity consists of the dissemination of information by information providers (content providers) to information consumers (observers). Recently, computer networks have become a very popular mechanism for accomplishing information dissemination. The use of computer networks for information dissemination has necessitated or enabled new techniques to accomplish particular functions related to the dissemination of information.
For example, information providers of all types have an interest in knowing the extent and nature of observation of the information that they disseminate. Information providers that disseminate information over computer networks also have this interest. However, the use of networked computers for information dissemination can make it difficult to ascertain who is observing the disseminated information and how, since information can be accessed rapidly from a remote location by any of a large number of possible observers whose identity is often not predictable beforehand, and since control over the display of the information once disseminated may not be possible, practical or desirable.
Among information providers, advertisers have particular interest in knowing how and to what extent their advertisements are displayed and/or observed, since such knowledge can be a key element in evaluating the effectiveness of their advertising and can also be the basis for payment for advertising. Mechanisms for obtaining such information have been developed for advertisements disseminated in conventional media, e.g., audiovisual media such as television and radio, and print media such as magazines and newspapers. For example, the well-known Nielsen television ratings enable advertisers to gauge the number of people that likely watched advertisements during a particular television program. As advertising over a computer network becomes more common, the importance of developing mechanisms for enabling advertisers to monitor the display and observation of their advertisements disseminated over a computer network increases.
Previous efforts to monitor the display of advertising (or other content) disseminated over a computer network have been inadequate for a variety of reasons, including the limited scope of the monitoring information obtained, the ambiguous nature of the monitoring information, the incompleteness of the monitoring information, and the susceptibility of the monitoring information to manipulation. Review of some of the techniques that have previously been used to acquire monitoring information regarding the display of content (e.g., advertising) disseminated over a particular computer network, namely the World Wide Web portion of the Internet computer network, will illustrate the deficiencies of existing techniques for monitoring the display of content disseminated over a computer network.
FIGS. 1A and 1B are simplified diagrams of a network illustrating operation of a previous system for monitoring requests for content over the World Wide Web. In FIGS. 1A and 1B, a content provider site 101 (which can be embodied by, for example, a server computer) can communicate with a content display site 102 (which can be embodied by, for example, a client computer) over the network communication line 103. The server computer at the content provider site 101 can store content colloquially referred to as a "Web page." The client computer at the content display site 102 executes a software program, called a browser, that enables selection and display of a variety of Web pages stored at different content provider sites. When an observer at the content display site 102 wishes to view a particular Web page, the observer causes the client computer at the content display site 102 to send a request to the appropriate server computer, e.g., the server computer at the content provider site 101, as shown in FIG. 1A. The server computers at content provider sites all include a software program (in the current implementation of the World Wide Web, this is an http daemon) that watches for such incoming communications. Upon receipt of the request, the server computer at the content provider site 101 transfers a file representing the Web page (which, in the current implementation of the World Wide Web, is an html file) to the client computer at the content display site 102, as shown in FIG. 1B. This file can itself reference other files (that may be stored on the server computer at the content provider site 101 and/or on other server computers) that are also transferred to the content display site 102. The browser can use the transferred files to generate a display of the Web page on the client computer at the content display site 102.
The http daemon, in addition to initiating the transfer of the appropriate file or files to the content display site 102, also makes a record of requests for files from the server computer on which the daemon resides. The record of such requests is stored on the server computer at the content provider site 101 in a file 104 that is often referred to as a "log file."
The exact structure and content of log files can vary somewhat from server computer to server computer. However, generally, log files include a list of transactions that each represent a single file request. Each transaction includes multiple fields, each of which is used to store a predefined type of information about the file request. One of the fields can be used to store an identification of the file requested. Additional fields can be used to store the IP (Internet Protocol) address of the client computer that requested the particular file, the type of browser that requested the file, a time stamp for the request (i.e., the date and time that the request was received by the server computer), the amount of time required to transfer the requested file to the client computer, and the size of the file transferred. Other information about file requests can also be stored in a log file.
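By way of illustration only, a transaction of the kind described above can be sketched in Python. The field names, the tab-separated layout, and the sample line are assumptions made for this sketch and are not taken from any particular server computer, since log file formats vary.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One log file entry, i.e., a record of a single file request."""
    requested_file: str      # identification of the file requested
    client_ip: str           # IP address of the requesting client computer
    browser: str             # type of browser that made the request
    timestamp: datetime      # date and time the server received the request
    transfer_seconds: float  # time required to transfer the file
    size_bytes: int          # size of the file transferred


def parse_transaction(line: str) -> Transaction:
    """Parse one line of a hypothetical tab-separated log file."""
    file_, ip, browser, stamp, seconds, size = line.rstrip("\n").split("\t")
    return Transaction(
        requested_file=file_,
        client_ip=ip,
        browser=browser,
        timestamp=datetime.fromisoformat(stamp),
        transfer_seconds=float(seconds),
        size_bytes=int(size),
    )


# Hypothetical sample line; every value is made up for illustration.
sample = "/index.html\t192.0.2.10\tExampleBrowser/1.0\t1996-09-03T12:00:00\t0.42\t2048"
record = parse_transaction(sample)
```

Each field of the record corresponds to one of the predefined types of information listed above; a real log file may omit some of these fields or add others.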
Previous methods for monitoring the display of content distributed over the World Wide Web have used the information stored in the log file. For example, one previous method has consisted of simply determining the number of transactions in the log file and counting each as a "hit" on a Web page, i.e., a request for a Web page. The number of hits is deemed to approximate the number of times that the Web page has been viewed and, therefore, the degree of exposure of the content of the Web page to information consumers.
There are a number of problems with this approach, however. For example, as indicated above, a request for a Web page may cause, in addition to the request for an initial html file, requests for other files that are necessary to generate the Web page. If these other files reside on the same server computer as the initial html file, additional transactions are recorded in the log file. Thus, a request for a single Web page can cause multiple transactions to be recorded in the log file. As can be appreciated, then, the number of times that a Web page is transferred to a content display site can be far less than the number of transactions recorded in the log file. Moreover, without further analysis, there is no way to accurately predict the relationship between the number of transactions in the log file and the number of times that a Web page has been transferred to the content display site. Such inaccuracy can be very important to, for example, advertisers, whose cost of advertising is often proportional to the measured exposure of the advertising, since the measured exposure of their advertising (and, thus, its cost) may be based upon the number of hits on a Web page containing their advertisement.
A method to overcome this problem has been used. By analyzing the contents of the log file to determine which file was requested in each transaction, it may be possible to differentiate transactions in which the initial html file needed to generate a Web page is requested from transactions in which the requested file is one which is itself requested by another file, thus enabling "redundant" transactions to be identified and eliminated from the hit count. While such an approach can increase the accuracy of counting Web page hits, it still suffers from several problems.
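The difference between counting every transaction and counting only requests for an initial html file can be sketched as follows. This is a minimal illustration, assuming that an initial Web page file can be recognized by its file name suffix; that suffix test, the function names, and the sample file names are assumptions of the sketch and do not describe any particular log analysis tool.

```python
from typing import Iterable

# Assumption for this sketch: initial Web page files end in one of these suffixes.
PAGE_SUFFIXES = (".html", ".htm")


def count_transactions(requested_files: Iterable[str]) -> int:
    """Naive hit count: every transaction in the log counts as one hit."""
    return sum(1 for _ in requested_files)


def count_page_requests(requested_files: Iterable[str]) -> int:
    """Filtered hit count: only requests for an initial page file count."""
    return sum(
        1 for name in requested_files if name.lower().endswith(PAGE_SUFFIXES)
    )


# Hypothetical log: two requests for one page, each pulling in image files.
log = ["/index.html", "/logo.gif", "/banner.gif", "/index.html", "/logo.gif"]
naive_hits = count_transactions(log)   # 5 transactions
page_hits = count_page_requests(log)   # 2 requests for the page itself
```

A suffix test is only a rough stand-in for the analysis described above, since an html file can itself be requested by another file; a more careful analysis would need additional information about which request caused which.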
For example, log file analysis may result in some undercounting of Web page hits, apart from any overcounting. This is because, once transferred to a client computer at a content display site, the files necessary to generate a Web page can be stored ("cached") on that client computer, thus enabling an observer at the content display site to view the Web page again without causing the client computer to make another request to the content provider server computer from which the Web page was initially retrieved. Consequently, the observer can view the Web page without causing transactions to be added to the log file, resulting in undercounting of the number of Web page hits.
Additionally, log files are subject to manipulation, either directly or indirectly. For example, an unscrupulous content provider could directly manipulate the log file by retrieving and editing the log file to add phony transactions, thus artificially increasing the number of Web page hits and making the Web page appear to be more popular than it really is. This problem can be ameliorated by causing the log files to be transferred periodically at predetermined times (e.g., each night at 12:00 midnight) from the server computer at the content provider site to a neutral network site; however, the log file can still be manipulated during the time between transfers.
A log file might be manipulated indirectly, for example, by programming one or more computers to continually request a Web page, thereby generating a large number of hits on that Web page. While the log file would contain transactions corresponding to actual file requests associated with the Web page, these requests would be artificial requests that would almost certainly not result in a display of the Web page, and certainly not in the observation of the Web page. Moreover, checking the contents of the log file for an unusually high number of requests from a particular IP address (i.e., client computer) may not enable such manipulation to be detected, since a large number of requests may legitimately come from a client computer that serves many users (for example, the proprietary network America Online™ has a handful of computers that are used by many users of that network to make connection to the Internet and World Wide Web).
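The check described above, and its ambiguity, can be sketched as follows. The function name, the threshold, and the sample addresses are hypothetical and are chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List


def flag_heavy_requesters(client_ips: Iterable[str], threshold: int) -> List[str]:
    """Return, in sorted order, the IP addresses with more than `threshold` requests."""
    counts = Counter(client_ips)
    return sorted(ip for ip, n in counts.items() if n > threshold)


# Hypothetical data: one address stands for a computer programmed to request a
# page repeatedly, another for a shared computer serving many legitimate users.
ips = ["192.0.2.1"] * 50 + ["192.0.2.2"] * 3 + ["198.51.100.7"] * 40
flagged = flag_heavy_requesters(ips, threshold=10)
```

Both heavily used addresses are flagged, and nothing in the request counts distinguishes the scripted requester from the shared computer, which is the limitation noted above.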
It may be possible to identify the real origin of requests for content using "cookies." A cookie enables assignment of a unique identifier to each computer from which requests really emanate by transferring the identifier to that computer with content transferred to that computer. Future requests for content carry this identifier with them. The identifier can be used, in particular, to aid in identification of indirect log file manipulation, as described above, and, more generally, to enable more robust log file analysis.
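A minimal sketch of this identifier scheme is shown below. The class and method names are hypothetical, and the use of a random identifier is an assumption of the sketch; it does not reproduce how any particular server computer or browser handles cookies.

```python
import uuid
from collections import Counter
from typing import Optional


class IdentifierRegistry:
    """Assigns an identifier to a client on its first request and counts
    later requests that carry that identifier."""

    def __init__(self) -> None:
        self.requests_by_id: Counter = Counter()

    def handle_request(self, cookie_id: Optional[str]) -> str:
        """Record one request and return the identifier to send back with
        the transferred content."""
        if cookie_id is None:
            # First request from this client: assign a new identifier.
            cookie_id = uuid.uuid4().hex
        self.requests_by_id[cookie_id] += 1
        return cookie_id


registry = IdentifierRegistry()
client_a = registry.handle_request(None)  # first request, identifier assigned
client_b = registry.handle_request(None)  # a different client, e.g., same IP address
for _ in range(4):
    registry.handle_request(client_a)     # later requests carry the identifier
```

Because requests are grouped by identifier rather than by IP address, two clients that share an address can be told apart. A client that does not return the identifier would appear as a new client on each request, so the sketch is an aid to analysis and not a complete safeguard.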
Notwithstanding such enhancement, cookies do not overcome a fundamental problem with the use and analysis of log files to ascertain information regarding the display of content provided over the World Wide Web. That is, as highlighted by the overcounting problem associated with the above-described artifice and the undercounting problem associated with caching of content at the content display site, log files only store information about file requests. A log file does not even indicate whether the requested file was actually transferred to the requesting client computer (though, typically, such file transfer would occur). Nor does a log file include any information about how the file was used once transferred to the requesting client computer. In particular, log files do not provide any information regarding whether the content represented by the requested file is actually displayed by the client computer at the content display site, much less information from which conclusions can be deduced regarding whether—and if so, how—the content was observed by an observer. These limitations associated with the content of a log file cannot be overcome by a monitoring approach based on log file analysis. Moreover, log file analysis is calculation intensive, requiring hours in some instances to extract the desired information from the log file.
Another method of monitoring the display of content disseminated over the World Wide Web uses an approach similar to that of the Nielsen ratings system used in monitoring television viewing. In this method, monitoring software is added to the browser implemented on the client computers of a selected number of defined observers (e.g., families) to enable acquisition of data regarding advertising exposure on those computers. This information is then used to project patterns over the general population.
However, this approach also has several disadvantages. First, only a limited amount of data is collected, i.e., data is only obtained regarding a small number of information consumers. As with any polling method, there is no guarantee that the data acquired can be extrapolated to the general population, even if the observers selected for monitoring are chosen carefully and according to accepted sampling practices. Second, as the size of the World Wide Web (or other computer network for which this method is used) grows, i.e., as the number of content provider sites increases, the number of monitored observers necessary to ensure accurate representation of the usage of all content provider sites must increase, since otherwise there may be few or no observer interactions with some content provider sites upon which to base projections. It may not be possible to find an adequate number of appropriate observers to participate in the monitoring process, particularly given concerns with the attendant intrusion into the privacy of the selected observers. Third, installation of the monitoring software on a client computer to be compatible with a browser presents a number of problems. Such