INFORMATION INDUSTRY ASSOCIATION
STATUTORY PROTECTION FOR DATABASES:
ECONOMIC & PUBLIC POLICY ISSUES
Laura D'Andrea Tyson and Edward F. Sherry 1
EXECUTIVE SUMMARY
1. Introduction
In 1991, in a landmark case (Feist Publications, Inc. vs. Rural Telephone Service Co.), the Supreme Court ruled that copyright protection did not extend to all or parts of databases that did not involve some original "creative" selection and/or organization of data.2 Indeed, the Court went further and ruled that such databases were not encompassed within the scope of the constitutional provision authorizing copyright protection. This sweeping decision eliminated the traditional "sweat of the brow" rationale for database protection that had been accorded under copyright law and left database producers in legal limbo in terms of their ability to protect themselves from unauthorized copying and dissemination of their products and from outright piracy.
Both scholars and participants in the database industry agree that the current situation is undesirable.
Because technology has expanded the potential applications of databases to myriad research, educational, medical, and business uses, the lack of adequate legal protections for the efforts of database providers poses a serious public policy challenge with widespread implications.
This paper presents the economic rationale for statutory protection of databases, building on the general economic concepts of private property rights. It argues that databases produced and disseminated by private producers require legal protection to ensure that they are provided in amounts and forms consistent with their market demand. At the same time, there is a valid public interest both in maintaining access to information among the scientific, educational and library communities and in preventing the potential abuse of market power by private database providers. The public policy challenge is to find the appropriate legal means to balance the interests of database producers-who are concerned that without adequate legal protection they will not be able to earn an adequate return on the substantial costs of developing and maintaining their information products-and database users-who are concerned that statutory protection will impede the flow of information by restricting its availability and raising its price. In the end, both producers and users are seeking to ensure that there is information available to support education, scientific progress, and economic growth. An appropriately crafted law providing statutory protection can meet this challenge to the benefit of both producers and users.
2. Economic and Technological Issues
Increasingly, the database market consists of electronic databases that allow users to combine software and information into powerful tools for research, educational, and commercial applications and for addressing major national challenges, such as finding cures for cancer and AIDS. Many of today's electronic products are developed along with sophisticated software for their use, and even when they rely on existing software packages, they are time-consuming and costly to produce and maintain because of rapid developments in software, computer and Internet technologies.
The database industry has grown rapidly, driven by new technologies. (Since 1979, the number of databases on the market has grown nearly 20-fold.)
Some observers argue that rapid growth of the database industry shows that statutory protection is either unnecessary or can be delayed without significant cost. This argument fails to consider how fast the industry might have grown with statutory protection, and overlooks the fact that growth in the industry has been partly based on the expectation that policy makers will act to provide such protection in the future. Furthermore, this argument ignores the potential impact of the recent European Community Directive on future growth of the database industry in the United States.
Electronic technologies have dramatically increased our ability to store, update and retrieve virtually limitless amounts of information. The paper describes several electronic databases that provide massive amounts of information about medicine, chemistry, agriculture and the military.
Electronic technologies have also enhanced the ability of users to copy and sell databases, thereby increasing the vulnerability of database producers to piracy.
At the same time, however, technology has enhanced the ability of database publishers to use "self-help" technical and contractual means to monitor the use and safeguard against the copying of their products. These means include encryption, passwords, on-line-only access to data, and contractual restrictions on further dissemination. In the absence of statutory protection, the industry has had to rely on such self-help approaches to protect their investments.
Without statutory protection, database producers can be expected to underprovide their products in easily-copyable formats (such as CD-ROM). This has two effects: consumers are made worse off because they are deprived of database format choices; and industry growth is slowed by the resources spent on self-help means to prevent copying.
The existence of technological and contractual means of protecting investments in the database industry does not obviate the need for statutory protection, just as fences or other methods of property protection do not obviate the need for legal protection against trespass-both work together.
2.1 The Changing Economics of Information Development
The absence of adequate property protection and the threat of low rates of return for database products will reduce the supply of reliable information produced by private companies.
From an economic point of view, all databases are costly to produce but easy to reproduce or copy. These two features of information generation and dissemination have some simple but profoundly important implications.
First, competing in the ever-expanding market to meet the modern demand for information is expensive. A considerable amount of time, money and effort is required to construct and maintain a database-information must be generated and compiled, verified for accuracy, searched for errors, organized for use and interoperability with other hardware and software products, and continuously updated over time. Innovations in Internet and computer technologies have made these last two features the most critical for today's users. Database providers must invest in their product to keep it functioning at state-of-the-art efficiency. Databases must be reconstructed to accommodate new conventions in computing power, new ways of "linking" to other electronic data (for example on the Web), and the exponential growth in the size of the data sets themselves.
All users-government, scientific, educational and commercial-have come to expect instant access to information. The Internet will continue to explode the stock of information available to all users at the relatively low cost of conveying it to them. But databases provide a different service. They organize, interpret and interrelate vast amounts of data according to an unlimited number of minutely tuned criteria that can be set and reset by the user. Information technologies, while expensive, have made possible research and discovery methods that were previously either unknown or impossible.
Second, to protect our common interest in identifying, creating and making available the best information, we must protect this valuable resource from pirating. Revolutions in electronic technologies that have made databases easier to use and more potentially useful have also made them easier to "pirate." The ability of a potential competitor (or customer) to "free ride" on the substantial investment of an original database developer by copying and selling (or re-selling) his database weakens market incentives for investment in the database industry.
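The free-rider problem described above can be illustrated with a minimal sketch. All figures are hypothetical and chosen only for illustration; the function names are assumptions, not drawn from any study cited here. The sketch compares the per-unit cost of an original developer, who bears the full cost of building the database, with that of a copier, who bears only a small copying cost.

```python
def average_cost(fixed_cost, marginal_cost, units):
    """Cost per unit sold: fixed cost spread over all units, plus marginal cost."""
    return fixed_cost / units + marginal_cost


# Hypothetical figures (assumptions for illustration only).
UNITS = 10_000               # copies sold by each firm
DEVELOPER_FIXED = 1_000_000  # cost of compiling and verifying the data
COPIER_FIXED = 10_000        # cost of copying the finished database
MARGINAL = 5                 # cost of distributing one more copy

developer_cost = average_cost(DEVELOPER_FIXED, MARGINAL, UNITS)
copier_cost = average_cost(COPIER_FIXED, MARGINAL, UNITS)


def developer_profit_at(price):
    """Developer's total profit if competition forces the given price."""
    return (price - developer_cost) * UNITS


print("developer break-even price:", developer_cost)
print("copier break-even price:", copier_cost)
print("developer profit at the copier's price:", developer_profit_at(copier_cost))
```

Under these assumed numbers the copier can break even at a price far below the developer's own break-even price, so a developer forced to match it cannot recover the original investment.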
Broadly defined, intellectual property rights-including patent protection, copyright protection and to some extent trade secrets-seek to strike a balance between encouraging widespread use of information and encouraging its development by giving its developer a limited right to set the terms of its use in order to try to recoup his investment with a reasonable rate of return. It is this kind of balanced protection that the Feist case eliminated.
As an economic matter, whether the government or the private sector should produce a particular kind of information depends on which method of production is most efficient or least costly in terms of society's resources.3
Sometimes government production will be the cheapest-for example, when the government is reporting on its own activities or when valuable information is a byproduct of government activities, such as law enforcement or space exploration.
Even when the government is the least-cost provider of a kind of information, it is a mistake to conclude that such information is free or should be made freely available. In the absence of explicit fees or charges for the production and dissemination of data, the government must pay for the costs associated with these activities through taxation.
For many kinds of information, however, there is a strong presumption that the private sector will be the low-cost provider. And there is also a strong presumption that private production and market signals will avoid wasting resources in generating information that has little or no value.
Concern about the potential for the exercise of market power by private producers of databases is a major motivation behind concerns about statutory database protection, but there is little empirical evidence that the exercise of market power is actually a significant problem, even in so-called niche markets.
2.2 Responsive Pricing
There is an incentive for private producers to try to extend the audience for their information products through a pricing strategy called price differentiation-that is, through charging different prices for the same or very similar information to different users or to users seeking different packages of information. However, database developers will not be able to extend the audience for their products through price differentiation unless copying and resale are controlled through adequate statutory protection.
Private producers of information typically will seek to recoup their investment costs by pricing their product above the cost of distributing that information to users. Such prices, while they may be necessary to ensure continued investment and innovation in the database industry, will deter some potential users. Concern that such pricing strategies might limit access to information by scientists, educators, and libraries has motivated opposition to statutory database protection.
However, information providers already widely employ the practice of price differentiation, recognizing that it expands their potential market, not only for products of immediate concern, but also for future products, as more users become accustomed to incorporating database services into their work or research.
Database users differ significantly not only in their ability to pay but also in their needs for accuracy, completeness, timeliness, support service, search engines, and ease of use. Database providers already offer different configurations of data and services at different prices, charging lower prices with fewer services to those with a low willingness or ability to pay and higher prices with more services to those with a higher willingness or ability to pay.
Non-governmental producers of databases that are used for scientific research in both commercial and academic settings often charge lower prices for the latter.
If copying and resale are not controlled by adequate statutory protection, they will undermine the ability of original database developers to use price differentiation. Both producers and consumers can benefit from price differentiation in the database industry, but piracy and copying will eliminate these benefits.
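A stylized numerical sketch may make the argument concrete. The segment sizes, willingness-to-pay figures, and fixed cost below are hypothetical assumptions, and distribution costs are taken to be zero for simplicity. The sketch compares revenue under a single price, under price differentiation, and under differentiation when uncontrolled resale lets every user buy at the lowest price.

```python
FIXED_COST = 12_000  # hypothetical up-front cost of building the database

# Hypothetical user segments: (name, number of users, willingness to pay)
SEGMENTS = [("commercial", 10, 1_000), ("academic", 30, 200)]


def uniform_revenue(price, segments=SEGMENTS):
    """Revenue when every user faces the same price."""
    buyers = sum(n for _, n, wtp in segments if wtp >= price)
    return price * buyers


def differentiated_revenue(segments=SEGMENTS):
    """Revenue when each segment is charged its own willingness to pay."""
    return sum(n * wtp for _, n, wtp in segments)


def revenue_with_resale(segments=SEGMENTS):
    """With uncontrolled resale, low-price buyers can resell to everyone,
    so differentiation collapses to the lowest segment price."""
    lowest = min(wtp for _, _, wtp in segments)
    return uniform_revenue(lowest, segments)


# The best single price is one of the segment willingness-to-pay values.
best_uniform = max(uniform_revenue(wtp) for _, _, wtp in SEGMENTS)

print("best single-price revenue:", best_uniform)
print("differentiated revenue:", differentiated_revenue())
print("revenue with uncontrolled resale:", revenue_with_resale())
print("fixed cost covered only with differentiation:",
      best_uniform < FIXED_COST <= differentiated_revenue())
```

In this assumed example no single price covers the fixed cost, while differentiated prices both cover it and serve all forty users; once resale is uncontrolled, revenue falls back to the low-price level and the database would not be produced.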
3. Concerns About Statutory Protection for Databases
There are strong economic arguments for statutory protection for databases. But concerns also have been voiced about possible negative effects of such protection. These concerns fall into two broad categories:
1. concerns that database producers, especially those of highly specialized goods with limited niche audiences, already have substantial market power, which will be enhanced by statutory protection to the detriment of consumers; and
2. concerns that such protection could reduce access to information by the scientific and educational communities, thereby slowing technological progress and economic growth.
3.1 Market Power
Economists have long recognized that the credible threat of a new market entrant is a powerful constraint on the ability of firms to exercise their potential market power.
Some skeptics of statutory protection for databases have concluded that "the market for commercially distributed databases is almost universally characterized by a distinct absence of competition" and that "the private database industry is largely characterized by niche marketers who supply and dominate specific market segments.4" These conclusions are not well supported by either economic logic or empirical evidence.
A large part of the database industry, including many of the most commercially significant databases, operates in an intensely competitive environment.
Even the existence of a small number of firms serving a particular market does not mean that those firms exercise significant market power.
1. Competing firms rarely supply the "same" database. Rather they compete on a range of fronts: selection and updating of data; convenience; search engine; ease of use; and price.
2. In a market economy, firms prosper by supplying what their customers want. When one sees a niche database market supplied by a single firm, that may be evidence that the firm is doing a good job serving the needs of the market at reasonable prices.
3. It is typically less costly for a new entrant to replicate an existing database than it was for its original producer to develop it in the first place, in which case the fixed costs of entry for the second firm are lower than they were for the first.
4. There are always opportunities for new entrants in niche markets as long as the underlying information contained in a particular database is available and can be replicated. If an investment in a particular database proves sufficiently attractive, firms will be encouraged to enter the market, and will be able to do so because they can turn to the original data or can enter into licensing agreements with the original compiler.
3.2 Competition and Data Replicability
In most cases a potential entrant can get data from the same sources as the original firm, in which case there is no public policy need to allow the new entrant to free ride on the original firm's investment.
The possibility of replicating the underlying data in a database is a key factor affecting the potential for market entry of new competitors. In fact, the underlying data in most databases is replicable.
Data generated by the government is usually made available to users in its raw form at or below its dissemination cost, and that data can be collected by classic "sweat-of-the-brow" effort.
Some databases rely on privately generated data that cannot be precisely replicated but for which comparable data would be available at comparable cost to competitors willing to make the necessary investment.5
Privately generated data that relies on proprietary information cannot be replicated by competitors. As a pragmatic matter, it is unlikely that these databases would be commercially sold; information of this sort is commonly closely held as a trade secret.
Historical data may not always be replicable. But if adequate protection is not provided for databases whose contents are not contemporaneously replicable, there will be little incentive to try to record the sorts of data that cannot be measured again-precisely the sorts of data that should be collected while it is still possible to do so.
Some privately generated data cannot be replicated at comparable cost by a competitor because the data is produced by a publicly sanctioned monopolist. Phone numbers, for example, are arbitrarily assigned identifying data, privately generated with no additional effort (the phone company assigns a number as a provision of service) in the course of operating a publicly sanctioned local-monopoly business. When data is generated by a government-created monopolist, it is not appropriate to allow the monopolist to control database products building on that data. These sorts of data, collected by a government-created or government-sanctioned monopolist as a provision of service, should be made available to other users.
Even when the underlying data in a database is not contemporaneously replicable, the basic case for adequate statutory protection remains the same: in the absence of such protection, there will be inadequate incentive to develop the original database in the first place.
3.3 Preferential Access
The question of whether particular categories of users should get "preferential access" to information contained in privately produced databases is not the same question as whether the producers of these databases should be afforded adequate statutory protection.
Access to data by the scientific and educational communities is vitally important, and effective statutory protection for databases can be drafted to respect these needs. Indeed, such protection will be beneficial to these communities, because it will provide the market incentives necessary to maintain healthy private investment in databases over time.
Even if as a matter of public policy, certain kinds of users should receive preferential access, there remains the public policy issue of how that access is best achieved. A subsidy financing the purchase of a necessary good or service by preferred categories of users is the most direct form for realizing this objective. Allowing them to take the amount they want of a good or service from those who supply it without paying for it is only one particular form of subsidy-a subsidy in kind that is financed in essence by the suppliers and the other paying customers they serve. Usually such an in-kind subsidy proves considerably less efficient than a direct subsidy from the government.
Those who use existing information at zero price can adversely affect the growth of information over time, both to their own detriment and to the detriment of paying users.
Inadequate funding can deprive such preferred users of the data they need to do their work, thereby depriving society of the benefits of technological progress and educational attainment. But in-kind subsidies taken from database providers by such users in the form of unauthorized copying for unauthorized purposes are not the appropriate remedy for addressing these valid concerns. To provide conditions for a healthy and competitive database industry that will serve the needs of all users, adequate statutory protection is required.
The need for adequate funding for science and education-to support the use of equipment, publications, software and data-is an important public policy challenge. But this challenge is logically distinct from the challenge of providing an appropriate environment for protecting the rights of private database producers.
Many of the concerns the scientific and educational communities have expressed in the debate about statutory database protection are in fact concerns about whether the government will continue to put adequate resources into the development and dissemination of those kinds of information for which it is the likely low-cost producer-information such as statistics, weather data, space exploration data, and court opinions-or whether it will "privatize" these activities in ways that will increase their cost to data users. Again, any debate over the privatization of information produced by the government is distinct from the debate about statutory protection for databases.
4. Legal Concerns about Appropriate Statutory Protection for Databases
Economic logic supports statutory protection for databases. How best can such protection be provided: How long should it last? What mode should it take? What should be its scope?
4.1 Duration of Protection
The shorter the period of protection, the greater the incentive of producers to set high prices to try to recoup their investment during the allotted time.
Under current US law, patents last 20 years from the date of filing; copyright protection lasts for the life of the author plus 50 years (or for 75 years from publication for works created by entities). The EC Database Directive protects databases for 15 years. HR 3531 proposed a 25-year term of protection.
From an economic perspective, it is difficult to determine how long protection should last. Ideally, one might want the length of protection (and/or the scope of protection) to vary from database to database, but such a system would be totally impracticable.
The shorter the period of protection, the greater the incentive of producers to try to set high prices to recoup their investment during the allotted time. Public policy makers should consider the likely relationship between duration of protection and firm pricing strategies in drafting statutory protection for databases.
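The relationship between duration and pricing can be sketched with simple break-even arithmetic. This is a deliberately simplified illustration: it assumes a fixed number of subscribers each year regardless of price, and the investment and subscriber figures are hypothetical. The 15-year and 25-year terms echo the EC Directive and HR 3531 terms mentioned above; the 5-year term is added only for contrast.

```python
def break_even_price(investment, subscribers, years, discount_rate=0.0):
    """Annual price per subscriber needed to recoup the investment
    within the protection period, with optional discounting."""
    # Present value of receiving 1 per year for `years` years.
    annuity = sum((1 + discount_rate) ** -t for t in range(1, years + 1))
    return investment / (subscribers * annuity)


INVESTMENT = 1_500_000  # hypothetical development cost
SUBSCRIBERS = 100       # hypothetical, assumed constant each year

for term in (5, 15, 25):
    price = break_even_price(INVESTMENT, SUBSCRIBERS, term)
    print(f"{term}-year term: break-even annual price = {price:,.0f}")
```

Under these assumptions, shortening the term from 25 to 15 years raises the break-even annual price, and shortening it to 5 years raises it much further. A positive discount_rate can be passed to reflect the time value of money, which raises all of the break-even prices.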
4.2 Updating and Protection
Database providers spend hundreds of millions of dollars a year updating their existing databases, and the newly-updated databases also need protection.
Some commentators have expressed concern that various proposals for statutory protection for databases will enable database providers to obtain "perpetual" protection for their databases merely by updating information contained in their products on a regular basis.
This argument is no more significant when applied to databases than it is when applied to updated copyrighted material (such as new editions of books or reference works). Many encyclopedias or other reference works are updated regularly. Each new edition is copyrighted, and the copyrights on old editions expire over time. This does not provide "perpetual" protection.
If statutory protection for databases were provided for 25 years, the 1997 edition of a database would become available for copying by competitors in 2022. If in 2022 there were a market for the 1997 edition, a potential competitor could make a copy of it and compete with its original developer.
Since the point of statutory protection is to protect investment in the creation, verification, maintenance and dissemination of information, such protection, when extended to updated products, in principle should apply to new content and to the additional investment required to verify, maintain and disseminate old content included in new editions. Once the protection on a database runs its course, users should be allowed to copy its contents.
4.3 Use vs. Replication: On Sufficient Statistics and Scientific Research
The claim that increased protection will impede the ability to use databases for scientific and academic research rests on a misunderstanding of the difference between use and copying.
Statutory proposals for database protection typically allow users to extract and copy small portions of a database, but seek to protect users or database developers against copying "all or substantial portions of" it. Some have suggested that this would improperly limit the ability of legitimate users to make use of the database. We disagree. Using a database is not the same thing as copying it.
Scientists and other researchers typically can use "all or substantial portions of" the data in a database. They formulate and test hypotheses, perform statistical analyses, and, in the case of some sophisticated databases, even input their findings directly into the database to be compiled with and referenced against the complete data set.6
Scientists and researchers may need to make and temporarily store (e.g., in RAM) an electronic "working copy" of the relevant data in a format useable by their statistics software package (or other analytic tool). Raw data is a tool, not a goal in itself; it needs to be used with other tools (such as search engines and statistics packages) in order to yield useful results. Absent the ability to manipulate and analyze the data, scientists would have little use for it in the first place. New legislation protecting databases can make it clear that such temporary "working copies" are lawful, so long as they are not used by unauthorized persons or in an unauthorized manner.
4.4 Different Forms of Legal Protection For Databases
Under current law, the main legal paradigm used to protect a database developer's interest in a database is a property-rule regime, as reflected in the common use of the term "intellectual property" to refer to patent and copyright law. But there are also elements of a tort-based liability-rule regime, notably in the law of trade secrets. One can use hybrid systems combining elements of tort and property.
4.5 Liability Rules vs. Other Forms Of Protection
The fundamental policy choice between a property rights approach and liability rules turns on whether negotiations among the parties are possible. When negotiations are not possible, liability rules have been developed to recompense owners for past infringement of their rights through court-awarded damages. When negotiations are possible, a potential user of property can negotiate with its owner about the terms and conditions for its future use.
Those contemplating copying or using someone else's database almost invariably know who developed the original database, and they are free to negotiate for the right to copy or use it.
Arm's-length negotiations will establish a market price for those rights. We believe it is both unwise and inefficient to substitute court-established damages figures for market-established prices. Doing so would establish the equivalent of a "private right of eminent domain," allowing others to take what they wish from a database without the owner's permission, subject only to paying court-awarded damages. The administrative burden imposed on the courts from trying to substitute court-awarded damages for privately negotiated prices would be severe.
4.6 Property Rights in Databases are Not "Exclusive" Rights to the Data
Giving a database developer a "property" right in a database does not "exclude" others from replicating the underlying data from original sources and using or selling their own version of a database product based on that information.
In this regard, patent protection is significantly different from proposals for database protection. While both are termed "property" rights, they differ significantly in the ability they give the owner to preclude others from independently developing the same or a similar product. Patent protection, in this sense, is much more exclusive in effect than are the proposals for database protection.
4.7 "Value-Added" Products
There is a well-established system by which those who seek to provide "value-added" products negotiate with the original authors and publishers for the right to do so. The fees to be paid to the original authors and publishers are set by negotiation.
Many "value added products" are complements to the original database. The customer may find the database easier to use, or more valuable, because she also has the complementary product. Complementary products, such as an index to a database, a better search engine, or a manual, increase demand. Database developers have an incentive to encourage the development of complementary products.
At the other extreme, some "value added products" are economic substitutes for the original database. This is especially likely to be the case if the substitute product contains a substantial amount of the contents of the original database. Consumers may or may not prefer the substitute product over the original database. Adequate statutory protection will allow the original developer to negotiate a license fee from a competitor for the right to develop a value-added product.
4.8 Compulsory Licensing
A flat compulsory licensing fee would allow competitors to "skim the cream" by just copying the most successful databases. "Compulsory licensing" proposals are a particular form of price control, this time over the "price" of access to the contents of a database.
Some critics of statutory database protection have urged that database providers be required to license their databases to "second comers" and those who wish to supply "value-added" products. For example, it has been proposed by Professors Reichman and Samuelson that others should be free to use all or part of a database on "payment of reasonable compensation according to a menu of user options vetted by the industry with user and government inputs."
Implementing any such system, and making sure that the "reasonable compensation" and the "menu of user options" keep up with technological and market changes, is likely to be a daunting and controversial task.
Economists have for years objected to the distortionary effects of price controls of all sorts. "Compulsory licensing" proposals are a particular form of price control, this time over the "price" of access to the contents of a database. The government is not well-placed to set such prices, for the same reasons that price controls are generally inefficient.
5. Conclusions
There are strong economic reasons for providing adequate statutory protection for the database industry. Such protection would serve the interests not only of database producers, but also of database users, including users in the scientific, educational and library communities. The creation, storage, verification, maintenance, updating and dissemination of information serve valuable economic and public policy functions-and they are not free whether they are performed by the government or private companies. Indeed, such activities often involve substantial upfront costs and considerable risk, since it is impossible to predict their actual value until the resulting information products are available for use. These costs and risks may be especially daunting for the development of highly specialized databases that are likely to have limited applications in the commercial arena, at least in the short run, and that therefore may have to rely initially on demand from a limited number of scientific and academic users with limited ability to pay.
Without effective statutory protection, private firms will be deterred from investing in database production. The resulting shortfall in the provision of information will have adverse effects on the pace of technological progress, on the economy's growth potential, and on the very research and educational communities whom critics of statutory protection wish to help.
------------------------------------------------------------------------
1 - Dr. Laura D'Andrea Tyson is the Class of 1939 Professor of Economics and Business Administration at the University of California at Berkeley. Dr. Edward Sherry is an attorney and Senior Economist with the Law and Economics Consulting Group, Inc. Research assistance and support on this project were provided by Alan Marco.
2 - It is important to emphasize that, for databases, idiosyncratic or creative selection or organization of data may be undesirable. For many business and medical users, for example, the most valuable databases are those that contain comprehensive, current information that is logically organized so as to be easy to navigate.
3 - Concerns about possible government censorship or control over information also bear on the question of whether the government or the private sector should provide certain kinds of information.
4 - J.H. Reichman and Pamela Samuelson, "Intellectual Property Rights in Data?," Vanderbilt Law Review, Vol. 50, No. 1, January 1997.
5 - For example, the A.C. Nielsen Company collects scanner data through exclusive contracts with supermarkets. It then sells its ScanTrack reports about pricing, sales volume, and market share based on that data to grocery manufacturers. Nielsen's competitors cannot precisely replicate Nielsen's data, but they can obtain "just as good" data through their own efforts.
6 - For example, this interactive function is a component of MDL Information Systems' ISIS database software.
STATUTORY PROTECTION FOR DATABASES:
ECONOMIC & PUBLIC POLICY ISSUES1
Laura D'Andrea Tyson and Edward F. Sherry 2
1. Introduction
In 1991, in a landmark case (Feist Publications, Inc. vs. Rural Telephone Service Co.3), the Supreme Court ruled that copyright protection did not extend to all or parts of databases that did not involve some original "creative" selection and/or organization of data.4 Indeed, the Court went further and ruled that such databases were not encompassed within the scope of the constitutional provision authorizing copyright protection. This sweeping decision eliminated the traditional "sweat of the brow" rationale for database protection that had been accorded under copyright law and left database producers in legal limbo in terms of their ability to protect themselves from unauthorized copying and dissemination of their products and from outright piracy.
Both scholars and participants in the database industry agree that the current situation is undesirable: in the words of Professors J.H. Reichman and Pamela Samuelson, two thoughtful but skeptical legal scholars who have written on database protection, "firms that make the contents of databases accessible to the public often become vulnerable to market-destructive appropriations that existing laws do not adequately remedy…The risk of market failure inherent in this state of chronic underprotection tends to keep the production of information goods at suboptimal levels.5" A similar conclusion has been posited by other legal scholars, such as Professor Paul Goldstein, who notes that data and databases get less protection from copyright than their producers need to support the expense of data collection and assembly.6
As technology expands the potential applications of databases to myriad research, educational, medical, and business uses, the lack of adequate legal protection for database providers poses a serious public policy challenge with widespread implications. This challenge has been further complicated by the decision of the European Community (EC) to issue a directive that provides both copyright and specific protection for the contents of databases. The EC Directive demands reciprocity, with a threat that US databases will not be afforded such protection in Europe unless the US adopts "similar" legislation by 1998. A first legislative proposal to address this challenge was introduced in Congress in 1996 (HR 3531), but it attracted criticism from segments of the scientific, educational, and library communities, as well as from some legal scholars. Recently, there has been renewed Congressional interest in providing some sort of statutory relief to database producers, and in doing so in a timely way, both to meet the needs of database producers and to respond to the European Directive.
This paper presents the economic rationale for statutory protection of databases, building on the general economic concepts of private property rights. It argues that databases produced and disseminated by private producers require legal protection to ensure that they are provided in amounts and forms consistent with their market demand. At the same time, there is a valid public interest both in maintaining access to information among the scientific and educational communities and in preventing the potential abuse of market power by private database providers. The public policy challenge is to find the appropriate legal means to balance the interests of database producers-who are concerned that without adequate legal protection they will not be able to justify incurring the substantial costs of developing and maintaining their information products-and database users-who are concerned that statutory protection will impede the flow of information by restricting its availability and raising its price. In the end, both producers and users are seeking to ensure that there is information available to support education, scientific progress, and economic growth. The final sections of this paper present some suggestions about the kinds of legal protection that might provide an appropriate balance of interests between users and producers to the benefit of both.
2. Economic and Technological Issues
Increasingly, the database market consists of electronic databases that allow users to combine software and information into powerful tools for research, educational, and commercial applications and for addressing major national challenges, such as finding cures for cancer and AIDS. Many of today's electronic products are developed along with sophisticated software for their use, and even when they rely on existing software packages, they are time-consuming and costly to produce and maintain because of rapid developments in software, computer and Internet technologies.
Although the actual wording of effective statutory protection for databases will require a formal definition of a database, for the purpose of understanding the economic rationale for such protection, a database can be defined simply as any organized collection of information. This definition is broad enough to include the vast array of databases that are part of today's economy, from text-based databases, such as databases of court opinions and statutory regulations, to numerical databases, such as databases of federal, state and local economic statistics, to sophisticated digital databases that combine information and software into complex systems for decision making. A database can be as simple as an index, an almanac or the listing of daily stock market prices at the back of the financial section of a newspaper. But, increasingly, the database market consists of electronic databases that allow users to combine software and information into powerful tools for research, educational, and commercial applications and for addressing major national challenges, such as finding cures for cancer and AIDS and maintaining the competitiveness of US agriculture.
Competing in the ever-expanding market to meet the modern demand for information has become expensive. A considerable amount of time, money and effort is required to construct and maintain a database-the need for the information must be identified; it must be generated and compiled, verified for accuracy, searched for errors, organized for use and interoperability with other hardware and software products, and continuously updated over time. Innovations in Internet and computer technologies have made these last two features critical for today's users. Database providers must vigorously invest in their products to keep them functioning at state-of-the-art efficiency. Databases must be reconstructed to accommodate new conventions in computing power, new ways of "linking" to other electronic data (for example on the Web), and the exponential growth in the size of the data sets themselves.
All users-government, scientific, educational and commercial-have come to expect instant access to relevant information. The Internet will continue to expand dramatically the stock of information available to all users at the relatively low cost of conveying it to them. But databases provide a different service. They organize, interpret and interrelate vast amounts of data according to minutely tuned criteria that can be set and reset by the user. Information technologies, while expensive, have made possible research and discovery methods that were previously either unknown or impossible.
A few examples illustrate the scope and complexity of today's electronic databases and the amazing applications they enable. PoisIndex is an index of approximately one million entries on a wide variety of poisonous substances, including drugs, chemicals, commercial and household products, and biologic substances. Substances are reviewed for entry into the database by a group of skilled medical professionals, who also scan the world's medical literature for pertinent data on toxic exposure and management. Approximately 200 actively practicing clinicians from over 20 countries participate in the editorial and selection process. Each substance entry in the database is linked with up to four full-text documents outlining clinical effects, range of toxicity, treatment measures, and other toxicological information. Software engineers are employed to maintain, test, produce and support the database and the software required to store, edit, sort and retrieve the data. The typical PoisIndex user is a medical professional, usually an emergency physician or poison center specialist, who needs instant access to such information in life-threatening circumstances.
The MDL Drug Data Report (MDDR), produced by MDL Information Systems in cooperation with Prous Science Publishers, is a database of approximately 85,000 chemical compounds with potential drug applications. It is updated on a monthly basis from a specialized search of published reports, patent applications and scientific papers so as to make data available on new biologically active compounds as soon as they are disclosed. MDDR tracks these compounds through stages of development and into clinical trials. Accompanying software permits researchers to analyze the effects of modifications of a drug compound's structure on its properties. Researchers can also combine their own internal and external results with the database supplied by MDL to develop their own specialized research tool.
Visible Human is a database product consisting of more than 10,000 images for exploring human anatomy. Color photographs, along with magnetic resonance and registered computed tomography images, provide new perspectives into the structure and function of the body. Several versions of Visible Human are available for use with different hardware and software support systems including Windows, Macintosh, and Unix. Relying on any one of these systems, the user can view the body using a graphical navigator, reference images using bookmarks, animate a series of images to gain insight into anatomical relationships, and annotate and highlight areas of interest with text, color and markers. This research and educational tool builds on a database of the human body developed by the National Library of Medicine. Until a private producer came along to develop Visible Human as a commercial venture, the underlying data was not readily available for use by either academic or other potential customers.
Derwent World Patents Index is a comprehensive database of more than seven million separate inventions culled from more than 13 million patent documents worldwide. Coverage includes patents of products from the pharmaceutical industry, agricultural and veterinary medicine, polymers and plastics, chemistry, electronics, and electrical and mechanical engineering. All patent information is presented in a uniform, user-friendly format consisting of a simplified English-language abstract explaining key technical details and highlighting applications. In addition to bibliographic information, technical drawings or diagrams are included as available. The Index is updated weekly with information from 40 patent-issuing authorities around the world, 1200 scientific journals, and papers presented at international conferences. Users of the Derwent Index include patent and information professionals, research scientists, engineers, universities, research institutes, libraries, and individual inventors and entrepreneurs.
Now in its centennial year, Jane's provides a variety of data on defense and security issues and in the civilian aerospace and transportation fields. Jane's is rapidly completing a transition from print forms of database information, such as yearbooks and reference books, to electronic databases available to users in various formats including hard copies, online service, and CD-ROM. Jane's data collection process involves a large number of correspondents, freelance contributors, authors and editors who seek out or contribute information from a variety of sources including text, photos, video, audiotapes, and interviews. Jane's databases often rely on highly specialized data collection efforts. For example, to prepare a 200-page special report on trends in land mine warfare, one researcher traveled to Bosnia to view the latest discoveries in anti-personnel and anti-tank mines. In addition to its information-gathering staff, Jane's employs a pool of experts to verify the accuracy of its database products. Users of Jane's database products include government officials, military experts, journalists, industry experts, and academics.
Electronic databases are also being developed and used to improve agricultural decision making. The National Agricultural Database Laboratory, run by the University of Wisconsin and supported by a variety of public and private funding sources, has a number of database products either already available or under development for the nation's farmers. These include a National Dairy Database, a National Pig Database, a National Sheep Database, a National Beef Database, and a National Poultry Database. The National Dairy Database, the first of the Laboratory's projects, consists of 600 peer-reviewed US dairy publications and related publications on water quality, crops, fertilizers, and waste management. The solicitation process for this database began as a call for materials to agricultural extension offices, although entries from a variety of channels, including end users, now have been suggested for inclusion. The database is developed and supported by existing hardware and software technologies and maintained, updated, and enlarged by a university-based research team. Revenues from the sales of the database are returned to the Laboratory to cover expenses incurred in the preparation and electronic publishing of its growing number of database products. Users include individual farmers, university research institutes, agricultural extension centers, and individual researchers.
Database software is enhancing existing databases with powerful, specialized new tools, and database software developers are expanding into the provision of data to make the most of these software capabilities for their users. MDL Information Systems, for example, found that there was an inadequate supply of scientific data for its ISIS software system, and has since developed eight different databases in reaction chemistry or synthetic methodology; two databases in bioactivity (including the Drug Data Report); one metabolism database; four chemical sourcing databases; and six databases of chemical safety information (precautions, toxicity, transportation warning and the like). The ISIS databases enable sophisticated scientific analysis by allowing users to create original data sets to test scientific hypotheses and methods in unique ways. ISIS software can combine archival data from one or more of the MDL databases, from another database provider, or from the scientist's own proprietary database into huge interrelated data sets that can be searched and linked in numerous combinations.
MDL's database of synthetic methods, for example, brings together for scientists a forty-year survey of published papers on synthetic methods with the most current methods, and this information is updated four times per year. Scientists can explore the effects of a particular synthetic method or set of methods on particular elements in ways that would previously have required months if not years of study of the literature and tedious experimentation in the laboratory.
Combinatorial chemistry databases perform automated syntheses of chemical elements that allow researchers to experiment with molecular structure. With this process they can develop new compounds, which may prove to have value as herbicides, drugs, polymers or fibers. Major pharmaceutical companies and biotechnology companies use the ISIS system to increase the speed of the product development process and reduce their time to market with new products.
As these examples indicate, today's electronic database products are a far cry from the simple printed databases that have been a significant part of the publishing industry for centuries. With the touch of a few keys, users can scan through information culled from a variety of sources over long periods of time. Many of today's electronic products are developed along with sophisticated software for their use, and even when they rely on existing software packages, they are time-consuming and costly to produce and maintain because of the necessity of keeping up with rapid developments in software, computer and Internet technologies. Indeed, sophisticated database producers such as MDL and Bloomberg maintain their own staff of professional software engineers. As described above, many of today's database products also have interactive features, allowing users to add their own data to the system's underlying information to create a customized database of their own. In addition, many database products are highly specialized for particular users. Unlike many software developers, who may develop a program with millions of potential customers, many producers of specialized databases face markets of limited size-sometimes no more than a few hundred customers. Without adequate legal protection, commercial producers of such products may not be able to justify the costs and risks associated with developing them.
Because the boundaries between databases and other sorts of information sources are increasingly fuzzy, it is difficult to get a precise measurement of the size of "the" database industry. But by any reasonable measure, it is both large and growing rapidly (see Tables 1 and 2).
Table 1 lists sales figures that represent many ways of measuring the role of databases in the economy. Some of the industry categories in the table clearly include more than just databases. For instance, the first entry "US publishing industry and related services," includes newspaper publishing, which is broader than just the databases contained therein. On the other hand, no category clearly includes all of the database industry. For instance, proprietary databases, such as those compiled by A. C. Nielsen Company, would not be included in the publishing category. Rather, Nielsen shows up as the biggest company listed by Ward's Business Directory under "Commercial Nonphysical Research," (our last entry in Table 1) with almost $1.3 billion in sales. Generally, most of the large companies in this category produce databases as one of their lines of business. However, this category would exclude companies such as Lexis/Nexis, a large producer of full-text legal and news databases.
Table 1. Estimates of the size of the database industry 7
Industry Description | Year of Source | Sales | Other Data
------------------------------------------------------------------------
1. US publishing industry and related services | 1996 (note 8) | $200 billion | -
2. Newspapers, books and magazines | 1996 (note 9) | $85 billion | -
3. Data processing and network services, 1993 | 1994 (note 10) | $46.4 billion | -
4. Business information suppliers | 1996 (note 11) | $26 billion | -
5. Data processing and preparation, SIC 7374 | 1997 (note 12) | $21.4 billion | 167 companies; 180,700 employees
6. Electronic Information industry | 1996 (note 13) | $15 billion | -
7. Database revenues of business information, 1994 | 1996 (note 14) | $13.8 billion | -
8. 1993 Electronic information services | 1994 (note 15) | $13.6 billion | -
9. 1995 Electronic delivery of business information (primarily online and CD-ROM) | 1996 (note 16) | $10.7 billion | -
10. Information retrieval services, SIC 7375 | 1997 (note 17) | $7.8 billion | 345 companies; 60,800 employees
11. Commercial Nonphysical Research, SIC 8732 | 1997 (note 18) | $4.5 billion | 413 companies; 52,000 employees
Table 2 reveals that although both the number of databases and the number of database producers have continued to expand since the Feist decision in 1991, the growth rates for both of these measures slowed considerably in the six years following that decision compared to the prior six years. Although not conclusive, these growth numbers suggest that this decision may have dampened investment in the industry, as economic logic predicts.19
Table 2. Growth of the database industry 20
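Year | Number of Databases | Number of Producers | Number of Online Services
------------------------------------------------------------------------
1979 | 300 | 221 | 59
1980 | 411 | 289 | 71
1981 | 641 | 411 | 135
1982 | 919 | 812 | 189
1983 | 1360 | 820 | 244
1984 | 1807 | 1069 | 327
1985 | 2247 | 1316 | 414
1986 | 2369 | 1379 | 454
1987 | 2823 | 1568 | 528
1988 | 3135 | 1685 | 555
1989 | 3535 | 1813 | 600
1990 | 3943 | 1950 | 645
1991 | 4332 | 2120 | 718
1992 | 4447 | 2033 | 772
1993 | 5183 | 2204 | 818
1994 | 5300 | 2232 | 822
1995 | 5342 | 2202 | 828
1996 | 5511 | 2255 | 860
1997 | 5739 | 2312 | 899
Source: Gale Directory of Databases, p. x. Information reflects the number of entries published in the Directory of Online Publishers since 1979.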
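The slowdown described in connection with Table 2 can be checked directly from the table's figures. The short Python sketch below is only an illustration: it assumes that "the prior six years" means 1985 to 1991 and "the six years following" means 1991 to 1997, and the function names are ours rather than part of any source.

```python
# Counts transcribed from Table 2 (Gale Directory of Databases).
# Assumption: the two six-year windows are 1985-1991 and 1991-1997.
TABLE_2 = {
    1985: {"databases": 2247, "producers": 1316},
    1991: {"databases": 4332, "producers": 2120},
    1997: {"databases": 5739, "producers": 2312},
}


def total_growth(series, start, end):
    """Cumulative growth between two years, as a fraction (0.5 means 50%)."""
    return TABLE_2[end][series] / TABLE_2[start][series] - 1.0


def annual_growth(series, start, end):
    """Compound annual growth rate over the same span, as a fraction."""
    ratio = TABLE_2[end][series] / TABLE_2[start][series]
    return ratio ** (1.0 / (end - start)) - 1.0


for series in ("databases", "producers"):
    for start, end in ((1985, 1991), (1991, 1997)):
        print(
            f"{series:9s} {start}-{end}: "
            f"total {total_growth(series, start, end):6.1%}, "
            f"annual {annual_growth(series, start, end):5.1%}"
        )
```

On this reading, the number of databases grew by roughly 93 percent in the six years before Feist and by about 32 percent in the six years after; the number of producers grew by roughly 61 percent and then by about 9 percent. As the text notes, such a comparison is suggestive rather than conclusive, since it does not control for other influences on the industry.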
Electronic technologies have dramatically increased the ability to store, update and retrieve large amounts of information. They have also changed the ability of users to copy and sell databases and the ability of database producers to monitor their use. All of these changes have important implications for the economics of the database industry.
When databases were provided in book form, copying was limited by copyright law and the technology of printing, and more recently by the technology of photocopying. But with the advent of computerized databases, online access, and scanning technology, it is now significantly easier for a user to copy large parts of an electronic database.21
On the other hand, technology also has enhanced the ability of database publishers to use "self-help" technical or contractual means to monitor the use and safeguard against the copying of their products. These means include encryption, passwords, on-line-only access to data, and contractual restrictions on further dissemination. In the absence of statutory protection, the industry has had to rely on such self-help approaches to protect its investments. Although they may have been reasonably effective in some cases-and may explain in part why the industry so far has grown rapidly in the absence of statutory protection-such approaches may well have come at a cost in terms of efficiency. For example, many databases currently available only on-line could easily be disseminated to users on CD-ROM, and many users might prefer that format. But in the absence of statutory protection, database providers may not produce CD-ROM versions of their products because they can be easily copied and sold at a lower price. By contrast, it is considerably more difficult for potential pirates to copy and sell the contents of on-line-only databases. In other words, without statutory protection, database producers can be expected to underprovide their products in certain easily-copyable formats. This has two effects: consumers are made worse off because they are deprived of database format choices; and industry growth is slowed by the resources spent on self-help means to prevent copying.
In the database industry, as in other economic activities, self-help means of protecting property sometimes serve as economic substitutes for legal means of protection. For example, fences, guard dogs and private security patrols can substitute for statutory protections against trespassing on private property. Economic logic suggests, however, that there is an optimal mixture of self-help and legal forms of protection. In the case of property protection, self-help methods do not obviate the need for legal protection against trespass-both work together. Nor in the case of databases does the existence of technological and contractual means of protecting investments obviate the need for statutory protection. Indeed, rapid changes in digital technology can render many such self-help approaches obsolete overnight, as the history of copy-protection in the software industry demonstrates. They may also be incompatible with user needs and desires.
3. The Economics of Information and Databases
The absence of adequate property protection and the threat of low rates of return for database products will reduce the supply of reliable information produced by private companies.
From an economic point of view, all kinds of databases, like any form of economically significant information, have two things in common: they are costly to produce, but they are easy to reproduce or copy. These two features of information generation and dissemination have some simple but profoundly important implications.
The fact that it is cheap to reproduce information once it has been produced suggests that it should be made available to users at its relatively low cost of dissemination, a cost that has fallen over time for most kinds of information as a result of revolutions in copying and digital technologies. Indeed, if there were a fixed supply of information available for free, economic theory would suggest that anyone who wanted it should be able to obtain it at no cost beyond the relatively low cost involved in conveying it to them (a cost that economists call the "marginal cost of dissemination").
But there is not a fixed free supply of information for two reasons. First, information must be developed, produced, generated or discovered in the first place. Second, it must be collected and made available to the consumer. Just as there is a significant difference in economic value between an apple hanging on a tree in an orchard 500 miles from me and the "same" apple in the produce section of my local supermarket, there is a significant "value-added" from compiling available information (often from diverse sources) and converting it into a useable form.
A great deal of time, money and effort is required to generate and compile information, verify its accuracy, detect technical or transcription errors, and organize it for use. The substantial costs involved (both monetary and non-monetary) are both "fixed" (in the sense that they do not depend on whether one person or a million uses the resulting information) and what economists call "sunk" (in the sense that these costs are not recoverable should the information no longer be needed).22
Without the prospect of an adequate return, producers will tend not to invest in generating, collecting and organizing information. This is especially problematic if end-users can take information without paying for it, or if competitors (or customers) "free ride" on the substantial investment of an original database developer by copying and selling (or re-selling) his database. The competitors, who do not have a comparable investment to try to recoup, can typically undercut the price, further undermining the original compiler.
From an economic perspective, then, there is a need to provide those who make such investments with the prospect of earning an adequate or competitive return on those investments, so as to encourage future efforts.23 This presents the fundamental economic paradox of information. Once information has been generated, economic logic suggests that it should be made available to anyone who wants it at a low or zero price commensurate with the cost of disseminating it. At the same time, however, such a price does not provide adequate incentive for making investments in information products, nor does it provide a signal as to which databases should be produced in the future.24
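The paradox can be made concrete with a small numerical sketch. All figures below (a $1 million upfront cost, a $5 cost of delivering each copy, and 400 customers) are hypothetical and chosen only for illustration; they do not describe any actual database.

```python
# Hypothetical figures, for illustration only.
FIXED_COST = 1_000_000.0   # upfront (fixed and sunk) cost of building the database
MARGINAL_COST = 5.0        # cost of disseminating one more copy
CUSTOMERS = 400            # size of a specialized market


def profit(price, customers=CUSTOMERS):
    """Producer's profit when every customer buys one copy at the given price."""
    return (price - MARGINAL_COST) * customers - FIXED_COST


def break_even_price(customers=CUSTOMERS):
    """Lowest uniform price at which the upfront cost is just recovered."""
    return MARGINAL_COST + FIXED_COST / customers


print("Profit at marginal-cost price:", profit(MARGINAL_COST))
print("Break-even price:", break_even_price())
```

With these assumed numbers, pricing at the marginal cost of dissemination leaves the entire $1 million investment unrecovered, while breaking even requires a price of $2,505 per customer. The gap between those two prices is the paradox in miniature: the price that maximizes use of existing information is far below the price needed to induce its creation.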
Broadly defined, intellectual property rights-including patent protection, copyright protection and to some extent trade secrets-seek to strike a balance between encouraging widespread use of information and encouraging its development by giving its developer a limited right to set the terms of its use in order to try to recoup his investment with a reasonable rate of return.25 It is this kind of balanced protection that the Feist case eliminated.
The fact that information is costly to produce raises another key question: who should produce it in the first place? Because information has a key characteristic of what economists call a "public good"-that is, one person using it does not prevent another person from using it-it is sometimes argued that the government should produce information or at the very least subsidize its production by the private sector.26 But this conclusion is not always warranted. As an economic question, whether the government or the private sector should produce a particular kind of information depends on which method of production is most efficient or least costly in terms of society's resources. Sometimes government production will be the cheapest-for example, when the government is reporting on its own activities or when valuable information is a byproduct of government activities, such as law enforcement or space exploration.
But even when the government is the least-cost provider of a particular kind of information, it is a mistake to conclude that such information is free or should be made freely available. In the absence of explicit fees or charges for the production and dissemination of data, the government must pay for the costs associated with these activities through taxation. The costs of information do not disappear when the government is the provider, although how these costs are recouped and from whom may change significantly. Indeed, economic logic suggests that the government should charge at least the incremental cost of dissemination of government information to those who use it. The imposition of such user fees would reduce the need to finance information generation and dissemination through taxes, and would ensure that those who use the information value it at least as much as it costs to provide it to them.27
For many kinds of information, however, there is a strong presumption that the private sector will be the low-cost provider. And there is also a strong presumption that private production and market signals will avoid wasting resources in generating information that has little or no value.
In the case of information production, the presumption in favor of private production must be tempered by the concern that the economics of information production-its high upfront costs and low marginal costs-may deter market entry and result in market power. Concern about the potential for the exercise of market power by private producers of databases is a major motivation behind objections to statutory database protection. Later sections in this paper conclude, however, that there is little empirical evidence that the exercise of market power is actually a significant problem, even in so-called niche markets.
Even without market power, private producers of information will typically seek to recoup the upfront costs of their investment by pricing above the incremental cost of distributing that information to users. And such prices, while they may be necessary to secure an adequate return on investment, will deter some potential users of information and restrict information flows. Concern that such pricing strategies might limit access to information by scientists, educators, and libraries has also motivated opposition to statutory database protection.
Luckily, however, there is an incentive for private producers to try to extend the audience for their information products through a pricing strategy called price differentiation-that is, through charging different prices for the same or very similar information to different users or to users seeking different packages of information products. Indeed, price differentiation is a traditional strategy for most kinds of information providers (as it is for producers of other kinds of goods whose production involves relatively high upfront costs and relatively low incremental costs) to attempt to recover their upfront costs by increasing the demand for their output. For example, the movie industry sells movies to theaters, hotels, airlines, video rental stores, and individual consumers at different prices. Similarly, the book industry sells hardback and paperback books at different prices, and the software industry sells its products at both retail and site-licensed prices.28
A similar strategy of price differentiation is also apparent in the database industry. For example, online information services, such as Lexis-Nexis, often charge one price for daytime use and another lower price for nighttime use. Non-governmental producers of databases that are used for scientific research in both commercial and academic settings often charge lower prices for the latter.29
Since database users often differ significantly not only in their ability to pay but also in their needs for accuracy, completeness, timeliness, support service, search engines, and ease of use, database providers often have many opportunities to sell slightly different configurations of data and services at different prices, charging lower prices with fewer services to those with a low willingness or ability to pay and higher prices with more services to those with a higher willingness or ability to pay. Indeed, such pricing strategies may be essential for many database producers to generate enough demand and revenues from sales to cover their upfront production costs.
Such strategies can work, however, only as long as those who purchase a database at a relatively low price cannot copy and sell it to those who would otherwise have paid its original developer a relatively high price. If copying and resale are not controlled by adequate statutory protection, they will undermine the ability of original database developers to use price differentiation as a way to recoup upfront costs. And without recouping costs, database products will not survive in the market, harming both those who would have been willing to pay a relatively high price and those who would have been willing to pay a lower price. In other words, both producers and consumers can benefit from price differentiation in the database industry, but piracy and copying will eliminate these benefits.
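The arithmetic behind this argument can be sketched with a small calculation. The figures below (the upfront cost, the two customer groups, and the most each group will pay) are hypothetical assumptions chosen only for illustration, not data from the database industry, and the incremental cost of distribution is assumed to be zero.

```python
# Illustrative sketch only: all numbers are hypothetical assumptions.
UPFRONT_COST = 1_000_000  # assumed one-time cost of building the database

# Assumed customer groups: (number of buyers, maximum price each will pay)
SEGMENTS = {
    "commercial": (100, 8_000),
    "academic": (400, 1_000),
}


def revenue_uniform(price):
    """Revenue when every buyer is charged the same price."""
    return sum(n * price for n, wtp in SEGMENTS.values() if wtp >= price)


def best_uniform_revenue():
    """Best single price; only the groups' own price limits need checking."""
    return max(revenue_uniform(wtp) for _, wtp in SEGMENTS.values())


def revenue_differentiated():
    """Revenue when each group is charged its own maximum price."""
    return sum(n * wtp for n, wtp in SEGMENTS.values())


uniform = best_uniform_revenue()
differentiated = revenue_differentiated()

print("best uniform revenue:  ", uniform, "covers cost:", uniform >= UPFRONT_COST)
print("differentiated revenue:", differentiated, "covers cost:", differentiated >= UPFRONT_COST)
```

Under these assumed figures no single price covers the upfront cost, while differentiated prices do. If low-price buyers could copy and resell to the high-price group, the producer would be pushed back toward the single-price outcome and the database would not be produced.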
4. Concerns About Statutory Protection for Databases
As the preceding discussion indicates, there are strong economic arguments for statutory protection for databases. But concerns also have been voiced about possible negative effects of such protection by some legal scholars and by some in the scientific, educational and library communities. These concerns fall into two broad categories: first, concerns that database producers, especially those of highly specialized goods with limited niche audiences, already have substantial market power, which will be enhanced by statutory protection to the detriment of consumers; and, second, concerns that such protection could reduce access to information by the scientific and educational communities, thereby slowing technological progress and economic growth. The following discussion examines each of these two concerns in greater detail and concludes that they do not provide a sound basis for either blocking or delaying statutory protection for databases. When compared to actual practices in the database industry, both concerns appear to be overblown or based on misunderstanding, and to the extent that these concerns are warranted, they can be addressed by appropriate regulations of statutory protection adopted by the Congress.
4.1 Concerns about Market Power
According to economic logic, an industry like the database industry with high upfront and low incremental costs of production has the preconditions for first-mover advantages and market power. This presumption is even stronger for so-called "niche" markets--like many specialized database markets--that may be large enough to justify the upfront costs of developing a specialized product but not large enough to support more than one or two competitors. Such analytical arguments, buttressed by anecdotal evidence, have led some skeptics of statutory protection for databases, such as the National Research Council and Professors Reichman and Samuelson, to conclude that "the market for commercially distributed databases is almost universally characterized by a distinct absence of competition" and that "the private database industry is largely characterized by niche marketers who supply and dominate specific market segments."30 These conclusions are not supported by either economic logic or empirical evidence.31
A large part of the database industry, including many of the most commercially significant databases, operates in an intensely competitive environment. Perhaps the largest single category of databases--measured in terms of volume of information and market size--involves financial data. There are four major players--Dow Jones, Reuters, Bloomberg, and Bridge (formerly Knight Ridder)--and thousands of smaller database providers, all serving the multi-billion-dollar demand for financial and commercial information. Interestingly, Bloomberg provides a good historical example of a firm that initially provided data only about a niche market--"sinking funds"--but has grown into a billion-dollar-a-year firm supplying a wide range of databases and news services.
Even the existence of a small number of firms serving a particular market does not mean that those firms exercise significant market power. First, as mentioned earlier, the database industry is characterized by differentiated products. Competing firms rarely supply the "same" database. Rather they compete on a range of fronts: selection of data; convenience; search engine; ease of use; and price.
Second, concerns about monopoly power in niche markets overlook the potential for substitution across databases that draw on somewhat different data sets. In many areas of research, for example, it is possible--and often wise--to investigate important issues and test general propositions with several alternative data sets. In addition, with the profusion of freely available information (for example, on the Internet) and powerful computers and computing tools, database makers face competition worldwide from competitors and end-users alike.
Third, and most important, in a market economy, firms prosper by supplying what their customers want. Firms that have satisfied customers get repeat business; those that do not tend to fail. Firms cannot "rest on their laurels" for fear that new competitors may come along and take away their business. Consequently, when one sees a niche database market supplied by a single firm, that may be evidence that the firm is doing a good job serving the needs of the market at reasonable prices.
Economists have long recognized that the credible threat of a new market entrant is a powerful constraint on the ability of firms to exercise their potential market power. The relevant public policy question, of course, is whether such a threat exists in a particular marketplace. Scholars who are concerned about the potential for market power abuse in the database industry emphasize the existence of high upfront costs and limited market size, especially for specialized products, as barriers to new market entrants. But over time, the shift toward electronic databases may well reduce some of the upfront costs of entry, as the prices of hardware, software, and communications technologies continue to fall. And at the same time, as these technologies spread broadly through educational, commercial, and family life, the demand for electronic information products is likely to expand, creating larger markets and luring additional entrants into database production. Moreover, because, as already noted, database producers must maintain in-house experts and technical staff and equipment to improve existing databases and keep pace with changing technological requirements, it may prove cost effective for them to dedicate these resources toward support of more than one database (especially if it is in an adjoining field for which their experts and staff already have expertise).
Finally, it may well be less costly for a new entrant to replicate an existing database than it was for its original producer to develop it, in which case the fixed costs of entry for the second firm are lower than they were for the first. There are three reasons why this is likely to be the case. First, most databases contain citations to the underlying sources from which they were developed. The original compiler had to track down these sources from among all of the possible sources where the data might reside. Much of this research may involve "blind alleys" and "false starts." A "second comer" can avoid these pitfalls and go directly to the sources identified in the existing database.
Second, the original developer of a database may have to spend a substantial amount of time and effort trying to track down data from a variety of potential sources, only to find that the information available from scattered sources is insufficient to yield a reliable data series. It may therefore cost the original developer a substantial amount to yield the "negative result" that certain desirable data are simply not available. A "second comer" can learn from the mistakes of the first. In the absence of the first firm's effort, the potential competitor would have to scour the record, determine what is and what is not available, and choose which data series to present. The incumbent firm has already done all that, and the new entrant is able to consider the results of those efforts.
Third, potential competitors can observe which databases are successful and which fail and choose to replicate only the former. In contrast, original developers of databases, like book publishers, movie studios, and record companies, often do not know how much demand there will be for a particular product until they develop and market it. Sometimes a database is more successful than anticipated, sometimes less--and often developers rely on the profits they make from their "hits" to cover the costs of their "flops." New entrants do not face the same degree of market uncertainty--they can take advantage of the "marketing experiment" performed by incumbents without themselves incurring the same risks.
Even if the first entrant does not significantly lower the costs of entry for new competitors in any one of these three ways, the possibility of new entrants exists as long as the underlying information contained in a particular database is available and can be replicated. That is why the proponents of statutory database protection have been careful to maintain that it should apply only to the database, not to the underlying data. It seems reasonable to argue that as long as potential new entrants--or potential users--can go to the same underlying data sources and replicate the information in a particular database, copying substantial parts of the original database should be precluded by statutory protection. How valuable the investment in the original database will be then depends on market demand, and if it proves to be sufficiently attractive, new entrants will be encouraged to enter the market and able to do so because they can turn to the original data.32 Moreover, as long as the underlying data can be replicated, potential buyers of database products can decide to buy the database from its original developer or collect the data themselves. Of course, one might object that this means that subsequent competitors or users will have to "duplicate" the work of the original developer and that such duplication of effort is wasteful. But this argument is mistaken for two reasons. First, it is likely that the second comer will innovate and improve on the first database in the course of replicating it. Second, concerns over unnecessary duplication of effort overlook the fact that second comers or users always have another choice--they can negotiate with the original developer for the right to use his product.33
If a mutually acceptable agreement is feasible, the parties will have an incentive to reach it so as to save the costs of replicating the data. For the second comer, the incentive is to save the costs of the duplicative effort. And the threat by the second comer to make that effort should the original developer fail to agree to terms will prevent him from charging an exorbitant price. Conversely, the ability of the original developer to compel the second comer to make good on that threat will in turn enable the original developer to extract a payment from the second comer that will help recoup part of the original investment necessary to develop the database in the first place. In short, the parties have a strong incentive to negotiate a more efficient solution if there are in fact mutually acceptable gains from trade.34
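The bargaining logic can be illustrated with a short sketch. The upper bound on the fee follows the argument above (the second comer will not pay more than its own cost of replication). The lower bound, the midpoint outcome, and all figures are hypothetical assumptions introduced here for illustration.

```python
# Illustrative sketch only: the figures are hypothetical assumptions.

def licence_fee_range(replication_cost, developer_cost_of_licensing=0.0):
    """Return (lowest, highest) fee both sides would accept, or None.

    The second comer will not pay more than it would cost to rebuild the
    database independently; the developer is assumed not to accept less
    than the cost it incurs by granting the licence.
    """
    lowest = developer_cost_of_licensing
    highest = replication_cost
    if lowest > highest:
        return None  # no mutually acceptable deal; the second comer replicates
    return lowest, highest


def split_the_difference(fee_range):
    """One simple assumed bargaining outcome: the midpoint of the range."""
    lowest, highest = fee_range
    return (lowest + highest) / 2


fee_range = licence_fee_range(replication_cost=300_000,
                              developer_cost_of_licensing=20_000)
print(fee_range)                        # (20000, 300000)
print(split_the_difference(fee_range))  # 160000.0
```

Any fee inside the range leaves both parties better off than duplication, which is why the threat of replication caps the price the original developer can charge.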
4.2 Competition and Data Replicability
As the preceding discussion indicates, the possibility of replicating the underlying data in a database is a key factor affecting the potential for market entry of new competitors. This observation naturally leads to the question: which kinds of data are replicable and which are not?
In fact, the underlying data in most databases is replicable. The clearest examples involve data collected by classic "sweat-of-the-brow" efforts from publicly available sources. A large number of databases are of this sort, including databases of historical information, many scientific databases, and databases that rely on government information such as census data, economic statistics, weather reports and readings, and indexes of government decisions and government actions such as court decisions. The fact that many commercially significant databases build on government information means that the government itself can encourage competitive database markets by making the data it generates publicly available on a non-exclusive basis. And in those cases in which the government decides to contract its data collection, maintenance, storage, verification, or dissemination activities to private firms, it should also require as part of the contract's conditions that the data be made publicly available on a non-exclusive basis to all users. In this case, the government's fee to its private contractor would cover costs plus whatever profit is negotiated as part of the contract, and the resulting data would then be made available to users in its raw form at incremental cost.35 Of course, this leaves open the possibility that the contractor could develop his own database from the underlying data and charge a different and higher price for this product if the market values its additional services or features.
In fact, even when the government makes its data publicly available to a database developer and to end-users at the same price, individuals often choose to acquire their data from the private sector because of the value-added services such firms provide. Many businesses prefer to get government statistics from firms like DRI-McGraw-Hill rather than directly from the government because the private providers update and verify the data on a continuous basis and provide it in a readily usable electronic format. In California, for example, the state government has been putting much state-generated information on-line at no charge, competing with many private information suppliers that also provide government information to end-users and charge a fee for doing so. The increasing competition from the state government has caused private firms to introduce new value-added services to keep their customers satisfied. In the words of the leadership of one private firm, "We want to differentiate our product from the public-domain product, so people can make a choice of whether to use a free service or a paid service. The data is the data. We believe the difference is the accuracy, timeliness, ease of use and search, and other feature capabilities we can provide."36
What about statutory protection for databases that build on data that is not publicly available and hence cannot be easily replicated? Here several different situations should be distinguished. Sometimes a database relies on privately generated data that cannot be precisely replicated but for which comparable data would be available at comparable cost to competitors willing to make the necessary investment.
For example, during the 1980s supermarkets installed checkout devices that scanned bar-code labels on grocery items, and the A.C. Nielsen Company began to collect this data. Nielsen contracts with various supermarkets to supply it with scanner data. It then sells its ScanTrack reports about pricing, sales volume, and market share based on that data to grocery manufacturers. As we understand it, Nielsen typically pays the participating supermarkets a fee and provides them with copies of the reports in exchange for providing the data. Our understanding is that these contracts between Nielsen and the selected supermarkets are exclusive: the stores agree to supply scanner data only to Nielsen. But other firms could and have entered into similar contracts with other stores to collect similar data. We know of at least one firm, IRI, which also provides scanner-based market data.
Of course, the data collected at other stores will be similar, but not identical. Nielsen might report that a particular detergent had a 17.4% market share in March in Nielsen-scanner-equipped stores; IRI might discover that the same detergent's market share in IRI-scanner-equipped stores during the same month was 18.7%.
That is, it may not be possible for Nielsen's competitors to precisely "replicate" Nielsen's data. But competitors can get "just as good" data from their own efforts. Consequently, there is no economic justification for allowing competitors to copy Nielsen's data merely on the grounds that it is not possible to "replicate" the data exactly.
Our next category involves privately generated data that cannot be replicated because it relies on proprietary information developed competitively. For example, suppose a pharmaceutical manufacturer develops a new compound, and runs scientific tests to determine its efficacy in treating a particular disease. Or suppose an electronics R&D house develops a new semiconductor manufacturing process and tests that process to determine product yields. These tests clearly generate databases of scientifically- and technologically-valuable information. And it would be difficult for competitors, without access to the compound or the new process, to replicate that data.
As a pragmatic matter, it is unlikely that these databases would be commercially sold; information of this sort is commonly closely-held as a trade secret. But if such a database were sold, there would be no economic rationale for allowing others to copy and sell it in competition with the original compiler, even though as a practical matter it would not be possible for the competitor to "replicate" the data contained in the database.
Another category involves historical data that cannot currently be replicated, but could have been at the time it was collected. Suppose that Smith compiled a city directory of Boston in 1985, by pounding the pavement. Today, in 1997, it is difficult if not impossible for Jones to try to replicate that database. People have moved, businesses have shut down, memories have faded; the information is no longer available. Does this justify Jones in copying the information in Smith's directory without Smith's permission (assuming that Smith's database is still protected)? We do not think so. (Of course, once the statutory protection on Smith's 1985 database expires, say in 2010 if database protection lasts for 25 years, Jones may reprint the database at no charge.)
This example shows that we must be careful in saying that database protection is acceptable on the grounds that "competitors can always choose to replicate the work of the original database compiler if they want to compete." In our view, the actual justification is a bit more complex. We protect Smith's database against Jones' copying, not because Jones can (currently) replicate it--in our city directory example, Jones cannot replicate in 1997 the data collected in 1985 by Smith--but rather because denying Smith protection diminishes Smith's incentive to develop the database in the first place.
If adequate protection is not provided for databases whose contents are not contemporaneously replicable, there will be little incentive to try to record the sorts of data that cannot be measured again--precisely the sorts of data that should be collected while it is still possible to do so.
Our last category involves privately generated data that others cannot replicate at comparable cost because of preferential governmentally established monopoly access to a different but related market. The main example we have in mind here is the telephone listings that were at issue in Feist.
Phone numbers are arbitrarily assigned identifying data, privately generated with no additional effort (the phone company assigns a number as a provision of service) in the course of operating a local-monopoly business. The government compels the phone company to print phone directories so that this information, though privately generated, is publicly available.
In our view, the facts of the Feist case are in reality much closer to the kinds of concerns addressed in the antitrust law under the rubric of so-called "essential facilities" than they are to the kinds of concerns raised by a typical "database piracy" case. For the vast majority of databases, there is no credible essential facility claim. Either a potential entrant can get the data from the same sources as the original firm, in which case there is no public policy need to allow the new entrant to free ride on the original firm's investment, or the original firm generated the data itself in a non-monopoly context and others are equally free to generate their own data. When data is generated by a government-created monopolist, it is not appropriate to allow the monopolist to control database products building on that data. Such cases can be avoided by a policy that these sorts of data, collected by a government-created or government-sanctioned monopolist in order to provide its service, should be made available to other users.
5. Concerns about Preferential Access
Many skeptics of statutory protection for databases are primarily motivated by their concern that it will restrict access of such preferred users as scientists and academics to essential data. Numerous exceptions and limitations to property rights in the copyright law encourage the use of protected property by such users for socially valued functions such as teaching, research and library activities. In addition, US copyright law contains a general "fair use" exception for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Neither the EC Directive nor HR 3531 contained any language indicating similar exceptions in statutory protection for databases,37 and this caused concern, especially in the scientific community, that such protection could harm important activities. A recent report by the National Research Council38 articulates these concerns. Although we are sympathetic to the needs of the scientific and educational communities--we either are or have been members of these communities for most of our adult lives--we believe that effective statutory protection for databases can be drafted to respect these needs. Indeed, we believe that such protection will be beneficial to these communities because it will provide the market incentives necessary to maintain healthy private investment in databases over time.
The question of whether particular categories of users should get "preferential access" to an economic good or service is not the same question as whether ownership rights over that product should be adequately protected. In addition, even if as a matter of public policy, certain kinds of users should receive preferential access, there remains the public policy issue of how that access is best achieved. A subsidy financing the purchase of a necessary good or service by preferred categories of users is the most direct form for realizing this objective. Allowing them to take the amount they want of a good or service from those who supply it without paying for it is only one particular form of subsidy--a subsidy in kind that is financed in essence by the suppliers and the other paying customers they serve. Usually such an in-kind subsidy proves considerably less efficient than a direct subsidy from the government. This conclusion can be illustrated by an example of preferred access drawn from a non-database setting.
The Women-Infants-Children (WIC) program is financed by the federal government and administered by the USDA. It provides mothers and their infant children with coupons that can be redeemed at participating retailers for milk, infant formula, and other specific staple food products like peanut butter and orange juice. The idea behind the WIC program is that early childhood malnutrition is clearly detrimental to both children and society and that one effective way to combat this problem is to subsidize the purchase of nutritional foods by those too poor to afford them. Several independent studies have concluded that the WIC program is both successful and cost-effective, and it enjoys bipartisan Congressional support.
But the fact that the WIC program has a sound public policy purpose does not mean that the best way to finance adequate nutrition for poor children is to allow their mothers to take milk and other food products from grocery stores or other suppliers without paying. Such an in-kind subsidy provided by these suppliers would tend to reduce their incentive to supply and increase the prices they charge to their paying, non-preferred customers. In addition, such an approach would tend to encourage waste or excessive use on the part of the subsidized population. In contrast, the WIC program provides direct subsidies that both regulate the level of usage by the preferred population and encourage producers to supply more, not less, of their output to the benefit of all users.
Similar insights apply when the "preferred class" of users consists of scientists, students, and academics rather than poor women and their children, and when the products in question are things like scientific equipment and databases. Consider the case of scanning electron microscopes. They are niche products with few suppliers, and virtually all of their customers are either academic or commercial scientists. Many scientists who would like to use electron microscopes for their research and educational work are unable to afford them, and government grants may provide financing--that is, monetary subsidies--to purchase such equipment. As far as we know, there have been no public policy proposals for an in-kind subsidy scheme whereby such preferred users could simply take equipment from their suppliers at zero price. Nor would such proposals make good economic sense.
Some may argue in response that information is unlike milk or microscopes in that it is non-rival in use--that is, the use of information by some non-paying users will not reduce the supply of information available to paying users. But this argument overlooks the fact that in the absence of adequate property protection and rates of return, the supply of information produced by private companies will be reduced over time as investment levels fall. In short, while information may be non-rival in use at a point in time when that information has already been developed, it is not non-rival in use over time--those who use existing information at zero price can adversely affect the growth of information over time, to their own detriment and to the detriment of paying users.
We have no doubt that the question of access to data by the scientific and educational communities is a vitally important one for society as a whole. We also recognize that inadequate funding can deprive such preferred users of the data they need to do their work, thereby depriving society of the benefits of technological progress and educational attainment. Another related issue is access to information for such purposes as news reporting, commentary and criticism. But we do not believe that in-kind subsidies taken from database providers by such users in the form of unauthorized copying for unauthorized purposes are the appropriate remedy for addressing these valid concerns. Indeed, we believe quite the contrary--in order to provide conditions for a healthy and competitive database industry that will serve the needs of all users, adequate statutory protection is required. And as our earlier discussion indicates, the natural interest of database providers in broadening their audience promotes the practice of price differentiation whereby those who are able to pay more sometimes implicitly subsidize those who are not.
The need for adequate funding for science and education--to support the use of equipment, publications, software and data--is an important public policy challenge. Some of these funds come from government, some from private contributions, and some from the private sector in the form of company-sponsored support for research both within and outside of universities. But this challenge is logically distinct from the challenge of providing an appropriate environment for protecting the rights of private database producers.
This challenge is also logically distinct from the question of whether the government is putting adequate resources into the development and dissemination of those kinds of information of which it may well be the low-cost producer--information like statistics, weather data, space exploration data, and court opinions. Many of the concerns the scientific and educational communities have expressed in the debate about statutory database protection are in fact concerns about whether government spending on such information will be maintained at the necessary levels or whether the government will "privatize" these activities in ways that will increase their cost to data users.
6. Some Legal Concerns about Appropriate Statutory Protection for Databases
So far the discussion has focused on the economic logic for statutory protection for databases. We now turn to some questions about how best to provide such protection: How long should protection last? What mode should it take? What should be the scope of protection?
6.1. Duration of Protection
Under current US law, patents last 20 years from the date of filing; copyright protection lasts for the life of the author plus 50 years (or for 75 years from publication for works created by entities). The EC Database Directive protects databases for 15 years. HR 3531 proposed a 25-year term of protection.
From an economic perspective, it is often difficult to determine how long protection should last. Ideally, one might want the length of protection (and/or the scope of protection) to vary from database to database, but such a system would be totally impracticable. Setting a single term for database protection, applicable across all databases, is nonetheless a complicated task.
Finally, in thinking about the appropriate duration for database protection it is important to recognize that the shorter the period of protection, the greater the incentive of producers to try to set high prices to recoup their investment during the allotted time. Public policy makers should consider the likely relationship between duration of protection and firm pricing strategies in drafting statutory protection for databases.
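The link between the term of protection and pricing pressure can be illustrated with a standard annuity calculation. This is a sketch under stated assumptions: the investment amount and the discount rate are hypothetical, and the producer is assumed to earn returns only while protection lasts.

```python
# Illustrative sketch only: investment and discount rate are hypothetical.

def required_annual_return(investment, years, discount_rate):
    """Constant yearly net revenue whose present value equals the investment.

    Standard annuity formula; with a zero discount rate it reduces to
    investment / years.
    """
    if discount_rate == 0:
        return investment / years
    annuity_factor = (1 - (1 + discount_rate) ** -years) / discount_rate
    return investment / annuity_factor


INVESTMENT = 1_000_000   # assumed upfront cost
DISCOUNT_RATE = 0.10     # assumed required rate of return

for term in (5, 15, 25):  # 15 and 25 match the EC Directive and HR 3531 terms
    yearly = required_annual_return(INVESTMENT, term, DISCOUNT_RATE)
    print(f"{term:>2}-year term: about {yearly:,.0f} per year")
```

Under these assumptions the yearly net revenue needed to recoup the investment rises as the term shortens, which is the pressure toward higher prices described above.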
6.2. Updating and protection
Database providers spend hundreds of millions of dollars a year updating their existing databases, and the newly-updated databases also need protection.
Up-to-date databases are clearly valuable--often extremely valuable. Stock traders need up-to-the-minute (even up-to-the-second) information about stock prices. Indeed, this allows providers of stock price information to "price differentiate." They charge a much higher price to those who need real-time stock price data than they charge to those who are willing to have the information delayed fifteen minutes. (Those willing to wait until the next day can buy it in the newspaper for a quarter.) But even in less time-sensitive fields, up-to-date information is almost invariably preferred to less recent and less complete information.
Consequently, database providers spend hundreds of millions of dollars a year updating their existing databases to add up-to-date information. The newly-updated databases also need protection.
Some commentators have expressed concern that various proposals for statutory protection for databases will enable database providers to obtain "perpetual" protection for their databases merely by updating the information contained in the database on a regular basis.
This argument is no more significant when applied to databases than it is when applied to updated copyrighted material (such as new editions of books or reference works). Many encyclopedias and other reference works are updated regularly. Each new edition is copyrighted, and the copyrights on old editions expire over time. This does not provide "perpetual" protection. A good example is CRC Press' Handbook of Chemistry and Physics, known to all science students, which has been published since 1914.
We believe that a similar argument applies to proposals that would allow database providers to obtain protection for both an original database product and updated versions. Assume that databases are given 25 years of protection (as was proposed in HR 3531). The 1997 edition of the database would become available for copying by competitors in 2022. If in 2022 there were a market for the 1997 edition, a potential competitor could make a copy of it and compete with its original developer.
But what happens if an updated version of a database contains a substantial amount of information that was available in an earlier edition as well as a substantial amount of new content? Would the statutory protection accorded to the updated version extend to all of its contents--old and new alike--or merely to its "new content," as the copyright law does for printed matter? Since the point of statutory protection is to protect investment in the creation, verification, maintenance and dissemination of information, such protection when extended to updated products, in principle should apply only to new content and to the additional investment required to verify, maintain and disseminate old content included in new editions. Once the protection on a database runs its course, users should be allowed to copy its contents.
The problem with this approach, however, is that some or many of the old entries contained in an old database may have been re-verified or reorganized and possibly revised as the result of additional investment to produce the new edition. And there is likely to be no practicable way for the user to distinguish between old information in the new edition that has gone through such a process and old information that has simply been copied by the producer into the new version of his product. One way--and perhaps the simplest way--to handle this complication would be to apply statutory protection to all of the elements of a new version of a database. In other words, users would be prevented from copying "old information" as well as "new content" from a new version of a database, although they would be allowed to copy much of "the same" old information from a version whose protection had expired.
This approach would of course require that once protection for an old version of a database ended, users could still get copies of it. In the case of books and copyrights, the availability of old editions is fostered by the fact that copies of them are lodged with the Library of Congress and the Copyright Office. In the case of databases, statutory protection could require database producers to make archival copies of their databases on a regular basis (perhaps annually) and store them with the Library of Congress or some other agency. The archive copies would then be available after the statutory protection period had expired.
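The edition-by-edition logic above can be illustrated with a minimal sketch. It assumes the 25-year term proposed in HR 3531 and a hypothetical series of annual editions; the function names are ours and are purely illustrative, not part of any statutory proposal.

```python
PROTECTION_TERM_YEARS = 25  # term proposed in HR 3531; assumed for this sketch


def copyable_from(edition_year: int, term: int = PROTECTION_TERM_YEARS) -> int:
    """Year in which a given edition first becomes available for copying."""
    return edition_year + term


def copyable_editions(edition_years, current_year, term=PROTECTION_TERM_YEARS):
    """Editions whose protection has run its course by current_year."""
    return [y for y in edition_years if copyable_from(y, term) <= current_year]


editions = [1997, 1998, 1999, 2000]       # hypothetical annual editions
print(copyable_from(1997))                # 2022
print(copyable_editions(editions, 2023))  # [1997, 1998]
```

Each edition carries its own clock: updating the database starts a new term for the new edition but does not extend the term of any archived earlier edition.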
6.3 Use vs. Replication: On Sufficient Statistics and Scientific Research
The claim that increased protection will impede the ability to use databases for scientific and academic research rests on a misunderstanding.
The statutory proposals for database protection typically allow users to extract and copy small portions of a database, but seek to protect database developers against the copying of "all or substantial portions of" a database. Some have suggested that this would improperly limit the ability of legitimate users to make use of the database. We do not agree.
Users of a database often make use of "all or substantial portions of" the information it contains in the course of large-scale data analyses, especially statistical analyses. It is typically--indeed, one would be tempted to say, nearly universally--impossible to determine what was in all or substantial portions of the entire database, starting only from the sorts of analyses and results that users in fact derive from databases. To illustrate this point, we will indulge in a brief technical discourse.
To take a simple example, suppose that the database in question consists of the daily noontime temperature reading at City Hall in San Francisco for 90 consecutive days in the summer of 1988. Users want these 90 data points, not for their own sake but because they can be summarized, evaluated, compared with other areas, etc. For example, one might want to compare the summertime temperatures in San Francisco and Portland. For this comparison, one might want to summarize the 90 San Francisco data points into a single number that preserves some (but not all) of the data, but in a more compact form, such as the (arithmetic) average noontime temperature. For practical purposes, this single number may be sufficient for the comparison at issue. (Obviously, for other purposes it would not.)
But the important thing here is that we have "compressed" the data (from 90 points down to a single "summary" figure) in a way that cannot be reversed. One cannot go from the fact that "the average noontime temperature in San Francisco was 73 degrees" to a complete listing of 90 separate daily temperatures. To take a stark example, the average of the three numbers (3, 5, 7) is 5, but so is the average of (0, 5, 10) or of (2, 6, 7) or of (5, 5, 5). Being told the average does not enable one to recreate the underlying data.
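The same point can be checked mechanically. The short sketch below uses only the four triples from the text; the variable names are ours. It shows that four different data sets collapse to one and the same summary figure, so the summary cannot identify which data set produced it.

```python
from statistics import mean

# The four example data sets given in the text.
candidates = [(3, 5, 7), (0, 5, 10), (2, 6, 7), (5, 5, 5)]

# Compute the arithmetic average of each data set.
averages = {data: mean(data) for data in candidates}
for data, avg in averages.items():
    print(data, "->", avg)

# Four distinct data sets, but only one distinct average.
distinct_averages = set(averages.values())
print(len(candidates), "data sets,", len(distinct_averages), "distinct average")
```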
The key point here is that scientists and other researchers typically can use "all or substantial portions of" the data in a database for their scholarly research, formulate and test hypotheses, do statistical analyses, write up their conclusions, and report their results in academic papers and journal articles, all without reproducing the underlying data.
At most, they may need to make and temporarily store (e.g., in RAM) an electronic "working copy" of the relevant data in a format useable by their statistics software package (or other analytic tool). Raw data--especially large amounts of raw data--may be the lifeblood of scientific research, but one key goal of science is to search the raw data for patterns and explanations. Raw data is a tool, not a goal in itself; it needs to be used with other tools (such as search engines and statistics packages) in order to yield useful results. Absent the ability to manipulate and analyze the data, scientists would have little use for it in the first place. Under copyright law, it is not clear whether the user of a database may make such temporary "working copies" without permission of the copyright holder. However, new legislation protecting databases can disregard the technicalities of copying and can address the economically significant element--the right to use the database and thereby benefit from the database producer's investment and labor. Thus, database protection legislation can make it clear that such temporary "working copies" are lawful so long as they are not used by unauthorized persons or in an unauthorized manner.
Consequently, in our view, one objection voiced by several commentators to increased database protection--namely, the claim that increased protection will impede the ability to use databases for scientific and academic research--rests on a misunderstanding of the difference between use and improper copying.
6.4. On Different Forms of Legal Protection For Databases
Under current law, the main legal paradigm used to protect a database developer's interest in a database is a property-rule regime, as reflected in the common term "intellectual property" to refer to patent and copyright law. But there are also elements of a tort-based liability rule regime, notably in the law of trade secrets. One can use hybrid systems combining elements of tort and property.
There are a number of different legal paradigms that might be used to protect a database developer's interest in a database. Under current law, the main paradigm is a property-rule regime, as reflected in the common term "intellectual property" to refer to patent and copyright law. But there are also elements of a tort-based liability rule regime, notably in the law of trade secrets. One can use hybrid systems combining elements of tort and property.
There are really two distinct but interrelated questions here. The first is: with respect to any particular proposed use by a particular type of user of a particular set of information in a particular database collected and/or generated by a particular database supplier, what are the respective rights of the parties? In their pioneering 1972 study of alternative forms of legal protection, Calabresi and Melamed referred to this as the question of how to define the entitlements of the various parties.39 The answer may clearly vary from database to database, from one type of use to another, and from one type of user to another (e.g., end-user vs. competitor).
Once these entitlements have been identified, the second question then turns to: how are those entitlements protected? With respect to each particular proposed use of a particular database, there are five general alternatives:
1. The use may be permitted. Others--whether generally, or limited to certain types of individuals or firms--may be free to use the database in this particular fashion without the permission of the original database developer--indeed, often despite the implied or express opposition of the developer.
2. The use may be subject to a compulsory license. Others (again, some or all) may be allowed to use the database in this fashion, but they must pay a pre-established fee to do so. This has two main alternatives:
a. the fee may be set by governmental regulation of some sort; or
b. the database developer may be required to set and post a fee (with or without some governmental oversight as to the size of the fee); but once set, the same fee must be charged on a non-discriminatory basis to all comers (or, at least, all comers within the appropriate category).
Presumably, in this context, there must be some sort of penalty attached to using the database without paying the requisite fee. Otherwise, the infringer is in a "heads-I-win, tails-I-break-even" situation: since detection of infringement is not certain, the infringer avoids the fee entirely if not caught and pays no more than the fee if caught, so that, absent a penalty, it would always be cheaper to pay only once caught.
3. The use may be subject to damages under a liability rule: if others (again, some or all) use the database in an unauthorized fashion, they must pay damages to the database developer, with the level of damages determined on a case-by-case basis.
4. The use may be protected by a property rule: others (again, some or all) may not use the database in this fashion without the (pre-negotiated) permission of the database developer, and can be enjoined from doing so. Under this approach, prospective users must negotiate with the database developer for the right to use the database in a particular fashion, and thus must meet the developer's price or forgo the proposed use. The developer would be free to set a high price if it chose to do so.
5. Finally, the use may be protected by criminal sanctions. Unauthorized use may result in fines or other penalties (including confiscation and destruction of infringing material, and possibly imprisonment).
These alternatives emphasize that different forms of protection can be used for different entitlements.
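The "heads-I-win, tails-I-break-even" observation under the compulsory-license alternative can be made concrete with a small expected-cost sketch. The fee, the detection probability, and the function names below are hypothetical values chosen only for illustration, and the sketch assumes a risk-neutral user who compares expected outlays.

```python
def expected_cost_of_infringing(fee, penalty, detection_probability):
    """Expected outlay of a user who copies first and pays only if caught."""
    return detection_probability * (fee + penalty)


def break_even_penalty(fee, detection_probability):
    """Penalty at which infringing costs, in expectation, exactly the fee."""
    return fee * (1 - detection_probability) / detection_probability


fee = 100.0  # hypothetical license fee
p = 0.25     # hypothetical probability that infringement is detected

no_penalty = expected_cost_of_infringing(fee, 0.0, p)
min_penalty = break_even_penalty(fee, p)
with_penalty = expected_cost_of_infringing(fee, min_penalty, p)

print(no_penalty)    # 25.0  (cheaper than paying the 100.0 fee up front)
print(min_penalty)   # 300.0
print(with_penalty)  # 100.0 (no cheaper than paying the fee up front)
```

Under these assumptions, any penalty below the break-even level leaves infringement cheaper in expectation than compliance, and the less likely detection is, the larger the penalty must be.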
6.5. On The Choice Between Liability Rules And Other Forms Of Protection
The fundamental policy choice between a property rights approach and liability rules turns on whether negotiations among the parties are possible. When negotiations are not possible, liability rules have been developed to recompense owners for past infringement of their rights through court-awarded damages. When negotiations are possible, a potential user of property can negotiate with its owner about the terms and conditions for its future use.
As a matter of practical reality, one cannot undo the past. Earlier violations of entitlements cannot be undone; at most, the legal system can award damages for past infringement. And those damages have to be set by a court; one cannot rely exclusively on after-the-fact assertions by the database developer, since it has an incentive to claim that it would not have sold at any price, in an effort to "prove" that damages for past infringement should be extremely high.
That is, looking backward at past infringement, the only realistic approach involves liability for damages.40 But this does not mean that databases should therefore be protected by "liability rules."
The fundamental policy choice between property and liability rules turns on whether negotiations among the parties before infringement occurs are possible. In some contexts, like auto accidents, it is infeasible for people to negotiate beforehand. As Calabresi and Melamed put it,
If we were to give victims a property entitlement not to be accidentally injured we would have to require all who engage in activities that might injure individuals to negotiate with them before an accident, and to buy the right to knock off an arm or a leg. Such pre-accident negotiations would be extremely expensive, often prohibitively so. To require them would thus preclude many activities that might, in fact, be worth having.41
Database protection is different from accidents. Those contemplating copying or using someone else's database almost invariably know who developed the original database, and are perfectly free to negotiate for the right to copy or use it.
Those arm's-length negotiations will establish a market price for those rights. We believe it is both unwise and inefficient to substitute court-established damages figures for market-established prices. Doing so would establish the equivalent of a "private right of eminent domain," allowing others to take what they wish from a database without the owner's permission, subject only to paying court-awarded damages. The administrative burden imposed on the courts from trying to substitute court-awarded damages for privately-negotiated prices would be severe.
6.6. Property Rights in Databases are Not "Exclusive" Rights to the Data
Giving a database developer a "property" right in a database does not "exclude" others from replicating the underlying data from original sources and using or selling their own version of a database product based on that information.
Some critics have expressed concern that the EC Directive and the 1996 Bill would give an "exclusive property right." We believe this concern to be misplaced, as it appears to rest on a misunderstanding of the "exclusivity" granted by property rights systems.
There is no question but that a property right gives the owner the ability to exclude others from using her property without her consent. But the extent to which this "excludes" others depends on their ability to replicate that property.
Giving a database developer a "property" right in a database does not "exclude" others from replicating the underlying data from original sources and using or selling their own version of a database product based on that information. That is, generally there are no barriers to entry into providing competing databases.
It is instructive to compare databases and patents on this point. A grant of a patent gives the patent holder the right to exclude others from making, using or selling the patented product or process. This right extends even to others who have never been exposed to the patent. Thus if Jones gets a patent, he can prevent Smith from using the patented technology, even if Smith developed the same technology independently and without knowledge of Jones' patent or Jones' research.
In this regard, patent protection is significantly unlike the proposals for database protection. If Smith has never been exposed to the contents of Jones' database, Smith is perfectly free to develop a similar if not identical database. Even if Smith has seen Jones' database, Smith is free to develop a competing database, so long as Smith does not directly copy from Jones' database.42
Hence giving Jones a "property right" in a database is significantly different from giving Jones a "property right" in a patent. While both are termed "property" rights, they differ significantly in the holder's ability to preclude others from independently developing the same product.
6.7. "Value-Added" Products
There is a well-established system by which those who seek to provide "value-added" products negotiate with the original authors and publishers for the right to do so. The fees to be paid to the original authors and publishers are set by negotiation.
Some have criticized existing proposals for database protection because they assert that database developers would have an incentive to deny others the right to produce "value-added" products of various sorts. Again, we believe that this rests on a misapprehension of the role of private negotiations in facilitating the development of value-added products.
Economists differentiate between substitutes and complements. Many "value added products" are complements to the original database. The customer may find the database easier to use, or more valuable, because she also has the complementary product. For example, an index to a database, or a better search engine, or a manual or instruction book explaining how to use it more effectively, enhances the value of the database itself. In a literal sense, they "add value" to the original database.
Complementary products increase demand for the original database; hence the database developer has an incentive to encourage the development and marketing of complementary products. There is no reason to think that granting a database developer statutory protection for its databases will discourage the development of complementary "value added products." In any case, many such complementary products do not even require licensing negotiation because they do not incorporate protected parts of a database.
At the other extreme, some "value added products" are economic substitutes for the original database. This is especially likely to be the case if the substitute product contains a substantial amount of the contents of the original database. Consumers may or may not prefer the substitute product over the original database. Adequate statutory protection will allow the original developer to negotiate a license fee from a competitor for the use of the original database to develop a value-added product.
In the copyright sphere, there are many examples of "value-added" products. For example, the authors and publishers of a popular book may be approached by others seeking paperback publication rights, or foreign translation rights, or the rights to adapt the book into a movie. Popular children's films and cartoon shows generate proposals for all sorts of "tie-in" merchandise, from T-shirts to lunchboxes to toys. All of these can be considered "value-added" products.
There is a well-established system by which those who seek to provide such "value-added" products negotiate with the original authors and publishers for the right to do so. The fees to be paid to the original authors and publishers are set by negotiation.
Imagine, instead, that we used a liability rule under copyright law to determine how much the author or publisher of a book could charge for (say) foreign translation rights, or paperback publication rights, or the rights to develop the book into a movie or a TV series. Under such a system, translators or paperback publishers or TV and movie producers and studios would negotiate with authors and publishers, but if they were dissatisfied with the price they were able to negotiate, they could take the matter to a court to determine what a "reasonable" price should be for those rights. Given the thousands of books published in paperback each year, and the hundreds of books that might be developed into movies or TV series, litigation to set a "reasonable" price for these rights could swamp the court system.
In essence, using a liability system to protect databases would, again, amount to granting to second comers a private "right of eminent domain," which would allow them to make use of databases developed by others at a court-determined rate. The law, rightly in our view, refuses to allow private parties to exercise a right of eminent domain, either with respect to real property or with respect to intangible property. We see no reason to adopt such a system for databases.
6.8. Compulsory Licensing
A flat compulsory licensing fee would allow competitors to "skim the cream" by just copying the most successful databases. "Compulsory licensing" proposals are a particular form of price control, this time over the "price" of access to the contents of a database.
In a similar vein, some critics of statutory database protection have urged that database providers be required to license their databases to "second comers" and those who wish to supply "value-added" products. For example, Professors Reichman and Samuelson have proposed that others should be free to use all or part of a database on "payment of reasonable compensation according to a menu of user options vetted by the industry with user and government inputs."43
As noted above, compulsory licensing is similar to liability rules, in that both substitute prices determined by the courts or by the government for prices determined by voluntary negotiations. The difference is that a compulsory license fee is determined ex ante, before the taking, and at a general level, while liability-based damages rules are determined ex post, after the taking has occurred, and on an individuated basis.
We believe that such a proposal is impracticable. Implementing any such system, and making sure that the "reasonable compensation" and the "menu of user options" keep up with technological and market changes, is likely to be a daunting task. We find it implausible in the extreme that "the industry" would be able to agree on what constitutes "reasonable compensation." Firms that develop their own databases obviously have a very different perception of what is "reasonable" than firms that merely clone databases developed by others. But both are part of the "industry" as it exists today.
Any system of compulsory licensing rates that allows new entrants to "pick and choose" elements they want to incorporate is likely to lead to the classic "cream skimming" problem. Some databases are clearly more economically valuable than others. How would one implement a "menu" that recognized this disparity? As we noted above, developing and marketing databases is a risky enterprise; database publishers rely on the profits they make from the "hits" to cover the cost of the "flops." A flat fee would allow competitors to "skim the cream" by just copying the most successful databases. This would undermine the database publishers' ability to develop all kinds of databases, including those specialized scientific and technical databases for which there is relatively low demand and which are of particular concern to Professors Reichman and Samuelson.
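A stylized numerical sketch may help to show how cream skimming can turn a profitable portfolio into a loss-making one. Every figure below is hypothetical (in thousands of dollars), as is the assumption that a second comer who pays the flat fee captures half of each hit's sales; the sketch illustrates the mechanism and is not an estimate of actual industry economics.

```python
# All figures are hypothetical, in thousands of dollars.
DEVELOPMENT_COST = 1_000   # assumed cost to build any one database
HIT_REVENUE = 6_000        # assumed revenue from a successful database
FLOP_REVENUE = 200         # assumed revenue from an unsuccessful one
N_HITS, N_FLOPS = 2, 8     # assumed mix of hits and flops in the portfolio
FLAT_FEE = 500             # assumed flat compulsory license fee per database


def portfolio_profit(revenue_per_hit: int) -> int:
    """Publisher's profit across the whole portfolio of hits and flops."""
    revenue = N_HITS * revenue_per_hit + N_FLOPS * FLOP_REVENUE
    cost = (N_HITS + N_FLOPS) * DEVELOPMENT_COST
    return revenue - cost


# Protected case: the publisher keeps all revenue from its hits.
protected = portfolio_profit(HIT_REVENUE)

# Flat-fee case: a second comer copies only the hits, pays the flat fee,
# and (by assumption) takes half of each hit's sales.
skimmed = portfolio_profit(HIT_REVENUE // 2 + FLAT_FEE)

print(protected)  # 3600
print(skimmed)    # -1400
```

In this illustration the flops are unaffected by copying, since no one chooses to license them; the entire loss comes from the hits that previously covered their cost.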
Economists have for years objected to the distortionary effects of price controls of all sorts. In our view, "compulsory licensing" proposals are at heart just a particular form of price control, this time over the "price" of access to the contents of a database developed by another. We do not believe that the government is well-placed to set such prices, for the same reasons that price controls are generally inefficient.
7. Conclusions
As the preceding discussion has demonstrated, there are strong economic reasons for providing adequate statutory protection for the database industry. Such protection would serve the interests not only of database producers, but also of database users, including users in the scientific, educational and library communities. The creation, storage, verification, maintenance, updating and dissemination of information serve valuable economic and public policy functions--and they are not "free" whether they are performed by the government or private companies. Indeed, such activities often involve substantial upfront costs and considerable risk, since it may be impossible to predict their actual value until the resulting information products are available for use. These costs and risks may be especially daunting for the development of highly specialized databases that are likely to have limited applications in the commercial arena, at least in the short run, and that therefore may have to rely initially on demand from a limited number of scientific and academic users with limited ability to pay.
Without effective statutory protection, private firms will be deterred from investing in database production. The resulting shortfall in the provision of information will have adverse effects on the pace of technological progress, on the economy's growth potential, and on the very research and educational communities whom critics of statutory protection wish to help.
------------------------------------------------------------------------
1 - Research for this project was funded by a contract between the Law and Economics Consulting Group, Inc., of Emeryville, CA, and Reed-Elsevier, Inc. of Newton, MA, and The Thomson Corporation of Stamford, CT, major international publishing companies. The views expressed in this article are those of the authors.
2 - Dr. Laura D'Andrea Tyson is the Class of 1939 Professor of Economics and Business Administration at the University of California at Berkeley. Dr. Edward Sherry is an attorney and Senior Economist with the Law and Economics Consulting Group, Inc. Research assistance and support on this project was provided by Alan Marco.
3 - 499 U.S. 340 (1991).
4 - It is important to emphasize that, for databases, idiosyncratic or creative selection or organization of data may be undesirable. For many business and medical users, for example, the most valuable databases are those that contain comprehensive, current information that is logically organized so as to be easy to navigate.
5 - J.H. Reichman and Pamela Samuelson, "Intellectual Property Rights in Data?," Vanderbilt Law Review, Vol. 50, No. 1, January 1997.
6 - Paul Goldstein, Copyright's Highway (1994), p. 211.
7 - The categories in Table 1 give estimates from $4.5 billion to $200 billion in sales. Given that A. C. Nielsen accounts for $1.3 billion on its own, it is clear that the database industry is a multi-billion dollar industry. Additionally, within the "Business Information" market, the largest segment was "Database Publishing," category 7 in Table 1, which had revenues of $13.8 billion in 1994. It is obvious that databases in all markets must amount to more than this figure, though it is difficult to know what is included in any given category without a detailed analysis of the raw data.
8 - Information Today, January, 1996.
9 - Sloan Management Review, March 1996.
10 - US Industrial Outlook 1994, p. 25-1.
11 - Sloan Management Review, March 1996.
12 - Ward's Business Directory, 1997.
13 - Information Today, January, 1996.
14 - Electronic Information Report, 1/5/96.
15 - US Industrial Outlook, 1994, p. 25-1.
16 - Interactive Daily, 2/5/96.
17 - Ward's Business Directory, 1997.
18 - Ward's Business Directory, 1997.
19 - Some observers argue that rapid growth of the database industry shows that statutory protection is either unnecessary or can be delayed without significant cost. But this argument fails to consider how fast the industry might have grown with statutory protection or the costs of various self-help technological means the industry has used to protect itself. It also overlooks the fact that growth in the industry has been partly based on the expectation that policy makers will act to provide such protection in the future. Furthermore, this argument ignores the potential impact of the EC Directive on future growth of the database industry in the United States.
20 - Because the cost of setting up and operating an on-line service is significant, it is to be expected that owners of on-line services will leverage that investment by carrying multiple databases, that the owner of a few databases would distribute them through another's on-line service, and hence that the number of databases on-line would grow faster than the number of on-line services.
21 - It is also feasible to transfer a paper database into electronic format either by keypunching the data or by using optical character recognition software to scan a paper database and transfer it into electronic form. However, the former method is often very time consuming, while the latter technology is not perfected. It is likely to produce a significant number of errors in any lengthy work. But digital copying of electronic databases can be "perfect" and very low cost. A key feature of digital technology is that a copy is indistinguishable from the original, so that an Nth-generation copy is "just as good" as a first-generation copy. This is not the case for traditional analog photocopying, in which quality degrades relatively quickly after two or three generations.
22 - The distinction between fixed and sunk costs can be seen by an example. Suppose that one wants to enter the sidewalk hot-dog vending business. To do so, one needs a business license and a pushcart. The costs of these are "fixed" in the sense that they do not vary with sales. ("Variable" costs, for example, for supplies, do of course vary with sales.) If one chooses to exit the business, it may be possible to sell the cart and thus recoup part of the fixed costs. The "sunk" costs are the part of the fixed costs that cannot be recouped--in this case, the cost of the license plus the difference between what it cost to buy the cart and the price for which it can be sold.
23 - The classic reference is Kenneth Arrow, "Economic Welfare and the Allocation of Resources for Invention," in Universities-National Bureau of Economic Research Conference Series, The Rate and Direction of Inventive Activity: Economic and Social Factors (1962).
24 - Professor Kenneth Arrow also pointed out another problem with information. A potential buyer of information does not know what the information is, and therefore does not know what the information is worth to him, until he sees its contents. But once the information is revealed, the buyer has no incentive to pay for the information he has already seen, or (at the least) has an incentive to underreport the value he places on the information. Arrow made another point. Those who develop information tend not to be able to appropriate the full social value of that information. Consequently, there is (in general) a tendency to under-invest in the production and dissemination of information (relative to the level that would be first-best from a societal standpoint).
25 - Note that we use the expression "try to recoup" because investment in information development is a gamble, like any other investment. There is no guarantee that demand will prove sufficient to enable the developer to earn a profit on his investment.
26 - Concerns about possible government censorship or control over information also bear on the question of whether the government or the private sector should provide certain kinds of information.
27 - Carl Shapiro and Hal R. Varian, "US Government Information Policy," draft prepared for presentation at Highlands Forum, Department of Defense, Washington, DC, June 8, 1997.
28 - Shapiro and Varian, ibid.
29 - As an illustration, one successful database provider specializing in medical information offers many of its products to academic users at prices that are only about 1% to 2% of the prices paid by commercial users. The academic price is based on the approximate cost of its distribution to academic users (which is similar to its marginal cost of dissemination).
30 - Reichman and Samuelson, 1997, p. 70.
31 - The anecdotal and survey evaluations of market power cited by the National Research Council are based on a count of how many firms compete to supply each database in specific database markets, without accounting for the different size of these markets in terms of revenues or sales. This procedure gives disproportionate weight to small database markets and is therefore biased toward showing a lower level of competition than that which characterizes the industry as a whole.
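The weighting issue raised in note 31 can be illustrated with a small sketch. The markets, supplier counts, and revenue figures below are entirely hypothetical, and the two averaging functions are our own illustrative constructs, not the procedure used by the National Research Council.

```python
# Each entry: (market name, number of competing suppliers, annual revenue
# in $ millions). All values are hypothetical, for illustration only.
markets = [
    ("niche market A", 1, 2.0),
    ("niche market B", 1, 3.0),
    ("niche market C", 2, 5.0),
    ("large market D", 10, 90.0),
]


def simple_average_suppliers(markets):
    """Average supplier count, giving every market equal weight."""
    return sum(n for _, n, _ in markets) / len(markets)


def revenue_weighted_average_suppliers(markets):
    """Average supplier count, weighting each market by its revenue."""
    total_revenue = sum(rev for _, _, rev in markets)
    return sum(n * rev for _, n, rev in markets) / total_revenue


print("unweighted:", simple_average_suppliers(markets))
print("revenue-weighted:", revenue_weighted_average_suppliers(markets))
```

In this assumed example the small markets dominate the unweighted count, so the unweighted average shows far fewer competitors per market than the revenue-weighted average does.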
32 - In rare instances, data may not be easily replicable. We discuss these situations in section 4.2.
33 - We must be careful in saying that database protection is acceptable on the grounds that "competitors can always choose to replicate the work of the original database compiler if they want to compete." We protect a database against copying not because it can be replicated, but rather because denying protection diminishes the incentive to develop the database in the first place.
34 - As an illustration, database producers may negotiate with potential competitors who are interested in licensing a database and incorporating it in a competing product. The database producer will try to negotiate a price that reflects his assessment of the value of the resulting competing product in the marketplace and the likely decrease in revenue from the original product. In the absence of a successful contract negotiation, the potential competitor still has the opportunity to create his new product by gathering the data he needs from their original sources.
35 - The NRC has pointed to the US government's privatization of the provision of Landsat data as an example of the problems associated with granting statutory protection to databases. The NRC correctly points out that, following privatization, prices for Landsat images increased more than ten-fold, substantially impacting scientific users. But in our view these problems resulted from the manner in which the privatization contract was written and interpreted, not from the privatization effort itself.
36 - Francis Bremson, Director of Marketing and Sales for Legi-Tech, quoted in Mitchell Benson, "State Web sites offer firms competition," Wall Street Journal, May 14, 1997, p. CA1.
37 - However, HR 3531 contained a number of provisions that had much the same effect, even if it did not invoke the words "fair use."
38 - Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, 1997.
39 - Guido Calabresi and A. Douglas Melamed, "Property Rules, Liability Rules, and Inalienability: One View of the Cathedral," Harvard Law Review (1972), pp. 1089-1129.
40 - As discussed earlier, a penalty should be attached to past infringement to avoid a "heads-I-win, tails-I-break-even" scenario on the part of the infringer.
41 - Calabresi and Melamed, 1972, p. 1109.
42 - Jones may deliberately "seed" false information in his database, or look for his own inadvertent errors reappearing in Smith's database, as ways to determine whether or not Smith copied from his database.
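The "seeding" check described in note 42 can be sketched as follows. The record layout, the sample entries, and the function name are hypothetical, and the sketch assumes that a copied record reappears verbatim.

```python
def find_seeded_records(seeded_records, suspect_records):
    """Return the seeded (deliberately false) records that reappear
    verbatim in the suspect database."""
    suspect = set(suspect_records)
    return [record for record in seeded_records if record in suspect]


# Hypothetical data: Jones plants two fictitious listings in his database.
jones_seeds = [
    ("Zzyx Plumbing", "555-0100"),
    ("Qwerty Florists", "555-0101"),
]

# Hypothetical contents of Smith's database.
smith_database = [
    ("Acme Hardware", "555-0123"),
    ("Zzyx Plumbing", "555-0100"),
    ("Main Street Bakery", "555-0145"),
]

matches = find_seeded_records(jones_seeds, smith_database)
print(matches)
```

Because the seeded entries are fictitious, Smith could not have obtained them from any original source, so a match is evidence of copying rather than of independent compilation.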
43 - Reichman and Samuelson, 1997, p. 147.