The Social Information Infrastructure

William Sims Bainbridge

Social Science Computer Review 13:2, Summer 1995, pp. 171-182.

Abstract

The Division of Social, Behavioral, and Economic Research of the National Science Foundation has explored aggressively the potential involvement of the social sciences in the National Information Infrastructure. We envision the NII as a global network of computer communications, which will evolve out of the Internet, linking all social scientists to massive digital libraries and to myriad smaller distributed data sources containing information of every imaginable sort. Five workshops have charted applications of high-performance computing in the social and behavioral sciences: cognitive science, computational geography, computational economics, artificial social intelligence, and electronic networks. A survey of SBER programs revealed that many are helping to create the information infrastructure, and substantial investment in six "flagship" digital library projects will develop the systems necessary for the NII of the 21st century. Keywords: information infrastructure, digital library, cognitive science, computational geography, computational economics, artificial intelligence, Internet.

High Performance Computing and Communications (HPCC) is a multi-agency federal funding initiative for research and development in the field of advanced computers and computer communications networks. The Information Infrastructure and Technology Act of 1992 increased the scope of this initiative in such areas as high-speed networks and digital libraries. By 1993 the term "National Information Infrastructure" (NII) had become the official name for the data storage and communications portion of this work, but it was popularly called the "information superhighway." The government seeks to shape the development of the NII in two chief ways: (1) major changes in regulation of the communications industry, and (2) investment in fundamental research and demonstration projects. All of these developments may have substantial implications for the social sciences, both transforming the way research and education are done and giving social scientists fresh responsibilities to ensure that the new technologies will promote human welfare.

To explore these potentials, the Division of Social, Behavioral, and Economic Research (SBER) of the National Science Foundation (NSF) has held five workshops - one organized by a program officer from each of its five program clusters - and conducted a comprehensive information infrastructure survey of all its programs. In this essay I shall attempt to summarize the findings of those studies from my own perspective, and I will share my views on the very recent developments in digital libraries. My report is entirely unofficial, and it is possible that other observers will come to somewhat different conclusions. I have drawn heavily upon the insights and expertise of a large number of social and behavioral scientists who have been involved with HPCC and NII, however, and I hope the following expresses an approximate consensus among them.

The Scope of the Information Infrastructure

New computational methods are sweeping across our disciplines, and the vast opportunities of the National Information Infrastructure are challenging the social sciences to approach data in new ways and on scales previously unimagined. At the 1994 Computing in the Social Sciences Conference, Albert Anderson demonstrated a system that used the Internet and parallel processing to analyze raw census data at the remarkable speed of 1,000,000 cases per second. Robust, nonparametric estimation methods, such as bootstrapping, and the recent application of neural networks to statistical analysis are highly computer-intensive techniques that promise to be of great value. My often-expressed view that statistical computing in the social sciences is a mature technology that does not require bigger and faster machines may simply be wrong. Thus, even very familiar methodologies may benefit from high-performance computing, but I still think that the greatest potential for progress lies in wholly new directions.
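To suggest why bootstrapping is computer-intensive, the following minimal sketch estimates the standard error of a sample mean by resampling the data with replacement many times. The function name, the 2,000 resamples, and the sample values are hypothetical choices for illustration only; the point is that the work grows as the number of resamples times the number of cases, which quickly becomes heavy for census-scale files.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by nonparametric bootstrap.

    Each resample draws len(data) cases with replacement, so total work
    is roughly n_resamples * len(data) evaluations.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    estimates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # The spread of the resampled statistics approximates its standard error.
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical survey responses (e.g., years of schooling); illustrative only.
    sample = [12, 16, 12, 14, 18, 12, 11, 16, 20, 13]
    print(f"mean = {statistics.mean(sample):.2f}, bootstrap SE = {bootstrap_se(sample):.3f}")
```

No distributional assumption is needed, which is what makes the method robust; the price is paid entirely in computation.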

The rapid development of computerized geographic information systems (GIS) is stimulating social science to analyze and display data at far higher levels of effectiveness, and GIS has already proven its great value for commercial and governmental applications. The interdisciplinary field of cognitive science unites psychology and linguistics with computer science, and very recently it has entered economics, sociology, and political science under the rubric of artificial social intelligence. The social and behavioral sciences are at various stages in the development and legitimation of theory-based computer simulation, and models of social networks of complex agents are beginning to demand the speed and memory capacity of supercomputers.

The expressed goals of the NII place great emphasis on the knowledge base and technical expertise of the social and behavioral sciences. These include creating conditions that promote private-sector investment and innovation; ensuring that information resources and the empowerment that comes with them are available at the lowest reasonable cost to the largest number of societal groups; developing institutions that respond effectively to issues of information security and intellectual property rights; and creating systems to collect, process, and deliver socio-economic data in ways that give users the maximum value.

As the information infrastructure portion of HPCC increases in relative importance, the social and behavioral sciences take on a greatly magnified role. Much data communicated over the NII will concern human institutions and behavior: texts of all the newspapers and other periodicals from around the world; all nonclassified government data; all public records of legal proceedings and property ownership; all public economic data, from minute-to-minute stock and commodity prices to every advertisement and sales catalog; every book that exists in every language from ancient times until the present moment; comprehensive systems of social geographic data pinpointing every home, highway, and business on the planet; schedules of every public event and artistic performance around the globe; extensive public files of every educational institution from class schedules to lecture notes; the rapidly expanding contents of computer bulletin boards and open E-mail; detailed descriptions and pictures of the entire collections of all museums; digital recordings of interviews, speeches, and anthropological films with automatic transcription and indexing; and the extensive contents of all social science data archives such as the Inter-university Consortium for Political and Social Research (ICPSR).

Logically, therefore, the NSF Division of Social, Behavioral, and Economic Research should play a major role in HPCC as we move into the era of the information superhighway. Much of this work will involve support for research to develop techniques of database analysis, computational tools serving a wide range of new needs, and methods for creating and managing large systems of social, economic, and governmental information.

The five workshops and the survey conducted by SBER provide a firm basis for assessing the HPCC/NII potential of the social and behavioral sciences. The intention was to delineate accomplishments and ascertain future opportunities as well as to identify deficiencies in the current knowledge base and research infrastructure. Summaries of their conclusions follow.

Cognitive Science Workshop

This workshop was initiated by Joseph L. Young, director of the NSF program in human cognition and perception, in the Cognitive, Psychological, and Language Sciences cluster of SBER. Cognitive science is inherently interdisciplinary, embracing aspects of psychology, linguistics, artificial intelligence, neuroscience, engineering, and other behavioral and social sciences. Research focuses on such questions as how people make decisions, how memories are stored, how computers can best be constructed and programmed, and how humans can interact optimally in an increasingly computer-oriented environment. Cognitive science extends from basic knowledge about the mental development of children to the study of teaching and life-long learning. Knowledge models and network theory that developed as a result of cognitive science research are now being used in robotics and other industrial applications. We now have in cognitive science the theoretical and technological tools to enable the design of computational environments that facilitate productive human cognition.

Computational Geography Workshop

This workshop was sponsored by the geography and regional sciences program in the Anthropological and Geographic Sciences cluster of SBER. Geography systematically analyzes information about the locational patterns and distributions of physical and social phenomena, so it has benefited greatly from new technological advances in the collection, manipulation, and presentation of data. The recent development of computer-based geographic information systems (GIS) has greatly expanded research and educational opportunities in geography, and GIS has found much use in government and commerce and in a range of social sciences. Many challenges lie ahead, including management of vast multidimensional databases; development of more effective methods for visualizing data; compilation of numerous local data sets into a global system; creation of far more detailed and realistic models of global processes; efficient and precise integration of spatial data across different geographic scales; and training of professionals to use the new techniques effectively.

Computational Economics Workshop

This workshop was initiated by Daniel H. Newlon of the economics program in the Economic, Decision, and Management Sciences cluster of SBER, with the assistance of John Wooley. There is now a wide array of opportunities to advance our understanding of economics using computational methods. These include the estimation and optimization of dynamic models with uncertainty. This kind of analysis helps us understand portfolio and savings decisions, patent renewal, schooling choice, and retirement behavior. The opportunities also extend to models of the economics of the environment that could facilitate international negotiations about carbon emissions. We have the opportunity to gain a better understanding of the functions of financial markets such as the Chicago Board of Trade through the use of artificial intelligence methods that model high-speed electronic trading systems. Models can be developed to study a single industry on a worldwide basis, providing new insights into the complexities of international trade. In addition, there is a need for substantial new support for a number of small projects in a variety of interesting and scientifically promising areas.

Artificial Social Intelligence Workshop

This workshop was sponsored by the NSF sociology program in the Social and Political Sciences cluster of SBER. Artificial social intelligence (ASI) is the application of machine intelligence techniques to social phenomena. ASI includes computer simulations of social systems in which individuals are modeled as intelligent actors, whether by neural networks or symbolic processors. Such work can help us to understand how well computer models fit human behavior and to develop new ways for humans and computers to interact. Promising ASI techniques for collecting social data include computer-administered interviews, automatic processing of written texts, and smart laboratories for conducting experiments on social exchange. Methods of analyzing social data that employ ASI computer techniques are currently under development, including statistical modeling by means of neural networks, expert system models of societal institutions, and data manipulation tools based on new approaches such as genetic algorithms. Several major scientific disciplines, notably economics, sociology, social psychology, and political science, have only just begun to employ ASI techniques, but the early successes are quite impressive.
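To make the idea of simulating a social system of rule-following actors concrete, here is a minimal sketch of a threshold model of collective behavior in the style of Granovetter's work. It is an illustrative stand-in, not a method taken from the ASI workshop report: each hypothetical agent joins a collective action once the share of participants reaches its personal threshold, a one-line symbolic decision rule far simpler than the neural-network or expert-system actors discussed above. The function name and threshold values are assumptions.

```python
def simulate(thresholds, max_steps=100):
    """Run a synchronous threshold model of collective action.

    thresholds[i] is the fraction of the population that must already be
    active before agent i joins. Returns the number of active agents after
    each round, stopping when no agent changes state.
    """
    n = len(thresholds)
    # Agents with a zero threshold act on their own and start the process.
    active = [t <= 0.0 for t in thresholds]
    history = [sum(active)]
    for _ in range(max_steps):
        share = sum(active) / n
        # An agent becomes (and stays) active once the share meets its threshold.
        new_active = [a or share >= t for a, t in zip(active, thresholds)]
        if new_active == active:
            break  # equilibrium reached
        active = new_active
        history.append(sum(active))
    return history


if __name__ == "__main__":
    uniform = [i / 100 for i in range(100)]
    print(simulate(uniform)[-1])  # full cascade
    perturbed = list(uniform)
    perturbed[1] = 0.02  # one agent slightly more reluctant
    print(simulate(perturbed))  # cascade stalls immediately
```

With thresholds spread evenly from 0 to 0.99 the whole population joins, one agent per round; raising a single agent's threshold from 0.01 to 0.02 halts the cascade after the first agent. Such sensitivity of collective outcomes to small individual differences is difficult to see without running the simulation.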

Electronic Networks Workshop

This workshop was initiated by Ronald J. Overmann, director of the science and technology studies program in the Science, Technology, and Society cluster of SBER. With the development of the National Research and Education Network (NREN) over the next few years, exciting opportunities and challenges confront the social, behavioral, and economic sciences. The high-speed, wide-band network and associated digital libraries provide the means to transform the way in which research and education are undertaken. The workshop report on electronic networks made five recommendations in the form of an internal NSF memorandum. First, demonstration projects should investigate alternative methods for placing data and text online, develop useful tools for information retrieval, explore optimal ways of visualizing materials, and help establish guidelines for future work. Second, researchers should cooperate in developing guidelines or standards for network-related projects, to outline where software tools need to be developed, to move toward more standardized coding schemes, and to provide advice to others interested in carrying out similar projects. Third, it is very important to promote human-resource development through training programs that upgrade the skills of faculty members and facilitate the training of graduate and undergraduate students. Fourth, social scientific research and evaluation projects should examine usage of electronic networks. Fifth, the computing and communicating infrastructure must be upgraded - hardware and software alike - within the social, behavioral, and economic sciences.

Social and Behavioral Information Infrastructure Survey

A survey of SBER program officers revealed that almost all programs in the division have supported research that will contribute to the NII, and the expertise exists to move forward far more rapidly. Some social science research focuses on the institutions that would create the NII: advanced management techniques for public data; productivity of advanced information systems; the developmental history of NII-related technologies; legal aspects of information infrastructure; and methods for maintaining data confidentiality. Studies in behavioral science examine the human-machine interface: computer models of human cognition and perception; artificial social intelligence; natural-language processing across diverse languages; machine recognition and generation of speech and writing. Considerable research effort is also devoted to methods for managing and transforming social, geographic, economic, political, and cultural data. Finally, SBER programs support research on the effectiveness of information infrastructure in serving the societal goals for which it is created.

The range of rapidly developing research areas in the social and behavioral sciences relates directly to several of the information infrastructure components of HPCC, notably intelligent user interfaces, digital libraries, very large knowledge bases, advanced manufacturing, education and life-long learning, civil infrastructure and health care, multilingual language processing, and database/network security. In areas such as these, the social and behavioral sciences can contribute to the knowledge base on which the technology itself is developed. Beyond technology, however, there is a body of scientific understanding that we need to achieve in order to insure that when the technology is in place it will be fully effective in meeting the needs for which it is designed.

The nine principles and goals of the NII, as set out in The National Information Infrastructure: Agenda for Action, issued by the federal government's Information Infrastructure Task Force on 15 September 1993, emphasize social values and challenges. They are:

1. Promote private-sector investment.
2. Extend the "universal service" concept to insure that information resources are available to all at affordable prices.
3. Act as catalyst to promote technological innovation and new applications.
4. Promote seamless, interactive, user-driven operation of the NII.
5. Insure information security and network reliability.
6. Improve management of the radio frequency spectrum.
7. Protect intellectual property rights.
8. Coordinate with other levels of government and with other nations.
9. Provide access to government information and improve government procurement.

The goal of promoting private-sector investment implies research in management sciences and in economics on how this can be done in the innovative, high-risk hardware and software technologies that will be needed. The universal service concept, insuring that information resources are available to all at affordable prices, naturally draws on research in sociology, political science, and several other social sciences. Seamless, interactive, user-driven operation requires the insights into human-computer interaction deriving from linguistics and psychology. Social science research will contribute to understanding the legal and ethical contexts influenced by and influencing new developments in high-performance computing and information infrastructure. This work will also help develop principles of information security, intellectual property rights, and practices that encourage responsible computing and appropriate access.

Also vitally important will be social science research on the human impact of other kinds of data transmitted over the NII; economic, sociological, and political science analyses of the corporations, government agencies, and complex market systems involved; psychological and linguistic studies of how humans are affected by computerized data and how they can use it most effectively; assessment of the social, economic, and cultural revolutions caused by the emergence of new NII-related industries and the demise of older institutions that may be supplanted by them; systematic examination of the global consequences of the NII, including the changing position of the United States in world trade and the potential weakening of other cultures; and development of a scientific knowledge base concerning the processes that promote successful innovation and entrepreneurship.

The Digital Libraries Initiative

Recently, the NSF Division of Social, Behavioral, and Economic Research has begun to contribute significantly to a major effort to develop digital libraries, which will be the central scientific component of the information infrastructure. A competition was held in 1994 jointly by the National Science Foundation, the Advanced Research Projects Agency, and the National Aeronautics and Space Administration. The announcement of this competition noted that the Internet already offers the ingredients of a digital library system: "They include reference volumes, books, journals, newspapers, national phone directories, sound and voice recordings, images, video clips, scientific data (raw data streams from instruments and processed information), and private information services such as stock market reports and private newsletters."

The announcement went on to say: "To explore the full benefits of such digital libraries, the problem for research and development is not merely how to connect everyone and everything together in the network. Rather, it is to achieve an economically feasible capability to digitize massive corpora of extant and new information from heterogeneous and distributed sources; then store, search, process, and retrieve information from them in a user-friendly way. Among other things, this will require both fundamental research and the development of 'intelligent' software."

As the competition progressed, the importance of the social and behavioral sciences for this vast project became ever more clear. Ronald J. Overmann played the key role for SBER, participating in the evaluation of digital library proposals and building links for future cooperation with the National Archives, the National Endowment for the Humanities, and similar foundations. SBER decided to make a significant financial investment in partnership with the NSF Computer and Information Science and Engineering Directorate, and six awards were announced, totaling $24,400,000 of government money, matched by comparable investments from universities and industry. The awards vary in their immediate relevance for the social sciences, but each potentially offers applications for our disciplines and may eventually draw upon them. These awards are:

1. Carnegie-Mellon University. In cooperation with the influential public broadcasting television station in Pittsburgh, WQED, Carnegie-Mellon will create the Infomedia online digital video library system to provide machine-transcribed and -indexed television science programs. Starting with a thousand hours of material from WQED and from Britain's Open University, Infomedia will work out the technology necessary to index and archive far larger sets of audio-visual material. Included are the development of a universal system for billing users for the cost of NII data, and evaluation of the educational potential of Infomedia materials in the schools of Fairfax County, Virginia.

Possible future social-scientific applications include automatic scanning of all television programs for topics of research interest and automatic interview transcription with analysis. The project will exploit existing speech recognition and natural-language-processing technologies, but it might also lead to some new research findings in linguistics. Already the Carnegie-Mellon team has experimented with techniques for constructing educationally valuable synthetic interviews from masses of prerecorded materials. For example, a database could contain all recorded speeches and interviews by President John Kennedy. The student could ask questions of the computer, and instantly the display would show Kennedy speaking about the topic of the question, as if he were giving the student his answer. A high-school government pupil or a college student of introductory political science could do an assignment for the course by interrogating Kennedy on some issue and writing a report about his answers to the interview questions.

2. University of California, Berkeley. This project will create a prototype digital library of environmental information covering the state of California, designed so that untrained users will be able to find and contribute valuable information. One practical use will be the collation of data to prepare an environmental impact report about, for example, a proposed new construction project. To the extent that the environment has been disturbed by human beings, the data will be at least partly social in character and thus important for research on the human dimensions of global change.

3. University of Michigan. This testbed multimedia digital library will emphasize the earth and space sciences, stressing service for a wide range of users. The list of initial commercial sponsors suggests something of the potential intellectual and economic scope of the NII: IBM, Elsevier Science Publications, Apple Computer, Bellcore, University Microfilms, McGraw-Hill, the Encyclopedia Britannica Educational Corporation, and Kodak.

4. University of California, Santa Barbara. This project draws upon the expertise of the National Center for Geographic Information and Analysis, funded in part by SBER's program in geography and regional science. It will provide easy access to large, varied collections of maps, images, and pictorial material, beginning with a collection of digitized maps and aerial photographs of the area around Santa Barbara and Los Angeles. An important feature of this digital library is the partnership with the State University of New York at Buffalo and the University of Maine, thus demonstrating the feasibility of multiple "distributed" sites at great distances, linked through the Internet. The future global information infrastructure will store data at many thousands of locations, and the user need seldom worry about where the data are coming from, relying upon the computer and universal indexing systems to track down the desired information.

From the very beginning, some of the geographic data will be social in nature. Advanced geographic information systems will link quantitative and verbal data with the maps and other images, so all of the social sciences could use this digital library profitably after it is fully developed. Ultimately, the entire world will be charted in a nearly infinite number of ways, and data about any point on the Earth can be accessed from any other point, for research, education, and innumerable commercial applications.

5. Stanford University. This project is most heavily concentrated in the pure computer science aspects of digital libraries, to produce enabling technologies for the NII. The aim is to create a shared environment in which data of all kinds can be managed, communicated across different kinds of systems, and offered to users through uniform access techniques.

6. University of Illinois. Thousands of scientific periodicals will be placed online, initially in the engineering and physical science literatures. The user interface will include a customized version of the now-famous Mosaic software that makes it easy for the user to navigate the World Wide Web and that was developed at the National Center for Supercomputing Applications in Illinois, supported by NSF and ARPA.

In general, the social sciences have been rather slow to consider switching to electronic publication of their scientific journals, whereas fields such as mathematics and physics are already augmenting conventional paper journals with electronic versions and may cease paper publication altogether in the near future. The very high subscription cost of the most massive natural science journals is one factor encouraging this migration to digital media. Another is the importance in many of the natural sciences of very rapid dissemination of knowledge. A third factor may be of greater interest to the social sciences. Once all the scientific journals are available in digital form, the most recent search-and-retrieval technology will permit the user to find all references to a given topic with surprisingly high reliability and efficiency, even when the topics are subtle and not easily captured in old-fashioned keyword searches. A chemist, for example, could instruct his or her computer not only to find all existing literature on a particular chemical reaction, but also to scan all new scientific reports and alert him or her to any new research findings within minutes of their being reported. Incidentally, this will offer tremendous possibilities for research in fields that trace social and intellectual networks in the scientific literature, such as the sociology of science.
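The alerting service imagined for the chemist can be sketched, very roughly, as a standing interest profile compared against each incoming report. The following hypothetical illustration uses term-count vectors and cosine similarity, the classic vector-space retrieval model; the function names, the tokenization, and the 0.3 threshold are assumptions. Plain vocabulary overlap of this kind is exactly the keyword-style matching that the subtler retrieval methods anticipated above would need to improve upon.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a text as counts of lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def alerts(profile, new_reports, threshold=0.3):
    """Return titles of (title, body) reports similar enough to the standing profile."""
    p = vectorize(profile)
    return [title for title, body in new_reports if cosine(p, vectorize(body)) >= threshold]


if __name__ == "__main__":
    incoming = [
        ("A", "New catalyst for hydrogenation of alkenes"),
        ("B", "Voting behavior in midterm elections"),
    ]
    print(alerts("catalytic hydrogenation of alkenes reaction", incoming))
```

Run continuously against a stream of new reports, the same comparison would produce alerts shortly after publication; its reliability depends entirely on how well the text representation captures the topic.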

Some of these six digital libraries appear to have little relevance for the social sciences, but over the duration of these projects it will be possible to attach supplementary research-and-development projects to them, supported by fresh awards from the sponsoring agencies. For example, an economist could study the economic transformation of a field of science or technology in connection with any of the six. An anthropologist could add cultural information to almost any of these archives, in a project to develop the right ways of managing and delivering ethnographic data. Cognitive psychologists and social psychologists could examine the ways users interact with the machines and with each other. Any of the social and behavioral sciences may study the digital libraries for hints of the new forms of culture and society that will take form in the next millennium.

Rumor has it that the contents of the Library of Congress, all 104 million items, total 40 terabytes of data (40,000,000,000,000 bytes). Even if this figure is not exact, the estimate is probably in the right range. Each of the six prototype digital libraries promises to put one terabyte online in the near future, so together their contents will literally be within one order of magnitude of those of the Library of Congress. Of course, their point is not primarily to build the universal digital library of the future but rather to develop the technology - including scalable data storage-and-retrieval systems - that will be its basis.
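The order-of-magnitude comparison can be checked directly, taking both the rumored 40-terabyte figure and the one terabyte per project at face value (these are rough estimates, not measurements):

```latex
\[
6 \times 1\ \text{TB} = 6\ \text{TB}, \qquad
\frac{40\ \text{TB}}{6\ \text{TB}} \approx 6.7 < 10, \qquad
\log_{10}\!\left(\frac{40}{6}\right) \approx 0.82 < 1 .
\]
```

Because the ratio is less than ten, the combined prototypes fall within a single order of magnitude of the Library of Congress estimate.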

Today, anyone may Telnet into the Library of Congress (MARVEL.LOC.GOV, password "marvel") and scan the comprehensive catalog or obtain some other kinds of limited data. The Library of Congress has ambitious plans to digitize its most important materials by the year 2000 and offer them over the net, but it is not in a position to carry out scientific research on new approaches or engineering to perfect the advanced technologies that will be necessary. But the joint NSF-ARPA-NASA Digital Libraries Initiative will do just that. The winning projects were designed so that they are assured of success by careful application of existing cutting-edge technology, but they also have the capability of taking advantage of new developments and even of stimulating them in many areas. These six are the flagships of the National Information Infrastructure; they will chart the course for all the other vessels that will follow.

Conclusion

The National Information Infrastructure is a grand vision that can help insure the continued progress of American technology as well as the economic prosperity that is heavily dependent on technological advances, as the United States moves into the era of the information society. The primary challenge is to make the information maximally valuable for the widest range of users. The National Science Foundation is well prepared to take the lead in accomplishing this vital goal, and the Division of Social, Behavioral and Economic Research is ready to help the NII achieve its social goals.

Notes

William Sims Bainbridge is the sociology program director at the National Science Foundation and represents the social and behavioral sciences on the NSF committees for the High Performance Computing and Communications Initiative. He can be reached at: Sociology Program, Suite 995; National Science Foundation, 4201 Wilson Blvd., Arlington, Virginia 22230; 703-306-1756; E-mail WBAINBRI@NSF.GOV.

The views expressed in this article do not necessarily represent the views of the National Science Foundation or the United States.

Further information about SBER workshops:

Baerwald, T. 1991. Computational needs and opportunities in geography (Report of the NSF Workshop on Computational Geography). ARC News 13:40-42.

Bainbridge, W. S., Brent, E. E., Carley, K. M., Heise, D. R., Macy, M. W., Markovsky, B., & Skvoretz, J. 1994. Artificial social intelligence (Report of the NSF Workshop on Artificial Intelligence in Sociology). Annual Review of Sociology 20:407-36.

Greeno, J., Anderson, J., Bock, K., Carpenter, G., Carroll, J., Collins, A., Dell, G., Gevins, A., Henrion, M., Hutchins, E., Hutchinson, W., Joshi, A., Keil, F., Larkin, J., Lesgold, A., & Smolensky, P. 1992. To strengthen American cognitive science for the twenty-first century (Report of the NSF Workshop on Cognitive Science). Washington, DC: National Science Foundation, NSF 92-4.

Kendrick, D., Bergman, B., Broder, I., David, M., Geweke, J., Judd, K., McGuckin, R., Miller, J., Nagurney, A., Pakes, A., Rosenberg, L., Rust, J., Sims, C., Tinsley, P., & Whinston, A. 1991. Research opportunities in computational economics (Report of the NSF Workshop on Research Opportunities in Computational Economics). Austin: Center for Economic Research, Department of Economics, University of Texas, Report R-91-i.

Information about other workshops and panels:

Branscomb, L., Belytschko, T., Bridenbaugh, P., Chay, T., Dozier, J., Grest, G. S., Hayes, E. F., Honig, B., Lane, N., Lester, W., McRae, G. J., Sethian, J. A., Smith, B., & Vernon, M. 1993. From desktop to teraflop: Exploiting the U.S. lead in high performance computing (Report of the NSF Blue Ribbon Panel on High Performance Computing). Washington, DC: National Science Foundation, NSB 93-205.

Information Infrastructure Task Force. 1993. The National Information Infrastructure: Agenda for action. Washington, DC: NTIA NII Office, Department of Commerce.

National Center for Geographic Information and Analysis. 1994. Annual Report, Year 5. Santa Barbara: David Simonett Center for Spatial Analysis, University of California.

Tonn, B. 1994. Using the National Information Infrastructure for social science, education, and informed decision making (White paper of the Social Science Computing Association). Social Science Computer Review 12:166-82.

Wah, B. W., Huang, T. S., Joshi, A. K., Moldovan, D., Aloimonos, J., Bajcsy, R. K., Ballard, D., DeGroot, D., DeJong, K., Dyer, C. R., Fahlman, S. E., Grishman, R., Hirschman, L., Korf, R. E., Levinson, S. E., Miranker, D. P., Morgan, N. H., Nirenburg, S., Poggio, T., Riseman, E. M., Stanfill, C., Stolfo, C., Tanimoto, S. L., & Weems, C. 1993. Report on Workshop on High Performance Computing and Communications for Grand Challenge Applications: Computer vision, speech and natural language processing, and artificial intelligence. IEEE Transactions on Knowledge and Data Engineering 5:138-54.