Electric Britannicas

The Early Disks. While CMME was still in its initial development phase, Kester set his sights on the EB text. At Microsoft's CD-ROM Expo in February 1989 he had shown an executive of Britannica Learning Corp. how the digitized Compton's data, running under SmarTrieve, could almost instantaneously yield an answer to the query "Why do leaves fall from trees in the autumn?" In September he demonstrated the SmarTrieve technology at the Chicago headquarters of Britannica, and the following month he secured an agreement under which he was given access to the EB text data. As he recalls, it took his two IBM 80286 computers some three months of continuous processing to parse the whole of the EB text (into a proprietary format dubbed BSTIF, for Britannica Software Tagged Interchange Format) and produce the inverted index. But while this gave him great satisfaction, it was of only secondary interest to Britannica at the time. In March 1990 Britannica purchased the Del Mar Group, and with it the rights to SmarTrieve, and folded it into the Britannica Software division (renamed Compton's New Media in 1992). Priority was given to the development of the consumer CMME for various platforms and to an ambitious program of consumer CD-ROM publishing, ranging from the Guinness Book of Records to the Berenstain Bears.

The search for an application for the EB data, one that could be done, as it were, in spare time and that would not alarm sales management, led to the Britannica Electronic Index, or BEI. BEI ran in DOS and was essentially the SmarTrieve index of EB data on a CD-ROM. A typed-in query would yield a relevance-ordered list of references to the text in the form of volume-page-quadrant citations. (Sidebar 3) For the user, BEI was supposed to function as a superindex, far surpassing even the two-volume printed index in comprehensiveness and, unlike a printed index, able to deal with queries involving more than a single term. As a product, it was a giveaway - yet another premium item used in the hope of promoting sales of the print set.

The first true electronic Britannica product arose in part from a spate of publicity generated in mid-1991 by critics of certain high school history textbooks, which had been found to be rife with factual error. It emerged that most textbook publishers relied on outside service vendors for such fact-checking as was done in the development of textbooks, and often none was done at all. Fact-checking is slow, tedious, and difficult. An electronic product with sophisticated search-and-retrieval capabilities running over a superior text base could make the process more effective and cost-efficient. From this insight proceeded the notion of the Britannica Instant Research System, or BIRS (at first known internally as "Fact Finder"). Intended for the professional user, BIRS ran under Windows 3.0 and consisted of two CD-ROMs, one containing the SmarTrieve indices and the other containing the Britannica text. The contents of both disks were to be transferred to the user's PC hard drive, which had thus to be of 1-gigabyte capacity, very large and very expensive for the day. BIRS was introduced in June 1992 with what was for Britannica substantial fanfare, but it was bought by only a very few venturesome customers, including the small Evanston, Ill.-based editorial service company that had served as the beta site.

"Network Britannica." While managing to see BEI and BIRS out the door, and thereby allowing the nose of the camel into the tent, Kester had been pressing for an independent budget to support research into how the core Britannica data - not only text but indexed and classified text - could best be exploited electronically. He assembled a team, the main members of which were John Dimm, already at Britannica Software by way of ESC; Bob Clarke, a consultant and freelance seer whom Kester had worked with in the earliest days of the Del Mar Group; John McInerney, whose experience lay in network administration and security; Rik Belew, a professor of computer science at UC San Diego, who served as a part-time consultant; and two doctoral students from his department, Brian Bartell and Amy Steier. By late 1992 these formed a separate, and somewhat isolated, group within Compton's New Media, and it was formally organized as the Advanced Technology Group in April 1993. In August 1993 the ATG moved out of the Compton's offices in Carlsbad, California, and opened its own office in La Jolla, near the UCSD campus. Mention must also be made here of an honorary member of ATG, Vince Star, who oversaw the Britannica publishing system in Chicago and had thus been responsible early on for preparing text data for transfer to the West Coast group. (Sidebar) The introduction of email in the Chicago office in early 1993, at first and for some time on a very limited basis (limited not merely by resources and a learning curve but also by the fact that it was "elm" mail, a UNIX application for which one had to learn some rudiments of a clumsy editor called "vi"), had the effect of cementing relationships between ATG and a few committed hands at headquarters.

The earliest and thereafter driving goal of the ATG team was set by Clarke, who forcefully held out the assertion of Scott McNealy (he of Sun Microsystems) that "the network is the platform." This could actually be taken in two ways, and Clarke argued both: First, "the" network - the Internet - was the emerging standard; second, as a development strategy, solving the hard problem first - building what he called System Britannica, essentially a networked version of Britannica - would lay a solid foundation from which to derive CD-ROM and other versions. Thus, what was finally released as the Britannica CD 1.0 (known simply as BCD) in 1994 was in effect the network product ported to a disk for use in a PC that was both server and client.

The problems were many. Britannica text included a great many "special characters," including letter forms and diacritics used in non-English words (often proper names), mathematical symbols, and scientific notation. Translating raw Britannica text into ASCII or into the larger but still limited character sets used in Windows and other platforms was difficult and error-prone, and the results were judged unsatisfactory by the Britannica editors. These transliterations also compromised the accuracy of the indexing done by the search-and-retrieval software. Large portions of mathematics and science articles (such as complex formulae or chemical diagrams) in the set had been created as "special comp" for the printing process - essentially pieces of artwork that had no counterparts in the digitized text database and thus left "holes" when the text alone was used. Much other artwork of a more conventional sort was considered essential to supplement the text.

On the technical side the problems were equally challenging. The raw text was huge - just over 300 MB - and consisted of tens of thousands of separate files, a number that jumped once the Macropaedia articles, some of which ran to upwards of 250,000 words, were split into manageable chunks. Then there were the questions of which networks to design for, for what download speeds, and for what sorts of display. Windows had not yet penetrated the network market in any significant way, and the most common machines used for data display were the VT100 "dumb terminals."

Beyond matters of implementation were larger questions, to which Clarke, in particular, was giving much thought: What truly innovative products and what wholly new methods of information representation and use might arise from learning how to exploit the indexing within Britannica and the subject-classification codes that were used to tag the text? In a network environment, what relationships might obtain or grow up between the content of Britannica and other information resources on the network?

©2003 by Robert McHenry