Internet Technology

Going Beyond Google

There's a treasure trove of valuable research online--if you know how to find it.
University Business, Aug 2005

Librarians complain that students now think the web is all they need to research any topic. Students do not make a distinction between internet, web, and online sources--these are in fact the domain of much of their information. Since the appearance of the first library online public access catalogs (OPACs), the tilt from paper to electronic sources of information has been rapid and dramatic. Over the past dozen years the change has been all-encompassing: Most paper-based resources have an online edition; some products and services are new technologies in themselves.

A student's research typically begins shallow. When the topic is new, the first task is to find related basic information. But then the question becomes how much is enough.

A big part of education is learning how far the search needs to go. Finding material is rarely the problem; knowing how many gathered facts are sufficient has always been a threshold of learning, even the divide between information and wisdom. Online search tools are built on the premise that more results are better than fewer, and some result is better than none. As a consequence, the volume of returns to search queries leads quickly to a need to filter and reduce the scope of information.

The assessment of particulars is not a special problem of the electronic age. The hallmark of research has always been the appropriate distinction between ordinary and distinctive information. Students at all levels of education know the importance of choices they must make as their research accumulates. The key question regarding technology is how it can aid the efforts of scholars to find and evaluate information.

Public, general-purpose search engines, such as Google, Altavista, and Ask Jeeves, are the usual starting points for new searches. People tend to prefer one over the others but are often unable to explain their choice in specific terms. Each search engine uses an obscure mix of advanced algorithms to index, search, match, and rank results. Even careful observers of the technology are unable to tell, in most cases, what techniques each engine uses, let alone which among them account for the set of search returns.

Conversations with students suggest that few know their preferred search engine's syntax for string matching or Boolean operations to control their searches, even though online help does put those capabilities at their fingertips. The craft of searching seems to be haphazard, partly because of the secrecy surrounding search technologies (the quest for competitive advantage) and partly because of the casual approach used by most searchers. The speed and ease of trying an alternate search seem to outweigh the benefits of a better-formed search, much as the computer "backspace" key undercut the value of accurate typing.
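To make the idea of Boolean operations and string matching concrete, here is a minimal sketch in Python. It is an illustration only: the sample documents and the function names (`build_index`, `search_and`, `search_or`, `search_not`, `search_phrase`) are assumptions invented for this example and do not reflect how any particular search engine works internally.

```python
from collections import defaultdict

# Hypothetical sample documents, invented for illustration.
DOCS = {
    1: "deep web databases and library catalogs",
    2: "web search engines rank results",
    3: "library catalogs moved online",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_and(index, *terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def search_not(index, include, exclude):
    """Documents containing `include` but not `exclude` (Boolean NOT)."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())

def search_phrase(docs, phrase):
    """Documents containing the exact phrase (simple string matching)."""
    return {doc_id for doc_id, text in docs.items() if phrase.lower() in text.lower()}

INDEX = build_index(DOCS)
```

With these sample documents, `search_and(INDEX, "web", "library")` narrows the results to a single document, while `search_or(INDEX, "engines", "online")` widens them to two. That narrowing and widening is the control a well-formed query offers over a string of trial-and-error searches.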

The observation that search engines produce differing returns for the same search criteria and control syntax led to the development of metasearch engines, which submit queries to multiple engines and aggregate the results. The concept is simple: More searches produce more results.

Dogpile, Metacrawler, and Profusion are examples of metasearches that vary in sophistication from compound submissions to inclusion of specialized ("vertical") search engines, personalized site preferences, and subscription services.
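The metasearch idea of submitting one query to several engines and aggregating the results can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how Dogpile, Metacrawler, or Profusion work: the two "engines" are hypothetical stand-in functions that return fixed lists, and the merging rule (summing reciprocal ranks) is just one simple way to combine ranked lists.

```python
def engine_a(query):
    """Hypothetical stand-in for one search engine's ranked results."""
    return ["a.example/1", "b.example/2", "c.example/3"]

def engine_b(query):
    """Hypothetical stand-in for a second engine with different results."""
    return ["b.example/2", "d.example/4"]

def metasearch(query, engines):
    """Send one query to every engine and merge the ranked URL lists.

    Each URL scores 1/rank in every list where it appears; the scores
    are summed, so a URL returned by several engines rises to the top.
    """
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = metasearch("deep web", [engine_a, engine_b])
```

The merged list holds four distinct URLs where neither engine alone returned more than three, and the one URL that both engines returned is ranked first.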

Another advance beyond the simple search is the web search directory, which groups indexes, lists, and descriptive articles by topic. Widely used general directories include Yahoo!, Librarians' Index to the Internet, and These services provide information that has been filtered or even vetted by someone, although the identity and credentials of those who do that work are rarely made available.

Not all web directories have that anonymity. In higher education, the Internet Scout Project is structured like a newsletter, complete with the reviewer's initials at the end of each entry. The materials collected in directories are findable on the web, for the most part; they have the advantage of having been winnowed already from the chaff of extraneous data.

For scholars, the problems of the web are too much information lying on the surface--where the lack of storage structure makes discovery cumbersome--and too much information hidden behind locked portals. The concealed, or "deep," web is more properly termed the internet beyond the web. The web is that part of the internet that is accessible via the HTTP protocol; not everything that can be reached on the network is ultimately web-accessible.

Google's "Scholar" service is founded on access to off-web sources whose owners have allowed Google to search and index them. Hits obtained via Google Scholar typically link to at least an abstract of the target text and sometimes to the full text. Conditions for access to the texts remain in the control of the copyright owners, and access might be restricted or available only to subscription holders. Scholar also includes links to articles citing the article that was the subject of the search query.

Complete Planet, which bills itself as "the deep web directory," claims to present resources gleaned from 70,000 databases and specialized search engines. A white paper at the company's website argues that ordinary web searching is limited by failure to retain information once a search is repeated (and in effect superseded) as a next round of inquiry. The key is instead to keep search results in an index for future retrieval, and this is in essence how Complete Planet--and other deep searches--operate.

Basic reference sources, long familiar in printed form, have moved to the internet, where they are web-approachable but available by subscription only. Merriam-Webster dictionaries and the Encyclopedia Britannica provide web travelers with a glimpse of the resources offered, just enough to serve as an advertisement.

Meanwhile, offers a sample of online human language translation services. Its brief, machine-processed translations of submitted text invite--for a fee--submissions of larger text sections, more refined translations, or even a translation written by a person.

The premium sources of scholarly knowledge online are the commercial indices and bibliographies typically available to students and researchers through academic libraries. LexisNexis combines various publishing brands that dominate the markets for legal information. EBSCO Information Services is a major aggregator of publications in print and electronic formats for libraries. Its Academic Search Premier product is one of the most widely subscribed collections of abstracted, indexed, and full-text journals. It claims indexing for more than 8,000 publications, of which 6,800 are abstracted and indexed and 4,500 are full text. Thomson Gale, another major reference and full-text provider, offers among its many products Academic ASAP, a database spanning most liberal arts disciplines and offering full text from 600 publications.

The Multimedia Educational Resource for Learning and Online Teaching is a free-membership catalog of online learning materials contributed by subscribers. At its center, MERLOT is a collection of metadata documenting digital resources available for adoption and shared use. It also offers technology, design, and policy guidelines to help standardize and improve faculty-developed materials.

Scholarly research today begins with the web but quickly branches into wider and deeper domains of information. Technologies to search, index, rank, and retain texts--or at least the means for potential access to them--are in rapid development, frequently subject to the secrecy and volatility of the commercial marketplace.

Many of the major players in online information are familiar from the print-publication industry and hold increasingly concentrated control over formerly numerous sources of information. Students and faculty have unprecedented, unmediated access to overwhelming quantities of information. Librarians and other IT professionals attempt to assist scholars in coping with this bounty, but how effective their guidance will prove is as yet unknown. Research has gone paperless; figuring out how to function effectively in this environment is a work still in progress.

Tom Warger is a consulting principal for Edutech International (
