Research areas in science, particularly those at the cutting edge of their fields, are characterized by intense communication between scientists. This communication takes many forms, both formal and informal, but prominent among them are citations from one scientist's work to another's. Patterns of citation reflect a fine-grained selection process: they show how scientists build on each other's work and how those works relate to one another. Such patterns can be used to create a picture of the state of a specific research area in terms of the papers that constitute its core of seminal work.

The procedure to accomplish this in Essential Science Indicators is called Research Front analysis. It is based on identifying the most-cited papers across multiple disciplines over a five-year period, and then determining how often these papers have been jointly cited; that is, how often, in the footnotes or references of a given paper, a citation to one highly cited item is accompanied by a citation to another highly cited item. This count defines the co-citation frequency of the two highly cited papers.
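
The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Essential Science Indicators implementation: the function name count_cocitations, the toy reference lists, and the set of highly cited papers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cocitations(reference_lists, highly_cited):
    """Count how often each pair of highly cited papers appears together
    in the reference list of the same citing paper."""
    cocitations = Counter()
    for refs in reference_lists:
        # Keep only the highly cited items, deduplicated and sorted so that
        # every pair has a single canonical (a, b) ordering.
        cited = sorted(set(refs) & highly_cited)
        for pair in combinations(cited, 2):
            cocitations[pair] += 1
    return cocitations


# Hypothetical toy data: each inner list is one citing paper's references.
reference_lists = [
    ["A", "B", "X"],
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "Y"],
]
highly_cited = {"A", "B", "C"}

cocitations = count_cocitations(reference_lists, highly_cited)
for pair, count in sorted(cocitations.items()):
    print(pair, count)
```

In this toy data, papers A and B are cited together by two citing papers, so their integer co-citation frequency is 2.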

Identifying research fronts involves grouping together the co-cited papers that are strongly related. Before this grouping begins, a threshold is applied to the integer co-citation frequencies to eliminate very low values, and the remaining frequencies are converted to a normalized form using the following formula:

Normalized co-citation of A and B = (integer co-citation frequency of A and B) / sqrt(citation frequency of A × citation frequency of B)

In other words, we divide the co-citation frequency by the square root of the product of the citation frequencies of the two papers. A second threshold is set on these normalized values. In the most recent data run for Essential Science Indicators, the integer threshold was set to accept co-citation frequencies of 2 or greater, and the normalized threshold was set at 0.3.
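
As a rough sketch of the two thresholds, the normalization and the filtering could be written as follows. The function and parameter names are hypothetical, and the code assumes both thresholds are inclusive (a value exactly at the threshold is accepted); the text states this for the integer threshold but does not say so explicitly for the normalized one.

```python
import math


def normalized_cocitation(cocitation_ab, citations_a, citations_b):
    """Co-citation frequency divided by the square root of the product
    of the two papers' citation frequencies."""
    return cocitation_ab / math.sqrt(citations_a * citations_b)


def passes_thresholds(cocitation_ab, citations_a, citations_b,
                      min_integer=2, min_normalized=0.3):
    """Apply the integer threshold first, then the normalized threshold.
    Assumption: both comparisons are inclusive."""
    if cocitation_ab < min_integer:
        return False
    strength = normalized_cocitation(cocitation_ab, citations_a, citations_b)
    return strength >= min_normalized


# Hypothetical example: A is cited 100 times, B 64 times, and the two
# are cited together 40 times, so the strength is 40 / sqrt(6400) = 0.5.
strength = normalized_cocitation(40, 100, 64)
print(strength)
print(passes_thresholds(40, 100, 64))
```

The normalization keeps two very heavily cited papers from being linked merely because each appears in many reference lists: a pair co-cited twice, with 100 citations each, meets the integer threshold but has a normalized value of only 0.02.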

Starting with a co-cited pair that meets both thresholds, the grouping procedure finds other qualifying pairs that share a paper with those already gathered. The gathering process continues until no other pairs of papers can be added to the set. This process is commonly known as single-link clustering. The resulting clusters vary in size from a minimum of two papers to some maximum size.
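
One way to sketch this grouping is to treat each qualifying pair as a link and collect the papers that are connected through chains of links. The example below uses a union-find structure for this, which is a choice made here for illustration and is not stated in the text; the function name and the sample pairs are hypothetical.

```python
def single_link_clusters(pairs):
    """Group papers into clusters by chaining together pairs that share
    a paper. `pairs` holds the co-cited pairs that met both thresholds."""
    parent = {}

    def find(paper):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        parent.setdefault(paper, paper)
        while parent[paper] != paper:
            parent[paper] = parent[parent[paper]]
            paper = parent[paper]
        return paper

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), []).append(paper)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical qualifying pairs: A-B and B-C share paper B, so A, B, and C
# end up in one cluster; D-E forms a separate cluster of minimum size.
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
clusters = single_link_clusters(pairs)
print(clusters)
```

Note that A and C land in the same cluster even though they are not directly linked; a single chain of qualifying pairs is enough, which is the defining property of single-link clustering.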

The numeric attributes of fronts can help determine the significance of the areas and their stage of development. The number of core papers in the front and the total citations received indicate the size of the area. The number of citations per core paper indicates the focus or concentration of effort. The average publication year and the distribution of core papers by year indicate currency or "hotness"; that is, how quickly research is changing and whether there are new developments. An analysis of frequently occurring keywords or phrases in the titles of the papers, as given by the front name, can indicate the subject content and thematic focus of the area.
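
The numeric attributes listed above are simple aggregates over a front's core papers. A minimal sketch, assuming each core paper is represented as a (publication year, citation count) tuple; the function name, the dictionary keys, and the sample figures are hypothetical.

```python
from collections import Counter
from statistics import mean


def front_summary(core_papers):
    """Summarize one research front.
    `core_papers` is a list of (publication_year, citation_count) tuples."""
    years = [year for year, _ in core_papers]
    total_citations = sum(citations for _, citations in core_papers)
    return {
        "core_papers": len(core_papers),            # size of the area
        "total_citations": total_citations,         # size of the area
        "citations_per_paper": total_citations / len(core_papers),  # focus
        "mean_year": mean(years),                   # currency
        "papers_by_year": dict(Counter(years)),     # currency
    }


# Hypothetical front with four core papers.
front = [(2019, 120), (2020, 80), (2020, 60), (2021, 40)]
summary = front_summary(front)
print(summary)
```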

Research front analysis will not identify all research areas or all the papers in an area. However, it can assist in identifying areas where important work is being done and where the scientific community is focusing its attention.

