## INTERNATIONAL KNOWLEDGE FLOWS: MODELING 3

Expected citation count for “cells”

Consider a potentially cited patent with particular i, t, g attributes, e.g., a Japanese-invented patent in the Drug and Medical area granted in 1985. The expected number of citations that this patent will receive from a particular patent with a given T, L combination (e.g., a British patent granted in 1993 that happens to be in the same patent class) is just the above likelihood, as a function of i, t, g, T, L and D(k,K). The expected number of citations from all patents with a given T, L combination is found by summing the frequency shown in Eq. (1) over all such patents. Similarly, the expected total number of citations to all patents with the particular i, t, g combination is found by summing over all such patents. The only tricky part of this double summation is dealing with D(k,K). We show in Appendix A that one can start from Eq. (1) and aggregate to derive a relationship for “cells” identified by i, t, g, T and L, where the dependent variable is the expected citation frequency $p_{itgTL}$, i.e., the ratio of the number of citations to the product of the number of potentially citing and potentially cited patents. In expectation, this frequency is a function of the characteristics of k and K, and the variable:

$$\mathrm{PROX}_{itgTL} = \sum_{s} f_{itgs}\, f_{TLs}$$

where $f_{itgs}$ is the fraction of potentially cited patents in patent class s and $f_{TLs}$ is the fraction of potentially citing patents in patent class s. PROX measures the extent to which the potentially citing and potentially cited patents overlap in their patent class distribution.⁸ It is closely related to the technological proximity measure of firms used in Jaffe (1986). This brings us to the following equation:

$$p_{itgTL} = \left[1 + \gamma\,\mathrm{PROX}_{itgTL}\right]\alpha_{itgTL}\,\exp\!\left[-(\beta_1)_{itgTL}(T-t)\right]\left[1-\exp\!\left(-\beta_2 (T-t)\right)\right] + \varepsilon_{itgTL}$$

which can be estimated by non-linear least squares if the error $\varepsilon_{itgTL}$ is well-behaved.
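
To make the PROX measure concrete, the following is a minimal sketch of how it could be computed for one cell. It assumes that D(k,K) is an indicator equal to one when the cited and citing patents share a patent class, so that averaging it over all cited-citing pairs gives the sum over classes of the product of the two class shares. The function names and the toy class labels are hypothetical and do not come from the data used here.

```python
from collections import Counter


def class_shares(patent_classes):
    """Return {class: fraction of patents in that class} for a non-empty group."""
    counts = Counter(patent_classes)
    total = len(patent_classes)
    return {s: n / total for s, n in counts.items()}


def prox(cited_classes, citing_classes):
    """Sum over classes s of (cited share in s) * (citing share in s)."""
    f_cited = class_shares(cited_classes)
    f_citing = class_shares(citing_classes)
    # Classes absent from the citing group contribute zero.
    return sum(share * f_citing.get(s, 0.0) for s, share in f_cited.items())


def same_class_pair_share(cited_classes, citing_classes):
    """Brute-force check: share of (cited, citing) pairs in the same class."""
    pairs = len(cited_classes) * len(citing_classes)
    same = sum(a == b for a in cited_classes for b in citing_classes)
    return same / pairs


# Hypothetical class labels for the patents in one cell (assumption, not real data).
cited = ["A", "A", "B", "C"]
citing = ["A", "B", "B", "B", "D"]

print(prox(cited, citing))
print(same_class_pair_share(cited, citing))
```

Under the stated assumption the two functions return the same value, which is why the pairwise term D(k,K) can be replaced by a single cell-level variable after aggregation.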

The data set consists of one observation for each feasible combination of values of i, t, g, T and L. Since t runs from 1963 to 1993 and T runs from (the greater of 1977 and t+1) to 1994, the number of cells for each i, g, L combination is 14×18 + (17+16+15+14+13+…+1) = 405. There are 125 i, g, L combinations, so the total number of cells is 50,625. Simple statistics for this data set are presented in Table 2. The average number of cited patents in a cell is about 1800; the minimum is 16 (French Drug and Medical patents in one particular year) and the maximum is almost 15,000 (U.S. Mechanical patents in one particular year). The number of citations varies from 0 to over 6000 with a mean of about 100; the mean of the citation frequency is about 4×10⁻⁶.
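
The cell count can be checked by direct enumeration. The sketch below uses only the year ranges and the 125 combinations stated in the text; the function names are illustrative.

```python
def feasible_citing_years(t, first_citing=1977, last_citing=1994):
    """Citing years T for a cited year t: from max(1977, t + 1) through 1994."""
    return range(max(first_citing, t + 1), last_citing + 1)


def cells_per_combination(first_cited=1963, last_cited=1993):
    """Number of feasible (t, T) pairs for one i, g, L combination."""
    return sum(
        len(feasible_citing_years(t))
        for t in range(first_cited, last_cited + 1)
    )


per_combination = cells_per_combination()  # 14*18 + (17 + 16 + ... + 1)
total_cells = 125 * per_combination        # 125 i, g, L combinations

print(per_combination, total_cells)  # 405 50625
```

The 14 cited years from 1963 to 1976 each pair with all 18 citing years, while the cited years from 1977 to 1993 pair with 17 down to 1 citing years, giving 252 + 153 = 405 cells per combination.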