## Monthly Archives: July 2014

## INTERNATIONAL KNOWLEDGE FLOWS: RESULTS 4

Also noticeable in the cumulative probabilities is that the U.S. tends to both make and receive more citations than other countries. Note, for example, that the entry in the U.S. column contains the largest figure other than the diagonal in every row, and the U.S. row contains the largest figure other than the diagonal in every column except Germany. This result could be driven by differences between the U.S. and other countries in the propensity to patent. If the U.S. has a low propensity to patent, then each patent granted represents (on average) a larger chunk of knowledge, which could result in more citations made and received per patent (Caballero and Jaffe, 1993).

It is more likely, however, that the propensity of U.S. inventors to patent in the U.S. is greater than that of foreigners (Eaton and Kortum, 1996). That is, U.S. inventors are more likely to take out a U.S. patent on a trivial invention than are foreigners. All else equal, this should make the average citation rate to and from the U.S.-invented patents lower than the corresponding rates for foreign-invented patents. Since we find the opposite, this may be evidence confirming a view of the U.S. as the most open and interconnected economic and technological system.

## INTERNATIONAL KNOWLEDGE FLOWS: RESULTS 3

Geographic localization is also evident in the $\beta_1$ parameters, presented in the middle panel of Table 3 in the form of the estimated modal lag. Here the diagonal elements are generally the smallest entry in each row and column, meaning modal citation lags are noticeably shorter for domestic citations, relative to citations to and from others. The only exception to this general pattern is the U.S.: U.S. inventors are slightly faster to cite Japanese inventors than they are to cite U.S. inventors ($\beta_1 = 1.05$), and Japanese inventors are faster to cite U.S. inventors than are U.S. inventors ($\beta_1 = 1.19$).

## INTERNATIONAL KNOWLEDGE FLOWS: RESULTS 2

Table 3 presents the estimates of the $\alpha$ and $\beta_1$ parameters in several different ways. The top panel simply reproduces the $\alpha$ estimates presented in Appendix B, but arrays them in matrix form. The second panel presents the estimated values (with standard errors) in terms of $1/\beta_1$, which has years as units and is equal to the lag at which the citation frequency reaches its maximum value. The bottom panel presents estimated values (with standard errors) for $\alpha\beta_2/(\beta_1)^2$, which is the integral of the citation function from $t=0$ to infinity. This is an estimate of the expected number of citations that a single patent will receive from a set of patents consisting of one random patent per year forever. Thus the middle panel of the table measures the “speed” of citation diffusion and the bottom panel measures the overall intensity of citation.
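As a rough numerical check on these two summary measures, the sketch below assumes the double-exponential form $\alpha e^{-\beta_1 \cdot \text{lag}}(1 - e^{-\beta_2 \cdot \text{lag}})$ for the citation function and uses made-up parameter values, not estimates from Table 3. Under that assumed form, the closed-form peak and integral are closely approximated by $1/\beta_1$ and $\alpha\beta_2/(\beta_1)^2$ when $\beta_2$ is small relative to $\beta_1$. The function names are illustrative.

```python
import math

def citation_frequency(lag, alpha, beta1, beta2):
    """Assumed double-exponential citation function (same-class dummy omitted)."""
    return alpha * math.exp(-beta1 * lag) * (1.0 - math.exp(-beta2 * lag))

def modal_lag(beta1, beta2):
    """Lag at which the assumed function peaks (derivative set to zero)."""
    return math.log(1.0 + beta2 / beta1) / beta2

def total_citations(alpha, beta1, beta2):
    """Integral of the assumed function over lags from 0 to infinity."""
    return alpha * beta2 / (beta1 * (beta1 + beta2))

# Illustrative values only; these are NOT estimates from the paper.
alpha, beta1, beta2 = 1.0, 0.2, 1e-4

exact_mode = modal_lag(beta1, beta2)
approx_mode = 1.0 / beta1                  # the "speed" measure in the text
exact_total = total_citations(alpha, beta1, beta2)
approx_total = alpha * beta2 / beta1 ** 2  # the "intensity" measure in the text

print("modal lag:", exact_mode, "vs 1/beta1:", approx_mode)
print("integral:", exact_total, "vs alpha*beta2/beta1^2:", approx_total)
```

The gap between the closed-form values and the two summary expressions grows as $\beta_2$ becomes large relative to $\beta_1$, so the comparison is only informative for parameter ranges like the one assumed here.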

## INTERNATIONAL KNOWLEDGE FLOWS: RESULTS

In order to focus on spillovers, we concentrate on the results exclusive of self-citations, but we comment briefly on the very high degree of localization of self-citations.

We estimate Eq. 3 by non-linear least squares. Since the left-hand variable is an empirical frequency, the model is heteroskedastic. To improve efficiency and get the right standard errors, we weight the observations by the reciprocal of the estimated variance. In general, this weighting greatly improves the fit of the model, but does not alter the parameter estimates materially.
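To illustrate the weighting idea in a simplified setting, the sketch below holds $\beta_1$ and $\beta_2$ fixed so that the model is linear in $\alpha$, which gives weighted least squares a closed form. The weights assume that the variance of each cell's empirical frequency is inversely proportional to the number of patent pairs in the cell. That assumption, the function names, and the data are all hypothetical, and the full non-linear estimation is not attempted.

```python
import math

def lag_shape(lag, beta1, beta2):
    """Lag profile of an assumed double-exponential citation function."""
    return math.exp(-beta1 * lag) * (1.0 - math.exp(-beta2 * lag))

def weighted_alpha(cells, beta1, beta2):
    """Weighted least-squares estimate of alpha with beta1 and beta2 held fixed.

    Each cell is (lag, observed_frequency, n_pairs). The weight is n_pairs,
    i.e. the reciprocal of a variance ASSUMED proportional to 1 / n_pairs.
    """
    numerator = 0.0
    denominator = 0.0
    for lag, freq, n_pairs in cells:
        shape = lag_shape(lag, beta1, beta2)
        numerator += n_pairs * freq * shape
        denominator += n_pairs * shape * shape
    return numerator / denominator

# Hypothetical cells generated from alpha = 2, then one small cell is distorted.
beta1, beta2, true_alpha = 0.2, 0.3, 2.0
cells = [(lag, true_alpha * lag_shape(lag, beta1, beta2), n)
         for lag, n in [(1, 5000), (5, 20000), (10, 800)]]
cells[2] = (10, cells[2][1] * 1.5, 800)  # noisy observation in the smallest cell

alpha_weighted = weighted_alpha(cells, beta1, beta2)
alpha_unweighted = weighted_alpha([(l, f, 1) for l, f, _ in cells], beta1, beta2)
print(alpha_weighted, alpha_unweighted)
```

In this toy setup the weighted estimate stays closer to the value used to generate the data, because the distorted observation comes from the cell with the fewest patent pairs and therefore receives the least weight.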

## INTERNATIONAL KNOWLEDGE FLOWS: Econometric issues and interpretation 3

Variations in $\beta_1$ (by attributes of either the cited or the citing patents) imply differences in the timing of citations across categories of patents. Higher values of $\beta_1$ mean higher rates of decay, which pull the citation function downward and leftward. In other words, the likelihood of citations would be lower everywhere for higher $\beta_1$, and would peak earlier on. Thus a higher $\alpha$ means more citations at all lags; a lower $\beta_1$ means more citations at later lags.

When both $\alpha$ and $\beta_1$ vary, the citation function can shift upward at some lags while shifting downward at others. For example, if $\alpha$ for citations from Japan to Japan is 2.32 and the $\beta_1$ for Japan to Japan is 1.54, this implies that the likelihood of citation in early years is higher than the base group, but because of the higher $\beta_1$, this difference fades over time. Because obsolescence is compounded over time, differences in $\beta_1$ eventually result in large differences in the citation frequency. If we compute the ratio of the likelihood of citations for Japan-to-Japan relative to U.S.-to-U.S. using these parameters, we find that one year after being granted, Japan-to-Japan citations are about twice as likely as U.S.-to-U.S., but nine years down the road the frequencies for the two groups are about the same, and at a lag of 20 years Japan-to-Japan citations are actually about 70% less likely than for the base category.
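The arithmetic behind this comparison can be sketched as follows. The sketch assumes that the reported $\alpha$ and $\beta_1$ are multiples of the base group's values (so the base group has both equal to one), that $\beta_2$ is common to both groups and cancels from the ratio, and that the base group's decay rate is 0.2 per year. This last number is a placeholder chosen for illustration, not a value stated in the text.

```python
import math

def relative_citation_frequency(lag, alpha_rel, beta1_rel, base_beta1):
    """Ratio of a group's citation frequency to the base group's at a given lag.

    With a common beta2 the diffusion term cancels, so the ratio reduces to
    alpha_rel * exp(-(beta1_rel - 1) * base_beta1 * lag).
    """
    return alpha_rel * math.exp(-(beta1_rel - 1.0) * base_beta1 * lag)

BASE_BETA1 = 0.2  # hypothetical base-group decay rate (assumption, not from the text)

ratios = {lag: relative_citation_frequency(lag, 2.32, 1.54, BASE_BETA1)
          for lag in (1, 9, 20)}
for lag, ratio in ratios.items():
    print(f"lag {lag:2d} years: Japan-to-Japan / U.S.-to-U.S. = {ratio:.2f}")
```

With these placeholder inputs the ratio starts near 2, falls to roughly 1 within a decade, and is well below one-half at a 20-year lag, which is the qualitative pattern described above. The exact crossover lag depends on the assumed base decay rate.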

## INTERNATIONAL KNOWLEDGE FLOWS: Econometric issues and interpretation 2

The estimate of any particular $\alpha(k)$, say $\alpha(g=\text{Chemical})$, is a proportionality factor measuring the extent to which the patents in the Chemical field are more or less likely to be cited over time vis-à-vis patents in the base category (Drugs). Thus, an estimate of $\alpha(g=\text{Chemical}) = 1.5$ means that the likelihood that a patent in the field of Chemicals will receive a citation is 50% higher than the likelihood of a patent in the base category, controlling for other factors. Notice that this is true across all lags; we can think of an $\alpha$ greater than unity as meaning that the citation function is shifted upward proportionately, relative to the base group. Hence the integral over time (i.e., the total number of citations per patent) will also be 50% larger. Similarly, if $\alpha(\ell=\text{Japan}, L=\text{U.S.})$ is .72, this means that a Japanese patent is 28% less likely to get a citation from a random U.S. patent than is a random U.S. patent.
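As a short worked check, assume the citation function takes the proportional form $\alpha \cdot f(\text{lag})$, with the same lag profile $f$ for both groups (the symbol $f$ is introduced here only for illustration). The constant ratio and the 50% larger total then follow directly:

$$\frac{p_{\text{Chemical}}(\text{lag})}{p_{\text{base}}(\text{lag})} = \frac{1.5\, f(\text{lag})}{f(\text{lag})} = 1.5 \quad \text{for all lags}, \qquad \int_0^\infty 1.5\, f(s)\, ds = 1.5 \int_0^\infty f(s)\, ds.$$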

We can think of the overall citation intensity measured by variations in $\alpha$ as composed of two parts. Citation intensity is the product of the “fertility” (Caballero and Jaffe, 1993) or “importance” (Trajtenberg, Henderson and Jaffe, 1997) of the underlying ideas in spawning future technological developments, and the average “size” of a patent, i.e., how much of the unobservable advance of knowledge is packaged in a typical patent. Within the formulation of this paper, however, it is not possible to decompose the $\alpha$-effects into these two components.

## INTERNATIONAL KNOWLEDGE FLOWS: Econometric issues and interpretation

The first specification issue to consider is the difficulty of estimating effects associated with cited year, citing year and lag. This is analogous to estimating “vintage,” time, and age effects in a wage or a hedonic price model. If lag (our “age” effect) entered the model linearly, then it would be impossible to estimate all three effects. Given that lag enters our model non-linearly, all three effects are, in principle, identified. In practice, however, we found that we could not get the model to converge with the double-exponential lag function and separate $\alpha$ parameters for each cited year and each citing year. We were, however, able to estimate a model in which cited years are grouped into five-year periods, indexed by $p$. Hence we assume that $\alpha(t)$ is constant over $t$ within these periods, but allow the periods to differ from each other.
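To see why a linear lag term would not be identified, consider a hypothetical linear specification with coefficients $a$, $b$ and $c$ (symbols introduced here purely for illustration) on cited year $t$, citing year $T$, and lag $T - t$:

$$a\,t + b\,T + c\,(T - t) = (a - c)\,t + (b + c)\,T$$

Only the two combinations $a - c$ and $b + c$ can be recovered from the data, so the three separate effects cannot be disentangled unless the lag enters non-linearly.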

## INTERNATIONAL KNOWLEDGE FLOWS: MODELING 3

Expected citation count for “cells”

Consider a potentially cited patent with particular $t$, $\ell$, $g$ attributes, e.g., a Japanese-invented patent in the Drug and Medical area granted in 1985. The expected number of citations that this patent will receive from a particular patent with a given $T$, $L$ combination (e.g., a British patent granted in 1993 that happens to be in the same patent class) is just the above likelihood, as a function of $\ell$, $t$, $g$, $T$, $L$ and $D(k,K)$. The expected number of citations from all patents with a given $T$, $L$ combination is found by summing the frequency shown in Eq. 1 over all such patents. Similarly, the expected total number of citations to all patents with the particular $\ell$, $t$, $g$ combination will be found by summing over all such patents. The only tricky part of this double summation is dealing with $D(k,K)$. We show in Appendix A that one can start from Eq. 1 and aggregate to derive a relationship for “cells” identified by $\ell$, $t$, $g$, $T$ and $L$, where the dependent variable is the expected frequency of citation $p_{\ell t g T L}$, i.e., the ratio of the number of citations to the product of the number of potentially citing and potentially cited patents. In expectation, this frequency is a function of the characteristics of $k$ and $K$, and the variable:

## INTERNATIONAL KNOWLEDGE FLOWS: MODELING 2

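The citation frequency (the likelihood that any particular patent $K$ granted in year $T$ will cite some particular patent $k$ granted in year $t$) is assumed to be determined by the combination of an exponential process by which knowledge diffuses and a second exponential process by which knowledge becomes obsolete. That is:

$$p(k,K) = \left[1 + \gamma D(k,K)\right]\,\alpha(k,K)\,\exp\left[-\beta_1(k,K)\,(T-t)\right]\left[1 - \exp\left(-\beta_2\,(T-t)\right)\right] \qquad (1)$$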

where $\beta_1$ determines the rate of obsolescence and $\beta_2$ determines the rate of diffusion. The parameter $\alpha$ is a shift parameter that depends on the attributes of both the patent $k$ and the patent $K$. $D(k,K)$ is a dummy variable, set equal to unity if the patent $k$ is in the same patent class as the patent $K$, and zero otherwise. Thus, the parameter $\gamma$ measures the overall increase in citation frequency associated with the two patents matching by patent class. The dependence of the parameters $\alpha$ and $\beta_1$ on $k$ and $K$ is meant to indicate that these could be functions of certain attributes of both the cited and citing patents. In this paper, we consider the following as attributes of the cited patent $k$ that might affect its citation frequency:

## INTERNATIONAL KNOWLEDGE FLOWS: MODELING

Patent-pair citation frequencies

We seek to model the citation frequencies described in Section II above, the way in which these frequencies evolve over time, and how they are affected by characteristics of the citing and cited patent. One way to approach this would be with a probit-type model, in which each citation is an observation, and the regression dataset is created by combining the actual citations with a random sample of patent pairs that did not cite each other. One could then ask how the predicted probability that a patent pair will result in a citation is affected by various regressor variables.
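A minimal sketch of how such a regression dataset might be assembled, assuming patents are represented by integer IDs and citations by (citing, cited) pairs. The function name, the toy data, and the rule that a citing patent must have a larger ID than the cited one (standing in for a later grant date) are all illustrative assumptions. The probit estimation itself is not shown.

```python
import random

def build_pair_sample(patent_ids, citations, n_controls, seed=0):
    """Combine actual citing pairs with randomly drawn non-citing pairs.

    Returns rows of (citing_id, cited_id, cited_flag), where cited_flag is 1
    for an actual citation and 0 for a sampled pair that did not cite.
    """
    rng = random.Random(seed)
    cited_pairs = set(citations)
    rows = [(citing, cited, 1) for citing, cited in sorted(cited_pairs)]

    # Enumerate eligible non-citing pairs. This is quadratic in the number of
    # patents, which is fine for a toy example but not for real patent data.
    candidates = [(a, b) for a in patent_ids for b in patent_ids
                  if a > b and (a, b) not in cited_pairs]
    controls = rng.sample(candidates, min(n_controls, len(candidates)))
    rows.extend((citing, cited, 0) for citing, cited in controls)
    return rows

# Toy data: ten patents and three citations (hypothetical).
patents = list(range(1, 11))
citations = [(5, 1), (7, 3), (9, 5)]
rows = build_pair_sample(patents, citations, n_controls=6)
for row in rows:
    print(row)
```

Each row could then be joined to attributes of the citing and cited patents (grant years, countries, technological fields) to form the regressors.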