Caution: This is an informal essay, a draft; conclusions in different parts of this text may contradict one another. This is 'thinking aloud'/doodling with ideas.
The PRN technique is used by the GPS (Global Positioning System), some cell radios, and by covert (spy) agencies. Pseudo-Random Noise phase-shift-keying has some very desirable noise-suppression properties that make it a natural candidate for use in communication by ETI. It allows low-power transmissions to cut through noise, thus reducing the amount of power a transmitter needs in order to be heard. For example, the GPS encoding system allows GPS satellites to transmit at power levels 30 dB below what they might 'otherwise' have to transmit. This is a huge amount of savings: three orders of magnitude. Of course, nothing is free: the penalty one pays is a vastly reduced (three orders of magnitude) data rate. The GPS system has a 50-bits-per-second data rate, despite chewing up about 5 MHz of bandwidth. But that can be OK: 50 BPS is adequate for GPS, and, I would imagine, would be more than adequate for ETI transmissions.
I cannot over-emphasize the noise-suppression characteristics of pseudo-random encoding. The GPS encoding allows the signal level at the receiving antenna to be 30 dB below the thermal noise floor at the antenna. That is, if you used a simple amplifier hooked up to a GPS antenna, all you would see, if you looked at the power level, would be pure, unadulterated thermal noise. Yet there's a signal in there, and it's 30 dB down. To the uninitiated, this sounds impossible, a contradiction, free energy. But all GPS antennas operate in exactly this way, and the system was designed to work this way on purpose. There is an easy calculation one can perform: compute the thermal noise floor for a transistor at room temperature over a 5 MHz bandwidth. It's about -105 dBm. You can look up the SIS (signal-in-space) figures from the GPS spec, ICD-200C, and see that it's about -130 dBm. The fact that such a weak signal is receivable is an offshoot of the Shannon-Hartley theorem, which relates channel capacity to bandwidth in the presence of noise.
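As a sanity check on these round numbers, here is a minimal Python snippet; the 290 K front end, 5 MHz bandwidth, and -130 dBm figure are just the assumptions quoted above:

    import math

    k_B = 1.380649e-23      # Boltzmann constant, J/K
    T   = 290.0             # assumed room-temperature front end, K
    B   = 5e6               # assumed pre-correlation bandwidth, Hz

    noise_w    = k_B * T * B                      # thermal (Johnson) noise power, W
    noise_dbm  = 10 * math.log10(noise_w / 1e-3)  # convert W to dBm
    signal_dbm = -130.0                           # approximate GPS signal-in-space power

    print(f"thermal noise floor: {noise_dbm:.1f} dBm")          # about -107 dBm
    print(f"signal below noise:  {noise_dbm - signal_dbm:.1f} dB")
    # roughly 20-25 dB with these round numbers; the ~30 dB process gain of the
    # 1023-chip code is what digs the signal back out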
It is in fact this property of being under the thermal noise floor that makes PRN modulation such an appealing thing for spy agencies. If a spy transmits a signal using this technique, and the counter-spy does not know the secret encoding key, the signal cannot be received and decoded. (Well, it can in theory, but in practice it becomes very, very hard.) The encoding keys can be arbitrarily long and cryptographically secure, preventing the counter-spies from detecting the transmission. Indeed, a variant of this is used by the military to encode the GPS P/Y signal, preventing non-military users from getting the more accurate positions that the system is capable of providing.
It is outside the scope of this essay to explain how PRN encoding works; it is carefully detailed in textbooks and papers. However, a brief description is in order. One starts by choosing a 'good, random' string of bits to act as the encryption key. In the case of GPS, this key is 1023 bits long. The key has nearly equal numbers of ones and zeros in it, and the distribution of pairs of ones, pairs of zeros, etc. falls off rapidly, as it would in a random (white-noise) spectrum. Each GPS satellite has its own unique key; the keys are published and well known, and serve to identify the transmitting satellite. The signal is transmitted by repeating this key over and over. For GPS, each bit lasts slightly less than a microsecond, so that the pattern repeats exactly every millisecond, for a 1.023 MHz 'carrier' (which is then used to PSK the 1.575 GHz radio signal). To modulate the 'carrier', the ones and zeros in the pattern are inverted. For GPS, this is done every 20 repeats (for a 50 BPS signal), and in WAAS, every 2 repeats. To find a signal, one searches for the particular pattern of bits by convolving the received radio signal with the known key of a given GPS satellite. The integration represented by the convolution has the effect of smoothing out all noise that is not an exact match for the transmitted pattern. The fact that the integration takes place over 1023 bits is what gives the 30 dB process gain (although even more gain can be had by integrating over the 20-millisecond bit period; but this is descending into the details). Note, however, how this technique 'wastes bandwidth': a 1.023 MHz carrier is used to encode a 50 (500 for WAAS) bit-per-second signal. If this were plain old AM or FM radio, one would need only 50 Hz to transmit a 50 BPS signal. Three orders of magnitude of bandwidth are traded for three orders of magnitude of noise suppression. It is the use of a 'pseudo-random' key that provides the conversion of bandwidth into SNR process gain.
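To make the convolution idea concrete, here is a toy despreading sketch (Python/numpy). It is not the real GPS signal structure; the random code, the number of repeated code periods, and the 20 dB noise excess are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1023                                   # chips per code period (GPS-like length)
    repeats = 10                               # integrate over several code periods
    code = rng.choice([-1.0, 1.0], size=N)     # a stand-in for a 'good' PRN key

    true_shift = 417                           # code-phase alignment, unknown to the receiver
    tx = np.tile(np.roll(code, true_shift), repeats)
    rx = tx + rng.normal(scale=10.0, size=tx.size)   # noise power ~20 dB above the signal

    template = np.tile(code, repeats)
    scores = np.array([np.dot(rx, np.roll(template, s)) for s in range(N)])
    print("true shift:", true_shift, "  detected shift:", int(np.argmax(scores)))

The correlation peak stands out of the noise only because some ten thousand chips were integrated; that integration is the process gain.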
ETI are well aware of the generic theory of PRN-based PSK encoding schemes. No doubt, they know of some we do not. It is important to stress that PRN encoding is a very 'intellectually natural' way to encode a signal, much as AM radio seems 'obvious' and 'natural' to the beginning radio student. There is nothing about PRN schemes that smells of 'human nature', something bizarre or freaky that only humans want, need or would invent. The cryptographic theory behind PRN is a natural subject in mathematics, and finds pervasive application in computing. It would be as natural to ETI as it is for us.
Of course, if an ETI were broadcasting a signal that it wanted to be heard, it would be pointless to use an excessively long encoding key, since doing so would make it very hard for us to detect the signal. But this is exactly where things get interesting. Short keys are much easier to detect, but offer less noise suppression. In fact, the process gain (in dB) is just ten times the base-10 logarithm of the length of the repeating pseudo-random noise sequence. GPS uses a 1023-bit-long PRN sequence, which is what provides the 30 dB of process gain. Long keys, by contrast, are hard to detect. That's because the detection algorithm requires a convolution of the bit sequence with the radio signal. The convolution will yield zero (or rather, a Gaussian noise distribution) unless the bit sequence is within about half a bit of alignment with the signal, at which point the convolution strength jumps dramatically. Thus, to pick out a 1023-bit pattern, one must search along three dimensions: which key is being used, the bit (phase) alignment of that key against the received signal, and the carrier frequency.
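As a back-of-the-envelope feel for the size of such a search, here is a tiny snippet with GPS-flavoured numbers; all three values are illustrative assumptions, and an ETI search would multiply the key count enormously:

    n_keys       = 32      # GPS publishes 32 C/A codes; an ETI search has vastly more candidates
    n_alignments = 1023    # one trial per chip; half-chip steps would double this
    n_freq_bins  = 40      # e.g. +/- 10 kHz of frequency uncertainty in 500 Hz steps

    print("cells to search:", n_keys * n_alignments * n_freq_bins)   # ~1.3 million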
But all this is doable, especially if you have the world's largest supercomputer. Put yourself in the shoes of an advanced civilization that would be capable of mounting a project to transmit a strong signal. Presumably, they would be technologically advanced by many thousands of years over our own. Presumably, they might still know Moore's law to hold (or not). If we imagine the computing resources that humanity may have in 30 or 100 years, searching for a PRN needle in a thermal-noise haystack might be entirely reasonable. We know this, and, importantly, the ETI know this too. They would use this technology to provide process gain of many orders of magnitude, to cut through the clutter of natural phenomena and to cut through terrestrial interference. They know that we possess (or will soon possess) the computing power needed to find the signal. They know that it is far, far cheaper and easier to gain orders of magnitude of computing power than to build radio dishes that are orders of magnitude larger (or radio transmitters that are orders of magnitude more powerful). ETI may not need to build a terawatt transmitter; far less may suffice. They know this, and we know this, and thus it is natural that they would solve the problem through computing brute force rather than through amplifier brute force. They know that computing brute force will be a lot cheaper for us than radio-reception brute force: they'll choose the economically viable solution.
I should also point out that once one has detected and locked onto a signal, that is, once one knows the key, the carrier frequency, and the phase offset, tracking it is nearly trivial. The only hard part is the search: trying all of the different possibilities and combinations. The actual reception, once the signal has been found, is easy. This is really the key concept being touted here: the search for PRN signals is really hard, because of the combinatorial explosion. However, the use of PRN makes possible extremely long integration periods and thus a tremendous amount of noise suppression. This, in turn, makes it possible for the transmitting civilization to use relatively low power and even (gasp!) an omni-directional antenna! It sidesteps the question of whether the transmitting antenna is aimed at us, and of whether we were looking while they were aiming at us.
I should point out that not all astronomical noise is 'white noise'. As the SETI@Home project ably shows, there are vast amounts of pseudo-CW noise. In the SETI@Home terminology, a possible detection of a narrow-band continuous-wave signal is a 'Gaussian'. They have found many, many transient Gaussians. I call them transient because, when one looks again at the same patch of sky, one almost always doesn't see them again. The origin of these 'Gaussians' is unknown. If they have a purely statistical distribution, then they are just the normal tail-end of statistically distributed noise. If the distribution of these 'false positives' goes beyond the expected random distribution, then one has to look for a serious explanation. Besides ETI, they may come from astronomical sources (novas, neutron stars, etc.) or they may be a particularly subtle form of terrestrial interference. From this experience, I think it might be appropriate to conclude that the cosmos is littered with naturally occurring, narrow-band, CW 'noise', giving searchers such as SETI@Home lots of false-positive hits. By contrast, a PRN-modulated signal would cut through this clutter; it would almost certainly not be naturally occurring. While one might be able to imagine a natural process that radiates narrow-band CW, it is much harder to imagine one that radiates PRN. Such a natural source would have to be governed by some sort of chaotic attractor that happens to have the signature of a fairly complex generating Boolean polynomial. It seems unlikely. So this is the point: PRN can cut through not only white-noise clutter, but also CW clutter. With the exception of GPS and spy-agency use, it should also cut through virtually all man-made interference.
Don't be misled by the above paragraph. The search for PRN signals will also have a combinatorially large explosion of 'false positives'. Because white noise is uncorrelated in the time domain, exploring a combinatorially large space of possible signals will turn up a huge number of combinations where the white noise just happened to add instead of cancelling. The search for PRN is very hard. Its 'good' properties emerge only after the signal is found.
Finally, one should note that ETI use of PRN sequences easily explains the negative results seen so far by other SETI programs (e.g. the Harvard BETA Project). If a carrier were PSK modulated at a chipping rate of a few megahertz (for example), then the 1/2 second integration time used by the BETA project spectrometers would completely wipe out the signal. To detect a PRN signal, one needs not only to perform quadrature over long periods of time, but one needs to perform it while superimposing the same exact PRN sequence as the transmitter. Any other sequence, or a mismatch of the sequence, just wipes out the signal.
Thus, based on these arguments, I believe that it is of utmost importance to begin attempts to search for these types of signals. The search could be considerably more complex, and take much, much longer, than searches made so far. But in a different sense, it seems to have a much, much higher probability of success. It is far more natural. I would love to participate in such an effort.
However, this 200 dB gap can be bridged: if the transmitter were more powerful, if the transmitting antenna were more focused, if the transmitter were closer, and if the receiving antenna were larger and used modern space-communications electronics, then the gap between thermal noise in the receiving antenna and the received signal could drop to some 60 dB, which could be easily achievable with PRN sequences in the million-chip range and a communications data rate of about 1 bit per second. To find which million-chip sequence was being used, and at what carrier frequency, is conceivable using today's technology. In fact, I believe that one could use technology that is available today, purchased with entirely reasonable civilian, non-governmental budgets, and communicate effectively with the nearest stars, were there anybody there to communicate with. To back this claim, more accurate estimates of the various gains and losses are needed.
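For concreteness, the relation being leaned on here is the same one used for GPS above, process gain of roughly 10 * log10(sequence length); a two-line check:

    import math

    for chips in (1023, 1_000_000):
        print(f"{chips:>9} chips -> {10 * math.log10(chips):.0f} dB of process gain")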
The above discussion raises the question of why anyone would transmit such a signal towards earth. There are two important classes of transmitters: those who intentionally transmit towards us, and those whose signal accidentally just happens to splash onto the earth. The intentional transmitters are presumably those who have made a sky survey, identified the nearest, most likely habitable solar systems, and are beaming messages in such a fashion as to make it easy for us to detect them. The second class would be those who are using this technique to communicate with someone else, and the earth just happens to be grazed by their signal. We should presume that such a signal would be intentionally difficult for us to receive and decode, and would be encrypted. An interesting tell-tale would be, though, another signal coming from the opposite direction of the sky!
Needed Estimates:
Note that a modified fast-Fourier technique can be used to quickly search multiple PSK combinations. This insight implies that searching for an FSK (frequency-shift-keyed) signal is roughly as hard as for PSK, since the FFT provides the energy bins which must be summed in every possible combination to find the FSK pattern. Note that the large number of combinations makes this seem like an NP problem, but I think a simple hill-climbing search can be pretty good. First one looks for a 'hill' on a fairly short integration period. One then needs only to compare the 'hill' to where the next repeat of the PRN pattern should appear. If there's no hill there, then nothing: no signal was detected. If there is one, then one needs to examine a few nearest neighbors as well. This is no longer an NP search, this is a polynomial-time search. Right? Or did I confuse myself?
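To sketch what the fast-Fourier trick buys in the PSK case: a single pair of FFTs tests all N circular alignments of a candidate key at once, instead of N separate dot products. A minimal numpy version (the key, the shift, and the noise level are made up):

    import numpy as np

    def correlate_all_shifts(rx, code):
        """Correlation of rx against every circular shift of code, via the FFT."""
        return np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(code))).real

    rng = np.random.default_rng(1)
    N = 1023
    code = rng.choice([-1.0, 1.0], size=N)
    rx = np.roll(code, 300) + rng.normal(scale=3.0, size=N)

    print("detected shift:", int(np.argmax(correlate_all_shifts(rx, code))))   # expected: 300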
The 300MHz bandwidth is the approximate width of the water window. Arguments developed in later sections indicate that a PRN signal is likely to occupy the entire width of the window, thus motivating the use of this number for the noise calculations.
The theorem also makes an important statement about minimum energy levels. Recall that it bounds the capacity of a noisy channel as C = B log2(1 + S/N). The noise power N = B N0 is the product of the receiver bandwidth and the noise density. The signal power S = R Eb is the product of the actual bit rate and the energy per bit. Substituting for S and N, and using the inequality R < C, we get an expression for the minimum energy per transmitted data bit:

    (B/R) (2^(R/B) - 1) < Eb/N0

For the case R << B that we are considering, the above can be expanded into

    ln2 (1 + (R/2B) ln2) N0 < Eb

which defines the minimum energy per transmitted data bit. If the energy per bit is less than this, there is no way, using any encoding scheme, to receive the data. This limit is called the Shannon Limit.
If you don't have a background in signalling theory, but know integrals and sine waves and noise, then you can understand the theorem by thinking about a sine wave added to white noise. To pull out the sine-wave signal, one must integrate/convolve with a sine wave. Think about how long one needs to integrate this combination before the integral of the sine-wave part equals/exceeds the integral of the noise part. Note that the inverse of this time period is, roughly, the channel capacity.
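Here is a small numerical version of that thought experiment; the sample rate, tone frequency, amplitude, and noise level are all arbitrary illustrative values:

    import numpy as np

    rng = np.random.default_rng(4)
    fs, f0, amp, sigma = 10_000, 100.0, 0.05, 1.0   # sample rate, tone, amplitude, noise std (made up)
    for seconds in (0.1, 1.0, 10.0, 100.0):
        t = np.arange(int(fs * seconds)) / fs
        ref = np.sin(2 * np.pi * f0 * t)            # the clean sine wave we correlate against
        rx = amp * ref + rng.normal(scale=sigma, size=t.size)
        signal_part = amp * np.dot(ref, ref)        # what the integral converges toward
        total = np.dot(rx, ref)                     # integral of (signal + noise) times the sine
        print(f"{seconds:6.1f} s: integral = {total:10.1f}   signal part = {signal_part:10.1f}")

For short integrations the noise term dominates; as the integration lengthens, the integral settles onto the sine-wave part.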
For a room-temperature feedhorn, ln2 N0 = ln2 kT = 2.8E-21 Joules is the minimum bit energy. By comparison, the signal received from a one-megawatt transmitter driving an omni-directional antenna 100 light-years distant from us will be about 9E-32 Joules per second per square meter. To bridge this nearly eleven orders of magnitude, we need to hope that the ETI use a signalling rate of 0.01 Hz (for two orders of magnitude), we need a cryogenic feedhorn (for another one-and-a-half orders of magnitude), we need to hope that ETI is using a 10-megawatt transmitter (instead of one) and a directional antenna, and/or that they're closer, and finally, we need a collecting area in the tens-of-thousands-of-square-meters range. Ugh. Only then can we expect the energy per bit to exceed the Shannon Limit imposed by thermal Johnson noise in the preamp. These limits and theoretical considerations apply equally to the simple, Morse-code-like CW modulation that traditional SETI searches look for as they do to a PRN encoding scheme. PRN modulation does not provide a mechanism to get around the Shannon Limit.
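As a rough numeric check of this budget, here is a small sketch; the 1 MW transmitter, isotropic antenna, 100 light-year range, 290 K vs. 4 K feedhorn temperatures, 30,000 m^2 collecting area, and 0.01 bit/s rate are the assumptions from the text, not measured values:

    import math

    k_B = 1.380649e-23            # Boltzmann constant, J/K
    LY  = 9.461e15                # metres per light-year

    def min_bit_energy(T):        # Shannon limit: Eb > N0 ln2 = k T ln2
        return k_B * T * math.log(2)

    def flux(P_tx, distance_m):   # isotropic transmitter: watts per square metre at the receiver
        return P_tx / (4 * math.pi * distance_m ** 2)

    f = flux(1e6, 100 * LY)
    collected = f * 3e4 * 100.0   # flux * collecting area * seconds per bit
    print(f"flux at Earth:       {f:.1e} W/m^2")               # ~9e-32
    print(f"limit at 290 K:      {min_bit_energy(290):.1e} J/bit")
    print(f"limit at 4 K (cryo): {min_bit_energy(4):.1e} J/bit")
    print(f"collected per bit:   {collected:.1e} J/bit")
    # still a couple of orders of magnitude short of even the cryogenic limit; that
    # shortfall is what the 10 MW / directional-antenna / nearer-star hopes above
    # have to make up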
A large collecting area can beef up the collected power to drown out the Johnson noise. However, roughly 10 dB later, the Cosmic Microwave Background (CMB) kicks in, and simply increasing dish size does not change the amount of CMB received. The amount of extra CMB power received by the larger dish is exactly negated by the reduced CMB power due to the sharper focus of the dish. The dish area and the diffraction-limited solid angle vary in inverse proportion (for a diffraction-limited antenna, the effective area and the beam solid angle satisfy A_eff x Omega ~ lambda^2), and cancel out in the black-body noise formula. The total noise power is independent of the dish size. The Shannon Limit applies to all sources of noise in a communications channel: it applies to the CMB as well. The best way to minimize CMB in the communications channel is to make the solid angle as small as possible for a given collecting area size, using synthetic aperture/phased array interferometer techniques. A good way to understand this is to imagine building a traditional imaging radio astronomy telescope. Each pixel picks up a chunk of the CMB noise for that part of the sky. Making each pixel smaller (increasing the angular resolution) decreases the amount of sky viewed and thus decreases the CMB received in that pixel. Increasing the collecting area of the telescope makes each pixel brighter, increasing the amount of both signal (if any) and noise collected. If we now imagine that only one of the pixels contains a SETI source, and none of the others do, then we can throw away the other pixels (and their noise contribution), and focus on the one with a signal. (Of course we have to look at each pixel first.)
In an earlier section, we noted that CMB noise at this frequency (water window, 1.5 GHz) in a diffraction-limited dish is about 4E-22 Watts/Hz = 4E-22 Joules, independent of the dish size. The large dish needed to collect enough bit energy to overcome Johnson noise should also be enough to overcome the CMB, more or less. Since one cannot build a huge dish (except at Arecibo), one must bolt together a bunch of small dishes. To keep the CMB noise from becoming additive, we need to hook up the dishes using a synthetic aperture technique so as to keep the beamwidth down. To provide a more accurate estimate, we need to look at how noise propagates through the synthetic aperture electronics, including the down-conversion (digital and/or analog), Nyquist aliasing of the noise in the digitizer, and the fact that this needs to be broad-band to make the PRN detection work ... This requires some hefty work.
To recap: The Shannon-Hartley theorem shows that PRN trickery can serve to pluck a weak signal out of heavy noise. However, it only works up to a point; to go beyond that point, one needs to use antenna trickery to increase the amount of collected energy per bit, while using very narrow beam-widths to minimize noise coming from CMB and galactic sources.
ToDo: Determine if synchrotron radiation is important, and if so, how it affects the signalling theory (since it's not (?) white noise). In particular, what is the autocorrelation function for galactic noise in the water window?
However, there may be some interesting games that individuals/amateurs can play, using high-bandwidth Internet connections to build very large synthetic-aperture scopes. The basic idea is that 50 kHz of bandwidth can be digitized into an approximately 400 kbit/second digital stream, which can be piped across DSL/Cable Internet connections. These can then be combined in more-or-less real time to provide high-resolution images. Accurate timestamps are needed so that the raw signals can be time-correlated to create the synthetic aperture. The GPS signal can provide a (relatively) cheap microsecond-accurate timestamp. I don't know if this timestamp is enough, or if calibrated atomic clocks are needed. Hmm.... Can one even do interferometry after down-conversion, or must it be done before? I would think so, but maybe one loses too much data after narrow-band filtering. It shouldn't be too hard to make this work, and it could be fun! Small dishes on opposite ends of a continent could potentially generate some rather dramatic high-resolution images of traditional radioastronomy sources!
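As a toy version of the correlation step such a setup would need: two 'stations' record the same noise-like sky signal plus their own independent receiver noise, and cross-correlation recovers the relative time delay. The sample count, delay, and noise levels are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000                                   # samples recorded per station
    source = rng.normal(size=n + 50)              # the common, noise-like sky signal
    delay = 37                                    # true relative delay, in samples

    station_a = source[:n] + rng.normal(scale=2.0, size=n)
    station_b = source[delay:delay + n] + rng.normal(scale=2.0, size=n)

    lags = np.arange(-100, 101)
    xcorr = [np.dot(station_a[100:-100], station_b[100 - lag:n - 100 - lag]) for lag in lags]
    print("recovered delay:", int(lags[int(np.argmax(xcorr))]), "samples")   # should match 37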
One of the interesting aspects of such a stunt might be that most terrestrial interference sources are geographically localized, and thus will appear in only one or a few antennas. Thus, antennas scattered across a continent can provide a degree of interference immunity. Satellites, such as GPS, and esp. geostationary satellites, can still blanket a continent, unfortunately.
The biggest problem with arrays of small dishes is their poor performance in rejecting Johnson noise. For example, if the outputs of 100 small dishes aimed at the same signal are simply added after detection, the total signal power will be 100 times larger than it is for one dish. The noise from the 100 independent receivers, however, is uncorrelated, and its fluctuations grow only as the root-sum-square, i.e. a factor of 10. Thus, the S/N ratio for 100 incoherently-combined dishes is 100/10 = 10 times better than for a single dish. (Coherent, phased-array combining can recover the full factor of 100, but only at the cost of the timing and correlation machinery discussed above.) By comparison, one can achieve the same factor-of-ten gain in S/N by increasing the diameter of a single dish by sqrt(10) = 3.2. That is, a single 6-meter dish has S/N characteristics comparable to, or better than, 100 incoherently-combined 2-meter dishes. In terms of total cost (including operating cost), building and running a single 6-meter dish is cheaper than 100 2-meter dishes.
Assume a pseudo-random binary bit sequence of length N is given by b_k, for integer k in [0, N-1]. Use this to phase-shift modulate a carrier of frequency W. The phase shift is p, and the length of each bit is T. Then

    f(t) = Sum_{k=0}^{N-1} S_k(t) sin(W t + p b_k)

is the signal defined on the time interval [0, NT]. We used the notation S_k(t) for a step function that is one on the interval t = [kT, (k+1)T] and zero otherwise. To make f(t) quasi-periodic with period NT, one chooses a frequency W such that NTW = 2 pi m for some integer m. The PRN (pseudo-random noise bit sequence) then repeats with period NT. This signal, as described above, does not carry any data; to modulate the signal with data, one inverts all of the bits to denote a one, or leaves them alone to denote a zero. That is, the maximum possible data rate is 1/NT, whereas the 'chipping' rate is 1/T.
If the data consists of all zeros, then f(t) is absolutely periodic, and a Fourier series is appropriate to describe the signal. The signal is given by

    f(t) = Sum_{n=0}^{inf} [ c_n cos(2 pi n t / NT) + s_n sin(2 pi n t / NT) ]

Solving for the Fourier coefficients c_n and s_n, one gets

    c_n = 1/(2 pi (m+n)) Sum_{k=0}^{N-1} [ cos(2 pi (m+n) k/N + p b_k) - cos(2 pi (m+n) (k+1)/N + p b_k) ]
        + 1/(2 pi (m-n)) Sum_{k=0}^{N-1} [ cos(2 pi (m-n) k/N + p b_k) - cos(2 pi (m-n) (k+1)/N + p b_k) ]

and

    s_n = 1/(2 pi (m+n)) Sum_{k=0}^{N-1} [ sin(2 pi (m+n) k/N + p b_k) - sin(2 pi (m+n) (k+1)/N + p b_k) ]
        - 1/(2 pi (m-n)) Sum_{k=0}^{N-1} [ sin(2 pi (m-n) k/N + p b_k) - sin(2 pi (m-n) (k+1)/N + p b_k) ]

Note that if N is the product of small primes, then it is very likely that adjacent terms will cancel each other out. If N is a large prime, then adjacent terms will almost certainly not cancel (unless N divides (m+n) or (m-n), which is unlikely). Overall, there is more energy spread over a broader bandwidth if N is prime, although it would be nice to back this claim with a proof.
From this formula, we can also see that most of the energy will be carried by about 4N coefficients centered about m=n. Thus, the longer the PRN sequence, the more Fourier coefficients need to be considered. From this, one can see that narrow-band interference will wreck maybe a few of the coefficients without spoiling the signal as a whole. In other words, the larger the N, the better the interference suppression.
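A quick numeric check of this claim: build f(t) exactly as defined above for a random bit sequence, take its discrete Fourier transform, and see how much of the (positive-frequency) power lands within about 2N coefficients on either side of the carrier index m. The chip count, carrier index, and sampling density here are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(3)
    N   = 127                   # chips in the PRN sequence
    sps = 16                    # samples per chip
    M   = N * sps               # samples over one code period
    m   = 400                   # carrier cycles per code period (so that NTW = 2 pi m)
    p   = np.pi                 # phase shift for a '1' bit (BPSK)

    bits  = rng.integers(0, 2, size=N)
    phase = p * np.repeat(bits, sps)               # S_k(t) * p * b_k, sampled
    t     = np.arange(M) / M
    f     = np.sin(2 * np.pi * m * t + phase)

    power = np.abs(np.fft.fft(f)) ** 2
    pos   = power[:M // 2]                         # positive-frequency half (f is real)
    band  = pos[m - 2 * N : m + 2 * N + 1].sum()   # ~4N coefficients centred on n = m
    print(f"fraction of power within 4N bins of the carrier: {band / pos.sum():.2f}")
    # with these choices this comes out around 0.95: most of the energy, as claimed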
Based on the above calculations, we can place some limits:
For a given length N, how many different PRN sequences are there? The answer depends on how 'good' one wants the PRN sequence properties to be. A 'good' sequence will have the same Poisson distribution of same-bit run lengths as white noise. A 'good' sequence will spread power evenly over all of the Fourier coefficients of the signal. In fact, most random sequences will be 'good' sequences, and there are approximately 2^N of these. For N = 1E10 this is insanely large; it is impossible to check all of these. Thus, we need to limit the number of sequences that might be searched for. To do so, we need to guess the kinds of sequences that ETI might choose to use.
An obvious set of candidates are those sequences that have small polynomial generators: i.e. those that can be generated by small shift registers whose various taps are XOR'ed together and fed back. Sequences of length N will typically have polynomial generators of degree log2 N. Thus, for a bit sequence of approximate length N = 2^32 we can work with a shift register of about 32 bits. There are 2^32 ways to XOR the 32 taps together; however, most of these will fail to generate long sequences. I believe (this needs support/derivation) that about sqrt(2^32) = 2^16 polynomials are reasonable PRN generators. If this is correct, then there are approx 1E10 x sqrt(1E10) = 1E15 different PRN sequences that one might have to search for. This is starting to get to be a dauntingly large number. Don't forget that for any given sequence, one has to try each of the N different bit alignments, which implies 1E10 x 1E15 = 1E25 possibilities to search. Each of these in turn needs to be tried at many different frequencies and chirp rates. One might limit the latter searches to chirp rates that assume that the transmitter has been corrected to appear stationary in the galactic frame of reference. However, one might also like to limit the number of generators to attempt.
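To make the 'small shift register with XOR'ed taps' idea concrete, here is a minimal Fibonacci-style LFSR sketch. The 16-bit register, this particular tap set (corresponding to the polynomial x^16 + x^14 + x^13 + x^11 + 1), and the seed are just illustrative; a real search would enumerate many such generators:

    def lfsr_bits(nbits, shifts, seed, count):
        """Yield `count` output bits from a Fibonacci LFSR.

        `shifts` are the 0-based register bits XOR'ed together to form the
        feedback; the low bit is the output, and the feedback enters at the
        high bit.
        """
        state = seed
        for _ in range(count):
            yield state & 1
            fb = 0
            for s in shifts:
                fb ^= (state >> s) & 1
            state = (state >> 1) | (fb << (nbits - 1))

    seq = list(lfsr_bits(nbits=16, shifts=(0, 2, 3, 5), seed=0xACE1, count=2 ** 16 - 1))
    print(sum(seq), "ones out of", len(seq), "bits")   # nearly balanced, as a 'good' key should be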
To this end, one might attempt to look for notable primes and generators. Are there notable, 'famous' primes in the vicinity that are 'preferable' in some way, e.g. 2^32 +/- 1? Are there notable generators? E.g. generators that not only have long sequences and good PRN properties, but are also notable through their relationship to other famous problems, e.g. finite/sporadic groups, Fermat's Last Theorem, etc.?