machine learning, inductive inference, probabilistic, strategy, machine, inductive, learning, inference, Karlis, Podnieks, theorem, estimate, theory, log, bound, upper, lower, logarithmic
INDUCTIVE INFERENCE OF FUNCTIONS BY PROBABILISTIC STRATEGIES
Karlis Podnieks
Institute of Mathematics and Computer Science
University of Latvia
Translation of papers [Po 75] and [Po 77-2] published in Russian.
For the results (without proofs) in English see also [FBP 91].
Table of contents
1. Definitions and results
2. Proofs
3. Proofs of Lemmas 2, 3
4. Proof of Lemma 4
5. Exclusion algorithm
6. Tail reduction proof
References
Abstract
The following model of inductive inference is considered. Arbitrary numbering tau = {tau0, tau1, tau2, ... } of total functions N->N is fixed. A "black box" outputs the values f(0), f(1), ..., f(m), ... of some function f from the numbering tau. Processing these values by some algorithm (a strategy) F we try to identify a tau-index of f (i.e. a number n such that f = taun). Strategy F outputs hypotheses h0, h1, ..., hm, ... If lim hm = n and taun = f, we say that F identifies in the limit tau-index of f. The complexity of identification is measured by the number of mindchanges, i.e. by Ftau(f) = card{m | hm <> hm+1 }. One can verify easily that for any numbering tau there exists a deterministic strategy F such that Ftau(taun) <= n for all n. This estimate is exact (see [Ba 74], [FBP 90]). In the current paper the corresponding exact estimate ln n + o(log n) is proved for probabilistic strategies. This result was published without proof in [Po 75]. Proofs were published in [Po 77-2] (in Russian).
Probabilistic strategy is defined by some:
a) probability space (W, B, P), where W is the set of elementary events, B - a Borel field over subsets of W, P - a probability measure over B;
b) mapping M: N*->Z, where N* is the set of all finite sequences of natural numbers, Z - the set of all random variables over the space (W, B, P) taking their values from NU{oo} (oo means "undefined").
Thus M associates with each elementary event e in W some deterministic strategy Fe. The hypothesis Fe (<f(0),...,f(m)>) takes its values n with fixed eprobabilities P{Fe(<f(0),...,f(m)>)=n}.
Computable (recursive) probabilistic strategies can be defined by means of probabilistic Turing machines introduced first in [LMS 56]. Let a random Bernoulli generator of the distribution (1/2, 1/2) be fixed. The generator is switched into deterministic "apparatus" of a Turing machine. As a result, the operation of the machine becomes probabilistic, and we can speak of the probability that the operation satisfies certain conditions.
Consider the following Turing machine M operating with a fixed Bernoulli generator. With input sequence
f(0), f(1), ..., f(m), ...
this machine prints as output an empty, finite or infinite sequence of natural numbers (hypotheses):
h0, h1, ..., hm, ...,
where hm depends only on the values f(0), ..., f(m). To each infinite realization of Bernoulli generator's output (i.e. an infinite sequence of 0's and 1's) corresponds a completely determined operation of the machine M as a deterministic strategy.
By P{M, tau, f} we denote the probability that a probabilistic strategy M identifies in the limit tau-index of the function f. By P{M, f, <=k} we denote the probability that strategy M makes no more than k mindchanges by the function f.
Let us denote by f[m] the code of <f(0), ..., f(m)>, then the random variable M(<f(0), ..., f(m)>) can be denoted by M(f[m]). By Pm(M, f) we denote the probability that M changes its hypothesis at the step m, i.e. P{M(f[m+1])<>M(f[m])}.
First, let us consider some sufficient condition for P{M, tau, f}=1. We will say that strategy M is tau-consistent on the function f if, for all m,
a) M(f[m]) is always defined (i.e. defined for all events e in W),
b) if M(f[m])=n for some event e in W, then taun(j)=f(j) for all j<=m.
Thus, consistent strategies do not output "explicitly incorrect" hypotheses.
THEOREM 1. For any enumerated class (U, tau) there is a probabilistic strategy M and a constant C>0 such that: a) M always identifies in the limit tau-index of each function f in U, and b) M changes its mind by the function taun no more than ln n+C*sqrt(ln n)*lnln n times with probability ->1 as n->oo. For a computable numbering tau, a computable probabilistic strategy M can be constructed..
THEOREM 2. For any countable set FI of probabilistic strategies there exists an enumerated class (U, tau) and a constant C>0 such that: for any strategy M in FI, which identifies in the limit tau-index of each function f in U, there is an increasing sequence {nk} such that M changes its mind by the function taunk at least ln n - C*sqrt(ln n)*lnln n times with probability ->1 as k->oo. For the class of all computable probabilistic strategies, a computable numbering tau can be constructed.
The upper bound ln n can be proved by means of a strategy from [BF 72]. Essential difficulties arise, however, not in the construction of the strategy but in its analysis.
The main idea is as follows. Let us suppose that the function f in the "black box" which outputs the values f(0), f(1), f(2), ..., is chosen at random from the numbering tau, according to some probability distribution pi={pin} (pin is the probability that f=taun, pi1+pi2+pi3+... =1). Then, having received the values f(0), ..., f(m), let us consider the set Em of all tau-indices "suitable" for such f, i.e.
Em = {n | (Aj<=m) taun(j) = f(j)}.
It would be natural to output as a hypothesis any tau-index n in Em with probability pin / s, where s = Sum{pin |n in Em}.
To put it precisely, we define a probabilistic strategy BFtau,pi as follows. Let tau be any numbering of total functions. Take some probability distribution {pin}, where pin>0 for all n, and pi1+pi2+pi3+... =1.
If the set E0 = {n | taun(0)=f(0)} is empty, then we put BFtau,pi(f[0]) undefined with probability 1. If E0 is non-empty, we put BFtau,pi(f[0])=n with probability pin/s for every n in E0, where s = Sumn{pin | n in E }.
Let us assume now that the hypotheses BFtau,pi(f[j]) have already been determined for j<m, and BFtau,pi(f[m-1])=p. If p is "undefined", then we set BFtau,pi(f[m]) undefined with probability 1. Else, if taup(m)=f(m) (i.e. the hypothesis p is correct also for the next argument m), we set BFtau,pi(f[m])=p with probability 1.
Suppose now that taup(m)<>f(m). Let us take the set of all appropriate (for the present) hypotheses, i.e.
Em = {n | (Aj<=m) taun(j) = f(j)}.
If Em is empty, then we put BFtau,pi(f[m]) undefined with probability 1. If Em is nonempty, we put BFtau,pi(f[m])=n with probability pn/s for every n in Em, where s = Sumn{ pin | n in Em }.
Of course, BFtau,pi always identifies in the limit tau-index of each function f in U.
LEMMA 1. Let {Xm} be a series of independent random variables taking values from {0,1} in such a way that: a) SummXm is always finite, b) E = SummP{Xm=1}is finite. Then for any number t>0:
P{ |SummXm - E|>=t*sqrt(E) } <= 1/t2.
PROOF. Immediately, by Chebishev inequality.
Lemma 1 allows deriving upper and lower bounds of the number of mindchanges from the corresponding bounds of Summ Pm(M, f).
LEMMA 2. For all n,
Summ{ Pm(BFtau,pi, taun) } <= ln(1/pin).
LEMMA 3. Let a function f of the numbering tau be fixed. Then the following events are independent:
Am = { BFtau,pi(f[m]) <> BFtau,pi(f[m+1]) }, m = 0, 1, 2, ...
It is curious that the events Am (i.e. "at the m-th step BFtau,pi changes its mind") do not display any striking indications of independence; nevertheless, they do satisfy the formal independence criterion.
If we take
pi'n = c / (n* (ln n)2),
with the convention that 1/0=1 and ln 0 = 1, then by Lemma 2 the sum of the probabilities of BFtau,pi changing hypothesis by the function taun will be less than ln n + 2lnln n - ln c. Lemma 3 and Lemma 1 (let us take t = lnln n) allow us to derive from this that, as n->oo, with probability ->1, the number of mindchanges of BFtau,pi' by taun does not exceed ln n + C*sqrt(ln n)*lnln n, thus proving Theorem 1. For proofs of Lemmas 2, 3 see Section 3.
It is easy to verify that if the numbering tau is computable, then the strategy BFtau,pi' can be modified so that it becomes computable (see Section 3).
The lower bound ln n of Theorem 2 is proved by diagonalization. Let {Mi} be some numbering of strategies of the set FI. In the numbering tau to be constructed, for each strategy Mi an infinite sequence of blocks Bij is built in, each block consisting of a finite number of functions taun. If Mi identifies (with probability 1) tau-indices of all functions of Bij, then by some function of this block Mi will change its mind sufficiently often.
To construct an individual block Bij the following Lemma 4 will be used.
Let {Zj} be a sequence of independent random variables taking values 0,1 in such a way that
P{Zj = 1} = 1/j, P{Zj = 0}=1- 1/j.
Thus,
P{Z2=1}+ ... +P{Zn=1}= 1/2+...+1/n = ln n+O(1).
Let us take t = lnln n in Lemma 1, then for some C>0, as n->oo,
P{ Z2+...+Zn >= ln n - Csqrt(ln n)lnln n) }->1.
Now one can understand easily the significance of
LEMMA 4. Let M be a probabilistic strategy, k and n - natural numbers with k<n, e>0 - a rational number, gamma - a binary string. Then there is a set of n functions s1, ..., sn such that: a) each function sj has gamma as initial segment, and b) if M identifies with probability 1 s-indices of all functions sj, then by one of these functions M changes its mind >=k times with probability
>=(1-e) P{ Z2+...+Zn >=k }.
If M is recursive strategy, then the set s1, ..., sn can be constructed effectively.
Now, to prove Theorem 2, we derive the block Bij from the set of functions of Lemma 4 for
M = Mi, n = 2j, k = ]j*ln 2 - sqrt(j)ln j[, e = 2-j , gamma = 0...01
with <i, j> zeros in gamma, where <i, j> is Cantor's number of the pair (i, j).
Let us introduce a special coding of triples (i, j, s) such that 0<=s<=2j (see [BF 74]). The code of (i, j, s) is defined as the binary number
100....0at...a0100....0,
----------(*)
|------ j ----|--|-- i --|--------------
where at...a0 is the number s. Clearly,
2i+j+1 + 2i <= code(i, j, s) <= 2i+j+2 - 2i.
Now the numbering tau can be defined as follows. If n is (*) for some (i, j, s), 0<=s<=2j, then taun is the s-th function of the block Bij. Else taun is set equal to zero.
Let a probabilistic strategy M in FI identify with probability 1 tau-indices of all functions of the numbering tau. Then M=Mi for some i, and for each j>1 the block Bij contains a function by which M changes its mind at least ]j*ln 2 - sqrt(j) ln j[ times with probability
>= (1-2j) P{ Z2+...+Zn >= ]j*ln 2 - sqrt(j) ln j[ } -------------(**)
Let us denote by nj the (unique) tau-index of this "bad" function. Obviously,
2i+j+1 < nj < 2i+j+2.
Hence, {nj} is an increasing sequence, and
log2 nj -i-2 < j < log2 nj -i-1.
By the function taunj the strategy M changes its mind at least
j*ln 2-sqrt(j) ln j > ln nj - Csqrt(ln nj)lnln nj
times with probability (**) which ->1 as j->oo, thus proving Theorem 2.
For a proof of Lemma 4 see Section 4 and Section 5.
machine learning, inductive inference, probabilistic, strategy, machine, inductive, learning, inference, Karlis, Podnieks, theorem, estimate, theory, log, bound, upper, lower, logarithmic