A theory of the learnable
Published:
Summary: PAC learning
The main contribution of this paper is that it shows that it is possible to design learning machines that have all three of the following properties:
- The machines can provably learn whole classes of concepts. Furthermore, these classes can be characterized.
- The classes of concepts are appropriate and nontrivial for general-purpose knowledge.
- The computational process by which the machines deduce the desired programs requires a feasible (i.e., polynomial) number of steps.
details:
We shall say that a concept Q has been learned if a program for recognizing it has been deduced.
- we have a set of positive examples given by some unknown underlying distribution. There are two ways information is given:
- routine: “EXAMPLES”; a call to this gives one positive example v out: F(v)=1 where v is sampled according to D(v).
- routine “ORACLE”: give it v and it returns the label; F(v)=0 or F(v)=1.
- boolean classes considered with t boolean variables and their oracle calls:
- k-CNF –> polynomial; (2t)^k+1 calls to EXAMPLES (no ORACLE calls required). start with g as the product of all possible clauses (each has k literals –> we have 2t possible literals from t variables so we can get (2t)^k possible clauses). so g is the most general k-CNF possible. Then for each example from the routine, delete the clause that does not satisfy the literals value in the example.
- monotone DNF –> polynomial; dt calls to ORACLE and poly in d calls to EXAMPLES, where d is the degree of the DNF expression we want to learn. satrt with g=0. for each example v, add monomial to g that satisfies v. Class of DNF is learnable (only functions not concepts) if we only work with total vectors; D(v)=0 if v is not total. The question as to whether monotone DNF expressions can be learned from EXAMPLES alone is open.
- arbitrary mu-expressions –> only poly use of oracles with t^3 calls; set of examples not required.
output: boolean expression that closely approximates the target concept.
- we say that a class X of programs is learnable with respect to a given learning protocol iff there exists an algorithm A {the deduction procedure} invoking the learning protocol with the following properties:
- runs in time polynomial in VC-dim, eps, delta (1/h used in paper) and number of variables
- for all functions in X and distributions D, output g makes error with small probability.
Definitions:
- vector: given t boolean variables, a vector is an assignment to each of the t variables of a value from {0,1,-}. The symbol - denotes that a variable is undetermined.
- total vector: A vector is total if every variable is determined.
- Boolean function: maps a total vector from {0,1}^t to {0,1}.
- concept: boolean function whose domain is all vectors. For a vector v, F(v) = 1 iff F(w) = 1 for all total vectors w that agree with v on all variables for which v is determined.
literal q: either a boolean variable p or the negation ~p of a variable.
- clause: a sum q_1 + q_2 + … + q_i of literals
- CNF: any product c_1c_2…c_r of clauses. For example, (p_1 + ~p_2) * (~p_1 + ~p_2 + p_3) is a CNF expression.
- k-CNF: CNF expression where each clause is the sum of at most k literals.
- for two concepts, F and G, F –> G means that for all vectors v whenever F(v) = 1, it is also the case that G(v) = 1.
So we can denote both boolean expressions and vectors by the concepts they represent. The concept represented by expression f is obtained by considering the Boolean function computed by f and extending its domain to become a concept.
- monomials: product of literals q_1 * q_3 * q_i
- DNF: any sum m_1 + m_2 + … + m_r of monomials. For example, p_1 * ~p_2 + p_1 * p_3 * ~p_4 is a DNF expression.
- monotone expression: no variable is negated in it. For example, p_1 * p_2 + p_1 * p_3 * p_4 is a monotone DNF expression.
- prime DNF: A monomial m is a prime implicant of a function F if m –> F and if m’ ~–> F for any m’ obtained by deleting one literal from m. A DNF expression is prime if it consists of the sum of prime implicants none of which is redundant in the sense of being implied by the sum of the others. There is always a unique prime DNF expression in the monotone case, but not in general.
degree of a DNF expression: largest number of monomials that a prime DNF expression equivalent to it can have.
- mu-expression: each variable occurs just once. we can assume them to be monotone.
refs: Angluin, D., Smith, C.H. Inductive inference: Theory and methods. Comput. Surv. 15, 3 (Sept. 1983)