Barren plateaus in quantum neural network training landscapes

Summary:

When random circuit ansätze are used for variational quantum circuits (VQCs), the probability that the gradient is non-negligible along any reasonable direction becomes exponentially small as the number of qubits increases.

Details:

  • VQCs typically rely on optimizing a parameterized unitary circuit with respect to an objective function, usually a simple sum of Pauli operators or the fidelity with respect to some target state.

  • As with any non-linear optimization, the choice of both the parameterization and the initial state is important.

  • Within quantum simulation, the parameterized random circuit approach has been referred to as a “hardware efficient ansatz”. This is in contrast to earlier variational quantum eigensolver proposals, which used parameterized structured circuits inspired by the problem at hand.

  • In the quantum case, estimating even a single gradient component scales as O(1/ε^α) for some small power α, whereas classical implementations achieve the same in O(log(1/ε)) time, where ε is the desired accuracy of the gradient.

  • For a large class of random circuits, the average value of the gradient of the objective function is zero.

  • The probability that any given instance of such a random circuit deviates from this average value by a small constant ε is exponentially small in the number of qubits. This is the concentration of measure described by Levy’s lemma: the fraction of states that fall outside a fixed angular distance from zero along any coordinate decreases exponentially in the number of qubits.

  • The average of the gradient is evaluated assuming the circuit ensemble approximates a unitary 1-design, and its variance assuming the ensemble approximates a unitary 2-design. The deeper the quantum circuit, the better this approximation.

  • The gradient becomes the function we are averaging. Its average is zero (due to symmetry), and its variance about that zero average decreases exponentially in the number of qubits.

  • The region where the gradient is zero does not correspond to local minima of interest, but rather an exponentially large plateau of states that have exponentially small deviations in the objective value from the average of the totally mixed state.
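
The averaging argument above can be checked numerically on small systems. The following is a minimal sketch, not the setup of the original paper: instead of layered random circuits, it assumes the unitaries on either side of a single rotation gate are exactly Haar random (an idealized 2-design), takes Z on qubit 0 times Z on qubit 1 as the objective, and evaluates the gradient with the parameter-shift rule. The names haar_unitary, embed, energy and gradient_samples are illustrative.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def haar_unitary(dim, rng):
    """Sample a Haar-random unitary from the QR decomposition of a Ginibre matrix."""
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases so the sample is Haar distributed


def embed(single, qubit, n_qubits):
    """Place a 2x2 operator on one qubit of an n-qubit register."""
    op = np.eye(1, dtype=complex)
    for k in range(n_qubits):
        op = np.kron(op, single if k == qubit else np.eye(2))
    return op


def energy(theta, V, W, gen, H):
    """E(theta) = <psi|H|psi> with |psi> = W exp(-i theta gen / 2) V |0...0>."""
    dim = H.shape[0]
    # gen squares to the identity, so the exponential has a closed form.
    rot = np.cos(theta / 2) * np.eye(dim) - 1j * np.sin(theta / 2) * gen
    psi = W @ (rot @ V[:, 0])  # V[:, 0] is V applied to |0...0>
    return np.real(np.vdot(psi, H @ psi))


def gradient_samples(n_qubits, n_samples, rng):
    """Gradient at theta = 0 for independently drawn random unitaries V and W."""
    dim = 2 ** n_qubits
    H = embed(Z, 0, n_qubits) @ embed(Z, 1, n_qubits)  # assumed objective
    gen = embed(Y, 0, n_qubits)  # assumed generator of the rotation gate
    grads = np.empty(n_samples)
    for s in range(n_samples):
        V = haar_unitary(dim, rng)
        W = haar_unitary(dim, rng)
        # Parameter-shift rule for a gate of the form exp(-i theta P / 2).
        plus = energy(np.pi / 2, V, W, gen, H)
        minus = energy(-np.pi / 2, V, W, gen, H)
        grads[s] = 0.5 * (plus - minus)
    return grads


rng = np.random.default_rng(0)
for n in (2, 4, 6):
    g = gradient_samples(n, 200, rng)
    print(f"n={n}  mean={g.mean():+.4f}  variance={g.var():.5f}")
```

Under these assumptions the sample mean should stay near zero while the sample variance shrinks roughly in proportion to 1/2^n, which is the plateau behaviour described in the bullets above.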

Definitions:

  • continuous function: f(x) can be kept arbitrarily close to f(x0) by keeping x sufficiently close to x0; how close x needs to be may depend on the point x0.

  • uniformly continuous: f(x1) and f(x2) can be kept arbitrarily close by keeping x1 and x2 sufficiently close, where the required closeness is fixed beforehand and works for any two points in the domain.

  • Lipschitz continuous: this is a strong form of uniform continuity. Intuitively, a Lipschitz continuous function is limited in how fast it can change: for any x1 and x2, the absolute value of the slope of the line connecting (x1, f(x1)) and (x2, f(x2)) is not greater than some fixed real number; the smallest such real number (bound) is called the Lipschitz constant of the function.

  • measure: generalized volume of a subset of a set X; not all subsets will be measurable. given a measurable space (set X, sigma algebra A), a measure mu is a map from A to [0,inf] such that mu(empty set)=0 and mu(countable union of A_i) = sum of mu(A_i) for pairwise disjoint A_i. This gives the measure space (X,A,mu). A probability measure is a measure with total measure one, mu(X) = 1. A probability space is a measure space with a probability measure.

  • sigma algebra A: a family of subsets of a set X (that is, a subset of the power set of X) that fulfills 3 requirements: it contains the empty set and X itself, the complement of any member also belongs to the family, and countable unions of members also belong to the family. these members are the measurable sets.

  • Borel sigma-algebra: the sigma algebra on a topological space generated by its open sets. B(X) = sigma(T), where X is the topological space and T is its collection of open sets.

  • concentration of measure: take a metric space (X,d) with a measure mu on its Borel sets such that mu(X)=1. define conc(ε) = sup{mu(X\B_ε) : mu(B) >= 1/2}, where the supremum runs over Borel subsets B and B_ε = {x : d(x,B) < ε} is the ε-neighbourhood of B. the space X exhibits a concentration phenomenon if conc(ε) decays very fast as ε grows.

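  • Levy’s lemma: each n-dim quantum state psi has n complex amplitudes z_i = x_i + i y_i. normalization requires sum_i (x_i^2 + y_i^2) = 1, so the 2n real variables are bound to the unit sphere in R^(2n). Levy’s lemma says that a Lipschitz continuous function on such a high-dimensional sphere is close to its average value with high probability: the probability of deviating by more than ε decays exponentially in the dimension. see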

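  • Haar measure: the uniform distribution over the group U(N) of N-dim unitaries, i.e. the unique probability measure on U(N) that is invariant under multiplication by any fixed unitary.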

  • unitary t-design: a finite set of unitaries whose average reproduces the Haar average of any polynomial function of degree at most t in the matrix elements of U (and at most t in their complex conjugates). We can use a unitary design as a shortcut: rather than integrating the function over all Haar-distributed unitaries, average it over the unitaries in the set. This is exact for an exact t-design; if the set only approximately matches the Haar moments (for example, because too few unitaries were sampled), we get an approximate unitary t-design.
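
The design definition above can be illustrated on a single qubit. This is a minimal sketch under stated assumptions: the four Pauli matrices are used as the candidate design, Haar averages are estimated by Monte Carlo sampling, and the helper names (haar_unitary, twirl, fourth_moment) and the test state rho are illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]


def haar_unitary(dim, rng):
    """Sample a Haar-random unitary from the QR decomposition of a Ginibre matrix."""
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases so the sample is Haar distributed


def twirl(rho, unitaries):
    """Average of U rho U^dag over the set: degree 1 in U and in its conjugate."""
    return sum(U @ rho @ U.conj().T for U in unitaries) / len(unitaries)


def fourth_moment(unitaries):
    """Average of |<0|U|0>|^4 over the set: degree 2 in U and in its conjugate."""
    return float(np.mean([abs(U[0, 0]) ** 4 for U in unitaries]))


rng = np.random.default_rng(0)
haar_samples = [haar_unitary(2, rng) for _ in range(4000)]
rho = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)  # arbitrary test state

# Degree 1: the Pauli average equals the Haar average tr(rho) * I / 2.
print(twirl(rho, PAULIS))
print(twirl(rho, haar_samples))  # Monte Carlo estimate of the same quantity

# Degree 2: the Pauli average is 1/2, while the Haar value is 2 / (d (d + 1)) = 1/3.
print(fourth_moment(PAULIS))
print(fourth_moment(haar_samples))
```

With only four unitaries the Pauli set reproduces the degree-1 Haar average exactly but misses the degree-2 average, so it is a 1-design and not a 2-design.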

refs: quantum expectation values precision: 36