Matematički blogovi
A tool to verify estimates, II: a flexible proof assistant
In a recent post, I talked about a proof of concept tool to verify estimates automatically. Since that post, I have overhauled the tool twice: first to turn it into a rudimentary proof assistant that could also handle some propositional logic; and second into a much more flexible proof assistant (deliberately designed to mimic the Lean proof assistant in several key aspects) that is also powered by the extensive Python package sympy for symbolic algebra, following the feedback from previous commenters. This I think is now a stable framework with which one can extend the tool much further; my initial aim was just to automate (or semi-automate) the proving of asymptotic estimates involving scalar functions, but in principle one could keep adding tactics, new sympy types, and lemmas to the tool to handle a very broad range of other mathematical tasks as well.
The current version of the proof assistant can be found here. (As with my previous coding, I ended up relying heavily on large language model assistance to understand some of the finer points of Python and sympy, with the autocomplete feature of Github Copilot being particularly useful.) While the tool can support fully automated proofs, I have decided to focus more for now on semi-automated interactive proofs, where the human user supplies high-level “tactics” that the proof assistant then performs the necessary calculations for, until the proof is completed.
It’s easiest to explain how the proof assistant works with examples. Right now I have implemented the assistant to work inside the interactive mode of Python, in which one enters Python commands one at a time. (Readers from my generation may be familiar with text adventure games, which have a broadly similar interface.) I would be interested developing at some point a graphical user interface for the tool, but for prototype purposes, the Python interactive version suffices. (One can also run the proof assistant within a Python script, of course.)
After downloading the relevant files, one can launch the proof assistant inside Python by typing from main import * and then loading one of the pre-made exercises. Here is one such exercise:
>>> from main import * >>> p = linarith_exercise() Starting proof. Current proof state: x: pos_real y: pos_real z: pos_real h1: x < 2*y h2: y < 3*z + 1 |- x < 7*z + 2This is the proof assistant’s formalization of the following problem: If are positive reals such that and $y < 3z+1$, prove that $x < 7z+2$.
The way the proof assistant works is that one directs the assistant to use various “tactics” to simplify the problem until it is solved. In this case, the problem can be solved by linear arithmetic, as formalize dby the Linarith() tactic:
>>> p.use(Linarith()) Goal solved by linear arithmetic! Proof complete!If instead one wanted a bit more detail on how the linear arithmetic worked, one could have run this tactic instead with a verbose flag:
>>> p.use(Linarith(verbose=true)) Checking feasibility of the following inequalities: 1*z > 0 1*x + -7*z >= 2 1*y + -3*z < 1 1*y > 0 1*x > 0 1*x + -2*y < 0 Infeasible by summing the following: 1*z > 0 multiplied by 1/4 1*x + -7*z >= 2 multiplied by 1/4 1*y + -3*z < 1 multiplied by -1/2 1*x + -2*y < 0 multiplied by -1/4 Goal solved by linear arithmetic! Proof complete!Sometimes, the proof involves case splitting, and then the final proof has the structure of a tree. Here is one example, where the task is to show that the hypotheses and $(y>-2) \wedge (y<2)$ imply :
>>> from main import * >>> p = split_exercise() Starting proof. Current proof state: x: real y: real h1: (x > -1) & (x < 1) h2: (y > -2) & (y < 2) |- (x + y > -3) & (x + y < 3) >>> p.use(SplitHyp("h1")) Decomposing h1: (x > -1) & (x < 1) into components x > -1, x < 1. 1 goal remaining. >>> p.use(SplitHyp("h2")) Decomposing h2: (y > -2) & (y < 2) into components y > -2, y < 2. 1 goal remaining. >>> p.use(SplitGoal()) Split into conjunctions: x + y > -3, x + y < 3 2 goals remaining. >>> p.use(Linarith()) Goal solved by linear arithmetic! 1 goal remaining. >>> p.use(Linarith()) Goal solved by linear arithmetic! Proof complete! >>> print(p.proof()) example (x: real) (y: real) (h1: (x > -1) & (x < 1)) (h2: (y > -2) & (y < 2)): (x + y > -3) & (x + y < 3) := by split_hyp h1 split_hyp h2 split_goal . linarith linarithHere at the end we gave a “pseudo-Lean” description of the proof in terms of the three tactics used: a tactic cases h1 to case split on the hypothesis h1, followed by two applications of the simp_all tactic to simplify in each of the two cases.
The tool supports asymptotic estimation. I found a way to implement the order of magnitude formalism from the previous post within sympy. It turns out that sympy, in some sense, already natively implements nonstandard analysis: its symbolic variables have an is_number flag which basically corresponds to the concept of a “standard” number in nonstandard analysis. For instance, the sympy version S(3) of the number 3 has S(3).is_number == True and so is standard, whereas an integer variable n = Symbol("n", integer=true) has n.is_number == False and so is nonstandard. Within sympy, I was able to construct orders of magnitude Theta(X) of various (positive) expressions X, with the property that Theta(n)=Theta(1) if n is a standard number, and use this concept to then define asymptotic estimates such as $X \lesssim Y$ (implemented as lesssim(X,Y)). One can then apply a logarithmic form of linear arithmetic to then automatically verify some asymptotic estimates. Here is a simple example, in which one is given a positive integer and positive reals such that and , and the task is to conclude that :
>>> p = loglinarith_exercise() Starting proof. Current proof state: N: pos_int x: pos_real y: pos_real h1: x <= 2*N**2 h2: y < 3*N |- Theta(x)*Theta(y) <= Theta(N)**4 >>> p.use(LogLinarith(verbose=True)) Checking feasibility of the following inequalities: Theta(N)**1 >= Theta(1) Theta(x)**1 * Theta(N)**-2 <= Theta(1) Theta(y)**1 * Theta(N)**-1 <= Theta(1) Theta(x)**1 * Theta(y)**1 * Theta(N)**-4 > Theta(1) Infeasible by multiplying the following: Theta(N)**1 >= Theta(1) raised to power 1 Theta(x)**1 * Theta(N)**-2 <= Theta(1) raised to power -1 Theta(y)**1 * Theta(N)**-1 <= Theta(1) raised to power -1 Theta(x)**1 * Theta(y)**1 * Theta(N)**-4 > Theta(1) raised to power 1 Proof complete!One challenge right now is that the logarithmic linear programming solver is not currently well equipped to handle “max” type expressions, thus for instance adding some lower order terms (specifically, asking whether and implies ) currently prevents this tactic from “one-shotting” the problem:
>>> p = loglinarith_hard_exercise() Starting proof. Current proof state: N: pos_int x: pos_real y: pos_real h1: x <= 2*N**2 + 1 h2: y < 3*N + 4 |- Theta(x)*Theta(y) <= Theta(N)**3 >>> p.use(LogLinarith(verbose=True)) Checking feasibility of the following inequalities: Theta(x)**1 * Max(Theta(1), Theta(N)**2)**-1 <= Theta(1) Theta(x)**1 * Theta(y)**1 * Theta(N)**-3 > Theta(1) Theta(y)**1 * Max(Theta(1), Theta(N))**-1 <= Theta(1) Theta(N)**1 >= Theta(1) Feasible with the following values, for an unbounded order of magnitude X: Theta(y) = X**0 Theta(x) = X**1/2 Max(Theta(1), Theta(N)**2) = X**1/2 Theta(N) = X**0 Max(Theta(1), Theta(N)) = X**0Currently, LogLinarith() treats maxima such as Max(Theta(1), Theta(N)) and Max(Theta(1), Theta(N)**2) as variables independent of Theta(N), and thus misses some key (nonlinear) relations between these quantities that would allow one to prove the claim by contradiction. Currently I can prove this result within my proof assistant by a lengthier application of further tactics, but I plan to develop a more powerful version of LogLinarith() that performs the necessary case splitting to properly handle Max type terms to be able to “one-shot” problems such as the one above.
After that, I plan to start developing tools for estimating function space norms of symbolic functions, for instance creating tactics to deploy lemmas such as Holder’s inequality and the Sobolev embedding inequality. It looks like the sympy framework is flexible enough to allow for creating further object classes for these sorts of objects. (Right now, I only have one proof-of-concept lemma to illustrate the framework, the arithmetic mean-geometric mean lemma.)
I am satisfied enough with the basic framework of this proof assistant that I would be open to further suggestions or contributions of new features, for instance by introducing new data types, lemmas, and tactics, or by contributing example problems that ought to be easily solvable by such an assistant, but are currently beyond its ability, for instance due to the lack of appropriate tactics and lemmas.
Orders of infinity
Many problems in analysis (as well as adjacent fields such as combinatorics, theoretical computer science, and PDE) are interested in the order of growth (or decay) of some quantity that depends on one or more asymptotic parameters (such as ) – for instance, whether the quantity grows or decays linearly, quadratically, polynomially, exponentially, etc. in . In the case where these quantities grow to infinity, these growth rates had once been termed “orders of infinity” – for instance, in the 1910 book of this name by Hardy – although this term has fallen out of use in recent years. (Hardy fields are still a thing, though.)
In modern analysis, asymptotic notation is the preferred device to organize orders of infinity. There are a couple of flavors of this notation, but here is one such (a blend of Hardy’s notation and Landau’s notation). Formally, we need a parameter space equipped with a non-principal filter that describes the subsets of parameter space that are “sufficiently large” (e.g., the cofinite (Fréchet) filter on , or the cocompact filter on ). We will use to denote elements of this filter; thus, an assertion holds for sufficiently large if and only if it holds for all in some element of the filter . Given two positive quantities that are defined for sufficiently large , one can then define the following notions:
- (i) We write , , or (and sometimes ) if there exists a constant such that for all sufficiently large .
- (ii) We write , , or (and sometimes ) if, for every , one has for all sufficiently large .
- (iii) We write (and sometimes or ) if , or equivalently if there exist constants such that for all sufficiently large .
We caution that in analytic number theory and adjacent fields, the slightly different notation of Vinogradov is favored, in which would denote the concept (i) instead of (ii), and would denote a fourth concept instead of (iii). However, we will use the Hardy-Landau notation exclusively in this blog post.
Anyone who works with asymptotic notation for a while will quickly recognize that it enjoys various algebraic properties akin to the familiar algebraic properties of order on the real line. For instance, the symbols behave very much like , , , , with properties such as the following:
- If and , then .
- if and only if .
- If is restricted to a sequence, then (after passing to a subsequence if necessary), exactly one of , , and is true.
However, in contrast with other standard algebraic structures (such as ordered fields) that blend order and arithmetic operations, the precise laws of orders of infinity are usually not written down as a short list of axioms. Part of this is due to cultural differences between analysis and algebra – as discussed in this essay by Gowers, analysis is often not well suited to the axiomatic approach to mathematics that algebra benefits so much from. But another reason is due to our orthodox implementation of analysis via “epsilon-delta” type concepts, such as the notion of “sufficiently large” used above, which notoriously introduces a large number of both universal and existential quantifiers into the subject (for every epsilon, there exists a delta…) which tends to interfere with the smooth application of algebraic laws (which are optimized for the universal quantifier rather than the existential quantifier).
But there is an alternate approach to analysis, namely nonstandard analysis, which rearranges the foundations so that many of quantifiers (particularly the existential ones) are concealed from view (usually via the device of ultrafilters). This makes the subject of analysis considerably more “algebraic” in nature, as the “epsilon management” that is so prevalent in orthodox analysis is now performed much more invisibly. For instance, as we shall see, in the nonstandard framework, orders of infinity acquire the algebraic structure of a totally ordered vector space that also enjoys a completeness property reminiscent, though not identical to, the completeness of the real numbers. There is also a transfer principle that allows one to convert assertions in orthodox asymptotic notation into logically equivalent assertions about nonstandard orders of infinity, allowing one to then prove asymptotic statements in a purely algebraic fashion. There is a price to pay for this “algebrization” of analysis; the spaces one works with become quite large (in particular, they tend to be “inseparable” and not “countably generated” in any reasonable fashion), and it becomes difficult to extract explicit constants (or explicit decay rates) from the asymptotic notation. However, there are some cases in which the tradeoff is worthwhile. For instance, symbolic computations tend to be easier to perform in algebraic settings than in orthodox analytic settings, so formal computations of orders of infinity (such as the ones discussed in the previous blog post) could benefit from the nonstandard approach. (See also my previous posts on nonstandard analysis for more discussion about these tradeoffs.)
Let us now describe the nonstandard approach to asymptotic notation. With the above formalism, the switch from standard to nonstandard analysis is actually quite simple: one assumes that the asymptotic filter is in fact an ultrafilter. In terms of the concept of “sufficiently large”, this means adding the following useful axiom:
- Given any predicate , exactly one of the two statements “ holds for sufficiently large ” and “ does not hold for sufficiently large ” is true.
This can be compared with the situation with, say, the Fréchet filter on the natural numbers , in which one has to insert some qualifier such as “after passing to a subsequence if necessary” in order to make the above axiom true.
The existence of an ultrafilter requires some weak version of the axiom of choice (specifically, the ultrafilter lemma), but for this post we shall just take the existence of ultrafilters for granted.
We can now define the nonstandard orders of infinity to be the space of all non-negative functions defined for sufficiently large , modulo the equivalence relation defined previously. That is to say, a nonstandard order of infinity is an equivalence class
of functions defined on elements of the ultrafliter. For instance, if is the natural numbers, then itself is an order of infinity, as is , , , , and so forth. But we exclude ; it will be important for us that the order of infinity is strictly positive for all sufficiently large .We can place various familiar algebraic operations on :
- Addition. We define . It is easy to see that this is well-defined, by verifying that if and , then . However, because of our positivity requirement, we do not define subtraction on .
- Multiplication and division We define and ; we do not need to worry about division by zero, thanks to the positivity requirement. Again it is easy to verify this is well-defined.
- Scalar exponentiation If is a real number, we define . Again, this is easily checked to be well-defined.
- Order We define if , and if . Again, this is easily checked to be well-defined. (And the ultrafilter axiom ensures that holds iff exactly one of and holds.)
With these operations, combined with the ultrafilter axiom, we see that obeys the laws of many standard algebraic structures, the proofs of which we leave as exercises for the reader:
- is a totally ordered set.
- In fact, is a totally ordered vector space, with playing the role of the zero vector, multiplication playing the role of vector addition, and scalar exponentiation playing the role of scalar multiplication. (Of course, division would then play the role of vector subtraction.) To avoid confusion, one might refer to as a log-vector space rather than a vector space to emphasize the fact that the vector structure is coming from multiplication (and exponentiation) rather than addition (and multiplication). Ordered log-vector spaces may seem like a strange and exotic concept, but they are actually already studied implicitly in analysis, albeit under the guise of other names such as log-convexity.
- is a semiring (albeit one without an additive identity element), which is idempotent: for all .
- More generally, addition can be described in purely order-theoretic terms: for all . (It would therefore be natural to call a tropical semiring, although the precise axiomatization of this term does not appear to be fully standardized currently.)
The ordered (log-)vector space structure of in particular opens up the ability to prove asymptotic implications by (log-)linear programming; this was implicitly used in my previous post. One can also use the language of (log-)linear algebra to describe further properties of various orders of infinity. For instance, if is the natural numbers, we can form the subspace
of consisting of those orders of infinity which are of polynomial type in the sense that for some ; this is then a (log)-vector subspace of , and has a canonical (log-)linear surjection that assigns to each order of infinity of polynomial type the unique real number such that , that is to say for all one has for all sufficiently large . (The existence of such an follows from the ultrafilter axiom and by a variant of the proof of the Bolzano–Weierstrass theorem; the uniqueness is also easy to establish.) The kernel of this surjection is then the log-subspace of quasilogarithmic orders of infinity – for which for all .In addition to the above algebraic properties, the nonstandard orders of infinity also enjoy a completeness property that is reminiscent of the completeness of the real numbers. In the reals, it is true that any nested sequence of non-empty closed intervals has a non-empty intersection, which is a property closely tied to the more familiar definition of completeness as the assertion that Cauchy sequences are always convergent. This claim of course fails for open intervals: for instance, for is a nested sequence of non-empty open intervals whose intersection is empty. However, in the nonstandard orders of infinity , we have the same property for both open and closed intervals!
Lemma 1 (Completeness for arbitrary intervals) Let be a nested sequence of non-empty intervals in (which can be open, closed, or half-open). Then the intersection is non-empty.
Proof: For sake of notation we shall assume the intervals are open intervals , although much the same argument would also work for closed or half-open intervals (and then by the pigeonhole principle one can then handle nested sequences of arbitrary intervals); we leave this extension to the interested reader.
Pick an element of each , then we have whenever . In particular, one can find a set in the ultrafilter such that
whenever and , and by taking suitable intersections that these sets are nested: . If we now define to equal for (and leave undefined outside of ), one can check that for all , thus lies in the intersection of all the , giving the claim.This property is closely related to the countable saturation and overspill properties in nonstandard analysis. From this property one might expect that has better topological structure than say the reals. This is not exactly true, because unfortunately is not metrizable (or separable, or first or second countable). It is perhaps better to view as obeying a parallel type of completeness that is neither strictly stronger nor strictly weaker than the more familiar notion of metric completeness, but is otherwise rather analogous.
A proof of concept tool to verify estimates
This post was inspired by some recent discussions with Bjoern Bringmann.
Symbolic math software packages are highly developed for many mathematical tasks in areas such as algebra, calculus, and numerical analysis. However, to my knowledge we do not have similarly sophisticated tools for verifying asymptotic estimates – inequalities that are supposed to hold for arbitrarily large parameters, with constant losses. Particularly important are functional estimates, where the parameters involve an unknown function or sequence (living in some suitable function space, such as an space); but for this discussion I will focus on the simpler situation of asymptotic estimates involving a finite number of positive real numbers, combined using arithmetic operations such as addition, multiplication, division, exponentiation, and minimum and maximum (but no subtraction). A typical inequality here might be the weak arithmetic mean-geometric mean inequality
where are arbitrary positive real numbers, and the here indicates that we are willing to lose an unspecified (multiplicative) constant in the estimates.
I have wished in the past (e.g., in this MathOverflow answer) for a tool that could automatically determine whether such an estimate was true or not (and provide a proof if true, or an asymptotic counterexample if false). In principle, simple inequalities of this form could be automatically resolved by brute force case splitting. For instance, with (1), one first observes that is comparable to up to constants, so it suffices to determine if
Next, to resolve the maximum, one can divide into three cases: ; ; and . Suppose for instance that . Then the estimate to prove simplifies to
and this is (after taking logarithms) a positive linear combination of the hypotheses , . The task of determining such a linear combination is a standard linear programming task, for which many computer software packages exist.
Any single such inequality is not too difficult to resolve by hand, but there are applications in which one needs to check a large number of such inequalities, or split into a large number of cases. I will take an example at random from an old paper of mine (adapted from the equation after (51), and ignoring some epsilon terms for simplicity): I wanted to establish the estimate
for any obeying the constraints
where , , and are the maximum, median, and minimum of respectively, and similarly for , , and , and . This particular bound could be dispatched in three or four lines from some simpler inequalities; but it took some time to come up with those inequalities, and I had to do a dozen further inequalities of this type. This is a task that seems extremely ripe for automation, particularly with modern technology.
Recently, I have been doing a lot more coding (in Python, mostly) than in the past, aided by the remarkable facility of large language models to generate initial code samples for many different tasks, or to autocomplete partially written code. For the most part, I have restricted myself to fairly simple coding tasks, such as computing and then plotting some mildly complicated mathematical functions, or doing some rudimentary data analysis on some dataset. But I decided to give myself the more challenging task of coding a verifier that could handle inequalities of the above form. After about four hours of coding, with frequent assistance from an LLM, I was able to produce a proof of concept tool for this, which can be found at this Github repository. For instance, to verify (1), the relevant Python code is
a = Variable("a") b = Variable("b") c = Variable("c") assumptions = Assumptions() assumptions.can_bound((a * b * c) ** (1 / 3), max(a, b, c))and the (somewhat verbose) output verifying the inequality is
Checking if we can bound (((a * b) * c) ** 0.3333333333333333) by max(a, b, c) from the given axioms. We will split into the following cases: [[b <~ a, c <~ a], [a <~ b, c <~ b], [a <~ c, b <~ c]] Trying case: ([b <~ a, c <~ a],) Simplify to proving (((a ** 0.6666666666666667) * (b ** -0.3333333333333333)) * (c ** -0.3333333333333333)) >= 1. Bound was proven true by multiplying the following hypotheses : b <~ a raised to power 0.33333333 c <~ a raised to power 0.33333333 Trying case: ([a <~ b, c <~ b],) Simplify to proving (((b ** 0.6666666666666667) * (a ** -0.3333333333333333)) * (c ** -0.3333333333333333)) >= 1. Bound was proven true by multiplying the following hypotheses : a <~ b raised to power 0.33333333 c <~ b raised to power 0.33333333 Trying case: ([a <~ c, b <~ c],) Simplify to proving (((c ** 0.6666666666666667) * (a ** -0.3333333 333333333)) * (b ** -0.3333333333333333)) >= 1. Bound was proven true by multiplying the following hypotheses : a <~ c raised to power 0.33333333 b <~ c raised to power 0.33333333 Bound was proven true in all cases!This is of course an extremely inelegant proof, but elegance is not the point here; rather, that it is automated. (See also this recent article of Heather Macbeth for how proof writing styles change in the presence of automated tools, such as formal proof assistants.)
The code is close to also being able to handle more complicated estimates such as (3); right now I have not written code to properly handle hypotheses such as that involve complex expressions such as , as opposed to hypotheses that only involve atomic variables such as , , but I can at least handle such complex expressions in the left and right-hand sides of the estimate I am trying to verify.
In any event, the code, being a mixture of LLM-generated code and my own rudimentary Python skills, is hardly an exemplar of efficient or elegant coding, and I am sure that there are many expert programmers who could do a much better job. But I think this is proof of concept that a more sophisticated tool of this form could be quite readily created to do more advanced tasks. One such example task was the one I gave in the above MathOverflow question, namely being able to automatically verify a claim such as
for all . Another task would be to automatically verify the ability to estimate some multilinear expression of various functions, in terms of norms of such functions in standard spaces such as Sobolev spaces; this is a task that is particularly prevalent in PDE and harmonic analysis (and can frankly get somewhat tedious to do by hand). As speculated in that MO post, one could eventually hope to also utilize AI to assist in the verification process, for instance by suggesting possible splittings of the various sums or integrals involved, but that would be a long-term objective.
This sort of software development would likely best be performed as a collaborative project, involving both mathematicians and expert programmers. I would be interested to receive advice on how best to proceed with such a project (for instance, would it make sense to incorporate such a tool into an existing platform such as SageMATH), and what features for a general estimate verifier would be most desirable for mathematicians. One thing on my wishlist is the ability to give a tool an expression to estimate (such as a multilinear integral of some unknown functions), as well as a fixed set of tools to bound that integral (e.g., splitting the integral into pieces, integrating by parts, using the Hölder and Sobolev inequalities, etc.), and have the computer do its best to optimize the bound it can produce with those tools (complete with some independently verifiable proof certificate for its output). One could also imagine such tools having the option to output their proof certificates in a formal proof assistant language such as Lean. But perhaps there are other useful features that readers may wish to propose.
Stonean spaces, projective objects, the Riesz representation theorem, and (possibly) condensed mathematics
A basic type of problem that occurs throughout mathematics is the lifting problem: given some space that “sits above” some other “base” space due to a projection map , and some map from a third space into the base space , find a “lift” of to , that is to say a map such that . In many applications we would like to have preserve many of the properties of (e.g., continuity, differentiability, linearity, etc.).
Of course, if the projection map is not surjective, one would not expect the lifting problem to be solvable in general, as the map to be lifted could simply take values outside of the range of . So it is natural to impose the requirement that be surjective, giving the following commutative diagram to complete:
If no further requirements are placed on the lift , then the axiom of choice is precisely the assertion that the lifting problem is always solvable (once we require to be surjective). Indeed, the axiom of choice lets us select a preimage in the fiber of each point , and one can lift any by setting . Conversely, to build a choice function for a surjective map , it suffices to lift the identity map to .
Of course, the maps provided by the axiom of choice are famously pathological, being almost certain to be discontinuous, non-measurable, etc.. So now suppose that all spaces involved are topological spaces, and all maps involved are required to be continuous. Then the lifting problem is not always solvable. For instance, we have a continuous projection from to , but the identity map cannot be lifted continuously up to , because is contractable and is not.
However, if is a discrete space (every set is open), then the axiom of choice lets us solve the continuous lifting problem from for any continuous surjection , simply because every map from to is continuous. Conversely, the discrete spaces are the only ones with this property: if is a topological space which is not discrete, then if one lets be the same space equipped with the discrete topology, then the only way one can continuously lift the identity map through the “projection map” (that maps each point to itself) is if is itself discrete.
These discrete spaces are the projective objects in the category of topological spaces, since in this category the concept of an epimorphism agrees with that of a surjective continuous map. Thus can be viewed as the unique (up to isomorphism) projective object in this category that has a bijective continuous map to .
Now let us narrow the category of topological spaces to the category of compact Hausdorff (CH) spaces. Here things should be better behaved; for instance, it is a standard fact in this category that continuous bijections are homeomorphisms, and it is still the case that the epimorphisms are the continuous surjections. So we have a usable notion of a projective object in this category: CH spaces such that any continuous map into another CH space can be lifted via any surjective continuous map to another CH space.
By the previous discussion, discrete CH spaces will be projective, but this is an extremely restrictive set of examples, since of course compact discrete spaces must be finite. Are there any others? The answer was worked out by Gleason:
Proposition 1 A compact Hausdorff space is projective if and only if it is extremally disconnected, i.e., the closure of every open set is again open.
Proof: We begin with the “only if” direction. Let was projective, and let be an open subset of . Then the closure and complement are both closed, hence compact, subsets of , so the disjoint union is another CH space, which has an obvious surjective continuous projection map to formed by gluing the two inclusion maps together. As is projective, the identity map must then lift to a continuous map . One easily checks that has to map to the first component of the disjoint union, and ot the second component; hence , and so is open, giving extremal disconnectedness.
Conversely, suppose that is extremally disconnected, that is a continuous surjection of CH spaces, and is continuous. We wish to lift to a continuous map .
We first observe that it suffices to solve the lifting problem for the identity map , that is to say we can assume without loss of generality that and is the identity. Indeed, for general maps , one can introduce the pullback space
which is clearly a CH space that has a continuous surjection . Any continuous lift of the identity map to , when projected onto , will give a desired lift .
So now we are trying to lift the identity map via a continuous surjection . Let us call this surjection minimally surjective if no restriction of to a proper closed subset of remains surjective. An easy application of Zorn’s lemma shows that every continuous surjection can be restricted to a minimally surjective continuous map . Thus, without loss of generality, we may assume that is minimally surjective.
The key claim now is that every minimally surjective map into an extremally disconnected space is in fact a bijection. Indeed, suppose for contradiction that there were two distinct points in that mapped to the same point under . By taking contrapositives of the minimal surjectivity property, we see that every open neighborhood of must contain at least one fiber of , and by shrinking this neighborhood one can ensure the base point is arbitrarily close to . Thus, every open neighborhood of must intersect every open neighborhood of , contradicting the Hausdorff property.
It is well known that continuous bijections between CH spaces must be homeomorphisms (they map compact sets to compact sets, hence must be open maps). So is a homeomorphism, and one can lift the identity map to the inverse map .
Remark 2 The property of being “minimally surjective” sounds like it should have a purely category-theoretic definition, but I was unable to match this concept to a standard term in category theory (something along the lines of a “minimal epimorphism”, I would imagine).
In view of this proposition, it is now natural to look for extremally disconnected CH spaces (also known as Stonean spaces). The discrete CH spaces are one class of such spaces, but they are all finite. Unfortunately, these are the only “small” examples:
Lemma 3 Any first countable extremally disconnected CH space is discrete.
Proof: If such a space were not discrete, one could find a sequence in converging to a limit such that for all . One can sparsify the elements to all be distinct, and from the Hausdorff property one can construct neighbourhoods of each that avoid , and are disjoint from each other. Then and then are disjoint open sets that both have as an adherent point, which is inconsistent with extremal disconnectedness: the closure of contains but is disjoint from , so cannot be open.
Thus for instance there are no extremally disconnected compact metric spaces, other than the finite spaces; for instance, the Cantor space is not extremally disconnected, even though it is totally disconnected (which one can easily see to be a property implied by extremal disconnectedness). On the other hand, once we leave the first-countable world, we have plenty of such spaces:
Lemma 4 Let be a complete Boolean algebra. Then the Stone dual of (i.e., the space of boolean homomorphisms ) is an extremally disconnected CH space.
Proof: The CH properties are standard. The elements of give a basis of the topology given by the clopen sets . Because the Boolean algebra is complete, we see that the closure of the open set for any family of sets is simply the clopen set , which obviously open, giving extremal disconnectedness.
Remark 5 In fact, every extremally disconnected CH space is homeomorphic to a Stone dual of a complete Boolean algebra (and specifically, the clopen algebra of ); see Gleason’s paper.
Corollary 6 Every CH space is the surjective continuous image of an extremally disconnected CH space.
Proof: Take the Stone-Čech compactification of equipped with the discrete topology, or equivalently the Stone dual of the power set (i.e., the ultrafilters on ). By the previous lemma, this is an extremally disconnected CH space. Because every ultrafilter on a CH space has a unique limit, we have a canonical map from to , which one can easily check to be continuous and surjective.
Remark 7 In fact, to each CH space one can associate an extremally disconnected CH space with a minimally surjective continuous map . The construction is the same, but instead of working with the entire power set , one works with the smaller (but still complete) Boolean algebra of domains – closed subsets of which are the closure of their interior, ordered by inclusion. This is unique up to homoeomorphism, and is thus a canonical choice of extremally disconnected space to project onto . See the paper of Gleason for details.
Several facts in analysis concerning CH spaces can be made easier to prove by utilizing Corollary 6 and working first in extremally disconnected spaces, where some things become simpler. My vague understanding is that this is highly compatible with the modern perspective of condensed mathematics, although I am not an expert in this area. Here, I will just give a classic example of this philosophy, due to Garling and presented in this paper of Hartig:
Theorem 8 (Riesz representation theorem) Let be a CH space, and let be a bounded linear functional. Then there is a (unique) Radon measure on (on the Baire -algebra, generated by ) such for all .
Uniqueness of the measure is relatively straightforward; the difficult task is existence, and most known proofs are somewhat complicated. But one can observe that the theorem “pushes forward” under surjective maps:
Proposition 9 Suppose is a continuous surjection between CH spaces. If the Riesz representation theorem is true for , then it is also true for .
Proof: As is surjective, the pullback map is an isometry, hence every bounded linear functional on can be viewed as a bounded linear functional on a subspace of , and hence by the Hahn–Banach theorem it extends to a bounded linear functional on . By the Riesz representation theorem on , this latter functional can be represented as an integral against a Radon measure on . One can then check that the pushforward measure is then a Radon measure on , and gives the desired representation of the bounded linear functional on .
In view of this proposition and Corollary 6, it suffices to prove the Riesz representation theorem for extremally disconnected CH spaces. But this is easy:
Proposition 10 The Riesz representation theorem is true for extremally disconnected CH spaces.
Proof: The Baire -algebra is generated by the Boolean algebra of clopen sets. A functional induces a finitely additive measure on this algebra by the formula . This is in fact a premeasure, because by compactness the only way to partition a clopen set into countably many clopen sets is to have only finitely many of the latter sets non-empty. By the Carathéodory extension theorem, then extends to a Baire measure, which one can check to be a Radon measure that represents (the finite linear combinations of indicators of clopen sets are dense in ).
A suspicious conference
I have received multiple queries from colleagues who have been invited (from a non-academic email address) to speak a strange-sounding conference that is allegedly supported by major mathematical institutions, allegedly hosted at a prestigious university, and allegedly having myself (and two other Fields Medalists) as plenary speakers. The invitees are asked to pay “registration fees” upfront, with the promise of future reimbursement. (There is a bare-bones web site, which seems to be partially copy-pasted from some previous “conferences” in chemistry and physics, but I will not link to it here.)
I have not agreed (or even been asked) to participate in this event, and I can confirm the same for at least one of the other supposed plenary speakers. There is also no confirmation of the support or location claimed.
As such, this does not appear to be a legitimate scientific conference, and I would advise anyone receiving such an email to discard it.
EDIT: in order to have this post be picked up by appropriate search engine queries, the name of the alleged conference is “Infinity ’25: Horizons in Mathematical Thought”.
SECOND EDIT: I am *not* referring to the 2026 International Congress of Mathematicians (ICM), which is also sending out speaker invitations currently, and is of course an extremely legitimate conference.
Decomposing a factorial into large factors
I’ve just uploaded to the arXiv the paper “Decomposing a factorial into large factors“. This paper studies the quantity , defined as the largest quantity such that it is possible to factorize into factors , each of which is at least . The first few values of this sequence are
(OEIS A034258). For instance, we have , because on the one hand we can factor but on the other hand it is not possible to factorize into nine factors, each of which is or higher.This quantity was introduced by Erdös, who asked for upper and lower bounds on ; informally, this asks how equitably one can split up into factors. When factoring an arbitrary number, this is essentially a variant of the notorious knapsack problem (after taking logarithms), but one can hope that the specific structure of the factorial can make this particular knapsack-type problem more tractable. Since
for any putative factorization, we obtain an upper bound thanks to the Stirling approximation. At one point, Erdös, Selfridge, and Straus claimed that this upper bound was asymptotically sharp, in the sense that as ; informally, this means we can split into factors that are (mostly) approximately the same size, when is large. However, as reported in this later paper, Erdös “believed that Straus had written up our proof… Unfortunately Straus suddenly died and no trace was ever found of his notes. Furthermore, we never could reconstruct our proof, so our assertion now can be called only a conjecture”.Some further exploration of was conducted by Guy and Selfridge. There is a simple construction that gives the lower bound
that comes from starting with the standard factorization and transferring some powers of from the later part of the sequence to the earlier part to rebalance the terms somewhat. More precisely, if one removes one power of two from the even numbers between and , and one additional power of two from the multiples of four between to , this frees up powers of two that one can then distribute amongst the numbers up to to bring them all up to at least in size. A more complicated procedure involving transferring both powers of and then gives the improvement . At this point, however, things got more complicated, and the following conjectures were made by Guy and Selfridge:- (i) Is for all ?
- (ii) Is for all ? (At , this conjecture barely fails: .)
- (iii) Is for all ?
In this note we establish the bounds
as , where is the explicit constant In particular this recovers the lost result (2). An upper bound of the shape for some was previously conjectured by Erdös and Graham (Erdös problem #391). We conjecture that the upper bound in (3) is sharp, thus which is consistent with the above conjectures (i), (ii), (iii) of Guy and Selfridge, although numerically the convergence is somewhat slow.The upper bound argument for (3) is simple enough that it could also be modified to establish the first conjecture (i) of Guy and Selfridge; in principle, (ii) and (iii) are now also reducible to a finite computation, but unfortunately the implied constants in the lower bound of (3) are too weak to make this directly feasible. However, it may be possible to now crowdsource the verification of (ii) and (iii) by supplying a suitable set of factorizations to cover medium sized , combined with some effective version of the lower bound argument that can establish for all past a certain threshold. The value singled out by Guy and Selfridge appears to be quite a suitable test case: the constructions I tried fell just a little short of the conjectured threshold of , but it seems barely within reach that a sufficiently efficient rearrangement of factors can work here.
We now describe the proof of the upper and lower bound in (3). To improve upon the trivial upper bound (1), one can use the large prime factors of . Indeed, every prime between and divides at least once (and the ones between and divide it twice), and any factor that contains such a factor therefore has to be significantly larger than the benchmark value of . This observation already readily leads to some upper bound of the shape (4) for some ; if one also uses the primes that are slightly less than (noting that any multiple of that exceeds , must in fact exceed ) is what leads to the precise constant .
For previous lower bound constructions, one started with the initial factorization and then tried to “improve” this factorization by moving around some of the prime factors. For the lower bound in (3), we start instead with an approximate factorization roughly of the shape
where is the target lower bound (so, slightly smaller than ), and is a moderately sized natural number parameter (we will take , although there is significant flexibility here). If we denote the right-hand side here by , then is basically a product of numbers of size at least . It is not literally equal to ; however, an easy application of Legendre’s formula shows that for odd small primes , and have almost exactly the same number of factors of . On the other hand, as is odd, contains no factors of , while contains about such factors. The prime factorizations of and differ somewhat at large primes, but has slightly more such prime factors as (about such factors, in fact). By some careful applications of the prime number theorem, one can tweak some of the large primes appearing in to make the prime factorization of and agree almost exactly, except that is missing most of the powers of in , while having some additional large prime factors beyond those contained in to compensate. With a suitable choice of threshold , one can then replace these excess large prime factors with powers of two to obtain a factorization of into terms that are all at least , giving the lower bound.The general approach of first locating some approximate factorization of (where the approximation is in the “adelic” sense of having not just approximately the right magnitude, but also approximately the right number of factors of for various primes ), and then moving factors around to get an exact factorization of , looks promising for also resolving the conjectures (ii), (iii) mentioned above. For instance, I was numerically able to verify that by the following procedure:
- Start with the approximate factorization of , by . Thus is the product of odd numbers, each of which is at least .
- Call an odd prime -heavy if it divides more often than , and -heavy if it divides more often than . It turns out that there are more -heavy primes than -heavy primes (counting multiplicity). On the other hand, contains powers of , while has none. This represents the (multi-)set of primes one has to redistribute in order to convert a factorization of to a factorization of .
- Using a greedy algorithm, one can match a -heavy prime to each -heavy prime (counting multiplicity) in such a way that for a small (in most cases one can make , and often one also has ). If we then replace in the factorization of by for each -heavy prime , this increases (and does not decrease any of the factors of ), while eliminating all the -heavy primes. With a somewhat crude matching algorithm, I was able to do this using of the powers of dividing , leaving powers remaining at my disposal. (I don’t claim that this is the most efficient matching, in terms of powers of two required, but it sufficed.)
- There are still -heavy primes left over in the factorization of (the modified version of) . Replacing each of these primes with , and then distributing the remaining powers of two arbitrarily, this obtains a factorization of into terms, each of which are at least .
However, I was not able to adjust parameters to reach in this manner. Perhaps some readers here who are adept with computers can come up with a more efficient construction to get closer to this bound? If one can find a way to reach this bound, most likely it can be adapted to then resolve conjectures (ii) and (iii) above after some additional numerical effort.
UPDATE: There is now an active Github project to track the latest progress, coming from multiple contributors.
The three-dimensional Kakeya conjecture, after Wang and Zahl
There has been some spectacular progress in geometric measure theory: Hong Wang and Joshua Zahl have just released a preprint that resolves the three-dimensional case of the infamous Kakeya set conjecture! This conjecture asserts that a Kakeya set – a subset of that contains a unit line segment in every direction, must have Minkowski and Hausdorff dimension equal to three. (There is also a stronger “maximal function” version of this conjecture that remains open at present, although the methods of this paper will give some non-trivial bounds on this maximal function.) It is common to discretize this conjecture in terms of small scale . Roughly speaking, the conjecture then asserts that if one has a family of tubes of cardinality , and pointing in a -separated set of directions, then the union of these tubes should have volume . Here we shall be a little vague as to what means here, but roughly one should think of this as “up to factors of the form for any “; in particular this notation can absorb any logarithmic losses that might arise for instance from a dyadic pigeonholing argument. For technical reasons (including the need to invoke the aforementioned dyadic pigeonholing), one actually works with slightly smaller sets , where is a “shading” of the tubes in that assigns a large subset of to each tube in the collection; but for this discussion we shall ignore this subtlety and pretend that we can always work with the full tubes.
Previous results in this area tended to center around lower bounds of the form
for various intermediate dimensions , that one would like to make as large as possible. For instance, just from considering a single tube in this collection, one can easily establish (1) with . By just using the fact that two lines in intersect in a point (or more precisely, a more quantitative estimate on the volume between the intersection of two tubes, based on the angle of intersection), combined with a now classical -based argument of Córdoba, one can obtain (1) with (and this type of argument also resolves the Kakeya conjecture in two dimensions). In 1995, building on earlier work by Bourgain, Wolff famously obtained (1) with using what is now known as the “Wolff hairbrush argument”, based on considering the size of a “hairbrush” – the union of all the tubes that pass through a single tube (the hairbrush “stem”) in the collection.In their new paper, Wang and Zahl established (1) for . The proof is lengthy (127 pages!), and relies crucially on their previous paper establishing a key “sticky” case of the conjecture. Here, I thought I would try to summarize the high level strategy of proof, omitting many details and also oversimplifying the argument at various places for sake of exposition. The argument does use many ideas from previous literature, including some from my own papers with co-authors; but the case analysis and iterative schemes required are remarkably sophisticated and delicate, with multiple new ideas needed to close the full argument.
A natural strategy to prove (1) would be to try to induct on : if we let represent the assertion that (1) holds for all configurations of tubes of dimensions , with -separated directions, we could try to prove some implication of the form for all , where is some small positive quantity depending on . Iterating this, one could hope to get arbitrarily close to .
A general principle with these sorts of continuous induction arguments is to first obtain the trivial implication in a non-trivial fashion, with the hope that this non-trivial argument can somehow be perturbed or optimized to get the crucial improvement . The standard strategy for doing this, since the work of Bourgain and then Wolff in the 1990s (with precursors in older work of Córdoba), is to perform some sort of “induction on scales”. Here is the basic idea. Let us call the tubes in “thin tubes”. We can try to group these thin tubes into “fat tubes” of dimension for some intermediate scale ; it is not terribly important for this sketch precisely what intermediate value is chosen here, but one could for instance set if desired. Because of the -separated nature of the directions in , there can only be at most thin tubes in a given fat tube, and so we need at least fat tubes to cover the thin tubes. Let us suppose for now that we are in the “sticky” case where the thin tubes stick together inside fat tubes as much as possible, so that there are in fact a collection of fat tubes , with each fat tube containing about of the thin tubes. Let us also assume that the fat tubes are -separated in direction, which is an assumption which is highly consistent with the other assumptions made here.
If we already have the hypothesis , then by applying it at scale instead of we conclude a lower bound on the volume occupied by fat tubes:
Since , this morally tells us that the typical multiplicity of the fat tubes is ; a typical point in should belong to about fat tubes.Now, inside each fat tube , we are assuming that we have about thin tubes that are -separated in direction. If we perform a linear rescaling around the axis of the fat tube by a factor of to turn it into a tube, this would inflate the thin tubes to be rescaled tubes of dimensions , which would now be -separated in direction. This rescaling does not affect the multiplicity of the tubes. Applying again, we see morally that the multiplicity of the rescaled tubes, and hence the thin tubes inside , should be .
We now observe that the multiplicity of the full collection of thin tubes should morally obey the inequality
since if a given point lies in at most fat tubes, and within each fat tube a given point lies in at most thin tubes in that fat tube, then it should only be able to lie in at most tubes overall. This heuristically gives , which then recovers (1) in the sticky case.In their previous paper, Wang and Zahl were roughly able to squeeze a little bit more out of this argument to get something resembling in the sticky case, loosely following a strategy of Nets Katz and myself that I discussed in this previous blog post from over a decade ago. I will not discuss this portion of the argument further here, referring the reader to the introduction to that paper; instead, I will focus on the arguments in the current paper, which handle the non-sticky case.
Let’s try to repeat the above analysis in a non-sticky situation. We assume (or some suitable variant thereof), and consider some thickened Kakeya set
where is something resembling what we might call a “Kakeya configuration” at scale : a collection of thin tubes of dimension that are -separated in direction. (Actually, to make the induction work, one has to consider a more general family of tubes than these, satisfying some standard “Wolff axioms” instead of the direction separation hypothesis; but we will gloss over this issue for now.) Our goal is to prove something like for some , which amounts to obtaining some improved volume bound that improves upon the bound coming from . From the previous paper we know we can do this in the “sticky” case, so we will assume that is “non-sticky” (whatever that means).A typical non-sticky setup is when there are now fat tubes for some multiplicity (e.g., for some small constant ), with each fat tube containing only thin tubes. Now we have an unfortunate imbalance: the fat tubes form a “super-Kakeya configuration”, with too many tubes at the coarse scale for them to be all -separated in direction, while the thin tubes inside a fat tube form a “sub-Kakeya configuration” in which there are not enough tubes to cover all relevant directions. So one cannot apply the hypothesis efficiently at either scale.
This looks like a serious obstacle, so let’s change tack for a bit and think of a different way to try to close the argument. Let’s look at how intersects a given -ball . The hypothesis suggests that might behave like a -dimensional fractal (thickened at scale ), in which case one might be led to a predicted size of of the form . Suppose for sake of argument that the set was denser than this at this scale, for instance we have
for all and some . Observe that the -neighborhood is basically , and thus has volume by the hypothesis (indeed we would even expect some gain in , but we do not attempt to capture such a gain for now). Since -balls have volume , this should imply that needs about balls to cover it. Applying (3), we then heuristically have which would give the desired gain . So we win if we can exhibit the condition (3) for some intermediate scale . I think of this as a “Frostman measure violation”, in that the Frostman type bound is being violated.
The set , being the union of tubes of thickness , is essentially the union of cubes. But it has been observed in several previous works (starting with a paper of Nets Katz, Izabella Laba, and myself) that these Kakeya type sets tend to organize themselves into larger “grains” than these cubes – in particular, they can organize into disjoint prisms (or “grains”) in various orientations for some intermediate scales . The original “graininess” argument of Nets, Izabella and myself required a stickiness hypothesis which we are explicitly not assuming (and also an “x-ray estimate”, though Wang and Zahl were able to find a suitable substitute for this), so is not directly available for this argument; however, there is an alternate approach to graininess developed by Guth, based on the polynomial method, that can be adapted to this setting. (I am told that Guth has a way to obtain this graininess reduction for this paper without invoking the polynomial method, but I have not studied the details.) With rescaling, we can ensure that the thin tubes inside a single fat tube will organize into grains of a rescaled dimension . The grains associated to a single fat tube will be essentially disjoint; but there can be overlap between grains from different fat tubes.
The exact dimensions of the grains are not specified in advance; the argument of Guth will show that is significantly larger than , but other than that there are no bounds. But in principle we should be able to assume without loss of generality that the grains are as “large” as possible. This means that there are no longer grains of dimensions with much larger than ; and for fixed , there are no wider grains of dimensions with much larger than .
One somewhat degenerate possibility is that there are enormous grains of dimensions approximately (i.e., ), so that the Kakeya set becomes more like a union of planar slabs. Here, it turns out that the classical arguments of Córdoba give good estimates, so this is a relatively easy case. So we can assume that at least one of or is small (or both).
We now revisit the multiplicity inequality (2). There is something slightly wasteful about this inequality, because the fat tubes used to define occupy a lot of space that is not in . An improved inequality here is
where is the multiplicity, not of the fat tubes , but rather of the smaller set . The point here is that by the graininess hypotheses, each is the union of essentially disjoint grains of some intermediate dimensions . So the quantity is basically measuring the multiplicity of the grains.
It turns out that after a suitable rescaling, the arrangement of grains looks locally like an arrangement of tubes. If one is lucky, these tubes will look like a Kakeya (or sub-Kakeya) configuration, for instance with not too many tubes in a given direction. (More precisely, one should assume here some form of the Wolff axioms, which the authors refer to as the “Katz-Tao Convex Wolff axioms”). A suitable version of the hypothesis will then give the bound
Meanwhile, the thin tubes inside a fat tube are going to be a sub-Kakeya configuration, having about times fewer tubes than a Kakeya configuration. It turns out to be possible to use to then get a gain in here, for some small constant . Inserting these bounds into (4), one obtains a good bound which leads to the desired gain .
So the remaining case is when the grains do not behave like a rescaled Kakeya or sub-Kakeya configuration. Wang and Zahl introduce a “structure theorem” to analyze this case, concluding that the grains will organize into some larger convex prisms , with the grains in each prism behaving like a “super-Kakeya configuration” (with significantly more grains than one would have for a Kakeya configuration). However, the precise dimensions of these prisms are not specified in advance, and one has to split into further cases.
One case is when the prisms are “thick”, in that all dimensions are significantly greater than . Informally, this means that at small scales, looks like a super-Kakeya configuration after rescaling. With a somewhat lengthy induction on scales argument, Wang and Zahl are able to show that (a suitable version of) implies an “x-ray” version of itself, in which the lower bound of super-Kakeya configurations is noticeably better than the lower bound for Kakeya configurations. The upshot of this is that one is able to obtain a Frostman violation bound of the form (3) in this case, which as discussed previously is already enough to win in this case.
It remains to handle the case when the prisms are “thin”, in that they have thickness . In this case, it turns out that the arguments of Córdoba, combined with the super-Kakeya nature of the grains inside each of these thin prisms, implies that each prism is almost completely occupied by the set . In effect, this means that these prisms themselves can be taken to be grains of the Kakeya set. But this turns out to contradict the maximality of the dimensions of the grains (if everything is set up properly). This treats the last remaining case needed to close the induction on scales, and obtain the Kakeya conjecture!
Closing the “green gap”: from the mathematics of the landscape function to lower electricity costs for households
I recently returned from the 2025 Annual Meeting of the “Localization of Waves” collaboration (supported by the Simons Foundation, with additional related support from the NSF), where I learned (from Svitlana Mayboroda, the director of the collaboration as well as one of the principal investigators) of a remarkable statistic: net electricity consumption by residential customers in the US has actually experienced a slight decrease in recent years:
The decrease is almost entirely due to gains in lighting efficiency in households, and particularly the transition from incandescent (and compact fluorescent) light bulbs to LED light bulbs:
Annual energy savings to US consumers from this switch were already estimated to be $14.7 billion in 2020 – or several hundred dollars per household – and are projected to increase, even in the current inflationary era, with the cumulative savings across the US estimated to reach $890 billion by 2035.
What I also did not realize before this meeting is the role that recent advances in pure mathematics – and specifically, the development of the “landscape function” that was a primary focus of this collaboration – played in accelerating this transition. This is not to say that this piece of mathematics was solely responsible for these developments; but, as I hope to explain here, it was certainly part of the research and development ecosystem in both academia and industry, spanning multiple STEM disciplines and supported by both private and public funding. This application of the landscape function was already reported upon by Quanta magazine at the very start of this collaboration back in 2017; but it is only in the last few years that the mathematical theory has been incorporated into the latest LED designs and led to actual savings at the consumer end.
LED lights are made from layers of semiconductor material (e.g., Gallium nitride or Indium gallium nitride) arranged in a particular fashion. When enough of a voltage difference is applied to this material, electrons are injected into the “n-type” side of the LED, while electron holes are injected into the “p-type” side, creating a current. In the active layer of the LED, these electrons and holes recombine in the quantum wells of the layer, generating radiation (light) via the mechanism of electroluminescence. The brightness of the LED is determined by the current, while the power consumption is the product of the current and the voltage. Thus, to improve energy efficiency, one seeks to design LEDs to require as little voltage as possible to generate a target amount of current.
As it turns out, the efficiency of an LED, as well as the spectral frequencies of light it generates, depend in many subtle ways on the precise geometry of the chemical composition of the semiconductors, the thickness of the layers, the geometry of how the layers are placed atop one another, the temperature of the materials, and the amount of disorder (impurities) introduced into each layer. In particular, in order to create quantum wells that can efficiently trap the electrons and holes together to recombine to create light of a desired frequency, it is useful to introduce a certain amount of disorder into the layers in order to take advantage of the phenomenon of Anderson localization. However, one cannot add too much disorder, lest the electron states become fully bound and the material behave too much like an insulator to generate appreciable current.
One can of course make empirical experiments to measure the performance of various proposed LED designs by fabricating them and then testing them in a laboratory. But this is an expensive and painstaking process that does not scale well; one cannot test thousands of candidate designs this way to isolate the best performing ones. So, it becomes desirable to perform numerical simulations of these designs instead, which – if they are sufficiently accurate and computationally efficient – can lead to a much shorter and cheaper design cycle. (In the near future one may also hope to accelerate the design cycle further by incorporating machine learning and AI methods; but these techniques, while promising, are still not fully developed at the present time.)
So, how can one perform numerical simulation of an LED? By the semiclassical approximation, the wave function of an individual electron should solve the time-independent Schrödinger equation
where is the wave function of the electron at this energy level, and is the conduction band energy. The behavior of hole wavefunctions follows a similar equation, governed by the valence band energy instead of . However, there is a complication: these band energies are not solely coming from the semiconductor, but also contain a contribution that comes from electrostatic effects from the electrons and holes, and more specifically by solving the Poisson equation
where is the dielectric constant of the semiconductor, are the carrier densities of electrons and holes respectively, , are further densities of ionized acceptor and donor atoms, and are physical constants. This equation looks somewhat complicated, but is mostly determined by the carrier densities , which in turn ultimately arise from the probability densities associated to the eigenfunctions via the Born rule, combined with the Fermi-Dirac distribution from statistical mechanics; for instance, the electron carrier density is given by the formula
with a similar formula for . In particular, the net potential depends on the wave functions , turning the Schrödinger equation into a nonlinear self-consistent Hartree-type equation. From the wave functions one can also compute the current, determine the amount of recombination between electrons and holes, and therefore also calculate the light intensity and absorption rates. But the main difficulty is to solve for the wave functions for the different energy levels of the electron (as well as the counterpart for holes).
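To fix ideas, one standard way to write this coupled system (in generic semiconductor-modelling notation, which may differ from the conventions used in the papers and software discussed here) is
$$-\frac{\hbar^2}{2m^*}\,\Delta \psi_E + E_c(x)\,\psi_E = E\,\psi_E, \qquad -\nabla \cdot \big(\varepsilon(x)\,\nabla \varphi\big) = q\big(p(x) - n(x) + N_D^+(x) - N_A^-(x)\big),$$
with the electrostatic potential $\varphi$ feeding back into the band energies (for instance $E_c = E_{c,0} - q\varphi$), and with the electron density recovered from the eigenfunctions by
$$n(x) = \sum_E \frac{|\psi_E(x)|^2}{1 + e^{(E - E_F)/k_B T}},$$
together with an analogous Fermi–Dirac formula for the hole density $p(x)$.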
One could attempt to solve this nonlinear system iteratively, by first proposing an initial candidate for the wave functions , using this to obtain a first approximation for the conduction band energy and valence band energy , and then solving the Schrödinger equations to obtain a new approximation for , and repeating this process until it converges. However, the regularity of the potentials plays an important role in being able to solve the Schrödinger equation. (The Poisson equation, being elliptic, is relatively easy to solve to high accuracy by standard methods, such as finite element methods.) If the potential is quite smooth and slowly varying, then one expects the wave functions to be quite delocalized, and for traditional approximations such as the WKB approximation to be accurate.
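As an aside, here is a toy illustration of the self-consistent loop just described: a dimensionless 1D finite-difference sketch, written for this post with entirely invented parameters and not taken from any actual LED simulator. It iterates a discretized Schrödinger solve, a Fermi-Dirac carrier density, and a Poisson solve until the electrostatic potential stabilizes.

import numpy as np

# Toy 1D self-consistent Schrodinger-Poisson loop (illustrative only).
# All quantities are dimensionless and all parameters below are invented.

N = 200                       # number of grid points
L = 1.0                       # length of the (rescaled) device
dx = L / (N - 1)

kT = 0.25                     # thermal energy (made up)
E_F = 6.0                     # Fermi level (made up)
eps = 1.0                     # dielectric constant (made up)
coupling = 0.2                # strength of the electrostatic feedback (made up)
rng = np.random.default_rng(0)
E_band = 0.5 * rng.random(N)  # disordered conduction-band profile (made up)

def solve_schrodinger(V):
    """Eigenpairs of -(1/2) d^2/dx^2 + V with Dirichlet boundary conditions."""
    H = (np.diag(1.0 / dx**2 + V)
         + np.diag(-0.5 / dx**2 * np.ones(N - 1), 1)
         + np.diag(-0.5 / dx**2 * np.ones(N - 1), -1))
    E, psi = np.linalg.eigh(H)
    return E, psi / np.sqrt(dx)   # normalize so that sum |psi|^2 dx = 1

def electron_density(E, psi):
    """Born rule plus Fermi-Dirac occupation, summed over energy levels."""
    # numerically safe form of 1 / (1 + exp((E - E_F) / kT))
    occupation = 0.5 * (1.0 - np.tanh((E - E_F) / (2.0 * kT)))
    return (np.abs(psi) ** 2 * occupation).sum(axis=1)

def solve_poisson(rho):
    """Solve -eps * phi'' = rho with phi = 0 at both endpoints."""
    A = eps / dx**2 * (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    return np.linalg.solve(A, rho)

phi = np.zeros(N)
for iteration in range(100):
    V = E_band - phi                        # electrostatic feedback into the band energy
    E, psi = solve_schrodinger(V)
    n = electron_density(E, psi)
    phi_new = solve_poisson(-coupling * n)  # electrons carry negative charge
    residual = np.max(np.abs(phi_new - phi))
    phi = 0.5 * phi + 0.5 * phi_new         # damped update to aid convergence
    if residual < 1e-8:
        break
print(f"stopped after {iteration + 1} iterations, residual {residual:.2e}")

Realistic device models have the same loop structure, but with far more elaborate physics in each step; as discussed below, the landscape-based approximations can then replace the expensive eigenfunction step.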
However, in the presence of disorder, such approximations are no longer valid. As a consequence, traditional methods for numerically solving these equations had proven to be too inaccurate to be of practical use in simulating the performance of an LED design, so until recently one had to rely primarily on slower and more expensive empirical testing methods. One real-world consequence of this was the “green gap”; while reasonably efficient LED designs were available in the blue and red portions of the spectrum, there was not a suitable design that gave efficient output in the green spectrum. Given that many applications of LED lighting required white light that was balanced across all visible colors of the spectrum, this was a significant impediment to realizing the energy-saving potential of LEDs.
Here is where the landscape function comes in. This function started as a purely mathematical discovery: when solving a Schrödinger equation such as
(where we have now suppressed all physical constants for simplicity), it turns out that the behavior of the eigenfunctions at various energy levels is controlled to a remarkable extent by the landscape function , defined to be the solution to the equation
As discussed in this previous blog post (discussing a paper on this topic I wrote with some of the members of this collaboration), one reason for this is that the Schrödinger equation can be transformed after some routine calculations to
thus making an effective potential for the Schrödinger equation (and also being the coefficients of an effective geometry for the equation). In practice, when is a disordered potential, the effective potential tends to behave like a somewhat “smoothed out” or “homogenized” version of that exhibits superior numerical performance. For instance, the classical Weyl law predicts (assuming a smooth confining potential ) that the density of states up to energy – that is to say, the number of bound states up to – should asymptotically behave like . This is accurate at very high energies , but when is disordered, it tends to break down at low and medium energies. However, the landscape function makes a prediction for this density of states that is significantly more accurate in practice in these regimes, with a mathematical justification (up to multiplicative constants) of this accuracy obtained in this paper of David, Filoche, and Mayboroda. More refined predictions (again with some degree of theoretical support from mathematical analysis) can be made on the local integrated density of states, and with more work one can then also obtain approximations for the carrier density functions mentioned previously in terms of the energy band level functions , . As the landscape function is relatively easy to compute (coming from solving a single elliptic equation), this gives a very practical numerical way to carry out the iterative procedure described previously to model LEDs in a way that has proven to be both numerically accurate and significantly faster than empirical testing, leading to a much more rapid design cycle.
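Concretely, in the normalization above (with physical constants suppressed, and in my own notation): the landscape function $u$ solves
$$-\Delta u + V u = 1,$$
and substituting $\psi = u \phi$ into the eigenfunction equation $-\Delta \psi + V \psi = E \psi$ yields, after a short computation,
$$-\frac{1}{u^2}\,\mathrm{div}\big(u^2 \nabla \phi\big) + \frac{1}{u}\,\phi = E\,\phi,$$
so that $1/u$ plays the role of an effective potential and $u^2$ the role of the coefficients of an effective geometry. In the same notation, the Weyl prediction for the integrated density of states takes the shape $N(E) \approx c_d \int (E - V(x))_+^{d/2}\,dx$, and the landscape-based prediction is obtained, roughly speaking and up to multiplicative constants, by replacing $V$ here with $1/u$.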
In particular, recent advances in LED technology have largely closed the “green gap” by introducing designs that incorporate “V-defects”: V-shaped dents in the semiconductor layers of the LED that create lateral carrier injection pathways and modify the internal electric field, enhancing hole transport into the active layer. The ability to accurately simulate the effects of these defects has allowed researchers to largely close this gap:
My understanding is that the major companies involved in developing LED lighting are now incorporating landscape-based methods into their own proprietary simulation models to achieve similar effects in commercially produced LEDs, which should lead to further energy savings in the near future.
Thanks to Svitlana Mayboroda and Marcel Filoche for detailed discussions, comments, and corrections of the material here.
Cosmic Distance Ladder videos with Grant Sanderson (3blue1brown): commentary and corrections
Grant Sanderson (who runs, and creates most of the content for, the website and Youtube channel 3blue1brown) has been collaborating with myself and others (including my coauthor Tanya Klowden) on producing a two-part video giving an account of some of the history of the cosmic distance ladder, building upon a previous public lecture I gave on this topic, and also relating to a forthcoming popular book with Tanya on this topic. The first part of this video is available here; the second part is available here.
The videos were based on a somewhat unscripted interview that Grant conducted with me some months ago, and as such contained some minor inaccuracies and omissions (including some made for editing reasons to keep the overall narrative coherent and within a reasonable length). They also generated many good questions from the viewers of the Youtube video. I am therefore compiling here a “FAQ” of various clarifications and corrections to the videos; this was originally placed as a series of comments on the Youtube channel, but the blog post format here will be easier to maintain going forward. Some related content will also be posted on the Instagram page for the forthcoming book with Tanya.
Questions on the two main videos are marked with an appropriate timestamp to the video.
Comments on part 1 of the video
- 4:26 Did Eratosthenes really check a local well in Alexandria?
This was a narrative embellishment on my part. Eratosthenes’s original work is lost to us. The most detailed contemporaneous account, by Cleomedes, gives a simplified version of the method, and makes reference only to sundials (gnomons) rather than wells. However, a secondary account by Pliny states (using this English translation), “Similarly it is reported that at the town of Syene, 5000 stades South of Alexandria, at noon in midsummer no shadow is cast, and that in a well made for the sake of testing this the light reaches to the bottom, clearly showing that the sun is vertically above that place at the time”. However, no mention is made of any well in Alexandria in either account. - 4:50 How did Eratosthenes know that the Sun was so far away that its light rays were close to parallel?
This was not made so clear in our discussions or in the video (other than a brief glimpse of the timeline at 18:27), but Eratosthenes’s work actually came after Aristarchus, so it is very likely that Eratosthenes was aware of Aristarchus’s conclusions about how distant the Sun was from the Earth. Even if Aristarchus’s heliocentric model was disputed by the other Greeks, at least some of his other conclusions appear to have attracted some support. Also, after Eratosthenes’s time, there was further work by Greek, Indian, and Islamic astronomers (such as Hipparchus, Ptolemy, Aryabhata, and Al-Battani) to measure the same distances that Aristarchus did, although these subsequent measurements for the Sun also were somewhat far from modern accepted values. - 5:17 Is it completely accurate to say that on the summer solstice, the Earth’s axis of rotation is tilted “directly towards the Sun”?
Strictly speaking, “in the direction towards the Sun” is more accurate than “directly towards the Sun”; it tilts at about 23.5 degrees towards the Sun, but it is not a total 90-degree tilt towards the Sun. - 5:39 Wait, aren’t there two tropics? The tropic of Cancer and the tropic of Capricorn?
Yes! This corresponds to the two summers Earth experiences, one in the Northern hemisphere and one in the Southern hemisphere. The tropic of Cancer, at a latitude of about 23 degrees north, is where the Sun is directly overhead at noon during the Northern summer solstice (around June 21); the tropic of Capricorn, at a latitude of about 23 degrees south, is where the Sun is directly overhead at noon during the Southern summer solstice (around December 21). But Alexandria and Syene were both in the Northern Hemisphere, so it is the tropic of Cancer that is relevant to Eratosthenes’ calculations. - 5:41 Isn’t it kind of a massive coincidence that Syene was on the tropic of Cancer?
Actually, Syene (now known as Aswan) was about half a degree of latitude away from the tropic of Cancer, which was one of the sources of inaccuracy in Eratosthenes’ calculations. But one should take the “look-elsewhere effect” into account: because the Nile cuts across the tropic of Cancer, it was quite likely to happen that the Nile would intersect the tropic near some inhabited town. It might not necessarily have been Syene, but that would just mean that Syene would have been substituted by this other town in Eratosthenes’s account.
On the other hand, it was fortunate that the Nile ran from South to North, so that distances between towns were a good proxy for the differences in latitude. Apparently, Eratosthenes actually had a more complicated argument that would also work if the two towns in question were not necessarily oriented along the North-South direction, and if neither town was on the tropic of Cancer; but unfortunately the original writings of Eratosthenes are lost to us, and we do not know the details of this more general argument. (But some variants of the method can be found in later work of Posidonius, Aryabhata, and others.)
Nowadays, the “Eratosthenes experiment” is run every year on the March equinox, in which schools at the same longitude are paired up to measure the elevation of the Sun at the same point in time, in order to obtain a measurement of the circumference of the Earth. (The equinox is more convenient than the solstice when neither location is on a tropic, due to the simple motion of the Sun at that date.) With modern timekeeping, communications, surveying, and navigation, this is a far easier task to accomplish today than it was in Eratosthenes’ time. - 6:30 I thought the Earth wasn’t a perfect sphere. Does this affect this calculation?
Yes, but only by a small amount. The centrifugal forces caused by the Earth’s rotation along its axis cause an equatorial bulge and a polar flattening so that the radius of the Earth fluctuates by about 20 kilometers from pole to equator. This sounds like a lot, but it is only about 0.3% of the mean Earth radius of 6371 km and is not the primary source of error in Eratosthenes’ calculations. - 7:27 Are the riverboat merchants and the “grad student” the leading theories for how Eratosthenes measured the distance from Alexandria to Syene?
There is some recent research that suggests that Eratosthenes may have drawn on the work of professional bematists (step measurers – a precursor to the modern profession of surveyor) for this calculation. This somewhat ruins the “grad student” joke, but perhaps should be disclosed for the sake of completeness. - 8:51 How long is a “lunar month” in this context? Is it really 28 days?
In this context the correct notion of a lunar month is a “synodic month” – the length of a lunar cycle relative to the Sun – which is actually about 29 days and 12 hours. It differs from the “sidereal month” – the length of a lunar cycle relative to the fixed stars – which is about 27 days and 8 hours – due to the motion of the Earth around the Sun (or the Sun around the Earth, in the geocentric model). [A similar correction needs to be made around 14:59, using the synodic month of 29 days and 12 hours rather than the “English lunar month” of 28 days (4 weeks).] - 10:47 Is the time taken for the Moon to complete an observed rotation around the Earth slightly less than 24 hours as claimed?
Actually, I made a sign error: the lunar day (also known as a tidal day) is actually 24 hours and 50 minutes, because the Moon orbits the Earth in the same direction as the Earth spins around its axis. The animation therefore is moving in the wrong direction as well (related to this, the line of sight is covering up the Moon in the wrong direction relative to the Moon rising at around 10:38). - 11:32 Is this really just a coincidence that the Moon and Sun have almost the same angular width?
I believe so. First of all, the agreement is not that good: due to the non-circular nature of the orbit of the Moon around the Earth, and Earth around the Sun, the angular width of the Moon actually fluctuates to be as much as 10% larger or smaller than the Sun at various times (cf. the “supermoon” phenomenon). All other known planets with known moons do not exhibit this sort of agreement, so there does not appear to be any universal law of nature that would enforce this coincidence. (This is in contrast with the empirical fact that the Moon always presents the same side to the Earth, which occurs in all other known large moons (as well as Pluto), and is well explained by the physical phenomenon of tidal locking.)
On the other hand, as the video hopefully demonstrates, the existence of the Moon was extremely helpful in allowing the ancients to understand the basic nature of the solar system. Without the Moon, their task would have been significantly more difficult; but in this hypothetical alternate universe, it is likely that modern cosmology would have still become possible once advanced technology such as telescopes, spaceflight, and computers became available, especially when combined with the modern mathematics of data science. Without giving away too many spoilers, a scenario similar to this was explored in the classic short story and novel “Nightfall” by Isaac Asimov. - 12:58 Isn’t the illuminated portion of the Moon, as well as the visible portion of the Moon, slightly smaller than half of the entire Moon, because the Earth and Sun are not an infinite distance away from the Moon?
Technically yes (and this is actually for a very similar reason to why half Moons don’t quite occur halfway between the new Moon and the full Moon); but this fact turns out to have only a very small effect on the calculations, and is not the major source of error. In reality, the Sun turns out to be about 86,000 Moon radii away from the Moon, so asserting that half of the Moon is illuminated by the Sun is actually a very good first approximation. (The Earth is “only” about 220 Moon radii away, so the visible portion of the Moon is a bit more noticeably less than half; but this doesn’t actually affect Aristarchus’s arguments much.)
The angular diameter of the Sun also creates an additional thin band between the fully illuminated and fully non-illuminated portions of the Moon, in which the Sun is intersecting the lunar horizon and so only illuminates the Moon with a portion of its light, but this is also a relatively minor effect (and the midpoints of this band can still be used to define the terminator between illuminated and non-illuminated for the purposes of Aristarchus’s arguments). - 13:27 What is the difference between a half Moon and a quarter Moon?
If one divides the lunar month, starting and ending at a new Moon, into quarters (weeks), then half moons occur both near the end of the first quarter (a week after the new Moon, and a week before the full Moon), and near the end of the third quarter (a week after the full Moon, and a week before the new Moon). So, somewhat confusingly, half Moons come in two types, known as “first quarter Moons” and “third quarter Moons”. - 14:49 I thought the sine function was introduced well after the ancient Greeks.
It’s true that the modern sine function only dates back to the Indian and Islamic mathematical traditions in the first millennium CE, several centuries after Aristarchus. However, he still had Euclidean geometry at his disposal, which provided tools such as similar triangles that could be used to reach basically the same conclusions, albeit with significantly more effort than would be needed if one could use modern trigonometry.
On the other hand, Aristarchus was somewhat hampered by not knowing an accurate value for π, which is also known as Archimedes’ constant: the fundamental work of Archimedes on this constant actually took place a few decades after that of Aristarchus! - 15:17 I plugged in the modern values for the distances to the Sun and Moon and got 18 minutes for the discrepancy, instead of half an hour.
Yes; I quoted the wrong number here. In 1630, Godfried Wendelen replicated Aristarchus’s experiment. With improved timekeeping and the then-recent invention of the telescope, Wendelen obtained a measurement of half an hour for the discrepancy, which is significantly better than Aristarchus’s calculation of six hours, but still a little bit off from the true value of 18 minutes. (As such, Wendelen’s estimate for the distance to the Sun was 60% of the true value.) - 15:27 Wouldn’t Aristarchus also have access to other timekeeping devices than sundials?
Yes, for instance clepsydrae (water clocks) were available by that time; but they were of limited accuracy. It is also possible that Aristarchus could have used measurements of star elevations to also estimate time; it is not clear whether the astrolabe or the armillary sphere was available to him, but he would have had some other more primitive astronomical instruments such as the dioptra at his disposal. But again, the accuracy and calibration of these timekeeping tools would have been poor.
However, most likely the more important limiting factor was the ability to determine the precise moment at which a perfect half Moon (or new Moon, or full Moon) occurs; this is extremely difficult to do with the naked eye. (The telescope would not be invented for almost two more millennia.) - 17:37 Could the parallax problem be solved by assuming that the stars are not distributed in a three-dimensional space, but instead on a celestial sphere?
Putting all the stars on a fixed sphere would make the parallax effects less visible, as the stars in a given portion of the sky would now all move together at the same apparent velocity – but there would still be visible large-scale distortions in the shape of the constellations because the Earth would be closer to some portions of the celestial sphere than others; there would also be variability in the brightness of the stars, and (if they were very close) the apparent angular diameter of the stars. (These problems would be solved if the celestial sphere was somehow centered around the moving Earth rather than the fixed Sun, but then this basically becomes the geocentric model with extra steps.) - 18:29 Did nothing of note happen in astronomy between Eratosthenes and Copernicus?
Not at all! There were significant mathematical, technological, theoretical, and observational advances by astronomers from many cultures (Greek, Islamic, Indian, Chinese, European, and others) during this time, for instance improving some of the previous measurements on the distance ladder, a better understanding of eclipses, axial tilt, and even axial precession, more sophisticated trigonometry, and the development of new astronomical tools such as the astrolabe. See for instance this “deleted scene” from the video, as well as the FAQ entry for 14:49 for this video and 24:54 for the second video, or this instagram post. But in order to make the overall story of the cosmic distance ladder fit into a two-part video, we chose to focus primarily on the first time each rung of the ladder was climbed. - 18:30 Is that really Kepler’s portrait?
We have since learned that this portrait was most likely painted in the 19th century, and may have been based more on Kepler’s mentor, Michael Mästlin. A more commonly accepted portrait of Kepler may be found at his current Wikipedia page. - 19:07 Isn’t it tautological to say that the Earth takes one year to perform a full orbit around the Sun?
Technically yes, but this is an illustration of the philosophical concept of “referential opacity“: the content of a sentence can change when substituting one term for another (e.g., “1 year” and “365 days”), even when both terms refer to the same object. Amusingly, the classic illustration of this, known as Frege’s puzzles, also comes from astronomy: it is an informative statement that Hesperus (the evening star) and Phosphorus (the morning star, also known as Lucifer) are the same object (which nowadays we call Venus), but it is a mere tautology that Hesperus and Hesperus are the same object: changing the reference from Phosphorus to Hesperus changes the meaning. - 19:10 How did Copernicus figure out the crucial fact that Mars takes 687 days to go around the Sun? Was it directly drawn from Babylonian data?
Technically, Copernicus drew from tables by European astronomers that were largely based on earlier tables from the Islamic golden age, which in turn drew from earlier tables by Indian and Greek astronomers, the latter of which also incorporated data from the ancient Babylonians, so it is more accurate to say that Copernicus relied on centuries of data, at least some of which went all the way back to the Babylonians. Among all of this data were the times when Mars was in opposition to the Sun; if one imagines the Earth and Mars as being like runners going around a race track circling the Sun, with Earth on an inner track and Mars on an outer track, oppositions are analogous to when the Earth runner “laps” the Mars runner. From the centuries of observational data, such “laps” were known to occur about once every 780 days (this is known as the synodic period of Mars). Because the Earth takes roughly 365 days to perform a “lap”, it is possible to do a little math and conclude that Mars must therefore complete its own “lap” in 687 days (this is known as the sidereal period of Mars). (See also this post on the cosmic distance ladder Instagram for some further elaboration.)
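The “little math” here is the standard synodic-to-sidereal conversion: between consecutive oppositions the Earth completes exactly one more lap than Mars, so the lap rates satisfy (in notation introduced here, with all periods in days)
$$\frac{1}{T_{\mathrm{Mars}}} = \frac{1}{T_{\mathrm{Earth}}} - \frac{1}{T_{\mathrm{synodic}}} \approx \frac{1}{365.25} - \frac{1}{780} \approx \frac{1}{687}.$$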
- 20:52 Did Kepler really steal data from Brahe?
The situation is complex. When Kepler served as Brahe’s assistant, Brahe only provided Kepler with a limited amount of data, primarily involving Mars, in order to confirm Brahe’s own geo-heliocentric model. After Brahe’s death, the data was inherited by Brahe’s son-in-law and other relatives, who intended to publish Brahe’s work separately; however, Kepler, who was appointed as Imperial Mathematician to succeed Brahe, had at least some partial access to the data, and many historians believe he secretly copied portions of this data to aid his own research before finally securing complete access to the data from Brahe’s heirs after several years of disputes. On the other hand, as intellectual property rights laws were not well developed at this time, Kepler’s actions were technically legal, if ethically questionable.
This is known as retrograde motion. This arises because the orbital velocity of Earth (about 30 km/sec) is a little bit larger than that of Mars (about 24 km/sec). So, in opposition (when Mars is in the opposite position in the sky than the Sun), Earth will briefly overtake Mars, causing its observed position to move westward rather than eastward. But in most other times, the motion of Earth and Mars are at a sufficient angle that Mars will continue its apparent eastward motion despite the slightly faster speed of the Earth. - 21:59 Couldn’t one also work out the direction to other celestial objects in addition to the Sun and Mars, such as the stars, the Moon, or the other planets? Would that have helped?
Actually, the directions to the fixed stars were implicitly used in all of these observations to determine how the celestial sphere was positioned, and all the other directions were taken relative to that celestial sphere. (Otherwise, all the calculations would be taken on a rotating frame of reference in which the unknown orbits of the planets were themselves rotating, which would have been an even more complex task.) But the stars are too far away to be useful as one of the two landmarks to triangulate from, as they generate almost no parallax and so cannot distinguish one location from another.
Measuring the direction to the Moon would tell you which portion of the lunar cycle one was in, and would determine the phase of the Moon, but this information would not help one triangulate, because the Moon’s position in the heliocentric model varies over time in a somewhat complicated fashion, and is too tied to the motion of the Earth to be a useful “landmark” for determining the Earth’s orbit around the Sun.
In principle, using the measurements to all the planets at once could allow for some multidimensional analysis that would be more accurate than analyzing each of the planets separately, but this would require some sophisticated statistical analysis and modeling, as well as non-trivial amounts of compute – neither of which were available in Kepler’s time. - 22:57 Can you elaborate on how we know that the planets all move on a plane?
The Earth’s orbit lies in a plane known as the ecliptic (it is where the lunar and solar eclipses occur). Different cultures have divided up the ecliptic in various ways; in Western astrology, for instance, the twelve main constellations that cross the ecliptic are known as the Zodiac. The planets can be observed to only wander along the Zodiac, but not other constellations: for instance, Mars can be observed to be in Cancer or Libra, but never in Orion or Ursa Major. From this, one can conclude (as a first approximation, at least), that the planets all lie on the ecliptic.
However, this isn’t perfectly true, and the planets will deviate from the ecliptic by a small angle known as the ecliptic latitude. Tycho Brahe’s observations on these latitudes for Mars were an additional useful piece of data that helped Kepler complete his calculations (basically by suggesting how to join together the different “jigsaw pieces”), but the math here gets somewhat complicated, so the story here has been somewhat simplified to convey the main ideas. - 23:04 What are the other universal problem solving tips?
Grant Sanderson has a list (in a somewhat different order) in this previous video. - 23:28 Can one work out the position of Earth from fixed locations of the Sun and Mars when the Sun and Mars are in conjunction (the same location in the sky) or opposition (opposite locations in the sky)?
Technically, these are two times when the technique of triangulation fails to be accurate; and also in the former case it is extremely difficult to observe Mars due to the proximity to the Sun. But again, following the Universal Problem Solving Tip from 23:07, one should initially ignore these difficulties to locate a viable method, and correct for these issues later. This video series by Welch Labs goes into Kepler’s methods in more detail. - 24:04 So Kepler used Copernicus’s calculation of 687 days for the period of Mars. But didn’t Kepler discard Copernicus’s theory of circular orbits?
Good question! It turns out that Copernicus’s calculations of orbital periods are quite robust (especially with centuries of data), and continue to work even when the orbits are not perfectly circular. But even if the calculations did depend on the circular orbit hypothesis, it would have been possible to use the Copernican model as a first approximation for the period, in order to get a better, but still approximate, description of the orbits of the planets. This in turn can be fed back into the Copernican calculations to give a second approximation to the period, which can then give a further refinement of the orbits. Thanks to the branch of mathematics known as perturbation theory, one can often make this type of iterative process converge to an exact answer, with the error in each successive approximation being smaller than the previous one. (But performing such an iteration would probably have been beyond the computational resources available in Kepler’s time; also, the foundations of perturbation theory require calculus, which only was developed several decades after Kepler.) - 24:21 Did Brahe have exactly 10 years of data on Mars’s positions?
Actually, it was more like 17 years, but with many gaps, due both to inclement weather and to Brahe turning his attention to astronomical objects other than Mars in some years; also, in times of conjunction, Mars might only be visible in the daytime sky instead of the night sky, again complicating measurements. So the “jigsaw puzzle pieces” in 25:26 are in fact more complicated than always just five locations equally spaced in time; there are gaps and also observational errors to grapple with. But to understand the method one should ignore these complications; again, see “Universal Problem Solving Tip #1”. Even with his “idea of true genius”, it took many years of further painstaking calculation for Kepler to tease out his laws of planetary motion from Brahe’s messy and incomplete observational data. - 26:44 Shouldn’t the Earth’s orbit be spread out at perihelion and clustered closer together at aphelion, to be consistent with Kepler’s laws?
Yes, you are right; there was a coding error here. - 26:53 What is the reference for Einstein’s “idea of pure genius”?
Actually, the precise quote was “an idea of true genius”, and can be found in the introduction to Carola Baumgardt’s “Life of Kepler”.
- Was Al-Biruni really of Arab origin?
Strictly speaking, no; his writings are all in Arabic, and he was nominally a subject of the Abbasid Caliphate whose rulers were Arab; but he was born in Khwarazm (in modern day Uzbekistan), and would have been a subject of either the Samanid empire or the Khwarazmian empire, both of which were largely self-governed and primarily Persian in culture and ethnic makeup, despite being technically vassals of the Caliphate. So he would have been part of what is sometimes called “Greater Persia” or “Greater Iran”.
Another minor correction: while Al-Biruni was born in the tenth century, his work on the measurement of the Earth was published in the early eleventh century. - Is really called the angle of declination?
This was a misnomer on my part; this angle is more commonly called the dip angle. - But the height of the mountain would be so small compared to the radius of the Earth! How could this method work?
Using the Taylor approximation , one can approximately write the relationship between the mountain height , the Earth radius , and the dip angle (in radians) as . The key point here is the inverse quadratic dependence on , which allows for even relatively small values of to still be realistically useful for computing . Al-Biruni’s measurement of the dip angle was about radians, leading to an estimate of that is about four orders of magnitude larger than , which is at least in the right ballpark for a typical height of a mountain (on the order of a kilometer) and the radius of the Earth (6400 kilometers).
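For reference, the geometry here (in notation introduced just for this note: mountain height $h$, Earth radius $R$, dip angle $\theta$) is that the line of sight to the horizon is tangent to the Earth, so
$$\cos \theta = \frac{R}{R+h};$$
combining this with the small-angle approximation $\cos\theta \approx 1 - \theta^2/2$ gives $h \approx R\theta^2/2$, or equivalently $R \approx 2h/\theta^2$, which exhibits the inverse quadratic dependence on the dip angle mentioned above.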
- Was the method really accurate to within a percentage point?
This is disputed, somewhat similarly to the previous calculations of Eratosthenes. Al-Biruni’s measurements were in cubits, but there were multiple incompatible types of cubit in use at the time. It has also been pointed out that atmospheric refraction effects would have created noticeable changes in the observed dip angle . It is thus likely that the true accuracy of Al-Biruni’s method was poorer than 1%, but that this was somehow compensated for by choosing a favorable conversion between cubits and modern units.
- 1:13 Did Captain Cook set out to discover Australia?
One of the objectives of Cook’s first voyage was to discover the hypothetical continent of Terra Australis. This was considered to be distinct from Australia, which at the time was known as New Holland. As this name might suggest, prior to Cook’s voyage, the northwest coastline of New Holland had been explored by the Dutch; Cook instead explored the eastern coastline, naming this portion New South Wales. The entire continent was later renamed to Australia by the British government, following a suggestion of Matthew Flinders; and the concept of Terra Australis was abandoned. - 4:40 The relative position of the Northern and Southern hemisphere observations is reversed from those earlier in the video.
Yes, this was a slight error in the animation; the labels here should be swapped for consistency of orientation. - 7:06 So, when did they finally manage to measure the transit of Venus, and use this to compute the astronomical unit?
While Le Gentil had the misfortune to not be able to measure either the 1761 or 1769 transits, other expeditions of astronomers (led by Dixon-Mason, Chappe d’Auteroche, and Cook) did take measurements of one or both of these transits with varying degrees of success, with the measurements of Cook’s team of the 1769 transit in Tahiti being of particularly high quality. All of this data was assembled later by Lalande in 1771, leading to the most accurate measurement of the astronomical unit at the time (within 2.3% of modern values, which was about three times more accurate than any previous measurement). - 8:53 What does it mean for the transit of Io to be “twenty minutes ahead of schedule” when Jupiter is in opposition (Jupiter is opposite to the Sun when viewed from the Earth)?
Actually, it should be halved to “ten minutes ahead of schedule”, with the transit being “ten minutes behind schedule” when Jupiter is in conjunction, with the net discrepancy being twenty minutes (or actually closer to 16 minutes when measured with modern technology). Both transits are being compared against an idealized periodic schedule in which the transits are occurring at a perfectly regular rate (about 42 hours), where the period is chosen to be the best fit to the actual data. This discrepancy is only noticeable after carefully comparing transit times over a period of months; at any given position of Jupiter, the Doppler effects of Earth moving towards or away from Jupiter would only shift each transit by just a few seconds compared to the previous transit, with the delays or accelerations only becoming cumulatively noticeable after many such transits.
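(As a quick sanity check with modern values, which were of course not available at the time: the light-travel time across the diameter of the Earth’s orbit is
$$\frac{2\,\mathrm{AU}}{c} \approx \frac{2 \times 1.496 \times 10^{11}\,\mathrm{m}}{2.998 \times 10^{8}\,\mathrm{m/s}} \approx 998\,\mathrm{s} \approx 16.6\ \text{minutes},$$
which is the source of the roughly 16-minute net discrepancy between opposition and conjunction.)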
Also, the presentation here is oversimplified: at times of conjunction, Jupiter and Io are too close to the Sun for observation of the transit. Rømer actually observed the transits at other times than conjunction, and Huygens used more complicated trigonometry than what was presented here to infer a measurement for the speed of light in terms of the astronomical unit (which they had begun to measure a bit more accurately than in Aristarchus’s time; see the FAQ entry for 15:17 in the first video). - 10:05 Are the astrological signs for Earth and Venus swapped here?
Yes, this was a small mistake in the animation. - 10:34 Shouldn’t one have to account for the elliptical orbit of the Earth, as well as the proper motion of the star being observed, or the effects of general relativity?
Yes; the presentation given here is a simplified one to convey the idea of the method, but in the most advanced parallax measurements, such as the ones taken by the Hipparcos and Gaia spacecraft, these factors are taken into account, basically by taking as many measurements (not just two) as possible of a single star, and locating the best fit of that data to a multi-parameter model that incorporates the (known) orbit of the Earth with the (unknown) distance and motion of the star, as well as additional gravitational effects from other celestial bodies, such as the Sun and other planets. - 14:53 The formula I was taught for apparent magnitude of stars looks a bit different from the one here.
This is because astronomers use a logarithmic scale to measure both apparent magnitude and absolute magnitude . If one takes the logarithm of the inverse square law in the video, and performs the normalizations used by astronomers to define magnitude, one arrives at the standard relation between absolute and apparent magnitude.
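For reference, the standard relation (with the usual astronomers’ normalization, and with the distance $d$ measured in parsecs) is
$$m - M = 5 \log_{10}\!\left(\frac{d}{10\ \mathrm{pc}}\right),$$
which is what one obtains by taking logarithms in the inverse square law, using the convention that five magnitudes correspond to a factor of 100 in brightness, and referencing the absolute magnitude to a standard distance of 10 parsecs.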
But this is an oversimplification, most notably due to neglect of extinction effects caused by interstellar dust. This is not a major issue for the relatively short distances observable via parallax, but causes problems at larger scales of the ladder (see for instance the FAQ entry here for 18:08). To compensate for this, one can work in multiple frequencies of the spectrum (visible, x-ray, radio, etc.), as some frequencies are less susceptible to extinction than others. From the discrepancies between these frequencies one can infer the amount of extinction, leading to “dust maps” that can then be used to facilitate such corrections for subsequent measurements in the same area of the universe. (More generally, the trend in modern astronomy is towards “multi-messenger astronomy” in which one combines together very different types of measurements of the same object to obtain a more accurate understanding of that object and its surroundings.) - 18:08 Can we really measure the entire Milky Way with this method?
Strictly speaking, there is a “zone of avoidance” on the far side of the Milky Way that is very difficult to measure in the visible portion of the spectrum, due to the large amount of intervening stars, dust, and even a supermassive black hole in the galactic center. However, in recent years it has become possible to explore this zone to some extent using the radio, infrared, and x-ray portions of the spectrum, which are less affected by these factors. - 18:19 How did astronomers know that the Milky Way was only a small portion of the entire universe?
This issue was the topic of the “Great Debate” in the early twentieth century. It was only with the work of Hubble using Leavitt’s law to measure distances to Magellanic clouds and “spiral nebulae” (that we now know to be other galaxies), building on earlier work of Leavitt and Hertzsprung, that it was conclusively established that these clouds and nebulae in fact were at much greater distances than the diameter of the Milky Way. - 18:45 How can one compensate for light blending effects when measuring the apparent magnitude of Cepheids?
This is a non-trivial task, especially if one demands a high level of accuracy. Using the highest resolution telescopes available (such as HST or JWST) is of course helpful, as is switching to other frequencies, such as near-infrared, where Cepheids are even brighter relative to nearby non-Cepheid stars. One can also apply sophisticated statistical methods to fit to models of the point spread of light from unwanted sources, and use nearby measurements of the same galaxy without the Cepheid as a reference to help calibrate those models. Improving the accuracy of the Cepheid portion of the distance ladder is an ongoing research activity in modern astronomy. - 18:54 What is the mechanism that causes Cepheids to oscillate?
For most stars, there is an equilibrium size: if the star’s radius collapses, then the reduced potential energy is converted to heat, creating pressure that pushes the star outward again; and conversely, if the star expands, then it cools, causing a reduction in pressure that no longer counteracts gravitational forces. But for Cepheids, there is an additional mechanism called the kappa mechanism: the increased temperature caused by contraction increases ionization of helium, which drains energy from the star and accelerates the contraction; conversely, the cooling caused by expansion causes the ionized helium to recombine, with the energy released accelerating the expansion. If the parameters of the Cepheid are in a certain “instability strip”, then the interaction of the kappa mechanism with the other mechanisms of stellar dynamics creates a periodic oscillation in the Cepheid’s radius, with a period that increases with the mass and brightness of the Cepheid.
For a recent re-analysis of Leavitt’s original Cepheid data, see this paper. - 19:10 Did Leavitt mainly study the Cepheids in our own galaxy?
This was an inaccuracy in the presentation. Leavitt’s original breakthrough paper studied Cepheids in the Small Magellanic Cloud. At the time, the distance to this cloud was not known; indeed, it was a matter of debate whether this cloud was in the Milky Way, or some distance away from it. However, Leavitt (correctly) assumed that all the Cepheids in this cloud were roughly the same distance away from our solar system, so that the apparent brightness was proportional to the absolute brightness. This gave an uncalibrated form of Leavitt’s law between absolute brightness and period, subject to the (then unknown) distance to the Small Magellanic Cloud. After Leavitt’s work, there were several efforts (by Hertzsprung, Russell, and Shapley) to calibrate the law by using the few Cepheids for which other distance methods were available, such as parallax. (Main sequence fitting to the Hertzsprung-Russell diagram was not directly usable, as Cepheids did not lie on the main sequence; but in some cases one could indirectly use this method if the Cepheid was in the same stellar cluster as a main sequence star.) Once the law was calibrated, it could be used to measure distances to other Cepheids, and in particular to compute distances to extragalactic objects such as the Magellanic clouds. - 19:15 Was Leavitt’s law really a linear law between period and luminosity?
Strictly speaking, the period-luminosity relation commonly known as Leavitt’s law was a linear relation between the absolute magnitude of the Cepheid and the logarithm of the period; undoing the logarithms, this becomes a power law between the luminosity and the period. - 20:26 Was Hubble the one to discover the redshift of galaxies?
This was an error on my part; Hubble was using earlier work of Vesto Slipher on these redshifts, and combining it with his own measurements of distances using Leavitt’s law to arrive at the law that now bears his name; he was also assisted in his observations by Milton Humason. It should also be noted that Georges Lemaître had also independently arrived at essentially the same law a few years prior, but his work was published in a somewhat obscure journal and did not receive broad recognition until some time later. - 20:37 Hubble’s original graph doesn’t look like a very good fit to a linear law.
Hubble’s original data was somewhat noisy and inaccurate by modern standards, and the redshifts were affected by the peculiar velocities of individual galaxies in addition to the expanding nature of the universe. However, as the data was extended to more galaxies, it became increasingly possible to compensate for these effects and obtain a much tighter fit, particularly at larger scales where the effects of peculiar velocity are less significant. See for instance this article from 2015 where Hubble’s original graph is compared with a more modern graph. This more recent graph also reveals a slight nonlinear correction to Hubble’s law at very large scales that has led to the remarkable discovery that the expansion of the universe is in fact accelerating over time, a phenomenon that is attributed to a positive cosmological constant (or perhaps a more complex form of dark energy in the universe). On the other hand, even with this nonlinear correction, there continues to be a roughly 10% discrepancy of this law with predictions based primarily on the cosmic microwave background radiation; see the FAQ entry for 23:49. - 20:46 Does general relativity alone predict an uniformly expanding universe?
This was an oversimplification. Einstein’s equations of general relativity contain a parameter , known as the cosmological constant, which currently is only computable indirectly from fitting to experimental data. But even with this constant fixed, there are multiple solutions to these equations (basically because there are multiple possible initial conditions for the universe). For the purposes of cosmology, a particularly successful family of solutions are the solutions given by the Lambda-CDM model. This family of solutions contains additional parameters, such as the density of dark matter in the universe. Depending on the precise values of these parameters, the universe could be expanding or contracting, with the rate of expansion or contraction either increasing, decreasing, or staying roughly constant. But if one fits this model to all available data (including not just red shift measurements, but also measurements on the cosmic microwave background radiation and the spatial distribution of galaxies), one deduces a version of Hubble’s law which is nearly linear, but with an additional correction at very large scales; see the next item of this FAQ. - 21:07 Is Hubble’s original law sufficiently accurate to allow for good measurements of distances at the scale of the observable universe?
- 21:07 Is Hubble’s original law sufficiently accurate to allow for good measurements of distances at the scale of the observable universe?
Not really; as mentioned at the end of the video, there were additional efforts to cross-check and calibrate Hubble’s law at intermediate scales between the range of Cepheid methods (about 100 million light years) and observable universe scales (about 100 billion light years) by using “standard candles” other than Cepheids, most notably Type Ia supernovae (which are bright enough and predictable enough to be usable out to about 10 billion light years), the Tully-Fisher relation between the luminosity of a galaxy and its rotational speed, and gamma ray bursts. It turns out that due to the accelerating nature of the universe’s expansion, Hubble’s law is not completely linear at these large scales; this important correction cannot be discerned purely from Cepheid data, but requires the other standard candles as well, together with fits of that data (and of other observations, such as the cosmic microwave background radiation) to the cosmological models provided by general relativity (the best-fitting models to date being versions of the Lambda-CDM model).
On the other hand, a naive linear extrapolation of Hubble’s original law to all larger scales does provide a very rough picture of the observable universe which, while too inaccurate for cutting edge research in astronomy, does give some general idea of its large-scale structure. - 21:15 Where did this guess of the observable universe being about 20% of the full universe come from?
There are some ways to get a lower bound on the size of the entire universe that go beyond the edge of the observable universe. One is through analysis of the cosmic microwave background radiation (CMB), which has been carefully mapped out by several satellite observatories, most notably WMAP and Planck. Roughly speaking, a universe that was less than twice the size of the observable universe would create certain periodicities in the CMB data; such periodicities are not observed, so this provides a lower bound (see for instance this paper for an example of such a calculation). The 20% number was a guess based on my vague recollection of these works, but there is no consensus currently on what the ratio truly is; there are some proposals that the entire universe is in fact several orders of magnitude larger than the observable one.
The situation is somewhat analogous to Aristarchus’s measurement of the distance to the Sun, which was very sensitive to a small angle (the half-moon discrepancy). Here, the predicted size of the universe under the standard cosmological model is similarly dependent in a highly sensitive fashion on a measure of the flatness of the universe which, for reasons still not fully understood (but likely caused by some sort of inflation mechanism), happens to be extremely close to zero. As such, predictions for the size of the universe remain highly volatile at the current level of measurement accuracy. - 23:44 Was it a black hole collision that allowed for an independent measurement of Hubble’s law?
This was a slight error in the presentation. While the first gravitational wave observation by LIGO in 2015 was of a black hole collision, it did not come with an electromagnetic counterpart that allowed for a redshift calculation that would yield a Hubble’s law measurement. However, a later collision of neutron stars, observed in 2017, did come with an associated kilonova in which a redshift was calculated, and led to a Hubble measurement which was independent of most of the rungs of the distance ladder. - 23:49 Where can I learn more about this 10% discrepancy in Hubble’s law?
This is known as the Hubble tension (or, in more sensational media, the “crisis in cosmology”): roughly speaking, the various measurements of Hubble’s constant (either from climbing the cosmic distance ladder, or by fitting various observational data to standard cosmological models) tend to arrive at one of two values that are about 10% apart from each other. The measurements based on gravitational wave observations are currently consistent with both values, due to significant error bars in this extremely sensitive method; but other more mature methods are now of sufficient accuracy that they are basically only consistent with one of the two values. Currently there is no consensus on the origin of this tension: possibilities include systematic biases in the observational data, subtle statistical issues with the methodology used to interpret the data, a correction to the standard cosmological model, the influence of some previously undiscovered law of physics, or some partial breakdown of the Copernican principle.
For an accessible recent summary of the situation, see this video by Becky Smethurst (“Dr. Becky”). - 24:49 So, what is a Type Ia supernova and why is it so useful in the distance ladder?
A Type Ia supernova occurs when a white dwarf in a binary system draws more and more mass from its companion star, until it reaches the Chandrasekhar limit, at which point its gravitational forces are strong enough to cause a collapse that increases the pressure to the point where a supernova is triggered via a process known as carbon detonation. Because of the universal nature of the Chandrasekhar limit, all such supernovae have (as a first approximation) the same absolute brightness and can thus be used as standard candles in a similar fashion to Cepheids (but without the need to first measure any auxiliary observable, such as a period); the standard-candle calculation is sketched below. But these supernovae are also far brighter than Cepheids, and so this method can be used at significantly larger distances than the Cepheid method (roughly speaking, it can handle distances of ~10 billion light years, whereas Cepheids are reliable out to ~100 million light years). Among other things, the supernovae measurements were the key to detecting an important nonlinear correction to Hubble’s law at these scales, leading to the remarkable conclusion that the expansion of the universe is in fact accelerating over time, which in the Lambda-CDM model corresponds to a positive cosmological constant, though there are more complex “dark energy” models that have also been proposed to explain this acceleration.
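For context, here is the standard (and deliberately simplified) standard-candle calculation: if a source of known absolute luminosity $L$ is observed with apparent flux $F$, then the inverse square law
$F = \frac{L}{4\pi d^2} \quad\Longrightarrow\quad d = \sqrt{\frac{L}{4\pi F}}$
recovers the distance $d$; in magnitude language this is the distance modulus formula $m - M = 5 \log_{10}(d / 10\,\mathrm{pc})$, where $m$ and $M$ are the apparent and absolute magnitudes. (This ignores corrections such as dust extinction and redshift effects, which do matter at supernova distances.)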
- 24:54 Besides Type Ia supernovae, I felt that a lot of other topics relevant to the modern distance ladder (e.g., the cosmic microwave background radiation, the Lambda CDM model, dark matter, dark energy, inflation, multi-messenger astronomy, etc.) were omitted.
This is partly due to time constraints, and the need for editing to tighten the narrative, but was also a conscious decision on my part. Advanced classes on the distance ladder will naturally focus on the most modern, sophisticated, and precise ways to measure distances, backed up by the latest mathematics, physics, technology, observational data, and cosmological models. However, the focus in this video series was rather different; we sought to portray the cosmic distance ladder as evolving in a fully synergistic way, across many historical eras, with the evolution of mathematics, science, and technology, as opposed to being a mere byproduct of the current state of these other disciplines. As one specific consequence of this change of focus, we emphasized the first time any rung of the distance ladder was achieved, at the expense of more accurate and sophisticated later measurements at that rung. For instance, refinements in the measurement of the radius of the Earth since Eratosthenes, improvements in the measurement of the astronomical unit between Aristarchus and Cook, or the refinements of Hubble’s law and the cosmological model of the universe in the twentieth and twenty-first centuries, were largely omitted (though some of the answers in this FAQ are intended to address these omissions).
Many of the topics not covered here (or only given a simplified treatment) are discussed in depth in other expositions, including other YouTube videos. I would welcome suggestions from readers for links to such resources in the comments to this post. Here is a partial list:
- “Eratosthenes” – Cosmos (Carl Sagan), video posted Apr 24, 2009 (originally released Oct 1, 1980, as part of the episode “The Shores of the Cosmic Ocean”).
- “How Far Away Is It” – David Butler, a multi-part series beginning Aug 16, 2013.
- “How the Bizarre Path of Mars Reshaped Astronomy [Kepler’s Laws Part 1]“, Welch Labs, May 8, 2024. See also Part 2.
- “An ASTROPHYSICIST’S TOP 5 space news stories of 2024“, Becky Smethurst (Dr. Becky), Dec 26, 2024 – covers the Hubble tension as one of the stories.
- “Measuring the Earth… from a vacation photo“, George Lowther (Almost sure), Feb 22, 2025.
- “How Did This Ancient Genius Measure The Sun?“, Ben Syversen, Feb 28, 2025.
New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach
Timothy Trudgian, Andrew Yang and I have just uploaded to the arXiv the paper “New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach“. This paper launches a project envisioned in this previous blog post, in which the (widely dispersed) literature on various exponents in classical analytic number theory, as well as the relationships between these exponents, are collected in a living database, together with computer code to optimize the relations between them, with one eventual goal being to automate as much as possible the “routine” components of many analytic number theory papers, in which progress on one type of exponent is converted via standard arguments to progress on other exponents.
The database we are launching concurrently with this paper is called the Analytic Number Theory Exponent Database (ANTEDB). This Github repository aims to collect the latest results and relations on exponents such as the following:
- The growth exponent $\mu(\sigma)$ of the Riemann zeta function at real part $\sigma$ (i.e., the best exponent for which $\zeta(\sigma+it) \ll |t|^{\mu(\sigma)+o(1)}$ as $|t| \to \infty$);
- Exponent pairs $(k,\ell)$ (used to bound exponential sums $\sum_{n \sim N} e(T F(n/N))$ for various phase functions $F$ and parameters $T, N$; a toy illustration of how such pairs can be manipulated in code is sketched after this list);
- Zero density exponents $A(\sigma)$ (used to bound the number of zeros of $\zeta$ of real part larger than $\sigma$);
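To give a sense of the kind of exponents and relations being collected (what follows is only an illustrative toy sketch in plain Python, not the actual ANTEDB code or its data format), one can encode exponent pairs together with the two classical van der Corput processes, and enumerate the pairs reachable from the trivial pair $(0,1)$ by short chains of these processes:

from fractions import Fraction
from itertools import product

def A(k, l):
    """Van der Corput A-process: (k, l) -> (k/(2k+2), (k+l+1)/(2k+2))."""
    return (k / (2 * k + 2), (k + l + 1) / (2 * k + 2))

def B(k, l):
    """Van der Corput B-process: (k, l) -> (l - 1/2, k + 1/2)."""
    return (l - Fraction(1, 2), k + Fraction(1, 2))

def pairs_from_words(max_len=3):
    """Apply every word of length <= max_len in the processes A, B to the
    trivial exponent pair (0, 1), reading the word left to right."""
    trivial = (Fraction(0), Fraction(1))
    found = {(): trivial}
    for n in range(1, max_len + 1):
        for word in product("AB", repeat=n):
            k, l = trivial
            for step in word:
                k, l = A(k, l) if step == "A" else B(k, l)
            found[word] = (k, l)
    return found

if __name__ == "__main__":
    for word, (k, l) in sorted(pairs_from_words().items(),
                               key=lambda item: (len(item[0]), item[0])):
        print("".join(word) or "(trivial)", f"-> ({k}, {l})")

Among the pairs this produces is the classical van der Corput pair $(1/6, 2/3)$, obtained by applying the $B$-process and then the $A$-process to the trivial pair; the database itself records many further exponent pairs (not all reachable by these processes alone), together with literature references.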
Information on these exponents is collected both in a LaTeX “blueprint” that is available as a human-readable set of web pages, and as part of our Python codebase. In the future one could also imagine the data being collected in a Lean formalization, but at present the database only contains a placeholder Lean folder.
As a consequence of collecting all the known bounds in the literature on these sorts of exponents, as well as abstracting out various relations between these exponents that were implicit in many papers in this subject, we were able to run computer-assisted searches to improve some of the state of the art on these exponents in a largely automated fashion (without introducing any substantial new inputs from analytic number theory); a toy example of this sort of automated conversion between exponents is sketched after the list below. In particular, we obtained:
- four new exponent pairs;
- several new zero density estimates; and
- new estimates on the additive energy of zeroes of the Riemann zeta function.
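To illustrate the flavor of “converting progress on one exponent into progress on another via standard arguments”, here is another toy sketch (again not the actual search code): it uses only the single classical relation that an exponent pair $(k,\ell)$ implies $\mu(1/2) \le (k+\ell)/2 - 1/4$, whereas the paper combines a much larger collection of relations.

from fractions import Fraction as F

# Toy conversion of one exponent into another: an exponent pair (k, l) implies
# the bound mu(1/2) <= (k + l)/2 - 1/4 on the growth exponent of zeta on the
# critical line.  An "automated search" over a table of known pairs is then
# just a minimization of the implied bound.
KNOWN_PAIRS = {
    "trivial": (F(0), F(1)),
    "B-process of trivial": (F(1, 2), F(1, 2)),
    "van der Corput": (F(1, 6), F(2, 3)),
    "Bourgain": (F(13, 84), F(55, 84)),
}

def mu_half_bound(k, l):
    """Bound on mu(1/2) implied by the exponent pair (k, l)."""
    return (k + l) / 2 - F(1, 4)

best_name, best_bound = min(
    ((name, mu_half_bound(*pair)) for name, pair in KNOWN_PAIRS.items()),
    key=lambda item: item[1],
)
print(f"best implied bound: mu(1/2) <= {best_bound} (from the {best_name} pair)")

Running this recovers the bound $\mu(1/2) \le 13/84$ from the Bourgain pair $(13/84, 55/84)$; the ANTEDB records many more pairs and many more relations between the exponents listed above (together with references), and the optimization problems that arise are correspondingly more involved.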
We are hoping that the ANTEDB will receive more contributions in the future, for instance expanding to other types of exponents, or updating the database as new results are obtained (or older ones added). In the longer term one could also imagine integrating the ANTEDB with other tools, such as Lean or AI systems, but for now we have focused primarily on collecting the data and optimizing the relations between the exponents.