
Terence Tao

Updates on my research and expository papers, discussion of open problems, and other maths-related topics. By Terence Tao

URL: https://terrytao.wordpress.com


A digestion of the Jacobian conjecture counterexample

Tue, 2026-07-21 23:04

The notorious Jacobian conjecture can be formulated concretely over the complex numbers as follows.

Conjecture 1 (Jacobian Conjecture) Let $F: \mathbb{C}^n \to \mathbb{C}^n$ be a polynomial map in $n$ complex variables, whose Jacobian $\det DF$ is a non-zero constant. Then $F$ is invertible (with polynomial inverse).

The condition that the Jacobian $\det DF$ is nowhere vanishing is equivalent to $F$ being locally invertible. (The implication of local invertibility from non-vanishing Jacobian follows from the inverse function theorem; the converse implication can be derived from the Weierstrass preparation theorem, but is omitted here.) Also, from the fundamental theorem of algebra, once the Jacobian polynomial is nowhere vanishing, it must be constant. So the hypothesis “Jacobian is a non-zero constant” can be replaced with “$F$ is locally invertible”, and the Jacobian conjecture can be viewed as an assertion that local invertibility implies global invertibility. The complex numbers can easily be replaced with other fields of characteristic zero by the Lefschetz principle, but I prefer to work in the concrete setting of the complex numbers.

It was recently shown (using the Fable AI) that the conjecture is false in three dimensions (and thus in higher dimensions as well):

Theorem 2 (Counterexample to conjecture) There exists a polynomial map $F: \mathbb{C}^3 \to \mathbb{C}^3$ which has non-zero constant Jacobian, but is not invertible.

The conjecture remains open in two dimensions, and is easy to establish in one dimension.

The example can be stated completely explicitly: one can take

and one can verify by a brief calculation that

and

While this is an extremely quick verification, the construction presented in this fashion looks like a massive miracle. The polynomial map $F$ has degree seven, so a priori the Jacobian $\det DF$ ought to be a polynomial in three variables of degree as large as $3 \times 6 = 18$; the fact that all non-constant coefficients of this polynomial vanish thus looks like a massive cancellation involving $\binom{21}{3} - 1 = 1329$ equations, which is much larger than the $3\binom{10}{3} = 360$ degrees of freedom for a generic degree seven polynomial map of three variables. So such a polynomial seems highly unlikely to be found by brute force.
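The count of constraints versus free coefficients in the preceding paragraph can be reproduced in a few lines of Python. This is only a sketch of the arithmetic (the helper name `monomial_count` is illustrative): it counts monomials of degree at most 18 in three variables for the Jacobian, and monomials of degree at most 7 for each of the three components of the map.

```python
from math import comb


def monomial_count(num_vars: int, max_degree: int) -> int:
    """Number of monomials of total degree <= max_degree in num_vars variables."""
    return comb(max_degree + num_vars, num_vars)


n, deg = 3, 7
jacobian_degree = n * (deg - 1)  # each entry of DF has degree <= deg - 1
# Requiring every non-constant coefficient of det DF to vanish:
constraints = monomial_count(n, jacobian_degree) - 1
# Free coefficients of a general polynomial map C^3 -> C^3 of degree <= 7:
unknowns = n * monomial_count(n, deg)
print(jacobian_degree, constraints, unknowns)  # 18 1329 360
```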

The example has since been retroactively explained in more geometric terms. As a “digestion” exercise to myself, I sought to write this explanation with relatively little use of algebraic geometry, in a manner that minimizes the amount of “miracles” required, although there are still a few places where some remarkable phenomena occur.

It is convenient to use the local injectivity formulation, and to generalize the domain to an equivalent affine variety. Namely, we will show

Theorem 3 (Counterexample, reformulated) There exists an affine variety $V$ that is isomorphic to $\mathbb{C}^3$ by polynomial changes of variable, and a polynomial map $F: V \to \mathbb{C}^3$ which is locally injective, but not globally injective.

Clearly one can get from Theorem 3 to Theorem 2 by composing $F$ with the isomorphism between $\mathbb{C}^3$ and $V$, and using the previously mentioned fact that local injectivity implies non-zero constant Jacobian. Our objective is now to find data $V, F$ that obey three separate properties:

(a) $F$ is locally injective on $V$.
(b) $F$ is not globally injective on $V$.
(c) $V$ is isomorphic to $\mathbb{C}^3$ by polynomial changes of variable.

The advantage of splitting the problem into these three components is that we can build towards each of them separately.

(A pedantic remark: strictly speaking, in the arguments below, we not only replace the domain $\mathbb{C}^3$ of $F$ by an equivalent variety $V$, but also replace the range $\mathbb{C}^3$ of $F$ by an equivalent variety $H$. But the equivalence between $H$ and $\mathbb{C}^3$ is a boring linear isomorphism ($H$ will just be a hyperplane in a four-dimensional vector space $\mathrm{Sym}^3(\mathbb{C}^2)$), so we do not highlight this aspect of the construction.)

It turns out that $V$ and $F$ can be built out of the operation of multiplication of low degree polynomials. Namely, consider the following three simple affine spaces:

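The space $\mathrm{Sym}^1(\mathbb{C}^2)$ of linear homogeneous polynomials $L(x,y) = ax + by$ of two complex variables $x, y$.
The space $\mathrm{Sym}^2(\mathbb{C}^2)$ of quadratic homogeneous polynomials $Q(x,y) = cx^2 + dxy + ey^2$ of two complex variables $x, y$.
The space $\mathrm{Sym}^3(\mathbb{C}^2)$ of cubic homogeneous polynomials $P(x,y)$ of two complex variables $x, y$.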

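(The notation $\mathrm{Sym}^k(E)$ here refers to the $k$-th symmetric power of a vector space $E$.) Clearly these spaces are isomorphic to $\mathbb{C}^2, \mathbb{C}^3, \mathbb{C}^4$ respectively. Furthermore, we have a multiplication map $M: \mathrm{Sym}^1(\mathbb{C}^2) \times \mathrm{Sym}^2(\mathbb{C}^2) \to \mathrm{Sym}^3(\mathbb{C}^2)$, mapping a pair $(L, Q)$ of a linear polynomial $L$ and a quadratic polynomial $Q$ to a cubic polynomial

$$ M(L, Q) := LQ. $$
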
(Right now, the domain and range of this map are of higher dimension than the target dimension of three; we will cut the dimensions down to three as the argument progresses.)

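The map $M$, essentially a map from $\mathbb{C}^5$ to $\mathbb{C}^4$, is clearly polynomial; it is given explicitly in coordinates as

$$ M(ax + by,\ cx^2 + dxy + ey^2) = ac\,x^3 + (ad + bc)\,x^2 y + (ae + bd)\,x y^2 + be\,y^3. \tag{2} $$
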
The map $M$ also enjoys two basic (and commuting) symmetries:

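If one applies a scaling $(L, Q) \mapsto (\lambda L, \mu Q)$ for some non-zero complex numbers $\lambda, \mu$, then the product is scaled by $\lambda\mu$: $M(\lambda L, \mu Q) = \lambda\mu\, M(L, Q)$.
If one applies a change of variables $(L, Q) \mapsto (L \circ g, Q \circ g)$ for some invertible linear transformation $g \in GL_2(\mathbb{C})$, then the product is transformed by $g$: $M(L \circ g, Q \circ g) = M(L, Q) \circ g$.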

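So this map enjoys a huge amount of equivariance, basically with respect to an action of the five-dimensional group $(\mathbb{C}^\times)^2 \times SL_2(\mathbb{C})$ (the scalar matrices in $GL_2(\mathbb{C})$ merely reproduce some of the scalings).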

The five-dimensional domain $\mathrm{Sym}^1(\mathbb{C}^2) \times \mathrm{Sym}^2(\mathbb{C}^2)$ is of course larger than the four-dimensional range $\mathrm{Sym}^3(\mathbb{C}^2)$, so the map $M$ clearly cannot be injective. This can already be seen from the scaling symmetry, as the specific scalings

$$ (L, Q) \mapsto (\lambda L, \lambda^{-1} Q) \tag{3} $$

for $\lambda \in \mathbb{C}^\times$ modify the linear and quadratic polynomials $L, Q$ but not their product $LQ$. But even if one quotients out by this symmetry (3) to cut the dimension of the domain down to four, the map $M$ is still not injective for the following basic reason. A generically chosen cubic polynomial $P$ will split into the product $P = L_1 L_2 L_3$ of three independent linear polynomials $L_1, L_2, L_3$. Then there are three pairs

$$ (L_1, L_2 L_3), \quad (L_2, L_1 L_3), \quad (L_3, L_1 L_2) \tag{4} $$

which all map to the same cubic polynomial

$$ P = L_1 L_2 L_3 $$

under the multiplication map $M$, but are not related to each other by the scaling symmetry (3). Thus, we see that even after quotienting out by the scaling symmetry (3), the multiplication map $M$ is generically non-injective in a three-to-one fashion. Thus we have already achieved something resembling goal (b)!
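The three-to-one phenomenon can be checked concretely. The sketch below assumes the coordinate convention $L = ax + by$, $Q = cx^2 + dxy + ey^2$ (the helper names are hypothetical), takes $P = xy(x+y)$ as a sample cubic, and verifies that the three pairs obtained by splitting off one linear factor all have the same product, while their linear parts are pairwise non-proportional, so that no scaling relates them.

```python
def multiply(L, Q):
    """Multiplication map M(L, Q) = LQ in coefficients.

    L = (a, b) stands for ax + by, Q = (c, d, e) for cx^2 + dxy + ey^2;
    the result lists the coefficients of x^3, x^2 y, x y^2, y^3.
    """
    a, b = L
    c, d, e = Q
    return (a * c, a * d + b * c, a * e + b * d, b * e)


def product_of_linears(L1, L2):
    """Quadratic form (c, d, e) equal to the product of two linear forms."""
    (p, q), (r, s) = L1, L2
    return (p * r, p * s + q * r, q * s)


def proportional(L1, L2):
    """True if the linear forms L1, L2 are scalar multiples of each other."""
    return L1[0] * L2[1] - L1[1] * L2[0] == 0


# P = x * y * (x + y), a product of three pairwise independent linear forms.
L1, L2, L3 = (1, 0), (0, 1), (1, 1)
pairs = [
    (L1, product_of_linears(L2, L3)),
    (L2, product_of_linears(L1, L3)),
    (L3, product_of_linears(L1, L2)),
]
products = {multiply(L, Q) for L, Q in pairs}
print(products)  # {(0, 1, 1, 0)}, i.e. x^2 y + x y^2
```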

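It will be convenient to “spend” the scaling symmetry to obtain a useful normalization. If $L(x,y) = ax + by$ is a linear polynomial and $Q(x,y) = cx^2 + dxy + ey^2$ is a quadratic polynomial, the resultant $\mathrm{Res}(L, Q)$ can be defined by the determinant

$$ \mathrm{Res}(L, Q) := \det \begin{pmatrix} a & b & 0 \\ 0 & a & b \\ c & d & e \end{pmatrix} = a^2 e - abd + b^2 c. \tag{5} $$
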
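If we have a factorization

$$ L(x,y) = a(x - \alpha y), \qquad Q(x,y) = c(x - \beta y)(x - \gamma y), $$

then the resultant can also be described as

$$ \mathrm{Res}(L, Q) = a^2 c\,(\alpha - \beta)(\alpha - \gamma). \tag{6} $$
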
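Thus the resultant measures whether the linear polynomial $L$ and the quadratic polynomial $Q$ share a common root. A fundamental fact about resultants is that they are $SL_2(\mathbb{C})$-invariant: for any $g \in SL_2(\mathbb{C})$, we have

$$ \mathrm{Res}(L \circ g, Q \circ g) = \mathrm{Res}(L, Q). $$
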
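One way to see this is to check it first for translations $(x, y) \mapsto (x - ty, y)$ (which translate the roots $\alpha, \beta, \gamma$ by $t$ while leaving $a, c$ unchanged) and for inversions $(x, y) \mapsto (y, -x)$ (which map $\alpha, \beta, \gamma$ to $-1/\alpha, -1/\beta, -1/\gamma$ while mapping $a, c$ to $a\alpha$ and $c\beta\gamma$ respectively), and then noting that these transformations generate all of $SL_2(\mathbb{C})$. Resultants also interact very nicely with scaling:

$$ \mathrm{Res}(\lambda L, \mu Q) = \lambda^2 \mu\, \mathrm{Res}(L, Q). $$
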
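In particular, the scaling symmetry (3) multiplies $\mathrm{Res}(L, Q)$ by $\lambda$:

$$ \mathrm{Res}(\lambda L, \lambda^{-1} Q) = \lambda\, \mathrm{Res}(L, Q). $$

Thus, we can (generically) normalize away this scaling symmetry by imposing the condition

$$ \mathrm{Res}(L, Q) = 1. \tag{7} $$
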
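We now have a restricted multiplication map (which by abuse of notation we will continue to call $M$) from the four-dimensional variety

$$ \{ (L, Q) \in \mathrm{Sym}^1(\mathbb{C}^2) \times \mathrm{Sym}^2(\mathbb{C}^2) : \mathrm{Res}(L, Q) = 1 \} \tag{8} $$
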
to the four-dimensional space $\mathrm{Sym}^3(\mathbb{C}^2)$. This map is still not globally injective, as we can take the three pairs in (4) from before and apply the scaling (3) separately to each of the three pairs to obtain the normalization (7). So we have kept property (b). Furthermore, this map retains the $SL_2(\mathbb{C})$-equivariance (and also one remaining scaling symmetry, though we will not make much further use of that symmetry).
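Here is a small exact-arithmetic sketch of the resultant facts used above, assuming the coordinates $L = ax + by$, $Q = cx^2 + dxy + ey^2$ and the Sylvester-determinant convention $\mathrm{Res}(L, Q) = a^2 e - abd + b^2 c$ (sign conventions for resultants vary in the literature). The function names are illustrative: `compose` implements $(L, Q) \mapsto (L \circ g, Q \circ g)$ for a $2 \times 2$ matrix $g$ acting on $(x, y)$, and `normalize` applies the scaling (3) with $\lambda = 1/\mathrm{Res}(L, Q)$.

```python
from fractions import Fraction


def resultant(L, Q):
    """Sylvester determinant det[[a, b, 0], [0, a, b], [c, d, e]] = a^2 e - a b d + b^2 c."""
    a, b = L
    c, d, e = Q
    m = [[a, b, 0], [0, a, b], [c, d, e]]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def compose(L, Q, g):
    """Return (L o g, Q o g) for g = ((g11, g12), (g21, g22)), (x, y) -> (g11 x + g12 y, g21 x + g22 y)."""
    (g11, g12), (g21, g22) = g
    a, b = L
    c, d, e = Q
    Lg = (a * g11 + b * g21, a * g12 + b * g22)
    Qg = (c * g11**2 + d * g11 * g21 + e * g21**2,
          2 * c * g11 * g12 + d * (g11 * g22 + g12 * g21) + 2 * e * g21 * g22,
          c * g12**2 + d * g12 * g22 + e * g22**2)
    return Lg, Qg


def normalize(L, Q):
    """Apply (L, Q) -> (lam L, Q / lam) with lam = 1 / Res(L, Q), which forces Res = 1."""
    lam = Fraction(1) / resultant(L, Q)
    return tuple(lam * t for t in L), tuple(t / lam for t in Q)


L, Q = (3, -2), (1, 4, -5)
g = ((2, 1), (1, 1))  # determinant 1, so g lies in SL_2
print(resultant(L, Q), resultant(*compose(L, Q, g)))  # -17 -17
print(resultant(*normalize(L, Q)))  # 1
```

The same function also confirms the factored form (6): with $a = 2$, $\alpha = 3$, $c = 5$, $\beta = -1$, $\gamma = 1/2$ both sides equal $200$.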

But we now also have property (a)! Suppose we want to show the local injectivity of $M$ in the neighborhood of a pair $(L, Q)$ with $\mathrm{Res}(L, Q) = 1$. As the resultant is non-vanishing, the root of $L$ (which exists in the Riemann sphere $\mathbb{C} \cup \{\infty\}$, or projective line $\mathbb{P}^1(\mathbb{C})$ if you prefer) is distinct from the two roots of $Q$ (though the latter two roots could be equal to each other). Applying the $SL_2(\mathbb{C})$ action (which performs Möbius transforms on the roots), one can assume without loss of generality that the root of $L$ is the point at infinity (or equivalently $(x : y) = (1 : 0)$), thus $L(x,y) = by$ for some complex number $b$ and $Q(x,y) = c(x - \beta y)(x - \gamma y)$ for some complex numbers $c, \beta, \gamma$, and the resultant condition (7) simplifies to $b^2 c = 1$ (so in particular $b, c$ are also non-zero). It is then clear that if one perturbs $L$ and $Q$ by a small amount (say, modifying each coefficient by $O(\varepsilon)$), then the root of $L$ will perturb to something large (of magnitude $\gg 1/\varepsilon$), while the roots of $Q$ stay bounded. Thus, just from knowledge of the product $LQ$, one can reconstruct which of the three roots of this cubic polynomial will be the perturbed root of $L$, and which two will be the perturbed roots of $Q$; from this and (6), (7) we can also reconstruct the leading coefficient of $L$, and this completely determines both $L$ and $Q$. This establishes the local injectivity property (a). (In fact $M$ is étale, but we will not need the machinery of étale maps here.)

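Unfortunately, (the four-dimensional analogue of) condition (c) fails: the cubic hypersurface (8) is not isomorphic to the affine space $\mathbb{C}^4$. But we can try to get around this by passing to a three-dimensional slice. Let $H$ be some three-dimensional affine hyperplane of $\mathrm{Sym}^3(\mathbb{C}^2)$ (which we will take to avoid the origin for technical reasons); then we can restrict $M$ as a map from the set

$$ \{ (L, Q) \in \mathrm{Sym}^1(\mathbb{C}^2) \times \mathrm{Sym}^2(\mathbb{C}^2) : \mathrm{Res}(L, Q) = 1,\ LQ \in H \} \tag{9} $$
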
to $H$. The hyperplane $H$ is clearly identifiable (by linear changes of coordinate) with $\mathbb{C}^3$. As $M$ was already locally invertible, it remains locally invertible under restriction; and because generic cubic polynomials had three preimages under $M$ in (8), this continues to be the case after restricting to (9) (unless $H$ was somehow so degenerate that it contained no generic elements, but this turns out to be impossible). So we have retained properties (a) and (b). The miracle is that, with a good choice of $H$, we can also obtain (c) and thus the desired counterexample to the Jacobian conjecture: despite appearances, the variety (9) is in fact equivalent to the affine space $\mathbb{C}^3$ by polynomial changes of variable!

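Let’s see how. The affine hyperplanes in $\mathrm{Sym}^3(\mathbb{C}^2)$ avoiding the origin are parameterized by the dual space $\mathrm{Sym}^3(\mathbb{C}^2)^*$ with the origin removed, which one can think of as the non-zero third order homogeneous differential operators $D$ in two variables. Indeed, every such operator $D$ generates an affine hyperplane $H = \{ P \in \mathrm{Sym}^3(\mathbb{C}^2) : DP = 1 \}$ that avoids the origin, and conversely by duality every affine hyperplane avoiding the origin arises in this form uniquely. Just as the cubic polynomials in $\mathrm{Sym}^3(\mathbb{C}^2)$ can be factored into three linear polynomials, the differential operators in the dual space can also be factored into three linear differential operators, e.g.,

$$ D = \delta_0 \partial_x^3 + \delta_1 \partial_x^2 \partial_y + \delta_2 \partial_x \partial_y^2 + \delta_3 \partial_y^3 = \delta_0 (\partial_x - \rho_1 \partial_y)(\partial_x - \rho_2 \partial_y)(\partial_x - \rho_3 \partial_y) $$
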
in the case that $\delta_0$ is non-zero. The $SL_2(\mathbb{C})$ action moves the roots $\rho_1, \rho_2, \rho_3$ around the Riemann sphere by Möbius transformations. As these transformations are $3$-transitive, the actual selection of such roots is not too important (and the scaling symmetry similarly makes the choice of leading coefficient unimportant); the only thing to keep track of is whether the roots repeat. Up to the symmetries, there are in fact just three different equivalence classes of differential operator $D$ (and thus of affine hyperplane $H$) to consider:

Operators where the three roots are all distinct, thus $D = D_1 D_2 D_3$ for independent first-order operators $D_1, D_2, D_3$.
Operators where two roots coincide and one is distinct, thus $D = D_1^2 D_2$ for independent first-order operators $D_1, D_2$.
Operators where all three roots coincide, thus $D = D_1^3$ for some first-order operator $D_1$.

It turns out that the affine miracle for (9) occurs precisely in the second case, when $D$ has two identical roots. I do not have a completely satisfactory geometric explanation for this miracle, but one can verify it by the following coordinate computation.

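By applying the $SL_2(\mathbb{C})$ action, we can normalize $D = \frac{1}{2} \partial_x^2 \partial_y$, so that $H$ is now the affine hyperplane of cubic polynomials whose $x^2 y$ coefficient equals $1$. Using (2) and (5), the variety (9) can now be described explicitly in coordinates as

$$ \{ (a, b, c, d, e) \in \mathbb{C}^5 : a^2 e - abd + b^2 c = 1,\ ad + bc = 1 \}. \tag{10} $$
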
At first glance this seems to be a generic-looking variety cut out by a cubic equation and a quadratic equation, and hardly a candidate to be affine! But observe that if $a$ is non-zero, then the second equation can be solved for $d$,

$$ d = \frac{1 - bc}{a}, \tag{11} $$

and the first equation can then be solved for $e$,

$$ e = \frac{1 + abd - b^2 c}{a^2} = \frac{1 + b - 2b^2 c}{a^2}. \tag{12} $$

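Putting these two equations together, we see that as long as one removes the case $a = 0$, the quintuple $(a, b, c, d, e)$ is uniquely determined by $(a, b, c)$ by a change of variables which is Laurent in $a$ and polynomial in $b, c$. Thus we have a nice birational equivalence

$$ \{ (a, b, c, d, e) \in (10) : a \neq 0 \} \cong \mathbb{C}^\times \times \mathbb{C}^2, \qquad (a, b, c, d, e) \mapsto (a, b, c). $$
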
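Thus we have already almost established property (c): the variety (9) is birationally equivalent to $\mathbb{C}^3$, and becomes isomorphic to $\mathbb{C}^\times \times \mathbb{C}^2$ after cutting out the $a = 0$ subvariety. In particular, for each fixed non-zero value $a_0$ of $a$, the corresponding fiber

$$ \{ (a, b, c, d, e) \in (10) : a = a_0 \} $$

of (10) is equivalent to $\mathbb{C}^2$ by polynomial changes of variable, since we can reconstruct $(d, e)$ from the coordinates $(b, c)$ by the polynomial formulae

$$ d = \frac{1 - bc}{a_0}, \qquad e = \frac{1 + b - 2b^2 c}{a_0^2}. $$
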
So we just need to glue back in the $a = 0$ fiber. Indeed, from (10) we see that the fiber at $a = 0$ is just

$$ \{ (0, b, c, d, e) : b^2 c = 1,\ bc = 1 \}. $$

Now we observe a key miracle: the cubic equation $b^2 c = 1$ and quadratic equation $bc = 1$ have a unique affine solution $(b, c) = (1, 1)$ (as opposed to the six possible solutions that Bézout’s theorem might suggest; the other five solutions live on the line at infinity). So the fiber here is also affine:

$$ \{ (0, 1, 1, d, e) : d, e \in \mathbb{C} \} \cong \mathbb{C}^2. $$

This is extremely encouraging for the purposes of establishing property (c), as it strongly suggests that the variety (10) has the structure of a $\mathbb{C}^2$-bundle over $\mathbb{C}$ (parameterized by $a$), which is already extremely close to being isomorphic to the affine space $\mathbb{C}^3$. The main remaining task is to make sure that nothing singular happens in the limit $a \to 0$, and that a global polynomial coordinate chart for (10) that covers both the $a \neq 0$ and $a = 0$ fibers can be constructed.

The standard way to proceed here is to manipulate various tangent spaces using the modern machinery of algebraic geometry and commutative algebra, but given my own background, I prefer to adopt the language of analysis, and in particular big-O notation (in place of the ideals used in algebraic geometry), in order to investigate the limit $a \to 0$ by hand. On the variety (10), let us use $O(a^k)$ to denote any multiple of $a^k$ by a polynomial expression in $a, b, c, d, e$. Thus, for instance, the equation $ad + bc = 1$ implies that

$$ bc = 1 + O(a), \tag{13} $$

while the equation $a^2 e - abd + b^2 c = 1$ implies that

$$ b^2 c = 1 + O(a), \tag{14} $$

as well as the more refined estimate

$$ b^2 c = 1 + abd + O(a^2). \tag{15} $$

In the case $a = 0$ we could conclude that $b = c = 1$. Now we perturb this observation. Multiplying (13) by $b$ we have $b^2 c = b + O(a)$, which on substitution into (14) gives $b = 1 + O(a)$; substituting this back into either (13) or (14) also gives $c = 1 + O(a)$.

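We can get some more precise asymptotics by also taking advantage of (15). Substituting $b = 1 + O(a)$ and $c = 1 + O(a)$ into (15) (and using the second equation of (10) in the form $ad = 1 - bc$), we obtain after some algebra

$$ 3b + 2c = 5 + O(a^2). \tag{16} $$

So if we write $b = 1 + O(a)$ more explicitly as $b = 1 + au$ for some polynomial expression $u$ in $a, b, c, d, e$, then we have

$$ c = 1 - \tfrac{3}{2} au + O(a^2) $$

and thus

$$ bc = 1 - \tfrac{1}{2} au + O(a^2). $$
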
Substituting this back into (11) gives an asymptotic for $d$:

$$ d = \frac{1 - bc}{a} = \tfrac{1}{2} u + O(a). $$

Finally, one can insert these estimates into (12), although one only gets a trivial bound in this case:

$$ e = O(1). $$

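Expanding the error term in (16) as $a^2 z$ for some polynomial expression $z$ in $a, b, c, d, e$, and doing a little more algebra, we thus have a polynomial change of variables

$$ \begin{aligned} b &= 1 + au, \\ c &= 1 - \tfrac{3}{2} au + \tfrac{1}{2} a^2 z, \\ d &= \tfrac{1}{2} u - \tfrac{1}{2} az + \tfrac{3}{2} au^2 - \tfrac{1}{2} a^2 uz, \\ e &= 4u^2 - z - 2auz + 3au^3 - a^2 u^2 z, \end{aligned} $$

which completely parameterizes the variety (10) by polynomial combinations of three coordinates $(a, u, z) \in \mathbb{C}^3$. This already gives (c) and thus completes the proof of Theorem 3.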
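The parameterization can be sanity-checked with exact rational arithmetic. The sketch below assumes the normalization in which $H$ consists of the cubics with $x^2 y$ coefficient $1$, the coordinates $L = ax + by$, $Q = cx^2 + dxy + ey^2$, and the chart $(a, u, z)$ restated in the code, with proposed inverse $u = 2bd - ae$, $z = 4ud + 2cu^2 - e$. These conventions are one choice among several, and the function names are hypothetical. It checks that random points land on the variety and that the inverse recovers them.

```python
import random
from fractions import Fraction

HALF = Fraction(1, 2)


def chart(a, u, z):
    """(a, u, z) -> (a, b, c, d, e), the assumed polynomial change of variables."""
    b = 1 + a * u
    c = 1 - 3 * HALF * a * u + HALF * a**2 * z
    d = HALF * u - HALF * a * z + 3 * HALF * a * u**2 - HALF * a**2 * u * z
    e = 4 * u**2 - z - 2 * a * u * z + 3 * a * u**3 - a**2 * u**2 * z
    return a, b, c, d, e


def inverse_chart(a, b, c, d, e):
    """Polynomial inverse (a, b, c, d, e) -> (a, u, z), valid on the variety."""
    u = 2 * b * d - a * e
    z = 4 * u * d + 2 * c * u**2 - e
    return a, u, z


def on_variety(a, b, c, d, e):
    """Res(L, Q) = 1 and the x^2 y coefficient of LQ equals 1."""
    return a**2 * e - a * b * d + b**2 * c == 1 and a * d + b * c == 1


rng = random.Random(0)
for _ in range(200):
    p = tuple(Fraction(rng.randint(-9, 9), rng.randint(1, 5)) for _ in range(3))
    q = chart(*p)
    assert on_variety(*q) and inverse_chart(*q) == p
print("chart and inverse agree on 200 random points")
```

Going the other way, the point $(a, b, c, d, e) = (1, 0, 0, 1, 1)$ of the variety (where $b = 0$, so the naive formulas dividing by $b$ would fail) maps to $(a, u, z) = (1, -1, -5)$ and back.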

The previous computations, when expanded out, also give polynomial inverse maps:

$$ u = 2bd - ae, \qquad z = 4ud + 2cu^2 - e. $$

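Composing the parameterization with the multiplication map and recording the coefficients of $LQ$ as functions of $(a, u, z)$ (dropping the $x^2 y$ coefficient, which is constrained to equal $1$), we obtain a polynomial map $G: \mathbb{C}^3 \to \mathbb{C}^3$, $G(a, u, z) := (ac,\ ae + bd,\ be)$, with

$$ \begin{aligned} ac &= a - \tfrac{3}{2} a^2 u + \tfrac{1}{2} a^3 z, \\ ae + bd &= \tfrac{1}{2} u - \tfrac{3}{2} az + 6au^2 - 3a^2 uz + \tfrac{9}{2} a^2 u^3 - \tfrac{3}{2} a^3 u^2 z, \\ be &= 4u^2 - z - 3auz + 7au^3 - 3a^2 u^2 z + 3a^2 u^4 - a^3 u^3 z, \end{aligned} $$

which theory predicts to have a constant Jacobian, and indeed one can calculate that the Jacobian is $-1/2$. This is essentially the original example up to trivial changes of variable; indeed, one can check that the map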

is exactly the map given in (1).
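To double-check the constant Jacobian without trusting hand algebra, one can expand everything with a tiny dictionary-based polynomial arithmetic (a sketch, not a computer-algebra system; all names are illustrative). It assumes the chart $b = 1 + au$, $c = 1 - \tfrac{3}{2}au + \tfrac{1}{2}a^2 z$, $d = \tfrac{1}{2}u - \tfrac{1}{2}az + \tfrac{3}{2}au^2 - \tfrac{1}{2}a^2 uz$, $e = 4u^2 - z - 2auz + 3au^3 - a^2u^2z$ in the coordinates $L = ax + by$, $Q = cx^2 + dxy + ey^2$, forms $G = (ac, ae + bd, be)$ (the $x^3$, $xy^2$, $y^3$ coefficients of $LQ$), and computes $\det DG$ symbolically.

```python
from collections import defaultdict
from fractions import Fraction

# Polynomials in (a, u, z): dicts mapping exponent triples (i, j, k) to coefficients.


def clean(p):
    return {m: c for m, c in p.items() if c != 0}


def add(*ps):
    out = defaultdict(Fraction)
    for p in ps:
        for m, c in p.items():
            out[m] += c
    return clean(out)


def mul(*ps):
    out = {(0, 0, 0): Fraction(1)}
    for p in ps:
        nxt = defaultdict(Fraction)
        for m1, c1 in out.items():
            for m2, c2 in p.items():
                nxt[tuple(i + j for i, j in zip(m1, m2))] += c1 * c2
        out = clean(nxt)
    return out


def const(c):
    return clean({(0, 0, 0): Fraction(c)})


def deriv(p, k):
    """Partial derivative with respect to the k-th variable."""
    out = {}
    for m, c in p.items():
        if m[k] > 0:
            n = list(m)
            n[k] -= 1
            out[tuple(n)] = c * m[k]
    return out


def evaluate(p, point):
    total = Fraction(0)
    for m, c in p.items():
        term = c
        for x, k in zip(point, m):
            term *= Fraction(x) ** k
        total += term
    return total


A, U, Z = {(1, 0, 0): Fraction(1)}, {(0, 1, 0): Fraction(1)}, {(0, 0, 1): Fraction(1)}
h = Fraction(1, 2)
b = add(const(1), mul(A, U))
c = add(const(1), mul(const(-3 * h), A, U), mul(const(h), A, A, Z))
d = add(mul(const(h), U), mul(const(-h), A, Z), mul(const(3 * h), A, U, U),
        mul(const(-h), A, A, U, Z))
e = add(mul(const(4), U, U), mul(const(-1), Z), mul(const(-2), A, U, Z),
        mul(const(3), A, U, U, U), mul(const(-1), A, A, U, U, Z))
G = [mul(A, c), add(mul(A, e), mul(b, d)), mul(b, e)]

J = [[deriv(g, k) for k in range(3)] for g in G]
neg = const(-1)
det = add(
    mul(J[0][0], add(mul(J[1][1], J[2][2]), mul(neg, J[1][2], J[2][1]))),
    mul(neg, J[0][1], add(mul(J[1][0], J[2][2]), mul(neg, J[1][2], J[2][0]))),
    mul(J[0][2], add(mul(J[1][0], J[2][1]), mul(neg, J[1][1], J[2][0]))),
)
print(det)  # {(0, 0, 0): Fraction(-1, 2)}

# Three distinct points with the same image: the cubic x^2 y + x y^2 and its three factorizations.
pts = [(1, -1, -5), (0, 2, 16), (-1, 2, -8)]
print({tuple(evaluate(g, p) for g in G) for p in pts} == {(0, 1, 0)})  # True
```

The dictionary representation keeps the sketch dependency-free; with a computer-algebra package available, the same check is a one-line Jacobian determinant.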

AI disclosure: I used an AI chatbot to discuss various aspects of this problem and to confirm several of the calculations made here.

Categories: Mathematical blogs

Two more apps: visualizing the zeta process and the motions of the heavens

Fri, 2026-07-17 04:23

I believe that the creation of visualization apps to illustrate mathematical or scientific concepts is a particularly favorable use case for modern coding agents, as many of the downside risks attached to other LLM use cases are limited:

Not mission-critical. As such apps are not authoritative sources of truth and only used for secondary purposes, a small positive error rate in the output can be acceptable.
Stand-alone. As the applets are not destined to be incorporated into a larger codebase or literature, the technical debt incurred by delegating all the coding to an LLM agent is bounded.
End product is deterministic (and sandboxed). As the applets run on a deterministic language (Javascript), are sandboxed against file or internet access, and do not make any LLM calls at run-time, security and privacy concerns are minimal, and the applet can be maintained without continued premium LLM access or resource-intensive compute.
Not replacing primary skills. While deskilling is the tradeoff one accepts when relying on these tools to accelerate output, I am perfectly willing to forgo the opportunity to keep my Javascript skills at a high level, as this is a tertiary skill for me at best in my chosen profession. (I continue to manually program in Lean and in Python to keep in practice with programming in general.)
Not competing with humans. To my knowledge, there is no existing human effort that is being duplicated by these applets (the activity in this direction appears to have peaked two decades ago).

I would however caution against unrestricted LLM use when one or more of the above five favorable situations is not in effect.

With these points in mind, I have used such an agent to create two further apps. The first app illustrates the “zeta process” that was introduced in my recent paper with Alexeev, Barreto, Li, Lichtman, Price, Shah, and Tang, though it was first discovered by an AI. For each $s > 1$, the zeta random variable $\mathbf{n}_s$ is a random natural number with distribution

$$ \mathbb{P}(\mathbf{n}_s = n) = \frac{n^{-s}}{\zeta(s)}, \qquad n = 1, 2, 3, \dots $$

It has long been known that this distribution has good number-theoretic properties: for instance, the number of times a given prime $p$ divides $\mathbf{n}_s$ has a geometric distribution of mean $\frac{1}{p^s - 1}$. However, the new observation is that these random variables can be chained together into a single stochastic process $(\mathbf{n}_s)_{s > 1}$, which we call the “zeta process”, which is an infinite divisibility chain. I used an agent to create an app to visualize this process:

The underlying process is generated by several exponential random variables at each prime: in the above instantiation of the process, two such variables are visible at the prime , and one variable at the primes . At a given choice of , is formed by collecting all the variables below this threshold (and for which all predecessors also lie below the threshold); in the above illustration, this amounts to one variable at each of the primes , leading to in this case. Additional visualizations in the app display the distribution of each , as well as the distribution of the hitting probability , which among other things can be used to give a quick solution to Erdős problem #1196.
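A quick numerical check of the geometric-distribution fact about the zeta distribution: the sketch below computes the mean number of times $p$ divides a zeta-distributed variable by summing the exact distribution up to a cutoff (the truncation is an assumption of the sketch and introduces a tiny error), and compares it with $1/(p^s - 1)$. It does not implement the zeta process itself, and the function names are illustrative.

```python
def p_adic_valuation(n: int, p: int) -> int:
    """Number of times the prime p divides n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def mean_valuation(p: int, s: float, cutoff: int = 50_000) -> float:
    """Mean of v_p(n) under P(n) proportional to n^{-s}, truncated to n <= cutoff."""
    weights = [n ** -s for n in range(1, cutoff + 1)]
    total = sum(weights)
    weighted = sum(p_adic_valuation(n, p) * w for n, w in zip(range(1, cutoff + 1), weights))
    return weighted / total


for p in (2, 3, 5):
    print(p, mean_valuation(p, 2.0), 1 / (p**2 - 1))
```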

The second app is rather different in nature, and is a somewhat whimsical attempt to display the motion of the heavens, both at “human” scales of space and time, and at more “astronomical” scales (in which the motion of the planets in particular is more apparent). It is very loosely inspired by the game “Katamari Damacy”, in which one absorbs both terrestrial and celestial objects of many different scales. Here is how the app typically looks at a human scale:

And here is how it looks when one’s perspective leaves the Earth’s atmosphere:

(As I did not want to render an entire explorable world in this app, the observer in the app is limited to changing his or her size, from a human to a creature of comparable size to the Earth itself; they cannot move horizontally on the planet.) At the largest scales of space and time, the classic orrery diagram appears:

After lengthy conversations with the agent, I was able to implement many astronomical phenomena, including phases of the Moon, the effect of Earth’s rotation against the fixed stars (though one can also stabilize one’s view against those stars to see the Earth’s rotation more directly), and so forth.


Visualizing the Gilbreath expectation sequence

Tue, 2026-07-14 19:16

One byproduct of learning how to use coding agents to create visualization apps is that it now becomes straightforward to convert any figure in one’s papers that had already been generated by code (e.g., in Python) into a more interactive, animated applet.

I can illustrate this with Figure 1 from my recent paper on the Gilbreath conjecture with Chase and Hunter, reproduced below:

This plot displays both exact and numerically simulated values of a certain poorly understood sequence relating to the Gilbreath conjecture, which I will call the “Gilbreath expectation sequence” here for lack of a better name. The definition of the sequence is as follows. Consider a “Gilbreath array” which is an inverted pyramid, where the top entries are independent exponential random variables of mean 1, and all the other entries are the absolute values of the differences of the two entries immediately above it. Thanks to the visualizer app, I can quickly give an example (with ):

The left diagonal entries are then random variables; the sequence are defined to be the expectation of these values. (The process is stationary, so in fact any entry on the row will have expectation .)

If one starts with the first normalized prime gaps (which have expectation about , and are conjecturally distributed asymptotically according to a geometric distribution), then standard conjectures (e.g., the prime tuples conjecture) predict that the row entries should decay like , at least for small . So the Gilbreath conjecture appears to be tied to how fast the sequence decays with .

One can in principle work out each value of as an explicit rational number by performing a certain complicated multivariate integral, but in the paper we only did this for (the orange line in the above figure); for the remaining we performed a Monte Carlo simulation with Gilbreath arrays to obtain a numerical approximation (in blue), which (as per the law of large numbers) agreed well with the theoretical values. A later calculation of Michael Ross extended the theoretical values to , maintaining the good fit:

The asymptotic behavior of the sequence remains mysterious. Clearly, it is not monotonic; in fact we cannot even prove it is bounded. The best we could do in our paper was establish an inequality which, roughly speaking, showed that cannot decay faster than .
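For readers who want to experiment, here is a minimal Monte Carlo sketch of the sequence's definition: a top row of independent mean-one exponentials, absolute differences below, and the average of the first entry of each row. The width, number of trials, and seed are arbitrary choices, and the output is only an approximation. One exact sanity check is that the first two values should both be close to $1$, since for independent mean-one exponentials $X, Y$ the difference $|X - Y|$ is again a mean-one exponential.

```python
import random


def left_diagonal(top):
    """First entry of every row of the Gilbreath array with the given top row."""
    row = list(top)
    diagonal = [row[0]]
    while len(row) > 1:
        row = [abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
        diagonal.append(row[0])
    return diagonal


def estimate_sequence(width=12, trials=20_000, seed=0):
    """Monte Carlo estimates of the expected left-diagonal entries, rows 0..width-1."""
    rng = random.Random(seed)
    totals = [0.0] * width
    for _ in range(trials):
        top = [rng.expovariate(1.0) for _ in range(width)]
        for n, value in enumerate(left_diagonal(top)):
            totals[n] += value
    return [t / trials for t in totals]


for n, value in enumerate(estimate_sequence()):
    print(n, round(value, 3))
```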

In a recent preprint of Ross, these numerics were extended, and a rough empirical prediction

was proposed for some constants and (empirically ), where is the number of 1’s in the binary expansion of ; in particular, it is the fluctuation in this quantity that is intended to explain much of the non-monotonic behavior of . These are now all displayed in the following companion applet, which was a routine matter to generate in about an hour by the coding agent (which by this point has extensive experience with creating such apps, encoded via a “skill” markdown file that it maintains):

The appearance of this binary digit count may initially seem mysterious, but it is related to Lucas’s theorem, Kummer’s theorem, and the Sierpinski gasket. Consider for instance a Gilbreath array where all the entries are zero except for a single “spike”. Then the following Sierpinski pattern emerges:

Here is what a larger version of this picture looks like (with the spike positioned at the 32nd entry):

The number of 1s in the $n$th row is then $2^{s_2(n)}$, where $s_2(n)$ denotes the number of 1s in the binary expansion of $n$ (if we index the rows starting from zero), which is at least of the same shape as the empirical prediction, albeit with different constants. (This sequence is also known as Gould’s sequence.)
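The Sierpinski count is easy to reproduce: for entries in $\{0, 1\}$, taking absolute differences coincides with addition mod 2, so a single spike generates Pascal's triangle mod 2. The sketch below (the array width 64 and the spike index 31, i.e. the 32nd entry, are assumptions) checks that row $n$ contains $2^{s_2(n)}$ ones, with $s_2(n)$ the binary digit sum of $n$, for every row not yet truncated by the edges of the array.

```python
def spike_rows(width, spike):
    """Gilbreath array whose top row is zero except for a single 1 at index `spike`."""
    row = [0] * width
    row[spike] = 1
    rows = [row]
    while len(row) > 1:
        row = [abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
        rows.append(row)
    return rows


rows = spike_rows(64, 31)  # spike at the 32nd entry
counts = [sum(r) for r in rows[:32]]  # rows 0..31 are not yet cut off by the edges
print(counts[:8])  # [1, 2, 2, 4, 2, 4, 4, 8]
print(all(c == 2 ** bin(n).count("1") for n, c in enumerate(counts)))  # True
```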

Numerically, we seem to observe fragments of Sierpinski gaskets being generated before decaying (often due to “collisions” with other gaskets):

However, it is not clear to me at all what the asymptotic probabilistic model should be, even heuristically; it does not resemble any random shape model that I am familiar with. But perhaps there are readers more expert in probability theory or statistical physics who may be able to suggest such an asymptotic limit?


Call for long programs, workshops, and summer schools at IPAM

Tue, 2026-07-14 17:51

(I am writing here in my capacity as Director of Special Projects at IPAM.)

IPAM seeks program proposals from the mathematical, statistical, and scientific communities for long programs, workshops, and summer schools. Most program proposals are reviewed at IPAM’s Science Advisory Board meeting, held in November each year. Programs are selected on the basis of their scientific impact and contribution to IPAM’s goals. IPAM is committed to supporting a community where people of all backgrounds and points of view can engage, learn, and thrive. If you would like to discuss your program ideas and prepare a proposal for IPAM’s consideration, you are encouraged to contact the IPAM Director. For more information visit: https://www.ipam.ucla.edu/propose-a-program/long-programs-2/


A paper diagram visualizer

Tue, 2026-07-14 07:34

I am finding the newly revealed capability to code old applet ideas into reality to be very tempting to sink more time into, though I am certainly encountering the common “vibe coding” experience that the process can produce something that superficially resembles a finished product well before a satisfactory level of testing and review has been completed; indeed, it is the review process which is now the most time-consuming, to the point where I think any further advances in coding agent capability will have little impact on the new bottlenecks in the design process.

In any event, I spent a few hours working to realize a proposal I had made back in 2023 to automatically create diagrams to visually illustrate the logical flow of a given mathematical paper. At the time, Freddie Manners, extrapolating from the half-decent capability of the then-newly released ChatGPT 3.5 at this task, presciently predicted that “by the time a dedicated tool had been completed, the next general purpose engine would be better than it”.

With that in mind, I decided to focus not on the generation of the diagram – which now can be done at various levels of quality by any number of large language models – but on its presentation. The result is the following app, which can take a certain formatted JSON file of dependencies between theorem objects and produce an interactive graph which can be explored, edited and also exported (somewhat lossily) into other standard formats such as SVG, TikZ, quiver, or Mermaid. Here is a screenshot of a diagramming of the celebrated proof by Wang and Zahl of the three-dimensional Kakeya conjecture:

Using an LLM, I generated diagrams for eight papers for demonstration purposes, including for instance a diagram for Wiles’s proof of Fermat’s last theorem, or of Szemerédi’s proof of his famous theorem on arithmetic progressions (which sports a notoriously convoluted such diagram in the original paper), as well as a few papers of my own. If there are other requests to diagram particular papers, I can try to use an LLM to generate more examples; but my intention is for users of the app to create their own such diagrams, either by manually constructing them, or by directing their own AI tools to build the diagram in the required format (which is a JSON, with the precise specification given here).
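As an illustration of the kind of pipeline described (dependency JSON in, diagram source out), here is a sketch in Python. The JSON schema below is hypothetical and is not the app's actual specification. The Mermaid output uses the standard `graph TD` flowchart syntax, and `graphlib.TopologicalSorter` from the standard library supplies a reading order in which every result appears after the results it depends on, raising `graphlib.CycleError` on circular dependencies.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical schema for illustration only; the app's real JSON specification may differ.
EXAMPLE = """
{"nodes": [{"id": "lem1", "label": "Lemma 1"},
           {"id": "lem2", "label": "Lemma 2"},
           {"id": "thm1", "label": "Theorem 1"}],
 "edges": [{"from": "lem1", "to": "lem2"},
           {"from": "lem1", "to": "thm1"},
           {"from": "lem2", "to": "thm1"}]}
"""


def to_mermaid(doc):
    """Render the dependency graph as Mermaid flowchart source (arrows point to dependents)."""
    lines = ["graph TD"]
    for node in doc["nodes"]:
        lines.append(f'    {node["id"]}["{node["label"]}"]')
    for edge in doc["edges"]:
        lines.append(f'    {edge["from"]} --> {edge["to"]}')
    return "\n".join(lines)


def reading_order(doc):
    """Order results so that each appears after everything it depends on."""
    deps = {node["id"]: set() for node in doc["nodes"]}
    for edge in doc["edges"]:
        deps[edge["to"]].add(edge["from"])
    return list(TopologicalSorter(deps).static_order())


doc = json.loads(EXAMPLE)
print(to_mermaid(doc))
print(reading_order(doc))  # ['lem1', 'lem2', 'thm1']
```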

I mentioned in the previous post that for these sorts of visualization apps, which work deterministically for a given set of inputs, the downside risk of LLM use to build the app is acceptably low. For this particular app, there is a complicating factor, which is that while the app does remain deterministic, the data I used to populate the app – namely, the above diagrams – are also LLM-generated. I have done spot checks comparing the diagrams against the source papers, and did not find any errors; however, they are not guaranteed to be 100% accurate, and should only be used as approximations to the logical structure of these papers rather than completely exact representations. (The latter might become deterministically extractable should the results of these papers become formalized in a proof assistant language, but this is not currently the case.) Still, I hope these sorts of diagrams can serve as a helpful initial guide when first trying to read and understand a complex paper.


A random variable visualizer

Mon, 2026-07-13 02:14

With the advent of modern coding agents, many visualization projects that I had proposed in the past, but dropped due to the time and complexity of the coding portion of the task, have now become relatively feasible, in that a reasonable quality prototype (suitable for non-mission-critical tasks such as providing secondary visual aids, where it is not absolutely necessary that the product is 100% bug-free) can now be generated in a matter of hours using such tools. I don’t immediately plan to work on my entire backlog of such projects, but I did spend a few hours this weekend on one such project, namely the proposal from this 2016 blog post to visualize random variables as animated quantities, which can be either viewed numerically or displayed as a scatterplot. After some back and forth with the coding agent, I was able to come up with a working app. Here is a screenshot of the app displaying a visualization of Berkson’s paradox, which asserts that independent variables can become correlated with each other after conditioning:

The app in fact is a “compiler” for a small, custom programming language (think of a simplified hybrid of Python and Excel) which allows for the introduction of random variables, manipulates them through operations such as arithmetic operations or conditioning, and then plots them as animations or as text. (The screenshot above is static, but when the app is live, it will update at the indicated speed.)

As always, I would be happy to receive feedback on the app, which I hope can be useful as a visual aid to understand basic probabilistic concepts such as independence or conditioning.


Old and new apps, via modern coding agents

Sun, 2026-07-12 00:48

I have been interested in machine-assisted ways to do and teach mathematics from as far back as 1999, when I started coding several applets in Java 1.0, both for my complex analysis and linear algebra courses, as well as to visualize various mathematical objects I was interested in (such as honeycombs or Besicovitch sets). This was moderately successful; but the applets were time-consuming to program. Eventually, the standards for web pages stopped supporting this version of Java, and the applets became non-functional.

However, in the last few days I have begun the process of migrating much of my old web page and blog data to a more maintainable repository, using modern AI assistance. As an experiment, I asked the agent to port my old applets to a modern supported language (we landed on Javascript), and it managed to do so in a matter of hours, with all of my old applets now functional again, with even a few graphical upgrades (for instance, the Besicovitch set applet is now colorized, in contrast to my original monochrome version). I am particularly pleased to see the honeycomb applet that I wrote with Allen Knutson in 1999 come back to life, as this was a particularly tricky one to code by hand:

Notoriously, LLM-based coding agents can create various blatant or subtle bugs in their code; but in the porting of these two dozen or so applets, I could only find one minor bug (the handling of a drag event in one of the complex analysis applets had unwanted behavior when dragging outside of the main box), and in fact the agent identified two bugs in the original code that I was not aware of, so it ended up being a net wash as far as code quality was concerned. In any event, as these applets are meant to be secondary visual aids rather than critical components of a mathematical argument, the downside risk of such bugs is relatively low.

The process was painless enough that I decided to also try coding some new apps, in addition to porting the old ones. Back in 1999 I had an ambitious idea for a visualization tool for special relativity; this was before the release of the software tool Inkscape, but the idea I had in mind was basically “Inkscape, but in Minkowski space”. I had even started writing Java code for this app, but the code complexity became too much for me, and I abandoned the project. However, after a couple hours of “vibe coding” with an AI agent, I was finally able to generate an applet that matched the vision I had back in 1999, which can now be found here. A summary of the conversation I had with the agent to generate this code can be found here (it has been edited down to remove a large number of tedious technical implementation reports). While I have playtested the app somewhat, I would be interested in receiving further feedback on this “alpha” version of the applet, as I am sure (especially given the LLM-generated nature of the code) that there are still some bugs and rough edges to be ironed out.

After writing my blog post on the Gilbreath conjecture paper earlier today, I realized that I could similarly ask the agent to code a visualization tool for the Gilbreath conjecture to accompany the paper and blog post. After another few hours of conversation, this is now done; you can try out the visualization here. Again, the procedure was quite painless (see this transcript of the process), and I think I may add such interactive visualizations as supplements for future papers; as such supplements are not mission-critical to the core of the paper, I again feel that the downside risk of using guided interaction with LLM agents to generate such visualizations is acceptable.


Gilbreath’s conjecture: a Cramér random model and a deterministic analysis

Sat, 2026-07-11 20:59

Zachary Chase, Zach Hunter and I have uploaded to the arXiv our preprint Gilbreath’s conjecture: a Cramér random model and a deterministic analysis. This paper is motivated by a notorious conjecture of Gilbreath (also proposed eighty years prior by Proth), which one can state as follows: if one starts with the sequence of primes and repeatedly takes absolute differences of consecutive terms, then the first term of each subsequent row is always 1.

Coming from a PDE background, I like to think of this conjecture as a (discrete) nonlinear “wave equation” problem, where the primes are the “initial data”, the downward direction in the above pyramid is the arrow of “time”, and the “equation of motion” is that the value of the “scalar field” at any given point in “spacetime” is the absolute difference of the values of the two points directly above it. We will informally refer to solutions to such an “equation” as “Gilbreath arrays”.
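As a concrete illustration of this “equation of motion”, here is a minimal Python sketch (my own, not code from the paper) that generates a Gilbreath array from an initial row and reads off the first entry of each row. The helper names `primes_up_to`, `gilbreath_rows` and `first_entries` are hypothetical, and the cutoff of 2000 is an arbitrary choice.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Mark every multiple of p from p*p onwards as composite.
            sieve[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def gilbreath_rows(initial):
    """Yield the rows of the Gilbreath array: each new row consists of the
    absolute differences of consecutive entries of the row above it."""
    row = list(initial)
    while row:
        yield row
        row = [abs(a - b) for a, b in zip(row, row[1:])]


def first_entries(initial):
    """First entry of every row, starting with the initial row itself."""
    return [row[0] for row in gilbreath_rows(initial)]


primes = primes_up_to(2000)
firsts = first_entries(primes)
# Gilbreath's conjecture predicts that every row below the primes starts with 1.
print(len(primes), "rows; all later first entries equal 1:", all(f == 1 for f in firsts[1:]))
```

The array is built row by row as a generator, so the same helper can be fed any initial data, not just primes.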

Numerically, the conjecture has been verified for the first rows by Odlyzko. Asymptotically, the conjecture can be heuristically justified as follows. Firstly, because all primes other than 2 are odd, it is easy to see that in every row after the initial row of primes, the first term is odd, while all other terms are even. Next, if one starts with the first N primes for some large N and takes initial differences, then the prime number theorem tells us that the average size of the next row is about log N, and Cramér’s conjecture predicts that the maximum size should be O(log² N). With each new row, the maximum size can only decrease (since |a − b| ≤ max(a, b) for any natural numbers a, b), and so one would expect it likely on each row that the maximum size should drop by at least (unless it has already reached ). Since there are N rows to go before one reaches the end, it seems extremely likely that the maximum size should drop down to at most 2 by then, at which point the result is forced by parity.
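The two elementary facts behind this heuristic (the parity pattern in every row below the primes, and the fact that the largest entry of a row can never increase) are easy to confirm numerically. The Python sketch below does so for the primes up to 3000; it checks only these deterministic facts, not the probabilistic part of the heuristic, and its function names are my own.

```python
def primes_up_to(limit):
    """All primes <= limit (simple sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n in range(limit + 1) if is_prime[n]]


def gilbreath_rows(initial):
    """Rows of the Gilbreath array generated by `initial`."""
    row = list(initial)
    while row:
        yield row
        row = [abs(a - b) for a, b in zip(row, row[1:])]


def check_parity_and_max(initial):
    """Return (parity_ok, max_non_increasing) for the Gilbreath array of `initial`."""
    rows = list(gilbreath_rows(initial))
    # Below the initial row: first entry odd, every other entry even.
    parity_ok = all(
        row[0] % 2 == 1 and all(x % 2 == 0 for x in row[1:]) for row in rows[1:]
    )
    # |a - b| <= max(a, b) for a, b >= 0, so the row maximum can never increase.
    maxima = [max(row) for row in rows]
    max_non_increasing = all(later <= earlier for earlier, later in zip(maxima, maxima[1:]))
    return parity_ok, max_non_increasing


print(check_parity_and_max(primes_up_to(3000)))
```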

However, it seems well beyond current technology to try to make these heuristics rigorous; even the first step of proving Cramér’s conjecture is far out of reach. In our paper, we consider two more feasible directions:

What is a realistic probabilistic model of the primes, and can one confirm the (asymptotic version of the) conjecture almost surely for such a model?
Can one use deterministic arguments to reduce the (asymptotic) Gilbreath conjecture to more tractable looking (and heuristically plausible) statements about iterated differences of primes?

Let us first discuss the question of analyzing probabilistic models. One can strip away the first row and initialize using prime gaps rather than primes; it is convenient to also strip away the aforementioned parity structure, by eliminating the initial gap 3 − 2 = 1 and dividing all remaining gaps by 2, so that one now works with an initial sequence with no parity bias. The conjecture is now equivalent to the first entry of each row always being {0,1}-valued.
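This normalization can be checked directly on actual primes. The Python sketch below (hypothetical helper names) discards the gap 3 − 2 = 1, halves the remaining gaps (which are all even), and then checks, for the primes up to 3000, both the original form of the conjecture and the statement that every row of the normalized array begins with 0 or 1.

```python
def primes_up_to(limit):
    """All primes <= limit (simple sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n in range(limit + 1) if is_prime[n]]


def first_entries(initial):
    """First entry of each row of the Gilbreath array generated by `initial`."""
    row, firsts = list(initial), []
    while row:
        firsts.append(row[0])
        row = [abs(a - b) for a, b in zip(row, row[1:])]
    return firsts


def normalized_gaps(primes):
    """Drop the initial gap 3 - 2 = 1 and halve the remaining (even) prime gaps."""
    gaps = [b - a for a, b in zip(primes, primes[1:])]
    assert gaps[0] == 1 and all(g % 2 == 0 for g in gaps[1:])
    return [g // 2 for g in gaps[1:]]


primes = primes_up_to(3000)
original_ok = all(f == 1 for f in first_entries(primes)[1:])
normalized_ok = all(f in (0, 1) for f in first_entries(normalized_gaps(primes)))
print(original_ok, normalized_ok)
```

The two checks agree because the tail of each row of the gap array evolves independently of its first entry and equals twice the corresponding row of the normalized array.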

The Cramér model suggests that the first normalized prime gaps should behave like geometric random variables of mean about . My co-author, Zachary Chase, established an analogue of the Gilbreath conjecture for a more slowly growing model. Here is a special case of his main theorem:

Theorem 1 Suppose the initial row entries of a Gilbreath array are drawn independently from a uniform distribution on for some . Then almost surely, all but finitely many of the rows have a -valued first entry.

The Cramér model morally corresponds to a value of comparable to , which is too large for the above theorem to apply. However, we were able to improve the argument, basically allowing to be anything of size . Furthermore, it was not necessary that the distribution be uniform: the important hypothesis was that the distribution not be concentrated in any -separated set, such as the even numbers, the odd numbers, or the multiples of . (See the paper for the precise formulation of “non-concentrated”.) Such a hypothesis is needed since if for instance all initial entries were divisible by , then this property would propagate down the array, and it would become extremely unlikely that the initial values would remain -valued. Our hypotheses are obeyed by the Cramér random model, and so we obtain a heuristic confirmation of the original Gilbreath conjecture for the primes.
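One way to get a feel for Theorem 1 is simply to simulate it. The sketch below is a hypothetical illustration, not the argument of the paper: it draws the initial row independently and uniformly from {0, ..., max_value}, where `max_value` is an arbitrary parameter of my choosing (the theorem’s own range is not specified in this sketch), and reports the index of the last row whose first entry falls outside {0, 1}.

```python
import random


def first_entries(initial):
    """First entry of each row of the Gilbreath array generated by `initial`."""
    row, firsts = list(initial), []
    while row:
        firsts.append(row[0])
        row = [abs(a - b) for a, b in zip(row, row[1:])]
    return firsts


def last_bad_row(n, max_value, seed=0):
    """Draw n i.i.d. entries uniform on {0, ..., max_value} (hypothetical range) and
    return the largest row index whose first entry lies outside {0, 1}, or None."""
    rng = random.Random(seed)
    initial = [rng.randint(0, max_value) for _ in range(n)]
    bad = [k for k, f in enumerate(first_entries(initial)) if f not in (0, 1)]
    return bad[-1] if bad else None


for max_value in (3, 10, 30):
    print(max_value, last_bad_row(n=600, max_value=max_value))
```

Theorem 1 suggests that, for a fixed range and long initial rows, the reported index should typically be small compared with n; a finite simulation is of course no substitute for the almost-sure statement.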

One can informally explain our proof of the above result as follows. We consider the portion of the array generated by the first values for some large . Suppose that at some point deep in this portion of the array, a value that is larger than is attained. Then the two values above must satisfy the equation . So, either one of these values is at least , or one of them is and the other is . If one iterates this observation, one sees that is the base of an upside-down triangle of values, topped off by at least one location where the value is at least . If one iterates that observation in turn, we see that forms the base of a “tower” of upside-down triangles stacked atop each other, with the number of such triangles bounded by the maximum size of the initial data (in the “backwards light cone” of ). In the regime, it turns out that the number (or “entropy”) of such towers is subexponential in . So if we can show that each tower only can be created with an exponentially small probability, we can conclude by the standard techniques of the union bound and the Borel–Cantelli lemma.

At this point we use the following elementary observation. Suppose that some finite Gilbreath array coming from say initial data has been generated, and consider the effect of adding a new value to the initial data, which then triggers iterations of the absolute value difference operation for various values of until one reaches the new bottom vertex of the array. This difference operation has the property that the preimage of any -separated set is still -separated. Iterating this, we see that the set of values that make iterate to a -valued bottom vertex is also -separated. So as long as the distribution of avoids -separated sets, one can iterate this observation in to show that it is exponentially rare that large triangles of -valued vertices can be created.
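The preimage property is elementary but easy to misstate, so here is a brute-force check in Python. It assumes that “2-separated” means that distinct elements differ by at least 2, and it models the insertion of a new value x as a composition of maps y ↦ |a − y|, where the anchors a stand in for the last entries of the existing rows (drawn at random here). The helper names are hypothetical.

```python
import random


def is_two_separated(values):
    """True if any two distinct elements differ by at least 2."""
    s = sorted(set(values))
    return all(b - a >= 2 for a, b in zip(s, s[1:]))


def preimage(target, anchors, candidates):
    """Those x in `candidates` that the composition of the maps y -> |a - y|,
    applied for a in `anchors` in order, sends into `target`."""
    result = []
    for x in candidates:
        y = x
        for a in anchors:
            y = abs(a - y)
        if y in target:
            result.append(x)
    return result


def random_two_separated(rng, size, start_max=5):
    """A random set of non-negative integers with consecutive gaps of at least 2."""
    value, out = rng.randint(0, start_max), []
    for _ in range(size):
        out.append(value)
        value += rng.randint(2, 5)
    return set(out)


rng = random.Random(0)
for _ in range(500):
    anchors = [rng.randint(0, 40) for _ in range(rng.randint(1, 8))]
    target = random_two_separated(rng, size=6)
    assert is_two_separated(preimage(target, anchors, range(0, 200)))
print("preimages of 2-separated sets stayed 2-separated in all trials")
```

For a single map the reason is visible by hand: the preimage of s is {a − s, a + s}, and two such points can only be at distance 1 if s + s′ = 1, which would force the target itself to contain both 0 and 1.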

We also consider an asymptotic continuous random model, in which the initial data are not natural numbers, but instead independently random non-negative real numbers with an exponential distribution, which we can normalize to have mean ; this heuristically is an approximate model for the Gilbreath array generated by the first normalized prime gaps, after dividing by the mean . In this normalized model, each entry of the row ends up having the same mean . The first few values of can be computed explicitly

However, the asymptotic behavior of remains unclear to us. We were able to show an inequality for any , indicating that cannot decay faster than , but we do not know whether this is the true decay rate. In any case a decay rate of (which is very weakly supported by numerical evidence) is consistent with the Gilbreath conjecture, as it would indicate that the Gilbreath array from the first prime gaps should end up being almost entirely -valued by merely steps, well before the steps needed to reach the bottom of the array.
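The first few row means can also be estimated by Monte Carlo, which is a useful sanity check on any explicit formula. The Python sketch below assumes the initial entries are i.i.d. exponential with mean 1 (my reading of the normalization) and averages the entries of each row over many independent trials. For the first row of differences the exact mean is again 1, since the difference of two independent exponentials of mean 1 has a Laplace distribution, so |X − Y| is itself exponential with mean 1; deeper rows are only estimated here.

```python
import random


def estimate_row_means(n_rows, trials, seed=0):
    """Monte Carlo estimate of the mean entry of rows 0, ..., n_rows - 1 of a
    Gilbreath array whose initial row is i.i.d. exponential with mean 1."""
    rng = random.Random(seed)
    totals = [0.0] * n_rows
    for _ in range(trials):
        row = [rng.expovariate(1.0) for _ in range(n_rows)]
        for k in range(n_rows):
            # All entries of row k share the same mean, so averaging them is harmless.
            totals[k] += sum(row) / len(row)
            row = [abs(a - b) for a, b in zip(row, row[1:])]
    return [t / trials for t in totals]


means = estimate_row_means(n_rows=8, trials=20000)
for k, m in enumerate(means):
    print(k, round(m, 3))
```

Such estimates cannot distinguish between candidate decay rates at large depth; they are only meant to corroborate the first few values.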

Now we turn to deterministic analysis of Gilbreath arrays. Suppose we found some initial data that did not grow too quickly (e.g., one had a Cramér-type bound ), but still iterated to a final value that was not . What features of the initial data could generate such a failure of a Gilbreath-type conjecture? One way in which the conjecture could fail is if the Gilbreath iteration somehow produced a reasonably long consecutive string of zeroes (say, longer than ), as then the next few iterations would not act to decrease the magnitude of the non-zero entries bordering this string of zeroes. Such a scenario would be heuristically rare, as the parity of each element of the array can be worked out explicitly using the parity identity , and so constant-parity sequences of length say should be almost surely non-existent asymptotically by standard probabilistic heuristics.
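One identity that makes this parity computation explicit (assumed here to be the intended one) is |a − b| ≡ a + b (mod 2). Iterating it shows that the entry in row k and position j is congruent mod 2 to the Pascal’s-triangle sum Σᵢ C(k, i)·a_{j+i} over the initial row a. The short Python check below compares this formula with a directly computed array on random initial data; the function names are my own.

```python
import random
from math import comb


def gilbreath_rows(initial):
    """Rows of the Gilbreath array generated by `initial`."""
    row = list(initial)
    while row:
        yield row
        row = [abs(a - b) for a, b in zip(row, row[1:])]


def parity_by_binomials(initial, k, j):
    """Parity of entry (k, j) predicted by |a - b| = a + b (mod 2), iterated k times."""
    return sum(comb(k, i) * initial[j + i] for i in range(k + 1)) % 2


rng = random.Random(0)
initial = [rng.randint(0, 50) for _ in range(40)]
for k, row in enumerate(gilbreath_rows(initial)):
    for j, value in enumerate(row):
        assert value % 2 == parity_by_binomials(initial, k, j)
print("binomial parity formula matches all", len(initial) * (len(initial) + 1) // 2, "entries")
```

The induction step is just Pascal’s rule C(k, i) + C(k, i − 1) = C(k + 1, i).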

Another bad scenario is if the Gilbreath iteration, after some medium number of iterations, produced an extremely long consecutive block (say of length ) which was entirely -valued for some . This block would then persist as a -block for a large number of iterations (equal to the length of the block), thus potentially delaying for a significant time the drop-down of the maximal value to below . For odd , one can use the parity analysis alluded to earlier to argue that the formation of such a block is extremely unlikely; but for even , we can only use such heuristics if we make strong assumptions of joint independence, as we did in the probabilistic analysis in our paper.

In any event, we were able to use purely elementary methods to establish an “inverse theorem” that states, roughly speaking, that the above two scenarios are the only ways in which a Gilbreath array can fail to have a -valued first entry. This basically arises from a more careful analysis of the towers of triangles alluded to earlier. (A previous argument involved considering ways to pack a large triangle by smaller triangles, leading to a MathOverflow question which was nicely answered by Fedja Nazarov and Anders Martinsson, but we later managed to optimize the argument to the point where the answer to this packing question was no longer needed.) So this in principle reduces the (deterministic) Gilbreath conjecture to several more tractable-looking (though complicated to state) assertions, though proving those latter statements seems well out of reach at the moment.


A digestion of unit distance constructions

Fri, 2026-07-03 18:18

Suppose that one has a set of points in the plane, which we will think of as the complex plane . Let denote the number of unit distances determined by these points, i.e., pairs of points whose displacement obeys the equation

(It makes little difference for the asymptotics, but we will count the pair separately from here.)

The Erdös unit distance problem asks, for a given large number , what is the largest possible value of amongst all sets of cardinality ?

For instance, if one takes to be equally spaced collinear points with unit spacing, one can obtain a linear construction with . Erdös observed that one can improve this construction asymptotically:

Theorem 1 (Erdös construction) There exist point sets of arbitrarily large cardinality such that for some absolute constant .

In fact, in the construction one could take arbitrarily close to . Erdös famously asked whether had to be bounded above by ; and for decades there was significant effort expended on upper bounding , with the best known upper bound being , established by Spencer, Szemerédi, and Trotter in 1984. We note here that it seems extremely difficult to improve this upper bound. One reason for this is that if one replaces the equation (1) with the superficially similar equation

(i.e., replace the unit circle by a standard parabola), then the bound is best possible, as can be seen by taking to be a rectangle in the Gaussian integers of width and height . Hence any improvement of the bound would have to exploit some special property of the unit circle that is not shared by the parabola.

It came as some surprise recently when a team from OpenAI resolved the question of Erdös:

Theorem 2 (OpenAI construction) There exist point sets of arbitrarily large cardinality such that for some absolute constant .

The optimal value of is still unknown, but the best upper and lower bounds on are tracked at this page; currently we know that .

The construction in Theorem 2 is a heavily modified version of that in Theorem 1, and uses some non-trivial amount of algebraic number theory, in particular the device of Golod–Shafarevich towers of field extensions. However, it was later observed using the Mythos AI that one could get a weaker bound with less algebraic number theory, which after optimizing parameters yields the following intermediate result between Theorem 1 and Theorem 2:

Theorem 3 (Mythos construction) There exist point sets of arbitrarily large cardinality such that for some absolute constant .

Furthermore, by inserting Golod–Shafarevich towers back into the Mythos construction, one can recover the full strength of Theorem 2.

These results already have a number of expositions; see for instance this article of Alon et al., or this blog post of Bloom. As an exercise for myself, I recently spent some time trying to “digest” these constructions and place them on a common footing, with an emphasis on trying to find the minimal route to either heuristically or rigorously recovering these results relying on as little algebraic number theory as possible. The post here is a writeup of this exercise. (Disclosure: AI tools were useful for providing initial summaries of these arguments, as well as on explaining various fundamentals of algebraic number theory to me.)

The first (trivial) observation is that one can use rescaling to replace the unit distance by any other fixed distance. In particular, for any positive real , if we let denote the number of pairs whose displacement obeys the equation

then it is clear that any construction of a point set with a given value of can be rescaled to another point set of the same cardinality with the corresponding value of . It turns out to be convenient to work with values of that are asymptotically large, for instance the product of several large primes.

All the constructions of good point sets basically involve taking all the elements of a certain ring of algebraic integers up to some height. In the original construction of Erdös, was chosen to be the ring of Gaussian integers , but in fact any ring of integers in a non-trivial bounded degree field extension of would suffice to recover Theorem 1 (though always with the constant not exceeding ). To go beyond this, one has to start considering number fields of unbounded degree. As it turns out, the field extensions arising from Golod–Shafarevich towers are the most efficient for this purpose, and lead to Theorem 2; but one can work with the more elementary construction of number fields generated by many square roots of medium-sized primes, and this suffices for the intermediate result in Theorem 3.

The numerology can be explained as follows. Take to be a ring of integers in some number field of degree , and suppose for sake of argument that is the product of (rational) primes , which for simplicity we will assume to all have comparable magnitude, thus for some and all . Thus, is roughly of the size of . In practice one wants to impose some additional “splitting” conditions on these primes , but the prime number theorem, as well as variants such as the Chebotarev density theorem, suggest that we should be able to keep reasonably close to in size; for instance, if we select primes greedily then we can have . In particular we expect to have in practice.

By construction, splits into the product of rational primes. Moving up to the degree extension, one can optimistically hope that splits further into the product of primes in . Using conjugation symmetry, these primes might split into conjugate pairs . By selecting one element from each pair and multiplying, this generates solutions to (3) in . These solutions will of course have complex magnitude ; one can optimistically hope that they in fact have “height” in some sense.
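In the simplest case, where the ring is the Gaussian integers ℤ[i] (degree 2), this splitting can be carried out explicitly, and the following Python sketch does so. It assumes each prime p is 1 mod 4, writes p = a² + b² by brute force so that p = (a + bi)(a − bi), and multiplies one factor from each conjugate pair; the 2^k products then all have norm r = p₁⋯p_k. The choice of primes 5, 13, 17 and the function names are illustrative assumptions.

```python
import itertools
import math


def two_squares(p):
    """Return (a, b) with a*a + b*b == p, by brute force (p assumed prime, p % 4 == 1)."""
    for a in range(math.isqrt(p) + 1):
        b = math.isqrt(p - a * a)
        if a * a + b * b == p:
            return a, b
    raise ValueError(f"{p} is not a sum of two squares")


def gaussian_mul(z, w):
    """Multiply Gaussian integers stored as pairs (real, imaginary)."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def norm_r_elements(primes):
    """Pick pi_j or its conjugate for each prime and multiply: 2**k elements of norm r."""
    pairs = []
    for p in primes:
        a, b = two_squares(p)
        pairs.append(((a, b), (a, -b)))  # p = (a + bi)(a - bi)
    products = set()
    for choice in itertools.product(*pairs):
        z = (1, 0)
        for factor in choice:
            z = gaussian_mul(z, factor)
        products.add(z)
    return products


primes = [5, 13, 17]
r = math.prod(primes)
sols = norm_r_elements(primes)
print(r, len(sols), sorted(sols))
```

Multiplying these by the four units ±1, ±i recovers every representation of r as a sum of two squares.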

To take advantage of this, take to be the set of points in of height . As has rank , we therefore expect the size of this set to be roughly

(For this heuristic discussion I will be deliberately vague about what the symbol means.) Meanwhile, using our solutions to (3), we expect to have

But this can be combined with the heuristic (4). Taking logarithms, we expect to have

In the regime where the degree of the number field is held fixed, we thus expect to exhibit logarithmic type growth in , and on inserting this back into (5) we (heuristically) recover Theorem 1 (with the natural constant ). In fact it is not hard to turn the above heuristics into a rigorous argument, by taking the ring to be the Gaussian integers ℤ[i] and selecting all the primes to be 1 mod 4, so that they split completely in ℤ[i] by the Fermat two-square theorem.
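For the Gaussian-integer version of the Erdös construction, the number of pairs can be computed exactly on a finite grid. The Python sketch below takes, as an illustrative choice, A = {x + iy : |x|, |y| ≤ H}, and counts ordered pairs (z, w) in A with |z − w|² = r by summing, over each representation r = dx² + dy², the number of translates that stay inside the grid; it cross-checks this count against brute force on small grids. The helper names are hypothetical.

```python
import math


def representations(r):
    """All (dx, dy) with dx*dx + dy*dy == r."""
    s = math.isqrt(r)
    return [(dx, dy) for dx in range(-s, s + 1) for dy in range(-s, s + 1)
            if dx * dx + dy * dy == r]


def count_pairs(H, r):
    """Ordered pairs (z, w) in the grid |x|, |y| <= H with |z - w|^2 == r."""
    side = 2 * H + 1
    return sum(max(0, side - abs(dx)) * max(0, side - abs(dy))
               for dx, dy in representations(r))


def count_pairs_brute_force(H, r):
    """Same count, by checking every ordered pair of grid points."""
    grid = [(x, y) for x in range(-H, H + 1) for y in range(-H, H + 1)]
    return sum(1 for (a, b) in grid for (c, d) in grid if (a - c) ** 2 + (b - d) ** 2 == r)


H = 100
for r in (5, 5 * 13, 5 * 13 * 17, 5 * 13 * 17 * 29):
    n_points = (2 * H + 1) ** 2
    print(r, len(representations(r)), count_pairs(H, r) / n_points)
```

Each new prime doubles the number of displacements, while the displacements themselves grow like √r; this is the tension that the choice of parameters has to balance.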

But if one can permit the degree to grow in the construction, and in particular be superpolynomial in , then the above heuristics suggest that we can start improving upon Theorem 1, and even get all the way to Theorem 2 if we can make the degree go to infinity while keeping the number of primes fixed.

If one naively tries this approach by forcing all the primes to split completely in a very high degree number field, one runs into significant technical difficulties, not least of which is the need to obtain good error terms in the Chebotarev density theorem, which touches upon such difficult questions as the Generalized Riemann Hypothesis and the existence of Siegel zeroes. From an algebraic number theory perspective, this is related to the breakdown of unique factorization in such number fields, as measured by the class group. The size of this group is in turn controlled by the discriminant of the field, as per the fundamental theorem of Minkowski in this subject.

But one can hope that the arguments are robust enough to tolerate a little bit of breakdown in unique factorization, so long as the class group is not too large. The most natural way to do this is to use all the standard machinery of algebraic number theory, such as the unique factorization of ideals. But there turns out to be a more elementary (though largely equivalent) approach, which is to weaken the target equation (3) to a congruence equation

This condition does not pin down the value of completely, but so long as we can keep the height of not too much larger than , it does restrict to a sufficiently small set of possible values that a simple application of the pigeonhole principle can allow one to conclude.

In order for this strategy to work well, one needs to locate high-degree number fields of controlled discriminant for which it is relatively easy to at least partially split one’s rational primes into ideals in this field . It turns out that requiring the field to have a tower structure and admit complex multiplication (which basically amounts to it including ) is already sufficient to get a satisfactory amount of splitting. To control discriminants, the most efficient choices are the Golod–Shafarevich towers, for which the (root) discriminant stays bounded; but a more naive choice of a tower of quadratic extensions also gives reasonable control on discriminants and is sufficient to establish Theorem 3.

The three constructions thus sit on a continuum, with the key differences being the selection of the key parameters (the number of primes multiplied together) and (the degree). The Erdös construction keeps the degree fixed and sends the number of primes to infinity. The OpenAI construction does the opposite, keeping the set of primes fixed but sending the degree to infinity. The Mythos construction is a compromise, in which the degree and the number of primes both go to infinity in a coupled fashion. In particular, one could easily imagine an alternate timeline of events in which the Mythos construction was the first to be discovered (by either humans or AI) after the Erdös construction as a reasonably natural modification of the latter, and then subsequently refined (again either by humans or AI) to the OpenAI construction once the significance of Golod–Shafarevich towers was realized.

In this recent paper of Pohoata, the terms “horizontal amplification” and “vertical amplification” were proposed for the technique of constructing large configurations by increasing and , thus the Erdös construction becomes a paradigm for horizontal amplification while the OpenAI construction becomes a paradigm for vertical amplification (and the Mythos construction utilizes both types of amplification). See also this paper of Bloom-Sawin-Schildkraut-Zhelezov for another recent application of vertical amplification.

— 1. Some more details —

Here we sketch how the quadratic extension approach can recover Theorem 3.

As indicated above, we will work with a product of (rational) primes of size . Our only requirements of these primes, beyond their size, will be that they are distinct and equal to 1 mod 4, so that they split in the Gaussian integers. By the prime number theorem in arithmetic progressions, this allows us to take as small as , so in particular .

To construct , we start by taking distinct medium-sized (rational) primes of size for some medium-sized parameter (eventually, when we optimize parameters, we will take to be a small multiple of ). We will not need any further properties of these primes, so by the prime number theorem we can take as small as , so in particular . We will take to be larger than , so that the primes are distinct from the primes .

We will work in the number field generated by i and real square roots . For instance, if and , a typical element in this field would take the form

In this particular example, the ring of integers would consist of those elements (8) for which are rational integers. In general, the ring of integers can be slightly larger than this, but for our purposes we can just work with the “naive” ring of integers generated by and . A typical element of this ring then looks like

where are rational integers and we adopt the convention that . Let us define the (naive) height of such a ring element to be . Then the number of elements of of height at most is as long as is sufficiently large (again I will be vague about what means here).

Our main goal is to find a large number of solutions in to the congruence equation (7). As per the usual Minkowski embedding based on the various ways to embed into or , it is convenient to think of as a lattice in a -dimensional vector space, which (due to the presence of , which excludes purely real embeddings) is naturally thought of as the product of copies of . For instance, in the running example, one can identify a ring element with an element

of , and with this embedding becomes a lattice in . In general, the embedding of into is essentially the Walsh–Fourier transform, weighted by the various square roots of . (The appearance of the Walsh-Fourier transform reflects the fact that the specific number field we are working with is an abelian Galois extension with Galois group .) Because of this, one can readily compute the covolume of the lattice, which up to lower order terms is basically , which with our construction can be crudely bounded by . As long as we keep small compared to , this covolume will be small compared to and will end up being a lower order term.
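To see concretely why the embedding is a weighted Walsh–Fourier transform, here is a small numerical check in Python. An element of the naive ring is stored as a list of Gaussian-integer coefficients c_S indexed by bitmasks S ⊆ {1, ..., m}, standing for Σ_S c_S √q_S with √q_S = ∏_{j∈S} √q_j. Each sign pattern ε ∈ {±1}^m gives an embedding √q_j ↦ ε_j √q_j that fixes i (the remaining complex embeddings are the complex conjugates of these), and the check confirms that each such map is multiplicative. The choice q = (3, 7, 11) and the helper names are illustrative assumptions.

```python
import math
import random


def multiply(x, y, q):
    """Product in the naive ring: sqrt(q_S) * sqrt(q_T) = (prod of q_j, j in S & T) * sqrt(q_{S ^ T})."""
    out = [0j] * len(x)
    for S, xs in enumerate(x):
        for T, yt in enumerate(y):
            common = math.prod(qj for j, qj in enumerate(q) if (S & T) >> j & 1)
            out[S ^ T] += xs * yt * common
    return out


def embed(x, q):
    """Evaluate x under each embedding sqrt(q_j) -> (-1)**e_j * sqrt(q_j), with i fixed.
    For the sign pattern E (a bitmask), this is a Walsh sum weighted by sqrt(q_S)."""
    n = len(x)
    weight = [math.prod(math.sqrt(qj) for j, qj in enumerate(q) if S >> j & 1) for S in range(n)]
    return [
        sum(x[S] * weight[S] * (-1) ** bin(S & E).count("1") for S in range(n))
        for E in range(n)
    ]


q = (3, 7, 11)
rng = random.Random(0)


def random_element():
    """A random element with small Gaussian-integer coefficients."""
    return [complex(rng.randint(-3, 3), rng.randint(-3, 3)) for _ in range(2 ** len(q))]


x, y = random_element(), random_element()
lhs = embed(multiply(x, y, q), q)
rhs = [a * b for a, b in zip(embed(x, q), embed(y, q))]
print("largest discrepancy:", max(abs(a - b) for a, b in zip(lhs, rhs)))
```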

The reason we care about the covolume (which is essentially the square root of the discriminant) is Minkowski’s theorem, which we will use in the following crude form: any lattice in a -dimensional space of covolume will contain a non-zero lattice vector of length . Thus for instance will contain some vector of length , and any sublattice of of index will contain a vector of length , and thus also of height by inverting the Walsh–Fourier transform.
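In the smallest case d = 2 the crude form of Minkowski’s theorem can be checked exhaustively. The sketch below is an illustration of my own, not part of the argument: it uses the index-N sublattices {(x, y) ∈ ℤ² : x ≡ a·y (mod N)}, which have covolume N, and confirms that each contains a non-zero vector with |x|, |y| ≤ √N. This is exactly what Minkowski’s theorem guarantees for the square [−√N, √N]², whose area 4N equals 2²·N.

```python
import math


def short_vector(N, a):
    """A non-zero (x, y) with x = a*y (mod N) and max(|x|, |y|) <= sqrt(N), or None.
    Only y >= 0 is searched, since (x, y) and (-x, -y) lie in the lattice together."""
    s = math.isqrt(N)
    for y in range(s + 1):
        x0 = (a * y) % N
        for x in (x0, x0 - N):  # the only representatives of a*y mod N with |x| < N
            if abs(x) <= s and (x, y) != (0, 0):
                return x, y
    return None


worst = 0.0
for N in range(2, 200):
    for a in range(N):
        x, y = short_vector(N, a)
        worst = max(worst, max(abs(x), abs(y)) / math.sqrt(N))
print("largest ratio max(|x|, |y|) / sqrt(N) found:", worst)
```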

Anyway, suppose that we can find some sublattice (in fact, they will be ideals, but we will not need this) of for which we can obtain the inclusion

that is to say one has for all . Then clearly any element of will obey the congruence (7). An obvious choice of would be , but this is way too sparse: this lattice has index in , so the shortest vector one can hope to locate in it will have height , which is too large for our purposes. Instead, we would like to have index (which is the smallest it can be while still yielding the inclusion (9)); then will contain a non-zero element of height , which means that is equal to times an element of of height . The number of such elements is , which will end up being a lower order term that we can easily pigeonhole away.

We claim that we can find at least different sublattices of index that obey (9). To verify this claim, it is a straightforward matter to use the Chinese remainder theorem to work “prime by prime”. Indeed, it suffices to show, for each of the primes dividing , that there are different sublattices of of index that obey the inclusion
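The prime-by-prime construction has a transparent degree-2 analogue in the Gaussian integers, sketched below in Python under that simplifying assumption (it is not the higher-degree construction itself). For each p_j ≡ 1 (mod 4) one picks one of the two square roots t_j of −1 mod p_j (equivalently, one of the two conjugate primes above p_j), glues them by the Chinese remainder theorem into T with T² ≡ −1 (mod r), and takes the index-r lattice {x + iy : x + Ty ≡ 0 (mod r)}. Every element then satisfies x² + y² ≡ 0 (mod r), and Minkowski’s theorem supplies a non-zero element with |x|, |y| ≤ √r, hence with x² + y² ∈ {r, 2r}, a small set of possible values as in the pigeonhole step. The helper names are hypothetical.

```python
import itertools
import math


def sqrt_minus_one(p):
    """Some t with t*t = -1 (mod p), by brute force (assumes p prime, p % 4 == 1)."""
    return next(t for t in range(1, p) if (t * t + 1) % p == 0)


def crt(residues, moduli):
    """Combine x = residues[j] (mod moduli[j]) for pairwise coprime moduli."""
    x, m = 0, 1
    for res, n in zip(residues, moduli):
        k = ((res - x) * pow(m, -1, n)) % n
        x, m = x + m * k, m * n
    return x % m


def short_lattice_vectors(primes):
    """For each of the 2**k choices of square roots of -1, return (T, x, y) with
    (x, y) a short non-zero vector of the lattice x + T*y = 0 (mod r)."""
    r = math.prod(primes)
    s = math.isqrt(r)
    found = []
    for signs in itertools.product((1, -1), repeat=len(primes)):
        T = crt([(sg * sqrt_minus_one(p)) % p for sg, p in zip(signs, primes)], primes)
        for y in range(s + 1):
            x0 = (-T * y) % r
            candidates = [x for x in (x0, x0 - r) if abs(x) <= s and (x, y) != (0, 0)]
            if candidates:
                found.append((T, candidates[0], y))
                break
    return r, found


r, found = short_lattice_vectors([5, 13, 17])
for T, x, y in found:
    print(T, (x, y), (x * x + y * y) // r)
```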

We can descend now to the finite ring , which is a -dimensional vector space over the finite field . The complex conjugation operation descends to an involution on this vector space, and we are looking for subspaces of dimension with the property that

So now we just need to understand the structure of . The general Wedderburn–Artin theorem tells us that this ring is the product of finite fields, but we can be much more explicit here in this specific situation. Exactly as the Minkowski embedding maps rings of integers in number fields into a product of copies of ℝ and ℂ associated to the real and complex embeddings of the ring, we can also embed into a product of copies of and , depending on how we assign square roots to or in or the quadratic extension . We can illustrate this with the running example. As mod , the square roots of in stay in . Suppose first that also splits into square roots in . Then we can embed into by mapping

By counting elements we see that this embedding is in fact an isomorphism. If instead does not split, so that now lie in , then we can embed into by mapping

thus we drop half of the previous embeddings as being conjugate to the half that we retain. Again, counting shows that this is an isomorphism.

In general, one can show that is isomorphic to either (if all the split in ) or (if at least one of the does not split). Furthermore, in the former case the copies of organize into conjugate pairs (corresponding to flipping to ), and in the latter case the copies of organize into conjugate pairs. By selecting one element from each conjugate pair, and taking to be the joint kernel of such elements, we can generate either or different subspaces of dimension with the desired property that . By the aforementioned Chinese remainder argument, this gives the claimed lattices of index .
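Which of the two cases occurs can be read off from quadratic residues. Since p ≡ 1 (mod 4), −1 already has a square root mod p, so (assuming each q_j is a prime different from p) everything hinges on whether each q_j is a square mod p, which Euler’s criterion q^((p−1)/2) ≡ ±1 (mod p) decides. The Python sketch below (hypothetical helper names) returns the resulting field, the number of copies, and the number of conjugate pairs among them.

```python
def is_square_mod(q, p):
    """Euler's criterion for an odd prime p not dividing q."""
    return pow(q, (p - 1) // 2, p) == 1


def residue_ring_structure(p, qs):
    """Structure of Z[i, sqrt(q_1), ..., sqrt(q_m)] / p, assuming p is a prime with
    p % 4 == 1 and the q_j are primes different from p."""
    if p % 4 != 1:
        raise ValueError("expected p = 1 (mod 4)")
    m = len(qs)
    if all(is_square_mod(q, p) for q in qs):
        field, copies = f"F_{p}", 2 ** (m + 1)  # everything splits: copies of F_p
    else:
        field, copies = f"F_{p * p}", 2 ** m  # some sqrt(q_j) lives in F_{p^2}
    return field, copies, copies // 2  # the copies come in conjugate pairs


print(residue_ring_structure(5, [11]))
print(residue_ring_structure(5, [11, 7]))
print(residue_ring_structure(13, [3, 17]))
```

In both cases the total dimension over 𝔽_p is 2^(m+1), matching the rank of the naive ring.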

Invoking Minkowski’s theorem, this now generates non-zero vectors of height obeying (7). There is a technical issue that some of these vectors could conceivably collide with each other; however, there is a further Chinese remainder theorem argument (which I will omit here) that shows that any such can belong to at most such lattices. So the number of distinct generated by this argument is at least , and by pigeonholing one can now also get at least solutions to the equation

of height for some of height . As long as we select

then the type factors can be neglected, and we approximately have

up to lower order terms. If we choose to be a small multiple of , then we soon calculate that , and we recover Theorem 3.


Third SAIR competition: inverse Galois challenge

Tue, 2026-06-16 18:11

I am happy to announce the third SAIR challenge, which is focused on obtaining numerical data for the infamous inverse Galois problem. This is a collaborative project with the L-functions and modular forms database (LMFDB), and is organized by John Jones, Jen Paulhus, David Roe, Andrew Sutherland, and myself. The challenge is somewhat similar to my own Equational Theories Project, in that one is trying to complete a large mathematical data set in a verified fashion, except that the target data set is of pre-existing mathematical interest. Also, the verification will be done by MAGMA (as well as PARI/GP) rather than Lean.

Let me first quickly review the inverse Galois problem. Suppose one has an irreducible polynomial of one variable of some degree and integer coefficients; take for instance . Then will have distinct roots ; in this case the roots happen to be

The roots generate a splitting field over the rational numbers ℚ. Any automorphism of this splitting field must permute the roots, so the automorphisms form a subgroup of the permutation group (defined up to relabeling of the roots), which we call the Galois group of . This is some subgroup of that acts transitively on the roots (because the polynomial is irreducible). Typically, it is all of ; but occasionally it is smaller. For example, the particular cubic polynomial above has the special property that each root individually generates the entire field , thanks to the identities

Because of this, the Galois group of is the cyclic group (or equivalently, the alternating group ), rather than the full symmetric group . (This is in contrast to, say, , whose roots , , cannot be expressed as rational polynomials of each other, and whose Galois group is all of .) In fact, in the cubic case, it turns out that the Galois group is when the discriminant is a perfect square, and otherwise.
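The discriminant criterion for cubics is easy to mechanize. The sketch below is an illustration only: it uses the classical examples x³ − 3x + 1, x³ − 7x + 7 (both cyclic cubics) and x³ − 2, which are my own choices. For a monic integer cubic, irreducibility over ℚ amounts to having no integer root (by the rational root theorem), and the discriminant of x³ + bx² + cx + d is b²c² − 4c³ − 4b³d − 27d² + 18bcd.

```python
import math


def cubic_discriminant(b, c, d):
    """Discriminant of the monic cubic x^3 + b*x^2 + c*x + d."""
    return b * b * c * c - 4 * c ** 3 - 4 * b ** 3 * d - 27 * d * d + 18 * b * c * d


def has_integer_root(b, c, d):
    """Rational root theorem: a rational root of a monic integer cubic is an integer dividing d."""
    if d == 0:
        return True
    divisors = [t for t in range(1, abs(d) + 1) if d % t == 0]
    return any(t ** 3 + b * t * t + c * t + d == 0 for s in divisors for t in (s, -s))


def cubic_galois_group(b, c, d):
    """'A3' or 'S3' for an irreducible monic integer cubic x^3 + b*x^2 + c*x + d."""
    if has_integer_root(b, c, d):
        raise ValueError("reducible over Q")
    disc = cubic_discriminant(b, c, d)
    return "A3" if disc > 0 and math.isqrt(disc) ** 2 == disc else "S3"


for coeffs in [(0, -3, 1), (0, -7, 7), (0, 0, -2), (0, -1, -1)]:
    print(coeffs, cubic_discriminant(*coeffs), cubic_galois_group(*coeffs))
```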

More generally, we have

Problem 1 (Inverse Galois Problem) Let be a transitive permutation group on letters. Can be realized as the Galois group of some degree irreducible polynomial with integer coefficients (after identifying the roots of suitably with the letters)?

The answer to this problem is known to be positive for , with the single possible exception of the sporadic Mathieu group : there are transitive permutation groups on letters (cf. OEIS A002106), and for of them, a polynomial has been located with that Galois group; see this database of Klüners and Malle. Locating a polynomial with Galois group is a notorious open problem; it is likely to be quite difficult, and it is not the objective of the SAIR challenge.

Instead, we will focus on “breadth” rather than “depth”, in order to leverage the power of crowdsourcing and modern AI technologies. It turns out that there are distinct transitive permutation groups on letters, which are conventionally labeled from (the cyclic group ) to (the permutation group ). The first stage of the challenge will be:

Problem 2 (First stage of SAIR challenge) For as many of the groups , , as possible, locate an integer polynomial with that Galois group (up to isomorphism). (Also of interest is to specify the number of real roots, and to keep the discriminant low; more on this later.)

The verification side of this problem is essentially solved: the MAGMA computer algebra system can take any candidate polynomial and locate its Galois group within seconds. The MAGMA team has kindly granted SAIR a limited license to provide an API for contestants to calculate a certain number of Galois groups per day without needing to purchase their own license, though of course they are free to use their other computational tools to also perform these calculations outside of the competition.

The LMFDB already has polynomials for 286 of the 25000 groups, so there are plenty of remaining polynomials to claim in the challenge.

For applications, it is of interest to track some other statistics of a polynomial besides its Galois group. One of these is the number of real roots, which is a number between $0$ and $n$ of the same parity as $n$ (and which has to be achievable as the number of fixed points of one of the permutations in the Galois group, namely the one corresponding to complex conjugation); in particular, this number must be even in the degree $24$ case. Combining the label of the Galois group with the number of real roots turns out to generate pairs in degree $24$, and the challenge is actually to attach polynomials to as many of these pairs as possible. (The LMFDB has already done so for just of these.)
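The parity constraint on the number of real roots is easy to observe experimentally. The sketch below counts the distinct real roots of an integer polynomial using Sturm's theorem in exact rational arithmetic; this is a standard technique chosen here for illustration, not the tool used by the challenge organizers. Because it counts distinct roots, it assumes the polynomial is squarefree (as every irreducible polynomial over $\mathbb{Q}$ is). Coefficient lists run from the highest degree down, and all function names are our own.

```python
from fractions import Fraction

def derivative(p):
    """Derivative of p; coefficient lists run from the highest degree down."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_rem(num, den):
    """Remainder of num divided by den, computed exactly over the rationals."""
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient has just been cancelled
    while num and num[0] == 0:
        num.pop(0)  # strip any further leading zeros
    return num

def sturm_sequence(p):
    """p, p', then negated remainders until the remainder vanishes (p of degree >= 1)."""
    seq = [[Fraction(c) for c in p], [Fraction(c) for c in derivative(p)]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])

def sign_changes(signs):
    return sum(1 for x, y in zip(signs, signs[1:]) if x != y)

def count_real_roots(p):
    """Number of distinct real roots of p, by Sturm's theorem on (-inf, +inf)."""
    seq = sturm_sequence(p)
    # Each polynomial in seq has a non-zero leading coefficient, which fixes its sign
    # near +inf; near -inf that sign is multiplied by (-1)^degree.
    at_plus = [1 if q[0] > 0 else -1 for q in seq]
    at_minus = [s * (-1) ** (len(q) - 1) for s, q in zip(at_plus, seq)]
    return sign_changes(at_minus) - sign_changes(at_plus)

print(count_real_roots([1, 0, -3, 1]))          # x^3 - 3x + 1: prints 3
print(count_real_roots([1] + [0] * 23 + [-2]))  # x^24 - 2: prints 2
```

For instance, $x^{24}-2$ has exactly the two real roots $\pm 2^{1/24}$, an even count as required in degree 24.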

Of course, there are infinitely many polynomials of degree $24$, and any Galois group that is representable by one polynomial will be representable by infinitely many others (e.g., one could simply translate the polynomial by an arbitrary integer shift). To avoid creating an unusable database filled with uninteresting polynomials, we will prioritize polynomials whose (absolute) discriminant is as small as possible. (There are some technical details as to how this discriminant is defined and computed; see this page for details.) The way we have set things up, each pair will come with a leaderboard for the polynomials with the smallest discriminants that have been located so far by contestants, removing duplicates arising from trivial operations such as translating the polynomial. Contestant teams will be awarded a score between and for each submitted polynomial based on how small their discriminant is compared to the best known discriminant, and how many other teams were also able to find a polynomial with that pair. Thus, pairs that are extremely easy to generate (such as those associated to the full symmetric group $S_{24}$) will be worth only a negligible score (as every contestant will be able to submit a polynomial for that pair), while pairs for which it is difficult to locate a polynomial will be worth more points.
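The post defers the precise discriminant convention to a separate page, so the following is only a sketch using the textbook definition $\operatorname{disc}(P) = (-1)^{n(n-1)/2}\operatorname{Res}(P,P')/a_n$, with the resultant computed as an exact Sylvester determinant. It also shows why translation produces trivial duplicates: the discriminant is unchanged by $x \mapsto x+t$. The `normalize_translation` rule (shift a monic polynomial so that its $x^{n-1}$ coefficient lies in $[0,n)$) is a hypothetical deduplication convention of our own, not necessarily the one used on the leaderboards.

```python
from fractions import Fraction
from math import comb

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result

def discriminant(p):
    """(-1)^(n(n-1)/2) * Res(p, p') / a_n for p of degree n >= 1 (highest degree first)."""
    n = len(p) - 1
    dp = [c * (n - i) for i, c in enumerate(p[:-1])]
    size = 2 * n - 1  # Sylvester matrix: n - 1 shifted rows of p, n shifted rows of p'
    rows = [[0] * i + list(p) + [0] * (size - n - 1 - i) for i in range(n - 1)]
    rows += [[0] * i + dp + [0] * (size - n - i) for i in range(n)]
    d = (-1) ** (n * (n - 1) // 2) * det(rows) / p[0]
    assert d.denominator == 1  # exact for integer coefficients
    return int(d)

def taylor_shift(p, t):
    """Coefficients of P(x + t), where p lists the coefficients of P."""
    n = len(p) - 1
    out = [0] * (n + 1)
    for i, c in enumerate(p):
        k = n - i  # expand the term c * (x + t)^k
        for j in range(k + 1):
            out[n - j] += c * comb(k, j) * t ** (k - j)
    return out

def normalize_translation(p):
    """Hypothetical dedup rule for monic p: shift so the x^(n-1) coefficient lies in [0, n)."""
    n = len(p) - 1
    assert p[0] == 1 and n >= 1
    return taylor_shift(p, -(p[1] // n))

p = [1, 0, -3, 1]       # x^3 - 3x + 1
q = taylor_shift(p, 5)  # (x + 5)^3 - 3(x + 5) + 1
print(discriminant(p), discriminant(q), normalize_translation(q) == p)  # prints 81 81 True
```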

For this competition, the unrestricted use of any sort of computational tool, including AI, to locate the polynomials is expressly permitted; this first stage of the competition is a “black box” challenge in which we are not directly interested in obtaining insights as to how the polynomials are located, and the sole objective is to resolve as much of the inverse Galois challenge as possible. As such, the notorious uninterpretability of modern AI is not a concern for this stage. However, we encourage contestants to share techniques with each other through the Zulip channel for this challenge, in order to cover more ground.

This first stage of the competition will close on August 15. After this, we will launch a second stage (with details to be determined) to focus on some set of candidate Galois groups that could not be resolved by the first stage. Here we envisage a more collaborative, conceptual, and human-driven effort in which the role of AI tools may be more secondary, and with more of a focus on creating mathematically interesting results rather than simply trying to saturate a given benchmark. Stay tuned for more details!

Categories: Mathematical blogs

On the proposed rule changes to the administration of federal grants

Wed, 2026-06-10 01:45

The United States Office of Management and Budget (OMB) has proposed a vast and radical set of rule changes to how federal grants from all funding agencies are administered. (A summary of the key changes, by a former Senior Program Officer at the National Institutes of Health, can be found here.) This is no mere tinkering at the edges of existing policy; many basic principles, such as the central role of peer review in grant-making decisions, are seriously compromised by the proposed rules, while the administrative burden of complying with grant rules is significantly increased, hamstringing the ability of funded scientists to react to new developments and forge new collaborations.

There is much to discuss in these proposals; see for instance this post by Karen Saxe (vice president for Government Relations at the American Mathematical Society), this news item on the response from the astronomy community, this op-ed from Ars Technica, this article from the New York Times, this article from Science, or this story from CNN. I will focus here on just one of the impacts, regarding the need to maintain agility and flexibility in a competitive and rapidly changing environment.

Some types of research, particularly those closest to industrial or other real-world applications, can be planned in a predictable fashion, in which the timelines for hitting key milestones are clear, and schedules for events can be planned years in advance. However, basic research (of which pure mathematics is a quintessential example) expects, almost by definition, to discover previously unknown directions and connections that cannot be predicted perfectly at the time a research project is proposed. Many of the most striking breakthroughs in such subjects come from uncovering such unexpected developments and rapidly capitalizing on them, for instance by quickly organizing seminars, workshops, or conferences on a suddenly “hot” topic.

To give just one example of this sort of serendipitous discovery, a significant portion of the foundational theory of compressed sensing was initiated by a chance meeting in 2004 between myself, Emmanuel Candes (a statistician) and Justin Romberg (an electrical engineer) at a program on multiscale geometry at the Institute for Pure and Applied Mathematics (IPAM). This theory has led to notable accelerations and other improvements to a range of technologies, from MRI scans to radio interferometry to electron microscopy. The three of us, as well as the IPAM program we participated in, were all funded by grants from the National Science Foundation (NSF), but the extraordinarily fruitful collaboration was not fully anticipated in any of the proposals. (Disclosure: I now serve as director of special projects at IPAM.)

This is the type of fortuitous interaction that would be severely impacted by the proposed rule changes. Consider for instance Section 200.432 of the Code of Federal Regulations, which concerns the use of grant funds to support conference costs:

A conference means an event whose primary purpose is to disseminate technical information beyond the recipient or subrecipient and is necessary and reasonable for successful performance under the Federal award. Allowable conference costs may include the rental of facilities, speakers’ fees, attendance fees, costs of meals and refreshments, local transportation, and other items incidental to such conferences unless further restricted by the terms and conditions of the Federal award.

One of the many significant rule changes proposed is the following addendum to the above text:

OMB proposes to expand § 200.432 to add a requirement that costs for attending conferences are allowable only if participation in the conference is expressly approved by the agency and included in the terms and conditions of the award. The revision would clarify that recipients are not authorized to attend conferences using Federal funds that do not serve to advance program outcomes.

This rule change would limit conference activity support to pre-approved plans that follow the scheduled objectives in the original proposal, which is written some time before the research takes place. However, it is the nature of novel research (particularly in fundamental sciences such as mathematics) to have serendipitous opportunities emerge that were not anticipated in the original grant proposal, such as an unexpected and exciting new connection between the problem one was initially studying and another subfield of math or science that had previously been thought to be unrelated. Being able to react quickly to such developments, either by attending or organizing an event around them, or by inviting key researchers to visit, is essential to keep up with such breakthroughs. Requiring bureaucratic pre-approval in these circumstances would significantly hinder the ability of funded scientists to competitively take advantage of these opportunities.

An illustrative example would be the 2011 IPAM program on Navigating Chemical Compound Spaces. The premise sounded like a pie-in-the-sky idea: to develop computational tools to be able to somehow travel through the almost infinite space of all possible chemical compounds in the search for a compound we need, be it to create a novel drug, a better solar cell, or stronger glass. At the time, even with projected advances in computer power, accurate prediction of chemical properties of materials was seen as a distant dream. Simulating a simple protein for even a few milliseconds with existing methods would require weeks of time and an astronomical energy budget. In addition to experts on computational mathematics and materials science, the program involved a group of people who worked in a then obscure subject called machine learning (whose practical applications at the time involved such feats as deciphering handwritten zip codes). Attending such a program might be regarded as out of scope for many materials scientists. Yet the outcome of the program was the realization that machine learning methods could be used to learn and model the forces that govern electronic structure and molecular interactions, and that ultimately determine the chemical properties of materials, through much faster and more efficient computation. This idea was incredibly fruitful and literally changed the way electronic structure computations are done. AlphaFold has since become Nobel Prize-winning work, and AI is being used to discover new drugs. Now, 15 years later, scientists are building labs to literally navigate the chemical compound space, assisted by AI, a descendant of those early machine-learning approaches.

These examples also illustrate the time scales involved in fundamental research and in bringing it to the point where its application becomes an engineering endeavor. Fundamental research means playing the long game, leveraging the richness and unpredictability of scientific discovery. It is not something a private company would fund, but it is the engine behind the continued technological transformation whose fruits we all enjoy. It means taking risks, going in directions that are mere hunches and educated guesses, and going there only with the expectation to find new and surprising things. But it is necessary for technological progress.

Importantly, it is unrealistic to expect that every conference attendance will result in a major and unexpected connection or breakthrough. At times, there is a slow accumulation of knowledge that suddenly produces unexpected results. It is important to understand that fundamental research operates on scales of years and decades. The ultimate effect of attending a conference cannot always be known in advance, making the pre-approval process difficult to manage. This brings in a related point: the risk-averse nature of the proposed rules. We all know that making breakthroughs requires risk-taking; behind every successful project stand several that failed. Sometimes, communication of what failed is as useful as (or even more useful than!) communication of what succeeded, and this kind of information gets shared in informal settings at workshops and conferences.

The willingness to take risks and move in unexpected directions has always been a particular strength of this country, both in science and elsewhere, as exemplified for instance by the Defense Advanced Research Projects Agency (DARPA)’s willingness to experiment with emerging technologies such as the internet, GPS systems, or high-energy lasers, long before they could be proven to be viable. The additional regulatory burdens of these proposed rule changes would cripple this capability and set back the nation’s scientific competitiveness and leadership in the technologies of the future. I encourage all stakeholders (whether individuals or organizations) to submit public comments on the proposal on the OMB site (the public comment period extends until July 13). You can also submit through the Stand Up for Science site.

(Thanks to Kevin Klowden and Dima Shlyakhtenko for feedback on an initial version of this post.)


Modular Arithmetic Challenge

Mon, 2026-06-08 21:51

A couple of months ago, Damek Davis and I launched the first mathematical challenge at the SAIR Foundation, aimed at “distilling” the ability to solve 22 million problems in universal algebra into a condensed form. Stage one of that challenge has now been completed, with several effective “cheat sheets” generated to guess the truth or falsity of these problems to reasonable accuracy; the leaderboard for that stage, with the winning cheat sheets, can be found here. Stage two of that challenge, in which the competitors now have access to Python code as well as modest LLMs, and need to generate Lean proofs or disproofs rather than just true-false answers, is currently underway.

Alberto Alfarano, François Charton, Yongzheng Jia, Kristin Lauter, Cathy Li, Emily Wenger, and I are launching a second challenge at SAIR, this time focused on seeing how efficiently neural networks can execute simple modular arithmetic operations. For this challenge we are focusing on the simple operation of modular multiplication: taking a prime modulus $p$ (up to about a thousand digits long) and two integers $a$ and $b$ between $0$ and $p-1$, and computing the product $ab \bmod p$. This is of course a solved problem using traditional computation, being a single line of code in any modern programming language. But it has been a fascinating toy problem in which to explore the basic capabilities of neural networks.

For instance, this problem has revealed the mysterious phenomenon of “grokking”. When one trains a neural network on this problem for small input sizes, one initially runs into the familiar problem of overfitting: the network learns to solve the problem for the training data too well, at the expense of performing well on held-out test data. However, if one continues training for sufficiently long periods of time, then the network can suddenly “grok” the problem and generalize surprisingly well to the test data. It appears that the neural network can suddenly “learn” powerful computational tricks, such as taking discrete logarithms, to find accurate and efficient ways to arrive at the correct answer.

This challenge is not about grokking, but instead about scalability: we can create neural network models for modular multiplication that are extremely accurate for, say, 10-bit inputs, but they struggle to handle larger bit sizes. The competition is then simple: submit a neural network (with fixed weights) that can solve this task for larger input sizes with as high an accuracy as possible. Some pre-processing of the individual inputs $a$, $b$, $p$ is permitted (e.g., to convert these numbers into decimal or some other convenient representation), but other than that the main computation has to be neural in nature; one cannot simply run some Python code, for instance, to compute the multiplication. We are imposing limits on the size and run time of the neural network, but otherwise we are deliberately being flexible in the architecture requirements, in order to encourage creative experimentation; in particular, we permit networks whose weights were arrived at by means other than the usual machine learning training process.
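To make the permitted preprocessing concrete, here is a hypothetical encoding in Python: each of $a$, $b$, $p$ is written as fixed-width little-endian digits in some base, and the training target is the digit string of $ab \bmod p$, computed by the one-line reference answer. The token layout, the base, and the helper names are our own assumptions and not the SAIR submission format; the network itself would still have to map the input digits to the target digits.

```python
import random

def num_digits(n, base):
    """Number of base-`base` digits needed to write n (at least 1)."""
    width = 1
    while base ** width <= n:
        width += 1
    return width

def to_digits(n, base, width):
    """Little-endian digits of n in the given base, zero-padded to `width`."""
    if not 0 <= n < base ** width:
        raise ValueError("n does not fit in the requested width")
    digits = []
    for _ in range(width):
        n, r = divmod(n, base)
        digits.append(r)
    return digits

def encode_example(a, b, p, base=10):
    """Hypothetical tokenization: digits of a, b, p as input; digits of a*b mod p as target."""
    width = num_digits(p, base)
    inputs = to_digits(a, base, width) + to_digits(b, base, width) + to_digits(p, base, width)
    target = to_digits((a * b) % p, base, width)  # exact reference answer
    return inputs, target

def random_examples(p, count, seed=0):
    """Random training pairs with 0 <= a, b < p (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [encode_example(rng.randrange(p), rng.randrange(p), p) for _ in range(count)]

print(encode_example(7, 8, 13))  # prints ([7, 0, 8, 0, 3, 1], [4, 0])
```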

This is a relatively simple challenge to state, but we genuinely do not know what to expect from the competitor entries – is there a clever way to encode modular arithmetic for even quite large numbers into a medium size neural network, or is it going to be an exceptionally difficult task? Hopefully we will find out in a few months! Discussion of the ongoing challenge will take place on this Zulip.


Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond

Mon, 2026-05-04 04:49

Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Liam Price, Jibran Iqbal Shah, Quanyu Tang, and I have just uploaded to the arXiv our paper Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond. This paper (which is a work in progress) represents our efforts to digest and document the recent flurry of developments around the following problem of Erdős, Sárközy, and Szemerédi on primitive sets:

Conjecture 1 (Erdős problem #1196) Suppose that $A \subset [x, +\infty)$ is a primitive set of integers, which means that no element of $A$ divides another. Then
$$ \sum_{a \in A} \frac{1}{a \log a} \leq 1 + o(1) $$
as $x \to \infty$.

One can show that the upper bound of $1$ is best possible up to the $o(1)$ error by taking $A$ to be the set of products of $k$ primes for some suitable parameter $k$. This was one of the most well-known open problems in the study of primitive sets, and had attracted a number of partial results (for instance, Lichtman was able to show the upper bound of ). It was thus notable that this problem was first solved by an autonomous AI query (by the fifth author) a few weeks ago. This solution introduced a proof technique, based on Markov chains in the divisibility poset, which in retrospect is very natural for controlling primitive sets but had not been explicitly used in the previous literature, even though many of the arguments in that literature implicitly involved a specific Markov chain which we call the downwards Mertens chain. The proof instead revolved around a different Markov chain, which we call the downwards von Mangoldt chain, which manages to neatly avoid the type of losses incurred in the previous Mertens-based arguments, and resolve Conjecture 1. In this paper we develop the Markov chain approach more systematically, and show that it settles several further conjectures concerning primitive sets, and also provides simpler proofs of some previous results in the literature. More precisely, in addition to Conjecture 1, we establish the following:

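Theorem 2 (Erdős primitive set conjecture, #164) For any primitive $A$ consisting of numbers greater than $1$,
$$ \sum_{a \in A} \frac{1}{a \log a} \leq \sum_p \frac{1}{p \log p}, $$
where $p$ ranges over the primes.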

Theorem 3 (Odd Banks–Martin) Let and suppose is a primitive set consisting of odd numbers with at most prime factors. Then

where denotes the primes appearing as factors of elements of , and is the collection of products of primes from .

Theorem 4 ( is Erdős-strong) If is a primitive set consisting of even numbers, then

Theorem 5 (Ahlswede–Khachatrian–Sárközy) If is a primitive set, then

whenever .

Theorem 6 (Erdős–Sárközy–Szemerédi, #1217) Let be such that the upper doubly logarithmic density

is positive. Then there exists a strictly increasing infinite divisibility chain

in such that

Theorem 2 and Theorem 5 had been previously established by Lichtman and Ahlswede–Khachatrian–Sárközy respectively, but the Markov chain formalism gives shorter (and more unified) proofs of both. Theorems 3, 4, 6 were open conjectures that can now be settled by this method. These results were obtained with varying levels of AI involvement, ranging from completely autonomous AI queries to traditional pen-and-paper calculations, to various hybrid approaches (for instance, with humans suggesting key inequalities that could then be rapidly tested numerically or even proved by various AI tools).

— 1. Chain/antichain duality and Markov chains —

I’ll now discuss the basic method of proof and try to motivate the main ideas, which have become much clearer in retrospect. Primitive sets can be viewed as antichains in the divisibility poset $(\mathbb{N}, |)$, in which the partial ordering is given by the divisibility relation $a | b$. So, one can pose the following more abstract question: given a general poset $(P, \leq)$ and a weight function $w: P \to [0,+\infty)$, what is the maximal value of $\sum_{a \in A} w(a)$ as $A$ ranges over all antichains in $P$?

One can attack this problem using the well known duality between antichains and chains (totally ordered subsets of $P$): every antichain and chain can meet in at most one point, thus one has
$$ \sum_{a \in A} 1_{a \in C} \leq 1 $$
for any chain $C$ and any antichain $A$. In particular, if one has a measure $\mu$ on the space $\mathcal{C}$ of all chains (viewed as a compact subspace of the power set $2^P$ of $P$, equipped with the product topology) with the property that
$$ \mu(\{ C \in \mathcal{C}: a \in C \}) \geq w(a) \tag{1} $$
for all $a \in P$, then by integrating the previous inequality against $\mu$ and using Tonelli’s theorem one would obtain the upper bound
$$ \sum_{a \in A} w(a) \leq \mu(\mathcal{C}). $$
In fact this duality is completely tight:

Proposition 7 (Chain/antichain duality) Let $(P, \leq)$ be a poset, let $w: P \to [0,+\infty)$ be a weight function, and let $M \geq 0$. Then the following are equivalent:

(i) $\sum_{a \in A} w(a) \leq M$ for all antichains $A$.
(ii) There exists a measure $\mu$ of total mass at most $M$ on the space of chains, such that (1) holds for all $a \in P$.

Proof: We have already indicated how (ii) implies (i). Now we need to show that (i) implies (ii). A standard compactness argument allows us to reduce to the case when $P$ is finite. If (i) holds, then we also have
$$ \sum_{a \in P} w(a) f(a) \leq M \tag{2} $$
for all $f: P \to \mathbb{R}$ in the Stanley chain polytope, defined as the convex hull of the indicator functions $1_A$ of antichains $A$. By a classic result of Stanley, this polytope can also be defined as the space of all $f: P \to \mathbb{R}$ obeying the inequalities
$$ f(a) \geq 0 \quad \text{for all } a \in P \tag{3} $$
and
$$ \sum_{a \in C} f(a) \leq 1 \quad \text{for all chains } C. \tag{4} $$
Applying linear programming duality (or the Farkas lemma), we conclude that the inequality (2) must be a non-negative linear combination of the inequalities (3), (4) (as well as the trivial inequality $0 \leq 1$). Equivalently, we can find non-negative weights $\mu_C$ for each chain $C$ such that
$$ \sum_C \mu_C \leq M $$
and
$$ \sum_{C \ni a} \mu_C \geq w(a) $$
for all $a \in P$. The claim follows by viewing $\mu := \sum_C \mu_C \delta_C$ as a measure on the space of chains.

Thus, the “universal” problem of obtaining a uniform upper bound on $\sum_{a \in A} w(a)$ for all antichains $A$ is replaced with the equivalent “existential” dual problem of exhibiting a single measure $\mu$ on chains of controlled mass, which hits each element $a$ of the poset with a mass of at least the original weight $w(a)$. Such problems are thus reduced to finding a sufficiently clever construction of such a measure $\mu$. If the mass of $\mu$ were normalized to equal $1$, this would become a probability problem: find a random chain process in the poset that hits each element of the poset with a sufficiently high probability. (Though in our paper we found it more convenient for technical reasons not to normalize the measure, and to allow the mass to take values other than $1$.)

It turns out (in a manner that was not explicitly appreciated in past literature) that particularly good choices of random chain to use here can come from Markov chains. (Here, the term “chain” is now being used in two different ways, but fortunately the order-theoretic concept of a chain and the Markov process-theoretic concept of a chain will be quite compatible in this discussion.) There will be two types of Markov chains on the poset that will be relevant: downwards Markov chains and upwards Markov chains. Here is our notation for a downwards Markov chain:

Definition 8 (Downwards Markov chain) Let $P$ be a poset, and suppose we designate some subset $P_0$ of $P$ to be the “absorbing states” (in practice these will be the minimal elements of $P$, although they do not have to be). A downwards Markov chain on $P$ with absorbing states $P_0$ is a collection of transition probabilities $p(a \to b)$ for $a, b \in P$ obeying the following axioms:

(i) $p(a \to b) \geq 0$, with equality unless $b < a$ and $a \notin P_0$, or if $a \in P_0$ and $b = a$.
(ii) For any $a \in P$, one has $\sum_{b \in P} p(a \to b) = 1$.

(Thus for instance one has $p(a \to a) = 1$ for any absorbing state $a$.)

Given such a downwards Markov chain and an initial state $a_0 \in P$, one can generate a random decreasing sequence $a_0 \geq a_1 \geq a_2 \geq \dots$ by having each $a_n$ transition to $a_{n+1}$ with probability $p(a_n \to a_{n+1})$ after conditioning on the past history of the chain. This sequence will (almost surely) be strictly decreasing until it hits an absorbing state, in which case it stays there forever, although if the descending chain condition is not satisfied it is also possible for the sequence to be strictly decreasing indefinitely. We let $\mathbb{P}_{a_0}$ denote the law of this decreasing sequence. This construction is already enough to recover the Lubell–Yamamoto–Meshalkin (LYM) inequality:

Theorem 9 (LYM inequality) If $P = 2^{[n]}$ is the power set of $[n] = \{1,\dots,n\}$ with the inclusion partial ordering, and $\mathcal{A}$ is an antichain in this poset (i.e., a Sperner family), then
$$ \sum_{A \in \mathcal{A}} \binom{n}{|A|}^{-1} \leq 1. $$

Proof: We introduce the downwards Markov chain with absorbing state $\emptyset$ in which each non-empty subset of $[n]$ of some cardinality $k$ transitions to a $(k-1)$-element subset chosen uniformly at random (i.e., with probability $1/k$ of transitioning to each). If we start the descending sequence from the maximal element $[n]$ of $P$, then one can easily check that each $A \in P$ is hit with probability $\binom{n}{|A|}^{-1}$. Applying Proposition 7 with $w(A) := \binom{n}{|A|}^{-1}$, $M := 1$, and $\mu := \mathbb{P}_{[n]}$, we obtain the claim.
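The LYM argument can be checked by brute force for small $n$. The Python sketch below computes, in exact arithmetic, the probability that the downward chain (delete a uniformly random element at each step, starting from the full set) visits each subset, confirms that this equals $\binom{n}{|A|}^{-1}$, and then verifies the LYM inequality over every antichain of subsets of a 4-element set (there are 168 of them, the Dedekind number $M(4)$, counting the empty family). Subsets are drawn from $\{0,\dots,n-1\}$ rather than $\{1,\dots,n\}$ for convenience, and the function names are our own.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def all_subsets(n):
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]

def antichains(n):
    """Every antichain (Sperner family) in the power set of {0, ..., n-1}."""
    subsets = all_subsets(n)
    found = []
    def extend(start, chosen):
        found.append(list(chosen))
        for i in range(start, len(subsets)):
            s = subsets[i]
            if all(not (s <= t or t <= s) for t in chosen):  # incomparable with all chosen
                chosen.append(s)
                extend(i + 1, chosen)
                chosen.pop()
    extend(0, [])
    return found

def hitting_probabilities(n):
    """Exact probability that the downward chain from {0..n-1}, deleting a uniformly
    random element at each step, ever visits each subset."""
    prob = {frozenset(range(n)): Fraction(1)}
    for k in range(n, 0, -1):  # all sets of size k are complete before size k - 1 is used
        for c in combinations(range(n), k):
            s = frozenset(c)
            for x in s:
                t = s - {x}
                prob[t] = prob.get(t, Fraction(0)) + prob.get(s, Fraction(0)) / k
    return prob

n = 4
hits = hitting_probabilities(n)
assert all(hits[s] == Fraction(1, comb(n, len(s))) for s in all_subsets(n))
families = antichains(n)
lym = max(sum((Fraction(1, comb(n, len(s))) for s in family), Fraction(0))
          for family in families)
print(len(families), lym)  # prints 168 1
```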

In the above argument we fixed the initial location of the Markov chain, but more generally one can start with any source mass $\nu$ on $P$ and work with the measure $\sum_{a \in P} \nu(a) \mathbb{P}_a$ for the purposes of applying Proposition 7.

One can also define upwards Markov chains in exact analogy with downwards Markov chains (reversing the order in the poset), which now generate random increasing sequences rather than decreasing sequences. There is a useful adjoint construction that can convert a downwards Markov chain into an upwards Markov chain: if we have a positive weight $w: P \to (0,+\infty)$ which is invariant under the chain in the sense that
$$ \sum_{a > b} w(a) p(a \to b) = w(b) $$
for any $b \in P$, then we can define an adjoint upwards Markov chain (with no absorbing state) by the formula
$$ p^*(b \to a) := \frac{w(a) p(a \to b)}{w(b)} $$
for any $a > b$ in $P$. More generally, if $w$ is merely sub-invariant in the sense that
$$ \sum_{a > b} w(a) p(a \to b) \leq w(b) $$
for all $b \in P$, one can still construct an adjoint upwards chain as before, but now one must also add an additional absorbing maximal state to ensure that the transition probabilities still sum to one.
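The adjoint construction can be written out directly for a finite poset. In this Python sketch, `down[a][b]` holds the downward transition probabilities, self-loops at absorbing states are skipped (our reading of how absorbing states should be treated), and `TOP` is a hypothetical label for the extra absorbing maximal state needed in the sub-invariant case. Applied to the LYM chain with the weight $w(A) = \binom{n}{|A|}^{-1}$, which is invariant everywhere except at the full set, it recovers the upwards chain that adds a uniformly random missing element.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

TOP = "TOP"  # hypothetical name for the extra absorbing maximal state

def adjoint_upwards(down, weight):
    """Adjoint of a downward chain with respect to a positive weight.

    down[a][b] is the probability of stepping from a down to b; self-loops at
    absorbing states are skipped. The weight must be sub-invariant:
    sum over a != b of weight[a] * down[a][b] <= weight[b] for every b.
    Returns up[b][a] = weight[a] * down[a][b] / weight[b]; any missing mass
    goes to TOP, so that each row sums to one.
    """
    up = {b: {} for b in weight}
    for a, row in down.items():
        for b, prob in row.items():
            if a != b and prob:
                up[b][a] = weight[a] * prob / weight[b]
    for b, row in up.items():
        total = sum(row.values(), Fraction(0))
        if total > 1:
            raise ValueError(f"weight is not sub-invariant at {b!r}")
        if total < 1:
            row[TOP] = 1 - total
    return up

def subset_chain(n):
    """LYM chain on subsets of {0..n-1} with the weight w(S) = 1 / C(n, |S|)."""
    sets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    down = {s: ({s - {x}: Fraction(1, len(s)) for x in s} if s else {s: Fraction(1)})
            for s in sets}
    weight = {s: Fraction(1, comb(n, len(s))) for s in sets}
    return down, weight

up = adjoint_upwards(*subset_chain(3))
```

With $n = 3$, the set $\{0\}$ moves up to $\{0,1\}$ or $\{0,2\}$ with probability $1/2$ each, and the full set moves to `TOP` with probability $1$.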

A downwards or upwards Markov chain, when equipped with an invariant or sub-invariant measure, also induces a flow network on the poset, in which an edge from $a$ to $b$ is assigned a flow capacity of
$$ w(a) p(a \to b). $$
One can rewrite the Markov chain arguments in the paper in terms of such flow networks, in which case the arguments often boil down to an application of the discrete divergence theorem, giving very short proofs of many of the above results; see the paper for more discussion. However, we chose to focus more on the Markov chain approach in our presentation, as this formalism is also natural and could potentially be more flexible for further applications.

— 2. The Mertens and von Mangoldt chains —

For the purpose of analyzing primitive sets, there are two downward chains on the natural numbers (with absorbing state $1$) that, in retrospect, are particularly natural to use:

The downwards Mertens chain, in which each $n > 1$ transitions deterministically to $n/P^+(n)$, where $P^+(n)$ is the largest prime factor of $n$; and
the downwards von Mangoldt chain, in which each $n > 1$ transitions to $n/d$ with probability $\Lambda(d)/\log n$ for each $d$ dividing $n$, with $\Lambda$ the von Mangoldt function.

The von Mangoldt weight is a natural choice here thanks to the fundamental identity
$$ \sum_{d | n} \Lambda(d) = \log n, $$
which encodes the fundamental theorem of arithmetic (and ensures that the transition probabilities of the von Mangoldt chain sum to one). The two chains are similar to each other in many ways: the von Mangoldt process favors division by the largest prime factor, but does not require it.
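The identity $\sum_{d \mid n} \Lambda(d) = \log n$ is exactly what makes the von Mangoldt transition probabilities $\Lambda(d)/\log n$ sum to one. The following Python sketch (naive trial division, so only suitable for small $n$; function names are our own) builds those transitions, checks the row sums numerically, and samples one run of the chain down to the absorbing state $1$.

```python
import math
import random

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k for a prime p and k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)  # no factor up to sqrt(n), so n is prime

def von_mangoldt_step(n):
    """Transition probabilities n -> n // d, equal to Lambda(d) / log n, for d | n (n >= 2)."""
    log_n = math.log(n)
    return {n // d: von_mangoldt(d) / log_n
            for d in range(2, n + 1) if n % d == 0 and von_mangoldt(d) > 0}

def sample_chain(n, rng):
    """Run the downward von Mangoldt chain from n until it is absorbed at 1."""
    path = [n]
    while n > 1:
        step = von_mangoldt_step(n)
        n = rng.choices(list(step), weights=list(step.values()))[0]
        path.append(n)
    return path

# Each row sums to one because sum_{d | n} Lambda(d) = log n.
for n in range(2, 500):
    assert abs(sum(von_mangoldt_step(n).values()) - 1) < 1e-12
print(sample_chain(360, random.Random(1)))  # a strictly decreasing divisibility chain ending at 1
```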

The Mertens chain generates deterministic downward divisibility chains
$$ p_1 \cdots p_k \to p_1 \cdots p_{k-1} \to \dots \to p_1 \to 1 $$
starting from a product of primes $p_1 \leq \dots \leq p_k$, and as such this process was implicit in much of the previous literature on primitive sets. However, it does not quite interact well with the $\frac{1}{n \log n}$ weight, which is not invariant or subinvariant with respect to this process. Intuitively, the process of dividing by $P^+(n)$ tends to increasingly select for numbers $n$ for which $P^+(n)$ is smaller than expected. Instead, the natural invariant measure for this chain is the Mertens weight

the verification that this is indeed an invariant weight is a nice exercise in telescoping series. Taking the adjoint of the downwards Mertens chain with respect to this weight and running that chain from $1$ gives the upwards Mertens divisibility chain
$$ 1 \to p_1 \to p_1 p_2 \to p_1 p_2 p_3 \to \dots, \qquad p_1 \leq p_2 \leq p_3 \leq \dots, $$
in which each transitions to for some prime with probability . A routine induction shows that each is hit by this chain with a probability of ; this for instance gives a weak version

of Conjecture 1, and similarly for the other results discussed above.
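The displayed formula for the Mertens weight did not survive extraction. One weight that is invariant for the downwards Mertens chain, and for which the verification is indeed a telescoping exercise, is $w(n) = \frac{1}{n}\prod_{p < P^+(n)} (1 - \frac{1}{p})$ with $w(1) = 1$, where $P^+(n)$ denotes the largest prime factor of $n$. We treat this form as an assumption; it is not necessarily the paper's normalization. Invariance means $\sum_{p \geq P^+(m)} w(mp) = w(m)$, and the partial sums telescope: summing over $P^+(m) \le p \le X$ leaves a defect of exactly $\frac{1}{m}\prod_{q \le X}(1 - \frac{1}{q})$, which tends to zero (slowly, by Mertens' theorem). The Python sketch below checks this finite identity in exact arithmetic.

```python
from fractions import Fraction

def primes_up_to(x):
    """Primes <= x (x >= 2), by the sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

PRIMES = primes_up_to(1000)

def largest_prime_factor(n):
    """P^+(n), with the convention P^+(1) = 1."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return n if n > 1 else largest

def mertens_weight(n):
    """Assumed Mertens weight w(n) = (1/n) * prod_{p < P^+(n)} (1 - 1/p); needs P^+(n) <= 1000."""
    top = largest_prime_factor(n)
    w = Fraction(1, n)
    for p in PRIMES:
        if p >= top:
            break
        w *= 1 - Fraction(1, p)
    return w

def mertens_tail(m, X):
    """(1/m) * prod_{q <= X} (1 - 1/q)."""
    t = Fraction(1, m)
    for q in PRIMES:
        if q > X:
            break
        t *= 1 - Fraction(1, q)
    return t

def invariance_defect(m, X):
    """w(m) minus the inflow from the states m*p (P^+(m) <= p <= X) that step down to m."""
    start = largest_prime_factor(m)
    inflow = sum((mertens_weight(m * p) for p in PRIMES if start <= p <= X), Fraction(0))
    return mertens_weight(m) - inflow

# Telescoping: the defect is exactly the tail term, which tends to 0 (like 1/log X) as X grows.
for m in range(1, 30):
    assert invariance_defect(m, 100) == mertens_tail(m, 100)
```

Under this assumed weight, the adjoint upward step from $m$ to $mp$ (with $p \geq P^+(m)$) has probability $w(mp)/w(m) = \frac{1}{p}\prod_{P^+(m) \le q < p}(1 - \frac{1}{q})$.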

The key innovation (which was uncovered by the AI-assisted proofs, though not quite in the notation and framework presented here) is to switch to the von Mangoldt chain, which removes the bias towards numbers whose largest prime factor is small. Indeed, the weight $\frac{1}{n \log n}$ now turns out to be sub-invariant (after removing $1$) under this chain, and there is a modification

of this weight which turns out to be perfectly invariant. (We have an interpretation of this formula in terms of a zeta process that couples together various zeta distributions into a continuous divisibility chain; see the paper for further details.) Taking the adjoint with respect to this weight (or with the original weight $\frac{1}{n \log n}$) can eliminate the loss in the previous argument, and gives one of the proofs of Conjecture 1 recorded in our paper (there are several other variants of this method that we also present).

One slight defect of the von Mangoldt chain, as compared to the Mertens chain, is that it can “jump over” primitive sets (such as the set of products of $k$ primes) due to the fact that it will sometimes multiply or divide by a power of a prime rather than a prime itself. This turns out to be a technical difficulty for many of our applications, resulting in a need to make various small ad hoc modifications to the von Mangoldt chain to eliminate this type of jump.

In order to establish some crucial sub-invariance properties, it turns out (after some standard manipulations) to be useful to obtain good bounds on the negative log-derivative
$$ -\frac{\zeta'}{\zeta}(s) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} $$
of the Riemann zeta function in the region $s > 1$. Here it turns out that there is a clean (and rather efficient) upper bound
$$ -\frac{\zeta'}{\zeta}(s) \leq \frac{\log 2}{2^{s-1} - 1}, $$
which is equivalent to the non-decreasing nature of the Dirichlet eta function
$$ \eta(s) = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n^s} = (1 - 2^{1-s}) \zeta(s) $$
in this region. There are many proofs of this fact in the literature, but I would like to record a particularly cute proof, in the spirit of other arguments in the paper: one can interpret $\eta(s)$ probabilistically as
$$ \eta(s) = \mathbb{E} \frac{1}{1 + e^{-X_s}}, $$
where $X_s$ is a gamma random variable of shape $s$ and scale $1$. Because the sum of independent copies of $X_s$ and $X_t$ has the distribution of $X_{s+t}$, one can couple together all the $X_s$ so that they are increasing in $s$, at which point the claim follows (since $x \mapsto \frac{1}{1+e^{-x}}$ is increasing).
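Here is a numerical sanity check (not a proof) in Python. It evaluates $\eta(s)$ by averaging consecutive partial sums of the alternating series, confirms monotonicity on a small grid, and compares the truncated sum $\sum_{n \le N} \Lambda(n) n^{-s}$, which is a lower bound for $-\zeta'/\zeta(s)$ when $s > 1$, against $\log 2/(2^{s-1}-1)$. That right-hand side is what one obtains by differentiating $\log \eta(s) = \log(1 - 2^{1-s}) + \log \zeta(s)$ and requiring the derivative to be non-negative; the truncation $N$, the number of series terms, and the grid are arbitrary choices of ours.

```python
import math

def eta(s, terms=100_000):
    """Dirichlet eta function for real s > 0, by averaging two consecutive
    partial sums of the alternating series sum_{n>=1} (-1)^(n-1) / n^s."""
    total, previous, sign = 0.0, 0.0, 1.0
    for n in range(1, terms + 1):
        previous = total
        total += sign / n ** s
        sign = -sign
    return 0.5 * (total + previous)

def neg_log_deriv_zeta_truncated(s, N=20_000):
    """sum_{n <= N} Lambda(n) / n^s, a lower bound for -zeta'/zeta(s) when s > 1."""
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(p * p, N + 1, p):
                composite[m] = True
            power = p
            while power <= N:  # Lambda(p^k) = log p
                lam[power] = math.log(p)
                power *= p
    return sum(lam[n] / n ** s for n in range(2, N + 1))

def eta_bound(s):
    """The bound log 2 / (2^(s-1) - 1), valid for s > 1."""
    return math.log(2) / (2 ** (s - 1) - 1)

grid = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
values = [eta(s) for s in grid]
assert all(x <= y for x, y in zip(values, values[1:]))  # eta is non-decreasing on the grid
for s in (1.5, 2.0, 3.0):
    assert neg_log_deriv_zeta_truncated(s) <= eta_bound(s)
```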

— 3. Further directions —

Will Sawin and Ofir Gorodetsky have obtained analogues of several of the above results for function field models or permutation models respectively; we briefly discuss these in the paper, although we do not plan to cover these models in depth. We also note another recent use of this technique to solve a separate Erdős problem (#858) relating to antichains in a variant of the divisibility poset.

The zeta process that we have discovered hints at an emerging theory of the “developmental anatomy of integers”, which differs from the existing topic of anatomy of integers in that it views a large integer (and its prime factor “organs”) not as a static entity, but rather as an evolving process in which primes (or powers of primes, in the case of the von Mangoldt process) are added to or removed from the integer over time. With this perspective, primitive sets can be viewed as singular moments in such a developmental process, which are only encountered at most once in the life cycle of a given integer. It seems of interest to study this developmental perspective further.

The paper is currently a work in progress; we have released an early version due to the public interest in this problem. We plan to explore some further applications, and also to formalize more of the above results in Lean (currently two of the six main theorems are formalized), before submitting the paper for publication. The situation here highlights a distinction I have recently made between three components of the problem solving process in mathematics, namely proof generation, proof verification, and proof digestion. In this particular case, the first two steps were extremely rapid due to modern AI tools; however, properly digesting the AI-generated proofs into a coherent exposition that places the arguments in context with both past literature and future directions remains a slower process that requires expert human attention.

