"Long" but piecemeal. Which is a result of how they approached the problem.
They defined every possible curve in infinite dimensions based on 5 characteristics, essentially reducing the problem from an infinite space to an abstracted 5-dimensional space. Think: all black holes can be described, so far as we understand them today, in an abstracted 3-dimensional space, that is [mass, electric charge, angular momentum](i)
Then they started proving the theorem for subsets of that 5-d space. Think: proving a theorem for infinitely many integers by first proving it for 0, then for all even integers, then for all odd integers, then separately for the small handful of odd integers that your odd-integer proof fails on, then proving that the negatives can be reduced to the positives you have already proved.
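To make that analogy concrete, here is a toy Python sketch of the same proof shape (entirely invented, nothing to do with the paper's actual arguments); each branch is one "case" of the proof, and the sporadic set is the handful of values checked by hand:

    SPORADIC_ODDS = {3, 7}  # hypothetical odd values the general odd argument misses

    def property_holds(n: int) -> bool:
        if n == 0:
            return True                    # base case, proved directly
        if n < 0:
            return property_holds(-n)      # reduce negatives to the positive case
        if n % 2 == 0:
            return property_holds(n // 2)  # "even" argument: reduce to a smaller case
        if n in SPORADIC_ODDS:
            return True                    # sporadic cases verified one by one
        return property_holds(n - 1)       # generic "odd" argument: reduce to the even case

    assert all(property_holds(n) for n in range(-20, 21))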
It's really great seeing this practical approach laid bare.
Also, at the bottom, the paper includes code for "Section 11: Most of the Sporadic Cases", so if you're unfamiliar with the math but familiar with standard Python, you can gain further understanding both by following the code and by comparing it to the corresponding propositions from the proofs, which are highlighted as comments.
For example, from the paper(ii):
    def can_induct(d, g, r, l, m):
        # Proposition 8.4
        if d >= g + 2 * r - 1 and good(d - (r - 1), g, r, l, m):
            return True
        # ... (the paper's function continues with further propositions)
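If you want to run something with that shape outside the paper, here's a minimal self-contained sketch; good() below is a made-up placeholder rather than the paper's real predicate, and the sweep just mirrors the way their script enumerates concrete (d, g, r, l, m) cases:

    # Hypothetical stand-in for the paper's good() predicate, only so the
    # snippet runs; the real one encodes the actual geometric criterion.
    def good(d, g, r, l, m):
        return d >= g + r  # placeholder inequality, NOT the paper's condition

    def can_induct(d, g, r, l, m):
        # Proposition 8.4 (the paper checks several more propositions here)
        if d >= g + 2 * r - 1 and good(d - (r - 1), g, r, l, m):
            return True
        return False

    # Sweep a small box of parameter tuples and report where the inductive
    # step applies, mirroring the case-enumeration style of Section 11.
    for d in range(1, 6):
        for g in range(0, 3):
            for r in range(1, 4):
                if can_induct(d, g, r, 0, 0):
                    print(f"induction applies: d={d}, g={g}, r={r}")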
I'm wondering if this type of research can be applied to deep learning. SGD is a very slow and inefficient way to train models; can we find another way? A set of datapoints maps to some unknown curve (the loss surface) for a particular model architecture, where each point on it is a set of specific weight values. Can we come up with a better way to find the minima (the points with the lowest loss) given a few random (or specifically chosen) points? Can we interpolate the curve?
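As a toy version of "interpolate the curve": sample a 1-D slice of the loss at a few points, fit a parabola through them, and jump straight to its vertex. Everything below is a made-up illustration (the loss function, the sample points), and real loss surfaces are high-dimensional and nothing like this clean, but it shows the one-shot idea in miniature:

    import numpy as np

    # Toy 1-D "loss curve" standing in for a slice of a real loss surface.
    def loss(w):
        return (w - 2.0) ** 2 + 0.5

    # Sample the loss at three points and fit an exact quadratic through them.
    ws = np.array([-1.0, 0.0, 3.0])
    a, b, c = np.polyfit(ws, loss(ws), deg=2)

    # The minimum of a*w^2 + b*w + c sits at w = -b / (2a): one shot, no SGD.
    w_star = -b / (2 * a)
    print(w_star)  # ~2.0 for this toy curve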
If we consider a specific example, say GPT-3 training, and look at a dataset as a collection of paths on a curve (in a word-embedding space; more likely a graph that could be interpolated into a curve), then training tries to map a bunch of these paths (word sequences) to a single downward path somewhere on the loss surface. Currently this process is random - we randomly sample paths on the dataset graph, trying to build a path of small steps on the loss surface. But what if we could analyze the dataset as a whole and extract some properties that would constrain the search in parameter space? Or at least choose the training sequences more carefully to improve learning efficiency.
You're talking about second-order optimization. The full-fat version is too slow (the Hessian, the matrix of second derivatives, is huge for DL models), and the optimized approximations don't seem to provide a meaningful benefit over first-order methods like SGD (or, more commonly, Adam).
A more promising path is to regularize the loss function so that SGD can optimize it quickly, rather than switching to a second-order method.
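For a sense of why the full-fat version is off the table, here is a minimal Newton-step sketch (toy NumPy code, not any real training loop): with n parameters the Hessian is n x n, so at GPT-3 scale that would be a 175-billion-squared matrix before you even try to solve the system:

    import numpy as np

    def newton_step(w, grad_fn, hess_fn):
        # Solve H @ delta = grad instead of stepping along the raw gradient.
        # For n parameters, H is n x n: O(n^2) memory and O(n^3) to solve,
        # which is what rules out the exact method for large models.
        g = grad_fn(w)
        H = hess_fn(w)
        return w - np.linalg.solve(H, g)

    # Tiny quadratic objective where one Newton step lands exactly on the minimum.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    grad_fn = lambda w: A @ w  # gradient of 0.5 * w.T @ A @ w
    hess_fn = lambda w: A      # constant Hessian
    print(newton_step(np.array([5.0, -4.0]), grad_fn, hess_fn))  # -> [0. 0.]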
No. I'm asking for a new kind of math for ML, inspired by the paper in this post. I would like to treat a dataset as a curve, which maps to another curve (loss landscape), under the model architecture constraints, and then find the minima on that landscape in one shot. There might be some kind of interpolation possible which has nothing to do with SGD optimization (of any order).
When I learn something new, for example a difficult topic like functional programming or signal processing, the process feels nothing like SGD. At first I gather some background information, motivation, goals, etc. Then I unpack the concept/formula and try to understand how it works, how the pieces fit together. It could be bottom-up or top-down. The information accumulates until it reaches critical mass, and then boom - I get it. That stage happens quickly; it's an "aha" moment.
This learning process involves both statistical pattern matching and some other mechanism, perhaps consulting a rule-based system like a decision tree, or accessing facts in my long-term memory database. I have no idea how a brain does it, but clearly I don't need a thousand examples to get the Fourier transform, or monads, or whatever it is I'm learning about. Often just one example is sufficient to form new patterns and rules in my mind. I'm guessing this is because we are not randomly probing the loss landscape in the dark. We might be interpolating it, filling in the blanks.
I do, however, need a thousand examples/repetitions to learn a new motor skill, like playing tennis or piano, or learning a new language - training my ear for many hours until I start recognizing the right sound patterns - so there are types of learning I do that might be similar to SGD. But most mental concepts are not learned like that.
Well, it is enumerative geometry: counting things. One would expect (a priori, though not necessarily) that any method that gives an answer to the problem can be used to count. That is, it is programmable.
Not trying to argue, just to specify the type of result.
> At Quanta Magazine, scientific accuracy is every bit as important as telling a good story. Since Quanta is a nonprofit foundation-funded publication, all of its resources go toward producing responsible, freely accessible journalism that is meticulously researched, reported, edited, copy-edited and fact-checked. And our editorial independence ensures the impartiality of our science coverage — our articles do not reflect or represent the views of the Simons Foundation. All editorial decisions, including which research or researchers to cover, are made by Quanta’s staff reporting to the editor in chief; editorial content is not reviewed by anyone outside of the news team prior to publication; Quanta has no involvement in any of the Simons Foundation’s grant-giving or research efforts; and researchers who receive funding from the foundation do not receive preferential treatment. The decision to cover a particular researcher or research result is made solely on editorial grounds in service of our readers.
But there's a risk that their kids won't care about math or won't amount to anything notable in STEM, whereas these two are proven to be effective as mathematicians.
You know the debate about 10x engineers - and how many people weigh in saying they are real, that 10x may even understate it? Here you have two 10x (minimum) mathematicians. You are suggesting we give that up, that they sacrifice the quiet time mathematicians need to do great work, and devote themselves to raising kids, on the off chance that (some of) their children may contribute more than they will.
Imagine Steve Jobs after his first success (incorporating Apple, selling a few units), or Bill Gates after his (selling some units of BASIC and random software, which is how Microsoft started), then quitting to raise a bunch of kids. Is that a good trade? What are their kids up to today, and could we expect much better even if they had ten times as many? Would you really expect those kids to accomplish more than Jobs and Gates did?
When put this way, it obviously sounds like a bad trade.
True, but government can still try to nudge behaviour.
Singapore famously established the social development unit (SDU) (nicknamed "Single, Desperate and Ugly") to encourage graduates to have more children, by organising speed dating events, BBQs, cruises etc. They also have a "Baby Bonus Scheme" [0].
"SDN’s long term goal has always been to help singles realise their marriage aspirations through equipping singles with relationship skills and creating a vibrant dating scene for singles to interact." [1]
Governments nudge because if there is no continuous thread of culture and people to carry it over a timespan longer than a lifetime, then its values and long-term designs are lost. Immigrants are important, but they need to be integrated into something the natives prepare and preserve; otherwise those immigrants would not have had something better to move to.
The United States has successfully integrated immigrants of many peoples into a framework of life that is still recognisable as European Enlightenment. The natives are now not necessarily European, but they have been living and reproducing long enough to be part of the fabric that keeps the original European Enlightenment tradition in place.
I didn't read the (flagged to hell) GP as implying an obligation. "I strongly encourage so-and-so to blah and blah" is about as mild as it gets for opinions about others' lives?
I wish that were the typical "get in other people's business" tone.
More and more I notice that people have a huge cognitive dissonance in accepting that all people were once children, and that these people they know and esteem were once someone's burden. I also find it somewhat naive to believe we are somehow not bound by our biological husk. We enjoy a good and full life on the backs of our parents' toil, and because we don't remember it, or aren't in contact with our families, we can feel that life is great and immortal and always was.
I like to think of the hard part of raising children as paying the debt we owe to our parents and ancestors. If not for their survival, I would not be able to say I am very glad and grateful to be alive and to experience the wonders of the universe, wonders like math.
Kids are not some kind of startup scaling problem. Kids require a lot of resources and attention; no wonder parents opt for a smaller number, which is still a challenge, but a manageable one.
The cheapest way to scale mental problems is to use the state to replace the father's role in a family. I know plenty of large families with perfectly well adjusted kids. I can't say I've met any well adjusted kids raised by a single mom.
I think this is some much-needed optimism about higher ed and the educational system. True, these people are exceptional, but I think the media only looks at the things that are wrong with the educational system.
Most top STEM departments are very immigrant-heavy (sometimes more than 50%). It can be true both that US universities have top departments and that Americans are disadvantaged at getting in (by having weaker foundations, particularly in math).
A STEM professor once told me he'd needed a Masters in math before they would let him in: they doubted an American with just an undergrad degree could handle the technical challenge. This was almost two decades ago; since then the gap may have widened a bit.
Most places that recruit globally are likely to have a greater proportion of international members, especially if language and social factors are less important for success. So being immigrant-heavy doesn't necessarily indicate a relative failing of the US educational system.
While PhDs without a masters are common in Europe, universities may have doctoral training centers that add an extra year to act like a masters. My wife went straight from undergrad at Oxford, and her PhD was four years instead of the usual three, with the first year consisting of short courses and projects.
Huh, interesting. Most of my friends who have PhDs don't have masters, and that's Europe-wide! The UK definitely is different, though. I got onto a Masters in CS without a Bachelors and discussed carrying on into a PhD, which would have been pretty interesting to explain on my CV.
Here's a link to the pre-print: https://arxiv.org/abs/2201.09445