this post was submitted on 08 Dec 2024

459 points (94.6% liked)


The GPT Era Is Already Ending (www.theatlantic.com)

submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]


If this is the way to superintelligence, it remains a bizarre one. “This is back to a million monkeys typing for a million years generating the works of Shakespeare,” Emily Bender told me. But OpenAI’s technology effectively crunches those years down to seconds. A company blog boasts that an o1 model scored better than most humans on a recent coding test that allowed participants to submit 50 possible solutions to each problem—but only when o1 was allowed 10,000 submissions instead. No human could come up with that many possibilities in a reasonable length of time, which is exactly the point. To OpenAI, unlimited time and resources are an advantage that its hardware-grounded models have over biology. Not even two weeks after the launch of the o1 preview, the start-up presented plans to build data centers that would each require the power generated by approximately five large nuclear reactors, enough for almost 3 million homes.

https://archive.is/xUJMG


[–] [email protected] 68 points 2 weeks ago (4 children)

How is it useful to type out millions of wrong solutions to come up with the right one? That only works on a research project when you're searching for patterns. If you are trying to code, it needs to be right the first time and every time it's run, especially if it's in a production environment.

[–] [email protected] 53 points 2 weeks ago (1 children)

It's not.

But lying lets them defraud more investors.

[–] [email protected] 4 points 2 weeks ago

Ding!

[–] [email protected] 6 points 2 weeks ago (2 children)

Well, actually, there are ways to automate quality assurance.

If a programmer reasonably knew that one of these 10,000 files was the "correct" code, they could pull out quality assurance tests and find that code pretty dang easily, all things considered.

Those tests would eliminate most of the 9,999 wrong ones, and then the QA person could look through the remaining ones by hand. Like a CAPTCHA for programming code.

The power usage still makes this a ridiculous solution.
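To make the filtering idea above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the three `candidate_*` functions stand in for model-generated submissions, and `passes_tests` stands in for whatever automated QA suite a team actually has.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for model-generated candidates. Each one is supposed
# to return the sum of a list of integers.
def candidate_a(xs: Sequence[int]) -> int:
    return sum(xs)

def candidate_b(xs: Sequence[int]) -> int:
    return len(xs)  # wrong: counts the items instead of adding them

def candidate_c(xs: Sequence[int]) -> int:
    return xs[0] + xs[-1]  # wrong: crashes on an empty list

def passes_tests(fn: Callable[[Sequence[int]], int]) -> bool:
    """Run a small automated test suite; any wrong answer or crash rejects."""
    cases = [([], 0), ([1], 1), ([1, 2, 3], 6), ([-1, 1], 0)]
    for args, expected in cases:
        try:
            if fn(list(args)) != expected:
                return False
        except Exception:
            return False
    return True

def filter_candidates(candidates: List[Callable]) -> List[Callable]:
    """Keep only the candidates that survive the automated tests."""
    return [fn for fn in candidates if passes_tests(fn)]

survivors = filter_candidates([candidate_a, candidate_b, candidate_c])
print([fn.__name__ for fn in survivors])  # only these need a human review
```

The survivors are only as trustworthy as the test cases, which is the objection raised in the replies below.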

[–] [email protected] 33 points 2 weeks ago

If you first have to write comprehensive unit/integration tests, then have a model spray code at them until it passes, that isn't useful. If you spend that much time writing perfect tests, you've already written probably twice the code of just the solution and reasonable tests.

Also you have an unmaintainable codebase that could be a hairball of different code snippets slapped together with dubious copyright.

Until they hit real AGI this is just fancy auto complete. With the hype they may dissuade a whole generation of software engineers picking a career today. If they don't actually make it to AGI it will take a long time to recover and humans who actually know how to fix AI slop will make bank.

[–] [email protected] 15 points 2 weeks ago (2 children)

That seems like an awful solution. Writing a QA test for every tiny thing I want to do is going to add far more work to the task. This would increase the workload, not shorten it.

[–] [email protected] 6 points 2 weeks ago (1 children)

We already have to do that as humans in many industries like automotive, aviation, medicine, etc.

We have several layers of tests:

Unit test
Component test
Integration / API test
Subsystem test
System test

On each level we test the code against the requirements and architecture documentation. It's a huge amount of work.

In automotive we have several standard processes which need to be followed during development, like ASPICE and ISO 26262.

[–] [email protected] 6 points 2 weeks ago (1 children)

I've worked in both the automotive and the aerospace industries. A unit test is not the same thing as creating a QA script to go through millions of lines of code generated by an AI. That's such an asinine suggestion. You've clearly not worked on any practical software application or you'd know this is utter hogwash.

[–] [email protected] 3 points 2 weeks ago (2 children)

I think you (or I) misunderstand something. You have a test for a small, well-defined unit like a C function, and let the AI generate code until the test passes. The unit test is binary: either it passes or it doesn't. The unit test only looks at the result after running the unit with different inputs; it does not "go through millions of lines of code".

And you keep doing that for every unit.

The writing of the code is a fairly mechanical thing at this point because the design has been done in detail before by the human.
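A rough sketch of that "generate until the unit test passes" loop, written in Python rather than C for brevity. The names are all hypothetical: `fake_model` is a stub that yields canned drafts in place of a real model, `clamp` is an invented example unit, and a real setup would run untrusted generated code in a sandbox instead of calling `exec` directly.

```python
from typing import Callable, Iterator, Optional, Tuple

def fake_model() -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields candidate source code."""
    yield "def clamp(x, lo, hi):\n    return x\n"
    yield "def clamp(x, lo, hi):\n    return min(x, hi)\n"
    yield "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"

def unit_test(clamp: Callable[[int, int, int], int]) -> bool:
    """Binary verdict: looks only at results for a few inputs, never the code."""
    try:
        return (
            clamp(5, 0, 10) == 5
            and clamp(-3, 0, 10) == 0
            and clamp(42, 0, 10) == 10
        )
    except Exception:
        return False

def generate_until_pass(
    model: Iterator[str],
    test: Callable[[Callable], bool],
    max_attempts: int,
) -> Optional[Tuple[int, str]]:
    """Return (attempt number, source) of the first passing draft, else None."""
    for attempt, source in enumerate(model, start=1):
        if attempt > max_attempts:
            break
        namespace: dict = {}
        try:
            exec(source, namespace)  # assumption: sandboxed in real use
        except Exception:
            continue  # draft does not even load; try the next one
        fn = namespace.get("clamp")
        if callable(fn) and test(fn):
            return attempt, source
    return None

result = generate_until_pass(fake_model(), unit_test, max_attempts=10)
print(result[0] if result else "no passing draft")
```

The loop never inspects the generated source; it only checks observable behaviour, which is why the design and the test cases have to be written by a human first.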

[–] [email protected] 4 points 2 weeks ago

"The unit test is binary, either it passes or not."

For that use case yes, but when you have unpredictable code, you would need to write way more just to do sanity checks for behaviour you haven’t even thought of.

As in, using AI might introduce waaay more edge cases.
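A small illustration of that concern, under invented assumptions: `candidate_is_leap_year` is a made-up piece of "generated" code and `written_tests` is the test suite someone happened to write. The candidate passes every written test and is still wrong for inputs nobody thought to check. Python's standard `calendar.isleap` serves as the reference here.

```python
import calendar

def candidate_is_leap_year(year: int) -> bool:
    # Hypothetical generated code: fine for the tested years, but it ignores
    # the rule that century years must be divisible by 400.
    return year % 4 == 0

def written_tests(fn) -> bool:
    """The tests the author thought of. None of them is a non-leap century."""
    return fn(2024) is True and fn(2023) is False and fn(2000) is True

print("passes written tests:", written_tests(candidate_is_leap_year))

# A broader sanity sweep against a trusted reference exposes the missed cases.
mismatches = [
    year
    for year in range(1800, 2101)
    if candidate_is_leap_year(year) != calendar.isleap(year)
]
print("years where the candidate is wrong:", mismatches)
```

A human who understands the domain would rarely make this exact mistake, so they would rarely write a test for it. Code produced by search has no such prior, so the test suite has to cover more ground.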

[–] [email protected] 2 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

How often have you ever written a piece of code that is super well defined? I have very little guidance on what the code should look like when I start working on a project. This is the equivalent of the spherical chicken in a vacuum problem in physics classes. It's not a real case you'll ever see.

And in cases where it is a short well defined function, just write the function. You'll be done before the AI finishes.

[–] [email protected] 1 points 1 week ago (1 children)

This sounds pretty typical for a hobbyist project but is not the case in many industries, especially regulated ones. It is not uncommon to have engineers whose entire job is reading specifications and implementing them. In those cases, it’s often the case that you already have compliance tests that can be used as a starting point for your public interfaces. You’ll need to supplement those compliance tests with lower level tests specific to your implementation.

[–] [email protected] 2 points 1 week ago

Ironic, because I am an engineer. I've been coding for almost 15 years now.

[–] [email protected] 0 points 2 weeks ago

Many people write tests before writing code. This is common and is called Test Driven Development. Having an AI brute-force your unit tests is actually already the basis for a "programming language" that I saw on Hacker News a week or so ago.

I despise most AI applications, and this is definitely one. However it's not some foreign concept impossible in reality:

https://wonderwhy-er.medium.com/ai-tdd-you-write-tests-ai-generates-code-c8ad41813c0a

[–] [email protected] 5 points 2 weeks ago

I do agree it's not realistic, but it can be done.

I have to assume the people that allow the AI to generate 10,000 answers expect that to be useful in some way, and am extrapolating on what basis they might have for that.

Unit tests would be it. Usually QA has a lot of back and forth with the programmers. Here, by contrast, QA can just throw away a failed solution, with no need to iterate on it.

I mean, consider the quality of AI-generated answers. Most will fail with the most basic QA tools, reducing 10,000 to hundreds, maybe even just dozens of potential successes. While the QA phase becomes more extensive afterwards, it's feasible.

All we need is... Oh right, several dedicated nuclear reactors.

The overall plan is ridiculous, overengineered, and solved by just hiring a developer or 2, but someone testing a bunch of submissions that are all wrong in different ways is in fact already in the skill set of people teaching computer science in college.

[–] [email protected] 2 points 2 weeks ago

Especially for programming, you definitely don't need to be right the first time and of course you should never run your code in a production environment for the first time. That would be absolutely reckless.

[–] [email protected] 0 points 2 weeks ago (1 children)

TDD, Test Driven Development. A human writes requirements and, with the help of the AI, derives tests from the requirements. The AI writes code until the tests no longer fail.

[–] [email protected] 0 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Yeah, go ahead try that and see how it works out for you. Test driven development is one thing, having an AI try to write the correct code for you by blindly writing code is idiotic.

[–] [email protected] 1 points 1 week ago (1 children)

Why is it idiotic? Your tests will let you know if it is correct. Suppose I have 100 interface functions to implement. I let the AI write the boilerplate and implementations and get a 90% pass rate after a few revision loops in which errors are fed back into the LLM to fix. Then I spend a small amount of time sorting out the last 10%. This is a viable workflow today.
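The revision loop described here could look roughly like the sketch below. It is not any particular tool's API: `fake_llm` is a hypothetical stub that returns a better draft once it sees failure messages, `safe_div` is an invented example function, and a real workflow would call an actual model and sandbox the generated code rather than `exec` it directly.

```python
from typing import List, Optional, Tuple

Case = Tuple[tuple, object]  # (positional arguments, expected result)

def run_tests(source: str, name: str, cases: List[Case]) -> List[str]:
    """Load the draft and return human-readable failure messages."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: sandboxed in real use
    except Exception as exc:
        return [f"failed to load: {exc!r}"]
    fn = namespace.get(name)
    if not callable(fn):
        return [f"{name} is not defined"]
    failures = []
    for args, expected in cases:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"{name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{name}{args} returned {got!r}, expected {expected!r}")
    return failures

def fake_llm(prompt: str, feedback: List[str]) -> str:
    """Hypothetical model stub: improves its draft once it sees test errors."""
    if not feedback:
        return "def safe_div(a, b):\n    return a / b\n"
    return "def safe_div(a, b):\n    return a / b if b != 0 else None\n"

def revision_loop(
    prompt: str, name: str, cases: List[Case], max_rounds: int = 3
) -> Tuple[Optional[int], str]:
    """Regenerate with test failures fed back until the tests pass."""
    feedback: List[str] = []
    source = ""
    for round_no in range(1, max_rounds + 1):
        source = fake_llm(prompt, feedback)
        feedback = run_tests(source, name, cases)
        if not feedback:
            return round_no, source
    return None, source  # still failing: hand this one to a human

cases = [((6, 3), 2.0), ((1, 0), None)]
rounds, final_source = revision_loop("implement safe_div", "safe_div", cases)
print("passed after round:", rounds)
```

Functions that still fail after `max_rounds` are the "last 10%" that would be sorted out by hand.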

[–] [email protected] 0 points 1 week ago (1 children)

AI training takes forever. I don't think you realize how long AI training actually takes. It's not a 5 minute exercise.

[–] [email protected] 1 points 1 week ago

Sure but the model is already trained. I’m not talking about using any sort of specialized model.