this post was submitted on 13 May 2024
Technology
AI in web search is not going anywhere. Deal with it.
This force is unstoppable, whether you like it or not. So you better spend your time adapting.
They already ruined web search with SEO. Now it just won't be worth searching for websites at all. We can either accept whatever nonsense the syntax generator spits out, untethered from fact, or we can stop looking altogether.
That's what you mean by adapt, right? Accept not having access to real information ever again?
I’m not using “AI” in web searches no matter how much any VC bro’s golden parachute depends on it, sorry. Refusing to partake or even using tools to filter out LLM trash are perfectly fine ways to adapt to search engines leaning on AI hype to try to convince you that their inability to combat SEO spam is good, actually.
I can believe that it won't happen in 2024.
I am pretty confident that in the long run, pretty much everyone is gonna wind up there, though. Like, part of the time spent searching goes to identifying information on the page and combining it from multiple sources. Having the computer do that is gonna be faster than a human.
There are gonna be problems, like attributability of the original source, poisoning AIs via getting malicious information into their training data, citing the material yourself, and so forth. But I don't think that those are gonna be insurmountable.
It's actually kind of interesting how much using something like an LLM looks like the Librarian in the cyberpunk novel Snow Crash. The AI there was very explicit that it didn't have reasoning capability, could just take natural-language queries, find information, combine it, and produce a human-format answer. But it couldn't make judgement calls or do independent reasoning, and regularly rejected queries that required that.
Though that was intended as an academic tool, not something for the masses, and it was excellent at citing sources, which the existing LLM-based systems are awful at.
I'd like to share your optimism, but what you suggest leaving us to "deal with" isn't "AI" (which has been present in web search for decades as increasingly clever summarization techniques...) but LLMs, a very specific and especially inscrutable class of AI which has been designed for "sounding convincing", without care for correctness or truthfulness. Effectively, more of people's time will be wasted reading invented or counterfeit stories (with no easy way to tell); first-hand information will be harder to source and acknowledge as it gets increasingly diluted in the AI-generated noise.
I also haven't seen any practical advantage to using LLM prompts vs. traditional search engines in the general case: you end up typing more, for the sake of "babysitting" the LLM, and get more to read as a result (which is, again, aggravated by the fact that you are now given a single-source, one-sided view on the matter, without citations, references, or reproducible steps leading to the conclusion).
Last but not least, LLMs are an environmental disaster in the making: the computational cost is enormous (in new hardware and electricity), and we are at a point where all companies partaking in this new gold rush are selling us a solution in need of a problem, every one of them having to justify the expenditure (so far, none is making a profit out of it, which is the first step towards offsetting the incurred pollution).
I think that I'd put it in a slightly less-loaded way, and say that an LLM just produces content that has similar properties to its training content.
The problem is real. Frankly, while I think that there are a lot of things that existing LLM systems are surprisingly good at, I am not at all sure that replacing search engines will be it (though I am confident that in the long run, some form of AI system will be).
What you can't do with systems like the ones today is take one data source and another data source that have conflicting information, have the LLM-using AI create a "deep understanding" of each, and then evaluate which is more likely truthful in the context of other things that have been accepted as true. Humans do something like that (and the human approach isn't infallible either, though I'd call it a lot more capable).
But that doesn't mean that you can't use heuristics for estimating the accuracy of data, and that might be enough to solve a lot of problems. Like, I might decide that 4chan should maybe have less weight as a source, or that text that ranks highly on a "sarcastic" sentiment analysis program should have less weight. And I can train the AI to learn such weightings based on human scoring of the text that it generates.
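A rough sketch of what that kind of weighting could look like. Everything here is made up for illustration: the domain names, the weight numbers, and the `sarcasm` score, which I'm assuming comes from some separate sentiment classifier. In a real system the weights would be learned from human scoring rather than hard-coded.

```python
from dataclasses import dataclass

# Hypothetical per-source trust weights (illustrative numbers, not measured).
SOURCE_WEIGHTS = {
    "encyclopedia.example": 1.0,
    "4chan.org": 0.3,
}
DEFAULT_WEIGHT = 0.6  # assumed weight for sources we know nothing about


@dataclass
class Passage:
    text: str
    source: str     # domain the passage came from
    sarcasm: float  # 0.0 (sincere) to 1.0 (sarcastic), from an assumed classifier


def passage_weight(p: Passage) -> float:
    """Combine source trust with a sarcasm penalty into one weight."""
    base = SOURCE_WEIGHTS.get(p.source, DEFAULT_WEIGHT)
    return base * (1.0 - p.sarcasm)


def rank(passages: list[Passage]) -> list[Passage]:
    """Order passages from most to least trusted."""
    return sorted(passages, key=passage_weight, reverse=True)
```

The multiplication is just the simplest way to combine the two signals; nothing says a sum or a learned model wouldn't work better.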
Also, I'm pretty sure that an LLM-based system could attach a "confidence rating" to text it outputs, and that might also solve a lot of issues.
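One crude way to get a number like that, assuming the system exposes a log-probability for each token it generated (some do, some don't), is to take the geometric mean of the token probabilities. The function name is mine, and this is only a proxy: it measures how sure the model was about its wording, not whether the statement is true.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of per-token probabilities, in the range (0, 1].

    token_logprobs is assumed to hold the natural-log probability the
    model assigned to each token it emitted.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)
```

Averaging in log space keeps long answers from being penalized just for being long, which a plain product of probabilities would do.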
I'm currently wondering what their plans are for updating these LLMs.
Who wants to create the content to feed these machines without recognition, compensation, or a sense of doing some 'good'? If I were to maintain a blog on a mildly obscure but important topic, would I devote the time just to have ChatGPT or Copilot make a summary?
Now, the LLMs need to ingest a lot more than 'one blog'... If someone knows, please let me know.
I doubt this crazy effort with such resource consumption is to create a snapshot of what the internet was in the 2020s.
Yeah, but you can disagree with the way Google does it and use alternatives. One of those alternatives could be a non-AI alternative.