Ask Lemmy

LLM queries for personal pdf libraries? (sh.itjust.works)

submitted 5 months ago by [email protected] to c/[email protected]

16 comments

So Perplexity can kind of weakly analyze the first few pages of small PDFs, one at a time. But I'd love to have something that lets me upload several hundred research papers and textbooks, analyzes them for consensus and contradictions, and gives me more meaningful search results and summaries than keyword searching alone. Does anything like this exist in a fairly user-friendly, accessible format?


[–] [email protected] 1 points 5 months ago* (last edited 5 months ago) (1 children)

I don't think you can use Retrieval-Augmented Generation or vector databases for a task like that, at least not if you want to compare whole papers and not just a single statement or fact. And single statements are what most tools are focused on. As far as I know, the tools that deal with big PDF libraries are meant to retrieve specific information from the library that's relevant to a specific question from the user. If your task is to go through the complete texts, it's not the right tool, because it's made to only pick out chunks of text.
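To illustrate what I mean by "only pick out chunks", here's a toy sketch of the retrieval step. It's not how any particular tool does it: real systems use learned embeddings and a vector database, while this uses plain word counts and cosine similarity so it runs without dependencies. The function names and the chunk size are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=40):
    """Split a text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, papers, k=3, size=40):
    """Return the k chunks most similar to the question.

    `papers` maps a paper name to its full text. Each result is a
    (score, paper_name, chunk_text) tuple. Everything outside the
    top k chunks is never shown to the model.
    """
    q = Counter(tokenize(question))
    scored = []
    for name, text in papers.items():
        for piece in chunk(text, size):
            scored.append((cosine(q, Counter(tokenize(piece))), name, piece))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

The model then only gets to see those few chunks plus the question, so anything that needs the whole paper (or all papers at once) falls outside what this approach can answer.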

I'd say you need an LLM with a long context length, like 128k tokens or way more, so you can fit all the texts in and add your question. Or you come up with a clever agent: make it summarize each paper individually or extract facts, then feed those results back and let it search for contradictions, or do a summary of the summaries.
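A rough sketch of that second idea (summarize each paper, then compare the summaries) could look like this. The `llm` argument is a placeholder for whatever model you'd actually call, and the prompts and function names are just assumptions for illustration, not a tested recipe.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def summarize_each(papers: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """First pass: one model call per paper, extracting its main claims."""
    summaries = {}
    for name, text in papers.items():
        prompt = (
            "List the main claims of this paper as short bullet points:\n\n"
            + text
        )
        summaries[name] = llm(prompt)
    return summaries


def compare(summaries: Dict[str, str], question: str, llm: LLM) -> str:
    """Second pass: one model call over all summaries together."""
    joined = "\n\n".join(
        f"## {name}\n{summary}" for name, summary in summaries.items()
    )
    prompt = (
        "Below are claim summaries of several papers.\n"
        "Point out where they agree and where they contradict each other, "
        "naming the papers involved.\n"
        f"Question: {question}\n\n{joined}"
    )
    return llm(prompt)


def analyze_library(papers: Dict[str, str], question: str, llm: LLM) -> str:
    """Summarize every paper, then ask about consensus and contradictions."""
    return compare(summarize_each(papers, llm), question, llm)
```

Each paper still has to fit into the context window for the first pass, and with several hundred papers the combined summaries might not fit into the second pass either, so you'd probably need another layer of grouping in between.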

(And I'm not sure if AI is up to the task anyway. Doing meta-studies is a really complex task, done by highly skilled professionals in a field. And it takes them months... I don't think current AI's performance is anywhere near that level. It's probably going to make something up instead of outputting anything that's related to reality.)

[–] [email protected] 2 points 5 months ago (1 children)

Check out Afforai. It's not perfect at all, but it is on track to do what I want.

[–] [email protected] 1 points 5 months ago

Ah, nice. Thanks for sharing.