Install LocalAI and make sure it’s actually using GPU acceleration. It’s one of the best solutions we have at the moment.
Are you sure you’re not running these small models on CPU with no acceleration? I’m running the same kind of small models pretty quickly: nearly instant responses on an NVIDIA Titan Xp from a gaming rig I built around 2017.
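
One quick way to rule out a CPU-only setup is to check whether the NVIDIA driver can see your card at all. Here's a rough Python sketch, assuming `nvidia-smi` got installed along with the driver. The function names are just mine, not anything from LocalAI, and it only tells you the GPU is visible to the system, not that LocalAI is actually offloading to it.

```python
import shutil
import subprocess


def parse_gpu_list(text):
    """Return the lines of `nvidia-smi -L` output that describe a GPU."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("GPU ")
    ]


def visible_gpus():
    """List GPUs the NVIDIA driver reports, or [] if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No nvidia-smi on PATH: driver probably not installed.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("Driver sees:", *gpus, sep="\n  ")
    else:
        print("No NVIDIA GPU visible; models will most likely run on CPU.")
```

If your card shows up here, the next thing I'd do is keep `nvidia-smi` open while a prompt runs and see whether GPU memory and utilization go up. If they stay flat, the model is probably still running on CPU.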