this post was submitted on 28 Jan 2024
65 points (87.4% liked)

Technology


Google’s new video generation AI model Lumiere uses a new diffusion model called Space-Time U-Net, or STUNet, that figures out where things are in a video (space) and how they simultaneously move and change (time). Ars Technica reports this method lets Lumiere create the video in one process instead of stitching smaller still frames together.
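To make the "space plus time" idea concrete, here is a toy illustration (entirely my own, not code from the paper): a space-time layer in the STUNet spirit processes a clip along its spatial axes *and* its temporal axis, rather than treating each frame independently. The function names and pooling choices are invented for this sketch.

```python
import numpy as np

def spatial_op(clip):
    # Per-frame 2x2 average pooling over H and W (the "space" part).
    f, h, w = clip.shape
    return clip.reshape(f, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def temporal_op(clip):
    # Averaging over adjacent frame pairs (the "time" part), halving the
    # clip length -- a stand-in for STUNet downsampling time as well as space.
    return (clip[0::2] + clip[1::2]) / 2

clip = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4)  # 8 frames of 4x4
out = temporal_op(spatial_op(clip))
print(out.shape)  # (4, 2, 2): both space and time were downsampled
```

A frame-by-frame model would only ever apply `spatial_op`; downsampling and denoising along the frame axis too is what lets the network reason about motion directly.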

Lumiere starts by creating a base frame from the prompt. Then it uses the STUNet framework to approximate where objects within that frame will move, generating additional frames that flow into one another to create the appearance of seamless motion. Lumiere generates 80 frames, compared to 25 from Stable Video Diffusion.
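The "one process" claim can be sketched as a diffusion sampler that denoises an entire video tensor jointly, so every frame shares every denoising step. This is a toy sketch under my own assumptions (the denoiser below is a trivial stand-in, not a trained network, and all names are invented); it only shows the shape of the computation.

```python
import numpy as np

def toy_denoiser(x, t):
    # Stand-in for a trained space-time U-Net: nudges the noisy sample
    # toward a fixed all-zeros "clean" target so the loop converges.
    return x / max(t, 1)  # "predicted noise", toy version

def sample_video(shape=(80, 64, 64, 3), steps=10, seed=0):
    """Denoise one (frames, H, W, C) tensor jointly -- all frames move
    through every reverse-diffusion step together, which is what keeps
    motion temporally coherent versus stitching independent frames."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure noise
    for t in range(steps, 0, -1):
        eps = toy_denoiser(x, t)
        x = x - eps  # one reverse step over the whole clip at once
    return x

video = sample_video()
print(video.shape)  # the full 80-frame clip comes out of a single loop
```

A per-frame pipeline would instead run this loop 80 separate times and then try to reconcile the results, which is where flicker and inconsistency creep in.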

Beyond text-to-video generation, Lumiere will also allow for image-to-video generation; stylized generation, which lets users make videos in a specific style; cinemagraphs, which animate only a portion of a video; and inpainting, which masks out an area of the video to change its color or pattern.

Google’s Lumiere paper, though, noted that “there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases to ensure a safe and fair use.” The paper’s authors didn’t explain how this can be achieved.

Synopsis excerpted from The Verge article.

top 7 comments
[–] [email protected] 26 points 9 months ago* (last edited 9 months ago)

Having used diffusion a bit for static images, I can only look forward to the eldritch horrors it will inevitably create.

[–] [email protected] 11 points 9 months ago (2 children)

Will never be usable by the public

[–] [email protected] 9 points 9 months ago (1 children)

It's still driving the state of the art forward, which will result in models that will be used by the public.

[–] [email protected] 3 points 9 months ago (1 children)

Right? Once the model and training methods are published in some journal, the only barrier becomes the hardware to run it.

Which, going by Stable Diffusion and the like, is really a matter of VRAM. Have enough of that, and this should be possible.

[–] [email protected] 2 points 9 months ago

Indeed. Often the hardest part of an invention is the discovery that a thing is actually possible. Even if nobody knows how it was done, others can now justify throwing resources into figuring it out and know what results to keep an eye out for.

[–] [email protected] 5 points 9 months ago

It's almost like most of the time in history cutting edge tech tended to be unusable by the public until it matured enough to get businesses interested. Then they'd invest in a usability layer that was unimportant to the cutting edge research.

[–] [email protected] 2 points 9 months ago

Here is an alternative Piped link(s):

https://piped.video/f9ThAzZs32M?si=0c47ifUhmoQTRdcD

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source; check me out at GitHub.