antifuchs

joined 7 months ago
 

Got the pointer to this from Allison Parrish, who says it better than I could:

it's a very compelling paper, with a super clever methodology, and (i'm paraphrasing/extrapolating) shows that "alignment" strategies like RLHF only work to ensure that it never seems like a white person is saying something overtly racist, rather than addressing the actual prejudice baked into the model.

 

School student tells AI to put 20 other students’ faces on nude pictures, shares them in chat; it takes months for anyone, including the school administrators, to act because of some extremely, uh, dubious loophole.

If someone does that in Photoshop, it’s a crime; if they do it in AI pretending to be Photoshop, it’s somehow not. Gotta love this legal system’s focus on minor technicalities rather than the harm done.