RE: Chasing shadows: Is AI text detection a critical need or a fool's errand?
I find the entire LLM and AI landscape fascinating. I’ve already set up my own privateGPT, which allows me to use an uncensored LLM, and I’m now experimenting with training it on my own dataset, which is going very badly.
GPT-4 (paid-for, which I don’t have) allows people to upload their own training data, and that data can be generated by running a web scraper over any website (the scrapers are all available for free on GitHub). This is why I can’t see any detection tool ever being reliable. You can even program it to “talk like (name)”, where (name) could be the person you’ve trained it to be.
As you suggest, it’s like trying to identify whether maths homework was done with a calculator or not.
Even the idea of getting an identifiable “watermark” would be easily circumvented via privateGPTs.
It’s probably a very interesting area to research if you’re lucky enough to get paid to do it!
I tried setting up LLM Studio but it crushed my computer. One of these days, I'll modernize my hardware and try it again. The censorship, CYA, and scolding from big-tech's free implementations are insufferable.
To me, this is the biggest tragedy of the way that many of Steem's high-powered curators are curating. We're missing out on a massive opportunity that we should have been perfectly positioned for. If curators were valuing posts correctly, Steem would be an ideal platform for AI training and implementation, which could massively increase the value of the tokens. Instead, they're flooding the database with noise that's basically worthless if you're training for things that appeal to humans, or for anything useful for that matter.
Agreed. On the "detection" side and on the "adversarial attack" side.
I haven't tried LLM Studio yet - do you have the option of using your GPU for processing?
The PrivateGPT that I set up was much quicker with the GPU, even on my laptop. It was a pain to set up and took me pretty much a full day, because the GPU needed CUDA in my WSL environment. It was so tricky that I deleted it all and gave up, but I tried a second time, which went more smoothly.
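If it helps anyone going through the same WSL pain, a quick sanity check is to see whether the NVIDIA driver tooling is even visible from inside the environment before blaming the LLM software. This is only a rough sketch: it assumes an NVIDIA card and that `nvidia-smi` is on the PATH when the driver is set up properly, and the function name `visible_nvidia_gpus` is just something made up for the example.

```python
import shutil
import subprocess


def visible_nvidia_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not on PATH, so the driver setup is not visible from this shell.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        # nvidia-smi exists but could not talk to a driver or device.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling `visible_nvidia_gpus()` and getting an empty list back suggests the model will fall back to CPU-only speeds. It says nothing about whether a particular tool was actually built with CUDA support, so treat it as a first check only.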
A very quick way to get a local GPT would be to use Ollama. There are a few uncensored models available, and it takes about 3 or 4 commands to install and run.
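Once Ollama is running, you can also talk to it from a script rather than the terminal. Here's a minimal sketch, assuming the server is listening on its default local port (11434) and that you have already pulled a model; the helper names `build_request` and `generate` are made up for the example, and the model name you pass in is whatever you installed.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed, not configured here).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120.0):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the reply is one JSON object whose text is in "response".
    return body["response"]
```

Something like `print(generate("llama3", "Say hello in five words."))` should then work, with `"llama3"` standing in for whichever model you pulled. Only the standard library is used, so there is nothing extra to install on the Python side.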
That's perhaps reflective of the majority-community perspective that AI-generated content is bad. Full stop. There are only a handful of us now who want to do something different; the herd appear to be happy with their diary games and engagement challenges and choose not to look (or think) beyond that. Maybe they aspire to be a curator or representative themselves, or to run a community. Nothing beyond "the template".
I don't remember, and I've already uninstalled it. If I get motivated, maybe I'll reinstall it and check. My PCs are so old, I think I'll need to modernize, though.
Yeah, true. And, to be fair, AI right now is very easy to misuse here. It's a conundrum.