People paid to train AI are outsourcing their work… to AI

No wonder some of them turn to tools like ChatGPT to maximize their earning potential. But how many? To find out, a team of researchers from the Swiss Federal Institute of Technology (EPFL) hired 44 people on the Amazon Mechanical Turk gig work platform to summarize 16 extracts from medical research papers. They then analyzed the responses using an AI model they had trained themselves to look for telltale signs of ChatGPT output, such as a lack of variety in word choice. They also examined the workers’ keystrokes to work out whether responses had been copied and pasted — an indicator that the text had been generated elsewhere.
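As a rough illustration only — not the EPFL team’s actual detector — one simple signal a classifier like theirs could draw on is low lexical variety, which can be approximated with a type-token ratio (distinct words divided by total words). The function names and the threshold below are invented for this sketch:

```python
# Illustrative sketch of one feature an AI-text detector might use:
# lexical variety, measured as a type-token ratio. A real detector,
# like the one described in the study, would combine many learned
# features rather than a single hand-picked threshold.
import re


def type_token_ratio(text: str) -> float:
    """Ratio of distinct words to total words; lower values mean
    more repetitive word choice."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def looks_machine_written(text: str, threshold: float = 0.5) -> bool:
    # The 0.5 cutoff is arbitrary, chosen only for this example;
    # a trained model would learn such boundaries from labeled data.
    return type_token_ratio(text) < threshold


repetitive = "the study shows the study shows the study shows"
varied = "researchers hired crowd workers to condense medical abstracts"
print(type_token_ratio(repetitive))   # low ratio: only 3 distinct words
print(looks_machine_written(varied))  # varied wording is not flagged
```

In practice, single surface features like this are weak on their own, which is why the researchers paired their text classifier with keystroke analysis to catch copy-and-paste behavior.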

They estimated that between 33% and 46% of workers had used AI models such as OpenAI’s ChatGPT. That share is likely to grow as ChatGPT and other AI systems become more powerful and more accessible, according to the authors of the study, which was shared on arXiv and has yet to be peer reviewed.

“I don’t think it’s the end of crowdsourcing platforms. It just changes the dynamics,” says Robert West, an assistant professor at EPFL who co-authored the study.

Using AI-generated data to train AI could introduce further errors into already error-prone models. Large language models regularly present false information as fact. If they generate faulty output that is in turn used to train other AI models, the errors can be absorbed by those models and amplified over time, making their origins increasingly difficult to trace, says Ilia Shumailov, a junior computer science researcher at the University of Oxford, who was not involved in the project.

Even worse, there is no simple fix. “The problem is, when you use artificial data, you capture the errors from model misunderstandings and statistical errors,” he says. “You have to make sure your mistakes don’t distort the output of other models, and there’s no easy way to do that.”

The study highlights the need for new ways to verify whether the data was produced by humans or artificial intelligence. It also highlights one of the problems with the trend for tech companies to rely on gig workers to do the vital job of tidying up data fed to AI systems.

“I don’t think it will all come crashing down,” West says. “But I think the AI community will need to closely investigate which tasks are more prone to being automated and work on ways to prevent that.”
