AI agents will solve CAPTCHAS
i'm quitting my job over a $5 bet.
Last week my friend and I had a humorous brainstorm about what the role of humans will be in the agentic age. we wanted to see what tasks will still be done by humans when ai can, essentially, do it all.
structurally, the conversation revolved around one of us naming something that humans (might) still do, and then we both would ponder if that's true or not.
it went something like this:
me: we'll still create art
friend: sure, but artificial art will be indistinguishable from human art.
me: true, true.
friend: we'll still deliver news to other humans, so journalism will be ever-human.
me: actually i've already built an ai agent that scrapes the internet and feeds me news daily, cuz i was tired of having to rely on biased legacy news & social media. it has ~40 daily active users (btw if the reader is interested -> https://www.theaitimes.net/ )
fuck.
me: what about, um, chefs! humans have cooked for each other since the beginning of time and we will neve..
friend: robots.
me: huh
friend: theyll cook better, faster, cheaper and more optimized. we'll be walking into the first robot-operated restaurant in no time.
me: fuck.
after a couple dozen such back n forths we both came to the conclusion that all humans will be good for is solving CAPTCHAS
thats it.
this is what our caveman grandparents got laid for. so we could solve CAPTCHAS.
sad as it seems, after an hour of this back n forth we were ready to stop thinking about our uselessness and go out to the local brewery and be sad there. at least we humans will be able to drown our sorrows in liquid poison served with fresh lime. even if its served by robots.
WAIT.
friend: Saif, shut the f*ck up.
Saif (me): are you thinking what im thinking
friend: no im thinking of a corona. stop it. dont do it.
me: what if i trained an agent to solve captchas for us?
friend: first of all thats not even possible, second the bar closes at 1am lets go.
but i knew it was possible. when my art teacher told me at 7 yrs old that you cant make rainbows indoors, i taped glass prisms to a flashlight in my room and proved him wrong.
gonna sound cliche but i love it when the world thinks something isnt possible. its just another chance for me to do it. if i fail, at least i tried.
so i said fuck the bar. im gonna train an ai agent to solve CAPTCHAS. if humanity's role is inevitably reducing to low standards, let it at least be creative. i dont want my kids being captcha engineers no fuck that.
friend: dude captchas are literally made to detect bots and llms barely produce good writing in images, what makes you think theyll be able to understand funky letters.
bet? bet.
he bet $5 that I couldn't train an agent to solve CAPTCHAS in one night.
so i got to work.
idk how technical the reader of this essay will be so ill keep it as brief as possible.
at its core, i only needed 2 things:
thing #1: an agent that can learn from data
thing #2: data
thing #1 is super simple. tons of public agents out there you can immediately start training. i used browser-use with its own api key, then a/b tested that against my own OpenAI api key (model: o3). results were better with o3, so i settled on that.
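for the technical reader, a rough sketch of what an a/b test between two agent backends could look like. it is not the setup from that night: `Solver`, `accuracy` and `ab_test` are hypothetical names, each solver is assumed to be any function that takes an image path and returns its guess as text, and exact-match scoring is an assumption too.

```python
from typing import Callable, Dict, List, Tuple

# a solver takes the path to a captcha image and returns its guess as text
# (hypothetical interface; wrap whichever agent you are testing in one of these)
Solver = Callable[[str], str]
Sample = Tuple[str, str]  # (image_path, expected_text)


def accuracy(solver: Solver, samples: List[Sample]) -> float:
    """share of samples where the guess matches the expected text exactly."""
    if not samples:
        return 0.0
    hits = sum(1 for path, expected in samples if solver(path).strip() == expected)
    return hits / len(samples)


def ab_test(solvers: Dict[str, Solver], samples: List[Sample]) -> Tuple[str, Dict[str, float]]:
    """score every solver on the same samples and return (winner, scores)."""
    scores = {name: accuracy(fn, samples) for name, fn in solvers.items()}
    winner = max(scores, key=lambda name: scores[name])
    return winner, scores
```

scoring both backends on the same held-out samples keeps the comparison fair. whichever name comes back as the winner is the one you settle on.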
thing #2 is hard. how do u even get data on how to solve captchas? like any human, my mind first went YOUTUBE VIDEOS!
what if there are some yt videos out there that show people solving hard captchas.
problem with this is a) why would there be such a video, the idea itself seems stupid. b) videos are heavy and will cost me more than $5 to even train lol so defeats the purpose.
what about some pdf in some corner of the internet from 2007 that has a bunch of solved captchas? if i can find this lonesome corner of the internet i may be able to feed it to my model.
after looking around for hours i finally found a dataset of 1000s of solved captchas on kaggle.
downloaded this dataset, fed it to my local model first and asked it to create a json that i could feed to my ultimate model as a guide on finding patterns within captchas. things like "the letter T is usually curved but J is straight and enlarged" etc.
my local llama model was surprisingly good at this. it created a well structured json file with legit no hallucinations and tons of strong patterns. i took this json file along with the dataset i got from kaggle, and fed it to my agent.
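a minimal sketch of what a "pattern guide" json could look like. this is a deterministic stand-in, not the llama-generated file described above: it only counts characters and lengths in the solutions and knows nothing about letter shapes. it assumes each image file is named after its solution (e.g. "2b827.png"), which may not match the actual kaggle dataset, and every function name here is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List


def labels_from_dir(folder: str) -> List[str]:
    # assumption: each image is named after its solution, e.g. "2b827.png"
    return sorted(p.stem for p in Path(folder).glob("*.png"))


def summarise_labels(labels: Iterable[str]) -> Dict[str, object]:
    """collect simple text-level patterns from the solved captchas."""
    labels = list(labels)
    char_counts = Counter(ch for label in labels for ch in label)
    length_counts = Counter(len(label) for label in labels)
    return {
        "num_samples": len(labels),
        "alphabet": "".join(sorted(char_counts)),
        "length_counts": {str(n): c for n, c in sorted(length_counts.items())},
        "most_common_chars": [ch for ch, _ in char_counts.most_common(5)],
    }


def to_json(summary: Dict[str, object]) -> str:
    """serialise the summary so it can be handed to the agent as a file."""
    return json.dumps(summary, indent=2, sort_keys=True)
```

usage would be something like `to_json(summarise_labels(labels_from_dir("captchas/")))`, where the folder name is a placeholder.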
prompt: you are an expert captcha solver. heres a json file based on this dataset that lists common patterns found in solved captchas along with a dataset of 8k solved captchas. these files should serve as your primary source of context when solving captchas.
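a small sketch of how the prompt, the json guide and a slice of the solved examples could be glued into one block of context. the prompt text is the one quoted above. `build_context`, `max_labels` and the section headings are assumptions for illustration, and how the result actually gets passed to an agent depends on the agent.

```python
import json
from typing import Dict, List

# the prompt as quoted in the essay
PROMPT = (
    "you are an expert captcha solver. heres a json file based on this dataset "
    "that lists common patterns found in solved captchas along with a dataset of "
    "8k solved captchas. these files should serve as your primary source of "
    "context when solving captchas."
)


def build_context(
    pattern_guide: Dict[str, object],
    solved_labels: List[str],
    max_labels: int = 50,
) -> str:
    """join the prompt, the pattern guide and a capped sample of solutions."""
    guide_json = json.dumps(pattern_guide, indent=2, sort_keys=True)
    sample = solved_labels[:max_labels]  # cap the sample to keep the context small
    parts = [
        PROMPT,
        "pattern guide (json):",
        guide_json,
        f"solved examples ({len(sample)} of {len(solved_labels)}):",
        "\n".join(sample),
    ]
    return "\n\n".join(parts)
```

capping the examples is a cost choice: the guide carries the patterns, so the raw examples only need to be a sample.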
then i ran it.
i told it to go to [ https://captcha.com/demos/features/captcha-demo.aspx ] and solve it.
lo and behold. it did it.
it solved the captcha.
IT SOLVED THE F*CKING CAPTCHA DO YOU UNDERSTAND HOW INSANE THIS IS.
easiest $5 of my life.
youtube vid if youre interested in watching it solve the captcha: https://youtu.be/x4QYho0pZU0
but my overthinking doesnt stop there. now that we've made humans absolutely useless, why am i even wasting time making my company millions of dollars when these are my last 2-3 yrs before agi makes us the permanent underclass?
i dont know what ill build. i dont know what my company will be called. shit idek how to raise money.
but day 2 in sf feels a lot cooler than being stuck at my shitty 9-5 job writing code for a ceo.
idk if it was the right decision to move to sf.
i wanted to be around cool interesting ppl, likeminded or not, who are at least willing to see the humour in our world. and do something about what's coming.
looking around this writing club, i think i made the right decision.
i think we'll be fine.
saif.build
