The other day I was trying to come up with an idea for a new side project. It was proving difficult. Things are pretty slow at work right now as we ramp up at the new organization (Flexport now, for those following at home), and I find I'm spending my days doing mostly menial tasks related to administering various data platform tooling, not exactly solving the most interesting, complex problems. Perhaps it's a function of my brain leaking out my ears professionally, but here's the "idea" I came up with. Maybe it's my subconscious surfacing some terrifying hybrid of Colbert's "Is Potato" bit and Silicon Valley's dumbest app from the Pied Piper lads, but this is what I landed on. A 1c domain later and here we are, in all its spuddy glory.
Right now it's a really basic, borderline broken image classifier that attempts to answer the eternal question: Is Potato? Upload or link an image and it will tell you, in somewhat uncertain terms most likely, whether the image is a potato or not. I plan to use this as a place to experiment and refresh my memory on deep learning architectures with a spot to deploy and iterate on them, as it's been a while since I've done anything in this space professionally. I also have some other ideas about applications in this ridiculous field that I want to try out, like crop yield prediction and a chatbot that spouts potato-related trivia. My high school buddy Joey Sultana would be so proud of me.
Here's a little bit more about the model itself.
I built it using Keras, as I find it to be a nice, high-level API for building network architectures. It's really (I cannot stress this enough) basic right now, accepting a 300x300 pixel input and passing it through the very simple architecture below. I have about 1,000 training samples now, split about 50/50 potato vs. not-potato. There's another set of 60 I've used for validation and a handful of unseen images for testing. I obviously want to expand this set to include more samples, but the Bing image downloading library I used wasn't super reliable, and I ended up sourcing most of the not-potato samples from COCO anyway. Training is done using a really simple script running on my non-GPU-enabled machine using the TensorFlow backend. I plan to build a similar architecture in PyTorch at some point.
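As a rough illustration only, a small Keras binary classifier for 300x300 input could be put together like this. The layer choices, filter counts, the three-channel RGB input and the `build_model` name are all assumptions made for the sketch, not necessarily the architecture that is deployed.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_size=300):
    """Build a tiny CNN that outputs P(potato) for a single image.

    Assumption: RGB input of shape (input_size, input_size, 3).
    """
    model = keras.Sequential([
        keras.Input(shape=(input_size, input_size, 3)),
        layers.Rescaling(1.0 / 255),              # scale pixels to [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),          # keeps the parameter count small for CPU training
        layers.Dense(1, activation="sigmoid"),    # single probability: potato vs. not-potato
    ])
    model.compile(
        optimizer="adam",                         # default settings, nothing tuned
        loss="binary_crossentropy",
        metrics=[
            keras.metrics.Precision(name="precision"),
            keras.metrics.Recall(name="recall"),
        ],
    )
    return model


model = build_model()
model.summary()
```

Something this size should train on a CPU in reasonable time, which is the main constraint here.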
Again, I haven't exactly been super diligent here, as the concept is only a week old at this point. I tried to use the Flask/NGINX/uWSGI stack in Docker, but ran into a bunch of headaches with threading that meant the inference was mostly just hanging without giving me any useful exceptions to work with when debugging. Switching to Gunicorn seems to have solved those issues though, and it's a lot faster to boot. I'd actually forgotten how heinous using Docker on Windows is, but after a half hour of futzing about I was able to get things working. I'd also forgotten that the most basic tier of DigitalOcean "droplets" only gives you half a GB of RAM, which isn't even enough to install TensorFlow with pip, so I had to remember how to create a swapfile by following this guide.
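For anyone trying something similar, Gunicorn can read its settings from a `gunicorn.conf.py` file, which is plain Python. The values below are assumptions about what might be sensible on a half-GB box, not the config that is actually running.

```python
# gunicorn.conf.py (hypothetical file; every value here is an assumption)

bind = "0.0.0.0:8000"   # listen on all interfaces inside the container
workers = 1             # each worker loads its own copy of the model, so keep this low on 512 MB
threads = 1             # one request at a time per worker; avoids sharing the model across threads
worker_class = "sync"   # Gunicorn's default worker type
timeout = 120           # seconds; CPU inference plus swap can be slow
preload_app = False     # load the app (and the model) inside each worker rather than before forking
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the real Flask module and application object.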
Not gonna lie, it sucks. Honestly not surprising given the zero minutes I've spent trying to improve it so far. It seems to pick up the basic "vibe" of the potato (color, shape, etc.), but I think it's also overfitting to the white space surrounding some of the training images. In terms of metrics, it's bad. The smoothed validation precision is just OK at 0.82 and recall is completely in the toilet at 0.63, meaning it's producing a whole bunch of false negatives (i.e. potatoes that are being classified as not-potato), for example this one.
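As a reminder of what those two numbers measure, here is a small sketch of precision and recall computed from confusion counts. The counts are made up purely for illustration and are not the real validation results.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything called potato, how much was potato
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of all real potatoes, how many were found
    return precision, recall


# Hypothetical counts: 30 real potatoes, 20 found, 10 missed, plus 5 false alarms.
tp, fp, fn = 20, 5, 10
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints: precision=0.80 recall=0.67
```

Low recall with decent precision is the pattern described above: when the model says potato it is usually right, but it misses a lot of real potatoes.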
TensorBoard plots for training over 30 epochs are below, with training metrics in red and validation in blue. As you can see, it's overfitting and not generalizing especially well.
Next steps might be to look at:
- More data. My training set is tiny.
- Hyperparameter tuning. I haven't changed anything from the defaults yet, so there's likely some gain to be had here, especially given the erratic nature of the validation loss.
- Bigger architecture. While I want this to be trainable on CPU, the architecture I'm using is stupidly simplistic.
- Intelligent cropping or segmentation. I haven't looked into this yet, but isolating the potato from its background will likely improve performance.
- Post-hoc probability calibration. This might help adjust the decision boundary to give us a more appropriate threshold based on validation data.
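The decision-threshold part of that last item can be sketched in a few lines. Note that this is plain threshold selection on validation data rather than calibration proper (such as Platt scaling), and the `pick_threshold` name, the target recall and the example scores are all hypothetical.

```python
def pick_threshold(scores, labels, min_recall=0.9):
    """Return the highest threshold whose recall on the given data is at least min_recall.

    scores: predicted potato probabilities; labels: 1 for potato, 0 for not-potato.
    Falls back to 0.0 (call everything a potato) if there are no positives.
    """
    positives = sum(labels)
    if not positives:
        return 0.0
    # Recall can only rise as the threshold drops, so the first hit is the highest valid one.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        if tp / positives >= min_recall:
            return threshold
    return 0.0


# Made-up validation scores and labels, for illustration only.
scores = [0.95, 0.80, 0.45, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0]
print(pick_threshold(scores, labels, min_recall=0.75))  # prints: 0.45
```

Lowering the threshold this way trades precision for recall, so the precision at the chosen threshold is worth checking before shipping it.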
There's some tidy-up work to do in the app. It's saving files every time someone submits something, so I need a job to clean those up. It's still kinda ugly, so I'd like to fix the CSS (funsies). I'd also like to report on performance and build in a feedback mechanism, which would likely necessitate connecting to a proper database of some kind.
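The cleanup job could be as simple as the sketch below, which deletes files older than a cutoff from the upload directory. The function name, the one-hour default and the assumption that uploads all land in a single flat directory are hypothetical.

```python
import time
from pathlib import Path


def remove_stale_uploads(upload_dir, max_age_seconds=3600, now=None):
    """Delete regular files in upload_dir last modified more than max_age_seconds ago.

    Returns the names of the files that were removed. Subdirectories are left alone.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(upload_dir).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return removed
```

This could run from cron or a small scheduler inside the container, pointed at wherever the app writes its uploads.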