Local Cricket Stats Processing
As of a couple of years ago, I play cricket in beautiful Victoria BC. The website for the Victoria District Cricket Association when I joined was a bit of a shambles, and last season I volunteered some time with a couple of other guys to help overhaul it. Buckle up, this is a post about building a website and doing data migrations. A fun time will be had by all!
The task itself involved moving away from an integrated website/database solution, run and supported (poorly) by one guy, and across to a better platform that allowed us a lot more flexibility, at a much lower cost. We had to choose a new scorecard database provider and build a new website around the functionality that it provided. Oh, and of course we'd need to migrate 12 years' worth of scorecard data from the previous format to the new structure.
The migration itself wasn't particularly difficult. The new database provider, Cricketstatz, actually did a lot of the legwork in migrating across the scorecards. They provided functionality to convert a bunch of popular formats to their in-house standard. With a little bit of Python scripting and the help of Selenium (which is magic, by the way), we were able to move all of the data across with minimal loss.
For migration of the website, we settled on Wix as the provider, given that it's one of the best products for non-technical folk going around. That process involved a bit of work (some of which is still ongoing) around design and customizing how the site would look. One aspect of this required us to do a bit of UI/UX work was around how people wanted to interact with the stats pages. This is where we started to run into headaches.
We have the (apparently) unique situation in our organization where the junior players are allowed to play for multiple senior clubs. In any given week, our U19 players may play for up to 3 teams. On the flipside, our U19 senior team, the Colts, allow up to 2 overage players to suit up for their team each week. This is for developmental purposes and to help the juniors learn from experienced cricketers from different teams (at least, that's what our PR group would have you believe). Unfortunately, our new scorecard and statistics provider doesn't easily let us query stats by player and team such that we can judge seasonal awards on a per-team basis. The Cricketstatz API's and canned reports couldn't easily support this requirement, and our existing stats pages were a bit of a shambles as a result. Cue our need to create a secondary stats database and API for this purpose.
I quickly sketched out a reasonably decoupled architecture for the system based on doing twice-daily data migrations from the Cricketstatz scorecard and player database into our secondary statistics database. See below for a horribly low-res version.
We're running a Docker container on a DigitalOcean droplet (VM) with a small, self-contained SQLite database that is updated every 6-12 hours from the main statistics database provided and managed by Cricketstatz. The data update process is written in Python leveraging a few useful libraries (BeautifulSoup, SQLAlchemy, requests, pandas) and implements a full ORM for interacting with the database. It pulls the appropriate reports from Cricketstatz and populates or updates records found in our stats database. It's run on a schedule using cron inside the container. On the API side it's super lightweight with a handful of methods for returning the bowling, batting and fielding stats per the SQLAlchemy models. It's written in Python using Flask and Flask-RESTful libraries. I would be remiss if I didn't include a shout-out to this repo which provides a suite of Docker images/scripts for quickly standing up a Flask application with UWSGI and NGINX. It meant I could go from nothing to a functional API for testing in about 10 minutes.
As noted above, we see a number of benefits by implementing such a solution:
1. Old iFrame implementation of stats pages requires us to pull in about 1-dozen Cricketstatz reports, do data manipulation and presentation client-side and render into our table manually. This takes 4-8 seconds and results in high latency for loading pages. New version simply calls an API to return and render processed data in < 1 second.
2. Custom filtering (seasons, teams, grades, etc.) is difficult due to the way the Cricketstatz reports work. Now we can implement this in the API and consolidate all stats into one page.
3. Previously we had HTML iFrame code for 6+ separate stats pages that needed to be manually updated every time we changed things. Now we have an API that can be redeployed at will and one front-end code base contained in Wix. This means that maintenance is much less of a burden.
4. The behaviour of the iFrame implementation on mobile was unstable. The datatables plugin that we used had issues running embedded in iFrames. Now we use the standard Wix table plugin that is mobile-responsive.
We're not yet entirely up-and-running with this solution, but we're hoping to be by the time pre-season training starts up in April. I'll update this post with details once we go live. For the time being, you can check out the VDCA website as it currently stands, with the old stats pages intact. The code for the new solution is available on my Github page.