The Harvard Business Review reported that analysts spend over 80% of their time simply discovering or preparing data. A lack of strategy or architecture is costing you time, money and ultimately points on the pitch.
There are two big costs to not having proper data infrastructure around your analysis. The solution may vary but the costs remain the same.
- Time to insight: Without a process around your data collection, storage and retrieval you are likely wasting a huge amount of time producing reports, or perhaps even missing analysis altogether because of the time it takes.
- Data continuity: How often does data walk out the door when a key employee leaves? Your organisation should be treating the data it collects as a key business asset, and with that it should get the structure and attention it needs.
Your organisation likely spends large sums of money on collecting data in-house via analysts and sports science staff while also purchasing data from external sources. That data needs to be managed, stored correctly and treated like the valuable asset that it is. The head coach’s job might be to focus game-to-game, but who is looking after the long-term data strategy at your organisation?
When an organisation lacks robust data architecture, each organisational unit (sports science, analysis, medical) is forced to create its own data silo, and data management is often done in isolation and inconsistently. This process is inefficient and expensive.
Spreadsheets are (very often) not the answer:
For the better part of twenty years, Microsoft Excel has been the most popular spreadsheet application on the planet. While many articles have predicted its demise, it seems unlikely that it (or its younger cousin Google Sheets) is going anywhere in a hurry. But just because something is ubiquitous doesn’t make it the right choice.
Is a spreadsheet a database?
Spreadsheets and databases share common characteristics, and the advent of Google Sheets and Excel Online has blurred the lines between the two, but they are fundamentally different technologies.
The issue with spreadsheets is that they fulfil so many tasks. They allow you to store data, run calculations and even build complex reports all together in one place. However, we often don’t separate these tasks into different workbooks. The sheet we designed to store our data has now become our chief report generator, and all those calculations we’ve needed over the years have been slowly bolted onto the data as we went. There is rarely any documentation around what is what. This leads to costly errors and a lot of wasted time.
That same workbook likely has multiple copies floating around the organisation, often not synced with the latest version, with many people having added their own nuances in the form of additional calculations, filters or even edits to the data.
If the above description sounds like you, see the two costs at the top of this article: time to insight and data continuity. How much is this lack of a data strategy costing you in terms of resources or points on the pitch?
Reasons to have a database:
A relational database management system (RDBMS) standardizes the way data is stored and processed. RDBMS tables store data in a logical manner specifically designed to provide data integrity, reduce duplication, and minimize irregularities.
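As a rough illustration of what "reduce duplication" means in practice, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (players, training_sessions, distance_km) and the sample values are hypothetical, made up for this example. Each player’s name is stored once, and every session row points back to it by id rather than repeating the name.

```python
import sqlite3

# In-memory database for illustration only; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.executescript("""
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE training_sessions (
    session_id   INTEGER PRIMARY KEY,
    player_id    INTEGER NOT NULL REFERENCES players(player_id),
    session_date TEXT NOT NULL,
    distance_km  REAL NOT NULL
);
""")

# The player's details are stored once...
conn.execute("INSERT INTO players (player_id, name) VALUES (1, 'Player A')")

# ...and each session refers to the player by id instead of repeating the name.
conn.executemany(
    "INSERT INTO training_sessions (player_id, session_date, distance_km) "
    "VALUES (?, ?, ?)",
    [(1, "2024-01-08", 9.4), (1, "2024-01-10", 10.1)],
)
conn.commit()

# A join puts the two tables back together whenever a report needs them.
rows = conn.execute("""
    SELECT p.name, s.session_date, s.distance_km
    FROM training_sessions AS s
    JOIN players AS p ON p.player_id = s.player_id
    ORDER BY s.session_date
""").fetchall()
print(rows)
```

If a player’s name needs correcting, it is changed in one row of one table, rather than hunted down across every sheet that mentions it.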
Having a database usually means there is a single version (master copy) of the data. If you need to query some data for a report you know exactly where it is, that it’s up to date and that it’s the most reliable source the organisation has. While the database is administered by someone, it acts more as a central repository owned by the organisation and not a bunch of Excel files sitting on someone’s OneDrive.
If the database is built correctly it reduces the need to duplicate data and have similar data stored in multiple places. You also have much tighter controls on what data gets inserted into a database and who can perform these tasks, all leading to a cleaner, more organised structure to your data and ultimately reducing the time to analysis.
The other reason to have a database is the ease with which you can connect it to 3rd party applications or even build your own applications on top. For example, connecting a visualisation tool like Tableau is much easier if the data is structured and stored in a database.
For his book Sports Analytics, Ben Alamar surveyed 27 individuals from the NFL, NBA, MLB and the English Premier League and found that 1/3 had a dedicated database programmer on the sports side of the organisation. The book was published in 2013 and I’m sure the landscape has changed since then, but it would be interesting to see this survey conducted today in non-US sports.
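To make "tighter controls on what data gets inserted" a little more concrete, here is a small sketch, again using Python’s built-in sqlite3 module with hypothetical table names, columns and values. The rules live in the table definition, so a bad row is rejected no matter who tries to insert it. Controlling who can insert data is typically handled through user permissions in server-based databases, which SQLite does not have, so that part is not shown here.

```python
import sqlite3

# In-memory database for illustration only; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.executescript("""
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE wellness_scores (
    player_id   INTEGER NOT NULL REFERENCES players(player_id),
    score_date  TEXT NOT NULL,
    sleep_hours REAL NOT NULL CHECK (sleep_hours BETWEEN 0 AND 24),
    PRIMARY KEY (player_id, score_date)  -- one entry per player per day
);
""")
conn.execute("INSERT INTO players (player_id, name) VALUES (1, 'Player A')")
conn.commit()


def try_insert(player_id, score_date, sleep_hours):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO wellness_scores VALUES (?, ?, ?)",
                (player_id, score_date, sleep_hours),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "2024-01-08", 7.5),   # valid row
    try_insert(1, "2024-01-08", 8.0),   # second entry for the same player and day
    try_insert(1, "2024-01-09", 30.0),  # more hours of sleep than a day contains
    try_insert(99, "2024-01-09", 7.0),  # player that does not exist
]
print(results)  # [True, False, False, False]
```

In a spreadsheet all four rows would have been accepted silently; here only the first one makes it into the table.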
Communicating with a Database
Once the data is stored in a database you need a way to communicate with it. This is usually done using a language called SQL (pronounced sequel). It stands for Structured Query Language and it’s the primary language used to communicate with a database. The best explanation I’ve seen of how this works is the video below.
SQL is not a programming language, it’s a query language. It was originally created to let ordinary people, not just programmers, get the data they are interested in out of a database. It is also an English-like language, so anyone who can use English at a basic level can learn to write a simple SQL query fairly easily.
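To give a feel for how English-like a query can be, here is a minimal sketch using Python’s built-in sqlite3 module. The match_stats table and its figures are invented for this example. The query asks for each player’s total goals and minutes, counting only appearances of 60 minutes or more.

```python
import sqlite3

# In-memory database with made-up data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match_stats "
    "(player TEXT, match_date TEXT, minutes INTEGER, goals INTEGER)"
)
conn.executemany(
    "INSERT INTO match_stats VALUES (?, ?, ?, ?)",
    [
        ("Player A", "2024-01-06", 90, 1),
        ("Player A", "2024-01-13", 75, 0),
        ("Player B", "2024-01-06", 60, 2),
        ("Player B", "2024-01-13", 90, 1),
        ("Player C", "2024-01-13", 15, 0),
    ],
)

# Read it aloud: select each player's totals from match_stats,
# where they played at least 60 minutes, ordered by goals.
query = """
    SELECT player, SUM(goals) AS total_goals, SUM(minutes) AS total_minutes
    FROM match_stats
    WHERE minutes >= 60
    GROUP BY player
    ORDER BY total_goals DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Player B', 3, 150), ('Player A', 1, 165)]
```

Player C drops out because their only appearance was under 60 minutes. Changing the question usually means changing a line or two of the query rather than rebuilding a workbook.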
According to a recent analysis by Burning Glass, SQL is the most requested tech skill among tech jobs.
If you are starting out in your career in data and analytics, having SQL as a skill will serve you well in almost any sport or business setting.
*Note: This article doesn’t try to proclaim there is one best way to manage your data. The size, quantity and frequency of your data, along with your budget and IT skills, all need to be considered before choosing the right solution. With that said, most organisations that take data seriously will have more robust data storage processes than spreadsheets. Spreadsheets are not considered best in class, but if that’s all your budget can stretch to right now, make sure to build them and use them in the best way. Read more here….