The internet is growing rapidly, and it is becoming commonplace simply to accept that every site you use collects personal data.
Whether you agree this is for the greater good is largely a matter of opinion, but it is happening: every metric, from how much margarine you bought in Tesco to your pension contributions, is being recorded somewhere for use in someone's project.
If you manage projects, the amount of data your developers are dealing with is starting to matter, and it belongs on your risk register.
Bigger Data? So What?
Any decent project manager will ensure that a prototype is used as a proof of concept, or at the very least that development, staging and live environments are created, so you can test out new pieces of functionality safely.
It's common in these situations to test with a data subset (i.e. a sample containing all the variables but only a tiny fraction of the actual records) to iron out functionality issues such as validation and null values in fields.
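The article doesn't prescribe how such a subset is built, but one common approach is a seeded random sample that keeps every column while shrinking the row count. A minimal sketch (the field names and data are hypothetical):

```python
import random

def sample_subset(rows, fraction=0.01, seed=42):
    """Return a small random sample of rows, keeping every column
    so all the variables are represented while the volume stays tiny."""
    rng = random.Random(seed)  # fixed seed: the same subset every test run
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

# Hypothetical customer records standing in for the live data set;
# every 7th postcode is missing, so null-handling still gets exercised.
rows = [{"id": i, "spend": i * 1.5, "postcode": None if i % 7 == 0 else "AB1"}
        for i in range(10_000)]

subset = sample_subset(rows, fraction=0.01)
print(len(subset))  # 100 rows instead of 10,000
```

Because the seed is fixed, the same subset comes back on every run, which makes test failures reproducible.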
This is good practice when the data you're dealing with runs to a couple of megabytes, but as data grows into the terabytes, the data becomes its own issue, rather than merely a question of how it's applied.
The lead time to copy data over, and the lag you'll see on the live server with big data, can be enormous. The process which took half a second with the subset now takes three hours, and someone needs to escalate it, or you're likely to miss your project success criteria and ultimately your KPIs.
Big Data Solution
Obviously, the first solution is to take this into account when putting the tasks and project plan together, but there is another way. One firm is helping companies over the big data hurdle with some pretty clever software that reduces the resource overhead (disc space, memory, etc.) needed to run backups and move data about.
They're also able to "mask" data, which, given the ever-growing rules on data protection, means developers never have to see the confidential bank or medical records they are working with, yet still have the same volume of data to test against.
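The article doesn't describe the firm's masking technique, but a common form of field-level masking replaces sensitive values with deterministic tokens, so the data keeps its shape (and joins between tables still line up) while revealing nothing confidential. A minimal sketch, with hypothetical field names:

```python
import hashlib

def mask_record(record, sensitive=("name", "account_no")):
    """Replace sensitive fields with a short deterministic hash token.
    The same input always yields the same token, so relationships
    between records survive masking; the original value does not."""
    masked = dict(record)
    for field in sensitive:
        if field in masked and masked[field] is not None:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]  # short token, stable across runs
    return masked

patient = {"name": "Jane Doe", "account_no": "12345678", "balance": 250.75}
print(mask_record(patient))  # balance untouched; name and account masked
```

Note that simple hashing like this is only a sketch: production masking tools typically add salting or format-preserving encryption, since short identifiers can otherwise be brute-forced from their hashes.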