Thursday, August 15, 2013

Find Your Co-Data

"Co-data" is my term for data that goes well with, and augments, your core data set. I particularly like that The Weather Channel has found consumer behavior data that predicts what you will buy depending on the weather. You don't need The Weather Channel's giant data set to find co-data of your own. It could be as easy as looking at your fellow local businesses' websites.

Let's say you're a cab driver. You want to minimize wait times and maximize distance driven. How about finding out when colleges in your area start up again? Or checking when a particular bar closes? Or finding out what time a particular show (preferably one with drunken attendees interested in safe-cabbing it home) lets out?

I tried this simple method when I worked at PPG Industries. Of course, our sales of exterior paint increased when the weather got pleasant. Pulling free data from the NOAA Climate Data Center let me do some rudimentary comparisons between our past sales, by region, and temperature. I found that people start painting more at about 50 degrees F, and that above about 84 degrees F the amount they paint starts to drop off (too hot out).
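The comparison above is described as simple correlation. The sketch below uses an even simpler stand-in for the same idea: join daily sales to daily temperature by date, bucket the days into temperature bands, and average sales per band, so you can see where sales pick up and where they fall off. The function names, the 5-degree band width, the "half of peak" cutoff, and all the numbers are hypothetical, chosen only for illustration. They are not PPG's data or method.

```python
from collections import defaultdict
from statistics import mean

def join_sales_with_weather(daily_sales, daily_temps):
    """Pair each day's units sold with that day's average temperature (deg F).
    Days missing from either source are skipped."""
    return [(daily_temps[day], units)
            for day, units in daily_sales.items()
            if day in daily_temps]

def sales_by_temperature_band(pairs, band_width=5):
    """Average units sold per temperature band, keyed by the band's lower edge."""
    bands = defaultdict(list)
    for temp_f, units in pairs:
        band_start = int(temp_f // band_width) * band_width
        bands[band_start].append(units)
    return {band: mean(units) for band, units in sorted(bands.items())}

def painting_window(band_averages, fraction_of_peak=0.5):
    """Return the lowest and highest bands whose average sales reach
    fraction_of_peak of the best band. Assumes one contiguous busy season."""
    peak = max(band_averages.values())
    busy = [band for band, avg in band_averages.items()
            if avg >= fraction_of_peak * peak]
    return min(busy), max(busy)

# Made-up numbers for illustration only, not real sales data.
daily_sales = {"2012-04-01": 20, "2012-04-02": 25, "2012-04-03": 90,
               "2012-04-04": 100, "2012-04-05": 95, "2012-04-06": 40}
daily_temps = {"2012-04-01": 42, "2012-04-02": 46, "2012-04-03": 55,
               "2012-04-04": 70, "2012-04-05": 80, "2012-04-06": 88}

bands = sales_by_temperature_band(join_sales_with_weather(daily_sales, daily_temps))
print(painting_window(bands))  # (55, 80): busy from the 55-59 F band through the 80-84 F band
```

With real data you'd run this per region and over several seasons; a handful of days like the toy set above is far too little to trust.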

Using such basic data and simple correlation, I was able to optimize the load-in for our largest retailer's stores so that we had enough exterior paint early in the season... but not too early. I also found that using last year's sales to predict when we should ship this year was a lousy measure; it was better to use the average over the past three years and then build back two weeks for safety.
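The shipping rule above (average the past three years, then build back two weeks) is easy to mechanize. Here's a minimal sketch, assuming you've already picked out the date each past season "took off" (however you define that). The function name, its parameters, and the example dates are hypothetical. It averages the day-of-year of the most recent starts, maps that onto the target year, and backs up a safety margin.

```python
from datetime import date, timedelta
from statistics import mean

def planned_ship_date(past_season_starts, target_year,
                      years_to_average=3, safety_days=14):
    """Estimate a ship date for target_year from prior season-start dates.

    Averages the day-of-year of the most recent `years_to_average` starts,
    places that day in target_year, then backs up `safety_days`.
    Leap years can shift the result by a day, which is fine for planning.
    """
    recent = sorted(past_season_starts)[-years_to_average:]
    avg_day = round(mean(d.timetuple().tm_yday for d in recent))
    season_start = date(target_year, 1, 1) + timedelta(days=avg_day - 1)
    return season_start - timedelta(days=safety_days)

# Hypothetical season-start dates, for illustration only.
starts = [date(2010, 4, 12), date(2011, 4, 20), date(2012, 4, 9)]
print(planned_ship_date(starts, 2013))                      # 2013-03-31 (three-year average)
print(planned_ship_date(starts, 2013, years_to_average=1))  # 2013-03-27 (last year only)
```

Averaging smooths out one unusually early or late spring, which is the weakness of leaning on last year alone.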

At Vocollect, we're discovering lots of cool ways to use the information we have to make our workers' lives easier. We're starting with simple suggestions, such as prompting the user to try a feature when we notice it could solve a problem we deduce the worker is having. The next phase will be to combine this user data with simple information from other sources to help suggest, for example, how two coworkers can avoid each other in a distribution center aisle to ease congestion delays.

All this work goes back to my feeling about big data: you don't need it if you have plain old "data" that you're not using in the first place.