Bias/Variance Trade-off in the Real World
As a former archaeologist who ended up as a GIS Supervisor at an electric company, I have examples from two fields that seem vastly different but, when viewed through the bias/variance trade-off, actually share a lot of modeling similarities.
What does bias look like in archaeology? Well, most of what archaeologists do can be considered biased. The only evidence we have for extrapolating to an entire population is a biased sample. The remains just happen to be in a place where people left a lot of stuff and the stuff survived the rigors of the environment, including later people building there. Plus, an archaeologist just happens to be put there to find it. We end up with a lot of random chance that isn’t so random!
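To make the idea of "random chance that isn’t so random" concrete, here is a minimal sketch of a biased sample. Everything in it is an assumption made for illustration: the site sizes are simulated rather than real, and the rule that bigger sites are more likely to survive and be found is a stand-in for the preservation and discovery effects described above.

```python
import random
import statistics

def simulate_site_sizes(n_sites, rng):
    # Hypothetical "true" population of site sizes (arbitrary units).
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n_sites)]

def surviving_and_found(sites, rng):
    # Assumption: the chance that a site survives AND gets excavated
    # grows with its size, so large sites are over-represented.
    largest = max(sites)
    return [s for s in sites if rng.random() < s / largest]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = simulate_site_sizes(10_000, rng)
sample = surviving_and_found(population, rng)

true_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"sites in population: {len(population)}, sites found: {len(sample)}")
print(f"true mean size:      {true_mean:.2f}")
print(f"mean of found sites: {sample_mean:.2f}")
```

Under these assumptions, the sites we "find" look much larger on average than the sites that actually existed, and collecting more sites through the same filter would not fix that. The error comes from how the sample was selected, not from how big it is.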
How do we add more variance into a biased model? The lazy answer is more data. But how do you possibly get more data? There are a few ways. The first is to look at multiple samples, ideally close to one another. When testing the Clovis-first theory in Native American archaeology, researchers looked at many sites in North America that “dated” to around the same time period. They were looking for what was considered “Clovis” technology and items, which is too detailed to go into for our purposes. This actually created an even more biased dataset. It wasn’t until after 2000 that people building at the tip of South America found sites that were much, much older than Clovis sites. And the tools were so different from Clovis tools that they may have been overlooked at other sites. So to make our samples less biased, it became clear that we may have to dig deeper or take a closer look at what we consider to be tools.
The other way to add more variation to our data is a technique that is only now starting to take hold: ask the elders. There was much trade between ancient peoples, more than we often think. So to define the real nuances between cultures and their neighbors, we must ask elders what they define as their cultural traditions. Their oral history is so rich that it can provide a lot of insight. A similar approach has been used on more recent historical sites as well. For example, the history of Pinyon Canyon near La Junta, Colorado is enriched by the journals of the residents who lived there. This is especially valuable when examining huge events like a devastating flood that occurred in the area in the 1800s. By reading multiple journal entries from the survivors, researchers arrive at a more comprehensive final interpretation.
So what about variance? Managing data for an electric company involves a lot of datasets. Data is collected everywhere and by every means possible. It is also used by a lot of different departments, like engineering and outage management. But can you have too much data? Yes, you can! Believe it or not, there are different versions of reality, even when it comes to the equipment on the pole. In a previous article, I talked about the different languages that are unique to each department.
In the case of outages, a system-wide engineer may have designed the perfect system by using all of the data given to them. I have seen gigantic system-wide maps that use energy efficiently and distribute it equally to every branch of the system. But these are not reality. In fact, I have seen several cases in which a line was rebuilt under a multi-million dollar project, yet one spinoff section of line feeding a subdivision is constantly losing power because the devices monitoring it are not the right ones. This happens because there are natural forces or individual use cases that are not accounted for in the original model.
But these cases should NOT be in the system-wide model, because they are unique. When marijuana was made legal in Colorado, small clusters of growers appeared everywhere from out in the boondocks to the middle of suburbia. Yet if we had built these customers into the model, it would have thrown off the entire system. They are outliers, and our data didn’t predict their extreme usage. What matters here is to rely on field crews to define the system. In fact, the electric company I worked for had a very low outage rate because it relied so much on field crews to pinpoint where the outages were.
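As a rough illustration of why a handful of extreme users should be flagged rather than folded into the system-wide model, here is a small sketch. The usage numbers are made up, and the flagging rule (distance from the median measured in median absolute deviations, with a hypothetical threshold `k`) is just one simple choice, not how any particular utility does it.

```python
import statistics

# Hypothetical monthly usage in kWh for customers on one feeder.
typical_customers = [850, 900, 920, 880, 910, 870, 895, 905, 890, 860]
extreme_customers = [9500, 12000]  # made-up stand-ins for heavy growers
all_customers = typical_customers + extreme_customers

def flag_outliers(values, k=5.0):
    # Flag values that sit more than k median absolute deviations
    # (MAD) away from the median. k is an assumed threshold.
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]

mean_typical = statistics.mean(typical_customers)
mean_all = statistics.mean(all_customers)
median_all = statistics.median(all_customers)
flagged = flag_outliers(all_customers)

print(f"mean usage, typical customers only: {mean_typical:.1f} kWh")
print(f"mean usage, everyone included:      {mean_all:.1f} kWh")
print(f"median usage, everyone included:    {median_all:.1f} kWh")
print(f"flagged for field review:           {flagged}")
```

In this toy data, two customers out of twelve are enough to pull the average far away from what a typical customer uses, while the median barely moves. The flagged list is where the field crews come in: the model stays simple, and the people who know the line decide what to do about the exceptions.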
At first glance, archaeology and an electrical system model seem to have nothing in common. But in both cases, we can see how further human interpretation of a basic model helps balance the bias/variance trade-off. In both instances, talking to subject matter experts helped reduce both bias and variance.