As I sat down to write this article on the “Sweater Effect” and its impact upon modeling, I recalled the manner in which I was originally introduced to the subject. I was working as a director of corporate planning at a public utility, when a junior planner came into my office with a problem. He was struggling with the concept that all degree days were not equal.
At my gas company, send out plots of consumption versus degree days showed extremely non-linear behavior. Days with only a few degree days had lower consumption per degree day, days with moderate degree days behaved linearly, and days with high degree days had less consumption per degree day than moderate ones. In addition, degree days in early fall produced materially less consumption than equivalent degree days in early spring.
I, too, was perplexed by the data. I decided that the best way to find my answer to the dilemma at hand was to begin asking questions of the data instead of blindly running standard analytics on it.
I encountered something somewhat relevant while installing a new kitchen in my home. The pipes in most of the kitchens in my housing development often froze and occasionally burst. What was curious to me about this behavior was that despite the nearly identical design of all of the homes, the pipes in the more recently constructed homes had a greater tendency to freeze and rupture.
In examining my deconstructed kitchen, I found that the water pipes were installed across the top of a floor with an unheated garage below. Above and to one side, they were covered by base cabinets, and on the other side was an outside wall. To correct this problem, the builder of my house thought he could provide heat to the pipes by insulating them with a blanket of fiberglass insulation placed in the space between the cabinets and the pipes. However, the insulation actually acted as a barrier, preventing any heat from coming into the space.
Older homes on the street had fewer problems because, in his growing frustration during the four years the street was being developed, the builder placed greater amounts of higher quality insulation in the newer homes, and his efforts made their problems worse. Clearly, not fully understanding the root of the problem, and not listening to what the data told him, caused a lot of extra work, frustration and expense, where an expert consultation could have eliminated his problem.
While I was working on the kitchen, I had an abundance of time to think: “Were we at the utility guilty of not listening to the data? What was I missing in my understanding of consumption?” Houses are relatively simply constructed entities, leaking heat by conduction, convection, radiation and infiltration of outside air. Furnaces are, in general, also very simply controlled devices, either on or off. So what was it that I did not understand? Ultimately, after much analysis, the solution arrived.
Individual homes differ from one another, and each behaves linearly, but the curve reflecting a group of homes was non-linear. Individual linear behavior aggregating to non-linear behavior is mathematically unusual, but not impossible. A detailed examination of the relationship between degree days and heating-related consumption revealed that an individual home consumes almost nothing until a critical point and then exhibits linear behavior, which continues until the furnace is operating 24 hours a day, the point where it cannot produce any more heat. Other researchers developed these critical point relationships. The Princeton Scorekeeping Method (PRISM) was among the first. More recently, change point modeling for energy efficiency analysis explains some of this behavior.
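The critical point behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PRISM method or any published change point model: the function name, the parameters and the numbers are hypothetical, and consumption below the critical point is treated as exactly zero rather than "almost zero."

```python
def home_consumption(degree_days, change_point, slope, max_output):
    """Daily heating consumption of one hypothetical home.

    change_point: degree days at which this home's furnace first runs
    slope:        consumption added per degree day above the change point
    max_output:   consumption when the furnace runs 24 hours a day
    All names and units are illustrative assumptions.
    """
    if degree_days <= change_point:
        return 0.0  # assumption: no heating load below the critical point
    # Linear above the critical point, capped once the furnace runs all day
    return min(slope * (degree_days - change_point), max_output)


# One hypothetical home: starts heating at 10 degree days, tops out at 60 units
for dd in (5, 20, 50):
    print(dd, home_consumption(dd, change_point=10, slope=2.0, max_output=60.0))
```

For a single home the result is three straight segments: flat, rising, and flat again.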
Differences in construction and thermostat settings cause homes to begin heating at different temperatures and to reach maximum output at different points. Summed across many homes, these individual curves form an “S” shaped curve. My planning department’s reliance on send out curves provided a completely false understanding of the underlying truth.
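A small Python sketch shows how homes that are each linear between two limits can sum to an S-shaped total. The eleven homes, their critical points and their maximum outputs are invented for illustration and do not come from any utility data.

```python
def home_consumption(degree_days, change_point, slope, max_output):
    """One hypothetical home: zero, then linear, then capped at max_output."""
    return min(max(slope * (degree_days - change_point), 0.0), max_output)


# Hypothetical homes that begin heating at 0, 2, 4, ... 20 degree days.
# Each adds 1 unit per degree day and tops out at 30 units.
HOMES = [(change_point, 1.0, 30.0) for change_point in range(0, 21, 2)]


def aggregate(degree_days):
    """Total consumption of all homes, as a send out curve would record it."""
    return sum(home_consumption(degree_days, cp, s, m) for cp, s, m in HOMES)


def marginal(degree_days):
    """Extra total consumption caused by one more degree day."""
    return aggregate(degree_days + 1) - aggregate(degree_days)


# Few homes are heating on mild days, all are in the moderate range,
# and most are already at maximum output on the coldest days.
for dd in (5, 25, 45):
    print(dd, marginal(dd))
```

The added consumption per degree day is small at both ends and largest in the middle, which is the pattern the send out plots showed.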
My kitchen epiphany did not, however, provide an answer to the question of spring versus fall differences. In a brainstorming session, my staff at UtiliPoint Analytics posed an explanation. Perhaps people were insulating themselves with clothing in the fall, rather than engaging the furnace. They coined the term “Sweater Effect.” Confirming their hypothesis required analyzing the error terms from regression modeling of millions of homes across several states.
We found that in the early fall, roughly a third of consumers endure the cold for a period of time. Their endurance ends abruptly after either several consistently chilly days or one very cold day, and this behavior significantly alters the send out curves. Correct modeling of the Sweater Effect requires a long time series, error analysis and counting degree days from a summer month for each individual customer. However, standard regression calculations can be modified with low effort approximations.
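One way to picture the trigger described above is a simple rule applied to each customer's daily degree days, counted from a summer month. The Python sketch below is a hypothetical illustration, not the model used in the analysis: the function name, the thresholds for "chilly" and "very cold," and the length of the chilly run are all assumed values.

```python
def furnace_start_day(daily_degree_days, chilly=5.0, very_cold=15.0, run_length=3):
    """Return the index of the first day a 'sweater' household starts heating.

    daily_degree_days: degree days per day, starting in a summer month
    chilly, very_cold, run_length: assumed thresholds, for illustration only
    Returns None if neither trigger occurs in the series.
    """
    consecutive_chilly = 0
    for day, dd in enumerate(daily_degree_days):
        if dd >= very_cold:
            return day  # a single very cold day ends the endurance
        consecutive_chilly = consecutive_chilly + 1 if dd >= chilly else 0
        if consecutive_chilly >= run_length:
            return day  # several consistently chilly days end it as well
    return None


print(furnace_start_day([0, 2, 6, 1, 6, 7, 8, 3]))  # third chilly day in a row
print(furnace_start_day([0, 3, 20, 1]))             # one very cold day
```

Before the returned day, such a household's heating consumption would be modeled as near zero regardless of degree days, which is what separates early fall from early spring in the data.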
What I learned from these examinations is similar to what I learned while building my kitchen. A master craftsperson would have altered the design a little and the work would have been higher in quality. I learned a great deal from doing my first kitchen and my next kitchen will be much better. I also gained an understanding of why the pipes freeze and have since put in place a permanent solution.
When I was 12 years old and assisting my father with my first home project of adding a room to the house, I made the statement, “This saw does not cut straight.” My father replied, “It’s not the tool, it’s the carpenter, son,” and a lifelong lesson was learned. Insight and quality work come from the craftspeople, not the tools.
Similar lessons can be applied to Big Data Analytics. Realize that there are master craftspeople who can provide higher quality work and better designs. Use them to learn techniques for future projects that you can do yourself or internally within your organization. The benefits of doing it yourself are the same as with a kitchen: cost cutting, learning and the satisfaction of mastering new technology. Most importantly, analytics are tools to be used by craftspeople. Frequently I have overheard statements like “We will just do some analytics” or “Just do it in HADOOP.” I find these calls similar to a layman saying to builders, “You should just do some table sawing.”