An organization is considering adopting a particular type of incentive plan to motivate its employees to extend their best efforts and to focus that effort on meeting organizational objectives. The HR Director asks a data scientist to research the likelihood of success with a plan that was effective in another division. Using machine learning tools, the data scientist examines the results experienced by the other division and concludes that the same plan will be effective if used in other divisions.
The HR Director is very familiar with the benchmarking principle that for results produced in one context to be likely in another, the two contexts must be substantially similar. This principle is based on the research concept that an internally valid study must also be externally valid (generalizable) in order for it to produce the same results in another context. A recent book argued that extrinsic rewards reduced intrinsic rewards, based on a lab study that involved people throwing tennis balls at targets for short periods for insignificant rewards. When one compares that context with one where people work to support themselves over a career, often doing very unpleasant things, it seems absurd to claim the two contexts are even remotely similar. Yet a machine learning algorithm may well make that error because its predictions lack any understanding of human nature. A Facebook algorithm was recently accused of gender discrimination because it sent ads for jobs in STEM fields much more often to men than to women. However, it was found that the decision was made on economic factors (it was more costly to reach working women) rather than on a judgment as to whether women were capable of doing jobs in the STEM arena. Regrettably, on the surface the results appeared to support the latter assumption.
Machine learning tools could be incapable of predicting the success of an incentive plan once the differences between the two divisions mentioned earlier are taken into account. For example, if one division had a hierarchical culture while the other allowed employees to take the initiative to deal with issues, this might result in different levels of motivation to excel. Recognition by employees in the top-down control culture that they were not empowered to impact results can depress initiative, and that would limit the motivational impact of the plan in that type of culture. So the differences in the cultural contexts might make comparisons across the divisions inappropriate. When I was working in the Middle East, an HR executive asked me if U.S.-style incentive plans would be effective if employees believed results were out of their control and that it would be irreverent of them to assume they could have an impact on outcomes that were determined by a higher power. Lacking an answer based on experience, I had to apply behavioral science principles and suggest that the cultural difference would certainly impact individual decisions about taking the initiative. A manager might be able to convince employees that their talent was a gift and that it was their responsibility to use the gift, but the different cultural orientation certainly changed the game.
Melanie Mitchell, a researcher at the Santa Fe Institute, observed in a recent New York Times article that AI systems lack an understanding of situations and of the meaning of the differences across situations. Again, what works in one context needs to be understood before projections can be made about what the results would be in another context. Her long experience with AI systems has made it clear to her that even minute differences in contextual characteristics can have a major impact on outcomes. Someone hacking into a system and making changes so minor that they would not be detectable by humans could dramatically reduce the effectiveness of the algorithms. And if the target is a control system for a city’s power network, this could be too important to overlook. The fact that machine learning algorithms have been trained in specific contexts makes them vulnerable to erratic results when contextual details change.
Return now to the HR Director deciding whether to adopt the incentive plan from one division in another. The division that had used the plan had included all employees as participants. But the executive of the division considering adoption wishes to carefully select plan participants based on their ability to significantly impact results (perhaps including only managerial and professional employees). This decision would be driven by human judgment, applying the principle that those lacking the latitude to vary their behavior and control their results would not be affected by participation. In this case, an algorithm based on the division that had used the plan may be a bad choice for projecting success in the other division. Interestingly, the algorithm might find the new application to be a good choice because it would cost less (smaller participant payroll), when in fact the trimming of participants may result in strong negative reactions by those excluded. The lower cost would be detectable by the algorithm, but it would be incapable of detecting the probable angst felt by non-participants, which is attributable to human characteristics.