Combining Separate Studies into Coherent Wholes with Bayes Expert

The main technical problem in the design of Bayes Expert is estimating the probability of an output variable for every combination of the input variables, including combinations that have never been tested together; in other words, converting a dependency rule into a Conditional Probability Table (CPT). We accomplish this with quadratic programming, using hard constraints based on the rules of probability and soft constraints derived from a population like the one to which the model is applied.
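As a rough illustration of turning single-exposure relative risks into a full CPT, the sketch below solves a tiny equality-constrained quadratic program in closed form through its KKT linear system. It is not the Bayes Expert implementation. It assumes two independent binary inputs A and B, treats the law of total probability and the two reported relative risks as hard equalities, and uses a hypothetical `soft_target` table as the soft constraint. The function names and all numbers are invented for the example, and the bounds 0 ≤ p ≤ 1 are only checked afterwards rather than enforced, as a general QP solver would do.

```python
# Sketch only: fill a 2x2 CPT for P(Y=1 | A, B) from two single-exposure studies.
# Assumptions (not from the source): A and B are independent binary inputs,
# and every number below is made up for illustration.

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_cpt(p_a, p_b, p_y, rr_a, rr_b, soft_target):
    """Return [p00, p01, p10, p11] closest to soft_target under the hard equalities."""
    qa, qb = 1.0 - p_a, 1.0 - p_b
    # Hard constraints as rows of A p = b, with cells ordered (A, B) = 00, 01, 10, 11.
    A = [
        [qa * qb, qa * p_b, p_a * qb, p_a * p_b],  # law of total probability
        [-rr_a * qb, -rr_a * p_b, qb, p_b],        # P(Y|A=1) = rr_a * P(Y|A=0)
        [-rr_b * qa, qa, -rr_b * p_a, p_a],        # P(Y|B=1) = rr_b * P(Y|B=0)
    ]
    b = [p_y, 0.0, 0.0]
    n, m = 4, 3
    # KKT system for: minimise ||p - soft_target||^2 subject to A p = b.
    kkt = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        kkt[i][i] = 2.0
        for j in range(m):
            kkt[i][n + j] = A[j][i]
            kkt[n + j][i] = A[j][i]
    rhs = [2.0 * t for t in soft_target] + b
    return solve_linear(kkt, rhs)[:n]


# Hypothetical inputs: P(A=1)=0.3, P(B=1)=0.2, P(Y=1)=0.1, RR_A=2, RR_B=3,
# with a flat soft target equal to the outcome prior.
cpt = fit_cpt(p_a=0.3, p_b=0.2, p_y=0.1, rr_a=2.0, rr_b=3.0, soft_target=[0.1] * 4)
# The probability bounds are not enforced by this closed-form solve, so check them.
cpt_is_valid = all(0.0 <= p <= 1.0 for p in cpt)
```

With three equalities and four unknown cells, one degree of freedom remains (the interaction between A and B), and the soft target is what pins it down. A production version would pass the same constraints, together with the bounds as inequalities, to a general quadratic programming solver.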

Our algorithm, a type of mutual validation, determines how much the models agree with each other. It finds consonant sets of data that reinforce each other, especially in their relative risk statistics. If the relative risks need to change by only small normalized fractions of their 95% bounds for the quadratic program to be feasible, then the relations and their data (that is, the conditional probability tables and discrete distribution tables) fit the data of the studies with only slight adjustment to each other. If instead the relative risks must change within larger bounds to be feasible, the studies disagree more and are less likely to be valid together. As a validation score, we record the amount by which we have to change the window between the upper and lower bounds in the quadratic programming inequality constraints.¹⁸
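The search for that window can be sketched as a bisection over a normalized fraction of the 95% bounds. The code below is a minimal illustration, not the Bayes Expert algorithm. The feasibility test is a pluggable function, and the toy `windows_overlap` check (two studies reporting the same relative risk, feasible when their widened intervals intersect) only stands in for the real quadratic programming feasibility test. The function names, the linear widening rule, and every number are assumptions made for the example.

```python
# Sketch only: find the smallest fraction of the 95% bounds at which the
# constraints become feasible. Studies are hypothetical (estimate, low, high) tuples.

def window(study, fraction):
    """Interval around the estimate, opened by `fraction` of each 95% bound."""
    estimate, ci_low, ci_high = study
    return (estimate - fraction * (estimate - ci_low),
            estimate + fraction * (ci_high - estimate))


def windows_overlap(studies, fraction):
    """Toy stand-in for the QP feasibility test: do all widened windows intersect?"""
    lows, highs = zip(*(window(s, fraction) for s in studies))
    return max(lows) <= min(highs)


def validation_score(is_feasible, max_fraction=1.0, tol=1e-6):
    """Smallest feasible fraction, or None if infeasible even at max_fraction.

    Assumes feasibility is monotone: widening the windows never removes a solution.
    """
    if is_feasible(0.0):
        return 0.0
    if not is_feasible(max_fraction):
        return None
    lo, hi = 0.0, max_fraction
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Two made-up studies that roughly agree, and a pair that does not.
agreeing = [(2.0, 1.5, 2.7), (2.6, 1.9, 3.5)]
conflicting = [(2.0, 1.5, 2.7), (5.0, 4.0, 6.2)]
score = validation_score(lambda f: windows_overlap(agreeing, f))
no_score = validation_score(lambda f: windows_overlap(conflicting, f))
```

A lower score means the studies fit together after less adjustment. In this sketch the conflicting pair yields `None`, because its windows do not meet even at the full 95% bounds.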

A variable combination can be incorrect in a variety of ways. For example, the data may not match between variables that share a name, or the logical ANDs and ORs used to map the variables may be incorrect. The dependency structure may be incorrect. Populations may not match between studies whose variables are many links apart on the internet, or human errors may occur in transcribing the relative risks. The studies underlying the relative risks could be incorrect, or the data could be fabricated. All of these errors, however, appear in the validation score, which allows for the automated machine combination of the GCN, the culling of crowd-sourced studies that do not agree, and guidance on what to include and where the bugs are in manual nets. We used the validation score to successfully debug typos and cull studies in the handwritten longevity net.

The absence of a variable, on the other hand, is not an error, and thus the validation score cannot detect it. The conditional probability table's job is to mathematically combine the given risks, not to adjust those risks to a specific outcome: this is not a prediction algorithm. We describe only the known risks of variables, and these are correct in combination as long as they are correct individually. This holds even when the combination excludes more important risk factors for the outcome. Risks that are unknown to the model were simply not delineated, which does not make the combined relative risks incorrect.

As a result, Bayes Expert networks should be validated by measuring the combined relative risk they calculate against a hold-out set of data rather than by measuring outcomes. This is appropriate because the method does not predict; machine learning is better at prediction because it can account for all risks. We include only known risks and, as a result, can explain what those risks are better than a machine learning technique can. In the longevity app, explanation is the function of Bayes Expert, whereas machine learning is used for prediction tasks.
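One way to picture this kind of validation is to compare the combined relative risk implied by a CPT with the same ratio measured directly in hold-out records. The sketch below assumes binary inputs A and B, hold-out records of the form `(a, b, y)`, and a CPT stored as a dictionary keyed by `(a, b)`. The helper names, the CPT values, and the hold-out counts are all hypothetical, synthetic values chosen only to make the arithmetic easy to follow.

```python
# Sketch only: compare a CPT's combined relative risk with a hold-out estimate.
# The CPT values and hold-out counts below are synthetic, not real study data.

def combined_rr_from_cpt(cpt, exposed, reference):
    """Risk ratio between two input combinations according to the CPT."""
    return cpt[exposed] / cpt[reference]


def combined_rr_from_holdout(records, exposed, reference):
    """Same ratio measured in hold-out records of the form (a, b, y)."""
    def risk(combo):
        outcomes = [y for a, b, y in records if (a, b) == combo]
        # Assumes the combination occurs in the hold-out set with at least one event.
        return sum(outcomes) / len(outcomes)
    return risk(exposed) / risk(reference)


# Hypothetical CPT: P(Y=1 | A, B) for each input combination.
cpt = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.33}

# Synthetic hold-out set: 50 events in 1000 unexposed, 60 events in 200 doubly exposed.
holdout = ([(0, 0, 1)] * 50 + [(0, 0, 0)] * 950 +
           [(1, 1, 1)] * 60 + [(1, 1, 0)] * 140)

model_rr = combined_rr_from_cpt(cpt, exposed=(1, 1), reference=(0, 0))
observed_rr = combined_rr_from_holdout(holdout, exposed=(1, 1), reference=(0, 0))
relative_gap = abs(model_rr - observed_rr) / observed_rr
```

The quantity compared is the risk ratio for a combination of inputs, not individual outcomes, which matches the distinction drawn above between describing known risks and predicting.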

Furthermore, including only known risks suits the GCN's simulation, which describes the known world, because we can condition risk on the state of the variables within the simulation. Unknown factors, such as those that machine learning draws on, would cause risks to be misattributed in a causal simulation. Bayes Expert assists in problem solving by selecting causal models according to how well they agree within the space of relative risk relationships it defines.

Figure 4. An example of a dependency rule, in which dependencies from the scientific literature are encoded using relative risk or sensitivity/specificity and their confidence intervals. The input variables, from two separate studies, are in the middle, while the output variable's values and their priors are at the bottom. There is one CPT per rule in Bayes Expert.

¹⁸ Duong, D. (2022) Bayes Expert: Crowdsourcing the Community of Science. Available at: https://github.com/Rejuve/bayesnet/blob/master/Rejuve%20BayesExpert.pdf (Accessed 5 October 2022)
