- Teaching computers to do science
- Papers are useless; all the information is in code
- Model Augmentation and Synthesis
- Arbitrary models are complex, but transformations are simpler
- Project repo: github.com/jpfairbanks/SemanticModels.jl

Make an initial model $y \approx \beta x$

Make a better model $y \approx \beta x + \gamma y$

Interpret $\beta, \gamma$ to understand the world

Fitting the data is a regression problem:

$$h^* = \arg\min_{h \in H} \ell(h(x), y)$$
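
As a minimal sketch of this regression step, assume the hypothesis class is lines through the origin, $h_\beta(x) = \beta x$, and the loss is squared error. The names `h`, `loss`, and `fit_beta` and the data are illustrative, not part of SemanticModels.jl.

```julia
# Hypothesis class H: lines through the origin, h_β(x) = β * x.
h(β, x) = β .* x

# Squared-error loss ℓ(h(x), y), summed over the observations.
loss(β, x, y) = sum(abs2, h(β, x) .- y)

# For this class the minimizer has a closed form: β* = (x⋅y) / (x⋅x).
fit_beta(x, y) = sum(x .* y) / sum(x .* x)

# Noise-free illustrative data generated with β = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = 2.0 .* x

β_star = fit_beta(x, y)
println("β* = ", β_star, ", loss = ", loss(β_star, x, y))
```

Richer hypothesis classes generally need an iterative optimizer instead of a closed form, but the shape of the problem is the same.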

The institutional process of discovery is

$$\max_{H \in \mathcal{M}} \operatorname{expl}(h^*)$$

where $\operatorname{expl}$ is the explanatory power of a class of models $H$ and $h^*$ is the best-fitting model in $H$.

The explanatory power is some combination of

- generalization,
- parsimony,
- and consistency with the fundamental principles of the field.
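
One way to make the outer maximization concrete is to score a few hypothesis classes and keep the best one. The sketch below is an assumption-laden stand-in: the classes are polynomials of increasing degree, and `expl` rewards low held-out error (generalization) and penalizes parameter count (parsimony) with an arbitrary weight `λ`. Consistency with domain principles is not modeled here.

```julia
# Design matrix for a polynomial of the given degree (columns 1, x, x^2, ...).
design(x, degree) = [xi^k for xi in x, k in 0:degree]

# Inner problem: least-squares fit within one hypothesis class.
fit(x, y, degree) = design(x, degree) \ y

predict(coeffs, x) = design(x, length(coeffs) - 1) * coeffs

# Assumed stand-in for expl: negative held-out error minus a parsimony penalty.
function expl(degree, xtrain, ytrain, xtest, ytest; λ = 0.01)
    coeffs = fit(xtrain, ytrain, degree)
    mse = sum(abs2, predict(coeffs, xtest) .- ytest) / length(ytest)
    return -mse - λ * length(coeffs)
end

# Illustrative data from an exactly linear process, y = 1 + 2x.
xtrain = collect(0.0:1.0:5.0)
ytrain = 1 .+ 2 .* xtrain
xtest = [0.5, 1.5, 2.5]
ytest = 1 .+ 2 .* xtest

# Outer problem: pick the class with the highest score.
degrees = 0:3
scores = [expl(d, xtrain, ytrain, xtest, ytest) for d in degrees]
best_degree = degrees[argmax(scores)]
println("best degree = ", best_degree)
```

On this data the constant model generalizes poorly and the higher-degree models pay the parsimony penalty without fitting any better, so the linear class scores highest.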

Most frameworks are designed before the models are written

Domain | | |
---|---|---|
Algebra | | |
Learning | | |
Optimization | | |
Modeling | | |

SemanticModels is a post hoc modeling framework

- (a) Cumulative number of infected individuals as a function of time (day) for the three countries Guinea, Liberia and Sierra Leone.
- A. Khaleque and P. Sen, "An empirical analysis of the Ebola outbreak in West Africa", 2017


```
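abstract type AgentModel end

# A state-based agent model. The constructor call used with this type passes
# four arguments, so the struct needs a fourth field for the per-state loads.
mutable struct StateModel <: AgentModel
    states       # possible agent states, e.g. [:S, :I, :R]
    agents       # current state of each agent
    transitions  # Dict mapping each state to its transition rule
    loads        # fraction of agents in each state
end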
```


```
# using AgentModels  <- hypothetical ABM library, assumed to provide step!, describe, and stateload
function main(nsteps)
    n = 20
    a = fill(:S, n)              # every agent starts susceptible
    ρ = 0.5 + randn(Float64)/4   # chance an infected agent stays infected (it recovers otherwise)
    μ = 0.5                      # chance a recovered agent stays immune
    T = Dict(
        :S=>(x...)->rand(Float64) < stateload(x[1], :I) ? :I : :S,
        :I=>(x...)->rand(Float64) < ρ ? :I : :R,
        :R=>(x...)->rand(Float64) < μ ? :R : :S,
    )
    sam = StateModel([:S, :I, :R], a, T, zeros(Float64,3))
    newsam = step!(sam, nsteps)
    counts = describe(newsam)
    return newsam, counts
end
```

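
The agent-based example relies on `step!`, `describe`, and `stateload` from a hypothetical library. A self-contained sketch of what such helpers might look like follows. Every function body here is an assumption made for illustration (including the `updateloads!` helper and the fixed rates 0.3 and 0.5), not the API of any real package.

```julia
abstract type AgentModel end

mutable struct StateModel <: AgentModel
    states       # possible agent states
    agents       # current state of each agent
    transitions  # Dict mapping a state to its update rule
    loads        # fraction of agents in each state
end

# Fraction of agents currently in `state`.
function stateload(sm::StateModel, state)
    return sm.loads[findfirst(==(state), sm.states)]
end

# Recompute the per-state fractions from the agent vector.
function updateloads!(sm::StateModel)
    for (i, s) in enumerate(sm.states)
        sm.loads[i] = count(==(s), sm.agents) / length(sm.agents)
    end
    return sm
end

# Advance every agent `nsteps` times using the rule for its current state.
function step!(sm::StateModel, nsteps)
    updateloads!(sm)
    for _ in 1:nsteps
        sm.agents = [sm.transitions[a](sm) for a in sm.agents]
        updateloads!(sm)
    end
    return sm
end

# Count the agents in each state.
describe(sm::StateModel) = Dict(s => count(==(s), sm.agents) for s in sm.states)

# Rules in the same shape as the example; the rates are arbitrary.
T = Dict(
    :S => (x...) -> rand() < stateload(x[1], :I) ? :I : :S,
    :I => (x...) -> rand() < 0.3 ? :I : :R,
    :R => (x...) -> rand() < 0.5 ? :R : :S,
)
agents = vcat(fill(:I, 5), fill(:S, 15))   # seed five infected agents
sam = StateModel([:S, :I, :R], agents, T, zeros(Float64, 3))
step!(sam, 10)
counts = describe(sam)
println(counts)
```

The run is stochastic, so the counts differ between runs, but they always sum to the number of agents.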

```
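using LsqFit

# Linear model with slope β[1] and intercept β[2].
function f(x, β)
    return β[1] .* x .+ β[2]
end

function main()
    # load_matrix and load_vector are placeholder data loaders, not part of LsqFit
    X = load_matrix("file_X.csv")
    target = load_vector("file_y.csv")
    a₀ = [1.0, 0.0]   # initial guess, one entry per parameter of f
    fit = curve_fit(f, X, target, a₀)
    return fit
end

main()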
```

Category theory (CT) is the mathematics of structure-preserving maps. Every field of math has a notion of *homomorphism*: a map showing that two objects in that category *have similar structure*.

- Sets, Groups, Fields, Rings
- Graphs
- Databases

CT is the study of structure in its most general form.

- $ G = (V,E) $
- $Ob(G) = V$
- $Hom_G(v,u) = (v\leadsto u) \in E$

- Graph homomorphism $f: G\to H$ such that $(v\leadsto u) \in G \implies (f(v) \leadsto f(u)) \in H$
- $Ob(Graph)$ is the set of all graphs
- $Hom_{Graph}(G,H)$ is the set of all graph homomorphisms between $G,H$
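
The homomorphism condition can be checked mechanically. The sketch below assumes a directed graph stored as a vertex list plus a set of edge pairs, and reads $v \leadsto u$ as a single edge. The `Graph` struct and `ishomomorphism` are illustrative names, not types from SemanticModels.jl.

```julia
# A directed graph as a vertex list and a set of (source, target) edges.
struct Graph
    vertices::Vector{Symbol}
    edges::Set{Tuple{Symbol,Symbol}}
end

# f maps vertices of G to vertices of H. It is a homomorphism when every
# edge (v, u) of G is sent to an edge (f(v), f(u)) of H.
function ishomomorphism(f::Dict{Symbol,Symbol}, G::Graph, H::Graph)
    all(haskey(f, v) && f[v] in H.vertices for v in G.vertices) || return false
    return all((f[v], f[u]) in H.edges for (v, u) in G.edges)
end

# G is a directed path a -> b -> c; H is a 2-cycle x <-> y.
G = Graph([:a, :b, :c], Set([(:a, :b), (:b, :c)]))
H = Graph([:x, :y], Set([(:x, :y), (:y, :x)]))

f = Dict(:a => :x, :b => :y, :c => :x)   # alternates along the path
g = Dict(:a => :x, :b => :x, :c => :y)   # sends edge (a, b) to (x, x)

println(ishomomorphism(f, G, H))
println(ishomomorphism(g, G, H))
```

Here `f` preserves both edges of the path, while `g` fails because `H` has no self-loop at `x`.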

We have built a novel modeling environment that builds and manipulates models using this category-theoretic approach.

Contributions:

- We take general code as input
- Highly general and extensible framework
- Goal: Transformations obey the functor laws.
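
For reference, the functor laws require a transformation $F$ to preserve identities and composition. In the standard statement, $A$ is any object and $f, g$ are composable morphisms. Reading objects as models and morphisms as maps between models is one interpretation in this setting:

$$F(\mathrm{id}_A) = \mathrm{id}_{F(A)}, \qquad F(g \circ f) = F(g) \circ F(f)$$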

Show the workflow demo

- Computers are good at type checking
- Can we embed our semantics into the type system?

Convert categorical values into singleton types:
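
A minimal sketch of the idea using Julia's built-in `Val` type, which lifts a value such as a `Symbol` into a type parameter. The aliases, the `lift` and `next` functions, and the deterministic S, I, R cycle are illustrative assumptions, not code from SemanticModels.jl.

```julia
# Each state becomes its own singleton type: Val{:S}, Val{:I}, Val{:R}.
const Susceptible = Val{:S}
const Infected = Val{:I}
const Recovered = Val{:R}

# Lift a runtime Symbol into the type domain.
lift(state::Symbol) = Val(state)

# Transitions written as methods, so dispatch selects the rule by type.
# This deterministic S -> I -> R -> S cycle is purely illustrative.
next(::Susceptible) = Infected()
next(::Infected) = Recovered()
next(::Recovered) = Susceptible()

state = lift(:S)
println(typeof(state))
println(next(state) isa Infected)
println(hasmethod(next, Tuple{Val{:D}}))
```

Because no method exists for an unknown state such as `Val{:D}`, the type system can detect an invalid state by method lookup instead of by inspecting values at run time.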


SemanticModels.jl (github.com/jpfairbanks/SemanticModels.jl) is a foundational technology for teaching machines to reason about scientific models

Thinking in terms of transformations on models is easier than thinking of models themselves.

A good type system can reason over modeling concepts