E-graph

Graph data structure

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let $\Sigma$ be a set of uninterpreted functions, where $\Sigma _{n}$ is the subset of $\Sigma$ consisting of functions of arity $n$ . Let $\mathbb {id}$ be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of $f\in \Sigma _{n}$ to e-class IDs $i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id}$ is denoted $f(i_{1},i_{2},\ldots ,i_{n})$ and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:^[1]

A union-find structure $U$ representing equivalence classes of e-class IDs, with the usual operations $\mathrm {find}$ , $\mathrm {add}$ and $\mathrm {merge}$ . An e-class ID $e$ is canonical if $\mathrm {find} (U,e)=e$ ; an e-node $f(i_{1},\ldots ,i_{n})$ is canonical if each $i_{j}$ is canonical ( $j$ in $1,\ldots ,n$ ).
An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons $H$ (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map $M$ that maps e-class IDs to e-classes, such that $M$ maps equivalent IDs to the same set of e-nodes: $\forall i,j\in \mathbb {id} ,M[i]=M[j]\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)$

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants.^[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes $f(i_{1},\ldots ,i_{n}),f(j_{1},\ldots ,j_{n})$ are congruent when $\mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k}),k\in \{1,\ldots ,n\}$ . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

E-graphs expose wrappers around the $\mathrm {add}$ , $\mathrm {find}$ , and $\mathrm {merge}$ operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

E-matching

Let $V$ be a set of variables and let $\mathrm {Term} (\Sigma ,V)$ be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, $\mathrm {Term} (\Sigma ,V)$ is the smallest set such that $V\subset \mathrm {Term} (V,\Sigma )$ , $\Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)$ , and when $x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)$ and $f\in \Sigma _{n}$ , then $f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)$ . A term containing variables is called a pattern, a term without variables is called ground.

An e-graph $E$ represents a ground term $t\in \mathrm {Term} (\Sigma ,\emptyset )$ if one of its e-classes represents $t$ . An e-class $C$ represents $t$ if some e-node $f(i_{1},\ldots ,i_{n})\in C$ does. An e-node $f(i_{1},\ldots ,i_{n})\in C$ represents a term $g(j_{1},\ldots ,j_{n})$ if $f=g$ and each e-class $M[i_{k}]$ represents the term $j_{k}$ ( $k$ in $1,\ldots ,n$ ).

e-matching is an operation that takes a pattern $p\in \mathrm {Term} (\Sigma ,V)$ and an e-graph $E$ , and yields all pairs $(\sigma ,C)$ where $\sigma \subset V\times \mathbb {id}$ is a substitution mapping the variables in $p$ to e-class IDs and $C\in \mathbb {id}$ is an e-class ID such that each term $\sigma (p)$ is represented by $C$ . There are several known algorithms for e-matching,^[3]^[4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.^[5]

Complexity

An e-graph with n equalities can be constructed in O(n log n) time.^[6]

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs.^[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3^[8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.^[9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.^[10] E-graphs are also used in the Simplify theorem prover of ESC/Java.^[11]

Equality saturation is used in specialized optimizing compilers,^[12] e.g. for deep learning^[13] and linear algebra.^[14] Equality saturation has also been used for translation validation applied to the LLVM toolchain.^[15]

E-graphs have been applied to several problems in program analysis, including fuzzing,^[16] abstract interpretation,^[17]^[18] and library learning.^[19]

References

^ (Willsey et al. 2021)
^ (Willsey et al. 2021)
^ (de Moura & Bjørner 2007)
^ Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi:10.1016/j.entcs.2008.04.078. ISSN 1571-0661.
^ Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi:10.1145/3498696. S2CID 236924583.
^ (Flatt et al. 2022, p. 2)
^ (Tate et al. 2009)
^ de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
^ Rümmer, Philipp (2012). "E-Matching with Free Variables". In Bjørner, Nikolaj; Voronkov, Andrei (eds.). Lecture Notes in Computer Science. Vol. 7180. Berlin, Heidelberg: Springer. pp. 359–374. doi:10.1007/978-3-642-28717-6_28. ISBN 978-3-642-28717-6. {{cite book}}: |journal= ignored (help); Missing or empty |title= (help)
^ (Flatt et al. 2022, p. 2)
^ Detlefs, David; Nelson, Greg; Saxe, James B. (2005-05-01). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi:10.1145/1066100.1066102. ISSN 0004-5411. S2CID 9613854.
^ Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340.
^ Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 [cs.AI].
^ Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 [cs.DB].
^ Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
^ "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
^ Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv:2203.09191. {{cite journal}}: Cite journal requires |journal= (help)
^ Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv:2205.14989. {{cite journal}}: Cite journal requires |journal= (help)
^ Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. arXiv:2212.04596. doi:10.1145/3571207. ISSN 2475-1421. S2CID 254536022.

de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3.
Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597.
Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086.
Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847.

External links

The Egg Project
A Colab notebook explaining e-graphs

Program analysis

Key concepts

Semantics

Types	Axiomatic Denotational Categorical semantics Operational Big-step Small-step
Models	Lambda calculus Petri net Process calculus Rewriting system State machine Turing machine

Analyses

Static	Abstract interpretation Alias Control flow kCFA Data-flow Dependence Effect system Escape Model checking Pointer Shape Symbolic execution Termination Type systems Typestate
Dynamic	Data-flow Taint tracking Concolic execution Fuzzing Invariant inference Program slicing Testing

Formal methods

Concepts

Logics

Hoare
Incorrectness
Linear
Separation
Temporal

Data structures

Tools

Constraint solvers	CHC SAT SMT
Lightweight	Alloy TLA+
Proof assistants	ACL2 Agda Coq F* HOL Light HOL4 Idris Isabelle Isabelle/HOL Lean LEGO Mizar NuPRL PVS Twelf