DAGs with NO TEARS 🚫 💧
This is an implementation of the following papers:
[1] Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. P. (2018). DAGs with NO TEARS: Continuous optimization for structure learning (NeurIPS 2018, Spotlight).
[2] Zheng, X., Dan, C., Aragam, B., Ravikumar, P., & Xing, E. P. (2020). Learning sparse nonparametric DAGs (AISTATS 2020, to appear).
If you find this code useful, please consider citing:
```bibtex
@inproceedings{zheng2018dags,
  author = {Zheng, Xun and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P.},
  booktitle = {Advances in Neural Information Processing Systems},
  title = {{DAGs with NO TEARS: Continuous Optimization for Structure Learning}},
  year = {2018}
}

@inproceedings{zheng2020learning,
  author = {Zheng, Xun and Dan, Chen and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  title = {{Learning sparse nonparametric DAGs}},
  year = {2020}
}
```

Update
Code for nonlinear NOTEARS has been added. See [2] for details.
tl;dr Structure learning in <60 lines
Check out linear.py for a complete, end-to-end implementation of the NOTEARS algorithm in fewer than 60 lines.
This includes L2, Logistic, and Poisson loss functions with L1 penalty.
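For reference, here is a minimal sketch of calling the linear solver programmatically. It assumes the function is exposed as `notears_linear(X, lambda1, loss_type)` in `notears/linear.py`; check the file for the exact signature in your version.

```python
# Hedged usage sketch: assumes notears_linear(X, lambda1, loss_type) as in linear.py.
import numpy as np
from notears.linear import notears_linear

X = np.loadtxt('X.csv', delimiter=',')                   # n x d data matrix
W_est = notears_linear(X, lambda1=0.1, loss_type='l2')   # loss_type: 'l2', 'logistic', or 'poisson'
np.savetxt('W_est.csv', W_est, delimiter=',')            # d x d weighted adjacency matrix
```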
Introduction
A directed acyclic graphical model (aka Bayesian network) with d nodes defines a joint distribution over a random vector of size d. We are interested in the Bayesian network structure learning (BNSL) problem: given n samples from such a distribution, how do we estimate the graph G?
A major challenge of BNSL is enforcing the directed acyclic graph (DAG) constraint, which is combinatorial. While existing approaches rely on local heuristics, we introduce a fundamentally different strategy: we formulate the problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. In other words,

$$\min_{W \in \mathbb{R}^{d \times d}} F(W) \ \text{ s.t. } \ G(W) \in \mathrm{DAGs} \quad \Longleftrightarrow \quad \min_{W \in \mathbb{R}^{d \times d}} F(W) \ \text{ s.t. } \ h(W) = 0,$$

where $G(W)$ is the graph induced by the weighted adjacency matrix $W$, $F$ is a score function, and $h$ is a smooth function whose level set at zero exactly characterizes the space of DAGs.
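In [1], one such function is $h(W) = \operatorname{tr}(e^{W \circ W}) - d$. The numpy sketch below illustrates this acyclicity function and its gradient; it is an illustration of the idea, not the repository's exact implementation.

```python
# Illustrative sketch of the smooth acyclicity function from [1]:
# h(W) = tr(exp(W ∘ W)) - d, which equals 0 exactly when the graph of W is a DAG.
import numpy as np
from scipy.linalg import expm

def h(W):
    """Smooth acyclicity measure; h(W) == 0 iff the graph induced by W is acyclic."""
    E = expm(W * W)                  # matrix exponential of the Hadamard square
    return np.trace(E) - W.shape[0]

def h_grad(W):
    """Gradient of h, used inside the augmented Lagrangian solver."""
    E = expm(W * W)
    return E.T * 2 * W

# A 2-cycle violates acyclicity (h > 0); a single directed edge does not (h == 0).
W_cyclic = np.array([[0., 1.], [1., 0.]])
W_dag    = np.array([[0., 1.], [0., 0.]])
print(h(W_cyclic))   # > 0
print(h(W_dag))      # 0.0
```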
Requirements
- Python 3.6+
- `numpy`
- `scipy`
- `python-igraph`: install the igraph C core and `pkg-config` first
- `torch`: optional, only used for the nonlinear model
Contents (New version)
- `linear.py` - the 60-line implementation of NOTEARS with l1 regularization for various losses
- `nonlinear.py` - nonlinear NOTEARS using MLP or basis expansion (see the sketch after this list)
- `locally_connected.py` - special layer structure used for the MLP
- `lbfgsb_scipy.py` - wrapper for scipy's L-BFGS-B
- `utils.py` - graph simulation, data simulation, and accuracy evaluation
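The nonlinear variant follows the same pattern as the linear one. The sketch below is based on the demo in `nonlinear.py` and assumes the names `NotearsMLP` and `notears_nonlinear`; treat it as a starting point rather than the definitive API, since these may differ across versions.

```python
# Hedged sketch of the MLP-based nonlinear NOTEARS, following the demo in nonlinear.py.
import numpy as np
import torch
from notears.nonlinear import NotearsMLP, notears_nonlinear

torch.set_default_dtype(torch.double)
X = np.loadtxt('X.csv', delimiter=',')           # n x d data matrix
d = X.shape[1]
model = NotearsMLP(dims=[d, 10, 1], bias=True)   # one hidden layer of width 10 per variable
W_est = notears_nonlinear(model, X, lambda1=0.01, lambda2=0.01)
np.savetxt('W_est.csv', W_est, delimiter=',')
```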
Running a simple demo
The simplest way to try out NOTEARS is to run a simple example:
```bash
$ git clone https://github.com/xunzheng/notears.git
$ cd notears/
$ python notears/linear.py
```

This runs l1-regularized NOTEARS on a randomly generated 20-node Erdos-Renyi graph with 100 samples. Within a few seconds, you should see output like this:
```
{'fdr': 0.0, 'tpr': 1.0, 'fpr': 0.0, 'shd': 0, 'nnz': 20}
```

The data, the ground truth graph, and the estimate will be stored in `X.csv`, `W_true.csv`, and `W_est.csv`.
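If you want to reproduce these numbers in your own script, the demo roughly follows the steps below. The helper names (`simulate_dag`, `simulate_parameter`, `simulate_linear_sem`, `count_accuracy`) are taken from `utils.py` as used by the demo in `linear.py` and may differ in other versions.

```python
# Hedged sketch of the demo pipeline: simulate a DAG, sample data, estimate, evaluate.
import numpy as np
from notears import utils
from notears.linear import notears_linear

utils.set_random_seed(1)
n, d, s0 = 100, 20, 20                                # 100 samples, 20 nodes, 20 expected edges
B_true = utils.simulate_dag(d, s0, 'ER')              # random Erdos-Renyi DAG (binary adjacency)
W_true = utils.simulate_parameter(B_true)             # attach edge weights
X = utils.simulate_linear_sem(W_true, n, 'gauss')     # sample from a linear Gaussian SEM

W_est = notears_linear(X, lambda1=0.1, loss_type='l2')
print(utils.count_accuracy(B_true, W_est != 0))       # fdr / tpr / fpr / shd / nnz
```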
Running as a command
Alternatively, if you have a CSV data file X.csv, you can install the package and run the algorithm as a command:
```bash
$ pip install git+git://github.com/xunzheng/notears
$ notears_linear X.csv
```

The output graph will be stored in `W_est.csv`.
Examples: Erdos-Renyi graph
- Ground truth: `d = 20` nodes, `2d = 40` expected edges.
- Estimate with `n = 1000` samples: `lambda = 0`, `lambda = 0.1`, and `FGS` (baseline). Both `lambda = 0` and `lambda = 0.1` are close to the ground truth graph when `n` is large.
- Estimate with `n = 20` samples: `lambda = 0`, `lambda = 0.1`, and `FGS` (baseline). When `n` is small, `lambda = 0` performs worse while `lambda = 0.1` remains accurate, showing the advantage of L1 regularization. (A sketch reproducing this comparison follows the list.)
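The following sketch reproduces this comparison under the same assumptions as the earlier snippets (helper names from `utils.py` and `linear.py`); the FGS baseline is an external tool and is not included.

```python
# Hedged sketch: same ER ground truth, estimated with large and small n, lambda in {0, 0.1}.
import numpy as np
from notears import utils
from notears.linear import notears_linear

utils.set_random_seed(1)
d, s0 = 20, 40                                        # d = 20 nodes, 2d = 40 expected edges
B_true = utils.simulate_dag(d, s0, 'ER')
W_true = utils.simulate_parameter(B_true)

for n in (1000, 20):
    X = utils.simulate_linear_sem(W_true, n, 'gauss')
    for lam in (0.0, 0.1):
        W_est = notears_linear(X, lambda1=lam, loss_type='l2')
        acc = utils.count_accuracy(B_true, W_est != 0)
        print(f'n={n}, lambda={lam}: {acc}')
```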
Examples: Scale-free graph
- Ground truth: `d = 20` nodes, `4d = 80` expected edges. The degree distribution is significantly different from the Erdos-Renyi graph. One nice property of our method is that it is agnostic about the graph structure.
- Estimate with `n = 1000` samples: `lambda = 0`, `lambda = 0.1`, and `FGS` (baseline). The observation is similar to the Erdos-Renyi graph: both `lambda = 0` and `lambda = 0.1` accurately estimate the ground truth when `n` is large.
- Estimate with `n = 20` samples: `lambda = 0`, `lambda = 0.1`, and `FGS` (baseline). Similarly, `lambda = 0` suffers from small `n` while `lambda = 0.1` remains accurate, showing the advantage of L1 regularization.
Other implementations
- Python: https://github.com/jmoss20/notears
- Tensorflow with Python: https://github.com/ignavier/notears-tensorflow