John J. Nay

Work in Progress


Nay, J. J. (2016). “Predicting and Understanding Law-Making with Word Vectors and an Ensemble Model.” eprint arXiv:1607.02109. Revised and Resubmitted.

Abstract: Out of nearly 70,000 bills introduced in the U.S. Congress from 2001 to 2015, only 2,513 were enacted. We developed a machine learning approach to forecasting the probability that any bill will become law. Starting in 2001 with the 107th Congress, we trained models on data from previous Congresses, predicted all bills in the current Congress, and repeated until the 113th Congress served as the test. For prediction, we scored each sentence of a bill with a language model that embeds legislative vocabulary into a semantic-laden vector space. This language representation enables our investigation into which words increase the probability of enactment for any topic. To test the relative importance of text and context, we compared the text model to a context-only model that uses variables such as whether the bill's sponsor is in the majority party. To test how changes to bills after their introduction affect our ability to predict their final outcome, we compared using the bill text and metadata available at the time of introduction with using the most recent data. At the time of introduction, context-only predictions outperform text-only predictions; with the newest data, text-only predictions outperform context-only predictions. Combining text and context always performs best. We conducted a global sensitivity analysis on the combined model to determine important factors predicting enactment.
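The train-on-past, predict-current procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function name rolling_congress_splits is hypothetical, and the sketch assumes that the earliest Congress in the list is used only for training and that every later Congress is tested once, using all earlier Congresses as training data.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_congress_splits(
    congresses: Sequence[int],
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_congresses, test_congress) pairs in chronological order.

    Assumption (not stated in the abstract): the earliest Congress is only
    ever used for training, and each later Congress is tested exactly once.
    """
    ordered = sorted(congresses)
    for i in range(1, len(ordered)):
        # Train on everything strictly before the test Congress.
        yield ordered[:i], ordered[i]


if __name__ == "__main__":
    # Congress numbers taken from the abstract (107th through 113th).
    for train, test in rolling_congress_splits(range(107, 114)):
        print(f"train on {train} -> predict Congress {test}")
```

Splitting by Congress rather than at random keeps each prediction free of information from the Congress being predicted or from later Congresses, which is the point of the forecasting setup the abstract describes.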



Nay, J. J., Burchfield, E., Gilligan, J. (2016). “A Machine Learning Approach to Forecasting Remotely Sensed Vegetation Health.” eprint arXiv:1602.06335. Revise and Resubmit at Computers & Geosciences.

Abstract: Drought threatens food and water security around the world, and this threat is likely to become more severe under climate change. High-resolution predictive information can help farmers, water managers, and others manage the effects of drought. We have created a tool that produces short-term forecasts of vegetation health at high spatial resolution, using open-source software and data that are global in coverage. The tool automates the downloading and processing of Moderate Resolution Imaging Spectroradiometer (MODIS) datasets and the training of gradient-boosted machine models on hundreds of millions of observations to predict future values of the Enhanced Vegetation Index. We compared the predictive power of different sets of variables (raw spectral MODIS data and Level-3 MODIS products) in two regions with distinct agro-ecological systems, climates, and cloud coverage: Sri Lanka and California. Our tool provides considerably greater predictive power on held-out datasets than simpler baseline models.



Gunda, T., Bazuin, J., Nay, J. J., Yeung, K. (2016). “An Assessment of Seasonal Forecast Use Benefits with an Empirically-Grounded Computational Simulation of Farmers.” Revised and Resubmitted.



Nay, J. J., Ruhl, J.B., Gilligan, J.M. (2016). “The Evolution of Presidential Policy: A Statistical Topic Modeling Analysis.”



Gilligan, J.M., Worland, S.C., Wold, C.A., Nay, J. J., Hess, D.J., and Hornberger, G.M. (2016). “Urban Water Conservation Policies in the United States: A Statistical Analysis.” Under Review.



Team Lead and Co-PI on a grant to combine machine learning and econometric approaches to estimate causal effects of federal agricultural policy on drought impacts over the past 50 years.



With J.B. Ruhl and David Markell, a text analysis of all climate change litigation in the U.S.



Developing methods for efficiently exploring vast amounts of law and policy text.



[Figure: One of our time series topic modeling analyses.]