Programming an Intelligent City: The Role of Data Science
By Brandon T. Willard, Data Science Lead
CityBase is ultimately concerned with making local governments more capable of supporting their constituents. To do that, we need to create and use transformative technology.
Currently, many of technology’s more “transformative” developments revolve around data science, machine learning, and artificial intelligence. In this post, we’ll start by asking how technologies in these areas can be transformative and how they relate to CityBase.
How Data Science Can Optimize Local Government
Developments in the areas of data science, machine learning, and artificial intelligence promise real breakthroughs in automation. Because of this, they are particularly important to nearly every industry—including the govtech space that CityBase occupies.
While some of the capabilities data science offers are more experimental than others, the more established capabilities still provide the necessary steps toward long-awaited services, such as:
- Manageable real-time monitoring and detection for reliability and security
- Services backed by data from currently unconnected agencies
- Recommendations for—and easy enrollment in—government assistance programs and services
- Considerations and exemptions for personal circumstances and financial stability
Simply put, data science, machine learning, and artificial intelligence are bringing us measurably closer to wide-scale, sophisticated automation to improve local government service delivery.
These services, and many more, demand a reliable and consistent use of inference that goes beyond the conventional capabilities of enterprise software offerings and practices. For instance, while a local government website may already list all of its services online, a person must still determine on their own which government assistance programs to apply for. When technology can make inferences based on a constituent's relevant circumstances, local government can proactively recommend services.
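As a loose illustration of that kind of inference, here's a minimal sketch in Python using scikit-learn. The program, features, and training data are entirely hypothetical; a real deployment would require far more careful modeling, validation, and data governance:

```python
# A hypothetical sketch: infer whether to recommend an assistance
# program from a constituent's circumstances.  Every feature name,
# value, and label below is invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: (household_income, household_size, is_senior)
X_train = [
    [18000, 4, 0],
    [55000, 2, 0],
    [12000, 1, 1],
    [72000, 3, 0],
]
# 1 = benefited from a utility-bill assistance program, 0 = did not
y_train = [1, 0, 1, 0]

# A real model would scale its features and train on far more data.
model = LogisticRegression().fit(X_train, y_train)

# Estimate how likely a new constituent is to benefit; above some
# threshold, the service could be proactively recommended.
constituent = [[21000, 3, 0]]
prob = model.predict_proba(constituent)[0, 1]
if prob > 0.5:
    print(f"Recommend utility-bill assistance (score: {prob:.2f})")
```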
While we could wait for the state of government technology to improve just enough to remove most of the inference burden, there's no guarantee that the necessary improvements will happen—within our lifetimes—across enough local governments and their independently operated services. Worse yet, while we wait, constituents—like you and me—suffer from the very inefficiencies these services would address!
We’ve already done pretty well within the current state of technology (e.g., providing a unified government service API), but we have to keep looking forward. That’s where data science, machine learning, artificial intelligence, and their transformative potential come in.
What We Mean by “Data Science”
Common conceptions of data science, machine learning, and artificial intelligence tend to revolve around computation. Within that area, the three are brought together, as well as differentiated, by what they do (obviously), how they do it, and what they do it with.
In a nutshell, what they do is largely inference. How they do it involves a number of popular mathematical modeling techniques and models. And, naturally, they do it with everyday computers, smartphones, head-up displays, graphics processing units (GPUs), cloud platforms and services, and sometimes even special "brain-like" hardware, among others. Here, we're primarily concerned with "what they do" and "how they do it."
It’s easy to find articles online that try to explain exactly what data science is in relation to machine learning, artificial intelligence, statistics, etc., but we’re not going to tackle that here. Instead, for our purposes, let’s just say that “data science” focuses more on the infrastructure for—and application of—statistical inference and mathematical modeling. In other words, the term “data science” is meant to signify an applicable synthesis of mathematical modeling and computer science within the industry.
In practice, data science usually looks like standard work in established areas of technology: software development, database administration, dev-ops, business intelligence, and so on. A strong association between data science and software development/data infrastructure is absolutely necessary. In order to effectively use the results of machine learning, mathematical modeling, etc., infrastructure—among other things—must be primed for it. Transformational changes often go hand-in-hand with infrastructure changes. This holds even at the conceptual level where, for example, low-level guiding principles—like those found in functional programming—can produce code that’s more readily debugged and extended.
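As a generic illustration of that last point (not CityBase code), a pipeline built from small pure functions can be tested, reordered, and extended one step at a time:

```python
# Small pure functions: each takes a record and returns a new one,
# with no hidden state, so every step can be tested in isolation.
def normalize_name(record):
    return {**record, "name": record["name"].strip().title()}

def add_full_address(record):
    return {**record, "address": f"{record['street']}, {record['city']}"}

def process(record):
    # A composition of independent steps; adding, removing, or
    # reordering a step doesn't ripple through the others.
    return add_full_address(normalize_name(record))

raw = {"name": "  ada lovelace ", "street": "123 Main St", "city": "Chicago"}
print(process(raw))
# {'name': 'Ada Lovelace', 'street': '123 Main St', 'city': 'Chicago',
#  'address': '123 Main St, Chicago'}
```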
While data science has widened the scope of possible automations—by, for instance, making it possible for nearly anyone to write code that can interpret the written word and interact in near-natural spoken language—it also puts its own hefty demands on the subjects it relies upon.
For instance, it’s not often that a single mathematical model will suffice for a given industry problem. So adding a data science–like capability that involves mathematical modeling isn’t as simple as implementing a single model from an academic paper. More often than not, multiple models are required under different circumstances, and this sort of model orchestration is itself a non-trivial statistical problem.
Why Programming Matters in Theory and Practice
The complexity involved with synthesizing statistics, computer science, etc., puts new requirements on both the theory and implementation. Most professionals in any one of these areas will easily recognize the importance of their subject’s theoretical framework(s). But, too often, those frameworks are pushed aside when their relevance isn’t immediately apparent.
Unfortunately, setting theory aside isn't an option for the computational synthesis that is data science. As software developers know only too well, modeling the problem domain adequately—for the task at hand and for future changes—is a large and important part of the job. Recent decades have demonstrated this focus through an emphasis on object-oriented programming and its design patterns. That conceptual framework has become common among enterprise developers and has been used successfully to model the elements and interactions of numerous problem domains.
As software development moves forward, it continues to push its means of expressing arbitrary objects and relationships while retaining a meaningful connection to its practical concerns: computability, interpretability, reliability, testability, and so on. (See previous CityBase posts on the use of Elixir and umbrella projects.)
The same is true for mathematical modeling and all the other areas data science touches. As a result, data science has to effectively synthesize both the theory and application of all its constituent fields. In other words, data science necessarily relies on powerful computer science paradigms and their resulting programming languages and practices.
Luckily, data science's constituent fields are all connected by "mathy" elements, so such a synthesis isn't impossible to imagine. As a matter of fact, the connections between computer science, numerical computation, and statistical modeling exist at quite a few levels; enough to guide the use of programming in statistics and of statistics in programming (e.g., statistical models are used in the implementation of programming languages). More than a few of these connections haven't been made yet, or haven't reached their potential, and it's possible that the most beneficial ones are among them!
Setting the Stage for the Future of Cities
Programming for statistics and mathematical modeling isn't all that great right now, but it's improving. Deep learning and tensor libraries (e.g., TensorFlow, Theano, Torch) are nice examples. They're essentially a usable combination of pre-existing linear (tensor, really) algebra and optimization techniques. These are things that were—and still are—primarily done by experts in mathematics and statistics. Now, through careful packaging and application of the appropriate frameworks (e.g., symbolic computation), the same capabilities are widely available to enable breakthroughs in unrelated areas, such as local government.
While each element of the aforementioned technologies has existed for quite some time, putting them together so that non-mathematical developers can easily prototype models is what makes the difference. This isn't the end of the story, though; these technologies have simply set the stage for a more developed form of synthesis between the theory and practice of mathematics, statistics, and computer science!
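As a small example of what that packaging buys you, a library like Theano (one of those mentioned above) lets a developer write down a symbolic expression and obtain its derivative and a compiled function without doing any calculus by hand. The snippet below is a minimal sketch of that idea:

```python
# A minimal Theano sketch: define a symbolic expression, derive its
# gradient symbolically, then compile the graph into a callable.
import theano
import theano.tensor as tt

x = tt.dscalar("x")              # a symbolic double-precision scalar
y = x ** 2 + 3 * x               # a symbolic expression in x

dy_dx = theano.grad(y, x)        # symbolic differentiation: 2*x + 3

f = theano.function([x], dy_dx)  # compile to executable code
print(f(2.0))                    # 7.0
```

The same mechanics, applied to expressions with millions of parameters, are what make prototyping deep learning models accessible to non-specialists.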
In response to these recent developments, CityBase is also exploring—and actively developing—promising connections between computer science and mathematical modeling. For instance, we would like a means of bridging foundational AI-related work in symbolic, graph-, and pattern-based computation with the modern world of data science in Python. To do this, we're assessing the capabilities of Hy, a Lisp dialect that compiles to Python.
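Hy itself deserves its own post, but to give a flavor of what pattern-based symbolic computation looks like, here's a toy sketch in plain Python that rewrites an expression tree using a single simplification rule. It's purely illustrative and not the approach or code we're building:

```python
# A toy pattern-based rewrite: expressions are nested tuples, and a
# rule like (x + 0) -> x is applied recursively over the tree.
def rewrite(expr):
    """Simplify an expression tree with the rule ('+', e, 0) -> e."""
    if not isinstance(expr, tuple):
        return expr                      # a leaf: variable or constant
    op, *args = (rewrite(e) for e in expr)
    if op == "+" and args[-1] == 0:
        return args[0]                   # the pattern matched: drop "+ 0"
    return (op, *args)

# ('+', ('+', 'x', 0), 0) simplifies all the way down to 'x'.
print(rewrite(("+", ("+", "x", 0), 0)))
```

Lisps make this style of programming especially natural, since code and expression trees share the same form; a Lisp that compiles to Python lets us keep that expressiveness while staying inside the Python data science ecosystem.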
These efforts are in line with our desire to design and implement with greater expressive potential, to easily create powerful domain-specific languages that invite effective ideas—and people—from their respective areas. We would like mathematicians to bring their abilities to govtech with as little restriction as possible. To do that, we need to be above the norm when it comes to building those language and concept bridges. In this way, bringing data science forward is an important step in the development of large-scale, transformative government technology, and is simply a part of the bigger picture.