How we do the magic

Continuous optimization

The final deliverable of many of our projects is a set of optimization tools and interfaces that support policy-makers and day-to-day operators. We have learned that it is not enough to master the programming tools and the mathematical apparatus; we must also understand the domain problem and represent it in computational terms.

Our optimization approach has two main pillars, which work in tandem and are inseparable.

The first pillar is our strong belief that developing optimization instruments without data systems is meaningless. Our optimizations are fueled by continuous data pipelines and machine learning models which enrich the input data and provide the algorithm with all the necessary real-world information, making it not merely a mathematical abstraction but a useful instrument.

The second pillar is what we call continuous optimization: our algorithms propose a specific solution, usually based on historical data, and that solution is then evaluated in real time against various KPIs, visualised in easy-to-understand dashboards. You can't simply deliver an optimization tool and hope for the best; you must first build the algorithm on data and then evaluate its impact in the real world. The first iteration of an algorithm may not yield ideal results, but if this principle is sustained, it will improve over time.
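
The feedback loop above can be sketched in a few lines of Python. This is a deliberately tiny illustration, not our actual API: the functions (propose, evaluate_kpis) and the cost-based KPI are invented for the example, but the shape of the loop — plan on historical data, measure in the real world, feed the measurement back — is the principle itself.

```python
# Hypothetical sketch of the continuous-optimization loop. Names and the
# single "cost" KPI are illustrative only.

def propose(history):
    """Pick the candidate that performed best on historical data."""
    return min(history, key=lambda record: record["cost"])["solution"]

def evaluate_kpis(solution, observation):
    """Score a deployed solution against one real-world observation."""
    return {"solution": solution, "cost": abs(solution - observation)}

def continuous_optimization(history, live_observations):
    for observation in live_observations:
        solution = propose(history)                   # plan on what we know
        kpis = evaluate_kpis(solution, observation)   # measure in the field
        history.append(kpis)                          # close the loop
    return propose(history)

history = [{"solution": 10, "cost": 5.0}, {"solution": 7, "cost": 2.0}]
best = continuous_optimization(history, live_observations=[6, 8, 7])
```

Each pass through the loop appends a real-world evaluation to the history, so the next proposal is grounded in observed performance rather than in the original assumptions alone.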

As a company, we specialise both in standard optimization techniques, specifically aimed at solving vehicle routing problems, and in genetic algorithms, applied in broadly defined constraint spaces where the best solution is hard to untangle with exact mathematical techniques.

Our waste management optimization systems are capable of predicting the best collection time for each container and routing the trucks while taking into account real-world information, such as potential U-turn spots and expected noise and traffic disturbance. Our public transit optimization systems can find the best possible schedule and topology for a specific transit line, subject to various operational constraints, given fitness functions such as the expected crowding level, the operating cost, the environmental footprint, the frequency of the line, and the mobility patterns of citizens.
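
To make the genetic-algorithm idea concrete, here is a toy sketch in Python. A transit line is encoded as a bit-vector over candidate stops, and a composite fitness trades demand coverage against operating cost. The demand figures, cost weight, and GA parameters are all invented for illustration; real fitness functions combine many more terms, as described above.

```python
import random

random.seed(42)  # deterministic toy run

DEMAND = [30, 5, 22, 8, 40, 12, 18, 3]   # riders served per candidate stop (made up)
COST_PER_STOP = 10                        # illustrative operating cost weight

def fitness(line):
    """Composite fitness: demand covered minus cost of the stops kept."""
    coverage = sum(d for d, kept in zip(DEMAND, line) if kept)
    cost = COST_PER_STOP * sum(line)
    return coverage - cost

def mutate(line, rate=0.2):
    return [bit ^ (random.random() < rate) for bit in line]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(generations=40, pop_size=20):
    pop = [[random.randint(0, 1) for _ in DEMAND] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection keeps the elite
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best_line = evolve()
```

Because the top half of each generation survives unchanged, the best solution found so far is never lost — the same elitism that, at scale, lets the algorithm explore a broadly defined constraint space without regressing.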

Geo-spatial analytics

The history of our company is closely connected to the field of urban analytics, which involves extensive work with diverse geo-spatial data, especially data connected to citizen and vehicle mobility. Geo-spatial data is messy: imputation and various data-cleaning procedures usually have to be applied. We have learned how to deal with such data and have developed an extensive set of tools based on various unsupervised machine-learning and statistical-inference methods.

Examples of our work include techniques which estimate air-pollution hot zones based on citizen-science and official sensor data, techniques which infer the amount of waste disposed of by a specific neighbourhood, street, or even building, and methods which infer origin-destination information from simple smart card and EMV card taps.
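
As a flavour of the hot-zone estimation, here is a minimal sketch of inverse-distance-weighted (IDW) interpolation — one of the simplest statistical ways to turn scattered sensor readings into a continuous pollution surface. The sensor positions and values below are invented; our production methods are considerably richer, but the principle of weighting nearby sensors more heavily is the same.

```python
# Hedged sketch: IDW interpolation of a pollution value at an arbitrary
# point from scattered sensor readings. Data is illustrative only.

SENSORS = [((0.0, 0.0), 40.0), ((1.0, 0.0), 80.0), ((0.0, 1.0), 20.0)]

def idw(point, sensors, power=2):
    """Estimate a value at `point` as a distance-weighted mean of sensors."""
    num = den = 0.0
    for (x, y), value in sensors:
        d2 = (point[0] - x) ** 2 + (point[1] - y) ** 2
        if d2 == 0:
            return value                      # exactly on a sensor
        weight = 1.0 / d2 ** (power / 2)      # closer sensors weigh more
        num += weight * value
        den += weight
    return num / den

estimate = idw((0.5, 0.5), SENSORS)
```

With the three sensors above equidistant from the query point, the estimate collapses to their plain average — a useful sanity check when validating an interpolation pipeline.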

One of our crowning achievements is our mastery of cell-phone position data for understanding urban mobility patterns. Through our partnerships with telecommunication companies we have been able, for example, to estimate the entire mobility flow in cities, the most likely modality of each trip, and the distribution of origins and destinations. We have used these techniques to various ends; in the context of public transit design, our methods have been applied to problems such as stop-location selection, genetic line-topology optimization, on-demand mobility routing, and even full-blown agent-based simulations.
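
The origin-destination inference mentioned above can be illustrated with the classic trip-chaining heuristic for entry-only smart card systems: the destination of each trip is assumed to be the stop of the card's next tap-in. The sketch below uses invented taps and omits the time-gap and spatial-plausibility checks a real pipeline needs, but it shows the core idea.

```python
from collections import Counter, defaultdict

# Toy trip-chaining sketch: build an OD matrix from entry-only taps by
# pairing each tap with the same card's next tap. Data is invented.

TAPS = [  # (card_id, timestamp, stop)
    ("card1", 1, "A"), ("card1", 2, "C"), ("card1", 3, "A"),
    ("card2", 1, "B"), ("card2", 2, "A"),
]

def infer_od(taps):
    by_card = defaultdict(list)
    for card, ts, stop in taps:
        by_card[card].append((ts, stop))
    od = Counter()
    for trips in by_card.values():
        trips.sort()  # chronological order per card
        for (_, origin), (_, dest) in zip(trips, trips[1:]):
            od[(origin, dest)] += 1
    return od

od_matrix = infer_od(TAPS)
```

Aggregated over millions of taps, counts like these become the OD matrices that feed stop selection and line-topology optimization.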

Any analysis is meaningless if our clients cannot understand it. Luckily, we love creating beautiful maps and dashboards which present our findings in an easy-to-grasp way, not only to the specialist but to the layperson as well.

Machine learning & MLOps

From bleeding-edge deep learning models to time-series forecasting, we have handled multiple machine-learning use cases and successfully applied them to our projects and systems.

Our portfolio of use cases includes, for example, a full-blown deep-learning people-counting system, which collects data from thousands of cameras and estimates the occupancy level of every vehicle in Sofia’s public transit network at any given time. Various forecasting models predict traffic levels in the city and help produce more accurate timetable calculations and fair evaluations of the root cause of schedule-delay events. In the realm of waste management, we have developed techniques which, for example, automatically capture low-level sensor-data outliers or predict the perfect time to collect a specific container or container collection point.
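
As an illustration of the sensor-outlier idea (not our production model), a simple and robust baseline is to flag any reading whose z-score against a trailing window exceeds a threshold — enough to catch, say, a fill-level sensor glitch. The window size, threshold, and readings below are all invented.

```python
import statistics

# Hedged sketch: rolling z-score outlier detection over a sensor stream.

def rolling_outliers(readings, window=5, threshold=3.0):
    """Return indices of readings far outside their trailing window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std > 0 and abs(readings[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged

readings = [10, 11, 10, 12, 11, 95, 11, 10]
# the spike at index 5 (value 95) is far outside its trailing window
```

Because the window trails the current reading, the method needs no labels and runs in a streaming fashion — a natural fit for continuous data pipelines.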

Of course, none of this is possible without knowing how to create solid data engineering pipelines, both in real time and in batches, following modern CI/CD practices.

We do not believe that model development is a one-time effort. Our approach aims to automate the entire life cycle of a machine-learning project: from tracking the initial data-science experiments, to managing features and models in production, to monitoring model drift and automatically or semi-automatically triggering the cycle again and again.
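
The drift-monitoring step that closes this loop can be sketched with the population stability index (PSI), a common way to compare a live feature distribution against its training-time reference. The bucketing scheme, the conventional 0.2 threshold, and the data below are generic illustrations, not project specifics.

```python
import math

# Hedged sketch: PSI-based drift check that decides whether to re-trigger
# the training pipeline. Data and threshold are illustrative.

def psi(reference, live, bins=4):
    """Population stability index between two samples over shared bins."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        return [(c or 0.5) / len(values) for c in counts]  # avoid log(0)

    return sum((p - q) * math.log(p / q)
               for p, q in zip(shares(live), shares(reference)))

def needs_retraining(reference, live, threshold=0.2):
    return psi(reference, live) > threshold

reference = [1, 2, 2, 3, 3, 3, 4, 4]   # training-time distribution
stable    = [1, 2, 3, 3, 4, 2, 3, 4]   # live data, same shape
shifted   = [4, 4, 4, 4, 4, 4, 3, 4]   # live data after drift
```

In an automated pipeline, a check like this runs on a schedule; crossing the threshold is what turns "monitoring" into "triggering the cycle again".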

Event-driven business systems

One thing we have learned over the years is that you cannot have solid data science results without understanding the engineering practices behind the development and management of highly scalable, data-intensive systems.

As a company, we operate three systems developed with modern software engineering practices: the open-loop ticketing system and the mobility analytics system, both successfully deployed in Sofia, as well as the waste management system, currently deployed in Sofia, Montana, Veliko Tarnovo, Albacete, Burgas, and Vidin.

Some of these systems are used by thousands of users every day and process financial information as well, so they are developed with production standards in mind. We care deeply about observability and the monitoring of performance data (metrics, logs, etc.) in order to understand the internal state of the complex systems we manage. Furthermore, we apply the CI/CD philosophy and DevOps principles to keep our deployment pipelines smooth and to bridge the gap between developing a system and managing its platform and infrastructure.

Our architecture philosophy is based on evolutionary principles. We aim to create the right architectural constraints through an appropriate selection of technologies and design patterns, and to minimise communication overhead by developing asynchronous event-driven systems, usually centred around an event-streaming platform such as Apache Kafka. The topology of our micro- and nano-services evolves naturally, driven by client and business requirements, which lets us escape the curse of waterfall planning.
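
The decoupling that event-driven design buys can be shown with a toy in-memory publish/subscribe bus. In our systems the broker role is played by Apache Kafka; the minimal stand-in below (with an invented ticketing event) only illustrates the pattern: producers emit events to a topic, and any number of services subscribe without knowing about each other.

```python
from collections import defaultdict

# Toy in-memory event bus standing in for a real broker such as Kafka.
# Topic name, event shape, and the two subscriber services are invented.

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
audit_log = []
balances = defaultdict(int)

def audit(event):
    audit_log.append(event["card"])

def debit(event):
    balances[event["card"]] -= event["fare"]

# Two independent services react to the same ticketing event.
bus.subscribe("ticket.validated", audit)
bus.subscribe("ticket.validated", debit)

bus.publish("ticket.validated", {"card": "c-1", "fare": 2})
bus.publish("ticket.validated", {"card": "c-1", "fare": 2})
```

Adding a third consumer — a dashboard, an analytics job — requires no change to the producer, which is precisely what lets a service topology evolve with the business instead of being planned up front.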

Over the years, we have gained deep expertise in modern cloud-native technologies such as Apache Kafka, Apache Cassandra, and Kubernetes, as well as data-pipelining tools such as Kafka Streams and Apache Airflow. We have experience running our systems not only in the cloud but also on in-house infrastructure, constructed and maintained by our team.