An article written in collaboration with:
Pierre Souchay, Chief Technology Officer
Ana Pachon, Marketing and Communications Manager
Climate change is one of the most significant challenges the world is facing today, and understanding its impacts is critical to developing effective strategies to address its effects. Parametric insurance and climate adaptation consulting products have emerged as valuable tools to assess climate risks. However, managing the vast amount of data required for this type of service can be challenging. Issues like scheduling and updating data, maintaining security and data separation, and handling complex computations are critical to the success of each climate risk assessment. In this article, we will discuss how these challenges can be addressed.
The origin: parametric insurance
Parametric insurance is an innovative form of insurance that utilizes predefined parameters to determine the amount of reimbursement, rather than relying on the expertise of an insurance adjuster. These parameters, such as frost temperature, precipitation levels, or cyclonic risks, are established in advance and provide objective and quantifiable data to determine the amount of compensation.
As an example, a winemaker who experiences temperatures below 0°C for two consecutive days may receive a €30,000 payout, while a temperature of less than -5°C over a week may result in a €50,000 payout. This type of insurance offers more objective contract terms and a streamlined claim process. The use of weather stations, satellites, and scientific surveys to gather data makes parametric insurance particularly well-suited to address the complexities of climate change.
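The winemaker example can be sketched as a tiny payout function. The thresholds and amounts come from the example above; the function name and the input format (a list of daily minimum temperatures in °C) are illustrative, not an actual contract implementation:

```python
def frost_payout(daily_min_temps):
    """Illustrative parametric trigger: the payout is determined purely by
    predefined temperature thresholds, not by an adjuster's assessment."""
    # €50,000 if the daily minimum stays below -5°C for a full week
    if len(daily_min_temps) >= 7 and all(t < -5 for t in daily_min_temps[-7:]):
        return 50_000
    # €30,000 if the daily minimum stays below 0°C for two consecutive days
    if len(daily_min_temps) >= 2 and all(t < 0 for t in daily_min_temps[-2:]):
        return 30_000
    return 0

print(frost_payout([-1, -2]))   # two consecutive frost days
print(frost_payout([-6] * 7))   # a full week below -5°C
print(frost_payout([3, 1]))     # no trigger
```

Because the trigger is a pure function of observed parameters, both parties can verify a claim from the same weather data.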
As we look to the future, we can expect events to become increasingly complex and challenging to predict. Although some simulation models are available, they carry a margin of error and some degree of uncertainty, even when considering just a one-year horizon. This is why we must explore multiple scenarios using different models in order to better prepare for what may come.
AXA Climate has developed specific tools which address individual risks such as frost, hail, cyclones, and droughts. We collected data from a variety of public providers such as ERA5 or CHIRPS, and then created a predefined list of dashboards, graphs, and statistics to help underwriters explore the richness of this carefully curated data.
The next step: understanding cause and effect
As the scope and complexity of climate risks have grown, there has been a growing need for more comprehensive risk assessments and consulting services. These services give companies better tools to understand the causes and effects of climate change, enabling them to take a more proactive approach to managing those risks.
By conducting climate risk assessments, companies can gain a deeper understanding of the potential risks they face and develop effective strategies to mitigate those risks and adapt their business models.
Thus, we continued to create more data tools capable of conducting or assisting this kind of service and soon realized the growing need for a centralized data store—a single location where all the data could be stored and easily reused across insurance or consulting projects.
The challenges of working with scientific and historical data
Working with the scientific and historical data needed for those projects required a lot of expertise in different file formats and mathematical concepts. To overcome this challenge, we collaborated with data scientists to develop pipelines that fed directly into our data store. Each new pipeline was like another book added to our shared library, further expanding our ability to leverage the data we had collected.
Scheduling and updating
Efficiently maintaining public data is critical to accurately predict climate change impacts, but managing large, frequently-updated datasets can be a significant challenge. Automation is therefore essential to achieve predictable scientific results and ensure data accuracy.
For example, some providers publish daily worldwide temperature and precipitation data, which would be impractical to retrieve manually every day. To streamline the process, we established automated data pipelines that update on a regular basis, whether daily, monthly, or yearly.
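A minimal sketch of such a schedule is a registry that maps each dataset to an update cadence and reports which pipelines are due. The pipeline names and cadences below are hypothetical, not our actual configuration:

```python
from datetime import date, timedelta

# Hypothetical pipeline registry: each public dataset has an update cadence.
PIPELINES = {
    "era5_temperature": {"cadence_days": 1},       # daily worldwide temperatures
    "chirps_precipitation": {"cadence_days": 30},  # roughly monthly precipitation
}

def due_pipelines(last_run, today):
    """Return the pipelines whose cadence has elapsed since their last run."""
    return [
        name for name, cfg in PIPELINES.items()
        if today - last_run[name] >= timedelta(days=cfg["cadence_days"])
    ]

last_run = {"era5_temperature": date(2023, 1, 1),
            "chirps_precipitation": date(2023, 1, 1)}
print(due_pipelines(last_run, date(2023, 1, 2)))
```

In practice a workflow orchestrator handles this, but the principle is the same: the schedule is data, not something a person remembers to do.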
However, this process is not always straightforward. Irregular updates and data errors can create significant hurdles that require careful management. To overcome these challenges, we established strict error-handling protocols and implemented rigorous security measures to ensure that all data is properly updated and secure.
By prioritizing automation and data management best practices, we can efficiently manage and analyze large datasets, enabling us to develop more accurate predictions of climate change impacts.
Security and data separation
Data engineers face different security concerns depending on the type of data they are working with. Public external data typically doesn’t pose a significant security risk, whereas customer data requires strict security measures to protect sensitive information from potential breaches.
We thus ensure perfect isolation between customers, both in transit and at rest. This means that we carefully segregate customer data to prevent any mixing or unintentional exposure.
While public indicators can be combined with no security concerns, we take every precaution to avoid mixing customer data for security reasons. By maintaining strict data separation and security protocols, we can ensure the confidentiality and integrity of customer data.
Think Big, Start Small
We also faced the challenge of handling an immense amount of computations efficiently. Our top priority was speed, but even more crucial was the ability to scale effortlessly. With worldwide data, the sheer volume of information can be overwhelming, requiring substantial computing memory and resources.
To prepare for future scaling needs, we implemented simple yet effective optimizations. For instance, we grouped our customers’ locations based on geography, which helped minimize the amount of data generated, resulting in reduced costs and optimized computations.
In one particular scenario where we worked with satellite imagery and two points of interest, we identified the minimum data requirements for our computations. By calculating a bounding box that covered both points, we were able to exclude unnecessary data like the CDG airport tile.
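The bounding-box idea can be sketched in a few lines: compute the smallest latitude/longitude box covering all points of interest, and fetch only the satellite tiles intersecting it. The coordinates and margin below are illustrative:

```python
def bounding_box(points, margin=0.0):
    """Smallest lat/lon box covering all points of interest; only satellite
    tiles intersecting this box need to be downloaded and processed."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) - margin, min(lons) - margin,
            max(lats) + margin, max(lons) + margin)

# Two hypothetical points of interest near Paris
points = [(48.85, 2.35), (48.80, 2.13)]
print(bounding_box(points, margin=0.01))
```

Any tile outside the box, such as an airport far from both points, is simply never loaded, which cuts both storage and compute costs.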
Introducing Cocoon: graphs to solve challenges
To tackle the scheduling, security, and global-scale challenges, we called graph theory to the rescue in a product named Cocoon, which handles all of our geospatial computations. Graphs allow for a centralized approach that solves all these issues at once.
Risk is computed from data, and data does not come from nowhere
Computing risk for our customers is about collecting data and applying various enhancement processes. With Cocoon, we can describe and represent how data is processed, and how the various indicators our customers are interested in (diamonds in the schema below) are built with a kind of genealogical tree.
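One minimal way to represent such a genealogical tree is a mapping from each dataset or indicator to the datasets it is computed from. The node names below are illustrative, not Cocoon's actual schema:

```python
# A minimal data lineage graph: each node is a dataset or indicator,
# and each edge means "is computed from". Names are illustrative.
LINEAGE = {
    "frost_indicator": ["daily_min_temperature"],
    "drought_indicator": ["precipitation", "soil_moisture"],
    "daily_min_temperature": ["era5_raw"],
    "precipitation": ["chirps_raw"],
    "soil_moisture": ["era5_raw"],
}

def ancestry(node, graph=LINEAGE):
    """Walk the genealogical tree of an indicator back to its raw sources."""
    result = []
    for parent in graph.get(node, []):
        result.append(parent)
        result.extend(ancestry(parent, graph))
    return result

print(ancestry("drought_indicator"))
```

With this representation, the full provenance of any customer-facing indicator can be answered by a simple graph traversal.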
The world is changing as we see it, let us deal with updates in data
With such a graph, a change in a dataset automatically updates the results that depend on it. Each change or update propagates through the graph, so the data used by the business is always kept up to date.
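Propagation amounts to inverting the lineage: given a changed dataset, find everything downstream of it and mark it for recomputation. A sketch, with illustrative node names:

```python
# Lineage graph: each node maps to the datasets it is computed from.
LINEAGE = {
    "frost_indicator": ["daily_min_temperature"],
    "daily_min_temperature": ["era5_raw"],
    "drought_indicator": ["soil_moisture"],
    "soil_moisture": ["era5_raw"],
}

def downstream(changed, graph=LINEAGE):
    """Everything that must be recomputed when `changed` is updated."""
    dirty, frontier = set(), {changed}
    while frontier:
        # All nodes that consume something in the current frontier
        affected = {node for node, inputs in graph.items()
                    if frontier & set(inputs)}
        frontier = affected - dirty
        dirty |= affected
    return dirty

print(sorted(downstream("era5_raw")))
```

Only the affected subgraph is recomputed; indicators that do not depend on the changed dataset are left untouched.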
Processing data in a secure way
One may wonder where and how our data engineering processing code enters into play within this graph representation.
At each level are programs, written in any programming language, that we internally call "compute units". These are very small, isolated pieces of code, each dedicated to a small and very specific task.
As we can see, each one of these compute units is reading data and writing new data.
From a security and automation perspective, however, this can look tricky. We do not want the code of compute unit 1, which is supposed to read data 1 and write data 2, to be able to read or write data 3.
This is where the graph comes to the rescue again. Since we know the exact relationship between each program and its data through declared inputs and outputs, we can limit each program's read and write permissions to the strict minimum.
Automating this becomes easy too. Nowadays, any Infrastructure as Code tool (we use Terraform) should be capable of reading this graph representation and applying the strictly necessary read and write access to each program in a fully automated fashion.
As a bonus, developers or data scientists do not need to worry about security and automation. They can focus on what matters most to them, which is producing code that adds value to the business.
Infrastructure becomes a detail, which was also a key point in our decision to adopt this graph-oriented solution when we started Cocoon.
This first article explains the foundations of how and why we built the AXA Climate data platform on a graph-oriented strategy.
With this first high-level article, we aimed to lay the groundwork for more technical and thorough articles to come in the following months.
We wish to say more about the architecture choices we made, about infrastructure, and about some of the many challenges we faced over the last few years as we developed Cocoon into the AXA Climate data platform.