Using Dataiku to Understand Weather Across Different Parts of Singapore

Use Cases & Projects, Dataiku Product Hui Xiang Chua

What? Yes, you read that right! Singapore is such a tiny island, you may wonder just how big the differences in weather can get. Join us to find out! Given the recent “prolonged heavy rain” in August, with multiple locations in Singapore encountering flash floods, we are also curious to find out how the weather has changed across time for different parts of the city. 

In this analysis, we will use Dataiku to find the answers to three questions:

  1. Do different parts of Singapore experience different temperatures?
  2. Do different parts of Singapore experience different amounts of rainfall?
  3. Are rainfall and wind speed correlated with temperatures?

Historical daily records from Jan. 1, 2019 to July 31, 2021 were gathered from Meteorological Service Singapore’s website for five randomly selected locations in different parts of Singapore: Ang Mo Kio, Changi, Clementi, Jurong West, and Newton (Figure 1). The following metrics were available:

  • Daily Rainfall Total (mm)
  • Highest 30-Min Rainfall (mm)
  • Highest 60-Min Rainfall (mm)
  • Highest 120-Min Rainfall (mm)
  • Mean Temperature (°C)
  • Maximum Temperature (°C)
  • Minimum Temperature (°C)
  • Mean Wind Speed (km/h)
  • Max Wind Speed (km/h)

Figure 1. The five randomly selected locations in different parts of Singapore.

For our first question, we see that Jurong West typically records lower temperatures than the other locations, while Ang Mo Kio and Changi typically record higher temperatures (Figure 2). It is interesting to note that the difference in average monthly temperatures between locations can be as great as 1.0°C (i.e., a difference of 1.8°F)!

Figure 2. Average of mean daily temperatures for different locations by month between January 2019 and July 2021. Note: The y-axis does not start from zero to emphasize the differences as small differences are also critical.
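For readers who prefer code to charts, the aggregation behind a chart like Figure 2 can be sketched in a few lines of Python. This is only an illustration, not the chart configuration used in the project: the function name, the record layout, and the sample readings are all assumptions, and the readings are made-up placeholders rather than actual data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean_temperature(records):
    """Average the daily mean temperature per (location, year, month).

    `records` is an iterable of (location, day, mean_temp_c) tuples.
    This layout is an assumption made for the sketch.
    """
    buckets = defaultdict(list)
    for location, day, mean_temp_c in records:
        if mean_temp_c is None:  # skip days without a reading
            continue
        buckets[(location, day.year, day.month)].append(mean_temp_c)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up placeholder readings, NOT actual Meteorological Service Singapore data.
sample = [
    ("Changi", date(2019, 1, 1), 27.0),
    ("Changi", date(2019, 1, 2), 28.0),
    ("Jurong West", date(2019, 1, 1), 26.0),
    ("Jurong West", date(2019, 1, 2), None),
    ("Changi", date(2019, 2, 1), 28.5),
]
averages = monthly_mean_temperature(sample)
```

Grouping by year and month (rather than month alone) keeps January 2019 separate from January 2020, which matches a chart that runs month by month across the whole period.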

For our second question, we discovered that monthly average rainfall in the different locations is rather erratic. Changi experienced the lowest amount of rainfall on average for the majority of the time, but in two months it suddenly had more rainfall than the other locations.

The same is true of Ang Mo Kio, which recorded the highest amount of rainfall among the locations in two months of the analyzed period. However, if we focus on yearly averages, Changi does see the lowest amount of rainfall and Jurong West the highest. Also, rainfall increases from year to year for all locations, and the gap between locations narrows (Figure 3).

Figure 3. Average of total daily rainfall for different locations for each year between January 2019 and July 2021. 

With the available data on rainfall and wind speed, we wanted to learn whether these variables are closely correlated with temperature. The correlation matrix below (Figure 4) shows that they are not strongly correlated. This may help explain why we still feel warm in Singapore even when it has rained that day: thanks to the humidity!

Figure 4. Correlation matrix for the various metrics
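To make the idea of a correlation matrix concrete, here is a minimal sketch that computes pairwise Pearson coefficients in plain Python. The article does not state which coefficient was used, so Pearson is an assumption here; the function names, column names, and sample values are also hypothetical placeholders, not the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up placeholder values, NOT actual measurements.
columns = {
    "daily_rainfall_mm": [0.0, 5.0, 10.0, 20.0],
    "mean_temperature_c": [29.0, 28.0, 27.5, 26.0],
    "mean_wind_speed_kmh": [8.0, 12.0, 7.0, 10.0],
}
matrix = correlation_matrix(columns)
```

A coefficient close to 0 means little linear relationship, while values near 1 or -1 indicate a strong one, which is how a matrix like Figure 4 is read.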

And of course, climate change is real. Let’s do our part for sustainability efforts!

For those who are interested in the technical aspect of the project in Dataiku, here are some details: 

1. We made use of the Jupyter notebook interface for scraping the data and saving the datasets into Dataiku (Figure 5). Each dataset contains daily data for a month of the year for a particular location.

Figure 5. Python code for scraping the data off Meteorological Service Singapore’s website
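Since Figure 5 is an image, here is a rough sketch of how one might enumerate the monthly files to fetch, one per location per month. It is not the notebook code from the project: the URL template and the station codes below are hypothetical placeholders (the real download path is not given in this post), and the actual download step is left out.

```python
from datetime import date

# Hypothetical URL template and station codes: placeholders for illustration
# only, not the real paths or identifiers on the weather service's website.
URL_TEMPLATE = "https://example.invalid/dailydata/{station}_{year}{month:02d}.csv"
STATIONS = {
    "Ang Mo Kio": "STATION_A",
    "Changi": "STATION_B",
    "Clementi": "STATION_C",
    "Jurong West": "STATION_D",
    "Newton": "STATION_E",
}


def months_between(start, end):
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def build_download_plan(stations, start, end):
    """List (location, year, month, url) for every location and month."""
    return [
        (name, year, month, URL_TEMPLATE.format(station=code, year=year, month=month))
        for name, code in stations.items()
        for year, month in months_between(start, end)
    ]


plan = build_download_plan(STATIONS, date(2019, 1, 1), date(2021, 7, 31))
```

With five locations and 31 months between January 2019 and July 2021, the plan holds 155 entries, which corresponds to one dataset per location per month as described above.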

Here’s an example of a dataset (Figure 6).

Figure 6. Scraped data for Changi for the month of January 2019

2. Next, we used visual recipes: Stack to append the datasets for the same location, Join to merge the different locations together, and Prepare for the data cleaning (Figure 7). For the analysis itself, we used the statistics and charts features within the datasets.

Figure 7. Data cleaning and preparation using the Prepare recipe.
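The Stack and Join steps are visual recipes in Dataiku, so no code is needed for them. For readers who want to see the underlying idea, the sketch below mimics both operations in plain Python. It is a conceptual illustration only: the function names, the row layout, the sample values, and the choice of an inner join on the date are all assumptions, not the recipe settings used in the project.

```python
def stack(datasets):
    """Append datasets that share the same columns, one after another."""
    return [row for dataset in datasets for row in dataset]


def join_on_date(per_location):
    """Merge per-location rows into one wide row per date.

    Keeps only dates present for every location (an inner join, which is an
    assumption of this sketch) and prefixes each metric with the location name.
    Expects at least one location.
    """
    date_sets = [{row["date"] for row in rows} for rows in per_location.values()]
    common_dates = set.intersection(*date_sets)
    joined = {day: {"date": day} for day in common_dates}
    for location, rows in per_location.items():
        for row in rows:
            if row["date"] not in joined:
                continue
            for column, value in row.items():
                if column != "date":
                    joined[row["date"]][f"{location}_{column}"] = value
    return [joined[day] for day in sorted(joined)]


# Made-up placeholder rows, NOT actual measurements.
changi_jan = [{"date": "2019-01-31", "rain_mm": 0.0}]
changi_feb = [{"date": "2019-02-01", "rain_mm": 1.0}]
clementi_feb = [{"date": "2019-02-01", "rain_mm": 2.0}]

changi = stack([changi_jan, changi_feb])
wide = join_on_date({"Changi": changi, "Clementi": clementi_feb})
```

Stacking grows the table downward (more rows, same columns), while joining grows it sideways (same dates, more columns), which is why the two recipes are applied in that order.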

3. We also tried predicting whether it will rain the next day using in-built machine learning algorithms (Figure 8), as well as forecasting using the time series plugin. This shows how different user profiles can work together: data engineers in charge of data pipelines and data analysts and data scientists in charge of analysis can collaborate at the same time in one platform.

Figure 8. Predicting whether it will rain or not on a particular day using in-built machine learning algorithms.
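Before a model can predict next-day rain, each day needs a label saying whether it rained on the following day. The sketch below shows one way that label could be built; it is not taken from the project. The function name, the `rain_mm` and `rain_tomorrow` column names, the 0 mm threshold, and the sample rows are all assumptions.

```python
from datetime import date, timedelta


def add_rain_tomorrow(rows, threshold_mm=0.0):
    """Label each day with whether the following calendar day had rain.

    `rows` are dicts for a single location with a `date` (datetime.date) and
    `rain_mm`. Days whose next calendar day is missing get no label and are
    dropped, as is the final day in the record.
    """
    ordered = sorted(rows, key=lambda row: row["date"])
    labeled = []
    for today, following in zip(ordered, ordered[1:]):
        if following["date"] - today["date"] != timedelta(days=1):
            continue  # gap in the record: tomorrow's rainfall is unknown
        labeled.append({**today, "rain_tomorrow": following["rain_mm"] > threshold_mm})
    return labeled


# Made-up placeholder rows, NOT actual measurements.
rows = [
    {"date": date(2019, 1, 1), "rain_mm": 0.0},
    {"date": date(2019, 1, 2), "rain_mm": 3.2},
    {"date": date(2019, 1, 3), "rain_mm": 0.0},
    {"date": date(2019, 1, 5), "rain_mm": 1.0},  # Jan. 4 is missing
]
labeled = add_rain_tomorrow(rows)
```

The label must come from the next day's rainfall while the features come from the current day, otherwise the model would be given the answer it is supposed to predict.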

Special thanks to Matthieu Scordia for his support and guidance on this analysis.
