A few months ago I talked about extracting data from a MongoDB database for the purposes of generating alerts. Since then I have taken it further and now generate regular reports of my data using the power of Flow, Excel, and OneDrive. As this may be useful to others running NightScout I thought I would share my set up and the discoveries along the way.
First of all, I need to extract the data from the MongoDB and sent it to a target Excel sheet. To do this we use Flow.
I have set the recurrence to three hours. This strikes a balance between not running too often and blowing my Flow quota, and running sufficiently often to give timely results. At every three hours, we run approximately 240 times a month, which works well with our limit of 750 Flows per month.
The variable stores the latest DateTime value from our target Excel file.
To populate this variable, we query our target Excel and set the value.
In this screenshot we see that we return only one row from Excel, being the row with the highest DATE value. We then use this to set the variable.
Once we have this DateTime value we incorporate it into a modified version of the API call we used in the Alert blog.
For this call we bring back 100 entries from the MongoDB, a bunch of fields and order it so that if there are more than 100 rows available from the Latest Date from our target Excel, then only the rows immediately after this DateTime are returned. This ensures the query does not mess with the row order when it transfers them to Excel.
My continuous glucose monitor (CGM) feeds a value to the MongoDB every five minutes which means it generates 180/5 = 36 entries every three hours. Therefore 100 is a good setting to keep on top of the additional values generated in MongoDB but sufficiently large that it will be able to catch up if there is a temporary issue with the running of Flow.
Once the reply is parsed, we can populate our Excel with the new rows.
One point of note here is that the Flow step requires a Table within the Excel workbook. This is relatively easy to set up. Basically, you add your headers to the sheet, highlight them and select Format as Table from the Styles section of the Home tab.
The result looks something like this.
The DATE value is an integer representing the DateTime value but is a little difficult to read or transform so we also record the DATESTRING which is a little friendlier. Then we have the SGV value which is the blood glucose level in units only the USA use and finally we have the DELTA which is the change in SGV value between reads.
Once we have captured our data, we can begin reporting on it.
I discovered relatively quickly that Flow has a size limit for the Excel files it will work with. In the free plan this size limit is 5Mb, which makes it impractical for our purpose. Luckily I had a paid Flow plan via my Office subscription so I moved to this. This plan allowed me to work with Excel files up to 25Mb in size. This worked well. My Excel file has approximately four months of data in it and is 1.6Mb in size. Therefore, I have around five years of data to go before Flow reaches its limit. In five years either Microsoft will have removed this silly limit, I will be using a different technology to analyse my data or they will have found a cure for Type 1 Diabetes (there is a running joke in the diabetes community that the medical professionals have been promising a cure within five years for decades now).
The other trick I did to minimise the size of my target Excel was to house the reporting in a separate file and use a Power Query to reference back to the target file for the data. Using this Power Query, and some Excel formulae to manipulate the data to make it friendlier for reporting, I got this for my first worksheet.
If you struggle to replicate any of my formulae, please leave a comment and I will reply with the details.
The HbA1c is an indicator of how ‘sugary’ your blood has been for roughly the last four months. Using our CGM data we can make a prediction of what our HbA1c value is.
There are a few formulae available to do this calculation and in the above I use three of them. In the case of my blood results, the models predict 5.3, 5.1, and 5.1 which is well below the target threshold of 6.5 so well done me. I expect this value to slowly increase over time as my pancreas becomes less able to lower my blood sugar levels.
The Distance Report is something that can only really be generated using CGM data with a regular time interval between measurements (in our case every five minutes). The Distance Report shows the total ‘distance’ travelled by the blood values i.e. the sum of the absolute delta values and is an alternative measure to the standard deviation.
For this report we only have data for the last four months as this is how long I have been using a CGM. We can see that the distance travelled each month is roughly the same. As time goes on we would expect this to increase as the pancreas becomes weaker and blood glucose levels (BGLs) start to vary more.
This was the first report I created and reviews literally all my BGL measures (around 600 manual finger pricks and then the CGM data).
In the top left we have literally every value recorded and when it was recorded. The CGM data can be seen as the ‘thickening’ of the values towards the right hand side of this graph.
In the top right we have the distribution graph for the data showing the spread of results.
The bottom left shows all the data points but strips out the Date value, leaving only the Time value. This has the effect of showing the data over a 24 hour period.
Finally, in the bottom right, we have a range of filters to assist with analysing the data.
For example, if we compare the distribution curves for 2017:
we see that our distribution curves are centred around 5.4, 5.5, and 6.0 respectively. In other words it appears the curve is moving to the right over time. This is consistent with a weakening pancreas (or me being more relaxed about carbs).
The Range Report looks at the average and standard deviation of the data per hour, looking for where in the day the BGL values are highest and vary the most.
The graphs are relatively flat with a slight increase towards the end of the day. This is likely the result of dinner (generally the largest and most variable meal of the day and therefore the meal with the most impact on glucose levels) and late night snacking (which will never have a positive effect on BGLs). Again we have a filter, in this case a timeline, to help with our analysis.
The Distribution Report does a similar analysis as the Range Report but per month, rather than per hour.
The trendlines suggest the numbers are relatively flat (average BGL around 6 with a standard deviation of 1). It is expected both of these will increase over time and the BGL average and variability increase.
Displaying the Data to the Health Team
With the Excel files sitting in OneDrive, you simply right click the file to generate a link for sharing a read-only version for health care professionals. In my case I use bit.ly to also make it friendlier. While it is a little twitchy, it is reasonably friendly across various form factors and browsers.
Flow opens up a raft of opportunities for using my data whether it be alerts, analysis to maintain my health or making it readily available to my health care team. A few years ago this kind of set up would have taken weeks of coding, if it was possible at all. Today, it requires zero code and costs almost nothing. If this kind of set up could help you or someone you know, have a tinker, it really is straightforward to set up.