
Self-Tracking for Panic: Another Dataset

In this post, I perform the same analyses presented in my last post using data from my second panic tracking period. I then test whether my average alcohol and sugar consumption changed measurably between the two tracking periods.

During the second tracking period, I gathered data using qs-counters, a simple utility I built for reducing friction in the recording process.

The Usual Suspects #

Linear Regression #

During the second tracking period, alcohol consumption remained relatively constant:

Alcohol Consumption

Sugar consumption is a different story, with a pronounced negative trend:

Sugar Consumption

The evidence that my alcohol and sugar consumption are linked is also much stronger now:

Alcohol vs. Sugar Consumption

On the other hand, the previous-day alcohol effect seems to be non-existent:

Alcohol: Today vs. Yesterday
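For reference, trend lines like the ones above can be fit with scipy.stats.linregress. Here's a minimal sketch; the daily counts are hypothetical stand-ins for my real data:

```python
# Minimal sketch: fitting a linear trend to daily consumption counts.
import numpy as np
from scipy import stats

days = np.arange(14)                                           # day index
sugar = np.array([4, 3, 5, 3, 2, 3, 2, 2, 1, 2, 1, 1, 0, 1])  # hypothetical counts

slope, intercept, r, p, stderr = stats.linregress(days, sugar)
# A negative slope with a small p-value suggests a downward trend.
print('slope = {:.3f} units/day, r = {:.2f}, p = {:.4f}'.format(slope, r, p))
```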

Fast Fourier Transform #

With more data points, the FFT frequency amplitude plot is more muddled:

FFT Frequencies

The 2-day and 7-day effects previously "discovered" are nowhere to be found.
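For reference, a frequency-amplitude plot like this one can be produced along these lines. This is a sketch only; the daily counts are hypothetical:

```python
# Minimal sketch: frequency amplitudes of a daily time series via the FFT.
import numpy as np

daily_counts = np.array([3, 2, 4, 0, 5, 3, 2, 4, 1, 3, 2, 0, 4, 2])  # hypothetical

amplitudes = np.abs(np.fft.rfft(daily_counts))
freqs = np.fft.rfftfreq(len(daily_counts), d=1.0)  # cycles per day

# Skip the zero-frequency (DC) component, then report each period's amplitude.
for f, a in zip(freqs[1:], amplitudes[1:]):
    print('period {:5.1f} days: amplitude {:.2f}'.format(1.0 / f, a))
```

A strong 7-day effect, for instance, would show up as a spike near the 7-day period.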

Maximum Entropy Modelling #

I didn't record panic attacks during this tracking period: my previous efforts reduced the severity and frequency of these attacks so drastically that the data here would have been extremely sparse.

In the absence of that data, I asked a different question:

What features best predict my exercise patterns?

Here are the top features from MaxentClassifier:

 3.369 caffeine==True and label is 'no-exercise'
-0.739 sweets==True and label is 'exercise'
 0.399 sweets==True and label is 'no-exercise'
-0.201 alcohol==True and label is 'exercise'
 0.166 alcohol==True and label is 'no-exercise'
 0.161 relaxation==True and label is 'exercise'
-0.092 relaxation==True and label is 'no-exercise'

The caffeine finding is misleading: I recorded non-zero caffeine consumption on only two days, and one of those entries was a data-entry mistake. (Side note to self: all tools should include an undo feature!) Aside from that, sugar consumption appears to have the strongest negative effect on exercise.
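For the curious, here's a minimal sketch of how a classifier like this might be trained with NLTK's MaxentClassifier; the day records below are hypothetical stand-ins for my tracked data:

```python
# Minimal sketch: training a maximum entropy classifier on daily feature dicts.
from nltk.classify import MaxentClassifier

days = [
    # (features observed that day, exercise label for that day) -- hypothetical
    ({'caffeine': False, 'sweets': True,  'alcohol': False, 'relaxation': True},  'exercise'),
    ({'caffeine': True,  'sweets': True,  'alcohol': True,  'relaxation': False}, 'no-exercise'),
    ({'caffeine': False, 'sweets': False, 'alcohol': True,  'relaxation': True},  'exercise'),
    # ...one entry per tracked day...
]

classifier = MaxentClassifier.train(days, algorithm='iis', max_iter=20)
classifier.show_most_informative_features(10)  # prints weighted features as above
```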

Student's t-test #

What? #

Student's t-test answers this question:

Are these samples significantly different?

More formally, the t-test answers a statistical question about normal distributions: given $ X \sim \mathcal{N}(\mu_X, \sigma_X^2) $ and $ Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2) $, does $ \mu_X = \mu_Y $?
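Concretely, the two-sample statistic compares the difference in sample means against its estimated standard error. In the standard equal-variance form,

$$ t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}}, \qquad s_p^2 = \frac{(n_X - 1) s_X^2 + (n_Y - 1) s_Y^2}{n_X + n_Y - 2}, $$

and under the null hypothesis $ t $ follows a Student's t-distribution with $ n_X + n_Y - 2 $ degrees of freedom.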

If we let $ Y $ be a known normal distribution centered at some fixed mean $ \mu_Y $ rather than taking it from an empirical sample, we also obtain a one-sample t-test for the null hypothesis $ \mu_X = \mu_Y $.
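scipy.stats exposes this variant as ttest_1samp. A minimal sketch, with hypothetical numbers:

```python
# Minimal sketch: one-sample t-test against a fixed target mean.
from scipy import stats

daily_counts = [3, 2, 4, 1, 3, 5, 2, 3]   # hypothetical sample
target_mean = 2.0                          # hypothetical known mean

t, p = stats.ttest_1samp(daily_counts, target_mean)
print('(t, p) = ({:.4f}, {:.4f})'.format(t, p))
```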

Why? #

In a self-tracking context, you might ask the following questions:

Did my average behavior change measurably between two tracking periods?

Is my average behavior significantly different from some known target value?

Student's t-test can help address both questions.

The Data #

Before using Student's t-test on my alcohol and sugar consumption data from the two tracking periods, I check whether these samples have a roughly normal distribution. The code for normality checking is here.

It helps to visualize the histogram data first:

Alcohol Histogram (recovery-journal)
Alcohol Histogram (qs-counters)
Sugar Histogram (recovery-journal)
Sugar Histogram (qs-counters)

These don't look particularly close to normal distributions, but it's hard to tell with discrete-valued data. For more evidence, I use the Shapiro-Wilk statistical normality test:

alcohol, recovery-journal: (0.944088339805603, 0.10709714889526367)
alcohol, qs-counters:      (0.8849299550056458, 4.6033787270971516e-07)
sugar, recovery-journal:   (0.722859263420105, 2.5730114430189133e-06)
sugar, qs-counters:        (0.8092769384384155, 8.38931979441071e-10)
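For reference, output like this comes from scipy.stats.shapiro. A minimal sketch, with hypothetical counts standing in for my real samples:

```python
# Minimal sketch: Shapiro-Wilk normality test on each sample.
from scipy import stats

samples = {
    'alcohol, recovery-journal': [3, 2, 4, 0, 5, 3, 2, 4, 1, 3],  # hypothetical
    'alcohol, qs-counters':      [2, 1, 3, 0, 2, 4, 1, 2, 3, 2],  # hypothetical
}

for name, values in sorted(samples.items()):
    w, p = stats.shapiro(values)
    # A low p-value means we reject the null hypothesis of normality.
    print('{}: ({}, {})'.format(name, w, p))
```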

The null hypothesis for Shapiro-Wilk is that the sample is normally distributed, so these low p-values indicate the opposite: my data isn't normally distributed! Bad news for my attempt to use Student's t-test here.

Nevertheless, I'll barge ahead and run the t-test anyway, just to see what that process looks like with scipy.stats. The code for t-testing is here.

alcohol
==========
avg(A) = 3.26
avg(B) = 2.35
(t, p) = (2.0721, 0.0469)

sweets
==========
avg(A) = 1.19
avg(B) = 1.23
(t, p) = (-0.1969, 0.8453)
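For reference, results like these come from scipy.stats.ttest_ind. A minimal sketch, with hypothetical counts for the two tracking periods:

```python
# Minimal sketch: two-sample t-test between tracking periods.
from scipy import stats

A = [3, 5, 2, 4, 3, 1, 6, 4]   # hypothetical recovery-journal daily counts
B = [2, 1, 3, 2, 0, 4, 2, 3]   # hypothetical qs-counters daily counts

t, p = stats.ttest_ind(A, B)   # assumes equal variances by default
print('avg(A) = {:.2f}'.format(sum(A) / float(len(A))))
print('avg(B) = {:.2f}'.format(sum(B) / float(len(B))))
print('(t, p) = ({:.4f}, {:.4f})'.format(t, p))
```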

If the t-test were valid for this data, it would show that my alcohol consumption was significantly lower during the second tracking period. Given the large drop in average consumption, I'm willing to say that's a reasonable assertion.

A Question Of Motivation #

By this point, you might be asking:

Why did I even bother with all this analysis when I have so few data points?

Good question! The short answer? It's a learning opportunity. The longer answer is backed by a chain of reasoning:

As it turns out, data analysis is hard, period. Picking the right tools is difficult, and picking the wrong ones (like the t-test above!) can easily produce results that appear to be meaningful but are not. In a self-tracking scenario, this problem is often made worse by smaller datasets and uncontrolled conditions.

Thought Experiments #

Repeat Yourself: A Reflection On Self-Tracking And Science #

One criticism often leveled at the Quantified Self community is that self-tracking is not scientific enough. For an interesting discussion of the merits and drawbacks of presenting self-experimentation as science, I highly recommend the Open Peer Commentary section of this paper. Some of the broader themes in this debate are also summarized here on the Quantified Self website.

To be fair, there are a host of valid concerns here. For starters, it's very difficult to impose a controlled environment when self-tracking. In a Heisenbergian twist, being mindful of your behavior could modify the behavior you're trying to measure; this effect is discussed briefly by Simon Moore and Joselyn Sellen.

Additionally, a sample size of one is meaningless. Will your approaches work for others? Did you gather the data in a consistent manner? Are your sensors working properly? The usual antidote is to increase the sample size, but that brings another set of problems. Are all participants using the same sensors in the same way? Are they all running the same analyses?

From watching several presentations about self-tracking, I've noticed a curious pattern:

Like any other habit, the tracking habit is hard to maintain.

As a corollary, many tracking experiments consist of multiple tracking periods punctuated by lapses in the tracking habit.

Many people interpret these lapses as failures, but they're actually amazing scientific opportunities! These are chances to re-run the same experiment, verifying or confounding results from your earlier tracking periods.

The Predictive Modelling Game #

Predictive modelling could be an interesting component of a habit modification system. Suppose I want to exercise more regularly. First, I select several features that seem likely to influence my exercise patterns, such as:

- caffeine consumption
- sugar consumption
- alcohol consumption
- relaxation exercises

Next, I gather some baseline data by tracking these features along with my exercise patterns. I then use that data to train a classifier. Finally, I keep tracking the features, ask the classifier to predict my exercise activity, and play a simple game with myself:

Can I beat the classifier?

That is, can I exercise more often than my existing behavior patterns suggest I should?
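As a sketch of how this game might look in code (everything here, from the feature names to the win condition, is hypothetical):

```python
# Hypothetical sketch of the "beat the classifier" game.
from nltk.classify import MaxentClassifier

# Baseline phase: (daily features, observed exercise label) pairs.
baseline = [
    ({'caffeine': True,  'sweets': False, 'alcohol': False, 'relaxation': True},  'exercise'),
    ({'caffeine': False, 'sweets': True,  'alcohol': True,  'relaxation': False}, 'no-exercise'),
    # ...more tracked days...
]
classifier = MaxentClassifier.train(baseline, algorithm='iis', max_iter=20)

# Game phase: each day, the classifier predicts from today's features.
today = {'caffeine': False, 'sweets': True, 'alcohol': False, 'relaxation': False}
prediction = classifier.classify(today)

# I "win" the round by exercising on a day the model predicts I won't.
if prediction == 'no-exercise':
    print('The model bets against me today. Time to prove it wrong!')
```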