Intro

I’ve been itching to try some of the GPTs available from OpenAI, and Data Analyst GPT caught my attention. It’s right up my alley, since I’ve done data science and machine learning work for most of my career. To give it a fair test, I wanted to use a real-life dataset: hourly readings from our Radon Eye radon detector. It’s been monitoring the radon level in our home for over 9 months. I’ve previously done some of my own stats and charting of this data, so I’m familiar with it.

The Collab below is a very lightly edited transcript of my dialog with Data Analyst GPT (“Dag” for short), complete with the charts she produced. If you’ve got a dataset burning a hole in your pocket, and especially if you’re not a data scientist but want to gain some insight quickly, buckle up. You might be surprised at how capable Dag really is. And if you want to follow along, pick up the radon data from the GitHub repo for this article.

To keep things readable, I’ve formatted the conversation below using avatars, block quotes and horizontal separators. The blue avatar is me (Tim), and the violet avatar is Dag. Each block quote (vertical sidebar) brackets the dialog for each writer: prompts by me followed by responses from Dag. Horizontal lines serve to separate each prompt-response pair. Finally, here and there you’ll see my blue avatar pop up with a comment that wasn’t part of a prompt. You’ll figure it all out once you start reading. Let’s get rolling and learn how the radon levels in our home have changed over time.

Collab between Tim and Data Analyst GPT

Tim: Attached is a dataset captured from a little instrument that measures radon in our house. Can you give me an analysis of it?

Dag:

Radon Concentration Over Time

Here’s an overview of your radon data collected from the “Radon Eye” instrument:

Summary Statistics:

  • Total Data Points (Hours): 6,809
  • Mean Radon Level: ~2.52 pCi/L
  • Minimum: 0.00 pCi/L
  • Maximum: 10.43 pCi/L
  • 25th Percentile (Q1): 1.68 pCi/L
  • Median (Q2): 2.32 pCi/L
  • 75th Percentile (Q3): 3.19 pCi/L
  • Standard Deviation: ~1.26 pCi/L

Key Observations:

  • The radon levels are well below the U.S. EPA action level of 4.0 pCi/L most of the time.
  • Spikes above 4.0 pCi/L are visible and may warrant ventilation or further investigation.
  • Distribution is right-skewed — while many values are low, some higher outliers pull the average up.

Would you like a deeper dive into:

  • Time periods with levels above the EPA action level?
  • Seasonal or daily trends?
  • Histograms or rolling averages?

Let me know how detailed you’d like to go.

Tim note: There’s a lot of variability here. Makes me wonder if those one-shot, 72-hour home tests really tell you anything useful.
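If you want to check numbers like these yourself, here is a minimal sketch using only Python's standard library. The `readings` list is a made-up stand-in for the hourly pCi/L values (not the actual Radon Eye data), and `summarize` is a hypothetical helper name. It also reports the share of readings above the 4.0 pCi/L action level, which puts a number on "most of the time".

```python
import statistics

EPA_ACTION_LEVEL = 4.0  # pCi/L, the threshold cited in Dag's summary


def summarize(readings):
    """Return basic descriptive statistics for a list of hourly readings."""
    # Quartiles with linear interpolation between the observed values.
    q1, median, q3 = statistics.quantiles(readings, n=4, method="inclusive")
    above = sum(1 for r in readings if r > EPA_ACTION_LEVEL)
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "min": min(readings),
        "max": max(readings),
        "q1": q1,
        "median": median,
        "q3": q3,
        "stdev": statistics.stdev(readings),  # sample standard deviation
        "share_above_action_level": above / len(readings),
    }


# Made-up sample values, for illustration only.
readings = [1.2, 1.8, 2.3, 2.9, 3.4, 4.6, 2.1, 1.5]
stats = summarize(readings)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Different tools use slightly different quartile conventions, so small differences from Dag's Q1 and Q3 would not be surprising.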



Tim: Please show me some histograms and rolling averages.

Dag:

Histogram of Radon Levels

Here are the visual analyses you requested:

1. Histogram of Radon Levels

  • Most radon measurements fall between 1.0 and 4.0 pCi/L, indicating generally safe indoor air conditions.
  • A few outliers go beyond 6.0 pCi/L, showing occasional elevated levels that might require further ventilation.

Tim note: I omitted the rolling averages chart because it was impossible to see the averages against the large number of data points.
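A histogram is just a count of readings per bin, so it can be sketched without any plotting library. The values below are made up for illustration, and `bin_counts` and the 1.0 pCi/L bin width are assumptions of this sketch, not what Dag used.

```python
from collections import Counter


def bin_counts(readings, bin_width=1.0):
    """Count readings per bin; each key is the lower edge of a bin."""
    counts = Counter(int(r // bin_width) * bin_width for r in readings)
    return dict(sorted(counts.items()))


# Made-up sample values, for illustration only.
readings = [0.4, 1.2, 1.8, 2.3, 2.9, 2.5, 3.4, 4.6, 6.1]
bin_width = 1.0

# Print a rough text histogram; bins with no readings are simply skipped.
for lower_edge, count in bin_counts(readings, bin_width).items():
    upper_edge = lower_edge + bin_width
    print(f"{lower_edge:4.1f}-{upper_edge:4.1f} pCi/L | {'#' * count}")
```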



Tim: Can you show me the rolling averages for a shorter time span, say, the last 240 records (10 days)?

Dag:

Radon Levels with Rolling Averages (Last 240 Hours)

Here is the detailed view of radon levels for the last 240 hours (10 days):

  • Hourly readings are shown with light transparency to reveal fluctuations.
  • The 3-hour rolling average smooths very short-term changes.
  • The 24-hour rolling average highlights day-to-day trends and is useful for identifying sustained high radon levels.
  • A red dashed line marks the EPA action level of 4.0 pCi/L.

As shown, there are periods where radon levels hover near or exceed the EPA threshold, particularly in the 24-hour average. This could indicate environmental or household factors (e.g., ventilation, weather changes) influencing radon concentrations.

Would you like help analyzing these peaks in more detail or identifying contributing patterns?

Tim note: My eyes see some periodic behavior: a peak, then a valley, then another peak, and so on. That’s 10 peaks in 10 days.
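The 3-hour and 24-hour rolling averages Dag describes can be sketched as trailing means over a fixed window. This is a minimal standard-library version under stated assumptions: `rolling_mean` is a hypothetical helper, and `readings` is synthetic data with a clean 24-hour cycle, not the real radon series.

```python
import math
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window yield None."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


# Synthetic stand-in: 20 days of hourly values with a clean 24-hour cycle.
readings = [2.5 + math.sin(2 * math.pi * h / 24) for h in range(480)]

# Average over the full series first, then keep the last 240 hours,
# so every point in the 10-day view has a complete window behind it.
avg_3h = rolling_mean(readings, 3)[-240:]
avg_24h = rolling_mean(readings, 24)[-240:]
print(f"latest 3-hour average:  {avg_3h[-1]:.2f} pCi/L")
print(f"latest 24-hour average: {avg_24h[-1]:.2f} pCi/L")
```

On this synthetic series the 24-hour average comes out flat, because a window that spans one full cycle cancels the daily swing. That is why the 24-hour line is the one to watch for sustained changes rather than for the daily peaks.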



Tim: I’m curious if there is anything we might learn from a time series analysis, say to identify autocorrelations [1] at x number of hours (say 24, or 6 or the like).

Dag:

Autocorrelation of Radon Levels (Last 1000 Hours)

The autocorrelation plot of the last 1000 hours reveals the following:

Key Observations:

  • Lag 1–3: Strong positive autocorrelation, meaning recent radon values are highly predictive of the next few hours.
  • Lag ~6: A smaller but still noticeable correlation, suggesting some intra-day influence.
  • Lag ~24: Slight autocorrelation, which might indicate a daily cycle—possibly tied to heating, ventilation, or other environmental changes across a day.

Interpretation:

  • These autocorrelations suggest that radon levels tend to follow short-term trends and may have mild daily cyclical behavior.
  • The persistence in values over short lags is common in environmental time series, especially indoors where air doesn’t change rapidly unless influenced (e.g., by opening windows or HVAC changes).

Would you like to go further and build a predictive time series model (e.g., ARIMA), or test for seasonality over longer periods?

Tim note: Maybe we can actually ‘see’ the periodic behavior on a time plot over the course of a few days.
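For anyone curious what sits behind an autocorrelation plot, here is a minimal sketch of the usual sample autocorrelation at a single lag. The `autocorrelation` function is a hypothetical helper written for this illustration (I don't know which routine Dag actually called), and the input is a synthetic series with a 24-hour cycle rather than the real radon data.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


# Synthetic stand-in: 10 days of hourly values with a clean 24-hour cycle.
readings = [2.5 + math.sin(2 * math.pi * h / 24) for h in range(240)]

for lag in (1, 6, 12, 24):
    print(f"lag {lag:2d}: {autocorrelation(readings, lag):+.2f}")
```

A perfectly periodic daily signal gives a strongly positive value at lag 24 and a strongly negative one at lag 12. Real readings are much noisier, so a "slight" bump at lag 24 like the one Dag reports is about what a weak daily cycle would look like.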



Tim: Could you show me the last 240 hours (10 days) where each day (24 hour period) is a different line on the chart?

Dag:

Radon Levels By Hour For Last 10 Days

Here’s the chart showing hourly radon levels for each of the last 10 days, with each line representing a separate day:

What This Reveals:

  • You can visually compare daily patterns, including:
    • Peaks or dips at consistent times (e.g., early morning or late evening).
    • Days with overall higher or lower radon concentrations.
    • Any repeating shapes that might hint at routine environmental influences (like heating/cooling cycles or window use).

Let me know if you’d like to:

  • Highlight the day with the highest peak.
  • Overlay average or min/max bounds.
  • Break this down further (e.g., weekdays vs. weekends, if timestamps are available).

Tim note: There might be a valley-peak-valley here if you look hard enough, but the visual is pretty noisy.
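The day-by-day overlay boils down to chopping the series into rows of 24 readings. One way to cut through the noise, sketched below, is to average each hour slot across the days. Both helper names are hypothetical, and because the data has no timestamps, "hour 0" here only means the first hour of each 24-reading block, not midnight.

```python
def split_into_days(values, hours_per_day=24):
    """Chop the most recent readings into consecutive full-day rows."""
    usable = len(values) - len(values) % hours_per_day
    recent = values[len(values) - usable:]  # drop the oldest partial day
    return [recent[i:i + hours_per_day] for i in range(0, usable, hours_per_day)]


def hourly_profile(days):
    """Average each hour slot across all days."""
    return [sum(slot) / len(slot) for slot in zip(*days)]


# Made-up series of 50 hourly values, for illustration only.
values = [float(v) for v in range(50)]
days = split_into_days(values)
profile = hourly_profile(days)
print(f"{len(days)} full days, {len(profile)} hour slots")
```

Each row of `days` corresponds to one line on Dag's chart, and `profile` is the single averaged line you could draw on top of them.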



Tim: What weather factors (or other environmental factors) would help to understand the variation in radon levels?

Dag:

Great question — radon levels in indoor environments are influenced by a range of weather and environmental factors, primarily because radon gas enters from the soil and is affected by air pressure, temperature, and ventilation. Here’s a breakdown:


🌀 Weather & Environmental Factors That Influence Radon Levels:

1. Barometric Pressure (Air Pressure)

  • Lower outdoor air pressure can cause radon to be sucked into the house, due to the pressure differential between the soil and indoor air.
  • During storms or weather fronts, pressure can drop and increase radon entry.

2. Temperature (Especially Indoor vs. Outdoor)

  • Colder outdoor temperatures and warmer indoor air in winter can create a stack effect — warm air rises and escapes from the top of the house, pulling radon-laden soil gas into the basement or lower levels.
  • Extreme heat can also impact ventilation habits (windows closed, AC on), influencing accumulation.

3. Precipitation and Ground Saturation

  • Heavy rain or snowmelt saturates the soil, which can force radon gas upward into homes rather than letting it disperse naturally.
  • Often causes short-term spikes in radon levels.

4. Wind Speed and Direction

  • Wind can impact air pressure dynamics around the home. Depending on ventilation and leakage, this can pull air (and radon) into or out of the house.

5. Humidity

  • High humidity alone doesn’t directly influence radon, but it often accompanies pressure changes or closed-window behavior, indirectly affecting levels.

Other Environmental Factors:

  • Ventilation: Homes sealed tightly (especially in winter) trap more radon indoors.
  • HVAC Systems: Can influence airflow patterns that either increase or reduce radon entry.
  • Foundation Cracks, Sump Pumps, Crawlspaces: Provide pathways for radon to enter.
  • Basement Usage: Radon concentration is often highest in the lowest livable level.

🔍 What You Can Do with This Knowledge:

  • Overlay radon data with weather data (barometric pressure, temperature, rainfall) to look for correlations.
  • If you’re collecting time-series radon data over months, you can potentially identify seasonal trends.
  • Ventilation strategies can be tested and monitored for effect (e.g., does opening basement windows lower levels significantly?).

Would you like help incorporating local weather data into your time series to explore this further?
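As a rough idea of what "looking for correlations" could mean in practice, here is a minimal sketch of a Pearson correlation between radon and barometric pressure. Every number below is invented for illustration, the units (inHg) are an assumption, and `pearson` is a hypothetical helper. No real weather data has been joined to the radon series yet.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values: pressure in inHg and radon in pCi/L for the same hours.
pressure = [30.1, 30.0, 29.8, 29.6, 29.7, 29.9]
radon = [2.0, 2.2, 2.9, 3.6, 3.1, 2.5]

r = pearson(pressure, radon)
print(f"pressure vs. radon: r = {r:+.2f}")
```

A negative `r` on real data would be in line with Dag's point that falling pressure tends to go with rising radon, though one coefficient can't separate pressure from rain, temperature, or ventilation habits that change at the same time.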

Wrap-up and Next Steps

On the whole, I’m fairly impressed by Data Analyst GPT (Dag). With minimal prompting from me, she started with a sensible analysis: a time chart of all the data, standard stats (mean, median, etc.), and a histogram. She nailed the other charts I requested, again with limited instructions. Finally, I liked the weather and other factors that Dag listed, as well as her explanations of the relationships and their directions. For example, heavy rainfall can cause short-term spikes in radon levels. I’ve seen that myself.

Next steps:

  • Add timestamps to the radon data. Oddly, the instrument doesn’t do that itself. But I think I might be able to pull it off since I manually download the data daily, and I note the date and time when I do. It’s just going to be tedious.
  • Combine the radon measurements with local weather data (temp, humidity, pressure, rain, wind, etc.) to get a quantitative understanding of the impacts and relationships. Look for a future collab.
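For the first next step, one possible approach is to count backward from the download time, one hour per reading. The sketch below assumes the readings are evenly spaced with no gaps and that the newest reading lines up with the noted download time. Both are assumptions I'd need to verify, and `backfill_timestamps` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def backfill_timestamps(readings, last_reading_time, step=timedelta(hours=1)):
    """Pair each reading with a timestamp, counting back from the newest one."""
    n = len(readings)
    return [
        (last_reading_time - step * (n - 1 - i), value)
        for i, value in enumerate(readings)
    ]


# Made-up example: three hourly readings, download noted at 8:00 on Jan 2, 2025.
rows = backfill_timestamps([1.9, 2.4, 2.1], datetime(2025, 1, 2, 8, 0))
for stamp, value in rows:
    print(stamp.isoformat(timespec="minutes"), value)
```

Once the readings carry timestamps, they can be matched hour by hour against local weather records.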

If you’ve got your own home sensor data — air quality, energy use, weather, etc. — give Dag a try. You might be surprised how fast you can connect the dots. Drop me a note and let me know how it goes.


  1. Autocorrelation measures the relationship between a signal and itself over time. It answers the question: “How much is the value now like the value a little while ago?” As shown in the chart, autocorrelation can help identify periodic behavior in a time series.