.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/mpe/run_mpe_02_o3mda8.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_mpe_run_mpe_02_o3mda8.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_mpe_run_mpe_02_o3mda8.py:


Compare Model to Daily MDA8 Observations
========================================

Evaluate CAMx by comparing to AirNow or AQS Ozone observations using the
2015 Ozone NAAQS definition of the maximum daily 8-hour average metric.

The basic steps are:

1. Open CAMx file and identify space/time domain,
2. Download hourly observations for that domain to a CSV file.
3. Add CAMx hourly model data to the observations CSV file.
4. Apply Maximum Daily 8-hour average (Appendix U).
5. Calculate statistics.
6. Plot a time series

*Reminder*: You must have already activated your python environment.

.. GENERATED FROM PYTHON SOURCE LINES 21-23

Configuration
'''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 23-40

.. code-block:: Python


    # Define Analysis
    obssrc = 'airnow'  # or aqs
    obsspc = 'ozone'     # or ozone, co, pm25, ...
    modsrc = 'CAMx'    # Or CMAQ
    modspc = 'O3'     # or O3, CO, PM25, ...

    # Set input files
    dates = ['20160610', '20160611']
    avrgtmpl = '../../camx/outputs/CAMx.v7.32.36.12.{}.avrg.grd02.nc'  # placeholder {} for date

    # Outputs
    outstem = f'outputs/{obssrc}.{obsspc}_MDA8_and_CAMx.v7.32.36.12.avrg.grd02'
    pairedpath = outstem + '.csv'
    statspath = outstem + '_stats.csv'
    figpath = outstem + '_ts.png'


.. GENERATED FROM PYTHON SOURCE LINES 41-43

Imports and Folders
'''''''''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 43-51

.. code-block:: Python


    import pyrsig
    import pandas as pd
    import matplotlib.pyplot as plt
    import os

    os.makedirs('outputs', exist_ok=True)


.. GENERATED FROM PYTHON SOURCE LINES 52-54

Query Observations for Model file
'''''''''''''''''''''''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 54-69

.. code-block:: Python


    obskey = f'{obssrc}.{obsspc}'  # must exist in RSIG
    modkey = f'{modsrc}{modspc}'
    keys = [obskey, modkey]
    dfs = []
    for datestr in ['20160610', '20160611']:
        ncpath = f'../../camx/outputs/CAMx.v7.32.36.12.{datestr}.avrg.grd02.nc'
        ds = pyrsig.open_ioapi(ncpath)
        df = pyrsig.cmaq.pair_rsigcmaq(ds, modspc, obskey, prefix=modsrc, workdir='outputs')
        df[modkey] *= 1000
        df.rename(columns={obsspc: obskey}, inplace=True)
        dfs.append(df)

    df = pd.concat(dfs)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Using cached: outputs/airnow.ozone_2016-06-10T000000Z_2016-06-10T235959Z.csv.gz
    Using cached: outputs/airnow.ozone_2016-06-11T000000Z_2016-06-11T235959Z.csv.gz


.. GENERATED FROM PYTHON SOURCE LINES 70-72

Calculate Hourly Statistics
'''''''''''''''''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 72-78

.. code-block:: Python

    statsdf = pyrsig.utils.quickstats(df[keys], obskey)
    # Save stats to disk
    statsdf.to_csv(statspath)
    print(statsdf)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           airnow.ozone        CAMxO3
    count  29892.000000  29892.000000
    mean      38.099505     41.653587
    std       19.807133     12.087435
    min        0.000000      0.301045
    25%       22.665580     33.589913
    50%       38.000000     42.133999
    75%       53.100000     49.821673
    max      115.000000    110.212502
    r          1.000000      0.698374
    mb         0.000000      3.554083
    nmb        0.000000      0.093284
    fmb        0.000000      0.089127
    ioa        1.000000      0.772703


.. GENERATED FROM PYTHON SOURCE LINES 79-87

40 CFR Appendix U to Part 50
''''''''''''''''''''''''''''
- 3(a) hourly data in LST truncated to ppb precision
- 3(b) minimum of 6 valid hourly values for an 8h average
- 3(c) 17 possible valid 8h LST windows per day (07-15 LST, ..., 23-07 LST)
- 3(d) a daily maximum is valid if 13 of 17 possible windows available

https://www.ecfr.gov/current/title-40/chapter-I/subchapter-C/part-50/

.. GENERATED FROM PYTHON SOURCE LINES 89-94

Step 3(a): Adjust Times to LST
//////////////////////////////
- Get timezone from latitude
- Convert tz-aware UTC to tz-naive LST
- not truncating

.. GENERATED FROM PYTHON SOURCE LINES 94-99

.. code-block:: Python


    tz = (df.groupby('STATION')['LONGITUDE'].mean() / 15).round(0)
    tz = pd.to_timedelta(tz, unit='h')  # EST=-5h, ... PST=-8h
    df['time_lst'] = df['time'].dt.tz_convert(None) + tz.loc[df.STATION].values


.. GENERATED FROM PYTHON SOURCE LINES 100-105

Step 3(b): Calculate 8h averages
////////////////////////////////
for each site:
1. regularize to hourly data with missing nans
2. create 8h averages only valid when 6 or more available

.. GENERATED FROM PYTHON SOURCE LINES 105-113

.. code-block:: Python


    a8df = df[['STATION', 'time_lst', obskey, modkey]].groupby('STATION').apply(
        lambda sdf:
        sdf.set_index('time_lst')[[obskey, modkey]].asfreq('1h')
        .rolling('8h', min_periods=6).mean(),
        include_groups=True
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /work/ROMO/users/bhenders/SESARMTraining/examples/mpe/run_mpe_02_o3mda8.py:106: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
      a8df = df[['STATION', 'time_lst', obskey, modkey]].groupby('STATION').apply(


.. GENERATED FROM PYTHON SOURCE LINES 114-119

Step 3(c): Calculate 8h averages
////////////////////////////////
Pandas labels 8h means based on start time in LST
1. regularize to hourly data with missing nans
2. remove 00-06 LST hours that duplicate previous day's hours

.. GENERATED FROM PYTHON SOURCE LINES 119-133

.. code-block:: Python


    # pandas labels the first 8h average (hours 00-07) as 07, so decrement
    # the start_time_lst hour by 7h to match starting time
    iddf = a8df.index.to_frame()
    iddf['start_time_lst'] = iddf['time_lst'] + pd.to_timedelta('-7h')
    a8df.index = pd.MultiIndex.from_frame(iddf[['STATION', 'start_time_lst']])
    # Valid if 13 of 17 possible valid windows (07-15 LST, ..., 23-07 LST)
    # Remove the 8h averages starting 00-06 LST
    n = a8df.shape[0]
    a8df.query('start_time_lst.dt.hour > 6', inplace=True)
    nnew = a8df.shape[0]
    print(f'INFO:: removed {n - nnew} ({1 - nnew / n:.2%}) 8h avg (7 morning hours)')


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    INFO:: removed 8954 (29.19%) 8h avg (7 morning hours)


.. GENERATED FROM PYTHON SOURCE LINES 134-139

Step 3(d): Calculate daily maxima
/////////////////////////////////
For each STATION,
- get the max 8h average within a local standard day
- remove when fewer than 13 available

.. GENERATED FROM PYTHON SOURCE LINES 139-151

.. code-block:: Python

    mda8df = a8df.groupby(['STATION', pd.Grouper(level='start_time_lst', freq='24h')]).agg(**{
        obskey: (obskey, 'max'),
        modkey: (modkey, 'max'),
        'count': (obskey, 'count')
    })
    n = mda8df.shape[0]
    mda8df.query('count >= 13', inplace=True)
    nnew = mda8df.shape[0]
    print(f'INFO:: removed {n - nnew} ({1 - nnew / n:.2%}) MDA8 values (less than 13 8h avgs)')
    mda8df['COUNTYFP'] = mda8df.index.to_frame()['STATION'].astype(str).str[:5]
    mda8df.to_csv(pairedpath)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    INFO:: removed 1320 (68.75%) MDA8 values (less than 13 8h avgs)


.. GENERATED FROM PYTHON SOURCE LINES 152-154

Calculate MDA8 Statistics
'''''''''''''''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 154-161

.. code-block:: Python


    statsdf = pyrsig.utils.quickstats(mda8df[keys], obskey)
    # Save stats to disk
    statsdf.to_csv(statspath)
    # Print them for the user to review.
    print(statsdf)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           airnow.ozone      CAMxO3
    count    600.000000  600.000000
    mean      52.517737   51.605904
    std       17.265956   11.977870
    min       13.462500   24.430573
    25%       41.718750   44.530444
    50%       55.906250   53.408267
    75%       64.856250   59.050945
    max      105.625000   99.977543
    r          1.000000    0.868137
    mb         0.000000   -0.911833
    nmb        0.000000   -0.017362
    fmb        0.000000   -0.017514
    ioa        1.000000    0.897360


.. GENERATED FROM PYTHON SOURCE LINES 162-164

Visualize Results
'''''''''''''''''

.. GENERATED FROM PYTHON SOURCE LINES 164-183

.. code-block:: Python


    gb = mda8df.groupby('start_time_lst')[keys]
    cntdf = gb.count().reset_index()

    fig, ax = plt.subplots(figsize=(6, 4))
    tax = ax.twinx()
    x = cntdf['start_time_lst'].to_numpy()
    tax.bar(x=x, height=cntdf[modkey], color='pink')
    tax.bar(x=x, height=cntdf[obskey], color='grey')
    gb.median().plot(color=['k', 'r'], linewidth=2, zorder=2, ax=ax, marker='o')
    gb.quantile(.75).plot(ax=ax, color=['k', 'r'], linestyle='--', legend=False, zorder=1, marker='^')
    gb.quantile(.25).plot(ax=ax, color=['k', 'r'], linestyle='--', legend=False, zorder=1, marker='v')
    tax.set(zorder=0, facecolor='w', ylim=(0, 2000), yticks=[0, 250, 500, 750, 1000])
    tax.set(ylabel='Daily Count')
    ax.set(zorder=1, facecolor='none', ylim=(0, None), ylabel='O3 MDA8 [ppb]')
    ax.set(title='Ozone from CAMx compared to AQS (25%, 50%, 75%)')
    fig.savefig(figpath)


.. image-sg:: /auto_examples/mpe/images/sphx_glr_run_mpe_02_o3mda8_001.png
   :alt: Ozone from CAMx compared to AQS (25%, 50%, 75%)
   :srcset: /auto_examples/mpe/images/sphx_glr_run_mpe_02_o3mda8_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 184-188

Extra Credit
''''''''''''
1. Modify the script to evaulate 1-hour maximum hourly ozone.
2. Modify the script to evaluate W126.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 5.288 seconds)


.. _sphx_glr_download_auto_examples_mpe_run_mpe_02_o3mda8.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_mpe_02_o3mda8.ipynb <run_mpe_02_o3mda8.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_mpe_02_o3mda8.py <run_mpe_02_o3mda8.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: run_mpe_02_o3mda8.zip <run_mpe_02_o3mda8.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_