ITSSMetrics

class itssutils.itssdata.ITSSMetrics(itss_data=None)[source]

Class to wrap ITSS metrics dataframe

raw_df

The dataframe of raw ITSS data

Type: pd.DataFrame
metrics

The dataframe of calculated metrics

Type: pd.DataFrame
grouping

The grouping of calculated metrics

Type: list of str
calculate_metrics(grouping, population_csv=None)[source]

Calculate the metrics, grouping by different items

Parameters:
  • grouping (str or list of str) – Columns by which to group the data
  • population_csv (str or path) – Filename of population demographic csv

Examples

>>> # Calculate the metrics for each racial group across all traffic stops
>>> met.calculate_metrics('DriverRace')
>>> # Calculate yearly metrics by driver sex for each agency
>>> met.calculate_metrics(['AgencyName', 'Year', 'DriverSex'])
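To make the idea of grouping concrete, here is a minimal standard-library sketch of counting stops and searches per group and deriving a rate. It is not the library's implementation: the `SearchConducted` column, the sample rows, and the helper `grouped_search_rate` are all hypothetical, and only the metric names `StopCount`, `SearchCount`, and `SearchRate` come from the examples on this page.

```python
from collections import defaultdict

# Hypothetical raw rows; the "SearchConducted" column is an assumption.
raw_rows = [
    {"AgencyName": "Agency A", "DriverRace": "Black", "SearchConducted": 1},
    {"AgencyName": "Agency A", "DriverRace": "Black", "SearchConducted": 0},
    {"AgencyName": "Agency A", "DriverRace": "White", "SearchConducted": 0},
    {"AgencyName": "Agency B", "DriverRace": "White", "SearchConducted": 1},
]


def grouped_search_rate(rows, grouping):
    """Count stops and searches per group and derive a search rate."""
    if isinstance(grouping, str):  # accept a single column name or a list
        grouping = [grouping]
    counts = defaultdict(lambda: {"StopCount": 0, "SearchCount": 0})
    for row in rows:
        key = tuple(row[col] for col in grouping)
        counts[key]["StopCount"] += 1
        counts[key]["SearchCount"] += row["SearchConducted"]
    # Add the derived rate alongside the raw counts for each group
    return {
        key: dict(c, SearchRate=c["SearchCount"] / c["StopCount"])
        for key, c in counts.items()
    }


by_race = grouped_search_rate(raw_rows, "DriverRace")
by_agency_race = grouped_search_rate(raw_rows, ["AgencyName", "DriverRace"])
```

Each key of the result is a tuple with one entry per grouping column, which mirrors how a multi-column grouping yields a multi-level index.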
get_grouping()[source]

Return the grouping used to calculate the metrics

get_metrics()[source]

Return a list of all the calculated metrics

get_metrics_df()[source]

Return the raw metrics dataframe

load(filename)[source]

Load a metrics object from a pickle file. The pickled object is a (grouping, metrics_df) tuple.

plot_bars(target_top_row, target_column, only_include_rows=None, title=None, savename=None, savecsv=False, xax_label=None)[source]

Make a bar plot of a certain metric. Requires that the metrics were calculated with a multi-level grouping.

Parameters:target_top_row (str) –

Examples

>>> met.calculate_metrics(['AgencyName', 'DriverRace'])
>>> met.plot_bars('Chicago Police', 'SearchRate')
plot_scatter(y_index, x_index, metric, size, population_col=None, logscale=False, limits=None, scale_factor=None, z_threshold=5, z_opacity='binary', as_ratio=False, title=None, savename=None, savecsv=False)[source]

Scatter plot of all agencies

Parameters:
  • y_index (str or tuple) – the top-level index to use for the y-axis data (i.e. all levels except agency name)
  • x_index (str or tuple) – the top-level index to use for the x-axis data
  • metric (str) – the name of the calculated rate to plot, e.g. SearchRate
  • size (str) – the name of the metric to use to size the points, e.g. SearchCount
  • logscale (bool) – Plot on a log-log scale
  • limits (list or tuple) – the limits on the x and y axes
  • scale_factor (float) – Scaling factor for size of points
  • z_threshold (float) – Cutoff threshold to consider something “statistically significant”
  • z_opacity (str) – Type of shading to use (‘binary’, ‘gradient’, ‘filter’)
  • as_ratio (bool) – Make a ratio plot
  • title (str) – Title of the plot
  • savename (str or path) – Where to save the figure
  • savecsv (str or path) – Where to save a csv of data used to make the figure

Examples

>>> # Compare search rates for black and white drivers
>>> met.plot_scatter('Black', 'White', 'SearchRate', 'SearchCount', population_col='StopCount')
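As a rough illustration of the data behind such a plot, the sketch below pairs the x-group and y-group values of one metric for every agency. It is an assumption-laden stand-in, not the library's code: the `metrics` dict, the helper `scatter_points`, and the choice to size each point by the summed `size` metric of both groups are all hypothetical.

```python
# Hypothetical metrics keyed by (agency, race); the values are made up.
metrics = {
    ("Agency A", "Black"): {"SearchRate": 0.10, "SearchCount": 50},
    ("Agency A", "White"): {"SearchRate": 0.05, "SearchCount": 40},
    ("Agency B", "Black"): {"SearchRate": 0.02, "SearchCount": 4},
    ("Agency B", "White"): {"SearchRate": 0.04, "SearchCount": 10},
    ("Agency C", "Black"): {"SearchRate": 0.08, "SearchCount": 7},
}


def scatter_points(metrics, y_index, x_index, metric, size, scale_factor=1.0):
    """Build one (agency, x, y, point_size) tuple per agency that has both groups."""
    agencies = sorted({agency for agency, _ in metrics})
    points = []
    for agency in agencies:
        x_row = metrics.get((agency, x_index))
        y_row = metrics.get((agency, y_index))
        if x_row is None or y_row is None:
            continue  # an agency missing either group cannot be placed
        # Assumption: point size is the combined size metric of both groups
        point_size = scale_factor * (x_row[size] + y_row[size])
        points.append((agency, x_row[metric], y_row[metric], point_size))
    return points


points = scatter_points(metrics, "Black", "White", "SearchRate", "SearchCount")
```

Points above the diagonal would then be agencies where the y-group rate exceeds the x-group rate.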
plot_timeseries(target_column, only_include_rows=None, only_include_entries=None, title=None, ylabel=None, savename=None, savecsv=None)[source]

Make a timeseries plot

Parameters:
  • target_column (str) – The column you want to make the timeseries for
  • only_include_rows (str or tuple or list) – Rows of index to include
  • only_include_entries (str or tuple or list) – Filter criteria - only include matching entries from target
  • title (str) – Plot title
  • ylabel (str) – Plot y-axis label
  • savename (str or path) – Path to save the plot
  • savecsv (str or path) – Path to save a csv of data used to make the plot

Examples

>>> met.plot_timeseries('SearchRate', only_include_rows='Chicago Police', only_include_entries=['Black', 'Hispanic/Latino', 'Asian', 'White'], title='Search Rate 2012-2017')
plot_zhist(target_item, reference_item, event_col, total_obs_col, title=None)[source]

Z-score histogram for a given event/observation count pairing, e.g. SearchCount/StopCount. ‘AgencyName’ must be included in the grouping, and the grouping must have at least two categories.

Parameters:
  • target_item – index of target item, e.g. ‘Black’
  • reference_item – index of reference item, e.g. ‘White’
  • event_col – column name for event counts, e.g. SearchCount
  • total_obs_col – column name for total observations, e.g. StopCount

Examples

>>> # Compare the deviation of black driver search hit rate relative to white driver search hit rate
>>> met.plot_zhist('Black', 'White', 'SearchHitCount', 'SearchCount')
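This page does not state which z-score formula is used. One common choice for comparing two rates is the pooled two-proportion z-test, sketched below under that assumption. The helper `two_proportion_z` and the per-agency counts are hypothetical.

```python
import math


def two_proportion_z(event_target, total_target, event_ref, total_ref):
    """Pooled two-proportion z-score of the target rate against the reference rate."""
    p_target = event_target / total_target
    p_ref = event_ref / total_ref
    pooled = (event_target + event_ref) / (total_target + total_ref)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_target + 1 / total_ref))
    if std_err == 0:
        return 0.0  # both rates are all-or-nothing, so there is no spread to measure
    return (p_target - p_ref) / std_err


# Hypothetical per-agency counts: (SearchHitCount, SearchCount) for each group.
agency_counts = {
    "Agency A": {"Black": (20, 100), "White": (30, 100)},
    "Agency B": {"Black": (15, 50), "White": (15, 50)},
}

# One z-score per agency; a histogram would be drawn over these values.
z_scores = {
    agency: two_proportion_z(*groups["Black"], *groups["White"])
    for agency, groups in agency_counts.items()
}
```

A negative score here means the target group's hit rate is below the reference group's for that agency.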
save(filename)[source]

Pickle a metrics object as a (grouping, metrics_df) tuple
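The (grouping, metrics_df) tuple format can be illustrated with a plain pickle round trip. This is only a sketch of the file format described here, not the methods themselves. A dict stands in for the metrics dataframe to keep the example dependency-free, and the file name is hypothetical.

```python
import os
import pickle
import tempfile

grouping = ["AgencyName", "DriverRace"]
# Stand-in for the metrics dataframe; a plain dict keeps the sketch dependency-free.
metrics_df = {("Agency A", "Black"): {"SearchRate": 0.1}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "metrics.pkl")  # hypothetical file name

    # Save: write the two pieces as a single tuple
    with open(path, "wb") as fh:
        pickle.dump((grouping, metrics_df), fh)

    # Load: unpack the tuple in the same order
    with open(path, "rb") as fh:
        loaded_grouping, loaded_metrics = pickle.load(fh)
```

Because unpickling can execute arbitrary code, only load pickle files from sources you trust.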

save_csv(filename)[source]

Save the current metrics as a csv file