Matching
Matching is the process of connecting emission data points to client infrastructure. The parameters of the matching process depend on the data provider of the data points and are configured via the MatchingConfiguration model.
Matching configuration
Only a single MatchingConfiguration instance can exist for any data provider. A matching configuration can be further separated by the secondary data source field.
Companies can be added or excluded from any MatchingConfiguration instance. Here are the fields of the MatchingConfiguration model and their descriptions:
- Data provider: This instance will be used to match all data points for a given data provider
- Companies: A list of companies whose infrastructure will be matched to this data provider's data points
- Is EPA matcher: If selected, this configuration will use the EPA matcher, which has a different matching algorithm than the non-EPA matchers
- Matching configuration params: A set of parameters that can be configured on a per secondary data source basis
- Is default: If True, this configuration will be used as a catch-all (when the secondary data source for a point is not found in the configuration)
- Secondary data sources: The secondary data sources that use these matching configuration parameters
- Point distance: The matcher will match an emission to a site when the distance from the emission to the site location is less than this value
- Enable polygon matching: If enabled, the matcher will use site shapes to determine whether an emission matches a site
- Distance to polygon matching: If set to a number larger than 0, the matcher will use the distance from the emission to the site polygon to look for matches
- Enable children matching: This parameter is deprecated and no longer used!
- Near match tolerance: This parameter is deprecated and no longer used!
- Enable plume outline matching: If enabled, the matcher will use plume outlines to match to site location or shape if available
- Disable matching: If enabled, these parameters will not be used for matching. This is useful when there are secondary data sources for a data provider that we don't want to run the matcher for.
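The interplay of the Secondary data sources, Is default, and Disable matching fields can be sketched as a small lookup. This is a minimal illustration, not the project's code: ParamsSketch and select_params are hypothetical names, and the sketch assumes that a disabled entry simply means "do not match" for its sources.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParamsSketch:
    """Hypothetical stand-in for one matching configuration params entry."""
    secondary_data_sources: set = field(default_factory=set)
    is_default: bool = False
    disable_matching: bool = False
    point_distance: float = 0.0


def select_params(params_list, secondary_data_source) -> Optional[ParamsSketch]:
    """Pick the entry listing the source, else fall back to the default entry."""
    chosen = None
    for params in params_list:
        if secondary_data_source in params.secondary_data_sources:
            chosen = params
            break
    if chosen is None:
        # Catch-all: the source is not listed in any entry.
        chosen = next((p for p in params_list if p.is_default), None)
    if chosen is None or chosen.disable_matching:
        # Assumption: a disabled entry means no matching for this source.
        return None
    return chosen
```

Returning None here stands for "skip the matcher for this data point".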
Adding another matching configuration params instance
Clicking Add matching configuration params opens a new inline form. The new form must be populated with the required attributes, and at least one secondary data source must be selected. The data sources selected in this form must not be present in any other MatchingConfigurationParams instance connected to this data provider.
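The two constraints on a new params instance (at least one source, no overlap with sibling instances) could be checked roughly as follows. The function names are hypothetical, and the actual validation in the admin form may be implemented differently.

```python
def find_conflicting_sources(new_sources, existing_params_sources):
    """Return the new sources already claimed by another params instance."""
    taken = set()
    for sources in existing_params_sources:
        taken.update(sources)
    return sorted(set(new_sources) & taken)


def validate_new_params(new_sources, existing_params_sources):
    """Raise ValueError if the new params instance breaks either rule."""
    if not new_sources:
        raise ValueError("At least one secondary data source must be selected.")
    conflicts = find_conflicting_sources(new_sources, existing_params_sources)
    if conflicts:
        raise ValueError(f"Already used by another params instance: {conflicts}")
```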
Matching runs
Matching runs can be triggered from three different contexts:
Data batch
This can be triggered either automatically or manually:
- Automatically when a data import is completed
- Manually from Django admin on the data batches page by selecting one or more data batches and executing the re-run matchers action
Company
Triggered from Django admin on the company edit form. This will run the matcher for all public data points and data points owned by this company against all of the sites owned by this company.
Specific sites
This is triggered by the infrastructure import upon a successful import operation. Currently supported operations are:
- CreateSitesOperation
- UpdateSitesOperation
Additional operations can be set up by calling the run_matching_for_sites function from the post_import_hook of the Operation.
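As a rough sketch of that hook, the snippet below wires a made-up operation to a stand-in for run_matching_for_sites. The real function's signature and the Operation base class are not shown in this document, so the argument (a list of site ids), the MoveSitesOperation class, and the stub body are all assumptions for illustration only.

```python
calls = []  # records stub invocations so the sketch runs standalone


def run_matching_for_sites(site_ids):
    """Stand-in stub; the real function's signature is not shown here."""
    calls.append(list(site_ids))


class MoveSitesOperation:
    """Hypothetical operation, used only for illustration."""

    def __init__(self, site_ids):
        self.site_ids = site_ids

    def post_import_hook(self):
        # Trigger matching only for the sites this operation touched.
        run_matching_for_sites(self.site_ids)


MoveSitesOperation([1, 2, 3]).post_import_hook()
```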
Matching algorithm
The matching process is a two-step process. The first step runs only in the data batch context (after the data import process or when re-running a matcher for a data batch).
Step 1
In step one, the matcher attempts to match data points to sites by a single criterion:
- Is the data point location within the polygon that is the shape of the site
The sites taken into consideration are all sites belonging to the companies that have access to a data batch. A company has access to a data batch if:
- Data batch is public
- Data batch is owned by the company
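The access rule above amounts to a simple boolean check. In this sketch, BatchSketch and its field names are hypothetical and do not reflect the real data batch model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchSketch:
    """Hypothetical stand-in for a data batch."""
    is_public: bool
    owner_company_id: Optional[int] = None


def company_has_access(batch, company_id):
    """A company can see a batch that is public or that it owns."""
    return batch.is_public or batch.owner_company_id == company_id
```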
Once a data point is matched to a site by step 1 of the matching process, it is never matched to any other site, regardless of the company owning that site.
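To make the step 1 criterion concrete, here is a pure-Python point-in-polygon test (ray casting) with a loop that stops at the first containing site. It treats coordinates as planar and uses hypothetical function names; the actual matcher most likely relies on a geospatial database query instead.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def step_one_match(point, sites):
    """Return the id of the first site whose shape contains the point."""
    for site_id, polygon in sites:
        if polygon is not None and point_in_polygon(point, polygon):
            return site_id  # the data point is not matched to any other site
    return None
```

Sites without a shape (polygon is None) are skipped, since step 1 only considers site polygons.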
Step 2
Step 2 of the matcher is a more complex process that works on a priority system and depends on the configuration in the MatchingConfiguration model.
Prior to applying the criteria listed below, only sites within 3000 meters of the data point are considered in the matching process. This reduces the duration of the matching process by eliminating from the queries all sites that are too far away from the emission points.
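The 3000 meter pre-filter can be illustrated with a great-circle (haversine) distance check. This is only a sketch: the helper names and the dictionary layout of a site are assumptions, each site is reduced to a single latitude/longitude pair, and the real filter runs inside the database queries.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
MAX_CANDIDATE_DISTANCE_M = 3000  # cutoff stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_sites(point, sites):
    """Keep only the sites close enough to be considered in step 2."""
    lat, lon = point
    return [
        s for s in sites
        if haversine_m(lat, lon, s["lat"], s["lon"]) <= MAX_CANDIDATE_DISTANCE_M
    ]
```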
Polygon matching and plume outline matching are enabled
If enable_polygon_matching and enable_plume_outline_matching are both enabled, then match data point to site when the first of these criteria is True:
- Site is shape and data point location is within the site bounds
- Site is shape and data point location distance to site bounding polygon is less than distance_to_polygon_matching configuration param.
- Site is shape and distance between the data point location and site bounds is less than distance_to_polygon_matching OR site is point, and distance between site and data point location is less than point_distance configuration param
- Site is point and distance between site and data point location is less than point_distance
Polygon matching disabled and plume outline matching enabled
If polygon matching is disabled, neither the plume outlines nor the site shapes are taken into consideration, and only one criterion is evaluated:
- Site is point or shape and distance between site location and data point location is less than point_distance
Polygon matching enabled and plume outline matching disabled
In this case, plume outlines are ignored and the following criteria are used, stopping the matching process when the first criterion is satisfied:
- Site is shape and data point location is within the site bounds
- Site is shape and distance between the data point location and site bounds is less than distance_to_polygon_matching OR site is point, and distance between site and data point location is less than point_distance configuration param.
Polygon matching and plume outline matching are both disabled
Only one criterion is evaluated:
- Site is point or shape and distance between site location and data point location is less than point_distance
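The step 2 criteria for the four configurations can be condensed into one decision function. This is a simplified sketch with a hypothetical signature: distances are assumed to be precomputed in meters, and plume outlines are not modeled because the criteria above do not spell out how the outline geometry enters the comparison.

```python
def step_two_match(*, site_is_shape, point_in_site, dist_to_polygon,
                   dist_to_location, enable_polygon_matching,
                   distance_to_polygon_matching, point_distance):
    """Return True if the data point matches the site under the given params."""
    if not enable_polygon_matching:
        # Shapes are ignored; only the distance to the site location counts.
        return dist_to_location < point_distance
    if site_is_shape:
        if point_in_site:
            return True
        # The polygon distance check applies only when the param is above 0.
        return (distance_to_polygon_matching > 0
                and dist_to_polygon < distance_to_polygon_matching)
    # Point sites always fall back to the point_distance threshold.
    return dist_to_location < point_distance
```

For example, a shape site 20 meters from the data point matches with distance_to_polygon_matching set to 30, but not when that parameter is 0.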