Hello and welcome to the beginning of Special Topics in GIS. In this first module we are looking at data accuracy, quality, and precision. It starts with a look at what these terms mean and moves on to a couple of examples of how to explore accuracy and precision. But even before that, we have to establish one key thing: data standards. Bolstad shows us that there are four primary categories of standards: media, formatting, documentation, and our primary focus this week, accuracy standards.
Accuracy standards document the quality of the positional and attribute values in a dataset. The lab for this module looked at a few different aspects of this. First, we looked at horizontal precision vs horizontal accuracy, and had a rundown of the possible combinations of these as they apply to a dataset. Accuracy refers to how close a data point is to the true location of what was measured. Precision refers to the consistency or repeatability of results for what was measured. The distances between the repeated measurements and their average position give an average deviation, which is the measure of precision. We combine these two terms into a matrix of possible outcomes:
Accurate / Imprecise Inaccurate / Precise
Accurate / Precise Inaccurate / Imprecise
The accurate and precise combination is of course what we want: data that is repeatably and reliably obtained and that lands on the true location. The map below is an example of a test in which the same spot was measured 50 times with the same piece of equipment. Statistical analysis was then applied, with a set of buffer rings to help isolate statistically significant percentages of the data, based on standard deviations and the assumption of normally distributed data. While fairly simplistic, it helps illustrate accuracy and precision while finding the average of all of the measurements.
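To make the two terms concrete, here is a minimal sketch of how the average position, the average deviation (precision), and the offset from the true point (accuracy) could be computed from repeated waypoints. The function name `summarize_waypoints` and the coordinates are made up for illustration, and it assumes the points are already in a projected coordinate system so that plain distances make sense. It is not the lab's actual workflow, which was done with GIS tools.

```python
import math
import statistics


def summarize_waypoints(waypoints, true_point):
    """Summarize repeated (x, y) fixes of one spot, in projected units such as meters."""
    # Average position of all the repeated measurements.
    mean_x = statistics.fmean([x for x, _ in waypoints])
    mean_y = statistics.fmean([y for _, y in waypoints])

    # Precision: how tightly the fixes cluster around their own average.
    offsets = [math.hypot(x - mean_x, y - mean_y) for x, y in waypoints]
    average_deviation = statistics.fmean(offsets)

    # Accuracy: how far the average position sits from the true location.
    accuracy_error = math.hypot(mean_x - true_point[0], mean_y - true_point[1])

    return (mean_x, mean_y), average_deviation, accuracy_error


# Made-up coordinates for illustration only, not the lab's waypoints.
waypoints = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
true_point = (1.0, 4.0)
mean_position, average_deviation, accuracy_error = summarize_waypoints(waypoints, true_point)
print(mean_position, round(average_deviation, 2), round(accuracy_error, 2))
```

In this toy case the points cluster fairly tightly around their average but the average sits well away from the true point, which is the inaccurate but precise cell of the matrix.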
The map above only used the measured waypoints and derived the average position from them. After that initial analysis, Excel was used to run statistics on a larger dataset. As before, we were looking at measures of accuracy and precision, this time for a dataset with one true point and many measured points, 200 to be exact.
With this dataset, several metrics were calculated, which are listed below. There is also a cumulative distribution function (CDF) of the results to help visualize the percentile distribution of the dataset.
Root Mean Square Error (RMSE): 3.06
Minimum: 0.14
Maximum: 6.95
Mean: 2.67
Median: 2.45
68th Percentile: 3.18
90th Percentile: 4.67
95th Percentile: 5.69
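For anyone who would rather script this than use Excel, the same kind of summary could be sketched in Python as follows. The helper names and the sample points are hypothetical, and the percentile here uses the simple nearest-rank rule, which is an assumption on my part: Excel's percentile functions interpolate between values, so results can differ slightly on real data.

```python
import math
import statistics


def percentile_nearest_rank(sorted_values, percent):
    """Smallest value with at least `percent` percent of the data at or below it."""
    rank = max(1, math.ceil(percent * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def error_metrics(measured_points, true_point):
    """Summarize the horizontal error distance from each measured point to the true point."""
    tx, ty = true_point
    errors = sorted(math.hypot(x - tx, y - ty) for x, y in measured_points)
    return {
        # RMSE squares each error before averaging, so large misses weigh more.
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
        "min": errors[0],
        "max": errors[-1],
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "p68": percentile_nearest_rank(errors, 68),
        "p90": percentile_nearest_rank(errors, 90),
        "p95": percentile_nearest_rank(errors, 95),
    }


# Made-up points for illustration only, not the 200-point lab dataset.
measured_points = [(3.0, 4.0), (0.0, 1.0), (1.0, 0.0), (0.0, -2.0)]
metrics = error_metrics(measured_points, true_point=(0.0, 0.0))
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the errors are squared before averaging, the RMSE can never be smaller than the mean error, which matches the results above (3.06 versus 2.67).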
Some of these metrics are directly observable in the CDF chart, like the minimum, where the line takes off from zero, and the maximum, where it ends. The percentile values likewise correspond to the matching points on the line itself. The RMSE and mean, however, cannot be read off the graph; they require access to the full dataset.
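As a rough illustration of why the minimum, maximum, and percentiles can be read straight off a CDF, here is a small sketch that builds the points of an empirical CDF and looks values up on it. The function names and error values are hypothetical, and the lookup simply takes the first point at or above the requested percent rather than interpolating.

```python
def empirical_cdf(errors):
    """Return (error, cumulative percent) pairs, one per sorted observation."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(value, 100 * (i + 1) / n) for i, value in enumerate(ordered)]


def read_percentile(cdf_points, percent):
    """Read the curve: first error value whose cumulative percent reaches `percent`."""
    for value, cumulative in cdf_points:
        if cumulative >= percent:
            return value
    return cdf_points[-1][0]


# Made-up error distances for illustration only.
errors = [5.0, 1.0, 1.0, 2.0]
cdf_points = empirical_cdf(errors)

minimum = cdf_points[0][0]   # where the curve starts
maximum = cdf_points[-1][0]  # where the curve reaches 100 percent
p68 = read_percentile(cdf_points, 68)
print(cdf_points)
print(minimum, maximum, p68)
```

The mean and RMSE have no single point on this curve to look up, since both depend on combining every error value.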
Overall, this class jumps into the heavy knowledge quickly! But it's certainly interesting to see how previous courses have built up to this point. It is fascinating how many things GIS can be used for, and Excel for that matter. Thank you.
v/r
Brandon