Tuesday, August 27, 2024

Special Topics - M1 - Data Accuracy

Hello and welcome to the beginning of Special Topics in GIS. In this first module we are looking at data accuracy, quality, and precision. It starts with a look at what these terms mean and moves on to a couple of examples of how to explore accuracy and precision. But before any of that, we have to establish one key thing: data standards. Bolstad shows us that there are four primary categories of standards: media, formatting, documentation, and our primary focus this week, accuracy standards.

Accuracy standards document the quality of positional and attribute values in a dataset. The lab for this module looked at a few different aspects of this. First, we looked at horizontal precision versus horizontal accuracy and ran through the possible combinations of the two as they apply to a dataset. Accuracy refers to how close a data point is to the true location of what was measured. Precision refers to the consistency, or repeatability, of results for what was measured; the distance between repeated measurements provides an average deviation. We combine these two terms into a matrix of possible outcomes.

Accurate / Imprecise     Inaccurate / Precise 

Accurate / Precise         Inaccurate / Imprecise

The accurate and precise combination is of course what we want: data that is repeatably and reliably obtained and that lands at the true spot. The map below is an example of a test in which a measurement of the same spot was taken with the same piece of equipment 50 times. Statistical analysis was then applied, with a combination of buffer rings to help isolate statistically meaningful percentages of the data, based on standard deviations and normally distributed data. While fairly simplistic, it helps illustrate the point of accuracy and precision while finding the average of all of the measurements.
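Before the map, here is a minimal numpy sketch of the same idea: average a set of repeated fixes, measure how tightly they cluster (precision), and measure how far that average sits from a reference point (accuracy). The coordinates and spread are hypothetical stand-ins, not the lab's actual waypoints.

import numpy as np

# Hypothetical repeated GPS fixes of one point (projected X/Y in meters),
# standing in for the 50 waypoints collected in the lab.
rng = np.random.default_rng(0)
fixes = rng.normal(loc=[500000.0, 3350000.0], scale=2.5, size=(50, 2))

# Precision: how tightly the fixes cluster around their own average.
avg = fixes.mean(axis=0)                      # averaged position
dists = np.linalg.norm(fixes - avg, axis=1)   # distance of each fix from the average
r68, r95 = np.percentile(dists, [68, 95])     # ring radii holding 68% / 95% of the fixes
print(f"Average position: {avg}, 68% ring: {r68:.2f} m, 95% ring: {r95:.2f} m")

# Accuracy: how far the averaged position sits from a known true location.
true_xy = np.array([500001.0, 3350002.0])     # hypothetical reference point
print(f"Offset from truth: {np.linalg.norm(avg - true_xy):.2f} m")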

[Map: 50 repeated waypoint measurements of the same spot, with standard deviation buffer rings around the averaged position]
That map above only used the measured waypoints and derived the average position. After that initial analysis, Excel was used to perform some statistical analysis on a larger dataset. Similar to the above, we were looking at measures of accuracy and precision for a dataset with a true point and many measured points, 200 to be exact.

With this dataset, several summary statistics were calculated, which are listed below (a short Python sketch of how they can be computed follows the list). There is also a cumulative distribution function (CDF) of the results to help visualize the percentile distribution of the dataset.


Root Mean Square Error (RMSE): 3.06

Minimum: 0.14

Maximum: 6.95

Mean: 2.67

Median: 2.45

68th Percentile: 3.18

90th Percentile: 4.67

95th Percentile: 5.69
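For anyone curious how those numbers fall out of the spreadsheet, the short Python sketch below reproduces the same kinds of summary statistics plus the empirical CDF from an array of error distances. The values here are randomly generated stand-ins, not the lab's 200 measurements.

import numpy as np
import matplotlib.pyplot as plt

# errors = horizontal distance from the true point to each measured point.
# Hypothetical stand-in data; the lab computed these distances in Excel.
rng = np.random.default_rng(1)
errors = np.abs(rng.normal(2.7, 1.3, size=200))

stats = {
    "RMSE": np.sqrt(np.mean(errors ** 2)),     # Root Mean Square Error
    "Minimum": errors.min(),
    "Maximum": errors.max(),
    "Mean": errors.mean(),
    "Median": np.median(errors),
    "68th Percentile": np.percentile(errors, 68),
    "90th Percentile": np.percentile(errors, 90),
    "95th Percentile": np.percentile(errors, 95),
}
for name, value in stats.items():
    print(f"{name}: {value:.2f}")

# Empirical CDF: sort the errors and plot cumulative probability against error size.
sorted_err = np.sort(errors)
cum_prob = np.arange(1, len(sorted_err) + 1) / len(sorted_err)
plt.plot(sorted_err, cum_prob)
plt.xlabel("Error distance")
plt.ylabel("Cumulative probability")
plt.title("CDF of positional error")
plt.show()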


Some of these metrics are directly observable in the CDF chart, like the minimum, where the line takes off from zero, and the maximum, where it ends. The percentile values can also be read off the curve at the corresponding cumulative probability. The RMSE and mean, however, are not obtainable just by looking at this graph; they require access to the full dataset.


Overall, this class jumps into the heavy knowledge quickly! But it's certainly interesting to see how previous courses have built up to this point. It is fascinating to see all of the things GIS can be used for, and Excel for that matter. Thank you.

v/r

Brandon

Friday, August 9, 2024

M6 - Post 2 - Corridor Analysis

Welcome back to part 2 of this week's discussion on suitability analysis and least-cost pathing. This part picks up with a look at corridor analysis. It builds on the previous part by creating suitability layers, combining them in a weighted overlay, and then building a cost-distance model. Based on that workflow, the map below shows ideal black bear movement areas between two units of the Coronado National Forest. Based on the bears' known habitat, various land cover types, and roadways, each layer was given a suitability factor. Those factors were then weighted, with land cover as the primary factor at 60% and roads and elevation at 20% each. A cost-distance corridor was then built from these factors, and the resulting areas are colored by how ideal they are for movement. Underlying the scene is a hillshade and terrain relief generated from a digital elevation model.
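Before the map, the broad shape of that workflow in arcpy looks roughly like the sketch below. The dataset names, the assumed 1-10 suitability scoring, and the cost inversion are placeholders of mine, and the Spatial Analyst extension is assumed to be available.

import arcpy
from arcpy.sa import CostDistance, Corridor, Raster

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\bear_corridor.gdb"   # hypothetical workspace

# Suitability layers, assumed already reclassified to 1-10 (higher = more suitable).
landcover_suit = Raster("landcover_suitability")
roads_suit = Raster("roads_suitability")
elev_suit = Raster("elevation_suitability")

# Weighted overlay: land cover 60%, roads 20%, elevation 20%.
suitability = 0.6 * landcover_suit + 0.2 * roads_suit + 0.2 * elev_suit

# Invert to a cost surface so highly suitable cells are cheap (but still positive) to cross.
cost = 11.0 - suitability

# Accumulated cost away from each Coronado National Forest patch, then a corridor
# raster that combines the two cost-distance surfaces.
cost_dist_a = CostDistance("coronado_patch_a", cost)
cost_dist_b = CostDistance("coronado_patch_b", cost)
corridor = Corridor(cost_dist_a, cost_dist_b)
corridor.save("bear_corridor")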

[Map: cost-distance corridor of ideal black bear movement between two units of the Coronado National Forest, shown over hillshade and terrain relief]
The red corridor is the ideal movement corridor and represents cells within a 1.1 multiplier of the best total suitability result. In other words, the ideal score was found first, and then this multiplier was applied to it to delineate that movement area. The orange area is a 1.2 multiplier and the yellow area a 1.3 multiplier. I think these are important because when you extend out to the 1.2 and 1.3 multipliers you start to see secondary corridor bands, like the smaller orange corridor. While the red is ideal, this shows that there may be alternative considerations in play. The raw data, however, shows that the entire region between the two closest portions of the Coronado National Forest would be viable; the highlighted areas are just the most viable.
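One way to pull those bands out of the corridor raster is to threshold it against multiples of its minimum accumulated cost. A hedged arcpy sketch of that idea, with a placeholder raster path and the multipliers discussed above:

import arcpy
from arcpy.sa import Con, Raster

arcpy.CheckOutExtension("Spatial")
corridor = Raster(r"C:\data\bear_corridor.gdb\bear_corridor")   # hypothetical path

# The least-cost route between the two patches has the minimum accumulated cost.
min_cost = corridor.minimum

# Classify cells by how far above that minimum they fall:
# 1 = within 1.1x (ideal), 2 = within 1.2x, 3 = within 1.3x, NoData elsewhere.
bands = Con(corridor <= 1.1 * min_cost, 1,
            Con(corridor <= 1.2 * min_cost, 2,
                Con(corridor <= 1.3 * min_cost, 3)))
bands.save(r"C:\data\bear_corridor.gdb\corridor_bands")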

While certainly a lot of work, with multiple levels of iteration in the products, this was a worthwhile investment of time to understand how these tools work and build on each other. Thank you.


v/r

Brandon 

Wednesday, August 7, 2024

M6 - Post 1 - Suitability Analysis

This is the first part of a two-post final module for GIS 5100, Applications in GIS.

This last module combines many of the skills that have been acquired, strengthened, and challenged during this course. Specifically, we are working through a significant amount of raster data manipulation to generate a suitability analysis. Playing the role of a budding GIS analyst for a property developer, my task was to take five factors, transform the data relating to each, and generate a weighted overlay providing a suitability assessment for the subject area.

The subject categories are: Land Cover, Soil Type, Slopes, Streams, and Roads. 

Land Cover was already in raster format, but required reclassification to provide favorable weight to agricultural areas and meadow or grasslands. 

The soil analysis, having previously been completed, was a polygon layer that required conversion to raster and then an adjustment of suitability based on arability.

A DEM was provided so that I could transform it into a Slope raster, and then heavily weight mild slopes. 

For streams, while water is a desirable feature, suitability was weighted by distance away from the stream.

Roadways are key for accessibility and as such were heavily weighted by distance from a roadway, out to one mile.
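A hedged arcpy sketch of those per-factor derivations is below. The dataset names, field names, distances, and remap breakpoints are placeholders of mine rather than the lab's actual parameters, and the Spatial Analyst extension is assumed.

import arcpy
from arcpy.sa import EucDistance, Reclassify, RemapRange, RemapValue, Slope

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\suitability.gdb"   # hypothetical workspace

# Land cover: rescore raster classes so agriculture and meadow/grassland rate highest (1-5).
landcover_suit = Reclassify("landcover", "Value",
                            RemapValue([[21, 2], [41, 3], [71, 5], [81, 5], [82, 5]]))

# Soils: convert the existing polygon analysis to raster, then rescore by arability class.
arcpy.conversion.PolygonToRaster("soils", "ARABILITY", "soils_ras", "MAXIMUM_AREA")
soils_suit = Reclassify("soils_ras", "Value",
                        RemapValue([[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]))

# Slope: derive percent slope from the DEM and favor mild slopes.
slope_suit = Reclassify(Slope("dem", "PERCENT_RISE"), "Value",
                        RemapRange([[0, 2, 5], [2, 6, 4], [6, 12, 3], [12, 20, 2], [20, 90, 1]]))

# Streams and roads: score by Euclidean distance, with roads considered out to ~1 mile (1,609 m).
streams_suit = Reclassify(EucDistance("streams"), "Value",
                          RemapRange([[0, 100, 2], [100, 300, 3], [300, 600, 4], [600, 100000, 5]]))
roads_suit = Reclassify(EucDistance("roads", 1609), "Value",
                        RemapRange([[0, 400, 5], [400, 800, 4], [800, 1200, 3], [1200, 1609, 2]]))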

All of these datasets' varying factors were given a value of 1-5, with 1 being least suitable and 5 being most suitable. This means that each raster cell was assigned a value on this scale based on the real-world factor its source raster represented. The Weighted Overlay tool then provides a composite score per cell, based on the weight applied to each factor.

In the map comparison below, the left pane weights all five factors equally, while the right pane applies variable weights as depicted.

[Map comparison: suitability results with all five factors weighted equally (left) versus variably weighted (right)]
The biggest takeaway is that by weighting the factors differently you can vastly change the amount of suitable or unsuitable area that you are working with. Also remember that the suitability scores created by the Weighted Overlay process must end up as whole integers. Normal rounding rules apply to each cell's value: cells scoring 4.29 and 3.75 will both end up as a 4, and so on.
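As a toy illustration of that per-cell arithmetic, the numpy snippet below applies the equal 20% weighting from the left pane to made-up 1-5 scores and rounds the composite to whole integers; none of the values come from the actual lab data.

import numpy as np

# Hypothetical 1-5 suitability scores for a tiny 2x2 grid, one array per factor.
landcover = np.array([[5, 4], [3, 1]])
soils     = np.array([[4, 4], [2, 2]])
slope     = np.array([[5, 3], [3, 2]])
streams   = np.array([[3, 5], [4, 3]])
roads     = np.array([[5, 5], [2, 1]])

# Equal weighting: each of the five factors contributes 20% to the composite score.
composite = 0.2 * (landcover + soils + slope + streams + roads)

# The Weighted Overlay result is a whole integer, so round each cell's score.
suitability = np.rint(composite).astype(int)
print(composite)
print(suitability)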

Now stay tuned for part 2 coming up next. 

Saturday, August 3, 2024

M5 - Damage Assessment

This module involved a holistic look at 2012's Hurricane Sandy, from path to shore, and a damage assessment of some of the aftermath. It starts with translating an Excel file containing latitude/longitude, strength, wind speed, and time data for the hurricane across its week-long existence. The storm's course was translated from the data rows into a point feature class, and the points were then converted to a line. The culmination of that data transformation is below.
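In arcpy terms, that table-to-track conversion is roughly the two-tool sequence sketched below; the file, field, and output names are placeholders rather than the lab's actual data.

import arcpy

arcpy.env.workspace = r"C:\data\sandy.gdb"   # hypothetical workspace

# 1. Turn the spreadsheet rows (lat/long plus strength, wind speed, and time) into points.
arcpy.management.XYTableToPoint(
    in_table=r"C:\data\sandy_track.csv",             # exported from the Excel file
    out_feature_class="sandy_points",
    x_field="Longitude",
    y_field="Latitude",
    coordinate_system=arcpy.SpatialReference(4326))  # WGS 1984

# 2. Connect the points in time order into a single track line.
arcpy.management.PointsToLine("sandy_points", "sandy_track_line", "", "DateTime")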


After the track was established, we switched our focus to the actual damage itself. For this assessment we used before and after imagery of the study area. With the study area identified, I digitized a point for each structure and built out an attribute table backed by predefined attribute domains.
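Those predefined domains can be set up ahead of digitizing with something like the hedged snippet below; the geodatabase path, feature class, field name, and category codes are illustrative only.

import arcpy

gdb = r"C:\data\sandy_damage.gdb"   # hypothetical geodatabase

# Coded-value domain so the damage field only accepts the predefined categories.
arcpy.management.CreateDomain(gdb, "StructureDamage", "Structure damage level", "TEXT", "CODED")
for code, label in [("destroyed", "Destroyed"),
                    ("major", "Major damage"),
                    ("minor", "Minor damage"),
                    ("affected", "Affected"),
                    ("none", "No damage")]:
    arcpy.management.AddCodedValueToDomain(gdb, "StructureDamage", code, label)

# Attach the domain to the damage field on the digitized structure points.
arcpy.management.AssignDomainToField(gdb + r"\structures", "DamageLevel", "StructureDamage")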


Above is a look at the study area in the post-hurricane scene. While it's not a full map with labeling, the red and black triangles indicate total destruction; red highlights major damage; orange, minor structural damage; and yellow represents affected structures. Nearly everything here is affected in some way, but several structures appear intact from this view; those are the green circles.

From there, part of the analysis turned to looking at damage rates in 100-meter zones. This allows us to extrapolate damage predictions for other areas. There are several sources of variability, however, and any given adjacent area may see more or less destruction for a multitude of reasons.


In the image above, the line in the center of the buffer is the baseline. It's adjacent to the study area and gives a visual depiction of the 100 m bands that the study-area houses fall into.
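A hedged sketch of that banding step: multiple-ring buffers off the baseline, a spatial join onto the digitized structures, and a count per band and damage level. The names and distances are placeholders, not the lab's exact parameters.

import arcpy

arcpy.env.workspace = r"C:\data\sandy_damage.gdb"   # hypothetical workspace

# 1. Bands every 100 m out from the coastline baseline.
arcpy.analysis.MultipleRingBuffer("baseline", "bands_100m", [100, 200, 300], "Meters", "DIST_BAND")

# 2. Tag each digitized structure with the band it falls in.
arcpy.analysis.SpatialJoin("structures", "bands_100m", "structures_by_band",
                           "JOIN_ONE_TO_ONE", "KEEP_ALL", match_option="INTERSECT")

# 3. Count structures per band and damage level, which feeds the damage-rate calculation.
arcpy.analysis.Statistics("structures_by_band", "damage_rates",
                          [["DamageLevel", "COUNT"]], ["DIST_BAND", "DamageLevel"])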

One of the other aspects of the module was to explore external GIS tools, like survey123.arcgis.com, which allows for custom survey creation that responders or local citizens can use to submit information. UWF members can view an example here: https://arcg.is/1CeafO

This comprehensive analysis was definitely time-consuming, but it is amazing to see all of the data come together in this way. Thank you.


v/r

Brandon
