The GIS Adventures of Map Man

Thursday, October 3, 2019

Special Topics in GIS - Module 2.2: Surface Interpolation

Hello Everyone!

This week's lab was all about interpolation methods for surface data. Interpolation of spatial data is essentially assigning values across surface data at unmeasured/unvalued locations based on the values at measured/values sampling locations. During this lab, I worked with three primary forms of surface data interpolation. Thiessen Polygons, IDW or Inverse Distance Weight and Spline Interpolation. For the Spline interpolation method, there are two types of Spline Interpolation: Regular and Tension.

For this week's lab I worked in two parts. For part A, I compared Spline and IDW interpolation to create a Digital Elevation Model. While this part of the lab and assessing the differences between the two data methods was interesting, I'm going to share with you Part B. For Part B, I was provided 41 sample station water quality measurement points sampled in Tampa Bay, Florida. These data points essentially focus on the water quality and specifically the Biochemical Oxygen Demand (mg/L) at each sample.

Thiessen Polygons

The Thiessen Polygon interpolation method is fairly straight forward. Thiessen polygons contruct polygon boundaries where the value throughout the polygon is equal to the value of the sample point. Overall this method is fairly simple and widely used but for the nature of water quality data, the drastic shifts in polygon values and their clunky look do not reflect the data.

IDW (Inverse Distance Weight)

This method is much better for the nature of the data I was interpolating. Essentially, the values associated with the points directly affect the interpolation while the value decreases the further it gets away from the points. Points that are clustered together tend to push the overall data distribution higher in the clustered areas of concentration. For this data, this method still felt too clunky and did not reflect water quality.

Spline (Regular and Tension)

Spline Interpolation, the smoothest method employed in this lab essentially tries to smoothly go through the data sample points while reducing the curvature of the surface data. Regular spline interpolation is much more dynamic with value ranges (I had negative values even though my data contained none) with lower lows and higher highs. Tension spline interpolation attempts to reduce the factor of data values outside the initial range. For the nature of the data, I believe that Tension spline interpolation (Below) is the best method to visualize the surface data. Water is a smooth continuous medium and water quality can change constantly. Interpolation of this quality of data needs to be loose, but not exceed the data itself making tension spline interpolation the best method for this week.

~Map On!

Thursday, September 26, 2019

Special Topics in GIS - Module 2.1: Surfaces - TINs and DEMs

Hello Everyone!

This weeks lab is one of my favorite labs so far that I've done in my special topics GIS class. For this weeks lab, it was all about working with surface data. When working with surface data in GIS, there are two main types of surface data TINs (Triangular Irregular Networks) and DEMs (Digital Elevation Models). TINs are a vector-based surface data type that uses vertices distributed across the surface to draw triangular edge lines that connect these vertices together. None of these triangle surfaces overlap and you can find that they can better visualize surfaces that vary greatly compared to surfaces that have little to no variance. DEMs share some similarities with TINs, like TINs, DEMs are a great way to visualize a continuous surface. Unlike TINs, DEMs are a raster-based surface data type that uses elevation points within a series of grid cells. These cells can range anywhere from one meter to fifty meters in size. Each cell in the DEM has a unique value. Compared to TINs a digital elevation model is much smoother in showing the surface data while the DEM is much more geometric. While TINs and DEMs are different in the way they visualize surface data, there are a variety of useful functions that they can both do. You can symbolize these two types of surface data in various ways from elevation to slope and aspect values. With this surface data, you can also create contour lines. Below you can see the differences between a TIN and a DEM.

Slope Symbology (DEM)

Aspect Symbology (DEM)

Slope with Contours (TIN)

I could write pages about TINs and DEMs and all the practical and unique uses they have. Before I wrap up this weeks blog, I would like to share with you a DEM project that I worked on this past year that showcases the beauty of working with the aforementioned surface data types.

For this project, I used a LiDAR (Light Detection and Ranging) image (Top) of the surface of Crater Lake in Oregon taken via remote sensing. Crater Lake almost fills a ~2,200 ft deep caldera that formed roughly 7,700 years ago when the volcano Mount Mazama collapsed. It is the deepest lake in the United States with a depth of 1,949 feet and ranks as the ninth deepest rank in the world. From the LiDAR image, I created a hillshade which is essentially a grayscale representation of the earth's surface in 3D that takes into account the position of the sun to shade the terrain. Once my hillshade was created, I overlayed the original LiDAR image and symbolized it by elevation (cool colors low, warm to gray colors high). I then added bathymetry data to show the depth of the lake. I hope you have enjoyed this weeks lab material, working with surface data is one of the coolest GIS applications!

~Map On!

Wednesday, September 18, 2019

Special Topics in GIS - Module 1.3: Data Quality - Assessment

Hello Everyone!

For this weeks lab, we focused on the assessment of data. The lab that I will be sharing with you this week was all about comparing the completeness of two different road shapefiles in the state of Oregon. For this lab, the first level of assessment was comparing the overall completeness of each road dataset by comparing the sum values. Out of the two road datasets I was given (TIGER Lines and Jackson County, OR Centerlines) the TIGER lines are more complete from by over 500 kilometers. Once the overall completeness assessment had been completed, I was then tasked with assessing completeness within a grid. This task would essentially take a grid of 297 cells and compare the total length of roads within each cell. To find these values, I used a tool in ArcGIS Pro called 'Summarize Within'. This tool essentially allows you to find out various levels of information regarding features within other features. In this case, I was looking at the sum of roads for both datasets within each grid. Before I could run my analysis, I needed to clean up the data a little. The TIGER Lines were in a different projection system that the rest of my data, so I reprojected them to match the other data. I also needed to clip my road features to the extent of my grid so that no roads would be outside my study area. I then ran my summarize within to get the sum of road segments in each cell. Finally, I needed to find the percentage of completion for each cell. To achieve this, I used a standard percent difference calculation that gave me both a positive and a negative percentage. Below is the map of my data without the roads to avoid excess map clutter:

As you can see on my map, there are both grid values of positive and negative percentage values. Areas with positive percentages increasing along the blue color ramp portion indicate cells where the sum of the County Center Line roads is greater than the sum of the TIGER Lines (higher percentage completion of County than TIGER). Cells that move up the red portion into the negative percentages indicate cells that have a greater sum of TIGER Line roads compared to the sum of the County Center Line roads (higher percentage completion of TIGER than County). Within this data, it should also be noted that there are two specific grids that are special. There is one cell that contains no roads for either of the datasets and is marked as gray with a 'No Data' attribute. Second, there is one cell (darkest red) where TIGER lines are 100% completion because there is no County road data within that cell.

Saturday, September 7, 2019

Special Topics in GIS - Module 1.2: Data Quality - Standards

Hello Everyone!

This weeks lab is an extension of Spatial Data Quality. For this weeks lab, I did my data quality assessment according to the National Standard for Spatial Data Accuracy (NSSDA). According to the NSSDA, some criteria need to be met when selecting test points. For this lab, I was given two road datasets. One data set is Albuquerque streets from the city of Albuquerque. The second road dataset is Albuquerque streets from StreetMap USA which is distributed from ESRI. Finally, I was provided several satellite aerial images of the study area portion of Albuquerque divided into quadrangles. When comparing the road datasets to the satellite aerial images, it was evident that on the surface, the two datasets had very differing positional accuracy from each other. For my positional accuracy analysis, I chose 20 randomly selected intersection points within one of the provided aerial image quadrangles of Albuquerque. Proper intersections that I chose for analysis were cross(+) intersections and right angle '90-degree' 'T' intersections. Per the NSDAA standards, my test points had a distribution of at least 20 percent of points in each quadrant of my aerial quadrangle and at least 10 percent spacing (at least 370 feet apart) distance of the diagonal length of the quadrangle. To select these points, I created intersection points for both road datasets using a geoprocessing tool within ArcGIS Pro. I then selected the random test points at the appropriate type of intersection ensuring to select the correct intersection for both road datasets and following the aforementioned NSDAA distribution/spacing rules. My test points can be seen below for one of the road datasets:

Once my test points had been selected, I then digitized reference points to compare the positional accuracy bases on the aerial satellite imagery location of each intersection. Once the test points and reference points were created, test points were assigned matching Point IDs with the reference points so their coordinate values could easily be analyzed. After assigning XY coordinate values to both sets of test points and my reference points, I exported them as DBF files and then plugged them into a positional accuracy spreadsheet provided by the NSSDA that calculates the positional accuracy using the 95th percentile. Essentially the table compares the XY position of each test point to its matching reference point (the importance of matching Point IDs for both test points and reference point). This sheet calculated the following values. Sum, Mean, Root Mean Square Error (RMSE), and the National Standard for Spatial Data Accuracy statistic which multiplies the RMSE by a value of 1.7308 (95th Percentile for Horizontal Accuracy) to yield the horizontal positional accuracy at the 95th percentile. My formal accuracy statements can be found below that meet the NSSDA guidelines:

ABQ Streets Test Points:

Horizontal Positional Accuracy: Tested 14.106 feet horizontal accuracy at 95% confidence level.

Vertical Positional Accuracy: Not applicable

Street Map USA Test Points:

Horizontal Positional Accuracy: Tested 258.682 feet horizontal accuracy at 95% confidence level.

Vertical Positional Accuracy: Not applicable

I genuinely enjoyed working through this weeks lab and look forward to sharing more special topics with you and as always... ~Map On!

Tuesday, September 3, 2019

Special Topics in GIS - Module 1.1: Calculating Metrics for Spatial Data Quality

Hello Everyone!

It's hard to believe that we are already going full speed in the fall semester of 2019. Soon 2020 will be upon us and I will have completed my first year of graduate school here at UWF. Last semester, I focused primarily on GIS Programming and Spatial Data Management using SQL. This semester, I'll be focusing on both special and advanced topics in GIS as well as Remote Sensing and Aerial Imagery. Let's jump right in for this phenomenal first week!

For this weeks lab in Special Topics of GIS, I was tasked with calculating metrics for spatial data quality. In this lab, I analyzed spatial data quality for a set of gathered waypoints taken by a handheld GPS unit in two separate ways. The first was via a map with buffer zones (below) showing three percentiles of precision. The second which will not be discussed in this post is a root-mean-square error analysis and a cumulative distribution function graph.

Before I delve into my findings, accuracy and precision need to be explained in the realm of GIS. For the purpose of this post, I am assessing horizontal accuracy and precision. To derive the horizontal precision, which is the closeness of the recorded points to one another, I calculated an average projected waypoint I then created three buffers for precision percentiles that contain an x amount of points. The buffers I created were at 50%, 68%, and 95%. For horizontal accuracy, which is how close the measured values are to the actual (reference) point, I measured the distance of my average projected waypoint from my horizontal precision calculation to the actual reference point.

Now that my methods of determining horizontal precision and accuracy have been explained, I would like to share my results with you.

For horizontal precision, I got a value of 4.5 meters when measuring precision at the 68th percentile. If we are basing the precision off of the 68th percentile then these results would be precise. For my horizontal accuracy, the distance from the average waypoint (blue) to the reference (actual) point was 3.24 meters. Compared to the precision value, these results have fairly high accuracy. Overall, after assessing the horizontal accuracy and precision, it can be observed that the GPS waypoints collected in this test are more accurate than precise. Determining accuracy and precision is, of course, subjective. If these measurements were taken by a surveying company, the resulting precision and accuracy values would be considered failure by survey standards. However, if these waypoints were referencing an object such as the location of a fire hydrant or electrical unit box, they would be much more suitable. Finally, in terms of bias, many factors impact the results. How good is the GPS unit? Are there any satellite connection interference variables such as buildings or weather? Is the user holding the unit consistently in one position? These can all play a role in how data is collected.

I look forward to sharing my future work with you all and as always...

~Map On!

Friday, July 5, 2019

GIS Programming - Module 7 Lab

Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32

>>> print ("Hello Everyone!")

Hello Everyone!

I cannot believe it but this was the final lab of my GIS Programming Class! For this weeks lab material, I worked completely with raster image data which is hands down my favorite type of data in GIS (sorry vector...). This week I was tasked with creating a raster dataset from two existing rasters. The final output raster needed to meet the following conditions an include the following features:

1. Reclassified values of forest landcover (41,42, and 43). This only shows forested land.
2. Highlight elevation areas with a slope value greater than 5 degrees and less than 20 degrees.
3. Highlight elevation areas with an aspect-value greater than 150 degrees and less than 270 degrees.

Once the parameters of my script had been defined, I needed to create an overall structure of how my script would be written. This was how I designed my script.

Start
>>>import necessary modules
>>>import arcpy.sa (spatial analyst module)
>>>set outputs and overwrite parameter to true (print success message)
>>>conditional if statement that won't run if spatial analyst extension is not enabled
>>>reclassify values of 41,42, and 43, to a value of '1'
>>>set good slope value condition
>>>set good aspect value condition
>>>combine 3 rasters
>>>save final combined raster (print success message)
>>>Else portion of statement that prints message saying saptial analyst isnt enabled.
Stop

As you can see from the results above, my script ran correctly with the print messages spread throughout to give the user updated progress as the script runs. If the spatial analyst extension was not enabled, the else statement message would have printed instead of the script running through completion. The results of my final raster also turned out successful. The areas in red are those that are all suitable according to the parameters I was given. All are forested areas that have a slope between 5 and 20 degrees and an aspect between 150 and 270 degrees. I hope you have all enjoyed this journey with me through this course, I have learned so much! Thank you for taking the time to keep updated on my pursuits and until next time...

~Map On!

Wednesday, June 26, 2019

GIS Programming - Module 6 Lab

Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32

>>> print ("Hello Everyone!")

Hello Everyone!

It's so hard to believe that I just completed my second to last GIS Programming module lab! Time sure has flown by in this course. This weeks lab in addition to some other life events really had me stumped in the beginning but after the dust settled and about 6 cups of coffee later the results proved successful! For this weeks lab and lecture, I learned all about geometries. When looking at the geometries of features in GIS there is a hierarchy of understanding that can really help you. The first and highest level is a feature. This essentially is each row in the attribute table. In this weeks lab, I worked with a rivers shapefile from a Hawaii dataset so for this example, each feature would be a stream/river in its entirety. The next level of the hierarchy is an array. An array is essentially the collection of points/vertices that make up a feature. An example would be that a specific feature has an array of 15 vertices. Finally, the last level of the hierarchy is the individual point/vertex. These are usually expressed in the (X, Y) vertex format. Essentially the structure is as follows:

Feature > Array > Vertex

For this weeks lab, I was tasked with working with the aforementioned geometries. I was given a shapefile containing river features from Hawaii and was tasked with writing the geometries of each feature to a newly created TXT file. For my text file, I needed individual lines that provided me with the following information: Feature ID, Vertex ID, X Point, Y Point, and Feature Name. In total there were 25 features in my data and 247 total vertices that I had to list with their respective X and Y points and feature names. Before I get to the results, I would like to share the basis of my code so you can understand how I got my results.

~To Do:

1.     Set my environment parameters
2.     Create a new TXT file that’s writable
3.     Create a search cursor that calls on OID (Object ID), SHAPE, and NAME
4.     3 For loops:
a.      First: Iterates Rows
b.      Second: Iterates Array
c.      Third: Iterates Vertices
5.     Print and Write my results in the script and new TXT file
a.      Feature #, Vertex #, X Point, Y Point, Feature Name
6.     Delete row and cursor variables
7.     Close file access

Start
>>>import statements
>>>environment statements (workspace and overwrite)
>>>define feature class
>>>open new writable file
>>>define the cursor
>>>for loop to retrieve rows
>>>for loop to retrieve arrays
>>>for loop to retrieve vertices
>>>print and write to the newly created TXT file
>>>delete row and cursor variables
>>>close file
End

My results turned out better than expected (below):

As you can see, new lines were written to my TXT file starting with Feature 0 then iterating through each Vertex in the array providing you with the X point, Y point and the name. Once Feature 0's (Honokahua Stream) vertex array had been iterated through, Feature 1 (Honokowai Stream) was iterated through next until all 247 vertices were complete for the 25 features total. Overall implementing the nested for loops in my script was the toughest part and caused the most hangup for me. The final module will be one of my favorites in this class as it pertains to Raster data!