Is MySQL’s GIS really worth using?
Is GIS worth using in MySQL? In the past few posts, I have explored what GIS is and how it is used. GIS-encoded data is wonderful and can help with all kinds of cool queries. I'm late getting this article written, so let's get right to it.
The most common geographical query asks for all the points within some distance of a given point. I'll try to focus on ways to answer this type of query. Accuracy of the answer is always important. Think carefully about your query. Do you want every pizza place within a radius of a point or within a square mile? Or do you really want it within a mile's walking distance?
I'm using the common city_lookup table for these tests. Here is the schema.
CREATE TABLE `city_lookup` (
  `city_id` INT(7) NOT NULL DEFAULT '0',
  `feature` VARCHAR(20) NULL DEFAULT NULL,
  `name` VARCHAR(50) NULL DEFAULT NULL,
  `pop_2000` INT(10) NULL DEFAULT NULL,
  `fips_55` VARCHAR(7) NULL DEFAULT NULL,
  `county` VARCHAR(50) NULL DEFAULT NULL,
  `fips` VARCHAR(7) NULL DEFAULT NULL,
  `state` CHAR(3) NULL DEFAULT NULL,
  `state_fips` CHAR(3) NULL DEFAULT NULL,
  `display` TINYINT(3) NULL DEFAULT NULL,
  `lat` DOUBLE NULL DEFAULT NULL,
  `lon` DOUBLE NULL DEFAULT NULL,
  PRIMARY KEY (`city_id`),
  KEY `lat` (`lat`),
  KEY `lon` (`lon`)
) ENGINE=MyISAM;
This table uses simple numeric columns to store latitude (lat) and longitude (lon). There are 35,432 records in the city_lookup table. I’ll flush the cache before each query.
I added a geometry point to this table by converting latitude and longitude with these commands.
ALTER TABLE city_lookup ADD location GEOMETRY NOT NULL AFTER lon;
UPDATE city_lookup SET location = POINT(lon,lat);
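One thing that is easy to get backwards: POINT() takes x first and y second, so longitude goes first and latitude second. As a quick sanity check outside the database, here is a minimal Python sketch that builds the same well-known text for one row. The helper name point_wkt is made up for this illustration and is not part of MySQL.

```python
def point_wkt(lat, lon):
    """Build WKT for a point; x is longitude and y is latitude."""
    return "POINT(%s %s)" % (lon, lat)


# Oklahoma City's coordinates as stored in the lat and lon columns.
print(point_wkt(35.4676, -97.5164))
```

The printed text should match what ASTEXT(location) returns for that row.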
Simple Queries
Now to search for data. There are lots of formulas and calculations for geographic distance and navigation. Most of these are highly accurate. I have chosen a simple "POW()" formula, which compares squared differences in degrees, because it requires the least work for MySQL. (Best guess.)
SELECT NAME,lat,lon,ASTEXT(location) FROM city_lookup
WHERE (POW(lat - 35.5,2) + POW(lon - -97.6,2)) < .02
NAME | lat | lon | ASTEXT(location) |
---|---|---|---|
Oklahoma City | 35.4676 | -97.5164 | POINT(-97.5164 35.4676) |
Woodlawn Park | 35.5114 | -97.65 | POINT(-97.65 35.5114) |
Bethany | 35.5187 | -97.6323 | POINT(-97.6323 35.5187) |
Warr Acres | 35.5226 | -97.6189 | POINT(-97.6189 35.5226) |
Nichols Hills | 35.5509 | -97.5489 | POINT(-97.5489 35.5509) |
The Village | 35.5609 | -97.5514 | POINT(-97.5514 35.5609) |
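It helps to know what the .02 threshold actually means. The formula compares squared differences in degrees, so the search radius is the square root of .02 degrees. The Python sketch below converts that radius to miles, assuming the rough figure of 69 miles per degree. The function and constant names are only for illustration.

```python
import math

MILES_PER_DEGREE = 69.0  # rough approximation, assumed for this sketch


def pow_distance_sq(lat, lon, center_lat, center_lon):
    """Squared distance in degrees, the same arithmetic as the POW() WHERE clause."""
    return (lat - center_lat) ** 2 + (lon - center_lon) ** 2


threshold = 0.02
radius_deg = math.sqrt(threshold)

# A degree of longitude shrinks by cos(latitude), so the two axes differ in miles.
north_south_miles = radius_deg * MILES_PER_DEGREE
east_west_miles = radius_deg * MILES_PER_DEGREE * math.cos(math.radians(35.5))

print(round(radius_deg, 4))
print(round(north_south_miles, 1), round(east_west_miles, 1))

# Oklahoma City falls inside the threshold, as in the result table.
print(pow_distance_sq(35.4676, -97.5164, 35.5, -97.6) < threshold)
```

Because a degree of longitude is shorter than a degree of latitude at this latitude, the "circle" is really an ellipse when measured in miles.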
Now let's do almost the same query using GIS functions and a bounding box created with two points. Now we are searching a square, not a circle.
SELECT name,lat,lon,AsText(location) FROM city_lookup
WHERE MBRContains(GeomFromText('LineString(-97.7 35.6, -97.5 35.4)'),location) ;
name | lat | lon | AsText(location) |
---|---|---|---|
Oklahoma City | 35.4676 | -97.5164 | POINT(-97.5164 35.4676) |
Woodlawn Park | 35.5114 | -97.65 | POINT(-97.65 35.5114) |
Bethany | 35.5187 | -97.6323 | POINT(-97.6323 35.5187) |
Warr Acres | 35.5226 | -97.6189 | POINT(-97.6189 35.5226) |
Nichols Hills | 35.5509 | -97.5489 | POINT(-97.5489 35.5509) |
The Village | 35.5609 | -97.5514 | POINT(-97.5514 35.5609) |
Timestamp | Duration | Message | Line | Position |
---|---|---|---|---|
10/27/2010 3:38:02 PM | 0:00:02.420 | Query OK | | |
Well, that’s not any faster. But we did get the results we expected. I ran a hundred of each query on a quiet system. The POW() query takes 0.157 seconds and the MBRContains() query takes 0.171 seconds on average.
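Why do a circle and a square return the same rows here? A rough Python sketch of the bounding-box test shows it. It assumes the corners -97.7 35.6 and -97.5 35.4, the values used in the benchmark in the next section. The helpers mbr and mbr_contains are my own simplified stand-ins, not MySQL's implementation, and they treat the edges as inside the box.

```python
def mbr(points):
    """Minimum bounding rectangle of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def mbr_contains(box, point):
    """True if the (lon, lat) point lies inside the box, edges included."""
    min_lon, min_lat, max_lon, max_lat = box
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


box = mbr([(-97.7, 35.6), (-97.5, 35.4)])

# Two rows from the result table, as (lon, lat).
cities = {
    "Oklahoma City": (-97.5164, 35.4676),
    "The Village": (-97.5514, 35.5609),
}
for name, point in cities.items():
    print(name, mbr_contains(box, point))

# Squared distance in degrees from the center (35.5, -97.6) to a box corner.
corner_sq = (35.6 - 35.5) ** 2 + (-97.7 - -97.6) ** 2
print(round(corner_sq, 4))
```

With those corners, each corner of the box sits right at the .02 threshold of the POW() query, so the square fits just inside the circle. Any city in the circle but outside the square would show up only in the POW() results.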
Is it the Math?
Maybe the math used in the queries is having an effect. I’ll use BENCHMARK() to test the basic functions. This will not be completely fair. To make this work, I had to add a POINT() function to the MBRContains() call so I can run the MBRContains “calculation” in BENCHMARK().
select benchmark (10000000, (POW(35.6 - 35.5,2) + POW(-97.7 - -97.6,2)) < .02 ) ;
This runs in 3.354 seconds.
select benchmark (10000000, MBRContains(GeomFromText('LineString(-97.7 35.6, -97.5 35.4)'),POINT(-97.6,35.5)) ) ;
This runs in 5.460 seconds. Now I’ll try to remove the time taken by the POINT() function.
select benchmark (10000000, POINT(-97.6,35.5));
This ran in only 0.967 seconds. So the MBRContains() function runs in 4.493 seconds after removing the time POINT() takes. Still, the POW() calculation looks better. It runs in about three quarters of the time of the MBRContains() function.
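To put those benchmark numbers in context, here is a back-of-the-envelope Python sketch that scales them down to one pass over the 35,432 rows in the table. It is only an estimate: it assumes the per-call cost in BENCHMARK(), which works on constants, carries over to evaluating the expression once per row.

```python
ITERATIONS = 10_000_000  # loop count passed to BENCHMARK()
ROWS = 35_432            # records in city_lookup

pow_seconds = 3.354
mbr_seconds = 5.460 - 0.967  # MBRContains() time with the POINT() overhead removed


def per_table_ms(total_seconds):
    """Estimated milliseconds to evaluate the expression once for every row."""
    return total_seconds / ITERATIONS * ROWS * 1000


print(round(per_table_ms(pow_seconds), 1))
print(round(per_table_ms(mbr_seconds), 1))
print(round(pow_seconds / mbr_seconds, 2))
```

If that estimate holds, the arithmetic accounts for only a small part of the 0.157 and 0.171 second query times, so most of the time is going somewhere other than the math.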
Indexing?
EXPLAIN shows neither query is using an index. In a working application, both queries would contain variables that would replace the latitude and longitude numbers (35.5 and -97.6). Because the POW() query wraps the lat and lon columns in a calculation inside the WHERE clause, it is not able to use either the lat or lon index.
So far both the POW() and GIS queries are searching through the entire table and taking about the same time. (I saw that coming.)
Next I created an index for the location column and tried the query again.
ALTER TABLE city_lookup ADD SPATIAL INDEX `location` (`location`) ;
SELECT name,lat,lon,AsText(location) FROM city_lookup
WHERE MBRContains(GeomFromText('LineString(-97.7 35.6, -97.5 35.4)'),location) ;
Now the average time for the GIS query is .00162 seconds. Against the 0.171 seconds it took without the index, that’s roughly a hundred times faster!
Conclusion
You should be using GIS functions, but be aware of the limitations.
- MySQL only compares bounding boxes (minimum bounding rectangles). A complex shape will NOT exclude records that are within its bounding box but outside your polygon.
- The MBRContains function is NOT a distance function. If you are starting with a point and a distance, you will need to calculate the difference in latitude and longitude to create the bounding box points. (1 deg of latitude ~= 69 miles and 1 deg of longitude ~= cos(latitude)*69 miles)
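Here is a minimal Python sketch of that last point: turn a center point and a distance in miles into the two corner points for MBRContains(), then weed out the corners of the square with an exact distance check. It uses the 69 miles per degree approximation. The function names and the 3959 mile Earth radius in the haversine step are my own assumptions, and the sketch ignores the poles and the 180 degree meridian.

```python
import math

MILES_PER_DEGREE = 69.0      # approximation for one degree of latitude
EARTH_RADIUS_MILES = 3959.0  # mean Earth radius, assumed for the haversine step


def bounding_box(lat, lon, miles):
    """Return (min_lon, min_lat, max_lon, max_lat) around a center point."""
    d_lat = miles / MILES_PER_DEGREE
    d_lon = miles / (MILES_PER_DEGREE * math.cos(math.radians(lat)))
    return (lon - d_lon, lat - d_lat, lon + d_lon, lat + d_lat)


def box_to_wkt(box):
    """Format the box as the two-point LineString used with MBRContains()."""
    min_lon, min_lat, max_lon, max_lat = box
    return "LineString(%.4f %.4f, %.4f %.4f)" % (min_lon, max_lat, max_lon, min_lat)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# A 10 mile search around the center point used in this post.
box = bounding_box(35.5, -97.6, 10)
print(box_to_wkt(box))

# Exact distance from the center to Oklahoma City, for the second-pass filter.
distance = haversine_miles(35.5, -97.6, 35.4676, -97.5164)
print(round(distance, 1), distance <= 10)
```

The idea is to let the spatial index do the cheap bounding-box search first, then run the more expensive distance formula only on the handful of rows that come back.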
For my next post:
References:
This is a really great talk on geo searches with MySQL by Alexander Rubin. http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Wikipedia articles on latitude http://en.wikipedia.org/wiki/Latitude, longitude http://en.wikipedia.org/wiki/Longitude, and geographical distance http://en.wikipedia.org/wiki/Geographical_distance