MySQL Geo Distance Code and Samples

Over the years, I’ve implemented and re-implemented the haversine formula whenever I’ve needed to compute great-circle distances, which is pretty common when you’re doing geospatial proximity searches, e.g., “What’s near me?” or “How far are these two locations from each other?”

For an implementation of the formula in MySQL, Alexander Rubin’s presentation is quite useful, and is what I’ve based the following code on this time around.

Hopefully, rather than re-implementing this yet again the next time I need it, I’ll find my own blog entry on it and just reuse this code. I keep forgetting where I’ve stashed the most recent copy, which is why I keep ending up re-implementing this every time I’ve needed it.

Here’s the code, tested on MySQL 5.1.41:

DELIMITER $$

DROP FUNCTION IF EXISTS geodist $$
CREATE FUNCTION geodist (
  src_lat DECIMAL(9,6), src_lon DECIMAL(9,6),
  dst_lat DECIMAL(9,6), dst_lon DECIMAL(9,6)
) RETURNS DECIMAL(6,2) DETERMINISTIC
BEGIN
  SET @dist := 3956 * 2 * ASIN(SQRT(
      POWER(SIN((src_lat - ABS(dst_lat)) * PI()/180 / 2), 2) +
      COS(src_lat * PI()/180) *
      COS(ABS(dst_lat) * PI()/180) *
      POWER(SIN((src_lon - dst_lon) * PI()/180 / 2), 2)
    ));
  RETURN @dist;
END $$

DROP FUNCTION IF EXISTS geodist_pt $$
CREATE FUNCTION geodist_pt (src POINT, dst POINT) 
RETURNS DECIMAL(6,2) DETERMINISTIC
BEGIN
  RETURN geodist(Y(src), X(src), Y(dst), X(dst));
END $$

DROP PROCEDURE IF EXISTS geobox $$
CREATE PROCEDURE geobox (
  IN src_lat DECIMAL(9,6), IN src_lon DECIMAL(9,6), IN dist DECIMAL(6,2),
  OUT lat_top DECIMAL(9,6), OUT lon_lft DECIMAL(9,6),
  OUT lat_bot DECIMAL(9,6), OUT lon_rgt DECIMAL(9,6)
) DETERMINISTIC
BEGIN
  /*
   * src_lat, src_lon -> center point of bounding box
   * dist -> distance from center in miles
   * lat_top, lon_lft -> top left corner of bounding box
   * lat_bot, lon_rgt -> bottom right corner of bounding box
   */
  SET lat_top := src_lat - (dist / 69);
  SET lon_lft := src_lon - (dist / ABS(COS(RADIANS(src_lat)) * 69));
  SET lat_bot := src_lat + (dist / 69);
  SET lon_rgt := src_lon + (dist / ABS(COS(RADIANS(src_lat)) * 69));
END $$

DROP PROCEDURE IF EXISTS geobox_pt $$
CREATE PROCEDURE geobox_pt (
  IN pt POINT, IN dist DECIMAL(6,2),
  OUT top_lft POINT, OUT bot_rgt POINT
) DETERMINISTIC
BEGIN
  /*
   * pt -> center point of bounding box
   * dist -> distance from center in miles
   * top_lft -> top left corner of bounding box
   * bot_rgt -> bottom right corner of bounding box
   */
  CALL geobox(Y(pt), X(pt), dist, @lat_top, @lon_lft, @lat_bot, @lon_rgt);
  SET top_lft := POINT(@lon_lft, @lat_top);
  SET bot_rgt := POINT(@lon_rgt, @lat_bot);
END $$

DROP PROCEDURE IF EXISTS geobox_pt_bbox $$
CREATE PROCEDURE geobox_pt_bbox (
  IN pt POINT, IN dist DECIMAL(6,2), OUT bbox POLYGON
) DETERMINISTIC
BEGIN
  /*
   * pt -> center point of bounding box
   * dist -> distance from center in miles
   * bbox -> bounding box, dist miles from pt
   */
  CALL geobox_pt(pt, dist, @top_lft, @bot_rgt);
  SET bbox := Envelope(LineString(@top_lft, @bot_rgt));
END $$

DELIMITER ;

And, here’s examples on how to use the above function and procedures:

SELECT @src := pt FROM exp_geo
WHERE postal_code = '07405' AND city = 'Butler';

CALL GEOBOX_PT(@src, 10.0, @top_lft, @bot_rgt);
SELECT Y(@top_lft) AS lat_top, X(@top_lft) AS lon_lft,
       Y(@bot_rgt) AS lat_bot, X(@bot_rgt) AS lon_rgt;

CALL GEOBOX_PT_BBOX(@src, 10.0, @bbox);
SELECT AsWKT(@bbox);

-- Assuming INDEX on (lat, lon) and SPATIAL INDEX on (pt) ...

-- Fast:
SELECT g.postal_code, g.city, g.state,
       GEODIST(Y(@src), X(@src), lat, lon) AS dist
FROM exp_geo g
WHERE lat BETWEEN Y(@top_lft) AND Y(@bot_rgt)
AND lon BETWEEN X(@top_lft) AND X(@bot_rgt)
HAVING dist < 5.0
ORDER BY dist;

-- Also fast:
SELECT g.postal_code, g.city, g.state, GEODIST_PT(@src, g.pt) AS dist
FROM exp_geo g
WHERE lat BETWEEN Y(@top_lft) AND Y(@bot_rgt)
AND lon BETWEEN X(@top_lft) AND X(@bot_rgt)
HAVING dist < 5.0
ORDER BY dist;

-- Slow:
SELECT g.postal_code, g.city, g.state, GEODIST_PT(@src, g.pt) AS dist
FROM exp_geo g
WHERE Y(pt) BETWEEN Y(@top_lft) AND Y(@bot_rgt)
AND X(pt) BETWEEN X(@top_lft) AND X(@bot_rgt)
HAVING dist < 5.0
ORDER BY dist;

-- Also slow:
SELECT g.postal_code, g.city, g.state, GEODIST_PT(@src, g.pt) AS dist
FROM exp_geo g
WHERE MBRContains(@bbox, pt) = 1
HAVING dist < 5.0
ORDER BY dist;

Normally, I try to explain the stuff I post, but this time I’m doing it for somewhat selfish reasons — so I don’t lose this work yet again. I’m just throwing this up here so Google will index it, so I can find it the next time I need it.

I suppose if you read this and have questions, please ask them in the comments below. I’ll do my best to answer, if I can.

Comments

  1. Thank you for adding this.
    Perhaps a table structure sample here will also be a useful reference.

    CREATE TABLE `zipcode_spatial` (
    `id` int(10) unsigned NOT NULL
    AUTO_INCREMENT,
    `zipcode` char(7) NOT NULL, …
    `lon` int(11) DEFAULT NULL,
    `lat` int(11) DEFAULT NULL,
    `loc` point NOT NULL,
    PRIMARY KEY (`id`),
    KEY `zipcode` (`zipcode`),
    SPATIAL KEY `loc` (`loc`)
    ) ENGINE=MyISAM;

Speak Your Mind

*