OPTICS Clustering Explanation

Course Curriculum

OPTICS Clustering Explanation

OPTICS Clustering Explanation

PTICS Clustering stands for Ordering Points To Identify Cluster Structure. It draws inspiration from the DBSCAN clustering algorithm. It adds two more terms to the concepts of DBSCAN clustering. They are:-

  1. Core Distance: It is the minimum value of radius required to classify a given point as a core point. If the given point is not a Core point, then it’s Core Distance is undefined.

  1. Reachability Distance: It is defined with respect to another data point q(Let). The Reachability distance between a point p and q is the maximum of the Core Distance of p and the Euclidean Distance(or some other distance metric) between p and q. Note that The Reachability Distance is not defined if q is not a Core point.

This clustering technique is different from other clustering techniques in the sense that this technique does not explicitly segment the data into clusters. Instead, it produces a visualization of Reachability distances and uses this visualization to cluster the data.

Pseudocode:

The following Pseudocode has been referred from the Wikipedia page of the algorithm.

OPTICS(DB, eps, MinPts)

#Repeating the process for all points in the database
for each point pt of DB

#Initializing the reachability distance of the selected point
pt.reachable_dist = UNDEFINED
for each unprocessed point pt of DB

#Getting the neighbours of the selected point
#according to the definitions of epsilon and
#minPts in DBSCAN
Nbrs = getNbrs(pt, eps)

mark pt as processed
output pt to the ordered list

#Checking if the selected point is not noise
if (core_dist(pt, eps, Minpts) != UNDEFINED)

#Initializing a priority queue to get the closest data point
#in terms of Reachability distance
Seeds = empty priority queue

#Calling the update function
update(Nbrs, pt, Seeds, eps, Minpts)

#Repeating the process for the next closest point
for each next q in Seeds
Nbrs' = getNbrs(q, eps)
mark q as processed
output q to the ordered list
if (core_dist(q, eps, Minpts) != UNDEFINED)
update(Nbrs', q, Seeds, eps, Minpts)
The pseudo-code for the update function is given below:

update(Nbrs, pt, Seeds, eps, MinPts)

#Calculating the core distance for the given point
coredist = core_dist(pt, eps, MinPts)

#Updating the Reachability distance for each neighbour of p
for each obj in Nbrs
if (obj is not processed)
new_reach_distance = max(coredist, dist(pt, obj))

#Checking if the neighbour point is in seeds
if (obj.reachable_dist == UNDEFINED)

#Updation step
obj.reachabled_dist = new_reach_distance
Seeds.insert(obj, new_reach_distance)
else
if (new_reach_distance < obj.reachable_dist)

#Updation step
o.reachable_dist = new_reach_distance
Seeds.move-up(obj, new_reach_distance)
OPTICS Clustering v/s DBSCAN Clustering:

Memory Cost : The OPTICS clustering technique requires more memory as it maintains a priority queue (Min Heap) to determine the next data point which is closest to the point currently being processed in terms of Reachability Distance. It also requires more computational power because the nearest neighbour queries are more complicated than radius queries in DBSCAN.
Fewer Parameters : The OPTICS clustering technique does not need to maintain the epsilon parameter and is only given in the above pseudo-code to reduce the time taken. This leads to the reduction of the analytical process of parameter tuning.
This technique does not segregate the given data into clusters. It merely produces a Reachability distance plot and it is upon the interpretation of the programmer to cluster the points accordingly.

Spectral Clustering (Prev Lesson)
(Next Lesson) OPTICS Clustering Implementing using Sklearn