Add DBSCAN clustering algorithm in machine_learning/ #14851

Open
PRERITARYA wants to merge 3 commits into TheAlgorithms:master from PRERITARYA:add/dbscan-clustering

PRERITARYA commented Jun 23, 2026


Describe your change:

Add DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
clustering algorithm implemented from scratch without any external ML libraries.

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.

Fixes #10683

Copilot AI review requested due to automatic review settings June 23, 2026 05:36
algorithms-keeper (bot) added the "awaiting reviews" label (This PR is ready to be reviewed) on Jun 23, 2026

Copilot AI (Contributor) left a comment


Pull request overview

This pull request adds a new machine_learning/dbscan.py module implementing the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm from scratch, providing a density-based alternative to existing clustering implementations in the repository.

Changes:

  • Adds a from-scratch DBSCAN implementation (dbscan) with input validation and -1 noise labeling.
  • Adds supporting helpers (euclidean_distance, get_neighbors) used by the clustering routine.
  • Includes doctests and reference links (Wikipedia + original paper) and a __main__ doctest runner.
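Only the get_neighbors helper and a docstring excerpt are quoted in this thread, so the dbscan() body itself is not visible here. For reference, here is a minimal from-scratch sketch of the standard algorithm. It assumes a dbscan(data, epsilon, min_points) signature, cluster ids counted from 0, and an internal UNDEFINED sentinel; these are assumptions for illustration, not the PR's actual code.

```python
import math

NOISE = -1
UNDEFINED = -2  # assumption of this sketch: marks points not visited yet


def euclidean_distance(point_a: list[float], point_b: list[float]) -> float:
    """Straight-line distance between two points with the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))


def get_neighbors(
    data: list[list[float]], point_index: int, epsilon: float
) -> list[int]:
    """Indices of all points within epsilon of data[point_index], itself included."""
    return [
        index
        for index, point in enumerate(data)
        if euclidean_distance(data[point_index], point) <= epsilon
    ]


def dbscan(data: list[list[float]], epsilon: float, min_points: int) -> list[int]:
    """Return a cluster id (0, 1, 2, ...) for each point, or -1 for noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if min_points < 1:
        raise ValueError("min_points must be at least 1")

    labels = [UNDEFINED] * len(data)
    cluster_id = -1
    for index in range(len(data)):
        if labels[index] != UNDEFINED:
            continue  # already noise or already in a cluster
        neighbors = get_neighbors(data, index, epsilon)
        if len(neighbors) < min_points:
            labels[index] = NOISE  # may be upgraded to a border point later
            continue
        cluster_id += 1  # index is a core point: start a new cluster
        labels[index] = cluster_id
        queue = [n for n in neighbors if n != index]
        while queue:
            current = queue.pop()
            if labels[current] == NOISE:
                labels[current] = cluster_id  # noise reachable from a core: border
                continue  # it was not core when first visited, so do not expand
            if labels[current] != UNDEFINED:
                continue  # already claimed by this cluster or an earlier one
            labels[current] = cluster_id
            current_neighbors = get_neighbors(data, current, epsilon)
            if len(current_neighbors) >= min_points:
                queue.extend(current_neighbors)  # current is core: keep growing
    return labels


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0],
          [11.0, 10.0], [50.0, 50.0]]
print(dbscan(points, epsilon=1.5, min_points=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Two well-separated groups become clusters 0 and 1, and the isolated point is labeled -1. Points first marked as noise can still join a cluster later as border points, which is why the expansion loop checks for NOISE before skipping already-labeled points.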


Comment on lines +16 to +20
Point Types:
- Core point: Has at least `min_points` neighbors within `epsilon` distance
- Border point: Within `epsilon` of a core point, but has fewer than
`min_points` neighbors
- Noise point: Neither core nor border — labeled as -1
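The three point types above can be checked with a small standalone sketch. classify_points is a hypothetical helper that is not part of the PR. It assumes, as the get_neighbors doctests suggest, that a point counts as its own neighbor, and it uses math.dist from the standard library instead of the PR's euclidean_distance.

```python
import math


def classify_points(
    data: list[list[float]], epsilon: float, min_points: int
) -> list[str]:
    """Label each point "core", "border", or "noise" (hypothetical helper)."""
    # Neighbor lists include the point itself (assumption mirroring get_neighbors).
    neighbors = [
        [j for j, other in enumerate(data) if math.dist(point, other) <= epsilon]
        for point in data
    ]
    is_core = [len(found) >= min_points for found in neighbors]

    kinds = []
    for i, found in enumerate(neighbors):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in found):
            kinds.append("border")  # not core itself, but within epsilon of a core
        else:
            kinds.append("noise")  # DBSCAN would label this point -1
    return kinds


line = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
print(classify_points(line, epsilon=1.0, min_points=3))
# ['border', 'core', 'core', 'border', 'noise']
```

Because the point counts itself, min_points = 3 here means a core point needs two other points within epsilon.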
Comment on lines +52 to +70
def get_neighbors(
    data: list[list[float]], point_index: int, epsilon: float
) -> list[int]:
    """
    Return indices of all points within epsilon distance of data[point_index].

    >>> data = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]]
    >>> get_neighbors(data, 0, 0.5)
    [0, 1]
    >>> get_neighbors(data, 2, 0.5)
    [2]
    >>> get_neighbors(data, 0, 10.0)
    [0, 1, 2]
    """
    return [
        index
        for index, point in enumerate(data)
        if euclidean_distance(data[point_index], point) <= epsilon
    ]
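get_neighbors relies on the euclidean_distance helper listed in the overview, but its body is not quoted in this thread. A minimal sketch, assuming it takes two equal-length lists of floats and returns a float; the dimension check is an assumption of this sketch, not confirmed PR behavior.

```python
import math


def euclidean_distance(point_a: list[float], point_b: list[float]) -> float:
    """
    Straight-line distance between two points (sketch, not the PR's code).

    >>> euclidean_distance([0.0, 0.0], [3.0, 4.0])
    5.0
    """
    if len(point_a) != len(point_b):
        # Assumption: reject mismatched dimensions instead of silently truncating.
        raise ValueError("Points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))
```

The standard library's math.dist (Python 3.8+) computes the same value and also raises ValueError on mismatched dimensions. Either way, get_neighbors scans every point, so each call is O(n) and a brute-force clustering run performs O(n²) distance computations.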

zain-cs (Contributor) left a comment


Good implementation overall! The docstrings, type hints, and doctests are comprehensive and well-written.

One observation in dbscan(): after reassigning a noise point to the current cluster here:

if labels[current_point] == -1:
    labels[current_point] = current_cluster_id

the already_in_another check immediately below will never trigger for that point since it was just assigned. The guard is only meaningful for points that were previously assigned to another cluster — consider reordering the logic to make the intent clearer.

Otherwise looks solid! 👍

algorithms-keeper (bot) added and then removed the "tests are failing" label (Do not merge until tests pass) on Jun 23, 2026
PRERITARYA (Author) commented

Thank you for the review @zain-cs! You're right: the already_in_another check was redundant after the noise reassignment. I've reordered the logic so the guard runs before the assignment, making the intent clearer. Please take another look when you get a chance!
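The revised dbscan() body is not shown in the thread. The reordering described above (guard before assignment) can be sketched as follows, using a hypothetical expand_cluster helper, a neighbors_of callable, and an UNVISITED sentinel; none of these are confirmed to exist in the PR.

```python
from collections.abc import Callable

NOISE = -1
UNVISITED = -2  # hypothetical sentinel for points not yet examined


def expand_cluster(
    labels: list[int],
    neighbors_of: Callable[[int], list[int]],
    seed_neighbors: list[int],
    current_cluster_id: int,
    min_points: int,
) -> None:
    """Grow cluster current_cluster_id outward from a core point's neighbors."""
    queue = list(seed_neighbors)
    while queue:
        current_point = queue.pop()
        # Guard first: a point already claimed by this or another cluster
        # is skipped, so it is never reassigned.
        if labels[current_point] not in (NOISE, UNVISITED):
            continue
        was_noise = labels[current_point] == NOISE
        labels[current_point] = current_cluster_id
        if was_noise:
            continue  # former noise point becomes a border point; never expanded
        neighbors = neighbors_of(current_point)
        if len(neighbors) >= min_points:
            queue.extend(neighbors)  # current_point is core: keep expanding


neighbor_lists = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
labels = [NOISE, 0, UNVISITED, 7]  # point 1 is core; point 3 already in cluster 7
expand_cluster(labels, lambda i: neighbor_lists[i], [0, 2], 0, 3)
print(labels)  # [0, 0, 0, 7]
```

With the guard first, point 3 keeps its existing cluster 7, while the former noise point 0 is absorbed as a border point of cluster 0.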

zain-cs (Contributor) left a comment


Thanks for addressing the feedback @PRERITARYA! The updated logic is
much cleaner — running the cluster membership guard before the
assignment makes the intent immediately clear.

I also noticed the ruff check is flagging W292 (no newline at end of
file) — a quick fix would be adding a trailing newline to get the
checks passing. 👍


Labels

awaiting reviews (This PR is ready to be reviewed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Implement DBSCAN Algorithm

3 participants