Automatic matching is difficult, but I've made a number of changes to improve matching in the latest release of Jaikoz.
Jaikoz searches Musicbrainz for possible matches and then rescores them, taking additional information into account to find the best match. It does this because the original Musicbrainz score only reflects the search terms, whereas we need to consider more values. For example, we do not include a duration in the search, because some songs do not have a duration in Musicbrainz and so would never be returned. Once we have some potential results, however, we want to give a higher score to those whose duration matches the original song. Musicbrainz uses Lucene for searching, with its own custom analyzer deciding which songs a search returns, and this latest release of Jaikoz uses exactly the same analyzer to ensure scoring is compatible. This is one advantage of working on both Musicbrainz and Jaikoz!
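As a rough illustration of the rescoring idea, here is a minimal Python sketch that adjusts a search score using duration. The Candidate type, the 3-second tolerance and the 20-point weight are hypothetical values chosen for this example, not Jaikoz's actual scoring. Candidates with no duration in the database keep their search score, so they are neither rewarded nor excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    title: str
    search_score: float          # score returned by the text search
    duration_ms: Optional[int]   # None when the database has no duration


def rescore(candidate: Candidate, file_duration_ms: int,
            tolerance_ms: int = 3000, duration_weight: float = 20.0) -> float:
    """Combine the search score with a (hypothetical) duration bonus or penalty."""
    score = candidate.search_score
    if candidate.duration_ms is None:
        # No duration stored: neither reward nor penalise the candidate.
        return score
    diff = abs(candidate.duration_ms - file_duration_ms)
    if diff <= tolerance_ms:
        # Full bonus for an exact match, falling linearly to 0 at the tolerance edge.
        score += duration_weight * (1 - diff / tolerance_ms)
    else:
        # Duration clearly disagrees with the file, so push the candidate down.
        score -= duration_weight
    return score


def best_match(candidates, file_duration_ms):
    """Return the candidate with the highest rescored value."""
    return max(candidates, key=lambda c: rescore(c, file_duration_ms))
```

With a file 249 seconds long, a candidate at 250 seconds beats one with a slightly higher search score but a 180-second duration, and a candidate with no stored duration can still be chosen if nothing better turns up.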
When searching for a track we now consider more variations of the name, because song titles in Musicbrainz are normalized. For example, We Have Explosive (Pt. 5) should be entered into Musicbrainz as We Have Explosive, Part 5, but it may not have been. This normalization is detailed in the Style Guidelines, and Jaikoz now searches for the title both as it appears in your metadata and, as far as possible, in its normalized form.
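A minimal sketch of generating a normalized title variant, assuming a single hypothetical rule that rewrites a trailing "(Pt. N)" or "(Part N)" as ", Part N". The real Style Guidelines cover many more cases; this only shows the idea of searching on both the original and the normalized form.

```python
import re

# Hypothetical rule: a trailing "(Pt. 5)" or "(Part 5)" becomes ", Part 5".
_PART_RE = re.compile(r"\s*\((?:Pt\.?|Part)\s*(\d+)\)\s*$", re.IGNORECASE)


def title_variants(title: str) -> list[str]:
    """Return the title as given plus its normalized form, without duplicates."""
    variants = [title]
    normalized = _PART_RE.sub(r", Part \1", title)
    if normalized not in variants:
        variants.append(normalized)
    return variants
```

Each variant would then be used as a separate search term, so a track is found whether or not its Musicbrainz entry was normalized.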
We also work around common data-entry errors. For example, Musicbrainz Issue #5538 shows that users usually enter song titles as 'No. 1', but in a large minority of cases enter 'No.1'; Jaikoz works around this issue.
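The 'No. 1' versus 'No.1' workaround can be sketched the same way: produce both spellings and search on each. The function name and regular expression below are assumptions for illustration only.

```python
import re

# Matches "No." followed by optional whitespace and a number, e.g. "No. 1" or "No.1".
_NO_RE = re.compile(r"\bNo\.\s*(\d+)")


def number_variants(title: str) -> set[str]:
    """Return the title plus its spaced ("No. 1") and unspaced ("No.1") spellings."""
    spaced = _NO_RE.sub(r"No. \1", title)
    unspaced = _NO_RE.sub(r"No.\1", title)
    return {title, spaced, unspaced}
```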
Cluster Albums finds albums with the same name and artist but different Release Ids, and tries to move the songs so that they are all on the same Release Id. Note that this is different from what 'Cluster' means in Musicbrainz Picard, and perhaps I should have called it something else. Previously it compared each song title with the track titles of every Release Id in use and picked the Release Id with the most matches, but this has now been improved in two ways. Firstly, we use fuzzy matching on the title, allowing for normalization as explained earlier. Secondly, if all but a couple of tracks are successfully matched to one Release Id, we allow matches on Acoustic Id and song length to shoehorn the remaining tracks into that potential release. This is really useful when the same song exists on two albums but has been radically renamed between them.
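The two-stage Cluster Albums approach might look roughly like the following Python sketch. Everything here is hypothetical: the Track type, the 0.85 similarity threshold (difflib's SequenceMatcher stands in for whatever fuzzy matcher Jaikoz actually uses), the limit of two leftover tracks, and the assumption that a fallback match requires both the Acoustic Id and the length (within 3 seconds) to agree. Title normalization is omitted for brevity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Track:
    title: str
    duration_ms: int
    acoustic_id: Optional[str] = None


def fuzzy_equal(a: str, b: str, threshold: float = 0.85) -> bool:
    """Case-insensitive fuzzy title comparison (hypothetical threshold)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold


def match_release(songs, release_tracks, max_leftover=2, tolerance_ms=3000):
    """Return (matched pairs, unmatched songs) for one candidate release."""
    remaining = list(release_tracks)
    matched, unmatched = [], []
    # First pass: fuzzy title matching.
    for song in songs:
        hit = next((t for t in remaining if fuzzy_equal(song.title, t.title)), None)
        if hit is None:
            unmatched.append(song)
        else:
            remaining.remove(hit)
            matched.append((song, hit))
    # Second pass, only when nearly every song matched by title:
    # accept tracks whose Acoustic Id agrees and whose length is close.
    if 0 < len(unmatched) <= max_leftover:
        still_unmatched = []
        for song in unmatched:
            hit = next((t for t in remaining
                        if song.acoustic_id is not None
                        and song.acoustic_id == t.acoustic_id
                        and abs(song.duration_ms - t.duration_ms) <= tolerance_ms),
                       None)
            if hit is None:
                still_unmatched.append(song)
            else:
                remaining.remove(hit)
                matched.append((song, hit))
        unmatched = still_unmatched
    return matched, unmatched


def choose_release(songs, releases):
    """Pick the Release Id (dict key) whose tracks match the most songs."""
    return max(releases, key=lambda rid: len(match_release(songs, releases[rid])[0]))
```

In this sketch a track renamed between two albums is still pulled onto the release that matched the other tracks by title, provided its Acoustic Id agrees and its length is within the tolerance.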