
Splunk String Likeness Algorithms

Splunk apps can greatly extend what you can do with SPL searches. Using a Splunk app called Jellyfisher, we can take advantage of several string likeness algorithms to help us identify data that is similar.

Jellyfisher App

Jellyfisher is a Splunk app built on the Jellyfish Python library. It can be installed on modern versions of Splunk Enterprise as well as Splunk Cloud. Installation works the usual way: either browse or search for the app in the Splunk UI, or download it from Splunkbase and upload it to your Splunk instance.

Jellyfish Algorithms

The Jellyfish library (and therefore the Jellyfisher Splunk app) includes several different comparison algorithms to choose from.[1]

  • String comparison algorithms:

    • Levenshtein Distance
    • Damerau-Levenshtein Distance
    • Jaro Distance
    • Jaro-Winkler Distance
    • Match Rating Approach Comparison
    • Hamming Distance
  • Phonetic encoding algorithms:

    • American Soundex
    • Metaphone
    • NYSIIS (New York State Identification and Intelligence System)
    • Match Rating Codex
  • Example:

    • The Levenshtein distance between "kitten" and "sitting" is 3.
    • The Soundex representation of both "Robert" and "Rupert" is "R163".
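To make the two worked examples above concrete, here is a minimal pure-Python sketch of Levenshtein distance and American Soundex. It is written from the textbook definitions for illustration. It is not the Jellyfish library's code, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


# American Soundex digit groups; vowels, y, h, and w get no digit
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def american_soundex(word: str) -> str:
    """First letter plus three digits, padded with zeros (e.g. 'R163')."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last_digit = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        # Skip letters that repeat the previous digit
        if digit and digit != last_digit:
            code += digit
            if len(code) == 4:
                break
        # h and w do not separate repeated digits; vowels do
        if c not in "hw":
            last_digit = digit
    return code.ljust(4, "0")


print(levenshtein("kitten", "sitting"))  # 3
print(american_soundex("Robert"), american_soundex("Rupert"))  # R163 R163
```

Turning "kitten" into "sitting" takes two substitutions (k→s, e→i) and one insertion (g), hence a distance of 3. Soundex throws away vowels and keeps only consonant sound groups, which is why "Robert" and "Rupert" collapse to the same code.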

Personally, I prefer the Jaro-Winkler algorithm when comparing strings in Splunk.

Usage in Splunk

Once the app is installed, a simple Jellyfisher search looks like this:

| makeresults
| eval domain1 = "mktbs.net"
| eval domain2 = "mkts.net"
| eval domain3 = "gmail.com"
| jellyfisher jaro_winkler(domain1,domain2)
| rename jaro_winkler AS jaro_winkler_1_and_2
| jellyfisher jaro_winkler(domain1,domain3)
| rename jaro_winkler AS jaro_winkler_1_and_3

This search will result in the following table.

domain1     domain2    domain3    jaro_winkler_1_and_2   jaro_winkler_1_and_3
mktbs.net   mkts.net   gmail.com  0.9740740740740741     0.43703703703703706

In SPL, the eval command creates a field, so we use it to set domain1, domain2, and domain3 to our example domains. Because each jellyfisher call writes its score to a field named jaro_winkler, we rename that field after each call so the second comparison does not overwrite the first. Running jellyfisher against domain1 and domain2 (mktbs.net and mkts.net), which differ by a single character, returns a score of 0.9740.... With Jaro-Winkler, the closer the score is to 1, the more similar the two strings are.

In the second comparison, we compare domain1 and domain3 (mktbs.net and gmail.com). These strings are not very similar, and the resulting score of 0.4370... reflects that.
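To show where a score like 0.9740... comes from, the sketch below implements the textbook Jaro-Winkler formula in plain Python. It assumes the commonly used defaults: a prefix scale of 0.1, at most four shared prefix characters, and a prefix boost applied only when the Jaro score exceeds 0.7. Whether Jellyfisher uses exactly these settings is an assumption, and the function names are hypothetical.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 when no characters match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are no farther apart than this
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each transposition is seen twice
    in_order1 = [c for c, m in zip(s1, matched1) if m]
    in_order2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(in_order1, in_order2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1,
                            max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro similarity, boosted for strings that share a common prefix."""
    jaro = jaro_similarity(s1, s2)
    if jaro <= boost_threshold:
        return jaro
    # Length of the shared prefix, capped at max_prefix characters
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(jaro_winkler_similarity("mktbs.net", "mkts.net"))   # about 0.974
print(jaro_winkler_similarity("mktbs.net", "gmail.com"))  # much lower
```

For mktbs.net and mkts.net, eight characters match with no transpositions. That gives a Jaro score of (8/9 + 8/8 + 8/8) / 3 ≈ 0.963. The shared three-character prefix mkt then lifts it to about 0.974, in line with the first comparison in the table. The prefix boost is why Jaro-Winkler suits lookalike domains: most typos and deliberate lookalikes keep the start of the string intact.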

Use Cases

Jellyfisher algorithms are great when we need to determine how similar two strings are, especially when we are looking for slight changes that are difficult to notice with the human eye.

  • Account takeover (ATO) detection
    • Specifically, where an attacker changes the account's email address to one very similar to the existing email address
  • Phishing detection
    • Jellyfisher algorithms can be used to compare senders/email subjects/email body content to detect variations on known phishing words or phrases.
  • Password disclosure detection
    • If you ingest SSO logs from a provider such as Okta, Microsoft Entra ID, or Ping Identity, you may find that users sometimes type their password into the username field. On its own this is just a failed login, but your SSO provider likely logs an event such as No User Found that includes the submitted "username", which in this case is the user's password. If the user is remote, for example working from home, their source IP address is usually shared with few other users. You can therefore very likely tie that failed event to the user's next successful sign-in from the same address and identify who entered their password in the username field. In this case, you will want to ensure that the user resets their password.
    • Jellyfisher can help here by telling us when the value entered in the username field was probably not a password. Filtering out those cases keeps false positives from inundating us with alerts.
    • We do this by comparing the value that was entered to the expected data. For example, if your email domain is mycompany.com, it is not uncommon to see users make minor errors such as entering mycompany,com or mycompany.con. Jellyfisher helps us determine whether the incorrect username was very similar to the user's actual username, and therefore likely not a password.
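The typo-versus-password check described above can be sketched in a few lines of Python. This version uses the standard library's difflib.SequenceMatcher ratio in place of Jaro-Winkler so it runs without extra packages. The function name, the example usernames, and the 0.85 threshold are hypothetical and would need tuning against your own logs.

```python
from difflib import SequenceMatcher


def likely_typo(entered: str, known_usernames: list[str], threshold: float = 0.85) -> bool:
    """Return True if `entered` closely resembles a known username (probably a typo).

    Values that resemble no known username are candidates for a password
    typed into the username field and deserve a closer look.
    """
    entered = entered.strip().lower()
    # Best similarity score (0.0 to 1.0) against any known username
    best = max(
        (SequenceMatcher(None, entered, known.lower()).ratio() for known in known_usernames),
        default=0.0,
    )
    return best >= threshold


# Hypothetical directory of valid usernames
usernames = ["jdoe@mycompany.com", "asmith@mycompany.com"]

for attempt in ["jdoe@mycompany,com", "jdoe@mycompany.con", "Tr0ub4dor&3"]:
    verdict = "likely typo" if likely_typo(attempt, usernames) else "possible password, investigate"
    print(f"{attempt!r}: {verdict}")
```

Both domain typos score about 0.94 against jdoe@mycompany.com. The password-like value shares almost nothing with any username, so only it would raise an alert.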

  1. Most of this language comes from the Splunkbase page for Jellyfisher. Take a look at the docs there for more details on the specifics of the Jellyfisher supported algorithms.