Offensive Words 🤬

Danny Roden (Administrator)
edited December 2023 in Functions library

Summary

The functions contained within this package all relate to flagging and dealing with data containing offensive language. All of these functions use a domain of offensive words (contained within the .dmxd package), which holds a list of known offensive terms.

Note: this data is open source and originates from: https://code.google.com/archive/p/badwordslist/downloads.

Caution: due to the nature of this content, please be aware this dataset (and preview results shown in the screenshots below) does contain some terms which may cause offense.

Contains offensive word(s)

This function uses the reference data to identify records that contain these offensive terms so as to support data discovery activities.

The function can be applied to any column and simply checks for the presence of values found within the reference dataset. A match is recorded where a reference value appears as a whole word (i.e. separated by spaces) or between certain special characters (@.-), which allows swear words to be detected within an email address (or free-text field).
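To illustrate the matching behaviour described above outside of Data Studio, here is a minimal Python sketch. It is not the function's actual definition: the name contains_offensive_word, the placeholder terms and the case-insensitive comparison are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the reference domain shipped in the .dmxd package.
OFFENSIVE_TERMS = {"darn", "heck"}

# Split on whitespace and on the special characters mentioned above (@ . -).
_SEPARATORS = re.compile(r"[\s@.\-]+")


def contains_offensive_word(value: str) -> bool:
    """Return True if any separator-delimited token is in the reference set."""
    # Assumption: matching is case-insensitive, so the input is lower-cased.
    tokens = _SEPARATORS.split(value.lower())
    return any(token in OFFENSIVE_TERMS for token in tokens)
```

Because whole tokens are compared, a value such as "checkout" is not flagged even though it contains the letters of a listed term, while "heck.smith@example.com" is.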

See below for a preview of the rule definition and some sample output results:


Does not contain offensive word(s)

This function is the reverse of the above and checks that a given input does not contain anything known to be offensive. It is recommended that this function is used as a rule: it outputs 'true' when no match is found, and 'false' when a match is found against the reference data.
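As a rough sketch of how this rule behaves (again in Python rather than in Data Studio), the check below returns True for a clean value and False when a term is found. The function name, the placeholder terms and the case-insensitive comparison are assumptions for illustration only.

```python
import re

# Hypothetical placeholder terms standing in for the reference dataset.
OFFENSIVE_TERMS = {"darn", "heck"}


def does_not_contain_offensive_word(value: str) -> bool:
    """Rule-style check: True means the record passes (no match found)."""
    # Tokens are delimited by whitespace or the characters @ . -
    tokens = re.split(r"[\s@.\-]+", value.lower())
    return not any(token in OFFENSIVE_TERMS for token in tokens)


# Apply the rule to a couple of sample records.
records = ["Jane Smith", "oh-darn@example.com"]
results = {record: does_not_contain_offensive_word(record) for record in records}
```

Here results maps "Jane Smith" to True (the record passes) and "oh-darn@example.com" to False (the record fails).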


See below for a preview of the rule definition and some sample output results:


Extract offensive word(s)

Lastly, this function extracts the matched value(s) so as to support analysis of any records where offensive matches are found. You may want to use this function as a mechanism for evaluating the validity of the dataset within this package (see further down for tips on editing it).
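The extraction idea can be sketched in Python as follows. This is an illustration only, not the packaged function: the name extract_offensive_words, the placeholder terms, the case-insensitive comparison and the choice to return a list are all assumptions.

```python
import re
from typing import List

# Hypothetical placeholder terms standing in for the reference dataset.
OFFENSIVE_TERMS = {"darn", "heck"}


def extract_offensive_words(value: str) -> List[str]:
    """Return the matched terms in the order they appear in the value."""
    # Tokens are delimited by whitespace or the characters @ . -
    tokens = re.split(r"[\s@.\-]+", value.lower())
    return [token for token in tokens if token in OFFENSIVE_TERMS]
```

Reviewing the extracted values across a dataset is one way to spot false positives, i.e. terms in the reference data that are flagging legitimate records.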


See below for a preview of the rule definition and some sample output results:


Note: If you decide that you want to tweak the reference data (i.e. you're finding some false positives, or there are terms missing from it), you can do this by downloading the reference data from Data Studio (using the 'download as csv' button) and then editing it accordingly.

Once you've made those changes, upload it into the existing dataset (using the 'Upload new data' button).

After you've done this, the functions will automatically reference the updated data.
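If you would rather script the edit between the download and upload steps than make it by hand, something like the Python sketch below could work. It assumes the downloaded CSV has a header row and holds one term per row in its first column; check this against your own file, as the layout is not confirmed here. The function name edit_terms is hypothetical.

```python
import csv
import io
from typing import Iterable


def edit_terms(csv_text: str, remove: Iterable[str], add: Iterable[str]) -> str:
    """Drop unwanted terms and append missing ones, keeping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    remove_set = {term.lower() for term in remove}
    # Assumption: the term sits in the first column of each row.
    kept = [row for row in body if row and row[0].lower() not in remove_set]
    existing = {row[0].lower() for row in kept}
    for term in add:
        if term.lower() not in existing:
            # Pad any extra columns with empty values to match the header.
            kept.append([term] + [""] * (len(header) - 1))
            existing.add(term.lower())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue()
```

For example, edit_terms(text, remove=["heck"], add=["drat"]) removes a false positive and adds a missing term, and the returned text can be saved and uploaded using the 'Upload new data' button.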



Compatibility:

These functions are compatible with all instances of Data Studio from v2.1.11 onwards.

[dl-button|Download|https://us.v-cdn.net/6031645/uploads/JF32EAPXE715/offensive-terms.dmxd]