-
Unlock the Power of Markdown in Aperture Data Studio
Did you know Aperture Data Studio supports Markdown? This lesser-known feature can transform how you present information, making your dashboards, workflows, and catalog entries more engaging and easier to read. Markdown enables a wide range of formatting options, including: Text formatting (headers, subheaders, bold,…
-
Understanding Cloud Licensing Security Setup
If your account manager or consultant has recommended upgrading your Aperture Data Studio deployment to a cloud license, it’s because Aperture Data Studio v3.0 will only be available through cloud licensing. Cloud licensing brings many benefits (for more details, please refer to this write-up), but might require some…
-
Customizing your Dashboard with the Custom content widget
Data Studio Dashboards can be configured by adding, ordering and resizing different widgets including Views and Charts. One of these widgets is the Custom content widget: https://docs.experianaperture.io/data-quality/hosted-aperture-data-studio/data-studio-objects/dashboards/#custom-content~customizing-dashboards Firstly…
-
Connecting to an Amazon RDS MySQL Database
A user recently asked about connecting Data Studio to their AWS RDS MySQL Database. The first thing to note is that MySQL on RDS uses the MySQL community edition, which is not supported by Data Studio's native JDBC driver. When attempting to connect, you'll receive this error: "Connections to MySQL Community Server are not…
-
Licensing for Aperture Data Studio
This article provides an overview of all things cloud licensing, including prerequisites, an activation and management guide, and links to helpful documentation. Should you wish to switch to cloud licensing, please speak to your account manager or your local support team. Advantages of cloud licensing Faster setup and time to…
-
Using ❄️ Snowflake with Aperture Data Studio
Snowflake is a popular cloud-based data warehouse that is used by many organizations to store, process, and analyze large volumes of diverse data. Connecting Snowflake to Aperture Data Studio Aperture Data Studio can import data and metadata from file stores, CRMs and databases, including Snowflake. To connect to Snowflake…
-
Map list: Transform every item in a list
What it does: Applies a transformation function to each item in the list and returns a new list with the transformed values. Where it works: Comma-separated list JSON list Business Use Case & Example: Adding new prefix for employee ids before loading into a master database: This is perfect for standardizing formats,…
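Map list is a built-in Data Studio function, so there's no public source to quote; as a rough illustration only, here is a minimal Python sketch of the same idea applied to the employee-ID example (the `map_list` helper and the `EMP-` prefix are made up for this sketch):

```python
def map_list(values, fn, sep=","):
    """Apply fn to each item of a comma-separated list and return a new list string."""
    return sep.join(fn(item.strip()) for item in values.split(sep))

# Add a hypothetical prefix to employee IDs before loading into a master database
employee_ids = "1001,1002,1003"
prefixed = map_list(employee_ids, lambda v: "EMP-" + v)
# prefixed == "EMP-1001,EMP-1002,EMP-1003"
```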
-
Map path: Targeted transformation in JSON documents
If you’re working with structured JSON data, Map path is your go-to Higher Order Function for applying transformations to specific fields. What it does: MapPath uses a JSONPath expression to locate fields in a JSON document and applies a function to each matched item. Where it works: JSON list JSON record Business Use…
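Map path locates fields with a JSONPath expression; as a simplified stand-in (a dotted-path lookup rather than full JSONPath, with a made-up `map_path` helper), the idea can be sketched in Python like this:

```python
def map_path(doc, path, fn):
    """Apply fn to the value at a dotted path inside a nested dict.
    A simplified stand-in for a JSONPath expression such as $.customer.email."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:      # walk down to the parent of the target field
        node = node[key]
    node[keys[-1]] = fn(node[keys[-1]])
    return doc

record = {"customer": {"email": "USER@EXAMPLE.COM"}}
map_path(record, "customer.email", str.lower)
# record["customer"]["email"] == "user@example.com"
```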
-
Filter list: Keep only what matters
What it does: Returns a new list containing only the items that meet the condition defined by the operator function. Where it works: Comma-separated list JSON list Business Use Cases & Examples: Comma-separated list: Filtering emails with a valid format from a list JSON list: Extract customers that match a certain interest…
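As a rough Python analogy for the email-filtering use case (the `filter_list` helper and the deliberately simple email pattern are illustrative, not Data Studio's actual validation):

```python
import re

# Deliberately simple email pattern, for illustration only
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def filter_list(values, predicate, sep=","):
    """Keep only the items of a comma-separated list that satisfy predicate."""
    items = (item.strip() for item in values.split(sep))
    return sep.join(v for v in items if predicate(v))

emails = "a@example.com,not-an-email,b@test.org"
valid = filter_list(emails, lambda v: bool(EMAIL_RE.match(v)))
# valid == "a@example.com,b@test.org"
```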
-
Any match: Find just one match in a list
What it does: Checks whether at least one item in a list satisfies a condition defined by a Boolean-returning function. Where it works: Comma-separated list JSON list Business Use Cases & Examples: Comma-separated list Flag if any items in the list are a UK postcode: JSON list Identify if the shopping cart contains a…
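The postcode-flagging example can be sketched in Python as follows (the `any_match` helper is made up, and the postcode regex is a simplified illustration, not a complete UK postcode validator):

```python
import re

# Simplified UK postcode pattern, illustrative only
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)

def any_match(values, predicate, sep=","):
    """True if at least one item in the comma-separated list satisfies predicate."""
    return any(predicate(item.strip()) for item in values.split(sep))

flag = any_match("hello,SW1A 1AA,world",
                 lambda v: bool(POSTCODE_RE.match(v)))
# flag is True because "SW1A 1AA" looks like a postcode
```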
-
All match: Validate every item in a list
In this post, we will take a closer look at All match, one of the higher order functions now available in Aperture Data Studio v3.2. What it does: All match checks whether every item in a list satisfies a condition defined by a Boolean-returning function. Where it works: Comma-separated lists JSON lists Business Use Cases…
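As a rough Python analogy (the `all_match` helper and the all-numeric check are made up for this sketch):

```python
def all_match(values, predicate, sep=","):
    """True only if every item in the comma-separated list satisfies predicate."""
    return all(predicate(item.strip()) for item in values.split(sep))

# Hypothetical check: are all order quantities numeric?
all_numeric = all_match("10,25,300", str.isdigit)
# all_numeric is True; a single non-numeric item would flip it to False
```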
-
Unlock powerful list and record logic with the higher order functions (part of List transformation)
In this series, we will explore the new set of Higher Order Functions (introduced in Aperture Data Studio v3.2) that bring more flexibility and control to how you work with lists and records in Aperture Data Studio. These functions allow you to pass another function as a parameter, making it possible to filter, match, and…
-
Gen AI in Aperture Data Studio
Aperture Data Studio has functionality that uses several different machine learning and artificial intelligence techniques, including clustering, cosine similarity, deep learning, and more recently GenAI. Gen AI Actions [in v3.1] When exploring data in the grid you are able to describe what you want to calculate or how…
-
A solution design for validating multiple mixed file schemas
We've recently been working on a solution to the following ask: A user has hundreds of data files with different schemas (different column names, datatypes, numbers of columns etc) Each file has an accompanying metadata file, defining expected attributes (columns, datatypes, primary keys, lengths) We want to be able to…
-
Make Your Data Pop: Visual Styling in Aperture Data Studio
Allow your data to speak for itself - Aperture Data Studio offers powerful ways to make your results more visual and intuitive, perfect for quick reviews and better decision-making. 1. Set Cell Style for RAG Colour Coding The Set Cell Style function lets you apply a Red-Amber-Green (RAG) overlay to cells based on business…
-
ℹ How to track DQ issues with Issue lists
Issue lists allow you to capture problematic records, assign them to stakeholders and collaborate on a resolution. Records that have been fixed will automatically be resolved. Setting up your Issues list Go to Issue lists in the left nav bar and create a new list to track related issues Create a Workflow to write issues to the…
-
Suggested approaches for manipulating JSON values
My data includes a column containing JSON values - I've made a dummy example below: https://us.v-cdn.net/6031645/uploads/CNP6XDWHH24R/color.csv Example JSON value: { "magenta": "#f0f", "yellow": "#ff0", "black": "#000", "usageList": [ "giraffe", "lion", "axolotl" ], "properties": { "cost": 99, "difficulty": { "part_a":…
-
Word Frequency
Every now and then a scenario crops up where it'd be handy to know how often a given word occurs within a given dataset. For example, you're profiling a reasonably standardised list of values (e.g. job titles) and you want to identify unusually common terms (like 'manager', 'executive', etc.) or infrequent ones (like 'test123').…
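The job-titles example can be sketched in Python with a simple tokenize-and-count pass (the `word_frequency` helper and sample titles are made up for this illustration):

```python
import re
from collections import Counter

def word_frequency(values):
    """Count how often each word appears across a list of text values."""
    counts = Counter()
    for value in values:
        # lower-case and split on non-word characters
        counts.update(re.findall(r"[a-z0-9']+", value.lower()))
    return counts

titles = ["Sales Manager", "Account Manager", "Marketing Executive"]
freq = word_frequency(titles)
# freq["manager"] == 2, freq["executive"] == 1
```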
-
Bringing Data Together
Knowing the best path for working with multiple datasets can sometimes be confusing; understanding which approach is appropriate for what you're wanting to achieve is key to getting the outcome you want. Here's a straightforward breakdown of 4 out-of-the-box Aperture Data Studio operations which will allow you to bring your…
-
 ℹ️  Importing your Datasets into Excel or Tableau using OData
You can easily get your data from Data Studio into Excel or Tableau using OData. OData (odata.org) is “an open protocol that allows the consumption of data via a simple and standard RESTful API”. If you choose to turn on OData in Data Studio (v2.2.3), you will be able to link your datasets (including snapshots) to any BI…
-
Enabling hard delete using the Salesforce connector
When deleting data from a Salesforce object you can tell the API to hard delete the data so that it does not go into the recycle bin and therefore can't be recovered. It's also possible to do this with DELETEs in the Export step using the native Salesforce JDBC driver. Here are the steps: In your Salesforce External…
-
Aperture REST API calls as Datasets within Aperture
Data is available from Aperture via the REST API that is not available within the Aperture UI: full details of Spaces, User Groups, Users, Jobs, Datasets, etc. We would like to use Aperture to manage Aperture. For example: an automated workflow which checks for long-running jobs and notifies us of issues; a regular review of user…
-
Suggest Validation rules - Validate step
Aperture Data Studio has a Validate workflow step that allows you to define rules of how data in specific columns should be populated. This information can then be used to be alerted to any data quality problems or to track data quality improvements over time. These rules can be simple: should not be null should be a…
-
Best practices designing complex Workflows
A Workflow is a sequence of steps that defines a process to transform and manipulate your data. When a single Workflow tries to tackle too many actions it can become difficult to read, understand and manage: If you are designing a Workflow alone this might be fine, but if you are collaborating or planning to have others…
-
ℹ How to use Functions in Workflows
Workflows is the area in Data Studio where designer users tend to spend most of their time. For anyone new to it, imagine a data pipeline that manipulates one or more data sources through a number of stages and then does something with the results. Whilst a Workflow is easy to build by connecting different steps…
-
How to best keep several environments in sync
I'd like to share a question that was recently raised by a user as I think it could be interesting for our community members. Question: We have two physical environments - Dev and Prod VMs and we have promoted all the objects from Dev to Prod environment (Export and Synchronised). Over time, Dev has gone through several…
-
Connecting to the ServiceNow REST API using OAuth2
The Autonomous REST Connector JDBC driver allows you to load (and refresh) the results returned from REST API calls as Data Studio Datasets, by translating SQL statements to REST API requests. In this article I'll show how to call ServiceNow's REST API (using OAuth2 authentication) to bring response data into Data Studio.…
-
Assign latest changed id to all linked records
Hi, I have a scenario where an input file provides changed address IDs in from_address_id and to_address_id columns. I need to identify the links between the records using these 2 fields and assign the latest ID to all the linked records. Input: event_date,from_address_id,to_address_id 20210223,120,160 20210402,120,160…
-
🛢️ Loading from OData using the Autonomous REST Connector
The Autonomous REST Connector JDBC driver packaged with Aperture allows you to load (and refresh) the results returned from REST API calls as Datasets. OData is a standardized REST interface, which means that any data that can be queried via OData API can be loaded into Aperture. This article will demonstrate how to use…
-
🎞️ Here is a short demonstration video on sharing
In this video I will show how different types of user role and license assignments have different abilities within Data Studio. I will show one way that I might share a particular view of the data to 'consumer users' that they can look at and explore but cannot save or manipulate beyond data exploration.
-
Connecting to Salesforce with OAuth2
Data Studio uses a JDBC driver to connect to Salesforce, allowing data to be extracted, transformed, validated, and then optionally pushed back to source. As well as standard username / password authentication, the driver supports OAuth 2.0 authentication when establishing a connection. The following guide is a brief…
-
Use a value in your data as the export file name
If you are looking to use a value in your data as the export file name, you may have looked at the Export step and realized that it does not allow you to directly select a value from your data as a filename component. However, you can actually use a workflow parameter as a filename component to achieve this. You will have…
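The idea of building the export file name from a workflow-parameter value can be sketched in Python (the `export_filename` helper, the sanitising rule, and the date-suffix convention are all assumptions for this illustration, not Data Studio's actual behaviour):

```python
def export_filename(param_value, run_date, extension="csv"):
    """Build an export file name from a workflow-parameter value plus a run date.
    Replaces characters that are unsafe in file names with underscores."""
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in param_value)
    return f"{safe}_{run_date}.{extension}"

# Hypothetical parameter value taken from a column in the data
name = export_filename("EMEA Region", "20240101")
# name == "EMEA_Region_20240101.csv"
```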
-
Business Validation and Enrichment
Aperture is a powerful tool when validating business information you already have. We can leverage trusted address sources, like the Royal Mail’s Postal Address File, for the UK. This will ensure that the addresses and business name information you hold is accurate and deliverable as per that trusted source. That’s all…
-
Use of If-Then-Else logic in Aperture Data Studio
Imagine you are asked to use Aperture Data Studio to generate a new field for your sales data. You have the input fields: Discount Code, Quantity and Price. The ask is to generate a new field “Offer Price” using the following logic: How can you do this in Aperture Data Studio? The answer is to build a custom transformation…
-
Documenting what data is available in Data Studio (to users who don't have access)
A question that cropped up earlier today was: 'how can we avoid different users connecting to the same data in different spaces?' Depending on how you've set up Data Studio and the processes around how your users work with it, this can be a bit of a challenge to tackle at the moment. However there's a solution, which is…
-
📑 API paging with the Autonomous REST Connector
In this article I want to share a few approaches to configuring paging when accessing REST APIs using Aperture Data Studio's Autonomous REST (AutoREST) connector. What is the AutoREST connector? If you're reading this, you're probably already familiar with the AutoREST connector, but in case you're new to it, check out:…
-
How I got started with Data Studio
Assuming Data Studio is set up and your user account has been created, “Your space” is your personal work area. Any changes you make in this Space will not impact any other users. To understand some Aperture Data Studio specific terms (highlighted in bold below) the documentation site is a useful resource. Other resources…
-
Connecting to hosted Dynamics 365 with OAuth2
Data Studio’s Dynamics 365 connector can be used to read or write data from Microsoft Dynamics 365 apps. Click for a full list of supported Dynamics 365 ERP and CRM apps by the driver Click for the driver's user documentation Click for basic config and troubleshooting steps in the Data Studio documentation The driver…
-
🎞️ A short video demonstration on using the Quick Actions bar, shortcuts and more
Since Aperture Data Studio 2.8.0 users have been able to use the Quick Actions bar to access frequent actions and quickly navigate the product. This short video gives a demonstration of how this can be used and also highlights other product shortcuts you may not have known. The main areas covered include: Quick Actions bar…
-
How to identify blocking keys for Find Duplicates
Blocking keys identify records that are similar, creating blocks or potential groups of matches. Let’s look at an example where you have a list of names and dates of birth that may contain duplicates. The rule of thumb is to identify any possible chance of a match. Which elements would you use to say that any…
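One common style of blocking key for names and dates of birth can be sketched in Python (the `blocking_key` helper and the "first 3 letters of surname + birth year" rule are hypothetical choices for illustration, not the step's built-in behaviour):

```python
def blocking_key(record):
    """Hypothetical blocking key: first 3 letters of surname + year of birth.
    Records sharing a key end up in the same block for pairwise comparison."""
    surname = record["surname"].strip().upper()
    return surname[:3] + record["dob"][:4]

key = blocking_key({"surname": "Smith", "dob": "1990-04-01"})
# key == "SMI1990"; "Smyth" born the same year would land in the same block
```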
-
Quick Actions menu
Since v2.8 Aperture Data Studio has had a command palette that makes shortcuts and access to frequent actions more discoverable and accessible. Simply press CTRL + SHIFT + P from anywhere in the application to open the Quick Actions menu (and use the same shortcut or Esc to close the menu). The list of actions shown is…
-
How to highlight null/empty values
When performing exploratory data analysis, or presenting data in grids in a dashboard, you may want to highlight null/empty values. Within Aperture, you can build a custom function to do this and apply it to one or more columns in your dataset. Here is an example: Go to Functions, Create new function. Create a parameter…
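The heart of such a custom function is a null/empty predicate; a minimal Python sketch of that check (the `is_null_or_empty` name is made up for this illustration):

```python
def is_null_or_empty(value):
    """Predicate used to flag cells that should be highlighted:
    True for None or strings that are empty/whitespace-only."""
    return value is None or str(value).strip() == ""

# is_null_or_empty(None) -> True, is_null_or_empty("   ") -> True,
# is_null_or_empty("x") -> False
```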
-
How to set up an export file name based on a fixed or variable input
In this article, we will be exploring a number of ways to set up an export file name. Imagine you start with a simple workflow where you have a Source step and Export step. You want to export the source data to a csv file on the server export directory. Use default export file name By default, the filename consists of two…
-
How to append leading or trailing zeroes to a number
Aperture makes it very easy for you to append a chosen character to any value to achieve an overall length. You also have the option to add the character at the beginning or at the end. To achieve this, look for the Pad function that you can use within a Transform step. Example 1: To add leading zeroes to the value "1" in…
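The Pad behaviour maps closely onto Python's string padding; a rough sketch (the `pad` helper signature is made up, not the Data Studio function's actual parameters):

```python
def pad(value, length, char="0", leading=True):
    """Pad value with char until it reaches length, at the start or the end."""
    s = str(value)
    return s.rjust(length, char) if leading else s.ljust(length, char)

padded = pad("1", 5)                   # leading zeroes  -> "00001"
trailing = pad("1", 5, leading=False)  # trailing zeroes -> "10000"
```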
-
How to remove leading or trailing characters
Here's a simple example of how you can remove leading or trailing characters. Assume you have loaded a dataset, treating all columns as alphanumeric as follows: To remove the leading zeroes from Customer No, apply the Trim function in a Transform step. To remove the trailing colon symbol from Customer ID, use the Trim…
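The two Trim examples can be sketched in Python (the `trim` helper and its flags are assumptions for this illustration; Data Studio's Trim function has its own parameters):

```python
def trim(value, chars=None, leading=True, trailing=True):
    """Remove the given characters from the start and/or end of value."""
    if leading and trailing:
        return value.strip(chars)
    if leading:
        return value.lstrip(chars)
    return value.rstrip(chars)

customer_no = trim("000123", "0", trailing=False)  # -> "123"
customer_id = trim("C-42:", ":", leading=False)    # -> "C-42"
```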
-
Find Duplicates Training Data
Attached is data to be used with Find Duplicates Training.
-
Find Duplicates Service Installer FAQ
Here are some FAQs on the Find Duplicates Service Installer available since Aperture Data Studio v2.4.8. What is the purpose of the new Find Duplicates Service Installer? The Find Duplicates Service Installer is created with the intention to simplify the installation of a separate instance of Find Duplicates on Windows.…
-
Tips and Tricks to make Find Duplicates Blocking Keys and Ruleset more readable
What is your first impression of the blocking keys and ruleset definitions required for Find Duplicates? We have observed that it may take a bit of learning to understand the syntax and structure to correctly update the keys and rules. Here are a few tips and tricks to help ease your experience with reading and updating…
-
Tune Rules with Find Duplicates Workbench
The Find Duplicates Workbench provides you with the capability to tune your Find Duplicates rules using machine learning. Once you have established your duplicate store with your initial settings, you will be able to start tuning your rules further. Example We are comparing some product information as generic strings. Here…
-
Identifying two names in a single string with the DelimitedField filter
Sometimes, the names data you have collected may not be in the best form to be parsed or standardized into the individual names components (Title, Surname, Forename) properly. You may have cases where your name field potentially contains multiple names, for example, Mark and Jane Spencer or Mr & Mrs Smith. In this case,…
-
Maximum cluster size
Find Duplicates comes with a default setting for maximum cluster size at 500. This setting is there to prevent excessive processing time and memory usage that will affect performance of Aperture Data Studio. If your cluster is too big, i.e. exceeding the maximum cluster size, you will notice that your records all have match…
-
Find Duplicates - Standardize Name and Address
When setting up your Find Duplicates step, the match column allows you to determine which columns you want to match with certain system-defined tags for Find Duplicates. These tags are used to determine how to process these columns when Find Duplicates is run. For Names and Addresses, you have the option to either use the…
-
Dashboard Custom content widget
The Dashboard Custom content widget allows you to add information such as text and links that might be useful to a customer such as details of their project integration and links to project docs. Example: This CAIS implementation uses version 4.0.0. See the release notes here:…
-
Find Duplicates with Phonetic Comparators
We now have the new phonetic comparators (Soundex, NYSIIS and Double Metaphone) for Find Duplicates that you can use to supplement the edit distance comparators (Levenshtein, JaroWinkler) for better match results. Why the need to supplement the edit distance comparators? Edit distance algorithms count the number of steps…
-
🎞️ A demonstration on data masking
Data Studio can suggest the hashing or masking of data based on a combination of data tagging and sensitive data classification. This short video demonstrates this functionality.
-
🎞️ Here is a short video demonstration on Training your own data tags
-
New Experian Data Quality Roadmap
I am delighted to announce that the interactive version of Experian Data Quality's product roadmap is now available on the User Documentation website, which is publicly available. We’ve collaborated with our customers to gain a deeper understanding of the information you’d like to access related to the product roadmap. As…
-
The chaining effect
What is the chaining effect? Find Duplicates in Aperture Data Studio works by identifying blocks or groups of duplicates, then comparing every possible record pair within a block based on rules to determine if they represent a single entity, represented by a cluster ID. Depending on how you have configured the rules, you…
-
Reviewing Find Duplicates Results
When you preview the Find Duplicates results, you will see the Cluster ID and Match status. Records with the same Cluster ID are identified as duplicates. The Cluster ID is influenced by the blocking keys that determine the record comparisons to be made. However, ultimately the duplicate records and match status is…
-
Smart Harmonization FAQ
As of release 2.4.5, you are able to turn on a preview of a new feature that involves smart models utilizing machine learning for harmonization. The smart models can be applied as Column specific rules at the additional options for Harmonize duplicates. What is harmonization? Harmonization is used to merge, blend or reduce…
-
Encryption-at-rest FAQ
Since release 2.4.5, you can turn on a new security setting at Settings>Security, Encryption to encrypt Data Studio resources. Here are some FAQs: What do resources include? Resources refer primarily to files in the \data\resource folder and typically include the imported datasets, snapshots and some cache files. Do I need…
-
Tuning Blocking Keys
Review Find Duplicates step results Reviewing the Find Duplicates results may not be the best way to confirm the effectiveness of the blocking keys. However, it does help to reveal obvious issues that may trigger further investigation. Once a set of Blocking Keys and Rules has been established at the Find Duplicates…
-
Building Rules
In order to start testing the Find Duplicates step, the Find Duplicates settings will also need to have a ruleset defined in addition to the blocking keys. When building rules, we will have to think about the following: How the Ruleset relates to Blocking Keys Blocking keys identify potential matches. Rules determine if…
-
Profiling many datasets of different schemas
To profile thousands of datasets easily, you can create a reusable workflow with a replaceable source. However, if each of the datasets has a different schema, you will have to determine the maximum number of columns a dataset can have and use a generic schema with generic column names in your workflow. Assuming your…
-
Exact match and fuzzy match with Find Duplicates Step
Aperture Data Studio offers a Find Duplicates step that runs on a powerful standardization and matching engine. When you connect the Find Duplicates step to your Source dataset, you will see that some configuration is needed before you can see results. The results of the Find Duplicates step provide the Cluster ID and…
-
Fuzzy Match with Regular Expressions
Imagine you have an address field containing variations of city names, for example: You need to determine the country based on the city name. You have an official list of countries and their cities like this: With any exact match technique, some of the values will not be matched since they do not appear in the official…
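The regex-based approach can be sketched in Python (the `CITY_COUNTRY` lookup and `country_for` helper are made up for this illustration; the point is tolerating extra words and punctuation around the official city name):

```python
import re

# Hypothetical official lookup of city name -> country
CITY_COUNTRY = {"london": "United Kingdom", "paris": "France"}

def country_for(raw_city):
    """Match a messy city value against official names using a word-boundary
    regex, so 'Greater London (UK)' still resolves to the United Kingdom."""
    for city, country in CITY_COUNTRY.items():
        if re.search(r"\b" + re.escape(city) + r"\b", raw_city, re.IGNORECASE):
            return country
    return None

match = country_for("Greater London (UK)")
# match == "United Kingdom"; an exact lookup of the raw value would have failed
```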
-
Gauge Chart and Scoreboard chart with up/down tick
Gauge Chart A new chart type named Gauge chart has been added to the Chart step. The Gauge chart is particularly useful for highlighting metrics that may have several threshold levels. Here is a sample Data Quality report where the Gauge chart is used to indicate the Average Row Pass Rate. Scoreboard Chart with up/down…
-
🎞 Creating roles and assigning permissions
Data Studio gives you some very fine-grained control over what a given user can do within the system. Here's a short video describing role creation and assignment.
-
🎞 Creating environments
You may have noticed that some users have more than one environment in their Data Studio installation. Here's a video showing why, and how you can have multiple environments of your own.
-
Selecting Best Record with Harmonize Duplicates
There are times when you just want a quick way to deduplicate your data without necessarily knowing how many duplicates are found. For example, you have a list of course completion status for a course OL100. There are multiple statuses recorded on different dates for each user. However, you are only interested in the…
-
Exact Match with List functions
There are a number of List functions that may be useful when you are trying to de-duplicate a list of values within a single column. Let's take a look at this dataset. How can we de-duplicate the list of fruits for each day? Connecting the dataset to a Transform step with List Frequency and List De-duplicate Functions will…
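The order-preserving de-duplication that List De-duplicate performs on a comma-separated value can be sketched in Python (the `list_dedupe` helper and fruit data are made up for this illustration):

```python
def list_dedupe(values, sep=","):
    """Remove duplicates from a comma-separated list, keeping first-seen order."""
    seen = []
    for item in (v.strip() for v in values.split(sep)):
        if item not in seen:
            seen.append(item)
    return sep.join(seen)

fruits = list_dedupe("apple,pear,apple,banana,pear")
# fruits == "apple,pear,banana"
```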
-
Exact Match with the Group Step
The Group step can be used as an easy way to identify and de-duplicate data that matches exactly. Let's take a look at an example, where you have a list of vehicles that may be duplicated. When you connect this dataset to a Group step in a workflow, the results provide a count of each vehicle along with the unique list of…
-
Why worry about duplicated data?
What is duplicated data? Duplicated data refers to data representing the same entity. An entity can be anything such as a person, a company or a product. In the most obvious form, duplicated data refers to an exact copy of a record. However, it is often not this straightforward due to possible variations in the data that…
-
Quick introduction to Blocking Keys and Rules for Find Duplicates
The concept of Blocking Keys and Rules may be foreign to you if you haven’t already used the Find Duplicates step in Aperture Data Studio. Blocking keys identify records that are similar, creating blocks or potential groups of matches. Rules compare every set of records in the resulting blocks, returning…
-
🎞 A Short Feature Demo on trending your data validation results in Data Studio
-
🎞 A Short Feature Demo on data validation in Data Studio
https://bcove.video/3hw8COt
-
🎞 A Short Feature Demo on data unions and find duplicates in Data Studio
-
🎞 A Short Feature Demo on dynamic workflows in Data Studio
https://bcove.video/3ygaHTL
-
🎞 A Short Feature Demo on data views in Data Studio
https://bcove.video/3hqh0Nt
-
🎞 A Short Feature Demo on dataset management in Data Studio
-
🎞 A Short Feature Demo on Profiling in Data Studio
-
Make your own reusable RIGHT function to return a number of characters from the end of a text string
A customer was asking if there is a way to achieve something similar to the RIGHT() function within Aperture Data Studio. We do have a flexible Substring function that you can use. Use -1 to indicate that the End position should be the last character of the Input value. Specify any negative number to return a number of…
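The negative-index Substring trick corresponds closely to Python's negative slicing; a minimal sketch of a reusable RIGHT equivalent (the `right` helper name is made up for this illustration):

```python
def right(text, n):
    """Return the last n characters of text, like a spreadsheet RIGHT() function.
    Returns the whole string when n exceeds its length, and "" for n <= 0."""
    return text[-n:] if n > 0 else ""

tail = right("Aperture", 3)
# tail == "ure"
```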
-
Partition Values (Dynamic SQL Partition Row Number)
A question that's cropped up for a couple of clients recently is how do I segment my data and split/partition it? Sample Data (Starting Point) In the above example I have records which I'd like to split or partition based on a value in a specific field (this could be a category, source, date etc). Before I go through the…
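The "dynamic SQL partition row number" idea, i.e. a 1-based running count within each group, can be sketched in Python (the `partition_row_numbers` helper and the `category` field are assumptions for this illustration):

```python
from collections import defaultdict

def partition_row_numbers(rows, key):
    """Assign a 1-based row number within each partition, akin to SQL's
    ROW_NUMBER() OVER (PARTITION BY key) for rows in their existing order."""
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        counters[row[key]] += 1
        numbered.append({**row, "row_number": counters[row[key]]})
    return numbered

data = [{"category": "A"}, {"category": "B"}, {"category": "A"}]
result = partition_row_numbers(data, "category")
# row numbers: A->1, B->1, A->2
```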
-
Validate Customer data example Workflow
This simple validation workflow leverages the Sample Data of Customer V1. The FILTER step filters to just US data. The SAMPLE step samples only 1000 rows randomly (the randomization is done only once). The TRANSFORM steps replace _ with . for emails, reformat phones to the international standard and then generate boolean…
-
Unioning data example workflow with customer data
This workflow combines the sample data of Customer V1 and Customer V3 to demonstrate that data from a common schema can be combined using a UNION step. The FILTER step reduces this to just UK data. The BRANCH step sends the data in two directions: to an inline PROFILE step and to a SPLIT step. The profile does a…
-
Joining Data example for Purchase Order Headers and Lines
This workflow uses PO sample data. The PO header and rows are joined using the order number. This workflow performs a date calculation, adds an average aggregation based on a product ID grouping and then produces a reduced list of records sorted by average days to fulfil. You could consider this as a simple report workflow…