ferethai.blogg.se - Caseware idea filter characters

#Caseware idea filter characters zip#

Apply this to the Vendor and Employee databases.

This will provide a numeric indicator between 0 and 1 of how similar each name is to the name that follows it in the sorted database.

Use the following function to compare each sorted record's value in this new column with the value in the record below it and return a numeric indicator of their similarity (note that the field name must be enclosed in quote marks):
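The comparison described above can be approximated outside IDEA. The Python sketch below is a stand-in, not IDEA's own function (whose name is not given here). It scores each record against the one below it using a normalized Levenshtein edit distance, 1 - distance / length of the longer value, so identical values score 1 and completely different ones score 0. It assumes the names have already been combined and sorted, and IDEA's own scoring may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score from 0.0 (nothing in common) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def similarity_to_next(sorted_names: list[str]) -> list[float]:
    """Compare each value with the value in the record below it.

    The last record has nothing below it, so it gets 0.0.
    """
    if not sorted_names:
        return []
    scores = [similarity(current, following)
              for current, following in zip(sorted_names, sorted_names[1:])]
    scores.append(0.0)
    return scores


names = sorted(["RICHARDSJOHNGEORGE", "ADAMSMARY", "RICHARDSJOHN"])
for name, score in zip(names, similarity_to_next(names)):
    print(f"{score:.2f}  {name}")
```

High scores flag neighbours worth a manual look; the threshold for "possible match" is a judgment call for the reviewer.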

Run a simple duplicate analysis at this time to identify exact matches.
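Because the records are already sorted on the combined-name column, exact matches sit next to each other, so a single pass is enough to find them. This is a minimal Python sketch, assuming each record is a dict and that the column is called `FULL_NAME` (a hypothetical name chosen for illustration).

```python
from itertools import groupby
from operator import itemgetter


def exact_matches(sorted_records: list[dict], field: str = "FULL_NAME") -> list[list[dict]]:
    """Return runs of adjacent records whose `field` values are identical.

    groupby only merges neighbours, so this relies on the records already
    being sorted on `field`; unsorted input would split duplicates apart.
    """
    groups = []
    for _, run in groupby(sorted_records, key=itemgetter(field)):
        run = list(run)
        if len(run) > 1:  # a run of one is a unique value, not a duplicate
            groups.append(run)
    return groups
```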

The first step would be to append a new column using (lastname + firstname + middlename). Apply this in both the employee file and the vendor file. To start the process of identifying duplicates and family members, it may help to sort the files on the new column, creating a new database. (INDEXing rearranges the records as a VIEW, while SORTing physically changes the order of the records to match your criteria and creates a new database.)
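As a rough illustration of the append-and-sort step, this Python sketch builds the combined column and returns a sorted copy, leaving the input untouched. That loosely mirrors how a SORT writes a new database while an index only changes the view. The field names `LASTNAME`, `FIRSTNAME`, `MIDDLENAME` and `FULL_NAME` are hypothetical, and trimming and upper-casing each part is an added assumption so that spacing and case differences do not block matches.

```python
def combined_name(record: dict) -> str:
    """Concatenate last, first and middle name, trimmed and upper-cased."""
    parts = (record.get("LASTNAME"), record.get("FIRSTNAME"), record.get("MIDDLENAME"))
    # treat missing or None parts as empty strings
    return "".join((part or "").strip().upper() for part in parts)


def append_and_sort(records: list[dict]) -> list[dict]:
    """Return a NEW list with a FULL_NAME column added, sorted on that column.

    The input list and its dicts are left unchanged.
    """
    with_column = [{**record, "FULL_NAME": combined_name(record)} for record in records]
    return sorted(with_column, key=lambda record: record["FULL_NAME"])
```

Putting the last name first means relatives and name variants sharing a surname end up next to each other after sorting.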

Variances in how names, addresses and numbers are entered or abbreviated within a database can quickly complicate the analysis process. However, fuzzy matching can be applied to return a percentage match by reading data forwards and backwards for a certain length.

For example, “Westheimer Boulevard” can appear as “Westheimer” or “Westheimer Blvd.” Using a forwards match of the first five characters, approximately 80% of potential matches were identified, and the remainder was small enough to be identified and corrected manually.

A backward fuzzy match is effective when trying to match customer and employee names. Given inconsistent formatting and spelling, and the desire to find potential relatives of employees, we realized that doing the fuzzy match backwards would target the last name no matter what the first and middle names looked like. So “John George Richards”, “Jonathan Richards”, “J G Richards” and “Mary Beth Richards” would all show up as matches using the backwards fuzzy match.

A method to accomplish this would be to use the function in IDEA to compare the employee names.
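One way to picture reading data "forwards and backwards for a certain length" is a position-by-position comparison over a fixed window of characters, taken from the start of each value or, after reversing the strings, from the end. The sketch below is an assumption about how such a percentage could be computed, not a description of IDEA's internal algorithm; `match_percentage` is a hypothetical helper.

```python
def match_percentage(a: str, b: str, length: int, backward: bool = False) -> float:
    """Share of positions that agree within a fixed window of characters.

    forward:  compares the first `length` characters of each value
    backward: reverses both values first, i.e. compares the last `length`
    Positions missing from a shorter value count as mismatches.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    a, b = a.strip().upper(), b.strip().upper()
    if backward:
        a, b = a[::-1], b[::-1]
    agree = sum(1 for x, y in zip(a[:length], b[:length]) if x == y)
    return agree / length


print(match_percentage("Westheimer Boulevard", "Westheimer Blvd.", 5))
print(match_percentage("John George Richards", "J G Richards", 8, backward=True))
print(match_percentage("John George Richards", "J G Richards", 8))
```

With a five-character forward window, “Westheimer Boulevard” and “Westheimer Blvd.” score 1.0. With an eight-character backward window, “John George Richards” and “J G Richards” also score 1.0, while the same pair read forwards scores only 0.125, which is why the backward read suits surnames.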

This process resulted in three files with possible duplicates, which were reviewed. Possible duplicates that were identified as non-duplicates were deleted from the database.

Actual duplicates were sent to the customer department for cleanup. The process was replicated for each database for company-wide consistency. Due to the large number of records in each database and the duplicates within the same database, the visual connector and were not used in this case.

TIP #2 Matching Customer and Employee Data

Data inconsistencies can make it virtually impossible to compare data.

Detect duplicates on the new field from step 6 and generate a duplicate file with possible duplicates.
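Generating the duplicate file can be sketched in Python by counting how often each key occurs and writing every record whose key occurs more than once. The key column `NAME_KEY` and the CSV output format are hypothetical choices for illustration; for a real file, `out` would typically be opened with `open(path, "w", newline="")`.

```python
import csv
import io
from collections import Counter


def write_duplicate_file(records, field, out, fieldnames):
    """Write every record whose `field` value occurs more than once.

    Returns the number of records written (header excluded).
    """
    counts = Counter(record[field] for record in records)
    duplicates = sorted(
        (record for record in records if counts[record[field]] > 1),
        key=lambda record: record[field],  # keep each duplicate group together
    )
    # extrasaction="ignore" lets records carry columns not listed in fieldnames
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(duplicates)
    return len(duplicates)


customers = [
    {"ID": 1, "NAME_KEY": "ACMESUPPLYCOMPA"},
    {"ID": 2, "NAME_KEY": "BETAFOODS"},
    {"ID": 3, "NAME_KEY": "ACMESUPPLYCOMPA"},
]
buffer = io.StringIO()
count = write_duplicate_file(customers, "NAME_KEY", buffer, ["ID", "NAME_KEY"])
print(count)
print(buffer.getvalue())
```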

(Depending on the nature of the customer names, you may choose to return more or fewer characters, or use to return characters from the right.)

Append a new field using (the new customer name field from step 5, 15) to return the first 15 characters of the new customer name field.
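Truncating the cleaned name to a fixed number of characters lets slightly different endings, such as “LTD” and “LIMITED”, collapse to the same key. This is a minimal Python sketch, assuming the name has already had its special characters removed; `name_key` is a hypothetical helper, and its `from_right` flag stands in for the right-hand alternative mentioned in the note above.

```python
def name_key(clean_name: str, length: int = 15, from_right: bool = False) -> str:
    """Return the first (or, with from_right=True, the last) `length` characters.

    `length` is expected to be zero or positive.
    """
    if from_right:
        # clean_name[-0:] would return the whole string, so handle 0 explicitly
        return clean_name[-length:] if length > 0 else ""
    return clean_name[:length]


print(name_key("ACMESUPPLYCOMPANYLTD"))
print(name_key("ACMESUPPLYCOMPANYLIMITED"))
print(name_key("ACMESUPPLYCOMPANYLTD", from_right=True))
```

A shorter window catches more variants but also more false positives, which is the trade-off the note about choosing more or fewer characters refers to.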

Append a new field that removes all special characters from the Customer Name field, such as hyphens, periods, commas, “&”, and so on.
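A minimal Python sketch of this cleanup, assuming the goal is to keep only letters and digits. It also drops spaces and upper-cases the result, which goes slightly beyond the characters listed above; that is an assumption made so that “Smith & Sons, Inc.” and “Smith & Sons Inc” produce the same value. `clean_name` is a hypothetical helper, not an IDEA function.

```python
def clean_name(name: str) -> str:
    """Keep letters and digits only, upper-cased.

    Drops hyphens, periods, commas, "&", apostrophes and (as an added
    assumption) spaces. str.isalnum() keeps accented letters such as "É".
    """
    return "".join(ch for ch in name if ch.isalnum()).upper()


print(clean_name("Smith & Sons, Inc."))
print(clean_name("O'Neil-Brown Co."))
```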

Detect duplicates on the new field from step 1 and generate a duplicate file with possible duplicates.

Append a new field using (telephone) to include only the numbers from the Telephone field (some telephone fields include brackets or hyphens).

Detect duplicates on the new field from step 3 and generate a duplicate file with possible duplicates.
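Keeping only the digits makes “(713) 555-0142” and “713.555.0142” comparable. The sketch below is a hypothetical Python helper, not an IDEA function. It tests each character against `string.digits` rather than using `str.isdigit()`, because the latter also accepts characters such as superscript digits.

```python
import string


def digits_only(value: str) -> str:
    """Keep only 0-9, dropping brackets, hyphens, dots and spaces."""
    return "".join(ch for ch in value if ch in string.digits)


phones = ["(713) 555-0142", "713.555.0142", "713-555-0199"]
print([digits_only(p) for p in phones])
```

The same helper could be applied to the combined address and zip code value when building a numbers-only address field.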


Since each customer clerk had their own way of entering information into the system, identical duplicates are rare. The fuzzy logic result is true, false or possible, and the goal is to identify possible duplicates in each database, then identify possible duplicates across all five databases.

Append a new field using (Customer Address + Zipcode) to include only the numbers from the addresses and zip codes.
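Once each office's database has a normalized key, such as the numbers-only address field, possible duplicates across offices can be found by recording which offices each key appears in. This Python sketch assumes the five databases are loaded as lists of dicts keyed by office name and that the key column is called `ADDRESS_KEY`; both are hypothetical. Blank keys are skipped so that records with no address do not all match one another.

```python
from collections import defaultdict


def cross_database_duplicates(
    databases: dict[str, list[dict]], key: str
) -> dict[str, list[tuple[str, dict]]]:
    """Return key values that appear in more than one office's database.

    Each value maps to the (office, record) pairs that share it.
    Duplicates confined to a single office are left to the
    per-database duplicate checks.
    """
    seen = defaultdict(list)
    for office, records in databases.items():
        for record in records:
            value = record[key]
            if value:  # skip blanks so empty keys do not all "match"
                seen[value].append((office, record))
    return {
        value: hits
        for value, hits in seen.items()
        if len({office for office, _ in hits}) > 1
    }
```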

Fuzzy logic techniques are an effective way to normalize data to identify potential matches, duplicates, errors or fraud. Here are some tips and techniques from experienced IDEA users to help apply fuzzy logic:

TIP #1 Identifying Duplicates in Single & Multiple Databases

A company with five offices, each with individual customer databases containing 10 years of data, wanted to eliminate duplicate information and improve consistency by applying a conversion method.