FAQs

Selected Questions and Answers

Important note: The FAQs below are not a comprehensive resource, and only address a fraction of the available capabilities in IRI software or questions people ask.

Please visit the IRI solutions and products sections to learn more. Also, do not hesitate to contact us if you have any questions, or need details on specific features or options applicable to your use case(s).

+ BI & Analytics

What kinds of analytics can Voracity perform?

IRI Voracity provides analytic capabilities in five ways, with two more pending in 2019:

1) Embedded reporting and analysis - via CoSort SortCL programs that write custom detail, summary, and trend reports in 2D formats complete with cross-calculation, and other incorporated data transformation, remapping, masking and formatting features. The reports can be descriptive or, through fuzzy logic and functions like standard deviation -- plus Boost-driven statistical and BIRT-driven linear regression graphs -- predictive.

2) Integration with BIRT in Eclipse - where at reporting time, BIRT charts and graphs you design get populated with 'IRI Data Sources' via ODA support for Voracity/CoSort SortCL output. What's interesting about this in-memory transfer of SortCL data and metadata to BIRT is that the data integration/preparation jobs run when the report is requested, saving time as well as resources by having data prepared outside the BI layer (with CoSort or Hadoop engines).

3) Data preparation (franchising) that accelerates time-to-visualization for 10 third-party BI and analytic vendor platforms. This section of the IRI blog site features benchmarks run when SortCL (available to Voracity or CoSort users) alone runs ahead of BOBJ, Cognos, Microstrategy, QlikView, Splunk, Spotfire, R, and Tableau.

4) There is also a direct Voracity add-on for Splunk to feed data to Splunk for analysis there.

5) Integration with a cloud dashboard from DWDigest for interactive business intelligence you can customize and view in any browser, including the internal web browser in the IRI Workbench GUI for Voracity, built on Eclipse.

6) Streaming analytics (pending) through JupiterOne, where Voracity can be a source of Kafka-fed data streams, or a target for further processing of live sentiment analysis data.

7) Colocation and integration with the KNIME Analytic Platform in the IRI Workbench IDE for Voracity, built on Eclipse, to allow citizen data scientists to punch above their weight with machine learning, artificial intelligence, neural networks, unstructured data, and other advanced data mining nodes and projects.

Is CoSort a BI tool?

Which third-party BI/analytic platforms does CoSort (or Voracity) optimize?

+ CoSort & Compatible IRI Product Installation

How do I install and license the product(s)?

What is the difference between a fixed and floating license?

Which versions of RHEL are compatible with which versions of CoSort?

Do I need administrative access to my databases to connect to IRI Workbench (via JDBC) and/or the SortCL base engine (via ODBC)?

What is the recommended hardware sizing configuration / system requirements for jobs requiring SortCL (including CoSort/Voracity) or its subsets/spinoffs?

IRI Workbench is the Eclipse front-end (graphical job design client) piece, which runs on Windows, Linux, and macOS. The back-end SortCL program in the CoSort package/Voracity platform -- which is the data manipulation, mapping, munging, masking, and mining engine -- runs on all the above, plus all flavors of Unix including AIX, Solaris, HP-UX, and z/i/pSeries Linux. Native hosts or virtual machines should be considered similarly for this discussion.

Again, because SortCL is the back-end engine, and IRI Workbench is the front-end, for multiple IRI products, this answer applies not only to CoSort (sort) and Voracity (ETL, etc.) jobs, but also to FieldShield, RowGen, NextForm, and DarkShield. The requirements for Windows and Linux are similar as far as hardware is concerned.

The absolute minimum requirement for CoSort/SortCL CLI operations (only) is 40MB of RAM, but at least 512MB-2GB is recommended as a minimum for system RAM available to the CoSort user. Sort jobs that can fit entirely in memory are generally faster, and it's not uncommon for modern CoSort hostnames to be configured with 64GB-2TB of RAM to sort without the I/O overhead of work files.

The minimum configuration for Workbench is 4GB of RAM and 10GB of free disk space, after the installation of any VMs, DBs, etc. Workbench includes a JRE. Workbench and CoSort are tested and supported back to XP on Windows. We also test with various major Linux distributions of both Debian and Red Hat package management standards. However, 6GB and up works best for each system to accommodate multiple database connections and table parsing for metadata and job definition. In fact, for schema with hundreds of tables to enumerate, as much as 64GB of RAM could be appropriate for the Workbench machine(s) where DB-related jobs are built.

We recommend, where possible, co-location of the licensed back-end (SortCL executable) on, or within close network proximity to, your database source or target server(s) for performance reasons, particularly if there are known network bottlenecks. When it comes to data masking in FieldShield and DarkShield, and test data generation via RowGen, the bottleneck is typically in network performance AND I/O; the time it currently takes you to read and write your data is roughly the time it will take you to mask, subset, or synthesize it, too. This is therefore another reason for same-system colocation where possible, or at least optimal I/O subsystems for the software used in volume (e.g., fibre channel, SSD, multi-core, fewer concurrent processes, etc.).

If you are also going to run the CoSort engine on the same PC as Workbench, then you will have to allow extra capacity to run jobs. The requirements for the program increase with the size of the data that you intend to process at any one time. A recommended hardware platform for a PC running Workbench would be 8GB of RAM, and 10GB available disk space, plus additional disk space for temporary files equal to 1.5 times the largest data set to be processed. A general guiding principle for hardware is that the more RAM, the better the performance.
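The disk guidance above is simple arithmetic, and a minimal sketch of it follows. The function name and its parameters are hypothetical and only encode the figures quoted in this answer (10GB base free space, temporary space of 1.5 times the largest data set); they are not an IRI sizing tool.

```python
def recommended_disk_gb(largest_dataset_gb, base_free_gb=10.0, temp_factor=1.5):
    """Estimate free disk space (GB) for a PC running Workbench plus the engine.

    base_free_gb and temp_factor default to the figures quoted in the FAQ;
    treat the result as a planning estimate, not a guarantee.
    """
    if largest_dataset_gb < 0:
        raise ValueError("largest_dataset_gb must not be negative")
    # Base space for Workbench plus temp space for sort overflow files.
    return base_free_gb + temp_factor * largest_dataset_gb


if __name__ == "__main__":
    # A hypothetical 200GB largest input: 10 + 1.5 * 200
    print(recommended_disk_gb(200))
```

For a 200GB largest data set this works out to 310GB of free space, most of it for temporary files.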

For more information on CoSort-specific tuning recommendations, see this article:
https://www.iri.com/blog/iri/business/frequently-asked-cosort-tuning-questions/

Many 'big data' CoSort and Voracity users license and cosortrc-tune the product on very large multi-core Unix systems to leverage hundreds of GB of RAM to out-perform Hadoop for example. If you run the Hadoop edition of Voracity, load balancing should be automatic. For DarkShield masking jobs through the API, multi-node load balancing is also possible through the NGINX reverse proxy server.

What is the recommended type/config (RAM, etc.) for an Amazon EC2 instance running IRI Workbench and the back-end SortCL program (for Voracity/CoSort/FieldShield/NextForm/RowGen)?

Would it be possible to virtualize (Vmware) the Windows CoSort license instance and then let users remote desktop into it?

Can IRI software run in a container (like Docker)?

How do I install IRI Workbench for Linux or macOS?

+ CoSort Installation Issues

How do I set up the correct COSORT_HOME environment on Windows?

I am getting an invalid key (error) message from CoSort on Windows. What can I do to fix this?

(When) can I use the same Serial Number?

I have CoSort Unix. After installation, I am getting a file error.

+ CoSort Sorting Apps & Performance Tuning

What is CoSort?

How fast can CoSort run?

At what data volume would I need CoSort? Are there limits to how much data or how many files CoSort can process?

How would I tune CoSort? What would I tune?

Usually through a CoSort Resource Control (cosortrc) text file, which can be global, user, and/or job-specific. On Windows, default registry settings are also set up at installation time and can be overridden by an rc file. You can specify a ceiling and floor on CPU/core threads and memory, I/O buffers, and allocate/compress disk space for sort overflow. There are several other documented job controls also specified at setup, and easily modified (or secured) later.

Typically the single most important factor in file sort performance is the speed of the I/O channels. For small files, which may be sorted entirely in memory, this means optimizing the reading of the source files and the writing of the target files. For large files, which will exceed the available memory, the throughput to the work area files will also be critical. Many times, the source and target file locations are fixed, and little can be done to improve their I/O performance. However, the local work areas where the temporary merge files are stored can often be optimized. For example, having more than one fast local SSD, each on a separate controller, can make overflow sorts almost as fast as in-memory sort jobs.

The location of the overflow files is specified by the WORK_AREAS tuning settings. Multiple locations can be specified, but there is no advantage to using multiple locations unless they are on separate devices and I/O channels. At a minimum, there should be at least one fast local drive that is not shared with either the source or target files.

After addressing the performance of the I/O channels, the next most important tuning settings are the ones related to memory usage. Fortunately, CoSort 10 has some new and powerful techniques to self-tune the memory allocation. We recommend starting with the CoSort 10 default memory settings. This is best accomplished by having only the single setting of:

MEMORY_MAX AUTO

in the $COSORT_HOME/etc/cosort.rc file. Other memory related settings, like BLOCKSIZE and COMPRESS_WORKFILES will then be controlled by the intelligent algorithms in CoSort 10. Here is an example tuning file using the recommended settings. Of course you will want to set THREAD_MAX to a value up to the number of CPU cores that your license allows. In this example, the two work areas would be on local solid-state drives on two separate dedicated controllers.

THREAD_MAX 6 # Maximum number of sort threads
THREAD_MIN 1 # Minimum number of sort threads
MEMORY_MAX AUTO # Internal sort memory ceiling
WORK_AREAS /mnt/ssd1/work # Overflow (temp) file paths
WORK_AREAS /mnt/ssd2/work # Overflow (temp) file paths
MONITOR_LEVEL 0 # Runtime monitor level
MINIMUM_YEAR 70 # Century window
ON_EMPTY_INPUT PROCESS_WITH_ZEROS # Output file option
OUTPUT_TERMINATOR INFILE # Output terminator
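If you keep several tuning files, a quick sanity check before running jobs can help. The sketch below is not part of CoSort; parse_rc and check_rc are hypothetical helpers that assume the "NAME value # comment" layout shown in the example above, and that paths do not themselves contain a # character.

```python
from collections import defaultdict

SAMPLE_RC = """\
THREAD_MAX 6 # Maximum number of sort threads
THREAD_MIN 1 # Minimum number of sort threads
MEMORY_MAX AUTO # Internal sort memory ceiling
WORK_AREAS /mnt/ssd1/work # Overflow (temp) file paths
WORK_AREAS /mnt/ssd2/work # Overflow (temp) file paths
"""


def parse_rc(text):
    """Return {setting: [values]}; repeated settings such as WORK_AREAS accumulate."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]].append(parts[1].strip() if len(parts) > 1 else "")
    return dict(settings)


def check_rc(settings, licensed_threads):
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    thread_max = int(settings.get("THREAD_MAX", ["1"])[0])
    thread_min = int(settings.get("THREAD_MIN", ["1"])[0])
    if thread_min > thread_max:
        problems.append("THREAD_MIN exceeds THREAD_MAX")
    if thread_max > licensed_threads:
        problems.append("THREAD_MAX exceeds the licensed thread count")
    areas = settings.get("WORK_AREAS", [])
    if len(set(areas)) != len(areas):
        problems.append("duplicate WORK_AREAS entries")
    return problems


if __name__ == "__main__":
    parsed = parse_rc(SAMPLE_RC)
    print(parsed["WORK_AREAS"])
    print(check_rc(parsed, licensed_threads=6))
```

The check cannot tell whether two work areas sit on separate devices and I/O channels; that still has to be confirmed on the host.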

Please email support@iri.com with any additional questions or concerns.

What is the relationship between the cosortrc MEMORY_MAX, THREAD_MAX, AIO_BUFFERS, and BLOCKSIZE parameters?

How do we determine the best number of cores to use / threads to license?

CoSort will begin using multiple threads when the sort (input file/table) volume is at least two times larger than the BLOCKSIZE specified in your CoSort Resource Control (cosortrc) file, which is typically 1 to 2 MB in an auto-tuned job.

CoSort does not make any distinction between physical CPU chips, CPU cores, or hyper-threading. We do not attempt to micromanage which device a thread is created on. The operating system is in the best position to determine where to schedule new threads. CoSort just creates the sort threads, up to the maximum number specified in the tuning file, which cannot exceed the number of threads supported by the license. Cores are usually the best indicator of what's possible to expect in terms of peak performance before the point of diminishing return, subject to resource contention and Amdahl's Law.
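To see why adding threads eventually stops paying off, Amdahl's Law can be worked through numerically. The sketch below uses the standard formula; the 0.8 parallel fraction is a made-up illustration, not a measured property of CoSort.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Theoretical speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical job in which 80% of the work parallelizes.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.8, n), 2))
```

Under that assumption, going from 8 to 16 threads raises the theoretical speedup only from about 3.3x to 4x, and real jobs will see less once I/O and resource contention are included.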

Note also that each CoSort sortcl process is independent of all others. There is no inter-process communication or synchronization taking place, so in a concurrent multi-job environment, specifying only 1 or 2 max threads will likely be more efficient, even when each job running simultaneously may be high volume. Memory, however, is self-tuning when MEMORY_MAX is set to AUTO.

For testing different numbers of threads, we recommend that you request a license key through the normal CoSort installation and registration process (as instructed in the installation guide) for a temporary period, wherein you specify the total number of physical cores on the host machine. Then once you get the license keys to allow that number (up to 64), run jobs with different THREAD_MAX values (from 1 to the max) in your cosortrc file. You can also experiment with memory and overflow-related settings to see what works best.
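One way to prepare such a test series is to generate one tuning-file variant per THREAD_MAX value. The sketch below only produces the file text; thread_sweep is a hypothetical helper, and how each variant is selected for a run (global, user, or job-specific rc file) should follow the CoSort manual.

```python
def thread_sweep(base_rc_text, max_threads):
    """Return [(threads, rc_text)] with THREAD_MAX set to 1..max_threads."""
    variants = []
    for n in range(1, max_threads + 1):
        lines = []
        replaced = False
        for line in base_rc_text.splitlines():
            # Replace an existing THREAD_MAX line; keep every other line as-is.
            if line.split(None, 1)[:1] == ["THREAD_MAX"]:
                lines.append(f"THREAD_MAX {n}")
                replaced = True
            else:
                lines.append(line)
        if not replaced:
            lines.append(f"THREAD_MAX {n}")
        variants.append((n, "\n".join(lines) + "\n"))
    return variants


if __name__ == "__main__":
    base = "THREAD_MAX 6\nMEMORY_MAX AUTO\n"
    for threads, text in thread_sweep(base, 3):
        print(f"--- variant for {threads} thread(s) ---")
        print(text, end="")
```

Each variant could then be written to its own file and paired with the same job, so that the logged runtimes are directly comparable.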

Please advise support@iri.com if you need more details, and consult Section D of the Appendix chapter of the CoSort manual for technical specifics, and a way to automate the benchmarking process. Remember that your runtime results are logged so you can analyze performance off line later.

Is there any way to bind a CoSort thread to a specific core?

If my CoSort 10 rc value for MEMORY_MAX is 10%, is 10% of RAM the limit of what's available to any number of sortcl* jobs started in parallel (no matter how many threads they spawn)?

Does the BLOCKSIZE parameter impact the amount of memory used, and would it be better to set a higher value for big jobs?

Can I increase AIO_BUFFERS beyond 2, and if so, what other parameter(s) should be adjusted?

What about monitoring and logging (statistics)?

What data types does CoSort support?

How do I suppress $SORTIN/OUT lines when converting MVS sort parms to CoSort SortCL scripts?

How and where does your SAS sort replacement work?

How do these plug-ins or conversion tools work?

+ Data Security (PII Classify, Search, Mask) -- IRI *Shield Tools (Voracity / Data Protector Suite)

Since DarkShield can also mask PII in my flat files and relational databases, when/why is FieldShield still recommended ... what's the difference between them?

N.B. The information below summarizes the details of this comparison article you may wish to read instead/as well.

Either:

IRI FieldShield for structured data, connected via both ODBC and JDBC drivers per DB-specific installation steps linked here

IRI DarkShield for structured data, plus what's in C/LOB columns, and connected only via JDBC.

can find and mask sensitive data in relational databases, and are priced the same for them. Your use case should determine which one is a better fit.

The principal differences between FieldShield and DarkShield are that:

DarkShield can search and mask separately or simultaneously, while FieldShield performs them separately;
DarkShield cannot as handily map masked results to different kinds of targets, while FieldShield can go to other structured DB, file, and report targets (even at the same time, ETL-style);
Only DarkShield can also handle semi- and unstructured data source masking, in case you have EDI files, raw text, documents, images, NoSQL, etc.;
Conversely for purely structured sources, FieldShield masking jobs are metadata-compatible with subsetting, incremental masking, test data synthesis, cleansing, ETL, reporting/wrangling, etc. thanks to its use of the SortCL data definition and manipulation program; and,
FieldShield is needed to support input phase filtering via SortCL include/omit or SQL query syntax -- as well as complex target field logic to address business needs, which may involve combined masking and string transformations, joins, conditional masking rules, reformatting, etc.

However, both DarkShield and FieldShield:

run on premise by default, but can also run in the cloud, and support LAN, SharePoint, and Azure, S3, and GCP buckets;
use the same data classifications and masking functions to maintain structural and referential integrity in the target schema;
share the same IRI Workbench graphical IDE, have callable APIs, can be integrated into DevOps pipelines, and can run in CLI jobs;
are included components, along with IRI RowGen (for DB subsetting and synthesis) et al, in the IRI Voracity data management platform; and,
are subject to similar considerations for licensing in DB environments, which are advised in this FAQ.

It is not uncommon to license both products in a discounted bundle or Voracity platform transaction to satisfy multiple use cases, including for database sources alone. Please email info@iri.com with details about your requirements and request an online meeting for a discussion and/or live demo so you can be fully informed.

What are the key skills required during the deployment phase and what skill-sets are required for the users of the tool?

Which IRI data masking (shield) product should I be looking at?

How do I get ciphertext consistency (and thus referential integrity) with your masking tool?

We're excited about DarkShield but aren't seeing much online on how it works, uses machine learning, handles faces, etc.

Can we run DarkShield searching or masking jobs outside of IRI Workbench; i.e., can we call them into our application(s)?

Does the tool already have the masking rules predefined for the financial sector?

Which protection techniques can I use? What should I use?

Can you protect only flat-file fields, or database columns, or both? And why is your approach better than built-in database column encryption?

How can your product(s) mask data in MongoDB?

How does your product compare with MS SQL TDE (Transparent Data Encryption)?

1) TDE is only AES & 3DES encryption, and only for MS SQL RDBs.

IRI includes more encryption functions plus many other masking functions, and covers multiple sources with them, including other RDBs on-premise or in the cloud, legacy (COBOL, etc. files), free text (files or columns), JSON & XML files, MS Office and PDF documents, NoSQL DBs, and image files.

2) TDE is not column-specific; encryption applies broadly instead.

IRI allows encryption and other data masking functions to apply at the field or row level ad hoc, conditionally at the value level, and consistently at the column level across multiple tables or schema (and other sources) as a data class rule to preserve referential integrity, including with format preserving encryption.

3) TDE application is tied to SQL, and thus vulnerable to SQL injection or other hacks that can reverse its (entire) application.

IRI masking is performed statically on specific columns with potentially differential functions on different columns. Thus SQL can't reverse the data, and even if a hacker had access to our executable and knowledge of the script, plus one of the decryption keys, that would only potentially expose one column.

4) TDE's initial encryption and final decryption processes are computationally expensive; TDE reads and encrypts every DB page.

IRI field-level encrypt/mask jobs are I/O bound only; they happen at ODBC read/write speed, which can also be reduced through incremental masking (changed rows). The actual masking functions do not impose much incremental overhead above that.

5) TDE requires Azure Key Vault (EKM) to manage encryption keys.

IRI supports Azure Key Vault as well, but does not require it; each key can be stored locally in the file system or as an environment variable. IRI also supports the more robust Townsend Alliance Key Manager system.

6) TDE does not interoperate directly with other metadata systems and related applications.

IRI FieldShield functions are part of the SortCL program and 4GL, allowing masking on the fly within the same job scripts with ETL, CDC, DB subsetting, reorgs, data cleansing, migration, reporting, etc. FieldShield metadata is also shared with IRI RowGen for test data synthesis, is supported by MIMB and erwin platforms for interoperability with other data modeling, ETL and BI tool metadata, and is easy to learn, modify, share, use, deploy and schedule across different operating systems.

7) TDE is not integrated with built-in PII data classification or search, re-ID risk-score and audit trails, or third-party SIEM tools like Splunk ES.

All of the above is included with IRI FieldShield or DarkShield in IRI Workbench.

8) TDE is not supported in purpose-built test data provisioning systems for DevOps.

FieldShield and RowGen operations integrate directly into Value Labs test data hub, web services, or virtualized DB cloning tools like Actifio, Commvault, and Windocks (for containerized MS SQL images on demand).

How does your product compare with the Oracle Data Masking & Subsetting pack?

How would I use FieldShield or CoSort (SortCL in Voracity) to protect sensitive fields in my database tables, flat files, or 2D reports?

I am looking for a way to make a copy of a production database for UAT that removes/randomizes sensitive values (not just hide them). Does FieldShield do that?

Does the software support in-place masking, or only from source to target?

What happens if a masking operation fails due to a schema constraint?

Does your provided pseudonym option guarantee consistency; e.g., that all Peters will become Pauls? Or do we need to specify your option to "Use your own pseudonym list" and create a special crosswalk between the names in our database and fake names, and "Use original field as a look-up into pseudonym list"?

In IRI data masking products like FieldShield, CellShield, and DarkShield, the term pseudonymization means replacing (substituting) an identity with another. Use cases can involve values that are consistent and reproducible, some of those can be reversible or recoverable, while others can just be random.

All IRI pseudonymization techniques require the use of a set file with replacement values. If the replacement values need to be consistent, as you suggested, then there must be two columns of values separated by a tab character. We call these lookup set files (your crosswalks).

The requirements for a lookup set are pretty simple. Each possible value in the unprotected data set (column) must be present in the left-hand side of a row in the lookup set file. Every left-hand value must be followed by a tab, and then the unique replacement value. Finally, the lookup set file must be sorted on the left-hand column values.
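The three requirements above can be checked mechanically before a job runs. The sketch below is not an IRI utility; load_lookup_set and check_lookup_set are hypothetical helpers. It assumes plain code-point ordering for the sort check (the collation your job actually expects may differ) and reads "unique replacement value" as meaning no replacement is used twice.

```python
def load_lookup_set(text):
    """Parse 'original<TAB>replacement' lines into a list of pairs."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            raise ValueError(f"line {lineno}: expected two tab-separated values")
        pairs.append((parts[0], parts[1]))
    return pairs


def check_lookup_set(pairs, source_values):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    keys = [original for original, _ in pairs]
    replacements = [replacement for _, replacement in pairs]
    if keys != sorted(keys):  # assumption: code-point order
        problems.append("not sorted on the left-hand column")
    if len(set(keys)) != len(keys):
        problems.append("duplicate left-hand values")
    if len(set(replacements)) != len(replacements):
        problems.append("replacement values are not unique")
    missing = sorted(set(source_values) - set(keys))
    if missing:
        problems.append(f"missing source values: {missing}")
    return problems


if __name__ == "__main__":
    lookup = load_lookup_set("Paul\tMark\nPeter\tLuke\n")
    print(check_lookup_set(lookup, ["Peter", "Paul", "Peter"]))
    print(check_lookup_set(lookup, ["Peter", "Mary"]))
```

A source value that is missing from the left-hand column is the case most worth catching early, since it has no replacement to map to.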

In some circumstances, our application can create the set files for you based on the existing data and an optional list of possible replacement values. However, that cannot be done when pseudonymization is applied by way of a non-data-specific rule. Lookup sets for consistent replacements can only be created from the field editor when creating or editing a single masking job. The resulting lookup set will contain one left-hand value for every value in the source data at the time the set is created.

When the distinct values in the source data are known, there are two ways to furnish the replacement values. The user can specify a file containing replacement values, or we can read the original values from the source and then shuffle them to use for replacement values.

If the original data set is small, it is probably better to supply the replacement set in another, single column set file. However, when the set of original values is large, it may be sufficient to reuse a shuffled version of the original values as replacements. This is especially true when pseudonymizing large sets of values such as names, street names, cities, etc.

Here, there will be a finite number of values available. So it doesn't matter so much if you just shuffle the original value set. For example, no matter if the replacement names come from a separate set of first names, or a shuffled set of existing names, there will no doubt be common first names such as Peter and Paul in the replacement set, as well as the original values.
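The two ways of furnishing replacement values can be sketched as follows. This is only an illustration of the idea, not how IRI Workbench builds its set files; build_lookup_set, its parameters, and the seed are hypothetical.

```python
import random


def build_lookup_set(source_values, replacements=None, seed=None):
    """Return lookup-set text: sorted originals, a tab, then a replacement.

    With replacements=None the distinct originals are shuffled and reused,
    so a value can occasionally map to itself.
    """
    originals = sorted(set(source_values))
    if replacements is None:
        pool = originals[:]  # reuse a shuffled copy of the originals
    else:
        pool = list(dict.fromkeys(replacements))  # de-duplicate, keep order
        if len(pool) < len(originals):
            raise ValueError("not enough distinct replacement values")
    random.Random(seed).shuffle(pool)
    return "".join(f"{o}\t{r}\n" for o, r in zip(originals, pool))


if __name__ == "__main__":
    names = ["Peter", "Paul", "Mary", "Peter"]
    print(build_lookup_set(names, seed=1), end="")
    print(build_lookup_set(names, ["Ann", "Bob", "Cid", "Dee"], seed=1), end="")
```

Because the originals are de-duplicated and sorted first, every Peter in the source maps to the same replacement, which is what keeps the result consistent across rows and tables.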

You have already seen the wizard page for pseudonymization as a rule. This is the page for pseudonymization as a field attribute. This screen is available when editing a field from the target field editor, or by selecting "Edit" from the context menu of a field in the outline view of a job script.

You may also find some of our blog posts informational on this topic:

https://www.iri.com/blog/data-protection/consistent-cross-table-data-pseudonymization/
https://www.iri.com/blog/data-protection/what-is-pseudonymization/ 
https://www.iri.com/blog/data-protection/data-masking-function-use/  

Can I apply data masking functions to BI/DW tasks or tools?

What methods can I use to mask data in NoSQL DBs like MongoDB, Cassandra, or ElasticSearch?

To control storage use in SharePoint, should we create a DarkShield job that identifies, classifies, and masks, or run it on demand?

For CellShield, how do I know if I need the 32- or 64-bit version?

DarkShield is a local app; do you plan to make it SaaS in the future?

What are the DarkShield service parameters for determining server resources?

What data is transmitted via the CellShield plugin to external devices on the Internet, and if it is transmitted can you specify the data which is being transmitted?

How robust is your encryption?

For administration, can only server administrators run the masking? That is, can we delegate on-demand masking to end users? Is there a web portal for this? Or is there a local client, or do users need to access the server? How do other clients manage this and what do you recommend?

What about auditing these activities?

Besides the IRI masking tool search and audit reports, can I export that info to a SIEM?

+ Data Governance > Data/Metadata Forensics & RBACs

Does IRI software support RBAC?

Can we set up users and assign privileges to them in IRI Workbench?

Can roles be used to segregate administrative functions like managing data or metadata access rules, or managing role memberships? How about access to activity log data for auditing and reporting?

How do you control access to the product based on roles?

Can users be assigned multiple roles?

Can you define different roles for different data sources? How about different access rights based on those roles?

Does Voracity (FieldShield, or other IRI software) independently check who the end user attempting to access protected data is, or does it rely on the underlying database or application access controls?

+ DW ETL Operations

What is Voracity?

What does Voracity include?

We already spent a fortune on our ETL tool. Can you just help us run its jobs faster?

Can we replace our legacy ETL tool automatically?

Why should I consider IRI Voracity? What does Voracity offer that others don't?

Some of the things that Voracity offers that legacy and open source ETL (much less ELT) tools do not are:

Built-in data profiling tools for flat-files, databases, and dark data (unstructured) document sources
Raw power and scalability with or without Hadoop; i.e., built-in performance in volume, but also seamless support for Hadoop!
A negligible learning curve: simple, explicit, accessible, and open text metadata you can easily use, modify and share
The ability to deploy jobs outside the GUI, running them via command line, batch, or any program via system or API call
An open source GUI you already know (Eclipse) that front-ends proven, robust manipulations and reports on big structured data
Advanced aggregation functionality like lead/lag, ranking and running, multiplication and expressions
Multiple nested layers for both conditions and derived fields with support for PCREs, fuzzy matching, C (math/trig) functions, locale and 'conversion specifiers', etc.
Composite data value definition for both production data (format masking) migration and test data generation
Built-in: data and DB profiling, migration, replication and administration
12 field protection (static data masking), DB subsetting, and synthetic referentially correct test data generation
Data-centric change data capture, slowly changing dimension and detail and summary reporting, plus trend (predictive analytics), and web log (clickstream analytics) reporting
Seamless metadata integration with Fast Extract (FACT) for major RDBs, plus Hadoop, AnalytiX DS and MIMB-embedding platforms
Superior price-performance, fast ROI, and immediate access to US-developer support

Another way to consider the differences is by looking at what Voracity does not require, and why:

With Voracity, there is no need for:	Because Voracity:
separate transforms or transform stages	can combine filter, sort, join, aggregate, pivot, remap, custom and other transforms in the same job script and I/O pass, though it can represent and run them separately in separate task blocks
partitioning, manual or otherwise	automatically multi-threads and uses other system resources, limited only by your resource controls, and does not push transformations into the database layer, where they are inherently less efficient
manual metadata definition	provides automatic metadata discovery and format conversion tools, and is supported by AnalytiX DS Mapping Manager and CATfx templates, as well as MITI's MIMB platform
separate BI (reporting) tools	can produce custom-formatted details and summary reports in the same job script and I/O pass with all the transforms, and/or hand off data to files, tables, or ODA streams in Eclipse for BIRT
separate data masking tools	includes every single function in FieldShield, the most robust data masking and encryption tool available.
separate test data tools	includes all the functions of RowGen, which can generate safe (no need for production data), intelligent (realistic and referentially correct) test data for DB, file, and report targets
long-term consulting	uses an already familiar Eclipse GUI and metadata defining both data and ETL processes
separate MDM hubs or data quality tools	has a wizard for MDM, plus support for: composite data type definitions, master data value lookups, joins, tables and set files suitable for production or test data
a new team sharing or version control paradigm	metadata repositories and job scripts work with any source code and metadata version control system, including AnalytiX DS and GIT, CVS or SVN in Eclipse
concerns about open source or support	is backed by IRI, a stable 38-year-old company with more than 40 international offices
a huge budget now, or a lease renewal headache later	is sold at affordable prices for perpetual or subscription use

Is CoSort an ETL tool?

Does CoSort work with other ETL tools or tasks?

Can we use CoSort or Voracity with DataStage?

Can we use CoSort or Voracity with Informatica?

We have hundreds of tables and files, and thousands of fields already defined for processing in our ETL tool(s). How can we exploit CoSort / Voracity on our flat files or other sources and leverage existing field layouts (i.e. not re-define them manually)?

We use CoSort already, so don't we already have Voracity?

What about data quality?

What about job scheduling?

+ Flat Files

What do practitioners say about this?

Why does CoSort rely on flat files? Why should I?

Why is this approach more cost-effective?

Through its Sort Control Language (SortCL) program, users of the IRI CoSort product or IRI Voracity platform can leverage the resources of their existing file systems to perform these kinds of jobs without the overhead and administrative constraints of databases and SQL procedures -- not to mention the cost of megavendor ETL tools and ELT appliances, in-memory DBs, or complex Apache projects.

Through SortCL, you can perform and combine many of these activities simultaneously against multiple data sources of any size:

Data Transformation > select, sort, merge, join, aggregate, re-map, pivot, cross-calc, etc.
Data Cleansing > enrich, evaluate, filter, reformat, and validate data across disparate sources
Data Governance > manage and mask master data, manage metadata, improve data quality
Data Migration > remap file formats, data types, endian states, record formats
Data Replication > copy, shift, enrich, and re-purpose data from one or more formats/platforms into others
Data Federation > virtualize ad hoc mash-up views and formatted reports, or feed direct BIRT displays via ODA
Data Masking > de-ID, encrypt, hash, pseudonymize, randomize, redact, tokenize and otherwise obfuscate fields
Data Presentation > get 2D BI via detail, delta, and summary reports in custom formats, even with embedded HTML
Data Franchising > filter, pivot, transform, and segment data into CSV, XML and ODBC hand-offs for BI tools
Data Staging > scrub and prepare bulk data for other ETL tools, databases, data marts and spreadmarts, and analytic platforms
Data Prototyping > generate safe, intelligent, and referentially correct DB, file, and custom-report-formatted test data

So, to process, present, protect, and prototype big data, SortCL and flat files are still the fastest and most cost-effective approach to consolidating information lifecycle management (ILM) activities. And if you keep your data in HDFS, many of the core data transformation, masking, and test data generation functions designed in SortCL can run in MapReduce 2, Spark, Spark Stream, Storm or Tez through Voracity's Hadoop gateway (called "VGrid") without re-coding anything!

The perpetual use (CapEx) price of IRI data de-identification software (like IRI FieldShield) is usually based only on the number of hostnames where static data masking jobs run; i.e., wherever the executable/engine (SortCL program) is installed. The price does not vary by the number of cores, IRI Workbench (job design GUI) users, data sources, rows, functions, features, etc.

The same considerations below apply to IRI Voracity (data management platform) licenses, which include FieldShield, CellShield EE, DarkShield, RowGen, CoSort, etc. DarkShield licenses used for database (RDB or NoSQL) sources with C/BLOBs are priced the same as FieldShield. For DarkShield masking of files and documents with no databases involved, pricing per hostname is not fixed but volumetric, yet the same considerations below apply.

Base FieldShield and DarkShield can be purchased standalone (or discounted together) at one-time license fees that cover perpetual use, documentation, and the first year of technical support (for standard use cases). There are discounts on multiple and runtime (see below) licenses, and both FieldShield and DarkShield (as well as CellShield EE and RowGen and other included components) are also available (and more affordable in volume) with Voracity (see below).

At least one license of a standalone product (engine) is needed to mask data, though the base/first Voracity price tier includes five executable licenses. FieldShield users will need at least one (heavily discounted) local dev/test version of the executable in addition to at least one full-priced production server license. With FieldShield, Linux/Unix licenses cost about 1/3 more than the Windows version, but there are discounts for multiple licenses.

Generally, the total number and location of masking licenses needed should take factors like performance, convenience, and budget into consideration:

In performance terms, the number of licenses may match the number of (major) database servers in use ... particularly where table volumes are large. While it is not a one-for-one requirement, masking jobs run faster if the masking engines are installed/licensed on each large DB source or target system (to avoid both network and I/O contention). The best way to know your tolerable degradation point on a given install is through testing.
In convenience terms, it's very common to also license the engine for local use on one or more IRI Workbench client dev/test PCs, where masking jobs are first designed and debugged, as well as on disaster recovery/failover systems. Regardless of the number of licenses, however, IRI Workbench is always free to distribute.
In budget terms, additional perpetual-use production masking licenses of IRI *shield tools procured simultaneously (along with the first one at full price) are discounted 20%. Annual maintenance (support) renewals for those same *shield products are offered at 20% of their perpetual use license fee(s), and include upgrades, plus limited license transfer and disaster recovery rights. Maintenance is included with Voracity platform annual subscriptions, but offered at lower renewal rates for perpetual use Voracity licenses.
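The budget arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an IRI quoting tool: the function name, the `ShieldQuote` type, and the 10,000 base price in the usage note are hypothetical, and the sketch assumes that maintenance is calculated on the license fees actually paid. Only the two 20% rates come from the text; actual quotes come from IRI.

```python
from dataclasses import dataclass

# Rates taken from the text above; everything else here is an assumption.
ADDITIONAL_LICENSE_DISCOUNT = 0.20  # extra production licenses bought with the first
MAINTENANCE_RATE = 0.20             # annual renewal as a share of the license fee(s)


@dataclass(frozen=True)
class ShieldQuote:
    """Hypothetical container for a rough perpetual-use estimate."""
    license_total: float
    annual_maintenance: float


def estimate_shield_quote(base_price: float, licenses: int) -> ShieldQuote:
    """Estimate one-time and annual costs for `licenses` production licenses.

    The first license is charged at `base_price` (a hypothetical figure);
    each additional license procured at the same time is discounted 20%.
    Assumption: maintenance is 20% of the license fees actually paid.
    """
    if licenses < 1:
        raise ValueError("at least one license is required")
    additional = licenses - 1
    discounted_price = base_price * (1 - ADDITIONAL_LICENSE_DISCOUNT)
    license_total = base_price + additional * discounted_price
    return ShieldQuote(
        license_total=round(license_total, 2),
        annual_maintenance=round(license_total * MAINTENANCE_RATE, 2),
    )
```

For example, with a made-up base price of 10,000, `estimate_shield_quote(10_000, 3)` gives a license total of 26,000 (one at full price plus two at 8,000 each) and annual maintenance of 5,200 under the stated assumption.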

Additional Pricing Considerations

1) Voracity also includes one IRI CellShield EE Excel-side license, plus the same 5 "SortCL" engines that support data masking, DB subsetting, IRI RowGen test data synthesis, CoSort-powered data integration (ETL) and migration, data quality, analytics, and much more. Voracity has higher tiers for broader deployments (e.g., Tier 2 up to 15 hostnames, Tier 3 up to 50, and so on) where the unit pricing drops in higher volumes, though you can only license in tiers, not per hostname, in the case of Voracity. Annual Voracity (OpEx) subscription licenses include support, whereas perpetual use (CapEx) licenses only include support in the first year.

2) For dynamic data masking applications through the FieldShield 'Sandkey' API library (SDK) or DarkShield REST API -- or for runtime decryption sites using static FieldShield or DarkShield engines without IRI Workbench, documentation, or support -- there is more deeply discounted, volume-tiered pricing. That is offered in a royalty covenant attached to a primary static data masking development license (which covers IRI support and the other features above). However, dynamic data masking performed on the same system(s) (hostname(s)) on which the static data masking license is already installed does not require additional licensing. Note that real-time (incremental, CDC-based) data masking via IRI Ripcurrent is only available (and included) with Voracity.

3) IRI is planning another, separately available proxy-based dynamic data masking system for database applications and pricing will be different, but discounted with the rest of FieldShield for data classification, discovery (searching), and static data masking. Similarly, IRI also offers premium bundled options for those interested in advanced DevOps and TDM through data and database virtualization; i.e., IRI-integrated solutions like Windocks (or Actifio or Commvault) or ValueLabs TDH (or Cigniti) for containerized or on-demand database/file instances scrubbed through Voracity masking, cleansing, subsetting or synthesis jobs.

4) See this FAQ regarding cases where additional or fewer hostname-specific licenses may be needed in the future.
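Because Voracity is licensed in tiers rather than per hostname (point 1 above), a hostname count maps to a tier, not to a unit count. The lookup below is a minimal sketch of that mapping. It assumes the first tier is capped at the five included engines and uses the 15- and 50-hostname caps given as examples; tiers beyond Tier 3 are not specified here, so the hypothetical `voracity_tier` function returns `None` for them instead of guessing.

```python
from bisect import bisect_left
from typing import Optional

# Upper hostname limits for Tiers 1-3, per the text above.
# Higher tiers exist ("and so on") but their limits are not stated here.
TIER_CAPS = [5, 15, 50]


def voracity_tier(hostnames: int) -> Optional[int]:
    """Return the smallest tier (1-3) covering `hostnames`, or None if the
    count exceeds the tiers described in this FAQ."""
    if hostnames < 1:
        raise ValueError("at least one hostname is required")
    index = bisect_left(TIER_CAPS, hostnames)  # first cap >= hostnames
    return index + 1 if index < len(TIER_CAPS) else None
```

Under these assumptions, 5 hostnames fit in Tier 1, 6 through 15 need Tier 2, and 16 through 50 need Tier 3.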

Please contact info@iri.com if you have further questions, or need help pricing an IRI data masking or management product for your environment.

What happens if I need more (or fewer) licenses for an IRI product already purchased (like CoSort or FieldShield), or under subscription (like Voracity)?

How much does the CoSort package cost?

How are IRI's other offerings priced?

IRI Workbench, the graphical IDE built on Eclipse, is included with each Windows or Linux hostname on which a licensed IRI software product (in this list) runs. A macOS version is also available on request and is free only to licensees of one of the products below.
IRI CoSort is priced for perpetual use per hostname according to installed RAM aboard x86 Windows and Linux systems, and the number of licensed threads (typically up to the number of available physical cores) on each. On AIX, Solaris, HP-UX and zLinux, prices vary by OEM hardware model.
IRI Voracity -- which also includes CoSort, NextForm, FieldShield, DarkShield, RowGen, and Ripcurrent -- is typically a subscription (one or five years) based only on the total number of SortCL-executing hostname licenses, which are usually database or ETL servers. There are additional charges for Hadoop and other integrated (premium feature) options available through our partners. Perpetual use pricing is also available on request.
IRI FACT (VLDB unload) and IRI RowGen (test data generation) are priced per hostname according to the number of CPU cores (threads) licensed for use on each. Note that FACT is supported in Voracity, but must be separately licensed.
IRI NextForm has multiple editions and prices. Find details here.
IRI FieldShield (static data masking) is based on the number of executing hostname licenses only, with higher costs on Linux and outside the US through IRI VARs. Runtime API calls to FieldShield SDK (dynamic data masking) functions are at various percentages of the full package price depending on volume.
IRI CellShield and IRI DarkShield pricing is based on the number of spreadsheets / documents that need protecting. However, DarkShield costs the same as FieldShield when used to mask data in a relational or NoSQL database.
IRI Data Masking as a Service (DMaaS) is performed at daily, hourly, or per-project rates.

Refer to the licensing information in each product section on this site for price ranges. Contact your IRI representative for more information and an NDA-confidential quotation for the use of any IRI software product, or for an IRI Professional Services engagement estimate.

How much is maintenance?

What does perpetual use mean?

+ ISV (OEM) Integration

Do you provide an SDK or IDE?

How can I tune large job performance on my customers' disparate platforms whose specifications are unknown?

How does the licensing work?

How long will it take to integrate CoSort?

How much are CoSort royalties?

Is CoSort tuning only global, user, or job specific?

What can I integrate?

What kind of performance (improvement) can I expect?

Will I pay for a development copy? How much?

How can CoSort improve my JOIN performance?

How can CoSort speed Oracle loads?

How does FACT improve extract performance?

How would CoSort replace SQL procedures?

We have hundreds of tables and thousands of columns defined in tables, though these are also represented as external files. How can we leverage SortCL without having to re-define these layouts by hand?

On	In
CP/M	1978
DOS	1980
Unix	1985
Linux	1990
Windows	1995
IBM i/Z	2000

Feature	Oracle Data Masking & Subsetting Pack	IRI FieldShield / Voracity	Reference URLs
Automatic Discovery of Sensitive Data and Relationships	Application Data Modeling automatically discovers columns from Oracle Database tables containing sensitive information based on built-in discovery patterns such as national identifiers, credit card numbers, and other personally identifiable information. It also automatically discovers parent-child relationships defined in the database.	The IRI Workbench graphical IDE, built on Eclipse, front-ends all data discovery -- DB profiling (which includes statistical information, integrity checking, and value searching), schema-wide PII searching and classification, and ER-diagramming functions for Oracle and other RDBs through IRI FieldShield. We are also working to implement machine-learned, NLP modeled named-entity recognition, which is already part of IRI DarkShield for PII search/mask ops for unstructured data.	https://www.iri.com/products/workbench/fieldshield-gui/profile

Extensive Masking Format Library and Application Templates	Centralized, extensive and customizable library of predefined masking formats such as national identifiers, credit card numbers, and other personally identifiable information facilitates ready-to-go masking formats. Downloadable masking templates for select versions of Oracle E-Business Suite and Oracle Fusion Applications further simplify the task of defining masking rules.	Same, and more. IRI users can select and re-use provided or custom-defined patterns for multiple NID, credit card, phone number, and other data formats for searching, classification, masking, and prototyping purposes. They can also make use of provided or custom-written test data generation functions to produce computationally valid NIDs. While it does not have downloadable templates specifically for Oracle, FieldShield users can define their own formats, choose from 14 different categories of data masking functions (or "roll their own"), and apply them to every version of Oracle.	https://www.iri.com/solutions/data-masking/static-data-masking AND https://www.iri.com/products/workbench/data-sources

Comprehensive Masking Transformations	Comprehensive masking transformations cater to different masking use cases, such as masking based on a condition, generating consistent masked outputs for a given input, and more.	IRI FieldShield provides more data masking functions (see URL above), and preserves consistency and referential integrity using ad hoc or stored rules matched to pattern-matched column names or pattern/value-matched data classes. In addition, IRI Workbench supports more job design and management options, along with compatible data management jobs in the same free Eclipse pane of glass.	https://www.iri.com/products/workbench/fieldshield-gui/apply-rules AND https://www.iri.com/products/workbench

Multi Factor Subsetting	Sophisticated subsetting techniques facilitate generating subsets of data based on goals such as percentage of database size and percentage of rows in a table or based on conditions like region, time, department and more.	A database subsetting wizard is included with IRI Voracity subscriptions; it automatically creates referentially correct, masked or unmasked subsets of any connected database schema. Conditions can be tailored in dialogs during wizard generation or in the scripts. In addition, robust test data generation (from scratch) with all kinds of realistic conditions can be custom-defined.	https://www.iri.com/solutions/test-data/db-subsetting AND https://www.iri.com/solutions/test-data#techniques AND https://www.iri.com/blog/test-data/making-realistic-test-data-production/

Fast, Secure and Heterogeneous	Masking and subsetting can be performed on a cloned copy of the original data, eliminating any overhead on production systems. Alternatively, masking and subsetting can be performed during database export, eliminating the need for staging servers. Masking and subsetting can be performed on data in non-Oracle databases by staging the data in an Oracle Database using the relevant Oracle Database Gateway.	The same, and more. Masked (or subset) targets are usually sent to alternative target tables, which can be created in IRI Workbench, that same pane of glass built on Eclipse for all IRI job design which also serves for cross-DB administration. Actual cloning and masking is also available by tying FieldShield masking jobs into Commvault or Actifio-driven Oracle DB snapshots. And, IRI Voracity users can do data transformation, migration, and cleansing in their masking jobs.	https://www.iri.com/solutions/data-integration/etl AND https://www.iri.com/blog/data-transformation2/creating-executing-sql-statements-in-iri-workbench/ AND https://www.iri.com/ftp9/pdf/FieldShield/Actifio-FieldShield-DBClones.pdf AND https://documentation.commvault.com/11.20/configuring_third_party_data_masking_for_oracle_database_clones.html AND https://www.iri.com/news/newsletters/4th-quarter-2019

18 ADDITIONAL DATA MASKING AND RELATED CAPABILITIES AVAILABLE ONLY IN IRI FIELDSHIELD / VORACITY
		More supported static data masking functions to choose from, including custom-defined functions	https://www.iri.com/solutions/data-masking/static-data-masking
		Ability (via DarkShield) to classify, search, and mask inside unstructured (LOB, XML, JSON, CSV, PDF, free text, image) columns inside RDBs like Oracle, too, using the same data classes and masking functions!	https://www.iri.com/blog/data-protection/darkshield-relational-databases/
		Fit-for-purpose job creation wizards for automatic single- and multi-source masking job and workflow generation	https://www.iri.com/services/training/courseware#governance
		Multiple job design, modification, execution (deployment), and sharing options	https://www.iri.com/services/training/courseware#workbench AND https://www.iri.com/blog/iri/iri-workbench/introduction-metadata-management-hub/
		Data classification wizard that incorporates search and masking rule matchers	https://www.iri.com/blog/data-protection/classify-mask-pii-in-databases-with-fieldshield/
		Database profiling and table or schema-wide PII search	https://www.iri.com/products/workbench/fieldshield-gui/profile
		Flat-file data profiling and PII search	https://www.iri.com/blog/iri/iri-workbench/flat-file-profiling/
		Dark data PII (unstructured file) search and reporting	https://www.iri.com/blog/migration/data-migration/unstructured-data-data-restructuring-wizard/
		Automatic application of data masking functions to data classes, table- or schema-wide	https://www.iri.com/products/workbench/fieldshield-gui/apply-rules AND https://www.iri.com/blog/data-protection/applying-field-rules-using-classification/ AND https://www.iri.com/blog/iri/iri-workbench/schema-pattern-search-data-class-association/ OR JUST https://www.iri.com/blog/data-protection/classify-mask-pii-in-databases-with-fieldshield/
		Support for ~ 160 data sources, including relational, file, NoSQL, cloud, big data, unstructured data, etc.	https://www.iri.com/products/workbench/data-sources
		Fit-for-purpose re-ID risk scoring and reporting wizard, plus additional data masking (blurring and bucketing) functions to further anonymize quasi-identifying data in otherwise masked sets, with training and certification services optionally available	https://www.iri.com/solutions/data-masking/hipaa AND https://www.iri.com/solutions/data-masking/hipaa/risk-score AND https://www.iri.com/solutions/data-masking/static-data-masking/blur AND https://www.iri.com/ftp9/pdf/FieldShield/HIPAA_Data_Certification_Course_Outline.pdf
		Structurally and referentially correct test data generation (from DDL info only) and population via built-in RowGen functionality	https://www.iri.com/solutions/test-data AND https://www.iri.com/products/rowgen
		Fully compatible metadata and job integration with other data management (ETL, migration, analytics, data quality, etc.) operations in the IRI Voracity platform's Eclipse IDE, IRI Workbench	http://www.iri.com/products/cosort/sortcl-metadata AND http://www.iri.com/products/voracity/technical-details#capabilities AND https://www.iri.com/products/workbench
		Software development kit (SDK) for API-level integration of masking functions in bespoke applications, for dynamic data masking, etc.	https://www.iri.com/blog/vldb-operations/fieldshield-sdk-2/
		Real-time use of API mask functions in triggers for inserts, etc.	https://www.iri.com/blog/data-protection/real-time-data-masking/
		Proxy-based dynamic data masking, access control and SQL activity auditing for multiple databases.	https://www.iri.com/blog/data-protection/proxy-dynamic-masking-fieldshield/
		Ability to run many of the same masking functions in Hadoop, interchangeably (without modification to FieldShield job scripts) in MapReduce 2, Spark, Spark Stream, Storm or Tez	https://www.iri.com/solutions/big-data/hadoop-optional
		Masking functions and IRI Workbench UI shared with sister products (also included in Voracity subscriptions): IRI CellShield EE for masking data in Excel, and IRI DarkShield for masking data in unstructured text files, Office & .pdf documents, images, faces, NoSQL DBs, etc.	https://www.iri.com/products/cellshield/cellshield-ee AND https://www.iri.com/products/darkshield

This kind of partner ...	... we'd expect or provide a:
vendor-neutral consultancy or analyst	NDA or MNDA
referral-only	reciprocal finder's fee MoU
resale	reseller agreement
OEM integration	royalty agreement
