Challenges
While Pentaho Data Integration (PDI) is a powerful tool for preparing and integrating data, it also has some shortcomings:
 
Slow Transforms
- Native sorts, joins, and aggregations may not run fast enough at high data volumes
 
Limited De-ID Features
- Cannot mask or encrypt data flowing through Kettle
 
Limited Test Data
- Cannot prototype ETL jobs without using production data
 
Solutions
PDI workflows can run system commands, so data can be processed by external tools without disrupting the existing workflow. IRI Voracity or its component software can help Pentaho users in the following ways:
 
Speed Transforms
- Use PDI's shell step to call an IRI CoSort job (e.g., SortCL script) to dramatically reduce sort, join, and aggregation times
- Run multiple jobs in one batch file
- Get results 14-16 times faster than Pentaho alone
  Blog: Using CoSort to Speed up the Sort Process in Pentaho
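
As a rough illustration of running several external jobs as one batch from a single PDI Shell step, the sketch below wraps them in a small Python script. The function name `run_batch` and the placeholder commands are hypothetical. In practice each entry would be the command line for an IRI job, and its exact syntax should be taken from the IRI documentation rather than from this example.

```python
import subprocess
import sys
from typing import List, Sequence


def run_batch(commands: Sequence[Sequence[str]]) -> List[int]:
    """Run each command in order and stop at the first non-zero exit code.

    Returns the exit codes of the commands that were actually run.
    """
    exit_codes: List[int] = []
    for cmd in commands:
        completed = subprocess.run(list(cmd))
        exit_codes.append(completed.returncode)
        if completed.returncode != 0:
            # Later jobs usually depend on earlier ones, so stop here.
            break
    return exit_codes


if __name__ == "__main__":
    # Placeholder commands (assumption): replace each entry with the
    # command line of a real external job, for example a sort job
    # followed by a join job.
    jobs = [
        [sys.executable, "-c", "print('sort job placeholder')"],
        [sys.executable, "-c", "print('join job placeholder')"],
    ]
    codes = run_batch(jobs)
    if any(code != 0 for code in codes):
        # A non-zero exit status lets the calling step detect the failure.
        raise SystemExit(1)
```

Pointing the Shell step at a script like this keeps the batch logic in one place, and the exit status tells the calling job whether every external step succeeded.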
 
Mask Your Data
- Run IRI FieldShield jobs from the Shell step in Pentaho to protect data at rest
- Mask, encrypt, encode, or otherwise protect data in the format you need
- Secure data at the field level
  Blog: Masking Data in Pentaho
 
Test Your Apps
- Run IRI RowGen to populate tables, files, and reports with synthetic test data that mimics production data
- Generate structurally and referentially correct DB test data for an entire enterprise data warehouse (EDW)
- Keep production data safe
  Blog: Creating Test Data for Pentaho
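
To make "referentially correct" concrete, here is a minimal plain-Python sketch, not RowGen itself, that generates a parent table and a child table whose foreign keys always point at existing parent rows. The table layout, column names, and the function `make_test_data` are assumptions chosen only for illustration.

```python
import random
from typing import Dict, List, Tuple


def make_test_data(
    n_customers: int, n_orders: int, seed: int = 0
) -> Tuple[List[Dict], List[Dict]]:
    """Build synthetic customers and orders with valid foreign keys."""
    if n_orders > 0 and n_customers < 1:
        raise ValueError("orders need at least one customer to reference")

    rng = random.Random(seed)  # seeded so test runs are repeatable

    # Parent table: every customer gets a unique primary key.
    customers = [
        {"customer_id": i, "name": f"Customer {i:04d}"}
        for i in range(1, n_customers + 1)
    ]
    customer_ids = [c["customer_id"] for c in customers]

    # Child table: each order references an existing customer_id,
    # which is what keeps the data referentially correct.
    orders = [
        {
            "order_id": j,
            "customer_id": rng.choice(customer_ids),
            "amount": round(rng.uniform(5.0, 500.0), 2),
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders


if __name__ == "__main__":
    customers, orders = make_test_data(n_customers=5, n_orders=20)
    print(len(customers), "customers,", len(orders), "orders")
```

Because no production values are read at any point, data generated this way can be loaded into a prototype ETL job without exposing real records.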




