CoSort Powers High-Volume ETL for Telco Data Warehouse
Reviewer
Beno Pradekso, ETL architect for Solusi247's telco engagements.
Background
Solusi247 is a Jakarta-based data warehouse consultancy specializing in the telecommunications industry. Solusi247 combines expertise in data integration and staging (as well as enterprise reporting) with domain knowledge of both fixed and mobile (especially GSM and CDMA) telecommunications. In Solusi's most recent client engagement, the central DW and reporting tasks involved extracts from MSC, IN, MMSC, GSN, billing systems, voucher management systems (prepaid services), CRM and other network data sources.
Platforms
Sun SF68000 with eight 900 MHz CPUs and 8GB of main memory. There are also 5TB storage SANs with four fibre channel I/O, three Blade servers with 3 GHz CPUs and 2GB of memory for binary-ASCII conversions, and one data collection machine.
Problem Solved
Our telco customer's MSC (switch) produces more than 50 million call detail records per day. The data must be transformed and loaded into a central database used by more than 600 customer care staff throughout Indonesia. The preprocessing and loading is performed between 1:00 a.m. and 4:00 a.m. - a three-hour window for resolving references such as type of call, operator code, area code (down to detail location) and call metrics (minutes and charges). Solusi uses our own product (SDTRAT) to convert the binary data into ASCII on the three Blade Servers - this takes 45 minutes. The final load takes 45 minutes. CoSORT must perform all of the data integration and staging between conversion and loading. That is, within the remaining portion of the staging window, CoSORT must filter, sort, join, aggregate and reformat all 50 million rows. Our customer is highly satisfied with our solution, particularly because of CoSORT's speed, fit and scalability.
Product Functionality
We can quickly design and execute the customized transformations that we need with CoSORT's sort control language (SortCL), a fourth-generation data definition and manipulation language. We only need to create our source layouts once, in centralized SortCL meta data repositories, and then simply refer to our field names in SortCL job scripts when remapping the input data through process to output (targets). We benefit from SortCL's ability to rapidly perform and combine filtering, sorting, merging, aggregating and reformatting in one or just a few job scripts and in one I/O pass.
Strengths
CoSORT's sort and join speed is incredible, beating every other database, ETL and sort tool on the market. Because the SortCL syntax is so logical and explicit, it has been easy for us to add more simultaneous processing during the client's same production window, such as grouping data according to type (incoming/outgoing, peak/off-peak, days/weekends, etc.).
Weaknesses
Our developers found it easier to code the more complex SortCL outer join scripts using word processors than via CoSORT's Java GUI for SortCL. Although SortCL coding is straightforward, we would prefer to use the GUI to automate our meta data creation.
Selection Criteria
Because of the enormous volume of call data moving through the ETL process, we had to find a solution that would be easy to set up, scale well (and not impact concurrent apps), and be flexible enough to meet our growing application requirements. We evaluated and benchmarked other sort-related and ETL tools, as well as our own 3GL programs, and found CoSORT to be not only the fastest flat-file manipulator on the market, but the best programmatically, functionally and financially.
Deliverables
We include CoSORT with our own Web-based portal application, which includes a warning/monitoring system, scheduling, workflow and reference-data maintenance. CoSORT's SortCL runs on the command-line, from a Java GUI, in batch or via program calls. Additionally, SortCL is only one of several UIs in CoSORT, a suite which also includes: utilities to convert various third-party data layouts and legacy sorts to SortCL; an interactive, menu-driven program for sort/merge operations (which can save the specs for batch runs); and approximately a dozen native plug-and-play replacements for slower, third-party sorts such as those within DataStage, Informatica, MF COBOL, SAS, Software AG Natural and the UNIX O/S.
Vendor Support
CoSORT support has been very helpful and responsive - throughout the world. The CoSORT engineers also volunteer technical suggestions to further improve our processing performance. Their support has really helped us grow our business as application providers and consultants.
Documentation
CoSORT's manual and online help systems are clear and comprehensive.