Introducing the ASN.1 Format and Support in IRI Voracity
Abstract
Abstract Syntax Notation One (ASN.1) is a language for describing the content and encoding of message data exchanged between computers (particularly in the telco industry). This is the first in a series of five articles covering the file format and the comprehensive new data engineering you can perform on ASN.1 files using IRI software.
Each file is described by an ASN.1 specification (also referred to as a schema) file that usually has an .asn extension. This human-readable metadata file defines every field in the message and is automatically processed in SortCL-compatible jobs in the IRI Voracity data management platform and its component products (CoSort, NextForm, FieldShield and RowGen).
ASN.1 CDRs
Every landline or cellular telephone call made over a Public Land Mobile Network (PLMN) creates one or more call records. These Call Detail Records (CDRs) or Usage Detail Records (UDRs) are generated by the mobile switching center (MSC) — the primary GSM/CDMA service delivery node responsible for routing voice calls, SMS, and other services.
CDRs contain information that the network operator uses for subscriber identification, call charging, services obtained, call routing, etc. After a “data collector” in the network switch captures the CDRs, they are usually converted by a “mediation” system from a binary format into a flat file format.
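As a rough illustration of what a mediation step does, the following Python sketch flattens binary records into CSV rows. The record layout, the tag-to-column mapping in FIELDS, and the sample bytes are all hypothetical and are not taken from TAP3 or any real switch format. The sketch also assumes single-byte tags and lengths under 128 bytes.

```python
import csv
import io

# Hypothetical field layout: context-specific tag byte -> flat column name.
FIELDS = {0x80: "caller", 0x81: "callee", 0x82: "duration_sec"}


def flatten(record: bytes) -> dict:
    """Turn one binary record (a run of short-form TLV elements) into a flat row."""
    row = {name: "" for name in FIELDS.values()}  # absent fields stay empty
    pos = 0
    while pos < len(record):
        tag, length = record[pos], record[pos + 1]
        value = record[pos + 2 : pos + 2 + length]
        pos += 2 + length
        name = FIELDS.get(tag)
        if name == "duration_sec":
            row[name] = str(int.from_bytes(value, "big"))
        elif name is not None:
            row[name] = value.decode("ascii")
        # Elements with unknown tags are skipped in this sketch.
    return row


# Two made-up records; the second one has no "callee" element.
records = [
    bytes([0x80, 3]) + b"111" + bytes([0x81, 3]) + b"222" + bytes([0x82, 1, 90]),
    bytes([0x80, 3]) + b"333" + bytes([0x82, 2, 0x01, 0x2C]),
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(FIELDS.values()), lineterminator="\n")
writer.writeheader()
for rec in records:
    writer.writerow(flatten(rec))
flat_text = out.getvalue()
print(flat_text)
```

A field that is missing from a record comes out as an empty column, which is one way a flat file can represent records whose fields are optional.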
CDRs are typically encoded in a format compliant with the ASN.1 standard. ASN.1 is a flexible framework for representing data structures in telecommunications and computer networks, with some adoption in vehicle traffic and financial transaction logs, too.
One example of a standardized ASN.1 schema is TAP3, which many CDRs follow.
Flat CDRs are used by downstream billing and analytic applications which need to consume that data. Indeed, Voracity and most data integration (ETL) and telco application service provider (ASP) operations have relied on mediation to convert and enrich the data first, because they cannot natively process the raw, binary ASN.1 formats themselves.
The reason is that ASN.1 is designed to stringently encode machine-generated data for transmission to an unspecified downstream processor. Mediation has been necessary because the consuming processor is not known in advance, and because the records are structured, macro-defined, and not human-readable.
In raw form, CDR data can have more than 700 fields, any of which may or may not appear in the actual runtime stream. And again, the character strings and values are encoded in octets that are not human-readable.
Until now.
Starting with CoSort Version 10.5 in 2021, IRI supports direct processing of any ASN.1-encoded file in the company’s core Sort Control Language (SortCL) data manipulation program. This allows users of SortCL-compatible programs (in Voracity, CoSort, FieldShield, NextForm, and RowGen) to process data such as native ASN.1 CDRs directly at runtime, without prior mediation.
To process an ASN.1 encoded file with any SortCL-compatible job script, the associated schema file is required as well, in order to get the proper structure of the data.
ASN.1 Protocols
There are many networking protocols and other types of data that follow specifications defined by ASN.1 schemas. Typically, there is a general standard — as there is for TAP3 — that should be followed. However, many telco and other vendors do not follow the specification properly, or they use a custom specification.
Because so many different specifications are possible, and even those defined by a standard such as TAP3 are not always followed, input files defined by an ASN.1 specification need to be interpreted dynamically. OSS Nokalva’s Compile-and-go library (CAGL) was chosen for integration to meet this requirement.
The CAGL requires an ASN.1 specification file (usually with the .asn extension), a corresponding data file, and the encoding rules of the data file. With those requirements, the CAGL can theoretically handle any type of ASN.1 specification and encoding rule dynamically. The CAGL has been integrated into IRI’s SortCL as an extension module and new process type that gets loaded dynamically.
ASN.1 Encoding Rules
The same data can be stored in an ASN.1 input file under different encoding rules. These range from BER (Basic Encoding Rules), a binary format that is not human-readable, to JER (JSON Encoding Rules) and XER (XML Encoding Rules), which produce human-readable JSON and XML text, respectively.
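To show how the same values look under a binary and a text encoding, the sketch below hand-builds a BER encoding of a small record and prints it next to a JSON rendering of the kind JER produces. The record, its field names (duration, callee), and the helper ber_element are hypothetical. Real encodings depend on the schema, and this sketch handles only short-form lengths and a small non-negative integer.

```python
import json


def ber_element(tag: int, content: bytes) -> bytes:
    """Build one BER element; short-form length only (content under 128 bytes)."""
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(content)]) + content


# Hypothetical record: duration is an INTEGER, callee is a UTF8String.
record = {"duration": 42, "callee": "5551234"}

ber = ber_element(
    0x30,  # universal tag for a constructed SEQUENCE
    # INTEGER; one content byte is enough for values 0..127
    ber_element(0x02, record["duration"].to_bytes(1, "big"))
    # UTF8String
    + ber_element(0x0C, record["callee"].encode("utf-8")),
)

# JSON object with field names as keys, in the style of JER output.
jer_like = json.dumps(record)

print(ber.hex())   # binary form, shown as hex
print(jer_like)    # text form
```

The hex string needs a decoder and a schema to interpret, while the JSON text can be read directly.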
PER (Packed Encoding Rules) are the most compact encoding rules. Unlike BER, PER does not send the tag portion of the Tag-Length-Value (TLV) triplet. In other words, the tag that indicates what kind of data follows is not present in the PER encoding; only the length of the value (where it is needed) and the value itself are present.
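To make the Tag-Length-Value layout used by BER concrete, here is a minimal Python reader for a single element. The function name read_tlv is an illustrative choice, and the sketch assumes single-byte tags and definite lengths only. A production decoder, such as the library integrated into SortCL, handles far more cases.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Read one BER element starting at offset.

    Returns (tag_byte, value_bytes, next_offset).
    """
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    first = data[offset + 1]
    offset += 2
    if first < 0x80:
        length = first  # short form: the byte is the length itself
    elif first == 0x80:
        raise ValueError("indefinite lengths are not handled in this sketch")
    else:
        count = first & 0x7F  # long form: low 7 bits give the number of length bytes
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("element is truncated")
    return tag, value, offset + length


# INTEGER 300 in BER: tag 0x02, length 0x02, value bytes 0x01 0x2C.
encoded = bytes([0x02, 0x02, 0x01, 0x2C])
tag, value, end = read_tlv(encoded)
number = int.from_bytes(value, "big", signed=True)
print(tag, number, end)
```

Because every BER element carries its own tag, a reader like this can walk a stream without knowing the schema. A PER stream omits the tags, so it cannot be interpreted without the schema.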
Octet Encoding Rules (OER) are the fastest ASN.1 encoding rules. In contrast to PER, OER favors encoding/decoding speed and ease of implementation over compactness of the encodings. In a SortCL script, if no encoding rules are specified, Basic Encoding Rules will be assumed.
The TAP3 CDR standard is the current version of the Transferred Account Procedure (TAP) standard. According to the GSMA (Groupe Spécial Mobile Association):
Roaming is a key feature of GSM, giving consumers seamless same-number contactability in over a hundred countries. Operators exchange call event details on these roaming subscribers. TAP is the process that allows a visited network operator (VPMN) to send billing records of roaming subscribers to their respective home network operator (HPMN). TAP3 is the latest version of the standard and enables billing for a host of new services that networks can offer.
This image displays a snippet of data from a file encoded according to the standard TAP3 ASN.1 schema, output to Excel with column headers using a SortCL script.
ASN.1 Support in IRI Software
ASN.1 support will be documented in the IRI product manuals for CoSort, NextForm, FieldShield and RowGen, and throughout the IRI Workbench GUI where their jobs — and those of the umbrella Voracity platform that includes them and performs ETL, etc. — are designed. IRI professional services can help you create CDR data integration, masking, migration, billing, and analytic reports using SortCL. Other ASN.1-compatible data is handled through the same mechanism.
Note that you can also use this facility to pre-process your own forms of unstructured and semi-structured data that is defined with an ASN.1 schema, and feed it to SortCL. Consider not only the data processing you can do with those sources directly, but how you can combine multiple sources of disparate ‘big data’ in a fast, low-cost data integration environment. See the next article in this series for more details.
Wrap Up
This article gave an overview of the basics of ASN.1. Future articles in this series will cover examples, like that displayed in the image below, to show what’s possible with ASN.1 data in Voracity data discovery, integration, migration, governance, and analytic operations.
This image from IRI Workbench shows a visualization of the mapping and transforms performed on the data from its initial input(s) to output(s). This example shows a job that does basic reporting of raw data encoded by a TAP3 schema into readable output in text and XLSX file formats.
The next article in this series on ASN.1 support in IRI software explains how data encoded according to an ASN.1 schema is handled. The articles that follow this one, in order, are:
- ASN.1 Integration with SortCL
- SortCL ASN.1 Examples
- Using IRI Workbench with ASN.1 encoded data
- Gaining insight from Call Detail Records