How to Mask Data in Web Logs
This article is the third in a three-part series on CLF and ELF web log data. We first introduced the CLF and ELF web log formats, then IRI solutions for processing web log data, and conclude here by masking private data in web log files. Note that this article specifically addresses masking structured web log files with IRI FieldShield; other web, application, and device log formats that are semi-structured or unstructured can be scanned for PII and masked using IRI DarkShield. The IRI Voracity data management platform includes both tools, plus the ability to cleanse, manipulate, remap, and report from various log and message formats.
Web log files are created by, and stored on, web servers to track visitors’ clickstream activity. Some of the information in these logs is sensitive or personally identifiable.
As we know from articles in the data masking sections of IRI’s blog and website, there are multiple ways to shield personally identifiable information (PII) or otherwise sensitive data in structured sources. String masking, for example, covers over (or redacts) original values using other characters. Encryption, on the other hand, produces ciphertext that de-identifies the original value, but allows its restoration (decryption).
IRI FieldShield software protects PII in databases and many other data sources — including web logs — with multiple field-level security functions. FieldShield can mask, encrypt, delete, or randomize IP addresses, along with other items subject to data protection and privacy laws. FieldShield also supports pseudonymization, hashing, redaction, and sub-string manipulation of data in structured file formats.
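To make two of the functions listed above more concrete, here is a minimal Python sketch of keyed hashing and table-based pseudonymization applied to IP addresses. It is an illustration only and not FieldShield's implementation: the key value, the `visitor-` label format, and the names `hash_ip` and `Pseudonymizer` are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical secret for this sketch; a real key would come from a key store.
SECRET_KEY = b"example-key-not-for-production"


def hash_ip(ip, key=SECRET_KEY):
    """Return a keyed, one-way SHA-256 digest of an IP address as hex text."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


class Pseudonymizer:
    """Give each distinct value a stable stand-in label (hypothetical format)."""

    def __init__(self, prefix="visitor-"):
        self._prefix = prefix
        self._table = {}

    def replace(self, value):
        # The same input always maps to the same label within one run.
        if value not in self._table:
            self._table[value] = f"{self._prefix}{len(self._table) + 1:04d}"
        return self._table[value]


if __name__ == "__main__":
    pseudo = Pseudonymizer()
    for ip in ["32.09.130.15", "96.47.227.21", "32.09.130.15"]:
        print(ip, "->", pseudo.replace(ip), hash_ip(ip)[:16])
```

Hashing is one-way, so the original address cannot be restored from the digest, while the pseudonym table could be kept (and protected) if values ever need to be mapped back.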
Consider the sample Extended Log Format (ELF) file below. Each record contains the visit date, time, client IP address, server IP address, request method, port, status code, number of bytes transferred, and the URL of the page opened:
2014-05-24,12:55:15,32.09.130.15,96.48.225.22,GET,80,200,10801,"http://www.iri.com/products/fieldshield/why-is-fieldshield-better"
2014-05-24,20:55:15,96.47.227.21,96.46.220.42,GET,80,200,10801,"http://www.iri.com/solutions/data-masking/encryption/format-preserving-encryption"
2014-05-24,22:18:01,12.41.114.23,96.45.225.98,GET,80,200,10801,"http://www.iri.com/solutions/data-masking/de-identification/overview"
2014-05-24,13:15:06,96.46.230.79,96.47.126.99,GET,80,200,10801,"http://www.iri.com/products/workbench/fieldshield-gui/apply-rules"
2014-05-24,23:15:06,96.45.226.19,95.47.214.50,GET,80,200,10801,"http://www.iri.com/blog/data-protection/data-risk-fieldshield-mitigation/"
2014-05-25,23:15:22,11.11.111.11,95.47.214.50,GET,80,200,10801,"http://www.iri.com/blog/test-data/rowgen-v3-automates-database-test-data-generation/"
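For readers who want to inspect records like these outside of FieldShield, the following is a minimal Python sketch that reads two of the sample records with the standard `csv` module. It assumes the delimiter is a comma and that URLs are double-quoted, as the sample appears here; the column names in `FIELDS` are labels chosen for this sketch, not names defined by the log format.

```python
import csv
import io

# Labels chosen for this sketch, in the order the sample records list values.
FIELDS = ["date", "time", "c_ip", "s_ip", "method",
          "port", "status", "bytes", "url"]

# Two records copied from the sample log above.
SAMPLE = (
    '2014-05-24,12:55:15,32.09.130.15,96.48.225.22,GET,80,200,10801,'
    '"http://www.iri.com/products/fieldshield/why-is-fieldshield-better"\n'
    '2014-05-25,23:15:22,11.11.111.11,95.47.214.50,GET,80,200,10801,'
    '"http://www.iri.com/blog/test-data/'
    'rowgen-v3-automates-database-test-data-generation/"\n'
)


def parse_log(text):
    """Parse comma-delimited log text into a list of dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(text))
    # Skip blank lines; csv.reader strips the quotes framing the URL.
    return [dict(zip(FIELDS, row)) for row in reader if row]


if __name__ == "__main__":
    for record in parse_log(SAMPLE):
        print(record["c_ip"], record["url"])
```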
Use the Encryption and Decryption dialog in the IRI Workbench GUI for FieldShield to apply field-level encryption. Below is an example of encrypting each visitor’s IP address with a format-preserving AES-256 function:
Similar dialogs exist for string masking, pseudonymization, randomization, hashing, de-ID, encoding, etc.
The portable FieldShield job script, created automatically in the GUI (or by hand, if you prefer), reflects both field encryption and redaction:
/INFILE=rawlog.elf
    /PROCESS=ELF
    /FIELD=(DATE, POSITION=1, TYPE=ASCII, SEPARATOR=" ")
    /FIELD=(TIME, POSITION=2, TYPE=ASCII, SEPARATOR=" ")
    /FIELD=(C_IP, POSITION=3, SEPARATOR=" ", TYPE=IP_ADDRESS)
    /FIELD=(S_IP, POSITION=4, SEPARATOR=" ", TYPE=IP_ADDRESS)
    /FIELD=(CSMETHOD, POSITION=5, TYPE=ASCII, SEPARATOR=" ")
    /FIELD=(S_PORT, POSITION=6, SEPARATOR=" ")
    /FIELD=(STATUS, POSITION=7, SEPARATOR=" ")
    /FIELD=(BYTES, POSITION=8, SEPARATOR=" ")
    /FIELD=(CS_URI_STEM, POSITION=9, SEPARATOR=" ", TYPE=ASCII, FRAME='"')
    /OMIT WHERE C_IP EQ "11.11.111.11"

/OUTFILE=maskedlog.elf
    /PROCESS=ELF
    /HEADREC="DATE TIME MASKED IP CS_URI_STEM\n\n"
    /FIELD=(DATE, POSITION=1, SIZE=12, TYPE=ASCII)
    /FIELD=(TIME, POSITION=15, SIZE=10, TYPE=ASCII)
    /FIELD=(ENC_AES256_C_IP=enc_fp_aes256_alphanum(C_IP), POSITION=30, SIZE=12, TYPE=IP_ADDRESS)
    /FIELD=(replace_chars(CS_URI_STEM, "*", 7, 8, "#", 30, 8), POSITION=45, SIZE=55, TYPE=ASCII)
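The record filter and the fixed-position output layout in this script can be approximated in Python. The sketch below covers only those two aspects and leaves the IP and URL values unmasked. It assumes that positions are 1-based and that each value is truncated or space-padded to its SIZE; the names `keep`, `place`, and `layout` are hypothetical helpers for this illustration, not FieldShield functions.

```python
def keep(record):
    """Mirror the /OMIT condition: drop records from client 11.11.111.11."""
    return record["c_ip"] != "11.11.111.11"


def place(fields):
    """Build one fixed-width line from (value, position, size) triples.

    Positions are 1-based; values are cut or space-padded to their size
    (an assumption about the layout, made for this sketch).
    """
    width = max(pos - 1 + size for _, pos, size in fields)
    line = [" "] * width
    for value, pos, size in fields:
        cell = value[:size].ljust(size)
        line[pos - 1:pos - 1 + size] = cell  # overwrite exactly `size` slots
    return "".join(line).rstrip()


def layout(record):
    """Arrange one record using the positions and sizes from the output section."""
    return place([
        (record["date"], 1, 12),
        (record["time"], 15, 10),
        (record["c_ip"], 30, 12),   # left unmasked in this sketch
        (record["url"], 45, 55),    # left unmasked in this sketch
    ])


if __name__ == "__main__":
    rec = {
        "date": "2014-05-24",
        "time": "12:55:15",
        "c_ip": "32.09.130.15",
        "url": "http://www.iri.com/products/fieldshield/why-is-fieldshield-better",
    }
    if keep(rec):
        print(layout(rec))
```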
Running the script produces the desired ELF-style output in fixed positions. The record for client IP 11.11.111.11 is omitted, the client IP field is passed through the format-preserving encryption function, and parts of each URL are redacted, which supports compliance with privacy regulations.
DATE TIME MASKED IP CS_URI_STEM

2014-05-24 12:55:15 32.09.130.15 http:/********.com/products/f########ld/why-is-fieldshi
2014-05-24 13:15:06 05.07.569.95 http:/********.com/products/w########/fieldshield-gui/a
2014-05-24 20:55:15 98.68.117.52 http:/********.com/solutions/########king/encryption/fo
2014-05-24 22:18:01 69.67.212.32 http:/********.com/solutions/########king/de-identifica
2014-05-24 23:15:06 42.01.555.73 http:/********.com/blog/data-########on/data-risk-field
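The URL redaction seen in this output can be reproduced with a short Python sketch. The function below is a stand-in written for this article, not the FieldShield function itself; it assumes, based on the output shown, that each group of three arguments means mask character, 1-based start position, and number of characters to overwrite, and that the result is then cut to the 55-character field size.

```python
def replace_chars(value, *spans):
    """Overwrite character ranges in a string.

    Each span is (mask_char, start, length) with a 1-based start. This
    argument order is an assumption inferred from the sample output.
    """
    chars = list(value)
    for mask, start, length in spans:
        begin = start - 1
        end = min(begin + length, len(chars))  # do not run past the end
        for i in range(begin, end):
            chars[i] = mask
    return "".join(chars)


def mask_url(url, size=55):
    """Apply the two redactions from the job script, then cut to the field size."""
    return replace_chars(url, ("*", 7, 8), ("#", 30, 8))[:size]


if __name__ == "__main__":
    url = "http://www.iri.com/products/fieldshield/why-is-fieldshield-better"
    print(mask_url(url))
    # prints: http:/********.com/products/f########ld/why-is-fieldshi
```

Because the masked ranges are fixed character positions rather than matched patterns, the same job masks different parts of the path depending on how long the leading URL segments are.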
See additional formatting, filtering, transformation, and calculation functions in the previous article in this series, CLF and ELF Web Log Data Processing. Contact fieldshield@iri.com for assistance.