Cytomic Data Watch (Personal data monitoring)

Files with Personally Identifiable Information (PII) are files that contain information that can be used to identify individuals related to the organization (for example, customers, employees, and suppliers). This information can include different types of data, such as social security numbers, phone numbers, and email addresses.

Cytomic Data Watch is the Advanced EDR security module that enables companies to comply with data protection regulations, such as the GDPR. It also monitors and improves the visibility of personal data (PII) stored in an organization's IT infrastructure.

To achieve this, Cytomic Data Watch provides three key features:

  • It generates a complete, daily inventory of the PII files found on the network, along with basic information such as their name, extension, and the name of the computer where the file was detected.

  • It discovers, audits, and monitors the entire life cycle of PII files in real time: from data at rest, to data in use (the operations taken on personal data), and data in motion (data exfiltration).

  • It provides tools to perform flexible, content-based searches and delete duplicate personal data files to limit their presence across the network.

Introduction to Cytomic Data Watch operation

To fully understand the processes involved in the discovery and monitoring of the personal data stored across an organization, you must be familiar with some concepts associated with the technologies used by Cytomic Data Watch.

Entity

Each word or group of words with its own meaning that refers to a certain type of personal information is called an 'entity'. Entities include personal ID numbers, first and last names, phone numbers, and others.

Given the highly ambiguous and variable nature of natural language, each entity can have different formats depending on the language, so the detection of personally identifiable information requires flexible, adaptable algorithms. Generally, analyzing entities consists of applying a set of predefined formats or expressions to the data and using the local context surrounding the detection, as well as the presence or absence of certain keywords, to avoid false positives. For more information, see Supported entities and countries.
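The following Python sketch illustrates the general idea of combining a predefined format with nearby keywords. It is not the detection logic used by Cytomic Data Watch: the pattern, the keyword list, and the context window size are all assumptions chosen for illustration.

```python
import re

# Hypothetical format for a US-style social security number (illustration only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical keywords whose presence near a match supports the detection.
SSN_KEYWORDS = ("ssn", "social security")
# Assumed size of the local context window on each side of a match.
CONTEXT_CHARS = 40


def find_ssn_entities(text):
    """Return format matches that have a supporting keyword in their local context."""
    lowered = text.lower()
    entities = []
    for match in SSN_PATTERN.finditer(text):
        start = max(0, match.start() - CONTEXT_CHARS)
        end = min(len(text), match.end() + CONTEXT_CHARS)
        context = lowered[start:end]
        # Keep the match only if a keyword appears nearby; otherwise treat it
        # as a likely false positive (for example, an order reference).
        if any(keyword in context for keyword in SSN_KEYWORDS):
            entities.append(match.group())
    return entities
```

With this sketch, the text "Employee SSN: 123-45-6789" yields one entity, whereas "Order reference 123-45-6789 shipped" yields none because no supporting keyword appears near the match.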

PII file

After an entity is identified, the context in which it appears is evaluated to determine if the information it provides is enough to identify a specific person. If it is, the file can be protected with specific processing and access protocols that enable the organization to comply with the applicable legislation (GDPR, PCI, etc.). This evaluation process leverages a monitored machine learning model and a mature model based on the analysis of entities and the global context of documents to finally classify a file with detected entities as a PII file to protect.

Unstructured files and IFilter components

Cytomic Data Watch scans unstructured files (text files with different formats, spreadsheets, PowerPoint presentation files, etc.) searching for entities and classifying files as PII files or non-PII files. However, to correctly interpret the content of unstructured files, certain third-party components must be installed on user computers. These components are called IFilters and are not part of the Advanced EDR installation package. Microsoft Search, Microsoft Exchange Server, and Microsoft SharePoint Server, along with other operating system and third-party product services, use IFilters to index user files and enable content-based searches.

Each file format supported by Cytomic Data Watch has its own associated IFilter component, and many of them come preinstalled with the Windows operating system. However, other components must be manually installed or updated.

The Microsoft Filter Pack is a free single point-of-distribution for Office IFilters. After it is installed, it enables Cytomic Data Watch to parse the content of all file formats supported by the Microsoft Office productivity suite. For more information, see Microsoft Filter Pack Component.

Indexing process

This consists of inspecting and storing the contents of all files supported by Cytomic Data Watch to generate an inventory of PII files and search the content of these files. The indexing process has little impact on computer performance, but does require a significant amount of time. You can schedule the start of the indexing task or limit its scope to expedite the process and improve the results returned by searches. For more information, see The indexing process.
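As a rough illustration of why indexing enables content-based searches, the Python sketch below builds a simple inverted index that maps each word to the files that contain it. The actual index format used by Cytomic Data Watch is not described here, so the function names and the in-memory structure are assumptions.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file paths that contain it.

    `documents` is a hypothetical dict of {file path: extracted text}.
    """
    index = defaultdict(set)
    for path, content in documents.items():
        for word in re.findall(r"\w+", content.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the files that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Because the words are stored once at indexing time, a later search only has to look up each query word instead of reading every file again.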

Normalization process

When performing an indexing process, Cytomic Data Watch applies a number of rules to homogenize the indexed data. The aim of this process is to store each word individually, which increases its chances of being found and reduces search times. The rules applied during the normalization process vary depending on whether the content to store is an entity or plain text. For more information, see Search requirements and properties.
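The actual normalization rules are described in Search requirements and properties. The Python sketch below only illustrates the principle that entities and plain text can be normalized differently; the specific rules shown (lowercasing, splitting into words, removing separators) are assumptions, not the product's rules.

```python
import re


def normalize_plain_text(text):
    """Assumed rule for plain text: lowercase it and store each word individually."""
    return re.findall(r"\w+", text.lower())


def normalize_entity(value):
    """Assumed rule for entities: drop separators so different formats compare equal."""
    return re.sub(r"[\s\-.()]", "", value).lower()
```

Under these assumed rules, "555-123 4567" and "(555) 123.4567" are both stored as "5551234567", so a search for either format finds the same entity.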

PII file inventory

After a computer is indexed and all entities and PII files are identified, Cytomic Data Watch generates an inventory, accessible to you, with the names of the files and their characteristics. This inventory is sent to the Advanced EDR server once a day. For more information, see PII file inventory.

Cytomic Data Watch does not send the contents of files with PII to the Advanced EDR server. It only sends their attributes (name, extension, etc.) and the number and type of found entities.
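To make the distinction between file attributes and file contents concrete, the Python sketch below models an inventory record that carries only metadata and entity counts. The field names and structure are hypothetical and do not represent the format the product actually sends.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class InventoryRecord:
    """Hypothetical inventory entry: attributes and entity counts, never file contents."""
    computer: str
    file_name: str
    extension: str
    entity_counts: dict = field(default_factory=dict)


def build_record(computer, file_name, detected_entity_types):
    """Summarize a file from its name and the types of the entities found in it."""
    extension = file_name.rsplit(".", 1)[-1] if "." in file_name else ""
    counts = dict(Counter(detected_entity_types))
    return InventoryRecord(computer, file_name, extension, counts)


record = build_record("PC-01", "clients.xlsx", ["email", "email", "phone"])
print(asdict(record))
```

The record stores how many entities of each type were found, but not the entity values themselves or any other text from the file.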

File searches

Cytomic Data Watch finds files by name, extension, or content on the indexed storage drives of computers on the network.

Searches run in real time. As soon as you run a search task, you start to see results from the target computers. For more information, see File searches.

Monitoring of the actions taken on PII files

Cytomic Data Watch monitors the events that affect PII files and sends them to the Cytomic Insights console. This tool shows how PII files on the network change over time, enabling you to see whether they have been copied, moved, emailed, etc. For more information about Cytomic Insights, see the Cytomic Data Watch Administration Guide at https://info.cytomicmodel.com/resources/guides/DataWatch/en/DATAWATCH-guide-EN.pdf.