Show HN: Greenmask 0.2 – Database anonymization tool

Dump anonymization and synthetic data generation tool

Greenmask is a powerful open-source utility designed for logical database backup dumping,
anonymization, synthetic data generation and restoration. It uses ported PostgreSQL libraries, which makes it reliable.
It is stateless and does not require any changes to your database schema. It is designed to be highly customizable,
fast, reliable and backward-compatible with existing PostgreSQL utilities.

Greenmask has a Playground. It is a sandbox environment in Docker that includes sample databases,
so you can try Greenmask without any additional setup.

  1. Clone the greenmask repository and navigate to its directory by running the following commands:

    git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask
  2. Once you have cloned the repository, start the environment by running Docker Compose:

    docker-compose run greenmask

Key features:

  • Deterministic transformers – data is transformed deterministically using hash functions.
    The same input data therefore always produces the same output data. Almost every transformer
    supports either a random or a hash engine, so it can be used in any scenario.
  • Dynamic parameters – almost every transformer supports dynamic parameters. These let a transformer
    take its parameters from a table column value at run time. This helps resolve functional dependencies
    between columns and satisfy constraints.
  • Transformation validation and easy maintenance – during configuration, Greenmask provides validation
    warnings, a data transformation diff and a schema diff. Together they let you monitor and maintain
    transformations throughout the software lifecycle. The schema diff helps prevent data leakage when the
    schema changes.
  • Partitioned tables transformation inheritance – define transformation configurations once and apply them
    to all partitions within partitioned tables (using the apply_for_inherited parameter). This simplifies
    the anonymization process.
  • Stateless – Greenmask operates as a logical dump and does not affect your existing database schema.
  • Cross-platform – Greenmask can be built and run on any platform. Its Go-based architecture
    eliminates platform dependencies.
  • Database type safe – ensures data integrity by validating data and by using the database driver for
    encoding and decoding operations. This approach preserves data formats.
  • Backward compatible – fully supports the same features and protocols as existing vanilla PostgreSQL utilities.
    Dumps created by Greenmask can be restored with the pg_restore utility.
  • Extensible – users can implement domain-specific transformations in any programming language
    or use predefined templates.
  • Integrable – integrates into your CI/CD system for automated database anonymization and
    restoration.
  • Parallel execution – dumps and restores in parallel, which significantly reduces the time
    needed to produce results.
  • Variety of storages – offers several options for local and remote data storage,
    including directories and S3-compatible storage.
  • Pgzip support for faster compression – setting --pgzip speeds up the dump and restoration
    processes through parallel compression.
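
The idea behind a deterministic (hash) engine can be shown with a small Go sketch. This is not Greenmask's implementation. The HMAC-SHA256 choice, the salt, the firstNames dictionary and the hashPick function are all assumptions made for illustration. The sketch only demonstrates the property described above: the same input with the same secret always maps to the same output.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// firstNames is a hypothetical replacement dictionary used only in this sketch.
var firstNames = []string{"Alice", "Bob", "Carol", "Dave", "Erin", "Frank"}

// hashPick (hypothetical helper) deterministically maps input to an entry of dict.
// The salt acts as a secret key: without it the mapping cannot be recomputed.
func hashPick(salt []byte, input string, dict []string) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(input))
	sum := mac.Sum(nil)
	// Interpret the first 8 bytes of the digest as an unsigned integer index.
	idx := binary.BigEndian.Uint64(sum[:8]) % uint64(len(dict))
	return dict[idx]
}

func main() {
	salt := []byte("example-secret-salt")
	first := hashPick(salt, "john.smith@example.com", firstNames)
	second := hashPick(salt, "john.smith@example.com", firstNames)
	// Same input and same salt: the output is identical on every run.
	fmt.Println(first, first == second)
}
```

Keeping the salt secret matters. With a plain, unsalted hash, anyone who can guess candidate inputs could recompute the mapping and reverse the anonymization.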

Greenmask is well suited to several scenarios, including:

  • Backup and restoration. Use Greenmask in your daily routines for logical backup dumping and restoration. It
    handles tasks such as restoring a table after truncation. Its functionality closely mirrors that of pg_dump
    and pg_restore, making it a straightforward replacement.
  • Anonymization, transformation and data masking. Use Greenmask to anonymize, transform and mask
    backups, especially when setting up a staging environment or for analytical purposes. It simplifies deploying
    a pre-production environment with consistently anonymized data, which shortens time-to-market in the development
    lifecycle.

The most appropriate way to perform logical backup dumping and restoration is to use
the core PostgreSQL utilities, namely pg_dump and pg_restore. Greenmask has been purposefully designed to
align with these native utilities, ensuring compatibility. Greenmask itself handles data dumping
operations and delegates schema dumping and restoration to pg_dump and pg_restore, respectively,
so it integrates with PostgreSQL's standard tools.

Greenmask uses the directory format of pg_dump and pg_restore. This format is particularly suitable for
parallel execution and partial restoration. It also includes clear metadata files that help determine the backup and
restoration steps. Greenmask has been optimized to work with remote storage systems and anonymization
procedures.

Greenmask supports the following storage types:

  • s3 – supports any S3-compatible storage system, including AWS S3. This makes it adaptable to
    a wide range of cloud-based storage solutions.
  • directory – the standard choice, representing an ordinary filesystem directory for local storage.

Data Anonymization and Validation

Greenmask works with COPY lines. It collects schema metadata using the Golang driver and uses the same driver for
encoding and decoding. The validate command lets you assess the impact on both the schema
(validation warnings) and the data (transforming it and displaying the differences). With this command you can validate
schema and data transformations and confirm the desired outcome of the anonymization process.
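
To make "works with COPY lines" more concrete, here is a simplified Go sketch of the kind of processing involved. It decodes a line in PostgreSQL's COPY text format (tab-separated fields, \N for NULL, backslash escapes), replaces one column and re-encodes the line. It also prints a per-column difference, loosely in the spirit of the data diff described above. The decoder handles only the \N, \\, \t, \n and \r escapes. The column names and the masking rule are hypothetical, and this is not Greenmask's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is one decoded COPY value; Null marks the \N marker.
type Field struct {
	Value string
	Null  bool
}

// decodeCopyLine splits a COPY text-format line into fields.
// Simplified: only the \N, \\, \t, \n and \r escapes are handled.
func decodeCopyLine(line string) []Field {
	raw := strings.Split(line, "\t") // literal tabs in data are escaped, so raw tabs delimit
	fields := make([]Field, len(raw))
	for i, r := range raw {
		if r == `\N` {
			fields[i] = Field{Null: true}
			continue
		}
		var b strings.Builder
		for j := 0; j < len(r); j++ {
			if r[j] == '\\' && j+1 < len(r) {
				j++
				switch r[j] {
				case 't':
					b.WriteByte('\t')
				case 'n':
					b.WriteByte('\n')
				case 'r':
					b.WriteByte('\r')
				default:
					b.WriteByte(r[j]) // covers \\ (and passes other escapes through)
				}
				continue
			}
			b.WriteByte(r[j])
		}
		fields[i] = Field{Value: b.String()}
	}
	return fields
}

// copyEscaper re-escapes the characters that decodeCopyLine unescapes.
var copyEscaper = strings.NewReplacer(`\`, `\\`, "\t", `\t`, "\n", `\n`, "\r", `\r`)

// encodeCopyLine is the inverse of decodeCopyLine for the handled escapes.
func encodeCopyLine(fields []Field) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		if f.Null {
			parts[i] = `\N`
			continue
		}
		parts[i] = copyEscaper.Replace(f.Value)
	}
	return strings.Join(parts, "\t")
}

func main() {
	columns := []string{"id", "email", "note"} // hypothetical table columns
	orig := decodeCopyLine("42\tjohn@example.com\t\\N")
	masked := append([]Field(nil), orig...)
	masked[1] = Field{Value: "user42@example.test"} // hypothetical masking rule
	for i, col := range columns {
		if orig[i] != masked[i] {
			fmt.Printf("%s: %q -> %q\n", col, orig[i].Value, masked[i].Value)
		}
	}
	fmt.Println(encodeCopyLine(masked))
}
```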

If your table schema relies on functional dependencies between columns, you can handle them with
dynamic parameters. For example, dynamic parameters can resolve the created_at and updated_at case, where
updated_at must be greater than or equal to created_at.

If you need to implement custom logic imperatively, use the
TemplateRecord or
Template transformers.
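
As a rough illustration of template-driven values, the Go sketch below uses the standard text/template package to derive one column from other fields of the same record. The record type, its fields, the lower helper and the template text are hypothetical. Greenmask's own template functions and record access syntax are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// record is a hypothetical row passed to the template.
type record struct {
	ID        int
	FirstName string
	LastName  string
}

// emailTmpl derives a fake email from other columns of the same row.
// Funcs must be registered before Parse so the template can call lower.
var emailTmpl = template.Must(template.New("email").Funcs(template.FuncMap{
	"lower": strings.ToLower,
}).Parse(`{{ lower .FirstName }}.{{ lower .LastName }}{{ .ID }}@example.test`))

// renderEmail executes the template against one record.
func renderEmail(r record) (string, error) {
	var b strings.Builder
	if err := emailTmpl.Execute(&b, r); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	email, err := renderEmail(record{ID: 7, FirstName: "Ada", LastName: "Lovelace"})
	if err != nil {
		panic(err)
	}
	fmt.Println(email) // ada.lovelace7@example.test
}
```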

Greenmask provides a framework for creating your own custom transformers, which can be reused efficiently. These
transformers can be integrated without recompilation because they interact with Greenmask through a PIPE
(stdin/stdout).

Furthermore, Greenmask's architecture is designed to be highly extensible. This makes it possible to introduce other
interaction protocols, such as HTTP or sockets, for running anonymization procedures.

PostgreSQL Version Compatibility

Greenmask is compatible with PostgreSQL versions 11 and higher.

Sample databases used by the project:

  • The Demo database, provided by PostgresPro, is used for integration
    testing.
  • The adventureworks database, created
    by morenoh149/postgresDBSamples, is used in the Docker Compose playground.