Show HN: Greenmask 0.2 – Database anonymization tool
Greenmask is a powerful open-source utility designed for logical database backup dumping,
anonymization, synthetic data generation and restoration. It uses ported PostgreSQL libraries, which makes it reliable.
It is stateless and does not require any changes to your database schema. It is designed to be highly customizable,
backward-compatible with existing PostgreSQL utilities, fast and reliable.
Greenmask has a Playground – a sandbox environment in Docker with
sample databases included, so you can try Greenmask without any additional setup.
- Clone the greenmask repository and navigate to its directory by running the following commands:
  git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask
- Once you have cloned the repository, start the environment by running Docker Compose:
  docker-compose run greenmask
- Deterministic transformers – a deterministic approach to data transformation based on hash
  functions. This ensures that the same input data always produces the same output data. Almost every transformer
  supports either the random or the hash engine, making it suitable for any use case.
- Dynamic parameters – almost every transformer supports dynamic parameters, which let you parametrize the
  transformer dynamically from a table column value. This is helpful for resolving functional dependencies
  between columns and satisfying constraints.
- Transformation validation and easy maintenance – During the configuration process, Greenmask provides validation
  warnings, a data transformation diff and a schema diff, allowing you to monitor and maintain transformations
  effectively throughout the software lifecycle. The schema diff helps to avoid data leakage when the schema changes.
- Partitioned tables transformation inheritance – Define transformation configurations once and apply them to all
  partitions within partitioned tables (using the apply_for_inherited parameter), simplifying the anonymization process.
- Stateless – Greenmask operates as a logical dump and does not impact your existing database schema.
- Cross-platform – Can be easily built and executed on any platform, thanks to its Go-based architecture,
  which eliminates platform dependencies.
- Database type safe – Ensures data integrity by validating data and using the database driver for
  encoding and decoding operations. This approach guarantees the preservation of data formats.
- Backward compatible – It fully supports the same features and protocols as the existing vanilla PostgreSQL utilities.
  Dumps created by Greenmask can be successfully restored using the pg_restore utility.
- Extensible – Users have the flexibility to implement domain-based transformations
  in any programming language or use predefined templates.
- Integrable – Integrate seamlessly into your CI/CD system for automated database anonymization and
  restoration.
- Parallel execution – Take advantage of parallel dumping and restoration, significantly reducing the time required
  to deliver results.
- Variety of storages – Offers a variety of storage options for local and remote data storage,
  including directories and S3-like storage solutions.
- Pgzip support for faster compression – Setting --pgzip can speed up the dump and restoration
  processes through parallel compression.
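To illustrate the difference between the hash and random engines mentioned in the feature list, here is a minimal Python sketch. It is not Greenmask's implementation: the function names, the salt, and the replacement list are all hypothetical, and HMAC-SHA256 is simply one reasonable choice of keyed hash.

```python
import hashlib
import hmac
import random

# Hypothetical pool of replacement values, used only for this illustration.
FAKE_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin"]


def mask_hash(value: str, salt: bytes = b"example-salt") -> str:
    """Deterministic engine: the same input and salt always map to the same output."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def mask_random(value: str, rng: random.Random) -> str:
    """Random engine: the output does not depend on the input value."""
    return rng.choice(FAKE_NAMES)
```

With the hash variant, a value that appears in several tables is replaced by the same substitute everywhere, which keeps joins on that column consistent after anonymization.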
Greenmask is ideal for various scenarios, including:
- Backup and Restoration. Use Greenmask for your daily routines involving logical backup dumping and restoration. It
  seamlessly handles tasks like table restoration after truncation. Its functionality closely mirrors that of pg_dump
  and pg_restore, making it a straightforward replacement.
- Anonymization, Transformation, and Data Masking. Employ Greenmask for anonymizing, transforming, and masking
  backups, especially when setting up a staging environment or for analytical purposes. It simplifies the deployment of
  a pre-production environment with consistently anonymized data, facilitating faster time-to-market in the development
  lifecycle.
The most appropriate approach for executing logical backup dumping and restoration is to leverage
the core PostgreSQL utilities, specifically pg_dump and pg_restore. Greenmask has been purposefully designed to
align with PostgreSQL's native utilities, ensuring compatibility. Greenmask handles data dumping
operations itself and delegates schema dumping and restoration to pg_dump and pg_restore,
maintaining seamless integration with PostgreSQL's standard tools.
Greenmask uses the directory format of pg_dump and pg_restore. This format is particularly suitable for
parallel execution and partial restoration, and it includes clear metadata files that help in determining the backup and
restoration steps. Greenmask has been optimized to work seamlessly with remote storage systems and anonymization
procedures.
- s3 – This option supports any S3-like storage system, including AWS S3, making it versatile and adaptable to
  various cloud-based storage solutions.
- directory – This is the standard choice, representing an ordinary filesystem directory for local storage.
Greenmask works with COPY lines, collects schema metadata using the Golang driver, and employs this driver in the
encoding and decoding process. The validate command offers a way to assess the impact on both the schema
(validation warnings) and the data (transformation and displaying differences). This command lets you validate
the schema and data transformations, ensuring the desired outcomes during the anonymization process.
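The idea of a data transformation diff can be sketched in a few lines of Python. This is an illustration only, not the output or internals of the validate command: the helper names are hypothetical, and the decoder assumes PostgreSQL's COPY text format with tab-separated fields and \N for NULL, ignoring backslash escape sequences for brevity.

```python
def decode_copy_line(line, columns):
    """Decode one COPY text-format line into a column -> value mapping."""
    fields = line.rstrip("\n").split("\t")
    # In COPY text format, the two-character sequence \N stands for NULL.
    return {col: (None if field == "\\N" else field)
            for col, field in zip(columns, fields)}


def diff_rows(original, transformed):
    """Return only the columns whose value changed, as (before, after) pairs."""
    return {col: (original[col], transformed[col])
            for col in original
            if original[col] != transformed[col]}
```

Reviewing only the changed columns makes it easier to confirm that sensitive fields were transformed and that everything else was left untouched.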
If your table schema relies on functional dependencies between columns, you can address this challenge using
Dynamic parameters. By setting dynamic
parameters, you can resolve cases such as created_at and updated_at, where
updated_at must be greater than or equal to created_at.
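The created_at and updated_at case can be sketched as follows. This is a hedged Python illustration of the concept, not Greenmask's configuration syntax or code: the function names and the date bounds are assumptions. The lower bound for updated_at is taken from the already generated created_at value of the same row, which is what a dynamic parameter expresses.

```python
import random
from datetime import datetime, timedelta

# Hypothetical static bounds for the generated timestamps.
LOWER_BOUND = datetime(2020, 1, 1)
UPPER_BOUND = datetime(2024, 1, 1)


def random_timestamp(rng, min_ts, max_ts):
    """Return a random timestamp in the closed interval [min_ts, max_ts]."""
    span_seconds = int((max_ts - min_ts).total_seconds())
    return min_ts + timedelta(seconds=rng.randint(0, span_seconds))


def transform_row(row, rng):
    """Replace both timestamps while keeping updated_at >= created_at."""
    masked = dict(row)
    masked["created_at"] = random_timestamp(rng, LOWER_BOUND, UPPER_BOUND)
    # Dynamic lower bound: read from the row's own created_at column.
    masked["updated_at"] = random_timestamp(rng, masked["created_at"], UPPER_BOUND)
    return masked
```

Generating the two columns independently would violate the constraint for roughly half of the rows, so the dependency has to be expressed per row.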
If you need to implement custom logic imperatively, use the
TemplateRecord or
Template transformers.
Greenmask provides a framework for creating your own custom transformers, which can be reused efficiently. These
transformers can be integrated without recompilation, thanks to the PIPE (stdin/stdout)
interaction.
Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other
interaction protocols, such as HTTP or Socket, for conducting anonymization procedures.
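A transformer that communicates over stdin/stdout can be as small as the Python sketch below. The wire format here is an assumption made for illustration (one JSON object per line in each direction), as are the function names and the masking rule. Consult the Greenmask documentation for the actual protocol.

```python
import json


def transform(record):
    """Hypothetical masking rule: replace a non-NULL email with a synthetic one."""
    masked = dict(record)
    if masked.get("email") is not None:
        masked["email"] = "user{}@example.com".format(masked.get("id"))
    return masked


def run(stdin, stdout):
    """Read one JSON record per line, write the transformed record per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stdout.write(json.dumps(transform(json.loads(line))) + "\n")
        stdout.flush()  # the parent process may wait for each reply

# As a standalone program, the entry point would be: run(sys.stdin, sys.stdout)
```

Because the only contract is a stream of lines, the same transformer could be written in any language that can read stdin and write stdout, which is why no recompilation of the main tool is needed.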
Greenmask is compatible with PostgreSQL versions 11 and higher.
- Documentation
- Email: support@greenmask.io
- Telegram
- Discord
- DockerHub
- Utilized the Demo database, provided by PostgresPro, for integration
  testing purposes.
- Employed the adventureworks database created by morenoh149/postgresDBSamples
  in the Docker Compose playground.