An Intelligent Data Anonymization Framework for Privacy-Preserving Sensitive Information Protection
Keywords:
Abstract
In the digital era, organizations regularly handle large volumes of sensitive information, including personal identifiers, financial records, healthcare details, system logs, and login credentials. Protecting such information is essential for maintaining user trust and for supporting compliance with privacy regulations such as GDPR, HIPAA, and the Indian DPDP Act. Traditional redaction and basic masking techniques often reduce the usefulness of data and may not balance privacy protection with data usability. To address this issue, this paper presents SmartMask, an intelligent framework for privacy-aware masking of sensitive information. The proposed system combines policy-driven masking techniques with an integrated secure password vault to protect confidential data while preserving its practical value for analysis, sharing, and testing. The framework supports multiple masking strategies, including full masking, partial masking, format-preserving substitution, numeric noise injection, and complete field removal. A configurable policy engine allows users to define, save, and reuse masking rules across different datasets and application domains. The system operates completely offline and supports multiple data formats, including CSV, Excel, JSON, and log files. An enhanced PII detection module uses pattern recognition and context-aware text analysis to identify emails, phone numbers, IP addresses, credit card numbers, bank details, and other personally identifiable information. Sensitive credentials are stored in a password vault encrypted with AES-256, providing an additional layer of security. The application includes a Flask- and Bootstrap-based web interface with drag-and-drop file upload, real-time masking preview, interactive dashboards, audit logs, and compliance-ready reports. The generated reports include before-and-after masking comparisons, detected sensitive fields, applied policies, and processing summaries.
Designed to run on standard student laptops without internet dependency, the proposed framework demonstrates a practical privacy-preserving solution that balances confidentiality, compliance support, and operational usability.
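The masking strategies and policy-driven application described in the abstract can be illustrated with a minimal Python sketch. It is not SmartMask's actual implementation: the function names, the policy dictionary layout, the strategy labels ("full", "partial", "substitute", "noise", "remove"), and the email pattern are all assumptions made for illustration. In particular, the format-preserving substitution shown here is a simple random replacement within each character class, not a cryptographic format-preserving encryption scheme.

```python
import random
import re
import string

# Hypothetical pattern for email detection; real PII detection needs more rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def full_mask(value, char="*"):
    """Full masking: replace every character."""
    return char * len(value)


def partial_mask(value, visible=4, char="*"):
    """Partial masking: keep only the last `visible` characters."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def format_preserving_substitute(value, rng):
    """Replace digits with digits and letters with letters of the same case.

    Separators such as '-' or '@' are kept, so the overall format survives.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def add_numeric_noise(value, rng, relative_scale=0.05):
    """Numeric noise injection: perturb by at most +/- relative_scale * |value|."""
    return value + rng.uniform(-relative_scale, relative_scale) * abs(value)


def apply_policy(record, policy, rng):
    """Apply a per-field strategy; fields absent from the policy are kept."""
    masked = {}
    for field, value in record.items():
        strategy = policy.get(field, "keep")
        if strategy == "remove":
            continue  # complete field removal
        if strategy == "full":
            masked[field] = full_mask(str(value))
        elif strategy == "partial":
            masked[field] = partial_mask(str(value))
        elif strategy == "substitute":
            masked[field] = format_preserving_substitute(str(value), rng)
        elif strategy == "noise":
            masked[field] = add_numeric_noise(float(value), rng)
        else:
            masked[field] = value
    return masked


def mask_emails_in_text(text):
    """Pattern-based masking for free text such as log lines."""
    return EMAIL_RE.sub(lambda m: full_mask(m.group(0)), text)


# Hypothetical reusable policy: field name -> strategy label.
policy = {
    "password": "full",
    "card": "partial",
    "phone": "substitute",
    "salary": "noise",
    "notes": "remove",
}
```

A policy of this shape could be saved as JSON and reused across datasets, which is one plausible reading of the configurable policy engine. Passing a seeded `random.Random` instance to `apply_policy` makes the substitution and noise steps reproducible during testing.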