The 2020 Verizon Data Breach Investigations Report (DBIR) says that nearly half (45%) of breaches featured hacking, often tied to web application vulnerabilities, and that web application breaches more than doubled year over year.
The report also attributes 22% of breaches to social and malware attacks, 17% to misconfigurations, 8% to unauthorized access and 4% to physical attacks.
Hosting and running a web application requires several components. A basic environment includes at least web server software (such as Apache or IIS), a server operating system (such as Windows, Linux or macOS), a database server (such as MySQL, MSSQL or PostgreSQL) and a network-based service such as FTP or SFTP.
For a secure web server, all of these components must be protected to ensure that sensitive data is properly secured. If security breaks down at any point, malicious attackers can gain access to the web application and retrieve or tamper with data in the database.
Sensitive data can be any sort of information that needs to be protected from unauthorized access to safeguard the privacy or security of an individual or organisation. It can include any information pertaining to:
Passwords
Passphrases
Encryption keys
OAuth tokens
Credit card numbers
Personal contact information such as names, phone numbers, email addresses, user accounts, physical addresses, etc.
Demographic information such as gender, age, income, education, ethnicity
In some states and countries: machine-identifying information such as MAC and IP addresses, serial numbers, etc.
Sensitive data also includes personally identifiable information (PII) and high business impact (HBI) data. What counts as sensitive data varies a lot from country to country, and so do the requirements for storing and securing it. Compliance standards such as the Payment Card Industry Data Security Standard (PCI DSS) require special measures to be taken when collecting sensitive data in order to stay compliant.
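As one concrete example of such a measure, PCI DSS restricts how much of a card number may be displayed. The sketch below masks a card number so that only its last digits remain visible before it is shown or logged. The helper name `mask_pan` and the default of four visible digits are assumptions for illustration; the exact display rules should be checked against the current PCI DSS text.

```python
def mask_pan(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask a card number, keeping only the last `visible` digits (hypothetical helper)."""
    # Keep ASCII digits only, so spaces or dashes in the input are dropped.
    digits = "".join(ch for ch in card_number if ch in "0123456789")
    if len(digits) <= visible:
        raise ValueError("card number too short to mask")
    return mask_char * (len(digits) - visible) + digits[-visible:]

print(mask_pan("4111 1111 1111 1111"))  # ************1111
```

Masking only controls what is displayed; stored card numbers still need protection at rest.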
Alongside infrastructure security at the network, host and application levels, data security has become increasingly important. Data security includes the security of:
Data-in-transit
Data-at-rest
The right storage mechanism must also be chosen for this data. A good storage mechanism persists information reliably, reduces bandwidth use and improves responsiveness.
The data model is a subset of the implementation model that describes the logical and physical representation of persistent data in the system. Common data models for web storage are:
Structured: Structured data conforms to a tabular format with relationships between rows and columns, as is typical of SQL database management systems. It suits flexible and dynamic queries, where the full range of query types may not be known a priori. Example: IndexedDB.
Key/Value: Key/value datastores and related NoSQL databases use an associative array as the fundamental data model, where each key is associated with one and only one value in a collection. Examples: the Cache API in the browser, Apache Cassandra on the server.
Byte Streams: File systems and other hierarchically organized stores keep data as variable-length strings of bytes, leaving any internal organization to the application layer. Examples: file systems and cloud storage services.
Storage methods for web applications can also be evaluated by the scope over which data is persisted:
Session Persistence: Data is persisted only as long as a single web session or browser tab remains active. Example: Session Storage API.
Device Persistence: Data is persisted across sessions and browser tabs/windows on a particular device. Example: Cache API.
Global Persistence: Data is retained across sessions and devices. It is the most robust form of data persistence. It can’t be stored on the device itself, so server-side storage is needed. Example: Google Cloud Storage.
Client-side storage lets web applications store different types of data on the client, with the user’s permission, and retrieve it whenever needed. This makes it possible to persist data for long-term storage, save sites or documents for offline use, keep user-specific settings for the site, and more.
Data can be stored on the client in several ways, such as session storage, local storage, cookies, WebSQL, Application Cache and IndexedDB.
The SessionStorage object stores data temporarily, and the data is cleared when the page session ends. Because SessionStorage is tab-specific, it is not accessible from web workers or service workers. It is limited to about 5 MB and can contain only strings. It may be useful for storing small amounts of session-specific information, for example an IndexedDB key.
The LocalStorage object stores data for the entire site persistently, until it is explicitly cleared. Like SessionStorage, it is not accessible from web workers or service workers, is limited to about 5 MB and can contain only strings. LocalStorage should be avoided because it is synchronous and blocks the main thread.
Cookies are sent with every HTTP request, so storing data in them significantly increases the size of web requests. They are synchronous and are not accessible from web workers. Like LocalStorage and SessionStorage, cookies can contain only strings. Cookies have their uses, but they are not a good choice for storage.
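When a cookie is used anyway, for example to carry a session identifier, a safer pattern is to keep the sensitive data on the server and put only an opaque random token in the cookie, with attributes that limit its exposure. The sketch below uses Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later); the cookie name `session_id` and the one-hour lifetime are assumptions.

```python
import secrets
from http.cookies import SimpleCookie

def build_session_cookie(max_age_seconds: int = 3600) -> str:
    """Build a Set-Cookie value for a hypothetical `session_id` cookie."""
    cookie = SimpleCookie()
    # Only an opaque, unguessable identifier goes into the cookie;
    # the sensitive data it refers to stays in server-side storage.
    cookie["session_id"] = secrets.token_urlsafe(32)
    morsel = cookie["session_id"]
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    morsel["secure"] = True        # sent only over HTTPS
    morsel["httponly"] = True      # hidden from page JavaScript (document.cookie)
    morsel["samesite"] = "Strict"  # not sent on cross-site requests
    return morsel.OutputString()

if __name__ == "__main__":
    print("Set-Cookie: " + build_session_cookie())
```

Each call produces a fresh token, so a leaked cookie reveals nothing beyond a session handle that the server can revoke.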
Support for WebSQL has been removed from almost all major browsers. The W3C stopped maintaining the Web SQL spec in 2010, with no further updates planned. WebSQL should not be used, and existing usage should be migrated to IndexedDB.
Application Cache (AppCache) has been deprecated, and support will be removed from browsers in the future. Application Cache should not be used, and existing usage should be migrated to service workers and the Cache API.
Unlike most modern APIs, which are promise-based, IndexedDB is event-based. Promise wrappers for IndexedDB, such as idb, hide some of its powerful features but, more importantly, they hide the complex machinery (e.g. transactions, schema versioning) that comes with the IndexedDB API. IndexedDB is a low-level API that requires significant setup before use, which can be particularly painful for storing simple data.
Data storage is usually handled server-side, on physical hard drives, USB drives or virtually in the cloud. Backed-up files remain available even if systems crash beyond repair.
There are three broad types of data storage: direct-attached storage (DAS), network-attached storage (NAS) and storage area network (SAN).
DAS is a storage system in which servers are directly connected to the storage device. Applications access data on DAS through block-level access protocols. Common devices in this category include:
Hard Drives
Solid-State Drives (SSD)
CD/DVD Drives
Flash Drives
Network-attached storage (NAS) is a file-level data storage server connected to a computer network. It provides dedicated file serving and sharing over the network, and improves performance and reliability with features such as RAID and swappable drives designed for heavier multi-drive workloads.
A storage area network (SAN) is a dedicated, high-performance storage network that transfers block-level data between servers and storage devices. SANs are typically used in data centers, enterprises and virtual computing environments.
A computer storage device is any type of hardware that stores data, retaining information short-term or long-term. It can be located inside or outside a computer or server.
A hard disk drive (HDD), also called a fixed disk, is a non-volatile data storage device attached to a computer or server. It magnetically stores and retrieves digital data using a stack of rotating platters coated with magnetic material. The platters are paired with an actuator arm whose heads read and write the data.
A solid-state drive (SSD) is a storage device that uses integrated circuit assemblies, typically flash memory, to store and retrieve data, and functions as secondary storage in the computer storage hierarchy. It offers fast data transfer and a smaller physical size than a disk array.
An optical disc drive reads, and in writable models writes, the common Compact Disc (CD) and Digital Versatile Disc (DVD) formats. CD drives were once commonly built into computers. A DVD holds more data than a CD and can therefore be used for a wider variety of media and storage.
Hybrid flash arrays combine flash memory drives and hard disk drives for balanced performance. They use form factors and electrical interfaces that are compatible with common HDD bays, and offer low startup cost, reasonable performance for the cost and fast data access on demand.
Hybrid cloud storage is an approach to managing cloud storage that uses both local and off-site resources. It offers a secure, compliant option that helps assure business continuity, accommodating frequent backups, long-term archives, future scaling and always-on availability. Combining cloud and on-premises storage adds a layer of safety to ensure data is protected and available, and storage capacity can grow almost without limit.
Backup software consists of computer programs that perform backups by creating additional exact copies of files, databases or entire computers. Software for system and enterprise backups typically comes with a license or a subscription billed monthly or annually.
A backup appliance combines backup software and hardware components in a single device. Configuration can be complicated, and misconfiguration or incorrect software tuning can put reliability at risk.
Cloud-based (online) storage solutions offer virtual data storage, storing data on the internet through a cloud computing provider rather than only on a local computer or external hard disk. The provider manages the storage and is responsible for data availability and accessibility. Reliability tends to be high, but organizations need to define a cloud storage security strategy before adopting it.
The OWASP Top 10 is a list of the 10 most critical web application security risks, describing their impact and countermeasures, and it is updated every three to four years. At the time of writing, the latest list was released in 2017:
Injection
Broken Authentication
Sensitive Data Exposure
XML External Entities (XXE)
Broken Access Control
Security Misconfiguration
Cross-Site Scripting (XSS)
Insecure Deserialization
Using Components with Known Vulnerabilities
Insufficient Logging and Monitoring
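Injection, first on the list, is usually prevented by never building queries from raw user input. The minimal sketch below uses Python's built-in `sqlite3` module to contrast string concatenation with a parameterized query; the `users` table, the function names and the payload are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: the input becomes part of the SQL text itself.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # The ? placeholder passes the value separately from the SQL text,
    # so the database treats it strictly as data.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"
print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
print(find_user_safe(conn, payload))    # no user has that literal name: []
```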
User authentication plays a key role in many data protection principles, as it is essential to meeting security, access, consent and accountability requirements.
Maintaining confidentiality, integrity and availability is fundamental to data security. Users, and even communicating systems, are authenticated through various mechanisms, most of which are built on cryptography.
Authentication of users takes several forms, but all are based on the combination of authentication factors: something an individual knows (such as a password), something they possess (such as a security token), or some measurable quality (such as a fingerprint).
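Because passwords are themselves sensitive data, a server should store only a salted, deliberately slow hash of each password, never the password itself. The minimal sketch below uses PBKDF2 from Python's standard `hashlib`; the iteration count and 16-byte salt are assumptions to tune against current guidance, and dedicated algorithms such as Argon2 or bcrypt from a maintained library are common alternatives.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; raise it as hardware gets faster
SALT_BYTES = 16

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = secrets.token_bytes(SALT_BYTES)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)

if __name__ == "__main__":
    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))  # True
    print(verify_password("wrong guess", salt, digest))                   # False
```

Storing the iteration count next to each hash would let it be raised later without invalidating existing passwords.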
Single-factor authentication is based on only one authentication factor. Stronger authentication requires additional factors; for instance, two-factor authentication combines two different factors (such as a PIN and a fingerprint).
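A common "something they possess" factor is a time-based one-time password (TOTP, RFC 6238) produced by an authenticator app. The sketch below builds TOTP on top of HOTP (RFC 4226) using only the standard library; the 30-second step, 6 digits and the tolerance of one step in either direction are common defaults, assumed here rather than required.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with a counter derived from Unix time."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1, digits: int = 6) -> bool:
    """Accept codes up to `window` steps before or after now, to tolerate clock drift."""
    now = time.time() if at is None else at
    given = submitted.encode("utf-8")
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step, digits).encode("ascii"), given)
        for drift in range(-window, window + 1)
    )
```

With the RFC 6238 test secret `b"12345678901234567890"`, `totp(secret, at=59, digits=8)` returns `"94287082"`. Provisioning the shared secret to the user's device is outside this sketch.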
Access controls are generally described as discretionary or non-discretionary. The most common access control models are:
Discretionary Access Control (DAC) is a type of access control system that assigns access rights based on rules specified by users. Permission management can be difficult to maintain, and DAC does not scale well beyond a small set of users.
Role-Based Access Control (RBAC), also known as non-discretionary access control, assigns rights based on organizational roles rather than individual user accounts, and the access policy is determined by the system. A subject can access an object or execute a function only if the permissions of its role allow it.
Mandatory Access Control (MAC) uses a hierarchical approach to control access to files and resources. A subject’s label specifies its level of trust, and an object’s label specifies the level of trust required to access it. To gain access to an object, the subject’s label must dominate the object’s label, that is, be at least as high. The access policy is determined by the system.
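The three models differ mainly in who sets the policy: the resource owner (DAC), a system-wide role table (RBAC) or system-assigned labels (MAC). The sketch below contrasts them in plain Python. All user names, roles, permissions and levels are hypothetical, DAC is simplified so that only the owner may grant rights, and the category set in the MAC label is an assumption borrowed from the common lattice-based formulation of dominance.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, FrozenSet, Set

# DAC: the owner of a resource decides who gets which rights.
@dataclass
class OwnedResource:
    owner: str
    acl: Dict[str, Set[str]] = field(default_factory=dict)  # user -> rights

    def grant(self, granter: str, user: str, right: str) -> None:
        if granter != self.owner:
            raise PermissionError(f"{granter} does not own this resource")
        self.acl.setdefault(user, set()).add(right)

    def dac_allows(self, user: str, right: str) -> bool:
        return user == self.owner or right in self.acl.get(user, set())

# RBAC: the system maps roles to permissions and users to roles.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"report:read"},
    "editor": {"report:read", "report:write"},
    "admin": {"report:read", "report:write", "user:manage"},
}
USER_ROLES: Dict[str, Set[str]] = {"alice": {"admin"}, "bob": {"viewer"}}

def rbac_allows(user: str, permission: str) -> bool:
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)

# MAC: the system assigns labels; the subject's label must dominate the object's.
class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

@dataclass(frozen=True)
class Label:
    level: Level
    categories: FrozenSet[str] = frozenset()

def dominates(subject: Label, obj: Label) -> bool:
    # At least as high a level, and holding every category the object carries.
    return subject.level >= obj.level and subject.categories >= obj.categories
```

Note that in the RBAC part no user can change the policy, and in the MAC part access is decided purely by comparing labels.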
There are several ways to encrypt data at rest. The main forms are:
Full disk encryption - A cryptographic method that encrypts the entire hard drive, including data, files, the operating system and software programs. This brute-force approach raises performance and reliability concerns: if encryption is not done at the drive hardware level, it degrades system performance, and even minor disk corruption can be fatal because the OS, applications and data may all become unrecoverable.
Directory level (or Filesystem) - Entire data directories are encrypted or decrypted as a container, and encryption keys are required to access the files inside. This is used to segregate data of identical sensitivity or categorization into directories that are each encrypted with a different key.
File level - Only specific files with sensitive data are encrypted rather than encrypting an entire hard drive or even a directory. It can be more efficient to encrypt individual files.
Application level - Entire files or specific data fields are encrypted by the application before they are stored. The application manages encryption and decryption of the data it handles.
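Directory-level encryption depends on each directory having its own key. One way to obtain independent keys from a single master key is HKDF (RFC 5869), sketched below with Python's standard `hmac` module. The `dir-key:` label and binding the directory path into the derivation are hypothetical choices; in practice the master key would come from a key management service, and the encryption itself should use a vetted authenticated cipher (such as AES-GCM from a maintained library), which Python's standard library does not provide.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 cannot produce that many bytes")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def directory_key(master_key: bytes, directory: str) -> bytes:
    """Derive a separate 256-bit key per directory (hypothetical labeling scheme)."""
    return hkdf_sha256(master_key, salt=b"", info=b"dir-key:" + directory.encode("utf-8"))

master = bytes(32)  # placeholder only; a real master key comes from a KMS or HSM
print(directory_key(master, "/srv/finance") != directory_key(master, "/srv/hr"))  # True
```

Because the derivation is one-way, leaking one directory's key does not expose the master key or the keys of other directories.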
The goal of securing data in motion is to prevent its confidentiality, integrity and availability from being compromised. To protect data in motion:
Implement a security framework for data by enforcing end-to-end encryption, strong authentication, automation of file-based tasks, rule and policy management, secure ad hoc file transfers by users, guaranteed delivery, integration with existing security controls, etc.
Restrict cloud sharing/alternative transfer methods
Identify critical assets and vulnerabilities
The most common way to protect data in motion is to utilize encryption combined with authentication to create a conduit to safely pass data.
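TLS is the usual form of such a conduit: it encrypts the channel and, through certificate verification, authenticates the server. The minimal client-side sketch below uses Python's `ssl` module; the TLS 1.2 minimum is an assumed policy, and `fetch_status_line` is a hypothetical helper.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA certificates and
    # turns on certificate verification and hostname checking.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context

def fetch_status_line(host: str, port: int = 443) -> str:
    """Send a HEAD request over TLS and return the HTTP status line (hypothetical helper)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # server_hostname enables SNI and is the name the certificate must match.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            return tls_sock.recv(1024).decode("latin-1").split("\r\n", 1)[0]
```

Calling `fetch_status_line("example.com")` fails with an `ssl.SSLCertVerificationError` rather than silently continuing if the server's certificate cannot be verified.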
In summary, choosing the right mechanism is important for storing and securing sensitive data. However, the right storage mechanism alone cannot guarantee that sensitive data is secure; the application itself must also be properly secured. If the application is vulnerable, an attacker can retrieve sensitive data much more easily.