WO2016186602A1 - Deletion prioritization - Google Patents

Deletion prioritization

Info

Publication number
WO2016186602A1
Authority
WO
WIPO (PCT)
Prior art keywords
data entity
subsets
subset
storage locations
information content
Prior art date
Application number
PCT/US2015/030937
Other languages
French (fr)
Inventor
Dave DONAGHY
Ben SIMPSON
John Butt
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/030937
Publication of WO2016186602A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162: Delete operations

Definitions

  • Computer systems include storage devices to store parts of a data entity such as a file, database table, memory page, communication stream, etc., for subsequent retrieval.
  • a data stream may be written and/or stored as separate parts in multiple storage locations.
  • FIG. 1 is a block diagram of an example deletion prioritization device
  • FIG. 2A is a flowchart of an example of a method for deletion prioritization
  • FIG. 2B is a flowchart of an example of a method for a deletion operation.
  • FIG. 3 is a block diagram of an example system for deletion prioritization.
  • Deletion prioritization may be used when a data entity is requested to be deleted. As a data stream is stored, it may be written to multiple physical and/or logical storage locations (e.g., different hard drives or memory segments). Deletion prioritization assigns an order to the storage locations so that if the deletion operation is interrupted, the highest priority of these locations will have been deleted.
  • machine-readable storage medium refers to any electronic, magnetic, optical, or other physical storage device that stores executable instructions or other data (e.g., a hard disk drive, random access memory, flash memory, etc.).
  • FIG. 1 is a block diagram of an example deletion prioritization device 100 consistent with disclosed implementations.
  • Deletion prioritization device 100 may comprise a processor 110 and a non-transitory machine-readable storage medium 120.
  • Deletion prioritization device 100 may comprise a computing device such as a server computer, a desktop computer, a laptop computer, a handheld computing device, a smart phone, a tablet computing device, a mobile phone, or the like.
  • Device 100 may further comprise a data storage 150.
  • Processor 110 may comprise a central processing unit (CPU), a semiconductor-based microprocessor, or any other hardware device suitable for retrieval and execution of instructions stored in machine-readable storage medium 120.
  • processor 110 may fetch, decode, and execute a plurality of receive request instructions 130, assign priority instructions 132, and delete data entity instructions 134 to implement the functionality described in detail below.
  • Device 100 may further comprise a plurality of data storage locations 150(A)-(D).
  • Executable instructions may be stored in any portion and/or component of machine-readable storage medium 120.
  • the machine-readable storage medium 120 may comprise volatile and/or nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power.
  • the machine-readable storage medium 120 and data storage locations 150(A)-(D) may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, and/or a combination of any two and/or more of these memory components.
  • the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), and/or magnetic random access memory (MRAM) and other such devices.
  • the ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and/or other like memory device.
  • Data storage locations 150(A)-(D) may comprise physical storage devices, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition.
  • a logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
  • Receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations.
  • a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files.
  • Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc.
  • Such data entities may be stored across data storage locations 150(A)-(D) according to a format of the data entity and a type of the storage locations.
  • a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 150(A), a second part of the file stored at a second location on data storage 150(B), a third part of the file stored at a third location on data storage 150(C), and so on.
  • an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
  • Assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority.
  • the information content value may comprise a rating of the density of information stored in a particular location.
  • the information content value may be calculated as the data entity is originally written to storage locations 150(A)-(D), as the data entity is updated or refreshed, and/or at the time the request for deletion is received.
  • a data entity may comprise an MPEG video stream.
  • Such a video stream may be broken into subsets stored in different places comprising a series of frame types comprising I-frames, P-frames, and B-frames.
  • An I-frame comprises all of the information needed to decode the data into a single image frame, and has a high information content value.
  • a P-frame relies on data from a previous frame to complete the image frame rather than comprising all of the necessary data, and so comprises a medium information content value.
  • a B-frame further relies on data from previous and forward frames, and comprises the lowest information content value of the frame types.
  • a text document may be stored as subsets comprising primary text content comprising a high information content value, formatting and/or style information comprising a medium information content value, and empty padding comprising a low information content value.
  • the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations.
  • a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have a high information content value.
  • a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 150(A)-(D) may be calculated to comprise a low information content value due to its amount of duplication and re-use.
  • these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
  • Delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D).
  • the request to delete the data entity may comprise a request to perform a secure delete on the data entity.
  • a non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system.
  • a secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity.
  • FIG. 2A is a flowchart of a method 200 for deletion prioritization consistent with disclosed implementations. Although execution of method 200 is described below with reference to the components of deletion prioritization device 100, other suitable components for execution of method 200 may be used.
  • Method 200 may start in stage 205 and proceed to stage 210 where device 100 may receive a request to securely delete a data entity, wherein a plurality of subsets of the data entity are stored at a plurality of storage locations. For example, a user may perform a delete command on a directory comprising a plurality of files. Each file and/or portion of any of the files may comprise subsets of the directory data entity to be securely deleted.
  • receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations.
  • a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files.
  • Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc. Such data entities may be stored across data storage locations 150(A)-(D) according to a format of the data entity and a type of the storage locations.
  • a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 150(A), a second part of the file stored at a second location on data storage 150(B), a third part of the file stored at a third location on data storage 150(C), and so on.
  • an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
  • Method 200 may then advance to stage 220 where device 100 may identify an information content value for each of the plurality of subsets of the data entity.
  • the information content value may comprise a rating of the density of information stored in a particular location.
  • the information content value may be calculated as the data entity is originally written to storage locations 150(A)-(D), as the data entity is updated or refreshed, and/or at the time the request for deletion is received.
  • a text document may be stored as subsets comprising primary text content comprising a high information content value, formatting and/or style information comprising a medium information content value, and empty padding comprising a low information content value.
  • a directory data entity may comprise subsets of individual files wherein the information content value of each file is based on the file's size, update frequency, and/or last update date/time.
  • the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations.
  • a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have a high information content value.
  • a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 150(A)-(D) may be calculated to comprise a low information content value due to its amount of duplication and re-use.
  • these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
  • Method 200 may then advance to stage 225 where device 100 may assign a priority to each of the plurality of subsets of the data entity according to the respective information content value.
  • assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority.
  • Method 200 may then advance to stage 225 where device 100 may perform a secure delete of a first subset of the plurality of subsets of the data entity comprising a highest assigned priority.
  • the largest file or least compressible block in a data entity may comprise the highest information content value and so be the first subset of the data entity to be deleted.
  • delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D).
  • the request to delete the data entity may comprise a request to perform a secure delete on the data entity.
  • a non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system.
  • a secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity.
  • stage 225 may be repeated for each subsequently lower priority subset of data until all subsets of the data entity have been deleted.
  • method 200 may then end at stage 230.
  • FIG. 2B is a flowchart of an example method 250 for a deletion operation consistent with disclosed implementations. Although execution of method 250 is described below with reference to the components of deletion prioritization device 100, other suitable components for execution of method 250 may be used. In some implementations, method 250 may be performed by device 100 as part of stage 225 of method 200.
  • Method 250 may start in stage 255 and proceed to stage 260 where device 100 may securely delete a first subset of a data entity.
  • a secure delete may, for example, overwrite the addressable memory on the data storage locations 150(A)-(D) for the first subset of the data entity with other data, such as randomly generated bits, to prevent recovery of the first subset of the data entity.
  • stage 225 may be repeated for each subsequently lower priority subset of data until all subsets of the data entity have been deleted.
  • Method 250 may advance to stage 265 where device 100 may determine whether a second secure delete of a second subset of the plurality of subsets of the data entity would impact the secure delete of the first subset of the plurality of subsets of the data entity.
  • first subset may be stored at data storage location 150(A) and second subset may be stored at data storage location 150(B). If locations 150(A) and 150(B) are located on the same physical disk, then performing a secure delete at a second location on the same physical disk may be determined to impact the speed and/or performance of the secure delete at the first location.
  • a performance impact on the first delete may be determined to occur.
  • a second processor is available and/or the second subset is located on a separate physical disk, then no performance impact may be determined to occur.
  • Other performance impacting factors may also be considered, such as network bandwidth availability for network attached storage locations.
  • method 250 may return to stage 260 until the first deletion is completed. Otherwise, method 250 may proceed to stage 270 where device 100 may perform the second secure delete of the second subset of the plurality of subsets of the data entity.
  • the second subset may comprise the next highest assigned priority among the subsets of the data entity to be deleted. Method 250 may then end at stage 275.
  • FIG. 3 is a block diagram of a system 300 for deletion prioritization.
  • System 300 may comprise a computing device 310 comprising an intake engine 315, a content value engine 320, and a deletion engine 325.
  • System 300 may further comprise a plurality of data storage locations 340(A)-(C).
  • Data storage locations 340(A)-(C) may comprise, for example, physical storage devices, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition.
  • a logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
  • Computing device 310 may comprise, for example, a general and/or special purpose computer, server, mainframe, desktop, laptop, tablet, smart phone, game console, and/or any other system capable of providing computing capability consistent with providing the implementations described herein.
  • Data storage 340(A)-(C) may each comprise a physical storage device, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition.
  • a logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
  • Each of engines 315, 320, and 325 may comprise any combination of hardware and programming to implement the functionalities of the respective engine.
  • the programming for the engines may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the engines may include a processing resource to execute those instructions.
  • the machine-readable storage medium may store instructions that, when executed by the processing resource, implement engines 315, 320, and 325.
  • system 300 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to system 300 and the processing resource.
  • Intake engine 315 may receive a data entity to be stored, allocate a subset of the data entity to each of a plurality of storage locations, and write each subset of the data entity to the respective each of the plurality of storage locations. Intake engine 315 may further receive an update to the data entity and write each updated subset of the data entity to the respective each of the plurality of storage locations. For example, intake engine 315 may receive a document file comprising a content subset, a template subset, and a metadata subset. The content subset may be allocated and written to data storage 340(A), the template subset may be allocated and written to data storage 340(B), and the metadata subset may be written to data storage 340(C). In some implementations, subsets of the data entity need not comprise specific sections such as content and metadata, but may simply comprise chunks of bytes written in separate locations that may be reassembled into the data entity.
  • Content value engine 320 may calculate an information content value for each of the subsets of the data entity according to a compressibility of each of the subsets of the data entity and update the information content value for each of the subsets of the data entity according to the update of the data entity.
  • the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations.
  • a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have a high information content value.
  • a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 340(A)-(C) may be calculated to comprise a low information content value due to its amount of duplication and re-use.
  • these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
  • assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority.
  • the information content value may comprise a rating of the density of information stored in a particular location.
  • the information content value may be calculated as the data entity is originally written to storage locations 340(A)-(C), as the data entity is updated or refreshed, and/or at the time the request for deletion is received.
  • Deletion engine 325 may receive a request to securely delete the data entity from the plurality of storage locations, assign a priority to each of the subsets of the data entity according to the updated information content value for each of the subsets of the data entity, and securely delete each of the subsets of the data entity in order of the assigned priority for each of the subsets of the data entity.
  • receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations.
  • a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files.
  • Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc.
  • Such data entities may be stored across data storage locations 340(A)-(C) according to a format of the data entity and a type of the storage locations.
  • a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 340(A), a second part of the file stored at a second location on data storage 340(B), a third part of the file stored at a third location on data storage 340(C), and so on.
  • an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
  • Delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D).
  • the request to delete the data entity may comprise a request to perform a secure delete on the data entity.
  • a non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system.
  • a secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity.
  • the disclosed examples may include systems, devices, computer-readable storage media, and methods for deletion prioritization. For purposes of explanation, certain examples are described with reference to the components illustrated in the Figures. The functionality of the illustrated components may overlap, however, and may be present in a fewer or greater number of elements and components. Further, all or part of the functionality of illustrated elements may coexist or be distributed among several geographically dispersed locations. Moreover, the disclosed examples may be implemented in various environments and are not limited to the illustrated examples.
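As an illustration of the compressibility factor and the ordered secure delete described in the bullets above, the following is a minimal Python sketch, not the disclosed implementation. It assumes that compressibility is approximated by a zlib compression ratio and that in-memory bytearrays stand in for storage locations; the function names `information_content_value` and `secure_delete_in_priority_order` are hypothetical. Securely erasing real disks or solid-state drives would need device-specific handling that this sketch does not attempt.

```python
import os
import zlib


def information_content_value(data: bytes) -> float:
    """Approximate information density as compressed size / original size.

    Highly compressible data (e.g., long runs of zero bytes) scores near 0;
    data that does not compress scores 1.0. This ratio is an assumption made
    for illustration, not a formula given in the source.
    """
    if not data:
        return 0.0
    return min(1.0, len(zlib.compress(data)) / len(data))


def secure_delete_in_priority_order(subsets: dict) -> list:
    """Overwrite each subset with random bytes, highest value first.

    `subsets` maps a (hypothetical) location name to a bytearray standing in
    for that storage location. Returns the order in which the subsets were
    overwritten.
    """
    # All scores are computed before any buffer is overwritten.
    order = sorted(
        subsets,
        key=lambda name: information_content_value(bytes(subsets[name])),
        reverse=True,
    )
    for name in order:
        buf = subsets[name]
        buf[:] = os.urandom(len(buf))  # overwrite in place with random bytes
    return order


subsets = {
    "padding": bytearray(4096),              # all zero bytes: very compressible
    "content": bytearray(os.urandom(4096)),  # stand-in for dense, unique data
}
order = secure_delete_in_priority_order(subsets)
print(order)
```

If the loop is interrupted partway through, the subsets already overwritten are the ones that scored highest, which is the property the prioritization is meant to provide.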

Abstract

Examples disclosed herein relate to deletion prioritization instructions to receive a request to delete a data entity stored across a plurality of storage locations, assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations and delete the data entity stored at each of the plurality of storage locations in order of the assigned priority.

Description

DELETION PRIORITIZATION
BACKGROUND
[0001] Computer systems include storage devices to store parts of a data entity such as a file, database table, memory page, communication stream, etc., for subsequent retrieval. In some situations, a data stream may be written and/or stored as separate parts in multiple storage locations.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] In the accompanying drawings, like numerals refer to like components or blocks. The following detailed description references the drawings, wherein:
[0003] FIG. 1 is a block diagram of an example deletion prioritization device;
[0004] FIG. 2A is a flowchart of an example of a method for deletion prioritization;
[0005] FIG. 2B is a flowchart of an example of a method for a deletion operation; and
[0006] FIG. 3 is a block diagram of an example system for deletion prioritization.
DETAILED DESCRIPTION
[0007] As described above, computer systems include storage devices to store parts of a data entity such as a file, database table, memory page, communication stream, etc., for subsequent retrieval. In some situations, a data stream may be written and/or stored as separate parts in multiple storage locations. Deletion prioritization may be used when a data entity is requested to be deleted. As a data stream is stored, it may be written to multiple physical and/or logical storage locations (e.g., different hard drives or memory segments). Deletion prioritization assigns an order to the storage locations so that if the deletion operation is interrupted, the highest priority of these locations will have been deleted.
[0008] In the description that follows, reference is made to the term "machine-readable storage medium." As used herein, the term "machine-readable storage medium" refers to any electronic, magnetic, optical, or other physical storage device that stores executable instructions or other data (e.g., a hard disk drive, random access memory, flash memory, etc.).
[0009] Referring now to the drawings, FIG. 1 is a block diagram of an example deletion prioritization device 100 consistent with disclosed implementations. Deletion prioritization device 100 may comprise a processor 110 and a non-transitory machine-readable storage medium 120. Deletion prioritization device 100 may comprise a computing device such as a server computer, a desktop computer, a laptop computer, a handheld computing device, a smart phone, a tablet computing device, a mobile phone, or the like. Device 100 may further comprise a data storage 150.
[0010] Processor 110 may comprise a central processing unit (CPU), a semiconductor-based microprocessor, or any other hardware device suitable for retrieval and execution of instructions stored in machine-readable storage medium 120. In particular, processor 110 may fetch, decode, and execute a plurality of receive request instructions 130, assign priority instructions 132, and delete data entity instructions 134 to implement the functionality described in detail below. Device 100 may further comprise a plurality of data storage locations 150(A)-(D).
[0011] Executable instructions may be stored in any portion and/or component of machine-readable storage medium 120. The machine-readable storage medium 120 may comprise volatile and/or nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power.
[0012] The machine-readable storage medium 120 and data storage locations 150(A)-(D) may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, and/or a combination of any two and/or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), and/or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and/or other like memory device.
[0013] Data storage locations 150(A)-(D) may comprise physical storage devices, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition. A logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
[0014] Receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations. For example, a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files. Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc. Such data entities may be stored across data storage locations 150(A)-(D) according to a format of the data entity and a type of the storage locations. For example, a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 150(A), a second part of the file stored at a second location on data storage 150(B), a third part of the file stored at a third location on data storage 150(C), and so on. For another example, an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
[0015] Assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority. The information content value may comprise a rating of the density of information stored in a particular location. The information content value may be calculated as the data entity is originally written to storage locations 150(A)-(D), as the data entity is updated or refreshed, and/or at the time the request for deletion is received.
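The priority assignment described above can be sketched as a simple ordering step. This is a minimal illustration only: the function name `assign_priorities`, the dictionary layout, and the numeric values are assumptions made for the example and are not part of the disclosure.

```python
from typing import Dict, List


def assign_priorities(content_values: Dict[str, float]) -> List[str]:
    """Return storage location names ordered from highest to lowest priority.

    Hypothetical helper: the location with the highest information content
    value is placed first. Ties keep their original insertion order because
    Python's sort is stable.
    """
    return sorted(content_values, key=lambda loc: content_values[loc], reverse=True)


# Illustrative values only; a real system would calculate these.
values = {"150(A)": 0.2, "150(B)": 0.9, "150(C)": 0.5}
order = assign_priorities(values)
print(order)
```

Under these assumed values, location 150(B) would be handled first and 150(A) last.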
[0016] For example, a data entity may comprise an MPEG video stream. Such a video stream may be broken into subsets stored in different places comprising a series of frame types comprising I-frames, P-frames, and B-frames. An I-frame comprises all of the information needed to decode the data into a single image frame, and has a high information content value. A P-frame relies on data from a previous frame to complete the image frame rather than comprising all of the necessary data, and so comprises a medium information content value. A B-frame further relies on data from previous and forward frames, and comprises the lowest information content value of the frame types. Similarly, a text document may be stored as subsets comprising primary text content comprising a high information content value, formatting and/or style information comprising a medium information content value, and empty padding comprising a low information content value.
[0017] In some implementations, the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations. For example, a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have high information content value. For another example, a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 150(A)-(D) may be calculated to comprise a low information content value due to its amount of duplication and re-use. In some implementations, these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
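One possible way to combine these factors is sketched below. The disclosure does not specify a formula, so everything here is an assumption for illustration: the use of `zlib` as the compressibility measure, the function names, the weights, and the normalization of update frequency and link count.

```python
import os
import zlib


def compressibility(data: bytes) -> float:
    """Fraction of bytes saved by zlib compression, clamped to [0, 1]."""
    if not data:
        return 1.0  # assumption: an empty subset carries no information
    ratio = len(zlib.compress(data)) / len(data)
    return max(0.0, 1.0 - ratio)


def information_content_value(data: bytes, updates_per_day: float,
                              link_count: int,
                              weights=(0.6, 0.3, 0.1)) -> float:
    """Hypothetical weighted score in [0, 1]; higher means more information.

    The weights and the scaling constants are illustrative assumptions.
    """
    w_comp, w_upd, w_link = weights
    incompressible = 1.0 - compressibility(data)     # low compressibility scores high
    update_score = min(updates_per_day / 10.0, 1.0)  # frequent updates score high
    link_score = 1.0 / (1.0 + link_count)            # heavily re-used data scores low
    return w_comp * incompressible + w_upd * update_score + w_link * link_score


padding = bytes(4096)            # long run of zero bytes, highly compressible
random_like = os.urandom(4096)   # effectively incompressible
print(information_content_value(padding, 5, 0))
print(information_content_value(random_like, 5, 0))
```

With these assumed weights, the zero-filled block scores well below the random block, which matches the ordering the paragraph describes.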
[0018] Delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D). In some implementations, the request to delete the data entity may comprise a request to perform a secure delete on the data entity. A non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system. A secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity.
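The difference between the two kinds of delete can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `secure_delete` and the single-pass random overwrite are choices made for the example. On solid-state drives and on journaling or copy-on-write file systems, an in-place overwrite through the file system may not reach every physical copy of the data.

```python
import os
import tempfile


def secure_delete(path: str, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file's contents with random bytes, then remove it.

    Hypothetical helper. A non-secure delete would call os.remove(path)
    alone, leaving the old bytes on the medium.
    """
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))   # replace old content with random bits
            remaining -= n
        f.flush()
        os.fsync(f.fileno())         # ask the OS to push the overwrite to the device
    os.remove(path)                  # finally drop the file system entry


# Usage on a throwaway temporary file.
fd, tmp_path = tempfile.mkstemp()
os.write(fd, b"sensitive data" * 100)
os.close(fd)
secure_delete(tmp_path)
```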
[0019] FIG. 2A is a flowchart of a method 200 for deletion prioritization consistent with disclosed implementations. Although execution of method 200 is described below with reference to the components of deletion prioritization device 100, other suitable components for execution of method 200 may be used.
[0020] Method 200 may start in stage 205 and proceed to stage 210 where device 100 may receive a request to securely delete a data entity, wherein a plurality of subsets of the data entity are stored at a plurality of storage locations. For example, a user may perform a delete command on a directory comprising a plurality of files. Each file and/or portion of any of the files may comprise subsets of the directory data entity to be securely deleted.
[0021] In some embodiments, receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations. For example, a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files. Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc. Such data entities may be stored across data storage locations 150(A)-(D) according to a format of the data entity and a type of the storage locations. For example, a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 150(A), a second part of the file stored at a second location on data storage 150(B), a third part of the file stored at a third location on data storage 150(C), and so on. For another example, an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
[0022] Method 200 may then advance to stage 220 where device 100 may identify an information content value for each of the plurality of subsets of the data entity. The information content value may comprise a rating of the density of information stored in a particular location. The information content value may be calculated as the data entity is originally written to storage locations 150(A)-(D), as the data entity is updated or refreshed, and/or at the time the request for deletion is received. For example, a text document may be stored as subsets comprising primary text content comprising a high information content value, formatting and/or style information comprising a medium information content value, and empty padding comprising a low information content value. For another example, a directory data entity may comprise subsets of individual files wherein the information content value of each file is based on the file's size, update frequency, and/or last update date/time.
[0023] In some implementations, the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations. For example, a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have high information content value. For another example, a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 150(A)-(D) may be calculated to comprise a low information content value due to its amount of duplication and re-use. In some implementations, these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
[0024] Method 200 may then advance to stage 225 where device 100 may assign a priority to each of the plurality of subsets of the data entity according to the respective information content value. In some implementations, assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority.
[0025] Method 200 may then advance to stage 225 where device 100 may perform a secure delete of a first subset of the plurality of subsets of the data entity comprising a highest assigned priority. For example, the largest file or least compressible block in a data entity may comprise the highest information content value and so be the first subset of the data entity to be deleted.
[0026] In some implementations, delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D). In some implementations, the request to delete the data entity may comprise a request to perform a secure delete on the data entity. A non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system. A secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity. In some implementations, stage 225 may be repeated for each subsequently lower priority subset of data until all subsets of the data entity have been deleted.
[0027] After deleting the subset(s) of the data entity at stage 225, method 200 may then end at stage 230.
[0028] FIG. 2B is a flowchart of an example of a method 250 for a deletion operation consistent with disclosed implementations. Although execution of method 250 is described below with reference to the components of deletion prioritization device 100, other suitable components for execution of method 250 may be used. In some implementations, method 250 may be performed by device 100 as part of stage 225 of method 200.
[0029] Method 250 may start in stage 255 and proceed to stage 260 where device 100 may securely delete a first subset of a data entity. A secure delete may, for example, overwrite the addressable memory on the data storage locations 150(A)-(D) for the first subset of the data entity with other data, such as randomly generated bits, to prevent recovery of the first subset of the data entity. In some implementations, stage 225 may be repeated for each subsequently lower priority subset of data until all subsets of the data entity have been deleted.
[0030] Method 250 may advance to stage 265 where device 100 may determine whether a second secure delete of a second subset of the plurality of subsets of the data entity would impact the secure delete of the first subset of the plurality of subsets of the data entity. For example, first subset may be stored at data storage location 150(A) and second subset may be stored at data storage location 150(B). If locations 150(A) and 150(B) are located on the same physical disk, then performing a secure delete at a second location on the same physical disk may be determined to impact the speed and/or performance of the secure delete at the first location. Similarly, if only one CPU is available for performing the stages of method 250, a performance impact on the first delete may be determined to occur. In contrast, if a second processor is available and/or the second subset is located on a separate physical disk, then no performance impact may be determined to occur. Other performance impacting factors may also be considered, such as network bandwidth availability for network attached storage locations.
[0031] If the second delete operation is determined to impact the performance of the first delete operation, method 250 may return to stage 260 until the first deletion is completed. Otherwise, method 250 may proceed to stage 270 where device 100 may perform the second secure delete of the second subset of the plurality of subsets of the data entity. In some implementations, the second subset may comprise the next highest assigned priority among the subsets of the data entity to be deleted. Method 250 may then end at stage 275.
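The impact determination of method 250 might be sketched as a small predicate. This is an assumption-laden illustration: the `Subset` record, the `would_impact` function, and the `idle_processors` parameter are hypothetical names, and the rule chosen here (an impact is assumed when both subsets share a physical device or when no spare processor is available) is only one reading of the factors described above. Other factors, such as network bandwidth, are omitted.

```python
from dataclasses import dataclass


@dataclass
class Subset:
    name: str
    physical_device: str  # identifier of the physical disk holding the subset


def would_impact(first: Subset, second: Subset, idle_processors: int) -> bool:
    """Return True if starting the second secure delete is assumed to slow the first."""
    if first.physical_device == second.physical_device:
        return True   # both overwrites would compete for the same disk
    if idle_processors < 1:
        return True   # no spare processor to run the second delete
    return False


first = Subset("first", "disk0")
second = Subset("second", "disk1")
if would_impact(first, second, idle_processors=1):
    print("wait for the first delete to complete")
else:
    print("start the second delete concurrently")
```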
[0032] FIG. 3 is a block diagram of a system 300 for deletion prioritization. System 300 may comprise a computing device 310 comprising an intake engine 315, a content value engine 320, and a deletion engine 325. System 300 may further comprise a plurality of data storage locations 340(A)-(C).
[0033] Data storage locations 340(A)-(C) may comprise, for example, physical storage devices, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition. A logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
[0034] Computing device 310 may comprise, for example, a general and/or special purpose computer, server, mainframe, desktop, laptop, tablet, smart phone, game console, and/or any other system capable of providing computing capability consistent with providing the implementations described herein. Data storage 340(A)-(C) may each comprise a physical storage device, such as a hard disk drive and/or a solid state drive, and/or a logical storage device, such as a database, a user-based logical data storage (e.g., a user's home directory and associated files), a network-attached storage, and a logical partition. A logical storage device may comprise data stored on part of and/or across a plurality of physical storage devices.
[0035] Each of engines 315, 320, and 325 may comprise any combination of hardware and programming to implement the functionalities of the respective engine. In examples described herein, such combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the engines may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the engines may include a processing resource to execute those instructions. In such examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement engines 315, 320, and 325. In such examples, system 300 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to system 300 and the processing resource.
[0036] Intake engine 315 may receive a data entity to be stored, allocate a subset of the data entity to each of a plurality of storage locations, and write each subset of the data entity to the respective each of the plurality of storage locations. Intake engine 315 may further receive an update to the data entity and write each updated subset of the data entity to the respective each of the plurality of storage locations. For example, intake engine 315 may receive a document file comprising a content subset, a template subset, and a metadata subset. The content subset may be allocated and written to data storage 340(A), the template subset may be allocated and written to data storage 340(B), and the metadata subset may be written to data storage 340(C). In some implementations, subsets of the data entity need not comprise specific sections such as content and metadata, but may simply comprise chunks of bytes written in separate locations that may be reassembled into the data entity.
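The chunk-based allocation mentioned at the end of the paragraph above could look like the following sketch. The round-robin placement, the function names `allocate_chunks` and `reassemble`, and the in-memory dictionary standing in for storage locations 340(A)-(C) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, List[Tuple[int, bytes]]]


def allocate_chunks(data: bytes, locations: List[str], chunk_size: int) -> Placement:
    """Split data into fixed-size chunks and assign them round-robin to locations.

    Each stored chunk keeps its sequence index so the entity can be rebuilt.
    """
    placement: Placement = {loc: [] for loc in locations}
    for index, start in enumerate(range(0, len(data), chunk_size)):
        loc = locations[index % len(locations)]
        placement[loc].append((index, data[start:start + chunk_size]))
    return placement


def reassemble(placement: Placement) -> bytes:
    """Rebuild the data entity by ordering all chunks by sequence index."""
    chunks = sorted(c for stored in placement.values() for c in stored)
    return b"".join(chunk for _, chunk in chunks)


entity = b"abcdefghij"
placed = allocate_chunks(entity, ["340(A)", "340(B)", "340(C)"], chunk_size=3)
print(placed)
print(reassemble(placed) == entity)
```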
[0037] Content value engine 320 may calculate an information content value for each of the subsets of the data entity according to a compressibility of each of the subsets of the data entity and update the information content value for each of the subsets of the data entity according to the update of the data entity.
[0038] In some implementations, the information content value may be calculated according to a compressibility, an update frequency, a last update date/time, a size, and/or a link count of a subset of the data entity stored at each of the plurality of storage locations. For example, a data entity with a high compressibility may be calculated to comprise a low information content value, as higher compressibility generally indicates long strings of identical bits (e.g., long series of 0s or 1s) rather than unique data, while data entities comprising frequent and/or recent updates may be calculated to have high information content value. For another example, a file comprising a high number of symbolic links and/or incorporated into a high number of other data entities (e.g., a stylesheet) on data storage locations 340(A)-(C) may be calculated to comprise a low information content value due to its amount of duplication and re-use. In some implementations, these factors may be combined to calculate a final information content value according to a weighting and/or scoring algorithm.
[0039] In some implementations, assign priority instructions 132 may assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations. For example, the locations may be assigned their priority with the location having the highest information content value being assigned the highest priority. The information content value may comprise a rating of the density of information stored in a particular location. The information content value may be calculated as the data entity is originally written to storage locations 340(A)-(C), as the data entity is updated or refreshed, and/or at the time the request for deletion is received.
[0040] Deletion engine 325 may receive a request to securely delete the data entity from the plurality of storage locations, assign a priority to each of the subsets of the data entity according to the updated information content value for each of the subsets of the data entity, and securely delete each of the subsets of the data entity in order of the assigned priority for each of the subsets of the data entity.
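The interplay between recalculating values on update and ordering the deletion can be sketched with a small tracker. This is illustrative only: the class `ContentValueTracker`, its method names, and the use of the zlib compression ratio as the sole value are assumptions, not the disclosed engines.

```python
import os
import zlib
from typing import Dict, List


class ContentValueTracker:
    """Hypothetical stand-in for a content value engine feeding a deletion engine."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}

    def record_write(self, subset_id: str, data: bytes) -> None:
        """Recalculate a subset's value whenever it is written or updated."""
        if not data:
            self._values[subset_id] = 0.0
            return
        # Compressed size over original size: near 0 for padding, capped at 1.
        self._values[subset_id] = min(1.0, len(zlib.compress(data)) / len(data))

    def deletion_order(self) -> List[str]:
        """Subsets ordered from highest to lowest current value."""
        return sorted(self._values, key=self._values.get, reverse=True)


tracker = ContentValueTracker()
tracker.record_write("content", bytes(2048))         # compressible placeholder
tracker.record_write("metadata", os.urandom(2048))   # incompressible bytes
print(tracker.deletion_order())

tracker.record_write("content", os.urandom(2048))    # an update changes the value
tracker.record_write("metadata", bytes(2048))
print(tracker.deletion_order())
```

Because the order is derived from the most recently recorded values, a later update to a subset can change which subset would be securely deleted first.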
[0041] For example, receive request instructions 130 may receive a request to delete a data entity stored across a plurality of storage locations. For example, a user, application, service and/or process may request a deletion of a data entity such as a file and/or a group of files. Other data entities may comprise, for example, database entries, memory pages, network streams, application packages, etc. Such data entities may be stored across data storage locations 340(A)-(C) according to a format of the data entity and a type of the storage locations. For example, a large file may be fragmented, with a first part of the file stored at a first memory location on data storage 340(A), a second part of the file stored at a second location on data storage 340(B), a third part of the file stored at a third location on data storage 340(C), and so on. For another example, an application may store a data entity in a particular file format that combines content data with template and/or style data stored in different locations, such as a web page file that comprises a link to a stylesheet file.
[0042] Delete data entity instructions 134 may delete the data entity stored at each of the plurality of storage locations in order of the assigned priority. For example, a user may execute a delete command on a file, application program, database row or table, a directory, etc. An operating system for device 100 may interpret the delete command as a request to remove the target data entity from wherever it is written and/or stored on data storage locations 150(A)-(D). In some implementations, the request to delete the data entity may comprise a request to perform a secure delete on the data entity. A non-secure delete may, for example, simply remove an entry in a file system providing the location of the data entity to an application and/or operating system. A secure delete may, for example, overwrite the actual memory on the data storage locations 150(A)-(D) for the data entity with other data, such as randomly generated bits, to prevent recovery of the data entity.
[0043] The disclosed examples may include systems, devices, computer-readable storage media, and methods for deletion prioritization. For purposes of explanation, certain examples are described with reference to the components illustrated in the Figures. The functionality of the illustrated components may overlap, however, and may be present in a fewer or greater number of elements and components. Further, all or part of the functionality of illustrated elements may coexist or be distributed among several geographically dispersed locations. Moreover, the disclosed examples may be implemented in various environments and are not limited to the illustrated examples.
[0044] Moreover, as used in the specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. Additionally, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. Instead, these terms are only used to distinguish one element from another.
[0045] Further, the sequence of operations described in connection with the Figures are examples and are not intended to be limiting. Additional or fewer operations or combinations of operations may be used or may vary without departing from the scope of the disclosed examples. Thus, the present disclosure merely sets forth possible examples of implementations, and many variations and modifications may be made to the described examples. All such modifications and variations are intended to be included within the scope of this disclosure and protected by the following claims.

Claims

CLAIMS We claim:
1. A non-transitory machine-readable storage medium comprising instructions for deletion prioritization which, when executed by a processor, cause the processor to:
receive a request to delete a data entity stored across a plurality of storage locations;
assign a priority to each of the plurality of storage locations according to an information content value associated with each of the plurality of storage locations; and
delete the data entity stored at each of the plurality of storage locations in order of the assigned priority.
2. The non-transitory machine-readable medium of claim 1, further comprising calculating the information content value according to a compressibility of a subset of the data entity stored at each of the plurality of storage locations.
3. The non-transitory machine-readable medium of claim 1, further comprising calculating the information content value according to an update frequency of a subset of the data entity stored at each of the plurality of storage locations.
4. The non-transitory machine-readable medium of claim 1, further comprising calculating the information content value according to a last update date of a subset of the data entity stored at each of the plurality of storage locations.
5. The non-transitory machine-readable medium of claim 1, wherein the information content value is calculated upon receiving a request to store the data entity to the plurality of storage locations.
6. The non-transitory machine-readable medium of claim 5, further comprising recalculating the information content value upon receiving a request to update the data entity stored across a plurality of storage locations.
7. The non-transitory machine-readable medium of claim 1, wherein the request to delete the data entity comprises a request to perform a secure deletion.
8. A computer-implemented method for deletion prioritization comprising:
receiving a request to securely delete a data entity, wherein a plurality of subsets of the data entity are stored at a plurality of storage locations;
identifying an information content value for each of the plurality of subsets of the data entity;
assigning a priority to each of the plurality of subsets of the data entity according to the respective information content value; and
performing a secure delete of a first subset of the plurality of subsets of the data entity comprising a highest assigned priority.
9. The computer-implemented method of claim 8, wherein the information content value comprises a value calculated as the plurality of subsets of the data entity are stored at the plurality of storage locations.
10. The computer-implemented method of claim 9, further comprising recalculating the calculated value as an updated plurality of subsets of the data entity are stored at the plurality of storage locations.
11. The computer-implemented method of claim 8, wherein the information content value comprises a value calculated according to a compressibility of each of the plurality of subsets of the data entity.
12. The computer-implemented method of claim 8, further comprising: determining whether a second secure delete of a second subset of the plurality of subsets of the data entity would impact the secure delete of the first subset of the plurality of subsets of the data entity; and
in response to determining that the second secure delete of the second subset of the plurality of subsets of the data entity would not impact the secure delete of the first subset of the plurality of subsets of the data entity, performing the second secure delete of the second subset of the plurality of subsets of the data entity.
13. The computer implemented method of claim 12, wherein the second subset of the plurality of subsets of the data entity comprises a next highest assigned priority.
14. The computer implemented method of claim 12, wherein determining whether the second secure delete of the second subset of the plurality of subsets of the data entity would impact the secure delete of the first subset of the plurality of subsets of the data entity comprises determining whether the second subset of the plurality of subsets of the data entity is stored on a different physical storage device than the first subset of the plurality of subsets of the data entity.
15. A system for deletion prioritization, comprising:
an intake engine to:
receive a data entity to be stored,
allocate a subset of the data entity to each of a plurality of storage locations,
write each subset of the data entity to the respective each of the plurality of storage locations,
receive an update to the data entity, and
write each updated subset of the data entity to the respective each of the plurality of storage locations;
a content value engine to:
calculate an information content value for each of the subsets of the data entity according to a compressibility of each of the subsets of the data entity, and
update the information content value for each of the subsets of the data entity according to the update of the data entity; and
a deletion engine to:
receive a request to securely delete the data entity from the plurality of storage locations,
assign a priority to each of the subsets of the data entity according to the updated information content value for each of the subsets of the data entity, and
securely delete each of the subsets of the data entity in order of the assigned priority for each of the subsets of the data entity.
PCT/US2015/030937 2015-05-15 2015-05-15 Deletion prioritization WO2016186602A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/030937 WO2016186602A1 (en) 2015-05-15 2015-05-15 Deletion prioritization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/030937 WO2016186602A1 (en) 2015-05-15 2015-05-15 Deletion prioritization

Publications (1)

Publication Number Publication Date
WO2016186602A1 true WO2016186602A1 (en) 2016-11-24

Family

ID=57318921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/030937 WO2016186602A1 (en) 2015-05-15 2015-05-15 Deletion prioritization

Country Status (1)

Country Link
WO (1) WO2016186602A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11567697B2 (en) 2020-11-12 2023-01-31 International Business Machines Corporation Prioritization of stored data during a delete process in a virtualized storage system
US11853576B2 (en) 2021-09-09 2023-12-26 Hewlett Packard Enterprise Development Lp Deleting data entities and deduplication stores in deduplication systems

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002046887A2 (en) * 2000-10-23 2002-06-13 Xyron Corporation Concurrent-multitasking processor
US20080059692A1 (en) * 2006-09-04 2008-03-06 Sandisk Il Ltd. Device for prioritized erasure of flash memory
US20110113432A1 (en) * 2007-10-31 2011-05-12 Microsoft Corporation Compressed storage management
US20120216006A1 (en) * 2007-12-25 2012-08-23 Canon Kabushiki Kaisha Information processing apparatus and information processing method that selects data to be deleted without a user having to perform a delete operation
US20130018852A1 (en) * 2011-07-15 2013-01-17 International Business Machines Corporation Deleted data recovery in data storage systems

Similar Documents

Publication Publication Date Title
US10031675B1 (en) Method and system for tiering data
EP3108371B1 (en) Modified memory compression
US10635359B2 (en) Managing cache compression in data storage systems
US8799238B2 (en) Data deduplication
WO2016086819A1 (en) Method and apparatus for writing data into shingled magnetic record smr hard disk
US9330105B1 (en) Systems, methods, and computer readable media for lazy compression of data incoming to a data storage entity
US10747678B2 (en) Storage tier with compressed forward map
US8352447B2 (en) Method and apparatus to align and deduplicate objects
US20170293450A1 (en) Integrated Flash Management and Deduplication with Marker Based Reference Set Handling
CN104636266B (en) Shingled magnetic recording hard disk, and method and device for writing data to a shingled magnetic recording hard disk
US9772790B2 (en) Controller, flash memory apparatus, method for identifying data block stability, and method for storing data in flash memory apparatus
KR20130108298A (en) Card-based management of discardable files
US8886901B1 (en) Policy based storage tiering
US9658774B2 (en) Storage system and storage control method
US11093453B1 (en) System and method for asynchronous cleaning of data objects on cloud partition in a file system with deduplication
WO2016186602A1 (en) Deletion prioritization
CN108334457B (en) IO processing method and device
US9658926B1 (en) Systems and methods for dynamic save streams
US7565483B2 (en) Method and apparatus for exchanging data with a hard disk
US11163446B1 (en) Systems and methods of amortizing deletion processing of a log structured storage based volume virtualization
US20150127891A1 (en) Write performance preservation with snapshots
US20170046093A1 (en) Backup storage
JP5494817B2 (en) Storage system, data management apparatus, method and program
CN109977121B (en) Big data rapid storage system
JP2010191903A (en) Distributed file system striping class selecting method and distributed file system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15892710; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15892710; Country of ref document: EP; Kind code of ref document: A1)