Secure document storage and retrieval is one of the hottest research directions in cloud computing. Although many searchable encryption schemes have been proposed, few of them support efficient retrieval over documents that are encrypted based on their attributes. In this paper, a hierarchical attribute-based encryption scheme is first designed for a document collection. A set of documents can be encrypted together if they share an integrated access structure. Compared with ciphertext-policy attribute-based encryption (CP-ABE) schemes, both the ciphertext storage space and the time costs of encryption and decryption are reduced. Then, an index structure named the attribute-based retrieval features (ARF) tree is constructed for the document collection based on the TF-IDF model and the documents' attributes. A depth-first search algorithm for the ARF tree is designed to improve search efficiency, which can be further improved by parallel computing. Besides document collections, our scheme can also be applied to other datasets by modifying the ARF tree slightly. A thorough analysis and a series of experiments are performed to illustrate the security and efficiency of the proposed scheme.
We attempt to design a fine-grained access control mechanism for encrypted documents that also supports efficient document search. The search result of a query is defined as the top-k relevant encrypted documents with legal attributes. The process of executing a document query is mainly composed of five stages.
The data owner is responsible for collecting and pre-processing the documents to obtain a set of high-quality files F. He sets the attributes for each document and then hierarchically encrypts the document collection based on these attributes. In addition, an index vector is extracted from each document based on the document's content and attributes, and an index structure I is constructed from the index vectors of the documents. Finally, both the encrypted documents C and the encrypted index structure are sent to the cloud server. The cloud server is responsible for storing the encrypted documents and executing document search based on the index structure.
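As a rough illustration of how an index vector might combine content and attributes, the following Python sketch concatenates TF-IDF keyword weights with a binary attribute indicator. This is a minimal sketch under stated assumptions, not the construction used in the scheme: the function name `build_index_vectors`, the particular TF-IDF weighting variant, the concatenation layout, and the absence of any encryption are all hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def build_index_vectors(docs, dictionary, attribute_universe):
    """Build one plaintext index vector per document (illustrative only).

    docs: dict mapping doc_id -> (list of tokens, set of attributes).
    dictionary: ordered list of keywords (assumed layout).
    attribute_universe: ordered list of all attributes (assumed layout).
    """
    n = len(docs)
    keywords = set(dictionary)

    # Document frequency: number of documents containing each keyword.
    df = Counter()
    for tokens, _ in docs.values():
        df.update(set(tokens) & keywords)

    vectors = {}
    for doc_id, (tokens, attrs) in docs.items():
        tf = Counter(tokens)
        # One common TF-IDF variant; the paper's exact weighting may differ.
        keyword_part = [
            (1 + math.log(tf[w])) * math.log(1 + n / df[w]) if tf[w] else 0.0
            for w in dictionary
        ]
        # Binary indicator of which attributes the document carries.
        attribute_part = [1.0 if a in attrs else 0.0 for a in attribute_universe]
        vectors[doc_id] = keyword_part + attribute_part
    return vectors


example_docs = {
    "d1": (["cloud", "cloud", "search"], {"cs"}),
    "d2": (["search"], {"cs", "med"}),
}
example_vectors = build_index_vectors(
    example_docs, ["cloud", "search"], ["cs", "med"]
)
```

In the scheme itself the index structure is encrypted before it is outsourced; the sketch only shows the plaintext vectors that such an index would be built from.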
When a data user wants to search for a set of documents of interest, she first needs to register as an authorized data user at the certificate authority (CA) center. Then, if possible, several attributes selected from A are assigned to the data user by the CA, and a corresponding secret key associated with these attributes is sent to the data user. Finally, the data user can send a query request Q to the cloud server.
Once a query is received from a data user, the cloud server first communicates with the CA to check the legality of the data user and her attributes. If the data user is authorized, the cloud server searches the index structure to obtain the search result SR. Then the corresponding encrypted documents are extracted from the encrypted document collection C and sent to the data user. Finally, the data user decrypts the documents with her secret key. Note that the legality check is optional and can be employed to improve the security level of the whole system. With legality checking, data users who have not registered at the CA center cannot search for documents through the cloud server. However, the security of the system does not greatly decrease without this functionality, because illegal data users cannot decrypt the documents returned by the cloud server without the secret keys.
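The server-side flow described above can be sketched in Python as follows. This is a simplified plaintext model under stated assumptions, not the scheme's actual protocol: the names `top_k_authorized`, `handle_query`, and `ca_check` are hypothetical, the "legal attributes" condition is modelled as a plain subset test on attribute sets (the real access structure may be more expressive), relevance is modelled as an inner product, and a linear scan stands in for the index search.

```python
import heapq


def top_k_authorized(index, query, user_attrs, k):
    """Return the ids of the k highest-scoring documents the user may access.

    index: list of (doc_id, vector, required_attrs) triples (assumed layout).
    A document counts as accessible when required_attrs is a subset of user_attrs.
    """
    scored = [
        (sum(v * q for v, q in zip(vec, query)), doc_id)
        for doc_id, vec, required in index
        if required <= user_attrs
    ]
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


def handle_query(user_id, user_attrs, query, k, index, encrypted_docs, ca_check=None):
    """Cloud-server side of one query, with an optional legality check.

    ca_check: optional callable (user_id, user_attrs) -> bool standing in for
    the communication with the CA; pass None to skip the check.
    """
    if ca_check is not None and not ca_check(user_id, user_attrs):
        return []
    search_result = top_k_authorized(index, query, user_attrs, k)
    # Only ciphertexts leave the server; decryption happens at the data user.
    return [encrypted_docs[doc_id] for doc_id in search_result]


example_index = [
    ("d1", [1.0, 0.0], {"a"}),
    ("d2", [0.5, 0.5], {"a", "b"}),
    ("d3", [0.0, 1.0], {"a"}),
    ("d4", [0.9, 0.9], {"c"}),
]
example_ciphertexts = {doc_id: "enc(" + doc_id + ")" for doc_id, _, _ in example_index}
```

Passing `ca_check=None` corresponds to running the system without the optional legality check; the server still returns only ciphertexts, which are useless to a user who holds no matching secret key.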
An intuitive approach is to encrypt the documents first and then outsource the encrypted documents to the cloud. A large number of searchable document encryption schemes have been proposed in the literature, including single keyword Boolean search schemes, single keyword ranked search schemes and multi-keyword Boolean search schemes. However, none of these schemes can support effective, flexible and efficient document search because of their simple functionalities.
Privacy-preserving multi-keyword ranked document search schemes are more promising and practical. However, all the documents in these schemes are organized by a coarse-grained access control mechanism, i.e., each authorized data user can access all the encrypted documents. As an example, the whole IEEE Xplore Digital Library can at present be accessed by all authorized organizations (e.g., universities), and this cannot satisfy data owners and users in the future.
• A practical hierarchical attribute-based document collection encryption scheme is proposed in which the documents are organized and controlled based on attributes. The proposed scheme can greatly decrease the storage and computing burdens.
• We map the documents to vectors in which both the keywords and associated attributes are considered. The ARF tree is proposed to organize the document vectors and support time-efficient document retrieval. In addition, a depth-first search algorithm is designed.
• A thorough simulation is performed to illustrate the security, efficiency and effectiveness of our scheme. Specifically, the proposed encryption scheme performs very well in both time and storage efficiency. In addition, our scheme provides an efficient and accurate document retrieval method.
We consider a new encrypted document retrieval scenario in which the data owner wants to control access to the documents at a fine-grained level. To support this service, we first design a novel hierarchical attribute-based document encryption scheme that encrypts together a set of documents sharing an integrated access structure.
Further, the ARF tree is proposed to organize the document vectors based on their similarities. Finally, a depth-first search algorithm is designed to improve the search efficiency for data users, which is extremely important for large document collections. The performance of the approach is thoroughly evaluated by both theoretical analysis and experiments.
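To give an intuition for why a depth-first search over a tree index can be faster than a linear scan, the following Python sketch runs a pruned top-k search on a generic vector tree. It is not the ARF tree itself, whose node layout is not specified in this section. The sketch assumes non-negative document and query vectors, inner-product relevance, and internal nodes that store the element-wise maximum of their children's vectors, so that a node's score is an upper bound on every leaf below it. The names `Node`, `build_parent`, and `dfs_top_k` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    vector: List[float]                      # leaf: document vector; internal: max over children
    doc_id: Optional[str] = None             # set only on leaves
    children: Optional[List["Node"]] = None  # set only on internal nodes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_parent(children):
    """Create an internal node whose vector is the element-wise max of its children."""
    vec = [max(col) for col in zip(*(c.vector for c in children))]
    return Node(vector=vec, children=list(children))


def dfs_top_k(root, query, k):
    """Depth-first top-k search that prunes subtrees unable to beat the current k-th score."""
    if k <= 0:
        return []
    best = []  # min-heap of (score, doc_id); best[0] is the current k-th best

    def visit(node):
        bound = dot(node.vector, query)
        if len(best) == k and bound <= best[0][0]:
            return  # no leaf below this node can improve the result
        if node.children is None:
            if len(best) < k:
                heapq.heappush(best, (bound, node.doc_id))
            else:
                heapq.heapreplace(best, (bound, node.doc_id))
            return
        # Visit the most promising child first so the pruning threshold rises early.
        for child in sorted(node.children, key=lambda c: dot(c.vector, query), reverse=True):
            visit(child)

    visit(root)
    return [doc_id for _, doc_id in sorted(best, reverse=True)]


leaf_a = Node([1.0, 0.0], doc_id="a")
leaf_b = Node([0.0, 1.0], doc_id="b")
leaf_c = Node([0.6, 0.6], doc_id="c")
leaf_d = Node([0.1, 0.1], doc_id="d")
example_root = build_parent([build_parent([leaf_a, leaf_b]), build_parent([leaf_c, leaf_d])])
```

Because sibling subtrees are searched independently apart from the shared threshold, a search of this shape can also be split across workers, which is in line with the remark that the search can be further improved by parallel computing.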