ReMIX Project: A Reconfigurable Memory for Indexing Mass of Data
Goal
- The ReMIX project aims to design an original memory architecture that both stores very large indexed data structures and allows fast information retrieval.
Technology
- The ReMIX project combines two technologies:
- FLASH memories: to provide a large data capacity together with fast access
- FPGA devices: to tailor indexing search to the memory
Applications
- Applications focus on content-based search, especially in the fields of genomics, image processing and text processing.
Status (May 2006)
- A ReMIX system with 512 Gbytes of FLASH memory is currently being tested. Eight RMEM boards of 64 Gbytes each are plugged into a 5-node cluster.
Context
Indexing is a well-known technique that accelerates searches within large volumes of data, such as those needed by applications in genomics and in content-based image or text retrieval. Very large indexes (larger than the main memory capacity) are generally stored on magnetic disks. In that case, the design of the index is fully disk-oriented, since minimizing disk I/Os is the key to reducing response times. Disk-oriented design therefore indirectly shapes the search algorithms that navigate within the index: they have to favor sequential access patterns and avoid random access to data as much as possible.
ReMIX Idea
The ReMIX project proposes the design of a dedicated and very large index memory (several hundred gigabytes), big enough to store huge indexes in their entirety. The use of an almost unlimited memory raises completely new issues when designing indexes. Furthermore, it makes it possible to entirely revisit the principles at the root of almost all existing indexing strategies. Within this scheme, direct access to data, massively parallel processing, huge data redundancy, pre-computed structures, etc., can be used to advantage to speed up the search.
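As an illustration of a pre-computed, directly accessed structure, the following minimal sketch builds a table of every fixed-length substring (k-mer) of a sequence, so that a lookup becomes a single direct access instead of a scan. It is only an assumption about what such an index could look like in the genomics case: the function name `build_kmer_index`, the parameter `k` and the sample sequence are hypothetical and do not come from the ReMIX project.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every length-k substring to the list of positions where it starts.

    Hypothetical helper for illustration only: storing all positions of all
    k-mers is redundant, but it turns each search into one direct lookup.
    """
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return dict(index)


# Build the table once, then answer queries by direct access.
index = build_kmer_index("ACGTACGA", k=3)
positions = index.get("ACG", [])  # start positions of "ACG" in the sequence
```

Such a table grows quickly with the size of the data, which is the kind of trade-off a very large index memory is meant to make affordable.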
Reconfigurable Resources
The index memory includes reconfigurable hardware resources that tailor the memory management, at the hardware level, to best support the specific properties of each indexing scheme. It also offers the opportunity to implement, again at the hardware level, generic algorithms with interesting potential parallelism that process data directly at the output of the index memory. As an example, image indexing requires a massive number of distance calculations between image descriptors: this kind of calculation can be performed directly by the reconfigurable index memory.
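To make the image-indexing example concrete, here is a minimal software sketch of the distance calculation between a query descriptor and a set of stored descriptors. It assumes descriptors are plain numeric vectors compared with the Euclidean distance, which the text does not specify; the names `squared_distance` and `nearest_descriptors` are hypothetical. Each distance is independent of the others, which is what makes this computation a candidate for parallel hardware.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_descriptors(query, database, k=1):
    """Return the k closest (distance, index) pairs, found by exhaustive scan."""
    scored = [
        (math.sqrt(squared_distance(query, descriptor)), i)
        for i, descriptor in enumerate(database)
    ]
    scored.sort()  # smallest distance first; ties broken by index
    return scored[:k]


# Toy 2-dimensional descriptors (illustrative values only).
database = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
result = nearest_descriptors([0.0, 0.0], database, k=2)
```

Here `result` holds the two nearest entries, each as a `(distance, index)` pair.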
FLASH technology
The indexes we manipulate are characterized by both their large volume and their relative stability. Indexing a huge amount of data (several gigabytes) takes time and is not performed continuously. An index can be recomputed every day, every week or each time a new data release is available. Consequently, the storage device only needs to support a reasonable number of write operations, while allowing unlimited read accesses. FLASH memory technology fits these requirements. In addition, the memory capacity is high (more than 1 Gbyte per chip) and the access time is low compared to magnetic disks.
ReMIX Memory Specificity
It is important to point out that this new memory architecture is far from being a simple memory extension to substantially increase the memory capacity of a standard computer. The reasons are the following:
- The reconfigurable index memory is not a simple storage device. It is enhanced with additional reconfigurable hardware resources for tailoring its use according to the index characteristics and to the data it manipulates.
- The reconfigurable index memory does not fit in the address space of the processor. It is accessed indirectly, through specific queries submitted by the processor in order to execute crucial and costly indexing subroutines.
- The reconfigurable index memory has no cache hierarchy, so memory accesses do not have to take data locality into account. Memory read operations have a uniform cost, whatever the memory address and whatever the previous memory accesses.
- Due to the FLASH technology, write operations are limited. The memory is only meant to periodically store huge volumes of data while allowing unlimited read access.
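The access model described in the list above can be sketched as a toy software model: the host never addresses the store directly, it submits queries, and the content is replaced only by occasional bulk loads. Everything here is a hypothetical illustration, not the ReMIX interface: the class `IndexMemory`, its methods, and the `max_bulk_loads` write budget are assumptions made for the example.

```python
class IndexMemory:
    """Toy model of a write-rarely, read-often index store (hypothetical API)."""

    def __init__(self, max_bulk_loads):
        self._table = {}
        self._loads_left = max_bulk_loads  # assumed write budget, illustrative only
        self.reads = 0

    def bulk_load(self, index):
        """Replace the whole index at once, as a periodic rebuild would."""
        if self._loads_left == 0:
            raise RuntimeError("write budget exhausted")
        self._loads_left -= 1
        self._table = dict(index)

    def query(self, key):
        """Answer one query; reads are counted but never limited."""
        self.reads += 1
        return self._table.get(key, [])


# One bulk load, then any number of queries (sample keys are made up).
memory = IndexMemory(max_bulk_loads=2)
memory.bulk_load({"flash": [3, 17], "fpga": [8]})
hits = memory.query("flash")
misses = memory.query("disk")
```

In this model every query costs the same regardless of the key or of earlier queries, mirroring the uniform read cost stated above.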