Files on a volume?


#1

Hi -

Is there a way to get the list of files located on a specific rawx?

Thanks,


Olivier


#2

Hello @olc,

Your question had me thinking for a couple of days about a technical solution to your problem, but I believe the need for such a listing can be misleading in itself.

If I understand correctly, you want to know which objects have been handled by a specific rawx service. However, a single object can lead to multiple chunks being created. For example, with three-times replication the total number of chunks will be 3 * ((object_size // chunksize) + ((object_size % chunksize > 0) ? 1 : 0)), where // is integer division and % is the modulo. So an object of 35MB with a chunksize of 10MB stored with THREECOPIES is split into 4 chunks per copy, which gives 12 chunks handled by any 12 rawx services matching the location criteria (distance > 1).
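To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function name `chunk_count` and its parameters are only illustrative (they are not part of any OpenIO tool), and sizes are assumed to be in bytes with 1MB taken as 10**6 bytes.

```python
def chunk_count(object_size: int, chunk_size: int, copies: int = 3) -> int:
    """Total number of chunks for a replicated object (illustrative helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Full chunks, plus one extra chunk if there is a remainder.
    chunks_per_copy = object_size // chunk_size
    if object_size % chunk_size > 0:
        chunks_per_copy += 1
    # Every copy of the object is chunked the same way.
    return copies * chunks_per_copy


MB = 10**6  # assumption: decimal megabytes, for the example only

# 35MB object, 10MB chunks, three copies: 4 chunks per copy, 12 in total.
print(chunk_count(35 * MB, 10 * MB))
```

An object that is an exact multiple of the chunk size gets no extra chunk, so 30MB with the same settings gives 9 chunks.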

So a particular rawx instance can potentially store a copy of a chunk, but it carries no guarantee that it contains the raw data comprising the whole object you have stored.

While I won’t give you the details of how to implement this, I can still give you the method I’d use. First, list the meta2 chunk locations of all the containers. Then filter them to keep only the ones associated with your particular rawx service. Finally, cross-reference these chunks to find the objects they belong to. This would effectively give you a list of objects which have at least one chunk handled by a particular rawx.
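The filtering and cross-referencing part of that method can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout `(container, object, chunk_url)`, the chunk URL format, and the sample data are all made up, and the step that actually collects the chunk locations from meta2 for every container is deliberately left out.

```python
from urllib.parse import urlsplit


def objects_on_rawx(chunk_records, rawx_netloc):
    """Return the (container, object) pairs with at least one chunk on a rawx.

    chunk_records: iterable of (container, object_name, chunk_url) tuples
                   (hypothetical layout, gathered beforehand from meta2).
    rawx_netloc:   "host:port" of the rawx service of interest.
    """
    found = set()
    for container, object_name, chunk_url in chunk_records:
        # Keep only the chunks whose URL points at the requested rawx.
        if urlsplit(chunk_url).netloc == rawx_netloc:
            found.add((container, object_name))
    return sorted(found)


# Made-up sample data: two objects, three copies of one chunk each.
records = [
    ("photos", "cat.jpg", "http://10.0.0.1:6004/AAAA"),
    ("photos", "cat.jpg", "http://10.0.0.2:6004/AAAB"),
    ("photos", "cat.jpg", "http://10.0.0.3:6004/AAAC"),
    ("docs", "report.pdf", "http://10.0.0.2:6004/BBBA"),
    ("docs", "report.pdf", "http://10.0.0.3:6004/BBBB"),
    ("docs", "report.pdf", "http://10.0.0.4:6004/BBBC"),
]

print(objects_on_rawx(records, "10.0.0.1:6004"))
```

Note that the result only says an object has at least one chunk on that rawx; it does not mean the whole object lives there.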

Now, since we are not using consistent hashing to predetermine where the data is going to land, this operation is indeed time- and resource-consuming, which leads to my question: what is the real need behind such an operation?

If it is a matter of trusting that the data is properly secured, I can guarantee you it is, thanks to the distance and location constraints imposed on all chunk replicas. Moreover, you will find that all volumes of a properly configured and perfectly homogeneous cluster will statistically fill at the same rate, thanks to a smart scoring system and proper load balancing on the grid.