CosmoSim Blog

Three new tables are now available via the query interface for the MDPL2 simulation: MDPL2.Galacticus, MDPL2.SAGE and MDPL2.SAG.

These tables contain galaxy properties from different semi-analytical codes: Galacticus (Benson 2012), SAGE (Croton et al. 2016) and SAG (a modified version of Springel et al. 2001). All of them are based on the MDPL2.Rockstar dark matter halo catalog and the merger tree information therein.

If you want to retrieve all galaxies that reside within a given dark matter halo along with the halo’s properties, you can join the tables using either
rockstarId or depthFirstId. Since there are database indexes on both columns, this should be fast, e.g. for the halo with rockstarId = 12663401752:

SELECT s.*, r.* FROM MDPL2.SAG s, MDPL2.Rockstar r
WHERE r.rockstarId = 12663401752
AND s.rockstarId = r.rockstarId 
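
To see what this join does, here is a miniature, self-contained sketch (using Python’s sqlite3 purely as a stand-in for the CosmoSim database; the column names and the id value mirror the query above, the masses are made up):

```python
import sqlite3

# Tiny stand-in tables for MDPL2.Rockstar and MDPL2.SAG (made-up values)
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE Rockstar (rockstarId INTEGER, Mvir REAL)')
con.execute('CREATE TABLE SAG (rockstarId INTEGER, MstarDisk REAL)')
con.execute('INSERT INTO Rockstar VALUES (12663401752, 1.0e14)')
con.executemany('INSERT INTO SAG VALUES (?, ?)',
                [(12663401752, 1.0e10), (12663401752, 5.0e9)])

# Same implicit-join pattern as above: every galaxy of the halo,
# with the halo's properties attached to each result row
rows = con.execute(
    'SELECT s.MstarDisk, r.Mvir FROM SAG s, Rockstar r '
    'WHERE r.rockstarId = 12663401752 AND s.rockstarId = r.rockstarId'
).fetchall()
print(rows)  # both galaxies of the halo, each with the halo's Mvir attached
```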

The depthFirstId is especially useful if you need to extract galaxy information for main branches or complete merger trees; see also this blog post on galaxy tree queries.
For retrieving the galaxies along the main progenitor branch of a given dark matter halo, use a query like this:

SELECT s.snapnum, s.redshift, s.rockstarId, s.depthFirstId, s.GalaxyType, s.MstarDisk, s.MZstarDisk
FROM MDPL2.SAG s, (
  SELECT depthFirstId, mainLeaf_depthFirstId
  FROM MDPL2.Rockstar
  WHERE rockstarId = 12663401752
) AS r
WHERE s.depthFirstId BETWEEN 
    r.depthFirstId AND r.mainLeaf_depthFirstId
  AND s.GalaxyType = 0
ORDER BY s.snapnum

Click on the links below to get more information on the table types and example queries.


The simulation BigMDPL was recently added and now comes with the full Rockstar table, a halo catalog with merger tree information for 80 snapshots. The Rockstar table has the same format as for the other simulations. If you use it for your scientific work, please cite this dataset as doi:10.17876/cosmosim/bigmdpl/001.

Here is a slice at snapshot number 79 (redshift 0) at a z-coordinate of ~1786 Mpc/h, where the most massive halo lies. It shows a density-colored plot of dark matter halo positions (scaled with their size and rendered half-transparent) extracted from the Rockstar table.

Since the Rockstar table also contains the links between halos and their progenitors as well as the depthFirstIds, you can retrieve all progenitors of a dark matter halo with a single SQL query. Here’s an example for the most massive halo at redshift 0 in that simulation:

Get progenitors of a halo

SELECT p.rockstarId, p.snapnum, p.x, p.y, p.z, p.Mvir, p.Rvir 
FROM BigMDPL.Rockstar AS p, 
  (SELECT depthFirstId, lastProg_depthFirstId 
   FROM BigMDPL.Rockstar WHERE rockstarId=14568342049) AS r 
WHERE p.depthFirstId BETWEEN r.depthFirstId AND r.lastProg_depthFirstId 
ORDER BY p.snapnum

This query makes use of the properties of depthFirstIds: all halos with ids between the current halo’s id and its last progenitor id (lastProg_depthFirstId) must be progenitors.
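
The numbering scheme itself is easy to illustrate with a toy merger tree in Python (hypothetical halo labels; a sketch of why a subtree becomes a simple BETWEEN range):

```python
# Toy merger tree: each halo maps to its progenitors (children)
tree = {'A': ['B', 'E'], 'B': ['C', 'D'], 'C': [], 'D': [], 'E': []}

ids = {}        # halo -> depthFirstId
last_prog = {}  # halo -> lastProg_depthFirstId

def assign(halo, next_id=0):
    """Number halos depth-first; a halo's subtree occupies a contiguous id range."""
    ids[halo] = next_id
    next_id += 1
    for child in tree[halo]:
        next_id = assign(child, next_id)
    last_prog[halo] = next_id - 1  # last id handed out inside this subtree
    return next_id

assign('A')
print(ids)        # -> {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
print(last_prog)  # e.g. last_prog['B'] == 3: B's progenitors C, D have ids 2, 3
```

Selecting all rows with id BETWEEN ids['B'] and last_prog['B'] returns exactly B and its progenitors, which is what the SQL query above does with real halo ids.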

Here’s a picture of the 3D distribution of these progenitors, colored by snapshot number. See how the halos merged at the final timestep into one big object?

If you want to track back only the halo’s main progenitors (e.g. for checking the mass accretion, the evolution of spin etc.), you can use the mainLeaf_depthFirstId instead.

Get the main branch

SELECT p.* FROM BigMDPL.Rockstar AS p, 
  (SELECT depthFirstId, mainLeaf_depthFirstId 
   FROM BigMDPL.Rockstar WHERE rockstarId=14568342049) AS r
WHERE p.depthFirstId BETWEEN r.depthFirstId AND r.mainLeaf_depthFirstId 
ORDER BY p.snapnum

This uses the mainLeaf_depthFirstId, which is the depthFirstId of the last leaf on the main branch, i.e. it is the earliest main progenitor found (with mmp-flag = 1).
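
The main-branch range can be sketched the same way: because the depth-first walk descends into the most-massive progenitor first, the main branch occupies the consecutive ids from depthFirstId up to mainLeaf_depthFirstId. A toy example (hypothetical ids, where the first child plays the role of the mmp):

```python
# Toy tree with depth-first ids as keys; the first child is the mmp
tree = {0: [1, 4], 1: [2, 3], 2: [], 3: [], 4: []}

def main_leaf(halo):
    """Follow the most-massive-progenitor chain to the earliest main progenitor."""
    while tree[halo]:
        halo = tree[halo][0]
    return halo

leaf = main_leaf(0)
# Main branch of halo 0 = all ids BETWEEN 0 AND leaf
branch = [h for h in tree if 0 <= h <= leaf]
print(leaf, branch)  # -> 2 [0, 1, 2]
```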

Check out the documentation page on merger trees to learn more about the structure of merger trees in the database.


Jupyter notebooks are a great tool, especially for explaining and testing different things. So I decided it would be great to show, in an interactive Python notebook, how to use the uws-client as a library, as explained in the previous post uws-client for python scripting. And here you go: the first CosmoSim UWS notebook is now finished!

You can download the notebook from GitHub:
Set up your own virtual environment using venv, install the requirements and start the notebook server session with

jupyter notebook

Then open cosmosim-uws-intro.ipynb, which gives a basic introduction to job handling using a simple example. Use Shift+Enter to run each command.

Here’s a screenshot of the final figures:

Jupyter notebooks with more elaborate examples may follow in the future.


The SQL query results at CosmoSim return the values with as many digits as are stored in the database – but maybe that’s too much for you.
In that case, you can use the ROUND function. It is a standard SQL function that rounds to the given number of digits after the decimal point, i.e. ROUND(x,3) will round a number like 123.7529215654 to 123.753. So if you want to retrieve the rounded positions of 10 FOF clusters from the MDR1.FOF table, use a query like this:

SELECT ROUND(x,3) AS rounded_x, ROUND(y,3) AS rounded_y, ROUND(z,3) AS rounded_z
FROM MDR1.FOF LIMIT 10
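
If you would rather download the full precision and round afterwards, the same operation in Python is simply the built-in round():

```python
# ROUND(x, 3) in SQL corresponds to round(x, 3) in Python
x = 123.7529215654
print(round(x, 3))  # -> 123.753
```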

Please do not use CONVERT with DECIMAL, e.g. CONVERT(x, DECIMAL(10,3)), for this. Yes, this is actually correct (My)SQL syntax, but the parser used by our parallel query reformulator (PaQu) does not handle it correctly in the current version (yes, it’s a bug). We may enable it in the future, but for now, please use ROUND() instead.


The uws-client (PyPI link) provides a Python script and library that can be used to interact with the UWS interface of CosmoSim (and any other UWS-enabled service) from the command line. This is very useful for writing scripts, e.g. if you need to send thousands of jobs to retrieve the data you need for your research.

We had already collected some basic shell scripts (bash) for CosmoSim, but this time I want to show you how to do it all in Python, using the uws-client as a library. You can also look up the necessary functions yourself: they are defined in uws/UWS/ and used in uws/cli/

Install the latest release of the uws-client:

pip install uws-client

Start a new interactive Python (2.7) console:

python

Now, within the Python environment, you need to import the UWS module:

from uws import UWS

Define your username and password and then use them to create a client-instance:

myusername = 'xxxx'
mypassword = 'xxxx'
url = ''
c = UWS.client.Client(url, myusername, mypassword)

For creating a new job, first define the necessary parameters, i.e. for CosmoSim, set the query string and choose a query queue:

parameters = {'query': 'SELECT x,y,z FROM MDR1.FOF LIMIT 10', 'queue': 'long'}

Now create a job with these parameters:

job = c.new_job(parameters)

The returned job object will contain all the attributes of the job, like job_id, phase and parameters. You can check the parameters like this:

job.parameters[0].id, job.parameters[0].value
job.parameters[1].id, job.parameters[1].value

Currently, the job is still in the PENDING phase, which can be checked using the get_phase function:


As long as a job is in this phase, the parameters can still be adjusted: define the new parameters
and use the set_parameters_job function:

parameters = {'query': 'SELECT Mvir FROM MDR1.BDMV LIMIT 10', 'queue': 'long'}
job = c.set_parameters_job(job.job_id, parameters)

Start the job now:

job = c.run_job(job.job_id)

Check the current phase:


This will show ‘QUEUED’, because the job has been submitted to the queueing system.
Get the current phase by sending another request to the server:


By the way, there is a print-function defined for the job object, so you can get a nice job detail view using this:

print job

You can check the phase of the job regularly to find out when it is completed, or you can use the WAIT parameter of the get_job request and provide a wait time in seconds. The request will then return only when the specified wait time is over or the job phase has changed:

job = c.get_job(job.job_id, wait='30')

If you add a phase as well, then it will only wait if the job is currently in the specified phase. Please note that only “active” job phases (those where a phase change is still possible, i.e. PENDING, QUEUED or EXECUTING) make sense here.

job = c.get_job(job.job_id, wait='30', phase='QUEUED')

When your job is ready, you probably want to download the results.
First check the available results by printing them with print job or by inspecting the job.results object:


For CosmoSim (and other Daiquiri instances), there is one result for each format, each with a different id. The first result usually has the id ‘csv’, which we will use in this example.
In order to download this result, do:

url = str(job.results[0].reference)
c.connection.download_file(url, myusername, mypassword, 'results.csv')

You have probably already wondered how to get an overview of all the jobs you have created. Of course, there is a function for retrieving the complete job list as well!

jobs = c.get_job_list({})

You can get the job_ids from the job list using:

jobid = jobs.job_reference[0].id

This jobid can be used to query further details for the job in an extra request as before:

job = c.get_job(jobid)

You can also use print for the job list to get all jobs with basic information printed on screen:

print jobs

Note that you can also use the new UWS 1.1 job-list filtering capabilities here, i.e. you can filter by time (jobs created after a certain timestamp), retrieve only the last N jobs, or filter by phase:

jobs = c.get_job_list({'after': '2016-11-09'})
jobs = c.get_job_list({'last': '10'})
jobs = c.get_job_list({'phases': ['PENDING', 'ERROR']})

Please try it out and report back any issues that you may encounter. Have fun!


The new version of the IVOA Universal Worker Service (UWS) is now released!
We have already been experimenting with UWS 1.1 in the past to give implementation feedback, so CosmoSim already supports the new features.
Now, finally, UWS 1.1 is the official new version!

By the way, our uws-client (also at PyPI) supports the new features as well, but is still backwards compatible if you want to use it with UWS 1.0 services.


After submitting a query, you previously had to confirm the query plan (and you also had the possibility to edit it directly before confirming). This functionality is now switched off, because it caused problems with the session management. You can still view the full query plan in the Job overview tab, in the bottom section Query Plan. But you can no longer edit it manually before query submission.

If you discover a bug in a query plan, please contact us or submit your issue directly to the PaQu issue list at GitHub.


The heading says it already: don’t use LIMIT rowcount OFFSET offset or LIMIT offset, rowcount (replace offset and rowcount by integers) in your SQL queries on our system. You won’t get the results you expect.

OFFSET is usually used for pagination, i.e. for cases where you want to get everything, but piecewise. E.g. you could make a query returning the first 10 result rows, then in the next query set the offset to 10 and retrieve the next 10 result rows, and so on:

SELECT mass FROM MDR1.FOF WHERE snapnum=85 ORDER BY mass DESC LIMIT 10,10;
SELECT mass FROM MDR1.FOF WHERE snapnum=85 ORDER BY mass DESC LIMIT 20,10;

This works fine on single-node servers, but not on a sharded database system like the one behind CosmoSim. Imagine you asked for LIMIT 100,10. What would happen? The query would be submitted as-is to the shard nodes, which would do the ordering just on the piece of the data set they store locally, skip the first 100 rows and then return 10 rows to the head node. In the end we would have 10×10 rows on the head node (a package of 10 rows from each of the 10 shard nodes).

Currently, PaQu also applies any LIMIT condition on the head node table in the final step, so in the end nothing would be returned to you, which is not what you expected. Even if you got the full 100 rows from the head node, this would still not be what you expected. We could only provide the same behaviour as on a single-node server if we copied all the data to the head node first and then applied the OFFSET – which would involve too much data copying to be useful.

In fact, OFFSET is not as useful as it may seem: in order to apply the OFFSET, the complete result set would need to be calculated beforehand every time anyway, so there is no gain in query execution time.

If you need to run queries with complicated calculations or joins piece by piece (which may happen quite often, especially when avoiding the timeout limit), please use a (unique) identifier from the table instead, e.g. fofId for FOF tables, rockstarId for Rockstar tables etc. Consult the table documentation to find the right one for your table.
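
The id-based alternative (keyset pagination) can be demonstrated with a small self-contained sketch (sqlite3 stands in for the database here; the table layout, fofId values and batch size are illustrative):

```python
import sqlite3

# Toy stand-in for MDR1.FOF: 25 halos with made-up masses
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE FOF (fofId INTEGER PRIMARY KEY, mass REAL)')
con.executemany('INSERT INTO FOF VALUES (?, ?)',
                [(i, 1000.0 / i) for i in range(1, 26)])

# Keyset pagination: instead of OFFSET, remember the largest id already seen
last_id, batch_size, batches = 0, 10, []
while True:
    rows = con.execute(
        'SELECT fofId, mass FROM FOF WHERE fofId > ? ORDER BY fofId LIMIT ?',
        (last_id, batch_size)).fetchall()
    if not rows:
        break
    batches.append(rows)
    last_id = rows[-1][0]  # the next query continues after this id

print([len(b) for b in batches])  # -> [10, 10, 5]
```

Because each batch is selected by an indexed id range, every shard can answer the WHERE condition locally and consistently, which is why this pattern works on CosmoSim while OFFSET does not.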


Another important change since the previous maintenance is our preliminary update to UWS 1.1, which is needed for scripted access to the databases. Within the IVOA there are ongoing discussions about new features for UWS, which will soon become a recommendation. We have already implemented them as a test implementation of the new standard, so you can try them yourself! The main features now include job filtering possibilities (by phase, time, last jobs) and a WAIT option, e.g. if you submit a job and want to wait until it is executed or finished. Our uws-client also supports these features, so you can now do requests like these:

  • List only jobs in PENDING and EXECUTING phase:
    http --auth <username>:<password> --print b GET
    uws --host --user <username> --password <password> list --pending --executing
  • List only the last 10 jobs:
    http --auth <username>:<password> --print b GET
    uws --host --user <username> --password <password> list --last 10
  • List jobs started after a given date-time:
    http --auth <username>:<password> --print b GET
    uws --host --user <username> --password <password> list --after 2016-01-01

For more examples, please see the README of the uws-client.

Have fun with this new functionality! If you have more suggestions, send them to us. We could propose them for the future evolution of UWS.


The web application behind CosmoSim, Daiquiri, was updated yesterday (some of you may have noticed the downtime).
The main new functionality is job grouping: below your job list you can create and edit groups for your jobs and use drag and drop to sort jobs into a group.


Please try it out and give us some feedback!


Proudly powered by Daiquiri
©2016 The CosmoSim database