datafile
Reads a file containing columns of data and creates a dictionary according to header information using the following format:
comments start with # in the first column: # My Comment
header information starts with #!, where # is also in the first column. It must precede the data.
each column is represented by name[dtype, col.nr.]/, where dtype (optional) is the data type and col.nr. (starting at 0) is the column number
data are separated by white space, NOT commas!
data types:
s : string
f : float
i : integer
blank lines are ignored
NOTE: if data formats are entered they should be specified for all columns
Example data:
#! p_miss[0]/ siglt[2]/ s01[3]/ alt[4]/
200. 1.35e-4 -1.e-3 0.1
220. 2.56e-4 -2.e-4 -0.1
230. 3.47e-6 -3.e-5 1.1
The header can also contain data type information:
#! p_miss[f,0]/ siglt[f,2]/ s01[f,3]/ alt[f,4]/
200. 1.35e-4 -1.e-3 0.1
220. 2.56e-4 -2.e-4 -0.1
230. 3.47e-6 -3.e-5 1.1
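To make the header syntax concrete, here is a minimal sketch of how a #! line and a data line could be parsed by hand. It is not the parser used by LT.datafile: the helpers parse_header and parse_row are hypothetical, and falling back to float when dtype is omitted is an assumption made for this illustration. The sample header uses consecutive column numbers 0 to 3.

```python
import re

# Hypothetical helpers, not part of LT.datafile.
# One header field looks like name[dtype, col]/ or name[col]/.
FIELD = re.compile(r"(\w+)\[\s*(?:([sfi])\s*,\s*)?(\d+)\s*\]/")

# Converters for the three documented data types.
CONVERT = {"s": str, "f": float, "i": int}


def parse_header(line):
    """Return a list of (name, dtype, column) tuples from a '#!' line."""
    if not line.startswith("#!"):
        raise ValueError("not a header line")
    # Assumption: a missing dtype is treated as float ('f').
    return [(name, dtype or "f", int(col))
            for name, dtype, col in FIELD.findall(line)]


def parse_row(fields, line):
    """Convert one whitespace-separated data line into a dictionary."""
    parts = line.split()
    return {name: CONVERT[dtype](parts[col]) for name, dtype, col in fields}


fields = parse_header("#! p_miss[f,0]/ siglt[f,1]/ s01[f,2]/ alt[f,3]/")
row = parse_row(fields, "200. 1.35e-4 -1.e-3 0.1")
print(row["p_miss"], row["alt"])
```

Each data line becomes a dictionary keyed by the variable names, which is the access pattern used in the loop examples of this page.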
Example that opens a file and creates a dfile object:
>>> f0 = dfile('sig_LT.dat')
Loop over content:
>>> for l in f0:
...     pm = l['p_miss']
...     sig_lt = l['siglt']*10000.
...     print(pm, sig_lt, l['alt'])
>>> # end for l
Variables can also be accessed directly as numpy arrays (if installed):
>>> pm = f0['p_miss']
>>> sig_lt = f0['siglt']*10000.
They can also be converted to attributes:
>>> f0.make_attr()
>>> print(f0.p_miss)
>>> print(f0.siglt)
Variable names are also called keys.
Save the data file as a CSV file. This is useful for exporting and formatting for documents (other than LaTeX):
>>> f0.write_csv(filename)
- class LT.datafile.dfile(filename, debug=False, new=False, skip=True, use_numpy=True, fast=False, adata=None)
Open a file, read and interpret the contents and return a dfile object:
>>> df = dfile('my_datafile')
keywords:
debug = False (default) : print additional information
fast = False (default) : load data in a fast way (using np.loadtxt). This has some limits for string entries and is best used for large, purely numerical data.
use_numpy = True (if numpy is installed) : set automatically but can be overridden. Some attributes of datafile are not available when numpy is not present.
adata = my_lines (default = None) : use the data provided in a list of strings (my_lines) as data file content. The content of my_lines must follow the datafile syntax.
- add_data(keys, line)
add data to the data file:
>>> d.add_data('x:y:z', 'xval yval zval')
xval, yval, zval are the values as they would be entered in a data file line.
Both arguments are strings and describe a complete line, i.e. values for all variables defined in the header line.
- add_header_comment(text)
add a comment line to the header. The # at the beginning is added automatically!
for a normal comment start with a space
to add a parameter start the comment with a
- add_key(key, format='f')
add a new key. This is useful if you want to add new data: create the key first, then assign the new values in a loop:
>>> df.add_key('newkey',format='i')
The new data need to be stored as follows:
>>> for d in df.data:
...     d['newkey'] = some_new_value
where df is the datafile and ‘newkey’ the new key. The new data set can be saved using fp = open('new_file','w') and df.write_all(fp)
- add_parameter(name, value)
add a parameter to the comment section
- check_data(func, data, key, *args)
function used to check if data fulfill a condition provided by the user. The function is assumed to return True or False.
- delete_key(key)
remove a key and all values associated with it
- eval_data(eval_str)
iterator over all data evaluating an expression contained in eval_str
- get_all_data()
return a list of all data stored
- get_data(key, sel_func=None, sel_args=None)
return all data for key subject to the results of a selector function.
One can define a selector function (sel_func) using the arguments stored in sel_args. This function is evaluated for each data record, and only those data for which the condition is fulfilled are returned.
sel_func: a user provided function returning True or False
sel_args: a list of arguments used in the sel_func function
Example: assume the data contain a variable (key) called ‘name’ and you want to select only those data where name contains a certain substring ‘Jo’.
Define this function:
>>> def myfind(data, key, what):
...     where = data[key]
...     return (str.find(where, what) >= 0)
now you can select the data using:
>>> df.get_data('name', myfind, ['name','Jo'])
This should return a list of names containing the substring ‘Jo’
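The selector mechanism can be tried out without a data file. The sketch below uses a plain list of dictionaries as stand-in records and a hypothetical get_data function that mimics the documented behaviour. The calling convention sel_func(record, *sel_args) is an assumption inferred from the myfind example above, not taken from the LT.datafile source.

```python
# Stand-in records; with LT.datafile these would come from a data file.
records = [
    {"name": "John", "age": 31},
    {"name": "Mary", "age": 27},
    {"name": "Joanna", "age": 45},
]


def myfind(data, key, what):
    """Selector from the text: True if data[key] contains the substring what."""
    return data[key].find(what) >= 0


def get_data(records, key, sel_func=None, sel_args=None):
    """Hypothetical stand-in for dfile.get_data.

    Assumption: the selector is called as sel_func(record, *sel_args)
    and a record is kept when it returns True.
    """
    args = sel_args or []
    return [r[key] for r in records
            if sel_func is None or sel_func(r, *args)]


names = get_data(records, "name", myfind, ["name", "Jo"])
print(names)
```

Without a selector function every record passes, so get_data(records, "age") returns the full column.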
- get_data_eval(key, eval_str)
return all data for the key under the condition that the expression in eval_str is True.
- get_data_list(keylist, sel_func=None, sel_args=None)
return all the data corresponding to the key list as follows:
>>> a = df.get_data_list('key1:key2:key3')
a contains the list of data. With a selector function,
>>> a = df.get_data_list('key1:key2:key3', myfind, ['name','J'])
returns only those data where the name values contain the character J.
- get_data_list_eval(keylist, eval_str)
similar to select_data_eval(), but it only returns the values for the keys defined in keylist.
- get_full_header()
return all lines up to the header line
- get_header()
return the header lines
- get_keys()
return a list of keys
- name()
print the filename associated with this instance
- save(file=None)
Save the current datafile.
With the keyword file = 'new_file' the datafile will be written to the file named new_file.
- scale(key, factor)
multiply all values of key by a factor
- select_data(sel_func=None, sel_args=None)
returns an iterator for the data. As in get_data(), a selector function and its arguments can be supplied to apply conditions:
sel_func: a user provided function returning True or False
sel_args: a list of arguments used in the sel_func function
The iterator returned can be used as follows:
>>> for d in df.select_data():
...     print(d)
or the result can be converted to a list:
>>> list(df.select_data())
Using a selector function:
Example: assume the data contain a variable (key) called ‘name’ and you want to select only those data where name contains a certain substring ‘sub’.
Define this function:
>>> def myfind(data, key, what):
...     where = data[key]
...     return (str.find(where, what) >= 0)
now you can select the data using:
>>> list(df.select_data(myfind, ['name', 'sub']))
- select_data_eval(eval_str)
returns an iterator for data selected with an eval expression stored in the string eval_str. Each dataset item is accessed using the name data:
>>> df.select_data_eval("data['x'] >= 0.")
returns only those data items where the value of the x-column in the file is larger than or equal to 0.
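How an eval-based selection of this kind can work is sketched below on stand-in records. This select_data_eval is a hypothetical free function written for illustration; it only assumes, as stated above, that each record is visible to the expression under the name data. How LT.datafile evaluates the string internally is not shown here.

```python
def select_data_eval(records, eval_str):
    """Hypothetical stand-in: yield the records for which eval_str is true.

    The expression is compiled once and evaluated with the current
    record bound to the name 'data'.
    """
    code = compile(eval_str, "<eval_str>", "eval")
    for data in records:
        # Builtins are emptied to limit what the expression can reach.
        if eval(code, {"__builtins__": {}}, {"data": data}):
            yield data


records = [
    {"x": -1.0, "y": 2.0},
    {"x": 0.0, "y": 3.0},
    {"x": 2.5, "y": 4.0},
]

selected = list(select_data_eval(records, "data['x'] >= 0."))
print(selected)
```

Because the expression is passed to eval, it should only come from a trusted source.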
- show_all_data()
print all data and keys stored
- show_data(keylist)
print all the data corresponding to the key list:
>>> df.show_data('key1:key2:key3')
- show_keys()
print a list of variable names in the dictionary
- sort(key, **kwargs)
sort the data according to the values in key
- update_header()
update header line of this data file including format. This makes sure that the header line is in sync with the dictionary keys
- write_all(fp, complete_header=False)
write all data to a file associated to fp
if complete_header = True include the complete header including all comments
- write_complete_header(fp)
write entire header including comments and internal parameters to file with handle fp. Example:
fp = open('myfile.data','w')
datafile.write_complete_header(fp)
- write_csv(f)
save the current file as a csv file
f : file name to be used
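The exact layout produced by write_csv is not described here. The sketch below only illustrates the general idea of turning whitespace-separated datafile content into comma-separated rows with the standard csv module. The function whitespace_to_csv is hypothetical, and using the variable names from the #! line as the CSV header row is an assumption.

```python
import csv
import io


def whitespace_to_csv(lines, out):
    """Hypothetical converter from datafile-style lines to CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        if stripped.startswith("#!"):
            # Keep only the variable name of each name[dtype, col]/ field.
            names = [field.split("[")[0] for field in stripped[2:].split()]
            writer.writerow(names)
        elif stripped.startswith("#"):
            continue  # ordinary comment line
        else:
            writer.writerow(stripped.split())


lines = [
    "# My Comment",
    "#! p_miss[f,0]/ siglt[f,1]/",
    "",
    "200. 1.35e-4",
    "220. 2.56e-4",
]

buffer = io.StringIO()
whitespace_to_csv(lines, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```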
- write_header(fp)
write only header line of this data file into file fp, including format
- write_line(fp, i)
write data line i into file fp
- write_selected(fp, index_list, complete_header=False)
write a new datafile with an identical header but enter only those data with an index given in index_list. If an index does not exist, print a message and skip it.
if complete_header = True include the complete header including all comments
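The skip-on-missing-index behaviour described for write_selected can be illustrated with a small stand-alone sketch. The function below is a hypothetical free function operating on a header string and a list of data lines. Treating negative indices as nonexistent and returning the skipped indices are choices made for this example only.

```python
import io


def write_selected(fp, header, rows, index_list):
    """Hypothetical stand-in: write the header and the selected data lines.

    Indices outside the range of rows are reported and skipped.
    Returns the list of skipped indices (an addition for this sketch).
    """
    fp.write(header + "\n")
    skipped = []
    for i in index_list:
        if 0 <= i < len(rows):
            fp.write(rows[i] + "\n")
        else:
            print("index %d does not exist, skipping" % i)
            skipped.append(i)
    return skipped


header = "#! x[f,0]/ y[f,1]/"
rows = ["1. 2.", "3. 4.", "5. 6."]

out = io.StringIO()
skipped = write_selected(out, header, rows, [0, 7, 2])
result = out.getvalue()
print(result)
```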