Managing a large output file by creating multiple little output files

I originally wrote this macro to manage the large output file that my pm macro can generate. The problem with that macro is that the longer it runs, the larger the output file gets. At some point the file gets too large to work with easily and, of course, you cannot run the macro indefinitely because you can reach the maximum file size or run out of disk space.

The cycle_output_files command macro resolves this problem by running the pm macro with a specified process_name and output file. The output file is suffixed with a number from 0 through N-1, where N is one of the arguments to cycle_output_files. The macro monitors the size of the output file and, when the file exceeds the specified limit (another argument), it kills the process and starts a new one using the next output file name. With the proper file size and number of files you can save the last hour, day, or week's worth of data.
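The monitor-and-rotate loop described above can be sketched as a small Python simulation. This is only an illustration, not the macro itself. The function name simulate and the constant growth rate blocks_per_second are assumptions made for the example, and no processes or files are actually touched.

```python
def simulate(max_size, max_num, check_seconds, blocks_per_second, run_seconds):
    """Return (suffix, final_blocks) for every process that gets stopped.

    Hypothetical model: the output file grows at a constant rate and its
    size is looked at only once every check_seconds.
    """
    stopped = []
    suffix = -1
    elapsed = 0
    while elapsed < run_seconds:
        suffix = (suffix + 1) % max_num   # next name in the cycle
        size = 0                          # old file deleted, new process started
        while elapsed < run_seconds:
            if size > max_size:           # limit exceeded: stop and rotate
                stopped.append((suffix, size))
                break
            elapsed += check_seconds      # sleep until the next check
            size += blocks_per_second * check_seconds
    return stopped


print(simulate(max_size=500, max_num=2, check_seconds=60,
               blocks_per_second=2, run_seconds=1200))
```

With these assumed numbers each file is caught at 600 blocks rather than 500, because the size is only checked once a minute, and the suffixes run 0, 1, 0.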

You can use cycle_output_files to run a specific command (example 1), a command macro that does NOT call start_process (example 2), or a command macro that DOES call start_process (example 3). Macros that call start_process must accept the arguments -output_path and -process_name and use those arguments in the call to start_process. You can also start cycle_output_files as a started process (example 4) so it is not tied to a terminal window or a login session.

Display Form

------------------------------ cycle_output_files ---------------------------- 
 process_name:                                               
 output_path:                                                             
 max_size:                    
 max_num:                   
 check_seconds: 60
 command:               

Command-Line Form

cycle_output_files process_name output_path max_size max_num [check_seconds]   
                   [command]

Arguments

process_name string
This is the base name of the process that will be run. The name will include a suffix between 0 and max_num - 1 (see below).

output_path string
This is the base name that the output file will have. The actual output file names will be OUTPUT_PATH.SUFFIX.out, where SUFFIX is a number between 0 and max_num - 1 (see below). If you specify a name containing the string ".out", OUTPUT_PATH will consist of all characters before the ".out" string.

max_size number
This is the maximum size in blocks of the output files. Note that this size is ONLY APPROXIMATE. The file size is checked once every check_seconds seconds (see below). It is possible for a file to be smaller than max_size at one check and grow larger than max_size before the next check. How much larger it grows depends on how fast the file grows and on the value of check_seconds.

max_num number
This is the maximum number of files to create. Files will have the name output_path.N.out, where N ranges from 0 to max_num - 1. You cannot assume that output_file.4.out was written after output_file.1.out. These file names constitute a circular buffer. The only way to tell the proper order of the files is to look at the date_time modified values.

check_seconds number
The number of seconds between checks of the file size.

command string
This is the command to execute.
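To make the naming rules for process_name, output_path, and max_num concrete, here is a short Python sketch. It is an illustration written for this page, not code taken from the macro. The helper names base_name and run_names are hypothetical.

```python
def base_name(output_path):
    """Drop the first '.out' and everything after it, if present."""
    pos = output_path.find(".out")
    return output_path if pos < 0 else output_path[:pos]


def run_names(process_name, output_path, max_num, runs):
    """Return (process name, output file name) for the first `runs` runs."""
    base = base_name(output_path)
    names = []
    for run in range(runs):
        suffix = run % max_num            # 0, 1, ..., max_num - 1, 0, ...
        names.append((f"{process_name}{suffix}", f"{base}.{suffix}.out"))
    return names


for proc, path in run_names("stcp_meters_test", "stcp_meters.out", 3, 4):
    print(proc, path)
```

The fourth run reuses suffix 0, which is why a higher suffix does not imply a newer file.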

Examples

Example 1

In this first example I am directly executing an analyze_system request with its own built-in repeat interval. Because of that I need to include the start_process as part of my command, and because it is an analyze_system request I need to use the -privileged argument. Creative use of the string and quote command functions is needed to pass the arguments through all the layers of command processing.

I ran cycle_output_files from a terminal window. The bold line is what I typed. The next line is cycle_output_files' interpretation of the arguments. There is also one line of stop_process output every time the process is stopped. If you are going to run cycle_output_files this way I recommend that you set the pause_lines of your terminal to 0. Note that breaking out of the command will NOT stop the stcp_meters_test process, so that is my next step. If I had not stopped the stcp_meters_test process it would continue to run and the output file would continue to grow. Note that I add a * suffix to stcp_meters_test in the stop_process command. In this case I could look at the name of the last process stopped by the cycle_output_files macro and manually stop the next process, but I think that using a * suffix is easier.

The last part of this example shows the stcp_meters output files. You will note that the files are larger than the 5 blocks I requested. I would have to reduce the value of check_seconds to get these files closer to the 5 block limit. Note also that the oldest file is stcp_meters.2.out and the newest is stcp_meters.1.out. The suffix number is NOT an indication of the output's order in time.

cycle_output_files stcp_meters_test stcp_meters 5 3 60 start_process (quote anal
+yze_system -request_line (string (quote stcp_meters -all -long -interval 5))) -
+privileged
cycle_output_files stcp_meters_test stcp_meters 5 3 60 start_process analyze_sys
+tem -request_line 'stcp_meters -all -long -interval 5' -privileged
 Stopping Noah_Davids.CAC (stcp_meters_test0).
 Stopping Noah_Davids.CAC (stcp_meters_test1).
 Stopping Noah_Davids.CAC (stcp_meters_test2).
 Stopping Noah_Davids.CAC (stcp_meters_test0).
 Stopping Noah_Davids.CAC (stcp_meters_test1).
 Stopping Noah_Davids.CAC (stcp_meters_test2).
 Stopping Noah_Davids.CAC (stcp_meters_test0).
BREAK
Request?  (stop, continue, debug, keep, login, re-enter) s
ready  09:26:00
stop_process stcp_meters_test*
Verify processes to be stopped.
  Noah_Davids.CAC (stcp_meters_test1)?  (yes, no, info) yes
 Stopping Noah_Davids.CAC (stcp_meters_test1).
ready  09:26:25

list -sort date_modified stcp_meters*

Files: 3, Blocks: 30

w          8 08-04-11 09:26:25  stcp_meters.1.out
w         11 08-04-11 09:25:43  stcp_meters.0.out
w         11 08-04-11 09:24:43  stcp_meters.2.out

ready  09:26:28
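Since the suffix says nothing about age, the files have to be ordered by their date_time modified values. As a hedged illustration, the Python sketch below takes the text of a listing like the one above and returns the file names oldest first. The function name oldest_first is hypothetical, and the parsing assumes each file line has exactly the five fields shown above (access, blocks, yy-mm-dd date, time, name).

```python
from datetime import datetime


def oldest_first(listing):
    """Return the file names found in `listing`, oldest modification first."""
    entries = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers, prompts, and blank lines: keep only 5-field file lines.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        stamp = datetime.strptime(f"{fields[2]} {fields[3]}",
                                  "%y-%m-%d %H:%M:%S")
        entries.append((stamp, fields[4]))
    return [name for stamp, name in sorted(entries)]


listing = """\
Files: 3, Blocks: 30

w          8 08-04-11 09:26:25  stcp_meters.1.out
w         11 08-04-11 09:25:43  stcp_meters.0.out
w         11 08-04-11 09:24:43  stcp_meters.2.out
"""
print(oldest_first(listing))
```

For the listing in this example the order is stcp_meters.2.out, stcp_meters.0.out, stcp_meters.1.out.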

Example 2

In this example I am executing a command macro that does its own looping. I still need to include the "start_process" as part of the command given to cycle_output_files, but it is a much simpler command line. I also still need the -privileged since it is still calling analyze_system.

I've shown the command macro at the end of the output. It is the kind of macro that I expect will be used most of the time. It executes some command (or commands), sleeps for some amount of time, and then loops.

cycle_output_files stcp_meters_test stcp_meters 5 3 60 start_process foo -privil
+eged
cycle_output_files stcp_meters_test stcp_meters 5 3 60 start_process foo -privil
+eged
 Stopping Noah_Davids.CAC (stcp_meters_test0).
 Stopping Noah_Davids.CAC (stcp_meters_test1).
 Stopping Noah_Davids.CAC (stcp_meters_test2).
 Stopping Noah_Davids.CAC (stcp_meters_test0).
 Stopping Noah_Davids.CAC (stcp_meters_test1).
BREAK
Request?  (stop, continue, debug, keep, login, re-enter) s
ready  11:00:29
stop_process stcp_meters*
Verify processes to be stopped.
  Noah_Davids.CAC (stcp_meters_test2)?  (yes, no, info) yes
 Stopping Noah_Davids.CAC (stcp_meters_test2).
ready  11:00:38

d foo.cm

%phx_vos#m15_mas>SysAdmin>Noah_Davids>foo.cm  08-04-11 11:07:35 mst

&attach_input
analyze_system
&label again
stcp_meters -all -long
sleep -minutes 1
&goto again

Example 3

In this example the command that I give to cycle_output_files is a macro that itself calls start_process. The key here is that the macro being called must take the -output_path and -process_name arguments, because cycle_output_files appends those two arguments, with appropriate values, to the command it is given to execute. In addition, the macro must pass those arguments and values to the start_process command that it calls. That way cycle_output_files can monitor the output file and kill the process.

I've shown both the command macro that cycle_output_files calls and the command macro that that macro calls. Note that I use a string command function in the start_process command in the first macro to call the second.

cycle_output_files dump_genet dump_genet 10 3 30 foo #enet.m15.12.2
cycle_output_files dump_genet dump_genet 10 3 30 foo #enet.m15.12.2
 Stopping Noah_Davids.CAC (dump_genet0).
 Stopping Noah_Davids.CAC (dump_genet1).
 Stopping Noah_Davids.CAC (dump_genet2).
 Stopping Noah_Davids.CAC (dump_genet0).
BREAK
Request?  (stop, continue, debug, keep, login, re-enter) s
ready  20:30:31
stop_process dump_genet*
Verify processes to be stopped.
  Noah_Davids.CAC (dump_genet1)?  (yes, no, info) y
 Stopping Noah_Davids.CAC (dump_genet1).
ready  20:30:41


d foo.cm

%phx_vos#m15_mas>SysAdmin>Noah_Davids>foo.cm  08-04-12 20:08:52 mst

&begin_parameters
OUTPUT_PATH option(-output_path),string
PROCESS_NAME option(-process_name),string
ARGS      options:unclaimed
&end_parameters

start_process (string bar &ARGS&) -process_name &PROCESS_NAME& -output_path &OUT
+PUT_PATH& -privileged

ready  20:08:52
d bar.cm

%phx_vos#m15_mas>SysAdmin>Noah_Davids>bar.cm  08-04-12 20:08:55 mst

&begin_parameters
ENET enet:string
&end_parameters

&attach_input
analyze_system
&label again
..display_line *** (date).(time) ***
dump_genet &ENET&
sleep -minutes 1
&goto again

Example 4

This final example shows an execution of the pm macro, which is what I originally wrote cycle_output_files for. I also start cycle_output_files as a started process so I do not tie up a terminal window or have to remain logged in to run the process. Packet_monitor also requires a privileged process, so my process must be started with the -privileged control argument.

Note that I use the (date) command function as part of the name of the output file. This function is executed at the command line, so what gets passed to the cycle_output_files command is a name with a fixed date. Regardless of how many days cycle_output_files runs, the output files will always have as part of their name the date that cycle_output_files was started, NOT the current date. You can see this by noting that the date time modified of all the files is 08-04-14 while the file names are pm.08-04-13.*.

To stop things I need to stop both the cycle_output_files process and the pm* process that it starts. It is always safest to stop the cycle_output_files process first. If you stop the started process first there is some chance that you stop it right before cycle_output_files was going to kill it and start it up again and that it actually does start it up again before you can kill the cycle_output_files process.

start_process 'cycle_output_files pm pm.(date) 500 10 60 (string pm * -no_arp -h
+ost 172.16.1.116)' -privileged
ready  20:40:39

stop_process cycle_output_files
Verify processes to be stopped.
  Noah_Davids.CAC (cycle_output_files)?  (yes, no, info) yes
 Stopping Noah_Davids.CAC (cycle_output_files).
ready  20:48:52

stop_process pm*
Verify processes to be stopped.
  Noah_Davids.CAC (pm2)?  (yes, no, info) yes
 Stopping Noah_Davids.CAC (pm2).
 
list -sort date_modified pm.08-04-13*

Files: 10, Blocks: 4882

w         42 08-04-14 10:53:23  pm.08-04-13.4.out
w        535 08-04-14 10:52:55  pm.08-04-13.3.out
w        539 08-04-14 10:46:50  pm.08-04-13.2.out
w        539 08-04-14 10:40:45  pm.08-04-13.1.out
w        539 08-04-14 10:34:39  pm.08-04-13.0.out
w        539 08-04-14 10:28:34  pm.08-04-13.9.out
w        537 08-04-14 10:22:29  pm.08-04-13.8.out
w        537 08-04-14 10:16:23  pm.08-04-13.7.out
w        537 08-04-14 10:10:18  pm.08-04-13.6.out
w        538 08-04-14 10:04:12  pm.08-04-13.5.out

ready  10:53:23
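The listing above also gives a rough idea of how much history a given max_size and max_num will keep. The Python sketch below is a back-of-the-envelope estimate, not part of the macro. It takes the modification times of the nine full files, oldest first, and assumes the traffic rate stays about the same. The function names are hypothetical.

```python
def seconds(hms):
    """Convert 'hh:mm:ss' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def mean_fill_seconds(stop_times):
    """Average gap between successive modification times (oldest first)."""
    gaps = [seconds(b) - seconds(a)
            for a, b in zip(stop_times, stop_times[1:])]
    return sum(gaps) / len(gaps)


def retained_minutes(max_num, fill_seconds):
    """Minimum history on disk: every file except the one being rewritten."""
    return (max_num - 1) * fill_seconds / 60


# Modification times of the nine full files in the listing, oldest first.
full_files = ["10:04:12", "10:10:18", "10:16:23", "10:22:29", "10:28:34",
              "10:34:39", "10:40:45", "10:46:50", "10:52:55"]
fill = mean_fill_seconds(full_files)
print(round(fill), round(retained_minutes(10, fill)))
```

For this run each file took about 365 seconds to fill, so 10 files of 500 blocks hold at least about 55 minutes of packet_monitor output. To keep a day's worth at the same traffic rate you would need to raise max_size, max_num, or both.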

cycle_output_files.cm

                                                                  
& cycle_output_files.cm starts here                                             
&
& cycle_output_files.cm
& version 1.0 08-04-10
& version 1.1 10-11-26 Added disclaimer
&
& Noah.Davids@stratus.com
&
& The latest version of this macro and documentation can be found at
&    http://noahdavids.org/self_published/cycle_output_files.html
&
& This software is provided on an "AS IS" basis, WITHOUT ANY WARRANTY OR ANY
& SUPPORT OF ANY KIND. The AUTHOR SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES
& OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE.  This disclaimer
& applies, despite any verbal representations of any kind provided by the
& author or anyone else.

&begin_parameters
PROCESS_NAME process_name:string,req
OUTPUT_PATH output_path:string,req
MAX_SIZE    max_size:number,req
MAX_NUM     max_num:number,req
CHECK_SECONDS  check_seconds:number=60
COMMAND    command:unclaimed
&end_parameters
&
&
&set OUT (index &OUTPUT_PATH& '.out')
&if &OUT& > 0
&then &set_string OUTPUT_PATH (substr &OUTPUT_PATH& 1 (calc &OUT& - 1))
&
display_line cycle_output_files &PROCESS_NAME& &OUTPUT_PATH& &+
          &MAX_SIZE& &MAX_NUM& &CHECK_SECONDS& &COMMAND&
&
&set SUFFIX -1
&label again
&set SUFFIX (mod (calc &SUFFIX& + 1) &MAX_NUM&)
&if (exists &OUTPUT_PATH&.&SUFFIX&.out)
&then delete_file &OUTPUT_PATH&.&SUFFIX&.out
&COMMAND& -output_path &OUTPUT_PATH&.&SUFFIX&.out &+
               -process_name &PROCESS_NAME&&SUFFIX&
&
&label check
&if (exists &OUTPUT_PATH&.&SUFFIX&.out)
&then &set SIZE (file_info &OUTPUT_PATH&.&SUFFIX&.out blocks_used)
&else &set SIZE 0
&if &SIZE& > &MAX_SIZE&
&then &do
   stop_process &PROCESS_NAME&&SUFFIX& -no_ask
   &goto again
&end     
sleep -seconds &CHECK_SECONDS&
&goto check
&
& cycle_output_files ends here


This page was last modified on 10-11-26
Send comments and suggestions to ndav1@cox.net