a wrapper for using pdftex as a library call
Hi,
I am working on a wrapper in C so that applications can call pdftex via a library call. Could you please have a look at the proposed API and comment on any potential problems you find? I attached two files:
test0.c: a minimal test file
test1.c: the same file with comments
Thanks,
Thanh
On 1/31/07, Thanh Han The wrote:
> Hi,
> I am working on a wrapper in C so that applications can call pdftex via a library call. Could you please have a look at the proposed API and comment on any potential problems you find? I attached two files:
> test0.c: a minimal test file
> test1.c: the same file with comments
> Thanks,
> Thanh
The attachments seem to have been discarded by the listserv, so I am resending the files as inline text:

test0.c:
==========================
#include <stdio.h>   /* for printf(), in case runptex.h does not include it */
#include "runptex.h"

int main(void)
{
    pdftex_data_struct tmp;

    if (init_pdftex_data(&tmp,
                         "/home/thanh/runptex/good-file.tex",
                         "/home/thanh/tmp/runptex",
                         "-fmt=pdflatex") != 0)
        pds_print_error_and_exit(&tmp);

    if (run_pdftex(&tmp) != 0)
        pds_print_error_and_exit(&tmp);

    printf("running pdftex succeeded, the output is %s in directory %s\n",
           tmp.pdf_file, tmp.working_dir);

    destroy_pdftex_data(&tmp);
    return 0;
}
==========================

test1.c:
==========================
#include <stdio.h>   /* for printf(), in case runptex.h does not include it */
#include "runptex.h"

int main(void)
{
    pdftex_data_struct tmp;
    /*
       pdftex_data_struct is defined in runptex.h as follows:

       typedef struct {
           char *tex_file;
           char *pdf_file;
           char *log_file;
           char *working_dir;
           char *pdftex_opts;
           int   return_code;
           char *return_msg;
       } pdftex_data_struct;
    */

    if (init_pdftex_data(&tmp,
                         "/home/thanh/runptex/good-file.tex",
                         "/home/thanh/tmp/runptex",
                         "-fmt=pdflatex") != 0)
        /*
           Prototype:
               int init_pdftex_data(pdftex_data_struct *pds,
                                    const char *tex_file,
                                    const char *working_dir,
                                    const char *pdftex_opts);

           Description: initialize pds as follows:
           - sets all fields of pds to NULL/0
           - checks that tex_file and working_dir:
               - are not null
               - are not too long (< MAX_FILENAME_LENGTH)
               - contain only allowed characters (defined by is_path_char())
               - are absolute (full) paths
           - checks that the tex filename (the last component of tex_file):
               - contains only allowed characters (defined by is_filename_char())
               - has extension '.tex'
           - checks that pdftex_opts contains only allowed characters
             (defined by is_option_char())
           - checks that tex_file can be read
           - checks that working_dir is a directory
           - stores relevant strings (paths, pdf/log filenames, etc.) in pds

           Return: 0 if ok, otherwise pds->return_code (>0 in case of error)
        */
        pds_print_error_and_exit(&tmp);
        /*
           Prototype:
               void pds_print_error_and_exit(pdftex_data_struct *pds);

           Description: a helper function for testing purposes; it just
           prints the error message stored in pds and exits the program
        */

    if (run_pdftex(&tmp) != 0)
        /*
           Prototype:
               int run_pdftex(pdftex_data_struct *pds);

           Description: execute pdftex as follows:
           - constructs the arguments to call pdftex from pds->tex_file
             and pds->pdftex_opts
           - changes the working dir to pds->working_dir
           - tries to create an empty pdf file and an empty log file to
             verify file permissions
           - copies pds->tex_file to the working directory
           - runs system() to execute the pdftex command constructed above
           - checks whether the log file has been created
           - checks whether the log file contains any error, i.e. line(s)
             beginning with '!'
           - checks whether the pdf file has been created
           - checks for the pdf header mark and eof mark
        */
        pds_print_error_and_exit(&tmp);

    printf("running pdftex succeeded, the output is %s in directory %s\n",
           tmp.pdf_file, tmp.working_dir);

    destroy_pdftex_data(&tmp);
    /*
       Prototype:
           void destroy_pdftex_data(pdftex_data_struct *pds);

       Description: free the strings stored in pds
    */
    return 0;
}
==========================
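One of the run_pdftex steps listed above, checking whether the log file contains a line beginning with '!', could look roughly like the following. This is only a sketch: the function names log_stream_has_error and log_file_has_error are made up for illustration and do not come from runptex.h, and the actual wrapper may do this differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from runptex.h).
   Returns 1 if any line in the stream begins with '!',
   0 if no line does, and -1 on a read error. */
int log_stream_has_error(FILE *fp)
{
    char buf[256];
    int at_line_start = 1;   /* the next chunk read begins a new line */

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start && buf[0] == '!')
            return 1;
        /* A chunk without a trailing newline means the same line
           continues in the next chunk, so a '!' there is not at the
           start of a line. */
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return ferror(fp) ? -1 : 0;
}

/* Hypothetical convenience wrapper that opens the log file by name.
   Returns -1 if the file cannot be opened. */
int log_file_has_error(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int result = log_stream_has_error(fp);
    fclose(fp);
    return result;
}
```

Tracking whether a chunk starts a new line keeps the check correct for log lines longer than the buffer, where a '!' could otherwise land at the start of a later chunk and be misread as an error marker.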
Thanh Han The wrote:
> I am working on a wrapper in C so that applications can call pdftex via a library call. Could you please have a look at the proposed API and comment on any potential problems you find?
Trying to do something like this cross-platform is very difficult because not all systems have the necessary underlying functionality to do it right.

For instance, the init_pdftex_data interface in your proposal takes the working directory as a string parameter. That's a security problem. Since it's not guaranteed that there is a reference (open file descriptor etc.) to the directory, somebody might change the directory (rename an existing one) so that the TeX run overwrites other files. Or, more likely, a part of the path name is changed (symlink attack). The only way to guarantee that the directory the caller intends to use is indeed used is by passing in a file descriptor. In the POSIX world this is no problem. The file descriptor is inherited through a fork() call, and before the exec() call to pdftex you call fchdir(fd). Here is where you'll find problems, since not all systems can implement this.

I assume your 'run_pdftex' interface is synchronous. IMO it is necessary to have at least an asynchronous version as well, i.e., a version where you initiate the start and then later independently query and, if necessary, wait for the result. The reason is obvious: the program can do work on its own while TeX is running. Parallelism is extremely important going forward.

And an implementation detail: _never_ expose data structures unless it is really, *REALLY* needed. I'm talking here about the pdftex_data_struct, of course. Direct access to any of its members in the user code is in no way performance critical. The initiated TeX runs are quite expensive in terms of execution time, so any memory allocation performed is completely negligible. So I propose to make the structure completely opaque, i.e., in the public header only have

    typedef struct pdftex_data_struct pdftex_data_t;

(I renamed the struct as well; _t is often used to indicate type names.) Then change the init_pdftex_data() function to take a pdftex_data_t** parameter. The function will itself allocate the memory for the structure. If allocation fails, the pointer variable pointed to by the parameter is set to NULL; otherwise it is set to the newly allocated memory. Error handling when returning from init_pdftex_data() has to handle this case. (BTW: why not return an error code and not just success/failure information from the functions? Then you don't have to pass a pointer to the tmp variable to pds_print_error.)

Anyway, if you make this change, the information about the struct is completely encapsulated in your code. This is important for maintainability, since it gives you the opportunity to change the implementation as much as you want as long as the function interfaces remain the same.

About pds_print_error_and_exit: such an interface is usually not useful except in tiny little programs. Assume you write a graphical shell for TeX. You don't want to terminate the program after a failed run; the user should be able to fix problems and rerun. What is needed, though, is the ability to show an error string. So what is probably needed is a function which returns an error string that can be printed in the appropriate way (on a terminal, in a dialog box, whatever).

About the interface naming: C's flat namespace is crowded. To minimize the risk of conflicts you should standardize on a common prefix for all function and type names and stick with it. E.g.,

    pdftex_data_struct       -> pdftexlib_data_struct
    init_pdftex_data         -> pdftexlib_data_init
    pds_print_error_and_exit -> pdftexlib_error
    run_pdftex               -> pdftexlib_run

You get the idea.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
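To make the suggestions above concrete, here is a minimal sketch of an opaque handle with a common prefix, an init function that takes a pointer-to-pointer and returns an error code, and a function that maps a code to a printable string. Every name in it (pdftexlib_data_t, the PDFTEXLIB_* codes, pdftexlib_strerror, the working_dir_fd parameter) is an assumption made for illustration and is not an existing API. The checks from the original init_pdftex_data description are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- what a public header might contain (all names hypothetical) ---- */
typedef struct pdftexlib_data pdftexlib_data_t;   /* opaque to callers */

enum { PDFTEXLIB_OK = 0, PDFTEXLIB_ENOMEM, PDFTEXLIB_EINVAL };

int pdftexlib_data_init(pdftexlib_data_t **out, const char *tex_file,
                        int working_dir_fd, const char *pdftex_opts);
void pdftexlib_data_destroy(pdftexlib_data_t *pd);
const char *pdftexlib_strerror(int code);

/* ---- private to the library's .c file ---- */
struct pdftexlib_data {
    char *tex_file;
    char *pdftex_opts;
    int   working_dir_fd;   /* open descriptor instead of a path string */
};

static char *dup_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

int pdftexlib_data_init(pdftexlib_data_t **out, const char *tex_file,
                        int working_dir_fd, const char *pdftex_opts)
{
    if (out == NULL)
        return PDFTEXLIB_EINVAL;
    *out = NULL;   /* the caller sees NULL on any failure */
    if (tex_file == NULL || pdftex_opts == NULL || working_dir_fd < 0)
        return PDFTEXLIB_EINVAL;

    pdftexlib_data_t *pd = calloc(1, sizeof *pd);
    if (pd == NULL)
        return PDFTEXLIB_ENOMEM;
    pd->working_dir_fd = working_dir_fd;
    pd->tex_file = dup_string(tex_file);
    pd->pdftex_opts = dup_string(pdftex_opts);
    if (pd->tex_file == NULL || pd->pdftex_opts == NULL) {
        pdftexlib_data_destroy(pd);
        return PDFTEXLIB_ENOMEM;
    }
    *out = pd;
    return PDFTEXLIB_OK;
}

void pdftexlib_data_destroy(pdftexlib_data_t *pd)
{
    if (pd == NULL)
        return;
    free(pd->tex_file);
    free(pd->pdftex_opts);
    free(pd);   /* the descriptor stays owned by the caller */
}

/* Returns a static string, so the caller decides how to display it. */
const char *pdftexlib_strerror(int code)
{
    switch (code) {
    case PDFTEXLIB_OK:     return "success";
    case PDFTEXLIB_ENOMEM: return "out of memory";
    case PDFTEXLIB_EINVAL: return "invalid argument";
    default:               return "unknown error";
    }
}
```

A caller would hold only a pdftexlib_data_t pointer, check the returned code, and pass it to pdftexlib_strerror to show a message on a terminal or in a dialog box rather than exiting.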
participants (2)
- Thanh Han The
- Ulrich Drepper