Dear list,

playing with buffer contents, I have the following file:

  \setupinteraction[state=start]
  \setupinteractionscreen[option={attachment}]
  \startbuffer[test]
  just a test
  and another one
  \stopbuffer
  \starttext
  \ctxlua{require("util-sha")}
  \def\shabuffer#1%
    {\cldcontext{utilities.sha2.hash256(buffers.raw("#1"))}}
  \def\shafile#1%
    {\cldcontext{utilities.sha2.hash256(io.loaddata("#1"))}}
  \def\shabufferfile#1%
    {\cldcontext{utilities.sha2.hash256(buffers.raw("#1"))}}
  \shabuffer{test}
  \savebuffer[test][temporary-αβγ, prefix=no]
  \shafile{temporary-αβγ}
  \attachment[buffer=test, name=\shabufferfile{test}, method=hidden]
  \stoptext

I mean, to get the hash of the file attached to the document, I need to save the buffer for "context(utilities.sha2.hash256(io.loaddata(buffer)))".

But I don’t need to save the buffer to attach it to the PDF document.

My question is how to define \shabufferfile to avoid \savebuffer (only required to get the hash).

One approach would be the following. If I’m not totally wrong, "savebuffer" (https://github.com/contextgarden/context/blob/main/tex/context/base/mkxl/buf...) may just be replacing new lines with "\n" in the original buffer (https://github.com/contextgarden/context/blob/main/tex/context/base/mkxl/buf...). The function string.replacenewlines() is defined at https://github.com/contextgarden/context/blob/main/tex/context/base/mkiv/uti....

If I’m not totally wrong about savebuffer replacing newlines with "\n", I wonder how to create a temporary buffer with such a replacement, so that it could be hashed later.

I hope my question is clear.

Many thanks in advance for your help,

Pablo
Hi Pablo,
> I mean, to get the hash of the file attached to the document, I need to save the buffer for "context(utilities.sha2.hash256(io.loaddata(buffer)))".
>
> But I don’t need to save the buffer to attach it to the PDF document.
>
> My question is how to define \shabufferfile to avoid \savebuffer (only required to get the hash).
The SHA calculation isn't working properly because of a weird newline issue. Try this:

  \setupinteraction[state=start]
  \setupinteractionscreen[option={attachment}]
  \startbuffer[test]
  just a test
  and another one
  \stopbuffer
  \starttext
  \startluacode
  require("util-sha")

  function sha256(str)
      -- string.gsub returns two values (the new string and a count);
      -- the extra parentheses pass only the string on to hash256
      return utilities.sha2.hash256(
          (str:gsub(string.char(0x0D), string.char(0x0A)))
      )
  end
  \stopluacode
  \def\shabuffer#1%
    {\cldcontext{sha256(buffers.raw("#1"))}}
  \def\shafile#1%
    {\cldcontext{sha256(io.loaddata("#1"))}}
  \shabuffer{test}
  \savebuffer[test][temporary-αβγ, prefix=no]
  \shafile{temporary-αβγ}
  \attachment[buffer=test, name=\shabuffer{test}, method=hidden]
  \stoptext

You can remove the "\savebuffer" and the "\shafile"; I just kept them in to show that the two hashes are now the same.

-- Max
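A side note on the Lua detail in the function above: string.gsub returns two values, the substituted string and the number of replacements, so wrapping the call in parentheses is a common way to hand only the string to the next function. The plain-Lua sketch below shows the CR-to-LF normalization on its own. The name normalize_cr_to_lf is hypothetical, and the sample string is only an assumption about what the raw buffer looks like, with CR marking the line ends as discussed in this thread.

```lua
-- Hypothetical helper (not part of ConTeXt): turn CR line ends into LF.
local function normalize_cr_to_lf(str)
    -- gsub returns (new_string, count); the parentheses keep only the string
    return (str:gsub("\r", "\n"))
end

-- Assumed raw buffer contents, for illustration only
local raw = "just a test\rand another one\r"
local normalized = normalize_cr_to_lf(raw)

assert(normalized == "just a test\nand another one\n")
assert(not normalized:find("\r", 1, true))  -- no CR left
assert(#normalized == #raw)                 -- one byte replaced by one byte
```

Inside ConTeXt, the normalized string would then be passed to utilities.sha2.hash256 as in the message above.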
On 9/23/22 06:01, Max Chernoff via ntg-context wrote:
> […]
> The SHA calculation isn't working properly because of a weird newline issue. Try this:
> […]
>   function sha256(str)
>       return utilities.sha2.hash256(
>           str:gsub(string.char(0x0D), string.char(0x0A))
>       )
>   end
> […]

Hi Max,

this works perfectly fine on Linux with "str:gsub('\r','\n')", but I can’t make it work on Windows.

I always thought that Unix used LF (\n, if I’m not wrong) to mark a new line, and Windows used CRLF (\r\n).

How are new lines marked in the buffer? As \r instead of \r\n or \n?

At least, Notepad (the minimal plain text editor in Windows) doesn’t recognize newlines if I attach the buffer to the PDF document as a .txt file.

Many thanks for your help,

Pablo
On 9/23/22 17:06, Pablo Rodriguez via ntg-context wrote:
> On 9/23/22 06:01, Max Chernoff via ntg-context wrote:
>> […]
>>   return utilities.sha2.hash256(
>>       str:gsub(string.char(0x0D), string.char(0x0A))
>>   )
>> […]
>
> this works perfectly fine with Linux "str:gsub('\r','\n')", but I can’t make it work in Windows.
Hi again Max,

this seems to solve the issue in Windows too:

  \startbuffer[test]
  just a test
  and another one
  \stopbuffer
  \starttext
  \startluacode
  require("util-sha")

  function sha256(str)
      -- the extra parentheses drop gsub's second return value (the count)
      if os.name == "windows" then
          return utilities.sha2.hash256((str:gsub("\r", "\r\n")))
      else
          return utilities.sha2.hash256((str:gsub("\r", "\n")))
      end
  end
  \stopluacode
  \def\shabuffer#1%
    {\cldcontext{sha256(buffers.raw("#1"))}}
  \def\shafile#1%
    {\cldcontext{utilities.sha2.hash256(io.loaddata("#1"))}}
  \shabuffer{test}
  \savebuffer[test][temporary-αβγ, prefix=no]
  \shafile{temporary-αβγ}
  \stoptext

But what I don’t understand now is the following: if the saved file contains "\r\n", why doesn’t basic Notepad recognize the new lines?

"\r\n" are the chars to get new lines in Windows. Or what am I missing here?

Many thanks for your help,

Pablo
Hi Pablo,
> But what I don’t understand now is the following: if the saved file contains "\r\n", why doesn’t basic Notepad recognize the new lines?
>
> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
I'm not too sure what you're asking here, but Notepad was somewhat recently updated to handle both CRLF and LF line endings:

https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/

But I do agree that the line ending handling seems a little odd. I find it surprising that the buffers internally use CR line endings since no systems in the past 20 years use that.

Also, you should probably check to make sure that the results of the file don't depend on the current code page on Windows. Try writing out a buffer from ConTeXt with the following contents:

АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя

First, run "chcp 65001" before running "context" and record the size of the file written. Then, run "chcp 1251" and run "context" again. Hopefully the file size doesn't change; but if it does, then that means that the binary content of any file written will depend on the system's default code page, which would complicate making reproducible hashes.

-- Max
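The file-size check described above can be compared against the sizes one would expect. A minimal sketch, assuming the Lua source is itself saved as UTF-8 and that the Lua version provides the utf8 library (5.3 or later): each of the 64 Cyrillic letters takes two bytes in UTF-8 but only one byte in Windows-1251, so a file holding just this line would be about twice as large in the UTF-8 case (plus the line ending in both cases).

```lua
-- The test line from the message above, assumed to be stored as UTF-8
local sample = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"

local bytes = #sample          -- the length operator counts bytes, not characters
local chars = utf8.len(sample) -- number of UTF-8 encoded code points

assert(chars == 64)            -- 32 uppercase + 32 lowercase letters
assert(bytes == 2 * chars)     -- two bytes per letter in UTF-8

print(("%d characters, %d bytes as UTF-8"):format(chars, bytes))
```

A written file of roughly 64 bytes instead would point to a single-byte code page such as Windows-1251.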
On 9/26/2022 2:05 AM, Max Chernoff via ntg-context wrote:
> Hi Pablo,
>
>> But what I don’t understand now is the following: if the saved file contains "\r\n", why doesn’t basic Notepad recognize the new lines?
>>
>> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
>
> I'm not too sure what you're asking here, but Notepad was somewhat recently updated to handle both CRLF and LF line endings:
>
> https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/
>
> But I do agree that the line ending handling seems a little odd. I find it surprising that the buffers internally use CR line endings since no systems in the past 20 years use that.
how about tex ...

  \number\endlinechar
  \number\numexpr`M-`A+1\relax % plain sets up `^^M

... you don't want to know how much hassle dealing with line endings in tex is
> Also, you should probably check to make sure that the results of the file don't depend on the current code page on Windows. Try writing out a buffer from ConTeXt with the following contents:
>
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
>
> First, run "chcp 65001" before running "context" and record the size of the file written. Then, run "chcp 1251" and run "context" again. Hopefully the file size doesn't change; but if it does, then that means that the binary content of any file written will depend on the system's default code page, which would complicate making reproducible hashes.

if that were the case nothing would work .. so it's bytes in - bytes out
Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
Hi Hans, Pablo,
>> But I do agree that the line ending handling seems a little odd. I find it surprising that the buffers internally use CR line endings since no systems in the past 20 years use that.
>
> how about tex ...
>
>   \number\endlinechar
>   \number\numexpr`M-`A+1\relax % plain sets up `^^M

Argh, how could I have forgotten about that. Yes, that makes complete sense.

>> First, run "chcp 65001" before running "context" and record the size of the file written. Then, run "chcp 1251" and run "context" again. Hopefully the file size doesn't change; but if it does, then that means that the binary content of any file written will depend on the system's default code page, which would complicate making reproducible hashes.
>
> if that were the case nothing would work .. so it's bytes in - bytes out

Ok good, that's what I was expecting. I've unfortunately used some programs that even fairly recently depended on the system code page, so I'm always a little cautious.

> Hi Max,
>
> I realized later that I was doing something wrong. My fault here.

Glad that you've figured it out.

> I thought that ConTeXt would output the same character encoding as in the source file when saving a buffer.

Yes, Hans confirmed that that is correct.

Thanks,
-- Max
On 9/26/22 02:05, Max Chernoff via ntg-context wrote:
> Hi Pablo,
>
>> But what I don’t understand now is the following: if the saved file contains "\r\n", why doesn’t basic Notepad recognize the new lines?
>>
>> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
>
> I'm not too sure what you're asking here, but Notepad was somewhat recently updated to handle both CRLF and LF line endings:
>
> https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/

Hi Max,

I realized later that I was doing something wrong. My fault here.

> [...]
> Also, you should probably check to make sure that the results of the file don't depend on the current code page on Windows. Try writing out a buffer from ConTeXt with the following contents:
>
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
>
> First, run "chcp 65001" before running "context" and record the size of the file written. Then, run "chcp 1251" and run "context" again. Hopefully the file size doesn't change; but if it does, then that means that the binary content of any file written will depend on the system's default code page, which would complicate making reproducible hashes.

For more than two decades, all my TeX sources have been written in UTF-8.

I thought that ConTeXt would output the same character encoding as in the source file when saving a buffer.

I haven’t found this issue and I’d say that all my saved buffers are UTF-8 encoded.

Many thanks for your help,

Pablo
On 9/26/2022 7:24 PM, Pablo Rodriguez via ntg-context wrote:
> For more than two decades, all my TeX sources have been written in UTF-8.
>
> I thought that ConTeXt would output the same character encoding as in the source file when saving a buffer.
>
> I haven’t found this issue and I’d say that all my saved buffers are UTF-8 encoded.

the magic is in

  savedata(name,replacenewlines(content),"\n",option == v_append)

because tex reads in and then loses what it saw (cr lf crlf), so we use the line endings of the operating system (good old typewriters and windows use cr+lf, old macs use cr, and linux uses lf)

Hans
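To make the role of the newline replacement concrete, here is a plain-Lua approximation. It is not the ConTeXt implementation of replacenewlines or savedata: replace_newlines and its eol parameter are hypothetical names, and picking the target line ending per operating system is an assumption based on the explanation above.

```lua
-- Hypothetical stand-in for a newline replacement step (not ConTeXt code).
-- Any of CRLF, CR or LF in the input becomes the requested line ending.
local function replace_newlines(str, eol)
    eol = eol or "\n"
    -- Handle CRLF first so it does not turn into two line ends,
    -- then fold any remaining lone CR into LF.
    local unified = str:gsub("\r\n", "\n"):gsub("\r", "\n")
    -- Parentheses drop gsub's second return value (the count).
    return (unified:gsub("\n", eol))
end

local mixed = "one\r\ntwo\rthree\n"

assert(replace_newlines(mixed) == "one\ntwo\nthree\n")
assert(replace_newlines(mixed, "\r\n") == "one\r\ntwo\r\nthree\r\n")
```

Hashing the output of such a function with the line ending of the current platform would mirror the os.name test used earlier in the thread.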
participants (3)
- Hans Hagen
- Max Chernoff
- Pablo Rodriguez