Up until now, whenever we’ve read from or written to a file, we’ve just put an upper bound on the number of bytes we were reading or writing. For example, in our original simple echo program we used a buffer of 500 bytes, and we put the value 500 into the rdx register when making the read and write system calls. If we carry on this way, we’ll always have to put a maximum size on input and output. Let’s learn how to do this properly!
We are going to write another simple echo program. However, this time, we’ll use a loop to read and write the input. We’ll also use a register to store a memory address like a pointer.
Ok, let’s look at some code:
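.equ BUFFER_SIZE, 20                # number of bytes read per chunk
.equ NEW_LINE, 10                   # ASCII code for a newline

.section .data

.section .bss
.lcomm buffer_data, BUFFER_SIZE     # uninitialised buffer of BUFFER_SIZE bytes

.section .text
.globl _start
_start:
read_from_buffer:
    movq $0, %rax                   # system call 0: read
    movq $0, %rdi                   # file descriptor 0: stdin
    movq $buffer_data, %rsi         # address of the buffer to fill
    movq $BUFFER_SIZE, %rdx         # read at most BUFFER_SIZE bytes
    syscall
    movq %rax, %rbx                 # save the byte count (errors and end of input are not handled)
    movq $1, %rax                   # system call 1: write
    movq $1, %rdi                   # file descriptor 1: stdout
    movq $buffer_data, %rsi         # address of the bytes to write
    movq %rbx, %rdx                 # write only the bytes that were read
    syscall
    decq %rbx                       # index of the last byte read
    cmpb $NEW_LINE, buffer_data(,%rbx,1)    # is the last byte a newline?
    je exit                         # yes: the whole line has been echoed
    jmp read_from_buffer            # no: read the next chunk
exit:
    movq $60, %rax                  # system call 60: exit
    movq $0, %rdi                   # exit code 0
    syscall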
The first two lines of this program introduce some new syntax. The .equ directive allows us to define a named constant that the assembler substitutes wherever the name appears. This is much like the #define pre-processor directive in C or C++. Here we define two constants:
.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10
The first, BUFFER_SIZE, is the size of the buffer we will be using, in this case 20 bytes. The second, NEW_LINE, is just the ASCII character code for a newline. Defining constants like this makes our code more readable and maintainable. Next, in the bss section we define a buffer named buffer_data of length BUFFER_SIZE.
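The same idea can be taken a step further by naming the system call numbers and file descriptors as well. The following is only a sketch: the names SYS_READ, SYS_WRITE, SYS_EXIT, STDIN and STDOUT are hypothetical and do not appear in the program above. It reads one chunk, writes it back, and exits, ignoring errors just as the main program does.

```asm
# Hypothetical constant names, chosen for this sketch only
.equ SYS_READ, 0
.equ SYS_WRITE, 1
.equ SYS_EXIT, 60
.equ STDIN, 0
.equ STDOUT, 1
.equ BUFFER_SIZE, 20

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text
.globl _start
_start:
    movq $SYS_READ, %rax        # read(STDIN, buffer_data, BUFFER_SIZE)
    movq $STDIN, %rdi
    movq $buffer_data, %rsi
    movq $BUFFER_SIZE, %rdx
    syscall

    movq %rax, %rdx             # number of bytes read becomes the write length
    movq $SYS_WRITE, %rax       # write(STDOUT, buffer_data, bytes read)
    movq $STDOUT, %rdi
    movq $buffer_data, %rsi
    syscall

    movq $SYS_EXIT, %rax        # exit(0)
    movq $0, %rdi
    syscall
```

Assuming the file is saved as constants.s (a made-up name), it can be assembled with `as constants.s -o constants.o` and linked with `ld constants.o -o constants`. The assembler replaces every name with its value, so the resulting machine code is the same as if the numbers had been written directly.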
Now we have the meat of our program: a loop that starts with the label read_from_buffer. Inside this loop we have a read system call, a write system call, and a conditional jump.
The read system call:
movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall
reads up to BUFFER_SIZE bytes of data from stdin into our buffer buffer_data. When control returns from the read system call, the kernel leaves a return value in the register rax. This value is either the number of bytes that the kernel read or a negative number indicating an error. For now, we ignore the error case. We need rax again for the next system call number, so we move the value in rax into rbx to save it. Then we perform a write system call:
movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall
The only new point here is that we move the value in rbx into rdx. This means we only ask the kernel to write the number of bytes that were actually read. Now, we do a conditional jump:
decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit
We are checking to see if the final character we have read from stdin is a newline. The register rbx contains the number of bytes we have read. So, to get the index of the last byte that we read, we decrement it. Then we use indexed addressing mode, buffer_data(,%rbx,1), to get the value of the last byte that we have read. This tells the CPU to read the value in rbx, count that many bytes past the start of buffer_data, and load the value it finds. We compare this value with the ASCII value for a newline. If the final character was a newline, we jump to the usual exit with code 0. Otherwise, the next instruction is the unconditional jump jmp read_from_buffer, which brings us back to the start of the loop.
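To make the addressing mode more concrete: buffer_data(,%rbx,1) refers to the address buffer_data + rbx * 1. The small stand-alone sketch below performs the newline check twice, once with the indexed form and once by first loading that address into a register with leaq and then dereferencing it like a pointer. The sample data, the label names sample and fail, the constant SAMPLE_LEN and the use of rcx as a scratch register are all assumptions made for this illustration.

```asm
.equ NEW_LINE, 10

.section .data
sample:                             # hypothetical test data: 'h', 'i', newline
    .ascii "hi\n"
.equ SAMPLE_LEN, . - sample         # length worked out by the assembler (3 bytes)

.section .text
.globl _start
_start:
    movq $SAMPLE_LEN, %rbx          # pretend we just read SAMPLE_LEN bytes
    decq %rbx                       # index of the last byte (2)

    # Indexed form: address = sample + rbx * 1
    cmpb $NEW_LINE, sample(,%rbx,1)
    jne fail

    # Pointer form: compute the same address first, then dereference it
    leaq sample(,%rbx,1), %rcx
    cmpb $NEW_LINE, (%rcx)
    jne fail

    movq $0, %rdi                   # both checks found the newline
    jmp exit
fail:
    movq $1, %rdi                   # at least one check did not match
exit:
    movq $60, %rax                  # exit with the code in rdi
    syscall
```

Since the last byte of the sample is a newline, both comparisons should match and the program should exit with status 0. The indexed form saves an instruction, while the pointer form is handy when the same address is needed more than once.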
When we assemble, link and run this code, once it hits the first read system call, the program will wait for the user to type input at the terminal. Suppose the user enters some text and hits enter. The kernel stores this text in the stdin file. In our system call we only asked for 20 bytes, so the kernel copies (at most) 20 bytes into our buffer, and discards them from stdin. The rest of the text that was input persists in stdin. Once we’ve written these bytes to stdout we can go back and read the next chunk from stdin. So the user only has to enter input once, even though our code reads from the stdin file multiple times.
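The loop above only stops when a chunk ends in a newline. If stdin runs out first (for example, when input is piped from a file that does not end in a newline), read returns 0, the index becomes -1, and the comparison looks at a byte outside the buffer. One possible way to guard against this, sketched below, is to leave the loop whenever read returns zero or a negative value. The extra cmpq and jle instructions are an addition for this sketch and are not part of the original program; the exit code is left at 0 in every case for simplicity.

```asm
.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text
.globl _start
_start:
read_from_buffer:
    movq $0, %rax                   # read(stdin, buffer_data, BUFFER_SIZE)
    movq $0, %rdi
    movq $buffer_data, %rsi
    movq $BUFFER_SIZE, %rdx
    syscall

    cmpq $0, %rax                   # added check: 0 = end of input, negative = error
    jle exit                        # nothing was read, so stop looping

    movq %rax, %rbx                 # save the number of bytes read
    movq $1, %rax                   # write(stdout, buffer_data, bytes read)
    movq $1, %rdi
    movq $buffer_data, %rsi
    movq %rbx, %rdx
    syscall

    decq %rbx                       # index of the last byte read
    cmpb $NEW_LINE, buffer_data(,%rbx,1)
    je exit                         # chunk ended in a newline: done
    jmp read_from_buffer            # otherwise fetch the next chunk
exit:
    movq $60, %rax                  # exit(0)
    movq $0, %rdi
    syscall
```

The signed comparison matters here: jle treats rax as a signed number, so a negative error value takes the same exit path as the zero returned at end of input.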
So now we know how to read and write input the proper way!