R E A D. Read the whole page. Read it! READ IT! Don’t even open your code editor until you finish reading! Seriously! And read the grading rubric!

Now you know enough to make your first real, useful program: a command-line PNG file tool.

PNG is a very common image file format these days. PNG files are split into several chunks. You will make a tool that shows a file's basic info, lists all of its chunks, and displays any text embedded in it.


The PNG file format

Many binary file formats, including PNG, are chunked. The file is split into several pieces, and each piece has an identifier and a length.

A very important point:

PNG is a BIG-ENDIAN file format. Thoth, like most computers today, is a little-endian machine. This means that any integer bigger than a byte that you read in must be byte-swapped. This is simple to do and is explained in the project details.
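To make the byte-swapping idea concrete, here is a minimal sketch of a 32-bit swap. The name swap_u32 is made up for illustration; the starting code may already give you a utility that does the same job, so check there first.

```c
#include <stdint.h>

/* Hypothetical helper (not from the starting code): reverse the order
   of the four bytes in a 32-bit unsigned integer. */
uint32_t swap_u32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |   /* lowest byte moves to the top   */
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);    /* highest byte moves to the bottom */
}
```

For instance, the file bytes 00 00 00 0D land in memory as 0x0D000000 when read straight into an integer on a little-endian machine; swapping turns that into 13.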

The overall format

Any PNG file will look like this:

From beginning to end: first is the file signature, then the IHDR chunk, then any number of other chunks, and finally the IEND chunk.

The file signature is a sequence of 8 bytes that uniquely identifies this as a PNG file. It is the following array of bytes (NOT a zero-terminated string!):

137, 'P', 'N', 'G', '\r', '\n', 26, '\n'
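As a rough sketch, the signature can be stored as an array of unsigned char and compared with memcmp. The names PNG_SIGNATURE and is_png_signature are assumptions for illustration, not part of the starting code.

```c
#include <string.h>

/* The 8 signature bytes from the text. unsigned char is used because
   137 does not fit in a signed char. */
static const unsigned char PNG_SIGNATURE[8] = {
    137, 'P', 'N', 'G', '\r', '\n', 26, '\n'
};

/* Hypothetical helper: returns 1 if the 8 bytes in buf match the
   signature, 0 otherwise. */
int is_png_signature(const unsigned char buf[8]) {
    /* memcmp, not strcmp: these are raw bytes, not a zero-terminated string. */
    return memcmp(buf, PNG_SIGNATURE, sizeof PNG_SIGNATURE) == 0;
}
```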

After the file signature comes a sequence of chunks. The IHDR chunk is always the first, and the IEND chunk is always the last.

Chunk format

Every chunk (including IHDR and IEND) has this same layout:

The first 4 bytes are the length, as a BIG-ENDIAN integer. The next 4 bytes are the identifier. Then comes the data, which is as many bytes as the length field said. Finally, the last 4 bytes are a CRC.

The first 4 bytes are the length, as an unsigned BIG-ENDIAN!!!!!!!! integer.

The next 4 bytes are the identifier; this is a sequence of 4 ASCII characters, but not zero-terminated. This is something like IHDR.

Then there are length bytes of data. length can be 0, in which case there are… no data bytes!

Finally, the last 4 bytes are the CRC. We’ll ignore this, but you’ll have to skip over it when you read the file. (The CRC is used to detect file corruption.)

Here is an example chunk, showing the bytes in the order they appear in the file in hexadecimal:

The first 4 bytes are 00 00 00 0D; this is the big-endian integer 13. The next 4 bytes are 49 48 44 52; this is 'IHDR' in ASCII. Then there are 13 bytes of data. Finally, the last 4 bytes are 93 E1 C8 29; we don't care about these.
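To see why those four length bytes mean 13, here is a small sketch that assembles a big-endian integer from individual bytes. This is an alternative to reading the integer and then swapping it; both give the same answer. The function name be32_from_bytes is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: combine 4 bytes, most significant first,
   into one 32-bit unsigned integer. */
uint32_t be32_from_bytes(const unsigned char b[4]) {
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] <<  8) |
            (uint32_t)b[3];
}
```

With the example chunk, b is {0x00, 0x00, 0x00, 0x0D}, so only the last term is nonzero and the result is 0x0D, which is 13.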


Your program and the starting point

Right click this link and download to get the starting code. All I’ve done is implement the boring argument parsing and given you some helpful utility functions.

Also here are some test PNG files: cookiebear.png, graybear.png, and has_text.png.

Read the comments on the little tiny functions at the top of the file! They’re useful!

Compile like so:

gcc -Wall -Werror --std=c99 -g -o readpng readpng.c

You will be able to run it in four different ways:

./readpng
./readpng somefile.png
./readpng somefile.png dump
./readpng somefile.png text

If you try those right now, they’ll say they’re unimplemented and exit. You have to implement the command functions: show_info, dump_chunks, and show_text.


1. Open the file and check the file signature!

From now on, whenever I tell you to do something, make a function for it. You will be graded on coding style. This is not CS 0007. You cannot put everything in main. Don’t try it.

Make a new function. This function needs to take the filename and return a FILE*. It should open the file, read the first 8 bytes, and check that they match the file signature.

Then, call that function in show_info using the filename argument that was given.
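One possible shape for this function is sketched below. The name open_png is made up, and returning NULL on failure is an assumption; printing an error message and exiting would be an equally reasonable way to handle a bad file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical function: open filename and verify the PNG signature.
   Returns a FILE* positioned just past the signature, or NULL if the
   file could not be opened or is not a PNG (an assumed convention). */
FILE *open_png(const char *filename) {
    static const unsigned char expected[8] = {
        137, 'P', 'N', 'G', '\r', '\n', 26, '\n'
    };
    unsigned char actual[8];

    FILE *f = fopen(filename, "rb");   /* "rb": this is a binary file */
    if (f == NULL) {
        return NULL;                   /* could not open the file */
    }
    if (fread(actual, 1, sizeof actual, f) != sizeof actual ||
        memcmp(actual, expected, sizeof expected) != 0) {
        fclose(f);                     /* too short, or wrong signature */
        return NULL;
    }
    return f;
}
```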

Now compile and test it!!

For example:


2. Showing the file info (show_info)

Make a function to read a chunk’s header (the length and identifier fields). Tips:

The first chunk in the file (right after the file signature) should be the IHDR chunk. Since you just used fread to read the file signature, the file position is already in the right place to start reading the chunk, so no fseek is necessary.

In show_info, after opening the file, use that function you just made to read the first chunk.
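Here is a hedged sketch of what a header-reading function could look like. The struct ChunkHeader, the function read_chunk_header, and the local swap_u32 are all hypothetical names, and the code assumes a little-endian machine like Thoth.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct for the first 8 bytes of a chunk. */
typedef struct {
    uint32_t length;
    char     type[4];   /* 4 ASCII characters, NOT zero-terminated */
} ChunkHeader;

/* Hypothetical byte-swap helper (the starting code may provide one). */
static uint32_t swap_u32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >>  8) | ((x & 0xFF000000u) >> 24);
}

/* Reads the length and identifier at the current file position.
   Returns 1 on success, 0 if the file ended early. */
int read_chunk_header(FILE *f, ChunkHeader *out) {
    if (fread(&out->length, sizeof out->length, 1, f) != 1) {
        return 0;
    }
    if (fread(out->type, 1, sizeof out->type, f) != sizeof out->type) {
        return 0;
    }
    /* Assumes a little-endian host: the file stores the length big-endian. */
    out->length = swap_u32(out->length);
    return 1;
}
```

Because the identifier has no zero terminator, print it with a bounded format such as printf("%.4s", header.type) rather than plain %s.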

BEFORE YOU CONTINUE, test that this code works:

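Once you’ve verified that’s working, you can read the actual file info. The 13 data bytes that follow the chunk’s header are, in order: the width (4 bytes), the height (4 bytes), the bit depth (1 byte), the color type (1 byte), the compression method (1 byte), the filter method (1 byte), and the interlace method (1 byte).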

Make a struct for this and make a function to fread a copy of that struct. Don’t forget to byte-swap the width and height after reading!

Think back to the sizeof() exploration you did in lab 2. Use that knowledge to pick appropriate types for each field. Remember, char is not only for text.

Even though you only put 13 bytes of fields in this struct, it will end up as 16 bytes. (Why?) Because of that, when you use fread() to read an instance of this struct from the file, use a constant 13, not sizeof(). Otherwise, your file position will get out of place.
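A sketch of such a struct and its reading function follows. The field order comes from the PNG specification; the names IHDRData, read_ihdr, IHDR_DATA_SIZE, and swap_u32 are assumptions for illustration, and the swap assumes a little-endian host.

```c
#include <stdint.h>
#include <stdio.h>

#define IHDR_DATA_SIZE 13   /* bytes of IHDR data in the file */

/* Hypothetical struct: 13 bytes of fields, but sizeof is typically 16
   because the compiler pads the end to keep the 4-byte fields aligned. */
typedef struct {
    uint32_t width;         /* big-endian in the file */
    uint32_t height;        /* big-endian in the file */
    uint8_t  bit_depth;
    uint8_t  color_type;
    uint8_t  compression;
    uint8_t  filter;
    uint8_t  interlace;
} IHDRData;

/* Hypothetical byte-swap helper (the starting code may provide one). */
static uint32_t swap_u32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >>  8) | ((x & 0xFF000000u) >> 24);
}

/* Reads exactly 13 bytes into *out. Returns 1 on success, 0 on failure. */
int read_ihdr(FILE *f, IHDRData *out) {
    /* A constant 13, not sizeof(IHDRData), so the file position stays right. */
    if (fread(out, IHDR_DATA_SIZE, 1, f) != 1) {
        return 0;
    }
    out->width  = swap_u32(out->width);
    out->height = swap_u32(out->height);
    return 1;
}
```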

Now you can print out the info. Notes:

When you run it as ./readpng cookiebear.png, it should look like:

File info:
  Dimensions: 512 x 443
  Bit depth: 8
  Color type: RGB + Alpha
  Interlaced: no

For graybear.png:

File info:
  Dimensions: 512 x 443
  Bit depth: 8
  Color type: Grayscale + Alpha
  Interlaced: no

And for has_text.png:

File info:
  Dimensions: 32 x 32
  Bit depth: 4
  Color type: Grayscale
  Interlaced: no
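The color type is stored as a number, so printing a name like the ones above needs a lookup. The numeric values below are from the PNG specification. The labels for 0, 4, and 6 match the sample outputs; the labels for 2 and 3, the "Unknown" fallback, and the function name color_type_name are guesses for illustration.

```c
/* Hypothetical helper: map a PNG color type number to a printable name. */
const char *color_type_name(unsigned char color_type) {
    switch (color_type) {
        case 0:  return "Grayscale";
        case 2:  return "RGB";               /* label is an assumption */
        case 3:  return "Indexed";           /* label is an assumption */
        case 4:  return "Grayscale + Alpha";
        case 6:  return "RGB + Alpha";
        default: return "Unknown";           /* not a valid PNG color type */
    }
}
```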

3. Showing all the chunks (dump_chunks)

Now that you have a function to read a chunk header, this one should be straightforward.

dump_chunks should work like this:

  1. open and check the file, like before.
  2. in a loop:
    1. read the chunk header.
    2. print the chunk’s type and length. (remember, %.4s)
    3. if the chunk is an IEND chunk, exit the loop.
    4. otherwise, skip the chunk (read below).

Look at the diagram showing how chunks are laid out. You already read the length and identifier; now you need to skip the data and CRC. This can be done in one line of code.
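One way to do the skip is a relative fseek past the data and the 4-byte CRC. This is a sketch; the function name skip_chunk_body is made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: with the file positioned just after a chunk's
   header, move past its data (length bytes) and its CRC (4 bytes).
   Returns 0 on success, nonzero on failure, like fseek. */
int skip_chunk_body(FILE *f, uint32_t length) {
    return fseek(f, (long)length + 4, SEEK_CUR);
}
```

After this call the file position sits at the start of the next chunk's length field, ready for the next header read.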

Don’t forget: Ctrl+C stops a runaway program!

Done correctly, the outputs on the three test files should be:

This is the beauty of chunked file formats: you don’t even have to know what most of these chunks are! But you can easily see the structure and find the things you do care about.


4. Extracting textual data (show_text)

You can see above that has_text.png has some chunks with the tEXt identifier. These are used to embed human-readable information in the file. Think things like keywords, descriptions, copyright info, and so on.

This mode will be pretty simple:

  1. open and check the file, like before.
  2. in a loop:
    1. read the chunk header.
    2. if it’s an IEND chunk, stop.
    3. if it’s a tEXt chunk, read it and display the name and value.
    4. skip any other chunks.

Each tEXt chunk is a sort of “key-value pair”; a name which says what the text is, and a value which is the actual text. These chunks’ data looks like this:

The first part of the chunk is the name, which is followed by a zero terminator byte. The rest of the chunk is the value, but there is no zero terminator at the end of the value!

The chunk’s length includes the length of the name, the zero terminator in the middle, and the length of the value.

Since these chunks can be any length, you will have to dynamically allocate space to hold the text data for printing. If you use malloc, don’t forget what you have to do when you’re done with that space!
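A minimal sketch of reading one tEXt chunk's data is below. It allocates one extra byte so the value can be zero-terminated for printing. The function name read_text_chunk and its interface are assumptions, not something the starting code defines.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read length bytes of tEXt data at the current
   file position. Returns a malloc'd buffer that starts with the
   zero-terminated name; *value is set to point at the value inside the
   same buffer. Returns NULL on failure. The caller must free() the result. */
char *read_text_chunk(FILE *f, uint32_t length, const char **value) {
    char *buf = malloc((size_t)length + 1);    /* +1 for our own terminator */
    if (buf == NULL) {
        return NULL;
    }
    if (length > 0 && fread(buf, 1, length, f) != length) {
        free(buf);
        return NULL;
    }
    buf[length] = '\0';                        /* the file has no terminator here */

    size_t name_len = strlen(buf);             /* stops at the zero after the name */
    if (name_len < length) {
        *value = buf + name_len + 1;           /* value starts after that zero */
    } else {
        *value = buf + length;                 /* no separator found: empty value */
    }
    return buf;
}
```

The name and value can then be printed with ordinary %s, followed by a single free of the returned buffer. The CRC after the data still has to be skipped separately.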

Notes:

Done correctly, using ./readpng has_text.png text should show something like:

Title:
  PngSuite

Author:
  Willem A.J. van Schaik
(willem@schaik.com)

Copyright:
  Copyright Willem van Schaik, Singapore 1995-96

Description:
  A compilation of a set of images created to test the
various color-types of the PNG format. Included are
black&white, color, paletted, with alpha channel, with
transparency formats. All bit-depths allowed according
to the spec are present.

Software:
  Created on a NeXTstation color using "pnmtopng".

Disclaimer:
  Freeware.

Neither of the other two images has any text.


Extra credit (up to +10 points)

If you do the extra credit, please put a comment at the top of your source code telling the grader that you implemented it.

Since this is a chunked file format, it’s pretty straightforward to add additional chunks into the file. The PNG format is also forgiving about the location of tEXt chunks.

For extra credit, implement a new mode that works like this:

./readpng somefile.png add Name "This is a value for that text"

and this would modify somefile.png by adding a new tEXt chunk where the name is “Name” and the value is “This is a value for that text”.

When you use “quotes” on the command line, the stuff in quotes will be a single item in argv, so don’t worry about having to handle that.
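As a starting point, here is a sketch that builds the bytes of a complete tEXt chunk in memory. It computes a real CRC using the CRC-32 algorithm from the PNG specification, which covers the identifier and the data but not the length. Whether the grader needs a valid CRC is not stated here, so treat that part as an assumption. All function names are hypothetical, and where to write the chunk in the file is left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bit-at-a-time CRC-32 as used by PNG. Start with 0xFFFFFFFF and
   XOR the final result with 0xFFFFFFFF. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc;
}

/* Store v at dst as 4 bytes, most significant first (big-endian). */
static void put_be32(unsigned char *dst, uint32_t v) {
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Hypothetical helper: returns a malloc'd buffer holding
   length + "tEXt" + name + 0 + value + CRC, and stores its size in
   *out_size. Returns NULL if allocation fails. Caller must free(). */
unsigned char *build_text_chunk(const char *name, const char *value,
                                size_t *out_size) {
    size_t name_len  = strlen(name);
    size_t value_len = strlen(value);
    size_t data_len  = name_len + 1 + value_len;   /* +1: zero after the name */

    unsigned char *chunk = malloc(12 + data_len);  /* 4 + 4 + data + 4 */
    if (chunk == NULL) {
        return NULL;
    }
    put_be32(chunk, (uint32_t)data_len);
    memcpy(chunk + 4, "tEXt", 4);
    memcpy(chunk + 8, name, name_len + 1);              /* copies the zero too */
    memcpy(chunk + 8 + name_len + 1, value, value_len); /* no terminator */

    uint32_t crc = crc32_update(0xFFFFFFFFu, chunk + 4, 4 + data_len)
                   ^ 0xFFFFFFFFu;
    put_be32(chunk + 8 + data_len, crc);

    *out_size = 12 + data_len;
    return chunk;
}
```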

Notes:


Grading

We will be compiling your programs with the following options, so be sure to compile with them while you develop as well:

$ gcc -Wall -Werror --std=c99 -g -o readpng readpng.c

Submission

Name your file with proj1, like abc123_proj1.tar.gz. proj1. Not project1. Not readpng. Not proj01. proj1. proj1. proj1. proj1. proj1. proj1. proj1. proj1. proj1. proj1. proj1. proj1.

Submit ONLY YOUR readpng.c FILE INSIDE A TAR FILE NAMED abc123_proj1.tar.gz.

Please don’t include the PNG files in your submission. They’ll waste a bunch of AFS space.

You can make a new directory and copy your readpng.c file into there, and then tar that directory.

Now you can submit using the same directions as lab1, but replace lab1 with proj1.