Web Notebook Guidelines

Most material here is courtesy of Kevin Yank, a former HCI TA.

Introduction
Provide a Front Page
Keep Files Inside the Notebook Directory
Use Valid Links
Use Relative Links
Link Depth
Advanced Considerations

Introduction

The project notebooks are archived by wget on the due date of each deliverable. This program makes a few assumptions about the websites it is asked to archive. In this document I will outline a few guidelines you must follow to make sure that your notebook is archived properly and is readable by those who are assessing your work. You are responsible for following these guidelines! If wget fails to archive your notebook because you didn't take the instructions in this document into account, this will affect the assessment of your work.

For those curious, the exact syntax used by wget is:

wget -b -i <urls> -t 3 -P <target location absolute path where notebooks are archived> -p -k -r -l 31 -np --user=hciagent --password=<web-agent-password>
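For readers unfamiliar with wget, the following Python sketch assembles the same command as an argument list so that each option can be annotated. The values passed in at the bottom (urls.txt, /path/to/archive, and the password placeholder) are assumptions standing in for the elided values in the command; only the flags themselves come from it.

```python
# Sketch only: builds the archiving command as a list so each flag can be
# commented. The file name, directory, and password below are hypothetical.
def build_wget_command(url_file, target_dir, password):
    return [
        "wget",
        "-b",               # run in the background
        "-i", url_file,     # read the front-page URLs from this file
        "-t", "3",          # try each download up to 3 times
        "-P", target_dir,   # save everything under this directory
        "-p",               # also fetch page requisites (images, style sheets)
        "-k",               # convert links so the copy can be viewed locally
        "-r",               # follow links recursively
        "-l", "31",         # maximum recursion depth, as given in the command
        "-np",              # never ascend above the starting directory
        "--user=hciagent",
        "--password=" + password,
    ]

command = build_wget_command("urls.txt", "/path/to/archive", "<web-agent-password>")
print(" ".join(command))
```

The -np (no parent) option is the reason files stored outside the notebook directory are not archived.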

All teams are strongly encouraged to put up a sample notebook early so that they can see the results of the test grabs.

Provide a Front Page

For the instructor to set up wget to retrieve the web notebooks, each team will have to provide a URL (address) for the front page of their notebook.

Say your ECE user ID is kyank (as mine happens to be). That means your ECE Web site is accessible from http://www.ece.mcgill.ca/~kyank/. Now, let's say you created a subdirectory in your account's www directory called hci for your team's web notebook. Web pages stored in that directory would be available from http://www.ece.mcgill.ca/~kyank/hci/filename.html. By convention, you should name the front page of your notebook index.html, which will let people view the front page just by typing http://www.ece.mcgill.ca/~kyank/hci/. You would then provide this address to the instructor as the front page of your notebook.

Keep Files Inside the Notebook Directory

Continuing from the previous section, all of the files that make up your web notebook should then be stored within that hci directory, or in subdirectories thereof. wget will not archive any files in directories outside the directory tree rooted at your front page.

For example, if you wanted to provide images of your team members and you happened to have a portrait of yourself called myportrait.jpg on your ECE Web site, you might think to reuse that copy of the file and just refer to it as follows:

<img src="../myportrait.jpg" width=50 height=100 border=0 alt="picture of me">

Since myportrait.jpg is not within the hci directory tree, it will not be archived by wget and will be missing from the copy of your notebook that is marked. Instead, you should copy myportrait.jpg into the hci directory and refer to that copy in your <img> tag:

<img src="myportrait.jpg" width=50 height=100 border=0 alt="picture of me">

Alternatively, you could create a subdirectory of hci called images, and put the image file there:

<img src="images/myportrait.jpg" width=50 height=100 border=0 alt="picture of me">

Since myportrait.jpg is in a subdirectory of hci, wget will archive it properly.
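One rough way to check this rule before the deadline is to resolve each link the way a browser would and see whether the result still lies under the notebook directory. The sketch below does that for the three <img> examples above; the function name is_inside_notebook is hypothetical, and the check is only an approximation of what wget does.

```python
from urllib.parse import urljoin

def is_inside_notebook(front_page_url, page_url, href):
    """Return True if href, as written on page_url, resolves inside the notebook tree."""
    # The notebook root is the directory that holds the front page.
    root = front_page_url.rsplit("/", 1)[0] + "/"
    # Resolve the link relative to the page it appears on.
    target = urljoin(page_url, href)
    return target.startswith(root)

front = "http://www.ece.mcgill.ca/~kyank/hci/"
page = front + "index.html"
for href in ["../myportrait.jpg", "myportrait.jpg", "images/myportrait.jpg"]:
    print(href, is_inside_notebook(front, page, href))
```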

Use Valid Links

One of the first things you'll learn if you're new to HTML is how to create hyperlinks. Basically, you surround the text or image you want users to click on with an <a> tag containing an href attribute, like this:

<a href="report1.html"> Click here to see our first report. </a>

In this example, clicking on the text will load the page report1.html into the user's browser. You can also link to files stored in directories:

<a href="path/to/report2.html"> Click here to see our second report. </a>

In this example, report2.html is stored in a directory called to that is a subdirectory of path, which in turn is a subdirectory of the current directory (where the current document is stored). Note that to separate directory names in the path, UNIX-style slashes (/) are used as opposed to Windows-style backslashes (\). A common mistake is to use backslashes instead of slashes, which makes for invalid links that wget may not be able to follow.

Another thing to notice is that none of the file or directory names contain spaces. This is no coincidence. Spaces in file/directory names can also foul up wget, so avoid them at all costs. There are ways around this limitation, but they aren't worth the trouble.
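The two pitfalls described in this section, backslashes and spaces, are easy to test for mechanically. The following is a minimal sketch; the function name link_problems and the second sample link are made up for illustration.

```python
def link_problems(href):
    """Return a list of the problems from this section found in one link target."""
    problems = []
    if "\\" in href:
        problems.append("uses backslashes instead of slashes")
    if " " in href:
        problems.append("contains spaces")
    return problems

for href in ["path/to/report2.html", "path\\to\\report 2.html"]:
    print(repr(href), link_problems(href) or "looks fine")
```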

Use Relative Links

Both of the links in the previous section belong to a class of links called relative links. These are links that use the current directory to determine the location of the file being linked to. In the first example (href="report1.html"), report1.html is to be found in the same directory as the current document, since no path information is provided. In the second example (href="path/to/report2.html"), the directory named path (which contains the directory named to, which in turn contains report2.html) is also in the same directory as the current document.

If a link isn't relative, it is said to be an absolute link. Absolute links provide either the full path to a file (e.g. href="/~kyank/hci/report1.html"), or provide the full Web address (e.g. href="http://www.ece.mcgill.ca/~kyank/hci/path/to/report2.html"). Both of these types of absolute links will be archived correctly by wget, but absolute links are not acceptable for use in your web notebook!

In the first type of absolute link (complete path), the link will not work in the archived version of your site because the ~kyank and hci directories will not exist there. Only the subdirectories of hci will be recreated.

In the second type of absolute link (complete URL), the link in the archived version will point to the 'live' version of the file on your site instead of the archived version. This is bad for two reasons:

If the marker is not connected to the Internet when reviewing the archived web notebook, this link will fail.
If you make changes to the live version of the document after the submission deadline, the marker might assume you are trying to cheat. Rest assured, we will be watching for this.

The upshot of all this is that you should only use relative links in your web notebooks if you are linking to something that will be marked. You may, of course, use absolute (complete URL) links to link to external references and resources related to your project. The guidelines in this section apply equally well to <img> tags, and any other tag that requires the browser to load another file.
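To make the three kinds of link concrete, the sketch below sorts a link target into "relative", "absolute (complete path)", or "absolute (complete URL)" using Python's standard URL parser. The function name classify_link is hypothetical, and the labels simply mirror the wording of this section.

```python
from urllib.parse import urlparse

def classify_link(href):
    """Classify a link target using the terminology of this section."""
    parts = urlparse(href)
    if parts.scheme or parts.netloc:
        # e.g. http://www.ece.mcgill.ca/...
        return "absolute (complete URL)"
    if parts.path.startswith("/"):
        # e.g. /~kyank/hci/report1.html
        return "absolute (complete path)"
    return "relative"

for href in [
    "report1.html",
    "path/to/report2.html",
    "/~kyank/hci/report1.html",
    "http://www.ece.mcgill.ca/~kyank/hci/path/to/report2.html",
]:
    print(classify_link(href), "<-", href)
```

Only links reported as relative are suitable for material that will be marked.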

Link Depth

When archiving the web notebooks, we will have wget set to archive to a link depth of 10. That is, if a user starting from the front page must click on more than 10 links to get to a particular page of your site, that page will not be archived by wget. If for some special reason your web notebook must use a link depth greater than 10 (for example, if your design prototype is Web-based and it exhibits a high link depth), contact the instructor well in advance so that we can make the necessary adjustments to wget.
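Link depth can be thought of as the smallest number of clicks needed to reach a page from the front page. The sketch below computes it with a breadth-first search over a hand-written table of links and reports the pages that fall beyond the limit of 10 stated above. The page names and the links table are invented for the example.

```python
from collections import deque

def link_depths(links, front_page):
    """Map each reachable page to the fewest clicks needed from the front page."""
    depths = {front_page: 0}
    queue = deque([front_page])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def too_deep(links, front_page, max_depth=10):
    """Return the pages that lie more than max_depth clicks from the front page."""
    depths = link_depths(links, front_page)
    return sorted(page for page, depth in depths.items() if depth > max_depth)

# Hypothetical notebook: a chain index.html -> step1.html -> ... -> step12.html
pages = ["index.html"] + ["step{}.html".format(i) for i in range(1, 13)]
links = {pages[i]: [pages[i + 1]] for i in range(len(pages) - 1)}
print(too_deep(links, "index.html"))
```

Adding a direct link from the front page to a deep page reduces that page's depth to 1, which is usually simpler than requesting a higher limit.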

Advanced Considerations

If you are already familiar with the basics of Web design, you may be thinking of spicing up your document with JavaScript, Cascading Style Sheets (CSS), Flash, or even a Java Applet or two. In general, wget will accommodate most of these with a few constraints. If you have no idea what any of these are, or if you have no intention of using them in your web notebook, you can safely skip this section.

JavaScript

JavaScript itself is immaterial to the archivability of your site. As far as wget is concerned, JavaScript is just another part of the Web page it is retrieving. The trouble comes when you use JavaScript to load other files. You need to do this in a way that wget will recognize so that it can download those other files for archiving as well.

wget will find external JavaScript files included using <script> tags without difficulty. Here's an example:

<script language="JavaScript" src="myscript.js"></script>

In this case, the myscript.js file will be downloaded and archived correctly.

A common use of JavaScript is to produce pop-up windows using the onClick event handler of the <a> tag. Here's an example:

<a href="javascript:void(0);" onClick="var w = window.open('file.html','popupwin','width=600,height=440'); if (w) w.focus();"> link </a>

wget can't interpret JavaScript; as a result, file.html would not be archived in this example. To correct this problem, you need to put file.html in the href attribute of the link. By making the onClick event handler return false, you can tell the Web browser to ignore the href attribute. Here's an example that is functionally identical, but which wget can archive properly:

<a href="file.html" onClick="var w = window.open('file.html','popupwin','width=600,height=440'); if (w) w.focus(); return false;"> link </a>

Another common use of JavaScript is to swap images in response to mouse events. Such effects are commonly referred to as mouseovers. Mouseovers rely on JavaScript swapping one image for another by changing the src attribute of an <img> tag on the fly. Since wget can't interpret JavaScript, it will usually not archive any image file that only appears in JavaScript code.

To force wget to download the required files, you can use 'dummy links' to point out the files that need to be downloaded. Dummy links are just <a> tags with nothing inside them. Here's an example of a link that won't appear anywhere on the Web page, but which wget will follow and archive:

<a href="image1.gif"></a>
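To see why the href attribute matters, it can help to look at a page the way a program that ignores JavaScript would. The sketch below uses Python's standard HTML parser to list only the URLs found in plain href and src attributes. It is an approximation written for illustration, not wget's actual parser, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class CrawlerView(HTMLParser):
    """Collects URLs that appear in href/src attributes, ignoring all scripts."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # A javascript: pseudo-URL names no file to download.
                if not value.lower().startswith("javascript:"):
                    self.urls.append(value)

def visible_urls(html):
    parser = CrawlerView()
    parser.feed(html)
    return parser.urls

hidden = """<a href="javascript:void(0);" onClick="window.open('file.html','popupwin');"> link </a>"""
fixed = """<a href="file.html" onClick="window.open('file.html','popupwin'); return false;"> link </a>"""
dummy = """<a href="image1.gif"></a>"""

print(visible_urls(hidden))
print(visible_urls(fixed))
print(visible_urls(dummy))
```

Any file your scripts load that does not show up in such a listing is a candidate for a dummy link.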

Cascading Style Sheets (CSS)

CSS, like JavaScript, is largely transparent to wget. There are, however, a couple of issues that deserve mention.

First of all, external Style Sheet files may be linked to HTML documents using the <link> tag as follows:

<link rel="stylesheet" type="text/css" href="styles.css">

wget will successfully detect such links and archive the Style Sheet files properly.

CSS may be used to specify background images for HTML elements, as in the following example:

BODY { background-image: url(watermark.jpg); }

Since wget cannot interpret CSS, the image file in this example (watermark.jpg) would not normally be archived. For such images, you'll need to use the 'dummy link' method described in the previous section:

<a href="watermark.jpg"></a>
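If a style sheet refers to many images, the dummy links can be generated rather than typed by hand. The sketch below pulls the file names out of url(...) expressions with a regular expression and prints one dummy link per file. The pattern covers only the simple forms shown here, and the function names are hypothetical.

```python
import re

# Matches url(name), url('name') and url("name"); a simplification of CSS syntax.
URL_PATTERN = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)

def css_urls(css_text):
    """Return the file names referenced through url(...) in a style sheet."""
    return [name.strip() for name in URL_PATTERN.findall(css_text)]

def dummy_links(filenames):
    """Build one empty <a> tag per file, in the style used in this document."""
    return "\n".join('<a href="{}"></a>'.format(name) for name in filenames)

css = "BODY { background-image: url(watermark.jpg); }"
print(dummy_links(css_urls(css)))
```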

Flash

Flash is an authoring tool used to create animated and interactive elements for display on Web pages. These elements are called Flash movies. For an example of a web notebook that uses Flash for its main menu, see the PayBuddy Site.

Since the Flash authoring tool uses an HTML <embed> tag to place a Flash movie in a Web page, wget can read this tag and download the Flash movie file for archiving. Unfortunately, wget cannot make any sense of the contents of a Flash movie, so any files that you link to within your Flash movie will not be downloaded by wget. This is a particularly serious problem if you intend to use a Flash movie as the main menu system of your Web page.

To ensure that the files you link to in your Flash movies are downloaded, you are encouraged to provide regular text links as an alternative navigation system. Alternatively, you can use the 'dummy link' method introduced in the JavaScript section above.

Java Applets

Java applets are especially problematic for archiving, since they use the HTML <applet> and <param> tags, which wget may not interpret, and often use additional Java class files internally, which wget has no way of knowing about.

If you plan to make use of Java applets in your web notebook, you will most likely have to make use of the 'dummy link' method described in the JavaScript section above to identify all the files that wget needs to download. Later in the term, Java applets may be used to demonstrate design prototypes. Project groups that wish to make use of applets on their site are encouraged to discuss this with the instructor well in advance so we can run some tests to ensure that they archive properly.
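Because an applet may depend on many class files, one possible approach is to scan the notebook directory and generate a dummy link for every applet-related file found. The sketch below does this for files ending in .class or .jar. The extension list, the function name, and the sample files are assumptions, and the demonstration runs on a temporary directory rather than a real notebook.

```python
import os
import tempfile

def applet_dummy_links(notebook_dir, extensions=(".class", ".jar")):
    """Return dummy links, relative to notebook_dir, for applet-related files."""
    links = []
    for folder, _subdirs, files in os.walk(notebook_dir):
        for name in files:
            if name.lower().endswith(extensions):
                relative = os.path.relpath(os.path.join(folder, name), notebook_dir)
                # Links must use forward slashes, whatever the local platform uses.
                links.append('<a href="{}"></a>'.format(relative.replace(os.sep, "/")))
    return sorted(links)

# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as notebook:
    os.mkdir(os.path.join(notebook, "applet"))
    for name in ("index.html", "Demo.class", os.path.join("applet", "Helper.class")):
        open(os.path.join(notebook, name), "w").close()
    links = applet_dummy_links(notebook)

print("\n".join(links))
```

The generated lines can then be pasted into a page of the notebook that is reachable from the front page.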

Last updated on 28 August 2016
by Jeremy Cooperstock