Most material here is courtesy of Kevin Yank, a former HCI TA.
The project notebooks are archived by wget on the due date of each deliverable. This program makes a few assumptions about the websites it is asked to archive. This document outlines the guidelines you must follow so that your notebook is archived properly and is readable by those assessing your work. You are responsible for following these guidelines! If wget fails to archive your notebook because you did not take these instructions into account, the assessment of your work will be affected.
For those curious, the exact syntax used by wget is:
wget -b -i <urls> -t 3 -P <target location absolute path where notebooks are archived> -p -k -r -l 31 -np --user=hciagent --password=<web-agent-password>
All teams are strongly encouraged to put up a sample notebook early so that they can see the results of the test grabs.
For the instructor to set up wget to retrieve the web notebooks, each team will have to provide a URL (address) for the front page of their notebook.
Say your ECE user ID is kyank (as mine happens to be). That means your ECE Web site is accessible from http://www.ece.mcgill.ca/~kyank/.
Now, let's say you created a subdirectory in your account's www directory called hci for your team's web notebook. Web pages stored in that directory would be available from http://www.ece.mcgill.ca/~kyank/hci/filename.html.
By convention, you should name the front page of your notebook index.html, which will let people view the front page just by typing http://www.ece.mcgill.ca/~kyank/hci/. You would then provide this address to the instructor as the front page of your notebook.
Continuing from the previous section, all of the files that make up your web notebook should then be stored within that hci directory, or in subdirectories thereof. wget will not archive any files in directories outside the directory tree rooted at your front page.
For example, if you wanted to provide images of your team members and you happened to have a portrait of yourself called myportrait.jpg on your ECE Web site, you might think to reuse that copy of the file and just refer to it as follows:
<img src="../myportrait.jpg" width=50 height=100 border=0 alt="picture of me">
Since myportrait.jpg is not within the hci directory tree, it will not be archived by wget and will be missing from the copy of your notebook that is marked. Instead, you should copy myportrait.jpg into the hci directory and refer to that copy in your <img> tag:
<img src="myportrait.jpg" width=50 height=100 border=0 alt="picture of me">
Alternatively, you could create a subdirectory of hci called images, and put the image file there:
<img src="images/myportrait.jpg" width=50 height=100 border=0 alt="picture of me">
Since myportrait.jpg is in a subdirectory of hci, wget will archive it properly.
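If you would like to check a reference before the archive is taken, the rule above can be sketched in a few lines of Python. This is only an illustration and not part of the archiving setup: the function name escapes_notebook_root is made up for this example, and it assumes that page paths are written relative to the directory holding your front page.

```python
import posixpath


def escapes_notebook_root(page_path: str, reference: str) -> bool:
    """Return True if a relative reference points outside the notebook root.

    page_path -- location of the page containing the reference, relative to
                 the directory that holds the front page (e.g. "index.html").
    reference -- the relative src or href value found in that page.
    (Both parameter names are hypothetical, chosen for this sketch.)
    """
    page_dir = posixpath.dirname(page_path)
    resolved = posixpath.normpath(posixpath.join(page_dir, reference))
    # After normalisation, a path that still begins with ".." has climbed
    # above the directory that holds the front page.
    return resolved == ".." or resolved.startswith("../")


print(escapes_notebook_root("index.html", "../myportrait.jpg"))      # True
print(escapes_notebook_root("index.html", "images/myportrait.jpg"))  # False
```

A reference that climbs out of a subdirectory but stays inside the notebook, such as ../images/a.gif from a page in reports/, is reported as safe.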
One of the first things you'll learn if you're new to HTML is how to create hyperlinks. Basically, you surround the text or image you want users to click on with an <a> tag containing an href attribute, like this:
<a href="report1.html"> Click here to see our first report. </a>
In this example, clicking on the text will load the page report1.html into the user's browser. You can also link to files stored in directories:
<a href="path/to/report2.html"> Click here to see our second report. </a>
In this example, report2.html is stored in a directory called to that is a subdirectory of path, which in turn is a subdirectory of the current directory (where the current document is stored). Note that to separate directory names in the path, UNIX-style slashes (/) are used as opposed to Windows-style backslashes (\). A common mistake is to use backslashes instead of slashes, which makes for invalid links that wget may not be able to follow.
Another thing to notice is that none of the file or directory names contain spaces. This is no coincidence. Spaces in file/directory names can also foul up wget, so avoid them at all costs. There are ways around this limitation, but they aren't worth the trouble.
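As a rough illustration of these two naming rules, the Python sketch below flags backslashes and spaces in a link target. The function name link_name_problems and its messages are invented for this example. It also treats an encoded space (%20) as a space, which is an assumption on my part and not something wget requires.

```python
from typing import List


def link_name_problems(href: str) -> List[str]:
    """List the naming problems found in a single href or src value."""
    problems = []
    if "\\" in href:
        # Windows-style separators make for invalid links.
        problems.append("backslash used as a path separator")
    if " " in href or "%20" in href:
        # Assumption: an encoded space is reported as well.
        problems.append("space in a file or directory name")
    return problems


print(link_name_problems("path/to/report2.html"))    # []
print(link_name_problems("path\\to\\report2.html"))  # one problem reported
```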
Both of the links in the previous section belong to a class of links called relative links. These are links that use the current directory to determine the location of the file being linked to. In the first example (href="report1.html"), report1.html is to be found in the same directory as the current document, since no path information is provided. In the second example (href="path/to/report2.html"), the directory path (which contains the directory to, which in turn contains report2.html) is also in the same directory as the current document.
If a link isn't relative, it is said to be an absolute link. Absolute links provide either the full path to a file (e.g. href="/~kyank/hci/report1.html"), or provide the full Web address (e.g. href="http://www.ece.mcgill.ca/~kyank/hci/path/to/report2.html"). Both of these types of absolute links will be archived correctly by wget, but absolute links are not acceptable for use in your web notebook!
In the first type of absolute link (complete path), the link will not work in the archived version of your site because the ~kyank and hci directories will not exist in the archived version of your site. Only the subdirectories of hci will be recreated.
In the second type of absolute link (complete URL), the link in the archived version will point to the 'live' version of the file on your site instead of the archived version. This is bad for two reasons:
The upshot of all this is that you should use only relative links in your web notebook when linking to anything that will be marked. You may, of course, use absolute (complete URL) links to link to external references and resources related to your project. The guidelines in this section apply equally well to <img> tags, and to any other tag that requires the browser to load another file.
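To make the three kinds of link concrete, here is a minimal Python sketch that sorts an href into one of them using the standard urllib.parse module. The function name classify_link and the three labels it returns are made up for this illustration.

```python
from urllib.parse import urlsplit


def classify_link(href: str) -> str:
    """Classify an href as 'relative', 'absolute-path' or 'absolute-url'."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        # A scheme (http:) or host name means a complete Web address.
        return "absolute-url"
    if parts.path.startswith("/"):
        # A leading slash means a complete path from the server root.
        return "absolute-path"
    return "relative"


for href in (
    "report1.html",
    "path/to/report2.html",
    "/~kyank/hci/report1.html",
    "http://www.ece.mcgill.ca/~kyank/hci/path/to/report2.html",
):
    print(href, "->", classify_link(href))
```

Only links reported as relative are suitable for material that will be marked. A complete URL is still fine for external references.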
When archiving the web notebooks, we will have wget set to archive to a link depth of 10. That is, if a user starting from the front page must click on more than 10 links to get to a particular page of your site, that page will not be archived by wget. If for some special reason your web notebook must use a link depth greater than 10 (for example, if your design prototype is Web-based and it exhibits a high link depth), contact the instructor well in advance so that we can make the necessary adjustments to wget.
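If you are unsure how deep your notebook goes, link depth can be estimated with a breadth-first search over a map of which page links to which. The Python sketch below assumes you have written that map out by hand as a dictionary. The names click_depths, pages_beyond and max_depth are hypothetical, and the depth counted here is the number of clicks from the front page, as described above.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], front_page: str) -> Dict[str, int]:
    """Return the fewest clicks needed to reach each page from front_page."""
    depths = {front_page: 0}
    queue = deque([front_page])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_beyond(links: Dict[str, List[str]], front_page: str, max_depth: int) -> List[str]:
    """List the reachable pages that need more than max_depth clicks."""
    depths = click_depths(links, front_page)
    return sorted(page for page, depth in depths.items() if depth > max_depth)


site = {
    "index.html": ["a.html"],
    "a.html": ["b.html", "index.html"],
    "b.html": ["c.html"],
}
print(click_depths(site, "index.html"))
print(pages_beyond(site, "index.html", 2))  # ['c.html']
```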
If you are already familiar with the basics of Web design, you may be thinking of spicing up your document with JavaScript, Cascading Style Sheets (CSS), Flash, or even a Java applet or two. In general, wget will accommodate most of these, with a few constraints. If you have no idea what any of these are, or if you have no intention of using them in your web notebook, you can safely skip this section.
JavaScript itself is immaterial to the archivability of your site. As far as wget is concerned, JavaScript is just another part of the Web page it is retrieving. The trouble comes when you use JavaScript to load other files. You need to do this in a way that wget will recognize so that it can download those other files for archiving as well.
wget will find external JavaScript files included using <script> tags without difficulty. Here's an example:
<script language="JavaScript" src="myscript.js"></script>
In this case, the myscript.js file will be downloaded and archived correctly.
A common use of JavaScript is to produce pop-up windows using the onClick event handler of the <a> tag. Here's an example:
<a href="javascript:void(0);" onClick="var popupwin = window.open('file.html','popupwin','width=600,height=440'); popupwin.focus();"> link </a>
wget can't interpret JavaScript; as a result, file.html would not be archived in this example. To correct this problem, you need to put file.html in the href attribute of the link. By making the onClick event handler return false, you can tell the Web browser to ignore the href attribute. Here's an example that is functionally identical, but which wget can archive properly:
<a href="file.html" onClick="var popupwin = window.open('file.html','popupwin','width=600,height=440'); popupwin.focus(); return false;"> link </a>
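To spot pop-up links that wget would miss, you could scan a page for <a> tags whose onClick handler opens a file that the href does not name. The following Python sketch does this with the standard html.parser module. The class and function names are invented for this example, and the regular expression only recognises a quoted first argument to window.open, so treat it as a starting point and not a complete checker.

```python
import re
from html.parser import HTMLParser
from typing import List

# Matches the first (quoted) argument of window.open(...).
POPUP_TARGET = re.compile(r"window\.open\(\s*['\"]([^'\"]+)['\"]")


class PopupLinkChecker(HTMLParser):
    """Collects pop-up targets that do not also appear in the link's href."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_targets: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # html.parser lower-cases attribute names, so onClick becomes onclick.
        attributes = dict(attrs)
        match = POPUP_TARGET.search(attributes.get("onclick") or "")
        if match and attributes.get("href") != match.group(1):
            self.hidden_targets.append(match.group(1))


def hidden_popup_targets(html: str) -> List[str]:
    """Return the pop-up files that wget could not discover from href alone."""
    checker = PopupLinkChecker()
    checker.feed(html)
    return checker.hidden_targets


page = '''<a href="javascript:void(0);" onClick="window.open('file.html','popupwin');"> link </a>'''
print(hidden_popup_targets(page))  # ['file.html']
```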
Another common use of JavaScript is to swap images in response to mouse events. Such effects are commonly referred to as mouseovers. Mouseovers rely on JavaScript swapping one image for another by changing the src attribute of an <img> tag on the fly. Since wget can't interpret JavaScript, it will usually not archive any image file that only appears in JavaScript code.
To force wget to download the required files, you can use 'dummy links' to point out the files that need to be downloaded. Dummy links are just <a> tags with nothing inside them. Here's an example of a link that won't appear anywhere on the Web page, but which wget will follow and archive:
<a href="image1.gif"></a>
CSS, like JavaScript, is largely transparent to wget. There are, however, a couple of issues that deserve mention.
First of all, external Style Sheet files may be linked to HTML documents using the <link> tag as follows:
<link rel="stylesheet" type="text/css" href="styles.css">
wget will successfully detect such links and archive the Style Sheet files properly.
CSS may be used to specify background images for HTML elements, as in the following example:
BODY {
background-image: url(watermark.jpg);
}
Since wget cannot interpret CSS, the image file in this example (watermark.jpg) would not normally be archived. For such images, you'll need to use the 'dummy link' method described in the previous section:
<a href="watermark.jpg"></a>
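If your Style Sheet refers to many images, writing the dummy links by hand gets tedious. As one possible shortcut, the Python sketch below pulls every url(...) value out of a piece of CSS and prints a dummy link for each. The function name dummy_links_for_css is made up for this example, and the regular expression is a simplification that handles plain and quoted url() values only.

```python
import re
from typing import List

# Matches url(file), url('file') and url("file").
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


def dummy_links_for_css(css: str) -> List[str]:
    """Return one empty <a> tag per distinct file referenced by url(...)."""
    targets = []
    for target in CSS_URL.findall(css):
        if target not in targets:  # keep first occurrence, drop repeats
            targets.append(target)
    return ['<a href="{}"></a>'.format(target) for target in targets]


css = """
BODY {
  background-image: url(watermark.jpg);
}
"""
for link in dummy_links_for_css(css):
    print(link)  # <a href="watermark.jpg"></a>
```

The printed tags can then be pasted into any page of your notebook that wget will reach.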
Flash is an authoring tool used to create animated and interactive elements for display on Web pages. These elements are called Flash movies. For an example of a web notebook that uses Flash for its main menu, see the PayBuddy Site.
Since the Flash authoring tool uses an HTML <embed> tag to place a Flash movie in a Web page, wget can read this tag and download the Flash movie file for archiving. Unfortunately, wget cannot make any sense of the contents of a Flash movie, so any files that you link to within your Flash movie will not be downloaded by wget. This is a particularly serious problem if you intend to use a Flash movie as the main menu system of your Web page.
To ensure that the files you link to in your Flash movies are downloaded, you are encouraged to provide regular text links as an alternative navigation system. Alternatively, you can use the 'dummy link' method introduced in the JavaScript section above.
Java applets are especially problematic for archiving, since they use the HTML <applet> and <param> tags, which wget may not interpret, and often use additional Java class files internally, which wget has no way of knowing about.
If you plan to use Java applets in your web notebook, you will most likely have to use the 'dummy link' method described in the JavaScript section above to identify all the files that wget needs to download. Later in the term in particular, when Java applets may be used to demonstrate design prototypes, project groups that wish to use applets on their site are encouraged to discuss this with the instructor well in advance so we can run some tests to ensure that they archive properly.
Last updated on 28 August 2016
by Jeremy Cooperstock