Thursday, November 13, 2008

Week 10 Comments

Comment #1:
I commented on Jen’s “Intro to Information Technology” page:
https://www.blogger.com/comment.g?blogID=1475137707322366107&postID=7838913334025505790&page=1

Comment #2:
I commented on Megan’s “The Alley View” page:
https://www.blogger.com/comment.g?blogID=1139180432200060758&postID=5904996233988888078

Reading Response #10: Harvest Time

Of the readings assigned this week, I found “Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting” to be the most interesting because it peripherally touches upon the relationship between metadata harvesting and “The Deep Web.”

Typically, harvesting culls information from Web pages using the metadata tags embedded in the HTML code. Initially, there was some dispute over whether the work should be done by software or by hand: the concern was that human catalogers would create too many disparate terms, while a purely automated procedure would miss important semantic relationships. With Dublin Core rapidly becoming the standard metadata schema, Web pages increasingly adhere to a common format. But does this improvement in search methods extend to the “Deep Web”?
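
To make this concrete for myself, here is a rough sketch in Python of what tag-based harvesting looks like: it pulls Dublin Core <meta> elements out of an HTML page. The sample page is made up, but the DC.* naming convention is the real Dublin Core one.

    # A minimal harvesting sketch using only the standard library.
    from html.parser import HTMLParser

    class DublinCoreHarvester(HTMLParser):
        """Collects <meta name="DC.xxx" content="..."> pairs from a page."""
        def __init__(self):
            super().__init__()
            self.records = {}

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            name = attrs.get("name", "") or ""
            if name.startswith("DC."):  # Dublin Core elements by convention
                self.records[name] = attrs.get("content", "")

    sample_page = """
    <html><head>
      <meta name="DC.title" content="Current Developments for OAI-PMH">
      <meta name="DC.creator" content="Example Author">
    </head><body>...</body></html>
    """

    harvester = DublinCoreHarvester()
    harvester.feed(sample_page)
    print(harvester.records)
    # {'DC.title': 'Current Developments for OAI-PMH', 'DC.creator': 'Example Author'}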

The “Deep Web” includes information that is available to the public but lies outside the scope of traditional search engines, because the data is stored in proprietary databases that are accessible only through direct queries. However, the OAI Protocol allows search engines that lack normal access to this information to index pages hosted on the “Deep Web” through OAI repositories. Since a significant portion of digital information resides on the “Deep Web,” this new avenue of accessibility is important: it helps promote open and transparent policies regarding public information.
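
What struck me is how simple the protocol itself is. A harvester just sends an HTTP request with a “verb” parameter and gets XML records back; the verb and metadataPrefix below are the real OAI-PMH ones, though the repository URL is a placeholder I invented.

    # A minimal OAI-PMH harvest request (the base URL is hypothetical).
    import urllib.parse
    import urllib.request

    base_url = "https://repository.example.edu/oai"
    params = {
        "verb": "ListRecords",       # ask for full records
        "metadataPrefix": "oai_dc",  # in unqualified Dublin Core
    }
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        xml = response.read().decode("utf-8")
    print(xml[:500])  # the response is an XML document of <record> elements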

Muddiest Point #10

I was curious as to why uploading my website to Pitt's server didn't work the first time but did the second. In the first instance, I followed Dr. He's instructions exactly, but I kept getting a 403 error message. The second time, I used instructions from the Technology Help Desk. The only difference between the Help Desk's instructions and Dr. He's was that I typed a telnet address into my browser. The browser asked which program should open the address, and I selected FileZilla. FileZilla opened, and once I transferred the file, it was visible on Pitt's page.
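
For what it's worth, a 403 often just means the web server isn't allowed to read the uploaded file, so the fix may have been in how the second transfer set permissions. Here is a rough Python sketch of the same kind of transfer FileZilla performed; the host, account, and directory are placeholders, not Pitt's actual settings.

    # A minimal FTP upload sketch (hypothetical host and paths).
    from ftplib import FTP

    ftp = FTP("ftp.example.pitt.edu")
    ftp.login(user="myusername", passwd="mypassword")
    ftp.cwd("public/html")                    # hypothetical web directory
    with open("index.html", "rb") as f:
        ftp.storbinary("STOR index.html", f)  # upload the page
    # Make the file world-readable; unreadable files are a common cause of 403s.
    # (SITE CHMOD support varies by server.)
    ftp.sendcmd("SITE CHMOD 644 index.html")
    ftp.quit()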

Saturday, November 8, 2008

Assignment #6

To view my completed web page, please click here.

Thursday, November 6, 2008

Week 9 Comments

Comment #1:
I commented on Lauren’s blog “LIS 2600 Land”
https://www.blogger.com/comment.g?blogID=4181925387762663697&postID=3072857614832667163

Comment #2:
I commented on Theresa’s blog “Intro to Information Technology”
https://www.blogger.com/comment.g?blogID=5586031599791302355&postID=9132805377535596301

Reading Response #9: Gone Fishing

Michael Bergman’s article “The Deep Web: Surfacing Hidden Value” is an important white paper because it addresses both the limitations of current search engine formats and the structure of information on the Web. Google, today’s most popular search engine, relies on an aggregate formula to produce a list of query results intended to minimize duplicates and maximize relevant resources. However, this approach is inherently imperfect because it relies on popularity through web citation (much as the most prominent scientific journal articles cite one another). As a result, a web page with relevant information might end up far down a list of query results simply because other web pages have not cited it adequately.
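
To see why an uncited page sinks, here is a toy sketch of citation-based ranking in the spirit of PageRank (Google's exact formula isn't public; this is the textbook version, on a four-page web I made up).

    # Toy citation ranking: each page's score flows to the pages it links to.
    links = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],  # D is relevant but nobody links to it
    }
    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    for _ in range(50):  # iterate until the scores settle
        new_rank = {}
        for page in links:
            incoming = sum(rank[p] / len(out)
                           for p, out in links.items() if page in out)
            new_rank[page] = (1 - damping) / len(links) + damping * incoming
        rank = new_rank

    print(sorted(rank.items(), key=lambda kv: -kv[1]))
    # D scores lowest regardless of its content, because no page cites it.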

A bigger problem with this search method is that it skims over the larger repository of information available in the Deep Web. Most of this information is digitally available, but instead of being hosted on a “surface page,” it is embedded in proprietary databases that are connected to the Web yet return their contents only in response to direct queries.
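
This is why crawlers miss the material: a database-backed site produces a results page only when it receives a query, so there is no static URL for a crawler to follow. A small sketch, with an endpoint I invented for illustration:

    # A results page that exists only in response to a query (hypothetical URL).
    import urllib.parse
    import urllib.request

    search_url = "https://catalog.example.gov/search"
    query = urllib.parse.urlencode({"q": "annual report"})  # what a user types
    with urllib.request.urlopen(search_url + "?" + query) as response:
        page = response.read().decode("utf-8")
    # `page` is generated for this query alone; a crawler that never
    # submits the form never sees it.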

The Deep Web should be a primary concern for several reasons. Currently, a great deal of development work is going into more semantic and comprehensive search capabilities. For this work to remain functional and current, it has to adapt to the exponential growth of digital information as well as to its location, both on surface web pages and in the Deep Web.

Also, the availability of information is one of the most important components of digital network systems; without it, the democratic intention of the web is meaningless. Bergman gives the example of several federal organizations that post their information online but not in a format accessible to commercial search engines; the majority of that information is hidden in the “Deep Web.” Though not intentionally deceptive, this unexplored territory of information could inadvertently become an iron curtain. As information transitions from analog to digital formats, it is important that the same amount of information remain readily available.

Muddiest Point #9

This week's lecture went over my head. I thought I had a basic understanding of HTML but realized I didn't when I wasn't able to discern the difference between HTML and XML.
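
In case it helps anyone else as confused as I was, here is the contrast in miniature: HTML has a fixed set of tags that tell a browser how to display content, while XML lets you invent your own tags to describe what the content is. Both snippets are made-up examples.

    <!-- HTML: a fixed vocabulary aimed at presentation -->
    <p><b>The Deep Web</b> by Michael Bergman</p>

    <!-- XML: tags you define yourself, aimed at describing data -->
    <article>
      <title>The Deep Web</title>
      <author>Michael Bergman</author>
    </article>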