April 2012
This semester I’ve been attending weekly Data.gov/Linked Open Government Data meetings at Tetherless World in preparation for a summer internship at Data.gov in Washington, DC. I’m still working through the hiring process for the internship; there are a number of requirements to complete before I can be formally hired. Additionally, I’ve been working with the csv2rdf4lod tool to convert government data on mine safety. I’ve gotten the converter running, but I still need to enhance the conversion manually.
Yesterday I presented a poster on the OrgPedia visualization from last semester at RPI’s third annual Undergraduate Research Symposium. The poster can be found on the TWC site here: http://tw.rpi.edu/web/doc/Using_d3js_Visual_Corporate_Board
Feb 27, 2012
This summer it’s looking like I’ll be interning at Data.gov, so I’ve been trying to get more of a feel for the LOGD (Linked Open Government Data) programs going on at Tetherless World. I’ve been attending weekly meetings about Data.gov-related LOGD projects at TWC, and I’ve also been tasked with using the csv2rdf4lod tool to convert some government data on mine safety. So far I’ve been extremely busy, carrying 22 academic credits on top of research and extracurriculars, but I’m enjoying getting to work on more government-related research.
Fall Semester 2011
This semester at TWC I assisted Xian Li with the OrgPedia organizational transparency site. While I tried to help a bit with the OrgPedia pages and the information they present, my most significant contribution to the project was the board members network demo I created with Bharath Santosh. I already blogged about the creation of the demo here. It was definitely a highlight of the semester for me, and now that I have some experience with using Python to pull and format web data, I’d like to further explore visualization with the d3.js library – perhaps next semester.
Outside of TWC, in the first half of the semester I worked on an independent study about my experiences as an intern in the US House of Representatives this past summer. I ended up writing almost forty single-spaced pages on the role of information technology in the federal government as I observed it, both in clerical day-to-day work and in larger policy issues such as open government data and cybersecurity strategy. My experience in Washington taught me a lot about the policy implications that the sort of work done at TWC can have and, more broadly, about the interplay of technology and policy.
Overall, I had a great semester, and I’m looking forward to a busy schedule in the new year. I plan to continue my involvement with the TWC Undergraduate Lab.
OrgPedia Board Members Network
I recently worked with Bharath Santosh to help Xian Li with a demo for the OrgPedia organizational transparency site. The project involved creating an interactive graph visualization of connections between members of corporate boards (the final product can be found here). Given a list of a few hundred stock tickers and access to the LittleSis API, the goal was to produce a JSON file of board members that could be used by the D3.js force-directed graph framework. I started by looking up each ticker symbol, yielding a JSON file with a unique ID number for each company. My script then queried the API for the company page associated with each ID and stored the names, company associations, and URIs of each board member. Finally, the script output a JSON file for the D3.js graph describing the roughly 2,800 board members and the links between them.
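The last step, turning board memberships into the nodes-and-links JSON that a D3.js force-directed layout reads, can be sketched in Python. This is a minimal illustration, not the script we actually wrote: it assumes the API results have already been reduced to a dictionary mapping each company to a list of member names, and the function name `build_board_graph` and the `companies`/`company` fields are hypothetical. The `source`/`target` fields hold indices into the `nodes` list, which is how D3’s force layout identifies link endpoints by default.

```python
import json
from itertools import combinations


def build_board_graph(boards):
    """Turn {company: [member, ...]} into a D3 force-layout dict.

    Each unique member becomes one node; two members get one link
    for every board they share. (Hypothetical helper for illustration.)
    """
    index = {}  # member name -> position in the nodes list
    nodes = []
    for company, members in boards.items():
        # dict.fromkeys drops duplicate names while keeping their order
        for name in dict.fromkeys(members):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name, "companies": []})
            nodes[index[name]]["companies"].append(company)

    links = []
    for company, members in boards.items():
        # Every pair of distinct members sitting on the same board is connected.
        for a, b in combinations(dict.fromkeys(members), 2):
            links.append({"source": index[a], "target": index[b], "company": company})
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    sample = {  # made-up boards, not real LittleSis data
        "ACME": ["Alice", "Bob", "Carol"],
        "Globex": ["Carol", "Dave"],
    }
    print(json.dumps(build_board_graph(sample), indent=2))
```

In the sample, Carol becomes a single node listing both companies, which is what lets the force layout pull the two boards together. Linking every pair on a board grows quadratically with board size, which stays manageable for boards of ordinary size.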
While I had used Python a bit for command-line scripting, I hadn’t really dug into it before this project. The work gave me a better taste of the language and its capabilities. I made extensive use of the “urllib” library for accessing web content and worked with reading and writing data in JSON files. Bharath helped me with the program’s syntax and some of the graph construction. I was aware of Python’s reputation for ease of use and high-level abstraction, but working with it let me experience that abstraction firsthand, and I was very impressed. The ease with which complex multistep operations could be completed let me focus on the flow of the data through the process rather than the specifics of handling it. The project also gave me more hands-on experience with JSON.
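The fetch-and-parse pattern described above takes only a few lines. In Python 3 the old `urllib`/`urllib2` fetching functions live in `urllib.request`. A minimal sketch, assuming the API returns UTF-8 JSON; the helper names are hypothetical, and no actual LittleSis endpoint or API-key format is shown.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumes the server sends UTF-8, the standard encoding for JSON.
        return json.loads(response.read().decode("utf-8"))


def save_json_file(data, path):
    """Write intermediate results (e.g. a ticker-to-ID lookup) to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_json_file(path):
    """Read a previously saved JSON file back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Saving each stage to a file, as the ticker-to-ID step did, means a later stage can be rerun without hitting the API again. `urlopen` also accepts `data:` URLs, which makes `fetch_json` easy to try without a network connection.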
Bharath has a blog post up here detailing some of what he did on the visualization side. And be sure to check out the final visualization at http://tw.rpi.edu/orgpedia/node/80756.
Working on OrgPedia
This semester I’m helping Xian Li with the OrgPedia organizational transparency project. I’m working on adding semantic information to the site’s person pages, pulling data from various open data stores. So far I’ve been trying to get RDF data from Freebase so that we can parse it and decide what information we want to present, since some of the available data is superfluous. I haven’t worked very extensively in PHP before, so I’ve been experimenting with the cURL library, which lets a program connect to servers over various protocols. Using cURL I can connect to Freebase and request RDF/XML data instead of the standard HTML that the site would normally return.
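Asking a server for RDF/XML instead of HTML is a matter of HTTP content negotiation: the client sends an `Accept: application/rdf+xml` header. The post’s code used PHP’s cURL; this minimal sketch swaps in Python’s `urllib.request` to show the same idea. The helper names are hypothetical, and any URL passed in stands in for a real resource page.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def rdf_request(url):
    """Build a GET request that asks the server for RDF/XML rather than HTML."""
    return urllib.request.Request(url, headers={"Accept": RDF_XML})


def fetch_rdf(url, timeout=10):
    """Fetch a resource via content negotiation; return (content type, raw bytes)."""
    with urllib.request.urlopen(rdf_request(url), timeout=timeout) as response:
        # A server may ignore the Accept header, so report what it actually sent.
        return response.headers.get_content_type(), response.read()
```

Checking the returned content type before parsing matters because a server that doesn’t support RDF/XML will usually just send its normal HTML page.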
The OrgPedia project has been really interesting so far and I look forward to working on it more.
Summer Internship 2011
This summer I spent eleven weeks interning in the Washington, D.C. office of US Representative Mike Kelly. The internship was a great chance to explore my longstanding interest in government, though I was also able to work with policy issues in information technology, web science, and computer science, including federal IT reform, open government data release, and cybersecurity.
While most of my work consisted of traditional intern tasks (sorting mail, writing letters, answering phones, and running errands), I did get to spend a substantial amount of time working with issues more pertinent to my studies. As Representative Kelly serves as Vice Chairman of the Oversight and Government Reform Committee’s Subcommittee on Technology, Information Policy, Intergovernmental Relations and Procurement Reform, he was involved with legislative work addressing technology issues. At one subcommittee hearing on federal IT reform and government transparency, I was able to see former US CIO Vivek Kundra testify on his work, alongside CIOs from the Department of the Interior, the Department of Energy, and the Department of Veterans Affairs. While waiting for a quorum of two congressmen so that the hearing could start, I was able to approach the witness panel and speak to Kundra for several minutes, discussing my interest in his Data.gov project and my involvement with the Tetherless World Constellation, which he was well aware of. At the hearing Kundra testified about the benefits of open government data and noted the value of data “mashups”, many of which have come out of the TWC. In preparation for the hearing I discussed IT issues with Representative Kelly, including federal data center usage, free and open source software, cybersecurity, and the Semantic Web as applied to open government data. I explained the Semantic Web as a way of linking data and information so as to give it computer-understandable meaning.
Cybersecurity was an important issue this summer, and I was able to attend hearings addressing the matter from the Oversight and Government Reform Committee and the Homeland Security Committee. I also represented the office at two think-tank-hosted briefing sessions on the matter for government and industry officials, where I heard from speakers including the former CIO of the US Air Force, the Legal Counsel for the newly formed Cyber Command, and Senator Harry Reid’s senior advisor for defense issues.
Through regular lectures for interns hosted by the Committee on House Administration, I was able to hear from and interact with a number of important government figures and public intellectuals, including former Defense Secretary Donald Rumsfeld, whom I asked about the Executive Branch’s decision to classify cyber attacks the same as any other military attack; New York Times columnist David Brooks, whom I asked about the role of digital distribution in his work; and Ralph Nader, with whom I spoke one-on-one about “humanistic” uses of information technology and computer science.
Overall, I had a great time and learned a lot at my internship, and I did far more than I ever expected to. The experience was invaluable to me as someone with interests in government and technology.
May 5, 2011
This semester at The Tetherless World I wasn’t able to get as much done as I planned, due to a very heavy course load, but I learned a lot. One idea I had for the Health Search web app that I am helping Dominic Difranzo with is the ability to do a ‘diff’ on two different zip codes. Seeing information on one zip code is useful, but a side-by-side comparison of two regions would make the app especially powerful. I edited the code so that it can display the information for two zip codes, though I did not have time to change the interface so that a user could query for two. Implementing this should be trivial, though. Giving users a percent difference between two zip codes should also be easy, as all the numbers are available.
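Once both zip codes’ numbers are loaded, the side-by-side comparison and percent difference reduce to a short loop. This is a minimal Python sketch, assuming each zip code’s data is a flat dictionary of numeric metrics; the Health Search app’s real data model isn’t described here, so the function and metric names are hypothetical. It measures change relative to the first zip code, (b − a) / a × 100; dividing by the mean of the two values instead gives a symmetric percent difference and would be an equally reasonable choice.

```python
def compare_zips(stats_a, stats_b):
    """Side-by-side comparison of two zip codes' statistics (hypothetical helper).

    Returns {metric: (value_a, value_b, percent_change)} for metrics present
    in both inputs. percent_change is (b - a) / a * 100, i.e. relative to the
    first zip code, and None when a is zero (the change is undefined).
    """
    result = {}
    for metric in sorted(stats_a.keys() & stats_b.keys()):
        a, b = stats_a[metric], stats_b[metric]
        pct = None if a == 0 else (b - a) / a * 100
        result[metric] = (a, b, pct)
    return result
```

For example, `compare_zips({"clinics": 4}, {"clinics": 5})` returns `{"clinics": (4, 5, 25.0)}`. Metrics missing from either zip code are skipped rather than guessed.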
As I mentioned in an earlier post, in March I worked with Bharath Santosh on a project with Facebook profiles and FOAF. Looking back, that project stands out as one of the most fun experiences of the semester. We worked all night, from 8 PM to 8 AM, coming in with nothing and leaving with a finished project. Aside from the Semantic Web aspect of what we did, the actual coding behind it was a lot of fun. Colin Rice worked in Python on Ubuntu, Bharath in Java on Windows, and I worked in C on Mac OS X. While I’ve worked in C++ for my classes here at RPI, it was my first time using C, and though it isn’t radically different from C++, it was a memorable learning experience.
I’ll be back at the TWC next semester with some new ideas to blog about.
