A Web Service for My Dog
R. Alexander Milowski
milowski at sims.berkeley.edu
#1
Hudson and the Beach
This is Hudson:
He likes to go to the beach because:
He can run there.
There are lots of stinky things at the beach.
Every so often there is a dead animal/fish to roll on.
#2
The Problem
We go to Fort Funston on the Pacific Ocean in the morning.
There are tall dirt cliffs along the ocean and it is a long walk down a steep hill to get to the beach.
If the tide is moderately high, there isn't much beach.
In addition, San Francisco's "treated" sewage line (ick!) blocks walking along the beach when the tide is high.
So, we try to only go to the beach when the tide is low.
#3
The Problem - Complications
I forget to check the tides.
I'm usually at the park trying to gauge the tide level from far above the beach.
...and I'm not that good at it.
I really need access to local tide information when we are at the park.
#4
Remember Hudson
How could you deny him the beach?
#5
Google is your Friend
Searching for "tide information" on Google returns 7,380,000 pages.
The first couple of pages quickly gets you to: http://co-ops.nos.noaa.gov/index.html
This is the National Oceanic and Atmospheric Administration (NOAA) run by the US government.
Their "tides online" service is at: http://tidesonline.nos.noaa.gov/geographic.html
#6
The Raw Data
Clicking through the state map gets you to a station of interest.
Here is the San Francisco data.
At the bottom of the page is a link to the data: http://tidesonline.nos.noaa.gov/data_read.shtml?station_info=9414290+San+Francisco,+CA
There doesn't seem to be a non-HTML version of this data available but more digging could prove otherwise.
#7
Can We Reliably Request the Data?
A quick experiment shows that we can drop everything but the number:
http://tidesonline.nos.noaa.gov/data_read.shtml?station_info=9414290
Some digging shows that each station code starts with a zip code; the full number might be a zip+4.
It doesn't matter as the codes are all listed on the web pages.
The data we need is located inside an XHTML pre element, so it can be parsed.
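The rows inside that pre element are just whitespace-separated text. Hypothetically, the layout is something like this (invented values; the real page has its own headings and extra columns):
<pre>
04/02/2005 12:00:00 PDT -0.11 ...
04/02/2005 12:06:00 PDT -0.16 ...
</pre>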
#8
A Solution
Create a web application that accesses the NOAA website to get tide information and presents it as XHTML that can be displayed on a cell phone.
#9
Application Strategy
We'll wrap the request for NOAA's tide information up as a web service.
The input and output should require minimal effort from the consuming application.
We won't make assumptions about what data is needed and so the web service will return all the information available. This maximizes re-purposing of the service.
We'll create another web application that accesses this web service.
That web application will display the low and high tide information relative to the request time as appropriate for a cell phone.
#10
Technology Choices - Web Service
We could write the web service as JSP pages or as a servlet.
But we'll be manipulating XML (and HTML).
Maybe XSLT would be helpful.
It will be a multi-step process to translate the HTML page into a response.
An XML Pipeline would be very helpful here.
#11
Because these are my slides...
We'll use the smallx XML Pipelines to implement the service.
It provides the ability to interact with other web resources and manipulate their content.
You could write this in JSP or as a Servlet.
...but I wrote it and so I used an XML Pipeline.
#12
The Request and Response
We'll make up a request document:
<t:tideinfo location="9414290" xmlns:t="http://www.smallx.com/services/tideinfo/2005"/>
And the response will be added as children:
<t:tideinfo location="9414290"
            xmlns:t="http://www.smallx.com/services/tideinfo/2005">
   <t:tide-data>
      <t:tide-level date="04/02/2005" time="12:00:00-7:00" level="-0.11"/>
      <t:tide-level date="04/02/2005" time="12:06:00-7:00" level="-0.16"/>
      ...
   </t:tide-data>
</t:tideinfo>
where the date, time, and level are taken from the first four columns of the data. We'll use the predicted level since we might want future tide information (see the data for more on this).
#13
Setting Up the Pipeline
The nice thing about pipelines is that you can just "run them" without worrying about "web stuff".
We can set up the pipeline by creating a new pipeline document in NetBeans:
<p:pipe name="get-tides"
        xmlns:p="urn:publicid:IDN+smallx.com:pipeline:1.0"
        xmlns:c="urn:publicid:IDN+smallx.com:component-language:1.0"
        xmlns:f="urn:publicid:IDN+smallx.com:server:forms:post:1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:t="http://www.smallx.com/services/tideinfo/2005">
   <p:identity/>
</p:pipe>
We can translate the request into a URL request for the URL step by replacing 'p:identity' with:
<p:template>
   <xsl:for-each select="t:tideinfo">
      <xsl:copy>
         <xsl:copy-of select="@*"/>
         <t:tide-data>
            <c:url-get href="http://tidesonline.nos.noaa.gov/data_read.shtml?station_info={@location}"
                       parse-as-html="true"/>
         </t:tide-data>
      </xsl:copy>
   </xsl:for-each>
</p:template>
The [p:]template step is shorthand for XSLT: it runs a single template on the document info item.
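As a rough comparison (not the exact expansion smallx performs), the step above behaves like this ordinary XSLT stylesheet:
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:c="urn:publicid:IDN+smallx.com:component-language:1.0"
               xmlns:t="http://www.smallx.com/services/tideinfo/2005">
   <!-- one template, run on the document node -->
   <xsl:template match="/">
      <xsl:for-each select="t:tideinfo">
         <xsl:copy>
            <xsl:copy-of select="@*"/>
            <t:tide-data>
               <c:url-get href="http://tidesonline.nos.noaa.gov/data_read.shtml?station_info={@location}"
                          parse-as-html="true"/>
            </t:tide-data>
         </xsl:copy>
      </xsl:for-each>
   </xsl:template>
</xsl:transform>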
#14
Making the Request
The [c:]url-get element can be processed by the URL step to make a request to a URL.
The 'href' is the URL it will retrieve.
The 'parse-as-html' attribute controls converting HTML into XHTML.
The pipeline now looks like:
<p:pipe name="get-tides"
        xmlns:p="urn:publicid:IDN+smallx.com:pipeline:1.0"
        xmlns:c="urn:publicid:IDN+smallx.com:component-language:1.0"
        xmlns:f="urn:publicid:IDN+smallx.com:server:forms:post:1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:t="http://www.smallx.com/services/tideinfo/2005">
   <p:template>
      <xsl:for-each select="t:tideinfo">
         <xsl:copy>
            <xsl:copy-of select="@*"/>
            <t:tide-data>
               <c:url-get href="http://tidesonline.nos.noaa.gov/data_read.shtml?station_info={@location}"
                          parse-as-html="true"/>
            </t:tide-data>
         </xsl:copy>
      </xsl:for-each>
   </p:template>
   <p:url/>
</p:pipe>
#15
Processing the XHTML: Getting the 'pre' Elements
We now have an XHTML document as the child of [t:]tide-data.
We need the second XHTML pre element parsed into [t:]tide-level elements.
Let's focus on the XHTML by limiting the scope to the XHTML html element.
<p:subtree select="t:tide-data/h:html"> ... </p:subtree>
This makes the XHTML html element the document element for the contained steps.
Inside that element we can "project a view" so that only the XHTML pre elements remain:
<p:subtree select="t:tide-data/h:html"> <p:view select="/h:html/h:body/h:pre"/> </p:subtree>
Now only the XHTML pre elements remain.
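Hypothetically, what remains looks something like this (the order and contents here are guesses, purely for illustration):
<h:pre><h:b> ...station name and column headings... </h:b></h:pre>
<h:pre><h:font> ...the data lines we want... </h:font></h:pre>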
#16
Processing the XHTML: Select the Correct 'pre' Element
We can get rid of the extra XHTML pre element by noticing that it always contains an XHTML b element child.
This step can remove that element:
<p:subtree select="h:pre"> <p:route> <p:when test="h:pre/h:b"> <p:delete/> </p:when> </p:route> </p:subtree>
This deletes a 'pre' element when it contains a 'b' element.
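In XSLT terms, the net effect is roughly an identity transform plus one pruning rule (just a comparison sketch; the pipeline does not literally generate this):
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:h="http://www.w3.org/1999/xhtml">
   <!-- drop any pre element that has a b element child -->
   <xsl:template match="h:pre[h:b]"/>
   <!-- copy everything else through unchanged -->
   <xsl:template match="@*|node()">
      <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
   </xsl:template>
</xsl:transform>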
#17
Processing the XHTML: Removing the 'font' Element
The data we want is now inside a 'font' element inside the remaining 'pre' element.
They should have used CSS to do this... but they didn't! Bad NOAA!
This step will remove the wrapping 'font' element while preserving its children:
<p:unwrap select="h:font"/>
#18
Processing the XHTML: Converting the Data
The data can be parsed by two regex steps.
The first tags the lines.
The second converts each line into a [t:]tide-level element.
Here is part of those two steps:
<p:regex select="h:pre" pattern=".+" matches="line"/>
<p:regex select="line" pattern="^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)">
   <p:template>
      <xsl:param name="group-1"/>
      <xsl:param name="group-2"/>
      <xsl:param name="group-3"/>
      <xsl:param name="group-4"/>
      <xsl:variable name="month" select="substring-before($group-1,'/')"/>
      <xsl:variable name="day" select="substring-before(substring-after($group-1,'/'),'/')"/>
      <xsl:variable name="year" select="substring-after(substring-after($group-1,'/'),'/')"/>
      <t:tide-level date="{$year}-{$month}-{$day}" level="{$group-4}">
         <xsl:choose>
            <xsl:when test="$group-3='EST'">
               <xsl:attribute name="time"><xsl:value-of select="$group-2"/>-5:00</xsl:attribute>
            </xsl:when>
            <xsl:when test="$group-3='EDT'">
               <xsl:attribute name="time"><xsl:value-of select="$group-2"/>-4:00</xsl:attribute>
            </xsl:when>
            ...
         </xsl:choose>
      </t:tide-level>
   </p:template>
</p:regex>
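To make that concrete, a single (invented) data line would move through the two steps roughly like this, assuming matches="line" wraps each matched line in a line element and that one of the elided xsl:when branches maps PDT to -7:00:
Raw text in the pre element:  04/02/2005 12:00:00 PDT -0.11
After the first regex step:   <line>04/02/2005 12:00:00 PDT -0.11</line>
After the second regex step:  <t:tide-level date="2005-04-02" time="12:00:00-7:00" level="-0.11"/>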
#19
Processing the XHTML: Last Step
We still have that 'pre' element floating around.
We can remove it and keep the [t:]tide-level elements by:
<p:unwrap select="h:pre"/>
#20
Limiting to tideinfo Elements
We said we wanted to process only the [t:]tideinfo elements.
We can do this by wrapping the whole pipeline with a subtree step:
<p:subtree select="t:tideinfo"> ... </p:subtree>
Now we are done: gettides.pd
#21
Setting Up the Web Service
We'll create a web application project in NetBeans.
Configure the web.xml to run pipelines for .pd files:
<servlet>
   <servlet-name>PipelineEngine</servlet-name>
   <servlet-class>com.smallx.servlet.PipelineServlet</servlet-class>
   <init-param>
      <param-name>check-for-changes</param-name>
      <param-value>true</param-value>
   </init-param>
   <init-param>
      <param-name>default-pipeline</param-name>
      <param-value>tideinfo.pd</param-value>
   </init-param>
</servlet>
<servlet-mapping>
   <servlet-name>PipelineEngine</servlet-name>
   <url-pattern>*.pd</url-pattern>
</servlet-mapping>
We can create an index.xml file that is the default input for GET:
<html xmlns="http://www.w3.org/1999/xhtml">
   <head><title>Service Error</title></head>
   <body>
      <p>This service does not respond to GET requests.</p>
   </body>
</html>
The application is now ready to be used! Try it out.
#22
A Streaming Service
None of the components used requires a tree to be built.
This pipeline streams the content from start to finish.
This should reduce overhead when many requests are run concurrently.
#23
Oh, but there is a problem...
We get tide data for a three-day period.
We also haven't singled out low and high tides.
That won't be very helpful to the consuming web application, which is only interested in today.
Let's create another service that uses the previous pipeline, restricts the entries to today, and finds the low and high tides.
#24
Using the gettides.pd Pipeline
We can create a pipeline 'tideinfo.pd' that uses the 'gettides.pd' pipeline:
<p:pipe name="tideinfo"
        xmlns:p="urn:publicid:IDN+smallx.com:pipeline:1.0"
        xmlns:c="urn:publicid:IDN+smallx.com:component-language:1.0"
        xmlns:f="urn:publicid:IDN+smallx.com:server:forms:post:1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:t="http://www.smallx.com/services/tideinfo/2005"
        xmlns:date="http://exslt.org/dates-and-times">
   <p:subtree select="t:tideinfo">
      <p:pipeline src="gettides.pd"/>
   </p:subtree>
</p:pipe>
Let's record today's date by inserting this step before the [p:]pipeline step:
<p:template>
   <xsl:for-each select="t:tideinfo">
      <xsl:copy>
         <xsl:attribute name="today"><xsl:value-of select="date:date()"/></xsl:attribute>
         <xsl:copy-of select="@*|node()"/>
      </xsl:copy>
   </xsl:for-each>
</p:template>
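After this step the request document looks something like this (an invented example; it assumes date:date() yields a plain CCYY-MM-DD value, which is what the date comparison on the next slide relies on):
<t:tideinfo location="9414290" today="2005-04-02"
            xmlns:t="http://www.smallx.com/services/tideinfo/2005"/>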
#25
Restricting to Today's Date
We can use this XSLT step to restrict to today's date:
<p:xslt>
   <xsl:transform version="1.0">
      <xsl:variable name="today" select="string(/t:tideinfo/@today)"/>
      <xsl:template match="t:tide-data">
         <xsl:copy>
            <xsl:apply-templates select="*"/>
         </xsl:copy>
      </xsl:template>
      <!-- keep only today's entries; the test is done here because XSLT 1.0
           match patterns cannot reference variables -->
      <xsl:template match="t:tide-level">
         <xsl:if test="@date = $today">
            <xsl:copy>
               <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
            <xsl:text>&#10;</xsl:text>
         </xsl:if>
      </xsl:template>
      <xsl:template match="@*|*">
         <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
         </xsl:copy>
      </xsl:template>
   </xsl:transform>
</p:xslt>
#26
Finding Low and High Tides
We can use this XSLT step to find the low and high tides and set them as attributes:
<p:xslt>
   <xsl:transform version="1.0">
      <xsl:template match="t:tideinfo">
         <xsl:copy>
            <!-- the time of the lowest predicted level -->
            <xsl:attribute name="low-at">
               <xsl:for-each select="t:tide-data/t:tide-level">
                  <xsl:sort data-type="number" order="ascending" select="@level"/>
                  <xsl:if test="position()=1">
                     <xsl:value-of select="@time"/>
                  </xsl:if>
               </xsl:for-each>
            </xsl:attribute>
            <!-- the time of the highest predicted level -->
            <xsl:attribute name="high-at">
               <xsl:for-each select="t:tide-data/t:tide-level">
                  <xsl:sort data-type="number" order="descending" select="@level"/>
                  <xsl:if test="position()=1">
                     <xsl:value-of select="@time"/>
                  </xsl:if>
               </xsl:for-each>
            </xsl:attribute>
            <xsl:copy-of select="@*|node()"/>
         </xsl:copy>
      </xsl:template>
   </xsl:transform>
</p:xslt>
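With invented values, the final response looks something like:
<t:tideinfo location="9414290" today="2005-04-02"
            low-at="09:54:00-7:00" high-at="03:30:00-7:00"
            xmlns:t="http://www.smallx.com/services/tideinfo/2005">
   <t:tide-data>
      <t:tide-level date="2005-04-02" time="00:00:00-7:00" level="1.21"/>
      ...
   </t:tide-data>
</t:tideinfo>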
#27
The Service is Complete!
The full source of this pipeline: tideinfo.pd
By using the two XSLT steps, we now require two trees to be built.
The pipeline streams up to the point that those steps are run.
It is live at: http://www.smallx.com/tideinfo-service
#28
Your Dog Knows XML?
He's still pouting.
We'll solve that next time by putting this service on my phone.