DokuWiki

It's better when it's simple

User Tools

Site Tools


plugin:sqlraw

This is an old revision of the document!


sqlraw Plugin

Compatible with DokuWiki

Adora Belle, Weatherwax, Binky

plugin Execute an sql query on the raw data of either a csv file or a scraped webpage table

Last updated on
2013-11-14
Provides
Syntax
Repository
Source

This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.

Similar to sql

Tagged with csv, database, sql

This plugin will get data from either a csv file or scrape a table on a web page and load it into a temporary database. It will execute the provided sql query on the database and output the results as a table

Installation

To use this plugin a user needs access to a mysql database and privilege to

  • Create temporary tables,
  • Select, and
  • Insert.

I recommend creating a user account and granting database specific privilege to that account, specifying a database external from your other databases. That user account information will be specified in the plugin software and the user will not need to know the login information.

Install the plugin using the Plugin Manager and the download URL above, which points to latest version of the plugin. Refer to Plugins on how to install plugins manually.

Update the configuration data for the user account login information for the database.

:!: External requirements:

  1. PEAR::DB. Install the Pear DB.php package.
  2. If the table sort capability is wanted then the Keywords plugin is needed. If you do not want the sort capability you can delete the following files from the install
  • sorttable.min
  • action.php

Examples/Usage

  • The following is data from a scrape of a US Employment data table. (Note that column headings have no spaces when a new line is started. Otherwise use an underscore for spaces. Eg: Oct_2013). This example uses the fixTable='1' option to correct issues with the table.
<sqlraw  id='unemployment' title='Unemployment by education level' source='scrapeUrl' link="http://www.bls.gov/news.release/empsit.a.htm" fixTable='1'>SELECT category, Oct_2013 FROM temptable where (Category = "Less than a high school diploma" or Category = "High school graduates, no college" or Category = "Some college or associate degree" OR Category = "Bachelor's degree and higher")  ;</sqlraw>
  • The following is an example using a dokuwiki page with a table

Create a dokuwiki page called runexample with the following content. This example created a page at users:admin:data.

  
====== Running ======

^ Date ^ Distance ^ Time ^ Excuse ^ Factor ^ Events ^
| 2012-06-22 |  |  |  Time  |  |  |
| 2012-06-24 |  3.5  |  35  |  |  Hot  |  |
| 2012-06-26 |  3.5  |  34  |  |  |  |
| 2012-06-28 |  |  |  |  |  6.2 mile race (Lehigh Valley Zoo - 10k) |
| 2012-06-30 |  3.6  |  35  |  |  |  |
| 2012-07-02 |  |  |  |  |  |
| 2012-07-04 |  |  |   |   |  |
|  2012-07-06  |  3.7  |  36  |   |   |  |
|  2012-07-08  |  4  |  37  |    |  Tired  |   |
|  2012-07-10  |    |    |     |     |  the marathon to end all marathons |
|  2012-07-12  |  4.1  |  37  |     |     |   |
|  2012-07-14  |  4.1  |  37  |     |     |   |
|  2012-07-16  |  4.2  |  38  |     |     |   |
|  2012-07-18  |  4.2  |  38  |     |     |   |
|  2012-07-04  |  3.4  |  30  |     |  Hot  |   |

Create another dokuwiki page with the following content. This uses the sqlraw plugin to query the existing table and display the query results in a table.

 
<sqlraw id='myrun' title='My Running' source='scrapeUrl'  link="users:admin:data:runexample" startMarker="Running">SELECT date, round(time/distance,1) FROM temptable;</sqlraw>

This page processes a scrape of the supplied link to my_run_log page to get the first table present following a marker. The data is put into a temporary database table. Then a query is run on the temporary table to select only date and calculate the speed in minutes per mile. The query produces an output table which is replaced by a plot of the data.

  • The following is an example using the sort capability for the created table
 
  <sqlraw id='prices' class='sortable' title='World Gasoline Prices' source='scrapeUrl' link="http://en.wikipedia.org/wiki/Gasoline_and_diesel_usage_and_pricing">SELECT country_territory AS Country, us$_us_gallon_95_ron AS "US Dollars for Gallon Regular", RIGHT(date_of_price, 8) AS Date FROM temptable;</sqlraw>
{{keywords>sqlRawSort}}

Set the class as 'sortable“. Use the keywords plugin to pull in the sorting javascript code. Clicking on a column header will sort the table one way (ASC, DESC) and clicking again will sort the table the other direction.

  • The following example gets the data from an online csv file. The source option is now set to csvfile. The url link points to a downloadable csv file.
  <sqlraw id="earthquakes" source="csvfile" title="Large Recent Earthquakes" link="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.csv">SELECT time, mag, depth, place FROM temptable WHERE mag > 5.0 AND depth < 70;</sqlraw> 

Syntax

Basic syntax:

<sqlraw source='csvfile|scrapeUrl' link="url">mysql query;</sqlraw>
Parameter Values Description
id 'string' an html id for the table. (This can then be used by tableplot plugin to plot.)
link 'string' url to csv or table
startMarker 'string' a marker of text to start looking for the next table. Only needed if not using the first table on the page, For dokuwiki pages this may be required as the sidebar page index is usually the first table.
display 'inline' or 'block' default block
position 'horizontal' or 'vertical' Table orientation. Default is horizontal
class 'string' css class identifier (eg: sortable for table sorting) (default inline)
title 'string' Table caption
source 'csvfile' or 'scrapeUrl' Source of data is either a csv file or a table on a webpage
caption '0' or '1' A '1' specifies to skip the 1st row as it is a caption. Default is '0'.
tableNumber 'number' The position number of a table to scrape on a web page. For example '2' would scrape the 2nd table. The default is '1'.
fixTable '0' or '1' A '1' will run a function that attempts to handle such items as row or col span, or a caption as a table heading. The default is '0' to execute faster.

Configuration and Settings

Configuration item

'sqlraw_tempdb'

is used to set the temporary database login. It is in the format of

'mysql://user:password@hostname/database'

'sqlraw_debugfilepath'

provides a file path and filename to save the raw data for scraped tables. The table data will be saved as a csv file. This is useful for debugging the table output from a scraped table. An empty or null string will specify do not write the csv file. Note that this is only for table data. The path is relative to the dokuwiki base install folder. If not needed for debug then set to a empty string to execute faster.

'sqlraw_restrict_names'

If true (=1) then scraped table headings will be searched for all elements in the configuration array of 'mysqlDisallow' and if found they will be replaced with the corresponding element in 'mysqlReplace'. This allows removing header elements that will cause errors when the heading is used as a database field name.

'sqlraw_mysqlDisallow'

array of elements to replace if restrict_names is true (1)

'sqlraw_mysqlReplace'

array of replacement elements for disallowed header elements.

Development

The plugin will run considerably faster on scraping website tables if the following debug capabilities are not used. . They are turned off with the following settings.

in the conf/default.php file

$conf['sqlraw_debugfilepath'] = '';

On an individual page, the following default parameter.

fixTable = '0'

Change Log

Known Bugs and Issues

None

ToDo/Wish List

FAQ

Discussion

plugin/sqlraw.1384695462.txt.gz · Last modified: 2013-11-17 14:37 by tom_c

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International
CC Attribution-Share Alike 4.0 International Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki