====== SQLRaw Plugin ====== ---- plugin ---- description: Execute an sql query on the raw data of either a csv file or a scraped webpage table author : tom_c email : tcafferty@glocalfocal.com type : syntax lastupdate : 2013-11-17 compatible : Adora Belle, Weatherwax, Binky depends : conflicts : similar : sql tags : sql, csv, database downloadurl: https://github.com/TomCafferty/plugin-sqlraw/zipball/master bugtracker : https://github.com/TomCafferty/plugin-sqlraw/issues sourcerepo : https://github.com/TomCafferty/plugin-sqlraw donationurl: ---- This plugin will get data from either a csv file or scrape a table on a web page and load it into a temporary database. It will execute the provided sql query on the database and output the results as a table ===== Installation ===== To use this plugin a user needs access to a mysql database and privilege to * Create temporary tables, * Select, and * Insert. I recommend creating a user account and granting database specific privilege to that account, specifying a database external from your other databases. That user account information will be specified in the plugin software and the user will not need to know the login information. Search and install the plugin using the [[plugin:extension|Extension Manager]]. Refer to [[:Plugins]] on how to install plugins manually. Update the configuration data for the user account login information for the database. :!: **External requirements:** - [[http://pear.php.net/package/DB|PEAR::DB]]. Install the Pear DB.php package. - If the table sort capability is wanted then the [[plugin:keywords|Keywords plugin]] is needed. If you do not want the sort capability you can delete the following files from the install * sorttable.min * action.php ===== Examples/Usage ===== Examples can be seen at [[http://clean.glocalfocal.com/doku.php?id=start|Plugin Examples Site]] * The following is data from a scrape of a US Employment data table. (Note that column headings have no spaces when a new line is started. Otherwise use an underscore for spaces. Eg: Oct_2013). This example uses the fixTable='1' option to correct issues with the table. SELECT category, Oct_2013 FROM temptable where (Category = "Less than a high school diploma" or Category = "High school graduates, no college" or Category = "Some college or associate degree" OR Category = "Bachelor's degree and higher") ; * The following is an example using a dokuwiki page with a table Create a dokuwiki page called runexample with the following content. This example created a page at users:admin:data. ====== Running ====== ^ Date ^ Distance ^ Time ^ Excuse ^ Factor ^ Events ^ | 2012-06-22 | | | Time | | | | 2012-06-24 | 3.5 | 35 | | Hot | | | 2012-06-26 | 3.5 | 34 | | | | | 2012-06-28 | | | | | 6.2 mile race (Lehigh Valley Zoo - 10k) | | 2012-06-30 | 3.6 | 35 | | | | | 2012-07-02 | | | | | | | 2012-07-04 | | | | | | | 2012-07-06 | 3.7 | 36 | | | | | 2012-07-08 | 4 | 37 | | Tired | | | 2012-07-10 | | | | | the marathon to end all marathons | | 2012-07-12 | 4.1 | 37 | | | | | 2012-07-14 | 4.1 | 37 | | | | | 2012-07-16 | 4.2 | 38 | | | | | 2012-07-18 | 4.2 | 38 | | | | | 2012-07-04 | 3.4 | 30 | | Hot | | Create another dokuwiki page with the following content. This uses the sqlraw plugin to query the existing table and display the query results in a table. SELECT date, round(time/distance,1) FROM temptable; This page processes a scrape of the supplied link to my_run_log page to get the first table present following a marker. The data is put into a temporary database table. Then a query is run on the temporary table to select only date and calculate the speed in minutes per mile. The query produces an output table which is replaced by a plot of the data. * The following is an example using the sort capability for the created table SELECT country_territory AS Country, us$_us_gallon_95_ron AS "US Dollars for Gallon Regular", RIGHT(date_of_price, 8) AS Date FROM temptable; {{keywords>sqlRawSort}} Set the class as 'sortable". Use the keywords plugin to pull in the sorting javascript code. Clicking on a column header will sort the table one way (ASC, DESC) and clicking again will sort the table the other direction. * The following example gets the data from an online csv file. The source option is now set to csvfile. The url link points to a downloadable csv file. SELECT time, mag, depth, place FROM temptable WHERE mag > 5.0 AND depth < 70; ===== Syntax ===== Basic syntax: mysql query; ^ Parameter ^ Values ^ Description ^ | id | 'string' | an html id for the table. (This can then be used by tableplot plugin to plot.) | | link | 'string' | url to csv or table | | startMarker | 'string' | a marker of text to start looking for the next table. Only needed if not using the first table on the page, For dokuwiki pages this may be required as the sidebar page index is usually the first table. | | display | 'inline' or 'block' | default block | | position | 'horizontal' or 'vertical' | Table orientation. Default is horizontal | | class | 'string' | css class identifier (eg: sortable for table sorting) (default inline) | | title | 'string' | Table caption | | source | 'csvfile' or 'scrapeUrl' | Source of data is either a csv file or a table on a webpage | | caption | '0' or '1' | A '1' specifies to skip the 1st row as it is a caption. Default is '0'. | | tableNumber | 'number' | The position number of a table to scrape on a web page. For example '2' would scrape the 2nd table. The default is '1'. | | fixTable | '0' or '1' | A '1' will run a function that attempts to handle such items as row or col span, or a caption as a table heading. The default is '0' to execute faster. | ===== Configuration and Settings ===== Configuration item **'sqlraw_tempdb'** is used to set the temporary database login. It is in the format of 'mysql://user:password@hostname/database' **'sqlraw_debugfilepath'** provides a file path and filename to save the raw data for scraped tables. The table data will be saved as a csv file. This is useful for debugging the table output from a scraped table. An empty or null string will specify do not write the csv file. Note that this is only for table data. The path is relative to the dokuwiki base install folder. If not needed for debug then set to a empty string to execute faster. **'sqlraw_restrict_names'** If true (=1) then scraped table headings will be searched for all elements in the configuration array of 'mysqlDisallow' and if found they will be replaced with the corresponding element in 'mysqlReplace'. This allows removing header elements that will cause errors when the heading is used as a database field name. **'sqlraw_mysqlDisallow'** array of elements to replace if restrict_names is true (1) **'sqlraw_mysqlReplace'** array of replacement elements for disallowed header elements. ===== Development ===== The plugin will run considerably faster on scraping website tables if the following debug capabilities are not used. . They are turned off with the following settings. in the conf/default.php file $conf['sqlraw_debugfilepath'] = ''; On an individual page, the following default parameter. fixTable = '0' === Change Log === {{rss>https://github.com/TomCafferty/plugin-sqlraw/commits/master.atom date}} === Known Bugs and Issues === None === ToDo/Wish List === ===== FAQ ===== ===== Discussion ===== [[plugin:sqlraw:discussion|Sqlraw discussion page]]