Tir Neb - welcome to no man's land

Friday 29 May 2015

Strata + Hadoop World, London, 2015

Introduction

Conference website: http://strataconf.com/big-data-conference-uk-2015
Schedule of talks: http://strataconf.com/big-data-conference-uk-2015/public/schedule/grid/public

I recently attended the Strata Hadoop World conference in London from 5-7 May 2015, including two half-day tutorial sessions (on Apache Cassandra and R/RStudio) on the first day.

This conference is the main annual Hadoop conference in Europe and is strongly oriented towards business users and the technology industry. It is sponsored by the technology publisher O’Reilly and by Cloudera, one of the main commercial Hadoop suppliers, with additional sponsorship from several other major companies in the Big Data arena. The conference exhibitors also included smaller suppliers of tools and services to the commercial Big Data sector. IT service providers and consultancies were heavily represented here.

This year’s conference offered a wide range of talks, tutorials and (at extra expense) workshops on a variety of topics broadly relating to Hadoop and Big Data, including several sessions on Apache Spark, which seems to be emerging as a significant cross-platform technology for Big Data.

Other talks looked at business applications of Big Data technology e.g. in banking, retail and telecoms, as well as less commercial topics such as Big Data for health research.

From the attendees’ directory provided by the conference organisers, there seemed to be around 900 attendees. A large proportion of these were from outside the UK, mainly from elsewhere in Europe but also from Asia. The conference organisers announced that this year would see the first Strata Hadoop World in Asia (in Singapore), which is clearly aimed at a growing market for these Big Data events worldwide.

General themes

Hadoop is going mainstream but moving past the hype

Not surprisingly, the main focus of many sessions was around Hadoop and related tools and applications. It was striking to see that several major organisations are already integrating Hadoop into their business processes. Although many of us still struggle with basic Hadoop installation and architecture, it is clear that these technologies are increasingly seen as a mainstream platform for providing business critical services. Prominent Hadoop users mentioned here included BT, Barclays, Bloomberg, IBM etc.

It was also encouraging to see that there is a growing body of expertise around how to use these technologies e.g. sessions on designing and implementing enterprise applications with Hadoop. Access to this expertise is vital, as there is still a shortage of reliable and up-to-date documentation on how to build large scale applications with Big Data technologies in practice.

It is still early days compared to traditional relational database systems, where the industry has decades of experience, but my impression is that Hadoop is no longer seen as a risky cutting-edge technology by mainstream commercial users.

However, it is interesting to note that InfoWorld reported recently that demand for Hadoop is falling as people start to look more closely at alternative Big Data platforms. It is suggested that this is partly due to the lack of Hadoop skills on the market, and to the availability of tools like NoSQL databases that may be more suitable for many users. And of course, it may simply be that Hadoop is finally moving past the initial "peak of inflated expectations" in the Gartner hype cycle.

If so, then this is a welcome development, as a more critical attitude towards the Hadoop eco-system on the part of customers may further encourage suppliers to focus on providing tools and expertise to make it easier (and cheaper) to adopt and exploit Hadoop where appropriate.

Hadoop eco-system is growing

The industry already seems to recognise this need for a broader range of services and support for organisations that are starting to explore Hadoop as a platform for enterprise applications. A growing number of technology and service providers are building their own products and services around Hadoop. These range from massive companies like Amazon, eBay and Google, through established technology suppliers such as SAS Institute, Autodesk and Informatica, to smaller specialist consultancies and software suppliers offering a range of tools and services.

To some extent, one might still ask whether some of these companies (e.g. Informatica) are simply applying a “Big Data” gloss to their existing products, without any major transformation in the products themselves. But this desire to link their products to Hadoop at least indicates a general sense of confidence that Hadoop is a viable technology platform on which to build enterprise data services for the longer term, even if the glory days of the hype cycle are behind us.

Data science still cool

If Hadoop is no longer quite so cool, there was more reassuring news for Big Data hipsters, as another prominent strand in the conference offering was the growing application of data science to real-world business needs. This ranged from simple statistical analysis and imaginative data visualisation to machine learning for analytics and reporting. One of the short keynote talks was from Cait O'Riordan, describing how music identification service Shazam is able to spot hits before they happen.

There were all-day workshops on Apache Spark and on machine learning with Python, which ran alongside the conference and were sold out well in advance, despite being quite expensive. There is obviously a strong demand for training in these technologies, although it is not clear how far these technical skills are being applied in practice or to what extent they are complemented and informed by a solid understanding of basic statistics.

At least some “data science” is probably still more “data” than “science”, but many organisations are clearly aware of the need to build their data science skills, and several were actively recruiting for “data scientists” and “information architects” at the conference.

Big Data > Hadoop, and Google wants a slice of the action

Although this conference was primarily about Hadoop, other Big Data storage and processing technologies featured in several sessions.

There was a half-day tutorial on Apache Cassandra, which provided a quick introduction to this distributed, high-availability, column-family database. Cassandra is used for commercial Big Data applications such as video streaming (Netflix, Sky) and time series data (Apple, British Gas, eBay). Its data model is restrictive but very powerful in the right context.

Apache Mesos is an abstraction layer for managing distributed resources transparently, which can be combined with cluster-based tools such as Spark or Cassandra. Mesosphere, the commercial Mesos-based “data centre operating system”, is also available as an easy-deploy option on Google Cloud Platform.

Google announced a new version of their BigTable NoSQL database service, which will support the existing Hadoop HBase API, effectively making BigTable a potential replacement for HBase in some applications. BigTable was one of the inspirations for HBase (and Apache Cassandra) so this may represent a certain degree of convergence.

Of course, using Google BigTable would require users to commit to the Google cloud platform for this aspect of their application architecture. Google has recently been promoting its Google Cloud Platform as a potential alternative to Amazon’s more comprehensive (and complex) cloud services, so this announcement is clearly part of a larger strategy to gain a greater share of the market for cloud-based PaaS products.

Apache Spark is definitely one to watch

Databricks is the company behind Apache Spark and was set up by the core Spark creators. They were strongly represented at this conference, with contributions from Paco Nathan and co-founder Patrick Wendell. Databricks supported the Spark training course that ran alongside the conference, and there were also several talks on various aspects of Apache Spark for ML and general data processing beyond the Hadoop platform.

Apache Spark offers a distinctive approach to distributed data processing, which can be deployed in various modes e.g. on top of a Hadoop cluster, on a Mesos cluster, on a stand-alone cluster or on a local machine e.g. for development or data exploration.

IBM veteran Rod Smith highlighted Spark in a short talk on emerging technologies, pointing to its unified programming model for batch, real-time and interactive processing, which offers a way to integrate data from multiple sources, including streaming data. He also mentioned growing interest in notebook-style interfaces e.g. IPython Notebook.

BT and Barclays are both looking at or already using Spark in some of their data pipelines, while other companies are starting to build tools and services around Spark e.g. Spanish start-up Stratio demonstrated their new SparkTA data analytics platform.

DataStax, the company behind the Apache Cassandra database, now offers a Spark Connector to allow Cassandra and Spark to work together on the same cluster, although this is not yet compatible with the current version of Spark.

Meanwhile, Databricks is currently testing its cloud-based PaaS Databricks Cloud, which will provide a notebook-style interface to a hosted Spark installation, allowing users to upload, transform, analyse and report on their data entirely via the browser.

This looks like a promising and powerful tool, especially for data scientists/analysts, although it is not yet clear when it will be available, or what the hosting model will be. However, Patrick Wendell, one of the developers of Spark and co-founder of Databricks, told me that it will be possible to deploy Databricks Cloud onto a private AWS environment, which might help to address concerns over issues such as data location and confidentiality.

Spark is still developing quickly, with new features being added every few months, but despite its relative immaturity there seems to be a strong interest in exploring Spark’s potential for Big Data applications from developers, tool providers and businesses.

Conclusion

As discussed above, this conference gave a strong sense that the core Hadoop platform has come of age and is now being used for mainstream business applications in the real world. This is helping to generate a wider eco-system of skills, services and tools around Hadoop, which should help later arrivals to make an easier transition to these distinctive distributed data technologies by exploiting the knowledge and expertise now available in the market.

The future of large scale data storage in the enterprise is clearly going to be based on distributed platforms, whether this is on Hadoop, Mesos or some other technology such as NoSQL databases, or indeed a combination of these. Streaming for real-time data processing is a particularly hot topic, offering scope for new applications that would not previously have been possible.

Technology and service providers are all keen to exploit this relatively new market, both through improvements to existing products (SAS, Informatica, HP etc) and through new and innovative products (e.g. Databricks Cloud) that take advantage of the power of these new technologies.

Perhaps ironically, other popular topics covered technologies that have been developed or adopted as a response to perceived historical shortcomings in the core Hadoop platform. Google’s HBase-on-BigTable announcement is clearly aiming to attract existing HBase users away from Hadoop. Meanwhile, Apache Cassandra has long offered an alternative column-family database with its own distinctive features compared to HBase.

Apache Spark seems to be especially promising as a widely applicable and very flexible general-purpose distributed processing engine for data from a wide range of sources. Spark offers a way to integrate processing for a variety of common data-processing tasks, from data acquisition and ETL through traditional analytics to machine learning and even real-time analysis of streaming data.

Beyond technology, there seems to be a growing awareness of the need for and usefulness of data science skills in industry as well as academia. Several keynote talks touched on topics familiar to traditional data analysts, such as issues of data quality, provenance, security and confidentiality, as well as the need for a more general understanding of how to make intelligent practical use of Big Data.

Data technology - especially for Big Data - is changing rapidly. Many organisations are understandably reluctant to put themselves at the bleeding edge of technology, but we still need to be looking at what industry leaders are doing today, in order to help us identify what we should be doing tomorrow. This conference provided a welcome opportunity to do this.

Wednesday 8 April 2015

Apache Spark 1.3 data frames in IPython Notebook

I recently installed the latest version (1.3) of Apache Spark and started looking at some of the things that have changed since my last post on Spark and IPython Notebook.

There seems to be a lot going on around the Spark SQL libraries, including the new DataFrame abstraction, so I put together a quick IPython notebook exploring just a few aspects of Spark SQL.

Spark SQL is one of the key factors in why my own organisation is looking at Spark for large scale data applications, because it gives us an easier, more consistent and relatively familiar abstraction over data from a variety of sources. The new DataFrame API seems to fit well within this trend towards more generic approaches to heterogeneous data sources.

In this demo notebook, I loaded up a file of sample CSV data on house sales from the UK Land Registry (see licence), applied a Spark SQL table to it, then performed some simple aggregation queries using the older API and the new DataFrame functions. This is just a quick experiment, but my impression is that the DataFrame API queries are significantly faster than the older equivalents.

As an old database developer, I'm not sure yet how far this API will provide the same flexibility as good old SQL, but it's certainly an interesting and potentially useful option.

My notebook is available for download, and you'll need to install Apache Spark version 1.3 and IPython Notebook (I use the excellent Anaconda Python distribution which includes IPython Notebook).

The download includes my sample CSV file, but you should be able to adapt the code to work with your own CSV data fairly easily. Enjoy!

Monday 23 February 2015

Apache Spark and IPython Notebook - simple demo

I've been playing around with Apache Spark for a while now, mainly using Scala. A couple of my colleagues are interested in learning about Spark as well, but they're data scientists, not developers, and they are more comfortable using Python. So far so good - Spark has a nifty Python shell and a comprehensive Python API, after all.

But what my colleagues really like is nice interactive tools without too much command-line voodoo. And they're already using IPython Notebook to give them all this goodness for their Python data science work.

Happily, it turns out you can use IPython Notebook with a local Apache Spark installation, so I wrote a quick demo notebook to illustrate how you can use Spark to do the traditional word count inside a notebook.

The repo is on BitBucket so you can do a git clone or download the code as a zip-file. It's just a single notebook, a data folder containing a couple of text files, and a sample shell-script for starting the notebook on Linux.

So if you're new to Spark with IPython Notebook, feel free to try it out.
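If you haven't met the "traditional word count" before, here is a rough sketch of the idea. It is plain JavaScript rather than PySpark, so it runs without a Spark installation, and it is not the code from my notebook. In Spark the same steps are typically written with flatMap, map and reduceByKey on an RDD; the function name wordCount and the sample lines are just my own illustration.

```javascript
// Word count in three steps, named after the Spark operations they mimic.
function wordCount(lines) {
  // flatMap: split every line into words, dropping empty strings
  var words = [];
  lines.forEach(function (line) {
    line.toLowerCase().split(/\s+/).forEach(function (w) {
      if (w.length > 0) { words.push(w); }
    });
  });
  // map + reduceByKey: treat each word as (word, 1) and sum the counts per word.
  // Object.create(null) avoids clashes with built-in names like "constructor".
  var counts = Object.create(null);
  words.forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  return counts;
}

var counts = wordCount(["to be or not", "to be"]);
console.log(counts.to, counts.be, counts.or, counts.not);
```

Spark does the same thing, but spreads the lines and the per-word sums across the machines in a cluster.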

Friday 16 January 2015

MongoDB User Authentication - the basics

Overview

When you're just starting out with MongoDB e.g. playing with it on your home PC, you don't need to worry about setting up database users etc, because authentication is disabled by default.

But once you start thinking about building an application with MongoDB, you'll probably need to think about which users should be able to read/write to which databases/collections. For example, ordinary users should probably only be able to read/write certain collections in a given database, while your application admin would need to be able to do things like create collections and create indexes etc in the application database.

Coming from an RDBMS background, I found MongoDB's user set-up a bit confusing, so I put together this post on how to set up a couple of users for an application database, where some users can only read/write in a specific database/collection. I hope this will help you to get started with MongoDB's approach to user permissions etc.

Remember that this is just a learning example, so don't rely on this for your production systems.

Make sure you refer to the MongoDB authentication guide for reliable and up-to-date advice on how to do this for real!

This example also relies on the localhost exception when enabling authentication and creating your first user, so make sure this is working on your server machine. You don't want to lock yourself out of your own database!

What we will do here

This post will show you how you can do the following:

enable authentication on a local MongoDB server
create a user administrator (you need to do this as soon as you enable authentication)
create a DBA super-user who can do anything
create an application-specific database
create an admin user for the application database only
create a couple of collections inside the application database
create an application-specific user role with read/write permissions for specific collections only
create an application-specific user with that role
check that this user can only access the specified collections

That should be enough to get you started with MongoDB user authentication.

Databases

admin: this is where you define privileges that apply to any database.
myappdb: an application-specific database where you want to define local user privileges.

Roles

root: can do anything.
userAdminAnyDatabase: can manage users on any database.
dbOwner: can manage the relevant DB (we use this in myappdb).
myAppUserRole: has limited access to specific collections in myappdb.

Users

siteUserAdmin: this user can manage users across the MongoDB server.
superuser: this user can do anything on any DB via the “root” role.
myappowner: this user manages the “myappdb” database via the “dbOwner” role.
bob: this user has the “myAppUserRole” role within “myappdb“ only.
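
As a quick reference, the plan above can be written down as plain data. This is only a sketch in ordinary JavaScript (it should run in Node.js or in the mongo shell, which is itself a JavaScript shell). The plan array and the createUserDoc and shellCommand helpers are my own illustrative names, not part of MongoDB. The db field records the database where each user is created, which is also the database that user authenticates against.

```javascript
// Illustrative summary of the users in this post (names taken from the lists above).
// "db" is the database in which the user is created and authenticated.
var plan = [
  { user: "siteUserAdmin", db: "admin",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ] },
  { user: "superuser",  db: "admin",   roles: [ "root" ] },
  { user: "myappowner", db: "myappdb", roles: [ "dbOwner" ] },
  { user: "bob",        db: "myappdb", roles: [ "myAppUserRole" ] }
];

// Build the document you would pass to db.createUser() while using entry.db.
function createUserDoc(entry, password) {
  return { user: entry.user, pwd: password, roles: entry.roles };
}

// Build the shell command used to reconnect as a given user.
function shellCommand(entry, password) {
  return "mongo -u " + entry.user + " -p " + password +
         " --authenticationDatabase " + entry.db;
}

console.log(shellCommand(plan[3], "secret"));
```

The point to notice is that the two server-wide users live in admin, while the application users live in myappdb, so the --authenticationDatabase value changes accordingly.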

Enable authentication on MongoDB server

See http://docs.mongodb.org/manual/tutorial/enableauthentication/ for detailed instructions.

Modify configuration to enable authentication

This is based on an Ubuntu server (other platforms may have slightly different config files).

Stop MongoDB:

sudo service mongod stop

Edit /etc/mongod.conf (in Ubuntu) and set auth=true.

This is the simplest authentication method, but MongoDB supports other options as well, such as keyfiles for authentication between the members of a replica set or cluster, as described in the MongoDB documentation (see above).

Restart MongoDB:

sudo service mongod start

Create default user administrator (must connect via localhost)

Connect to MongoDB shell on the DB server (localhost):

mongo

Switch to admin DB and create the site user administrator:

use admin

db.createUser({
  user: "siteUserAdmin",
  pwd: "secret",
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})

Reconnect to MongoDB shell on (localhost) DB server with this user:

mongo -u siteUserAdmin -p secret --authenticationDatabase admin
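
If you are already inside the shell, you can also authenticate without reconnecting. This is a minimal sketch, assuming the standard shell helpers db.getSiblingDB() and db.auth(); the authenticateAs wrapper is my own name, and in the shell you would pass it the global db object.

```javascript
// Authenticate inside an existing shell session instead of reconnecting.
// "shellDb" is the shell's global db object; authDbName is the database
// where the user was created ("admin" for siteUserAdmin).
function authenticateAs(shellDb, authDbName, user, password) {
  var authDb = shellDb.getSiblingDB(authDbName);
  // db.auth() returns a truthy value on success and a falsy one on failure
  return Boolean(authDb.auth(user, password));
}

// In the mongo shell:
//   authenticateAs(db, "admin", "siteUserAdmin", "secret")
```

Reconnecting with -u and -p as shown above does the same job; this just saves leaving the shell.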

Create a super-user

Log into the MongoDB shell as the site user administrator (as above).

Switch to admin DB and create a “root” user called “superuser”:

use admin

db.createUser({
  user: "superuser",
  pwd: "123456",
  roles: [ "root" ]
})

Set up authentication for your application

Create an application database

Connect to MongoDB shell as the superuser (authenticated via admin DB):

mongo -u superuser -p 123456 --authenticationDatabase admin

Create your application database:

use myappdb

Create a DB owner for the application database

Make sure you are logged in as superuser and working in myappdb (see above).

Create the application owner called “myappowner” for this database:

db.createUser({
  user: "myappowner",
  pwd: "secret",
  roles: [ "dbOwner" ]
})

Reconnect to MongoDB as the DB owner (authenticated via your application DB):

mongo -u myappowner -p secret --authenticationDatabase myappdb

Create collections in application database

use myappdb

Create a collection that we will later define as readonly for ordinary users:

db.myappreadonly.insert({"Foo":"Bar"})

Check it worked:

db.myappreadonly.find()

{ "_id" : ObjectId("54b7a63850d47a6b57031616"), "Foo" : "Bar" }

Now create a readwrite collection:

db.myappreadwrite.insert({"Baz":"Bax"})

Check it worked:

db.myappreadwrite.find()

{ "_id" : ObjectId("54b7a67b50d47a6b57031617"), "Baz" : "Bax" }

Create an application-specific user role

Make sure you are logged in as application owner (see above) and using myappdb.

Now create a local user role on this DB, with appropriate access to the collections we created above:

db.createRole({
  role: "myAppUserRole",
  privileges: [
    { resource: { db: "myappdb", collection: "myappreadonly" },
      actions: [ "find" ] },
    { resource: { db: "myappdb", collection: "myappreadwrite" },
      actions: [ "find", "update", "insert" ] }
  ],
  roles: []
})

Create a specific user with this role:

db.createUser({
  user: "bob",
  pwd: "secret",
  roles: [ "myAppUserRole" ]
})
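
Before testing as bob, it may help to spell out what this role should and should not allow. The sketch below is plain JavaScript and a deliberately simplified model of privilege checking, not how MongoDB implements it: it ignores inherited roles and wildcard resources. The isAllowed function is my own illustrative name.

```javascript
// The role document from above, as plain data.
var myAppUserRole = {
  role: "myAppUserRole",
  privileges: [
    { resource: { db: "myappdb", collection: "myappreadonly" },
      actions: [ "find" ] },
    { resource: { db: "myappdb", collection: "myappreadwrite" },
      actions: [ "find", "update", "insert" ] }
  ],
  roles: []
};

// Simplified model: an action is allowed only if some privilege names
// exactly this db and collection and lists the action.
function isAllowed(role, dbName, collName, action) {
  return role.privileges.some(function (p) {
    return p.resource.db === dbName &&
           p.resource.collection === collName &&
           p.actions.indexOf(action) !== -1;
  });
}

console.log(isAllowed(myAppUserRole, "myappdb", "myappreadonly", "find"));    // true
console.log(isAllowed(myAppUserRole, "myappdb", "myappreadonly", "insert"));  // false
console.log(isAllowed(myAppUserRole, "test", "bobcoll", "insert"));           // false
```

Anything not listed explicitly is denied, which is why bob cannot list collections or touch other databases in the checks that follow.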

Check your application user has correct permissions

Reconnect as this user and switch to myappdb:

mongo -u bob -p secret --authenticationDatabase myappdb

use myappdb

See if you can read the collections correctly:

db.myappreadonly.find()

{ "_id" : ObjectId("54b7a63850d47a6b57031616"), "Foo" : "Bar" }

db.myappreadwrite.find()

{ "_id" : ObjectId("54b7a67b50d47a6b57031617"), "Baz" : "Bax" }

You should be able to insert into the readwrite collection:

db.myappreadwrite.insert({"bob":"allowed"})

WriteResult({ "nInserted" : 1 })

db.myappreadwrite.find()

{ "_id" : ObjectId("54b7a67b50d47a6b57031617"), "Baz" : "Bax" }

{ "_id" : ObjectId("54b7a98099923fa639b3c70f"), "bob" : "allowed" }

You should not be able to insert into the readonly collection:

db.myappreadonly.insert({"bob":"not allowed"})

WriteResult({
  "writeError" : {
    "code" : 13,
    "errmsg" : "not authorized on myappdb to execute command { insert: \"myappreadonly\", documents: [ { _id: ObjectId('54b7a97299923fa639b3c70e'), bob: \"not allowed\" } ], ordered: true }"
  }
})

Check the application user cannot do anything else

e.g. cannot show all collections in myappdb:

show collections

2015-01-15T11:49:29.040+0000 error: {
  "$err" : "not authorized for query on myappdb.system.namespaces",
  "code" : 13
} at src/mongo/shell/query.js:131

e.g. cannot create collections or insert data on another DB:

use test

switched to db test

db.bobcoll.insert({"bob":"bad"})

WriteResult({
  "writeError" : {
    "code" : 13,
    "errmsg" : "not authorized on test to execute command { insert: \"bobcoll\", documents: [ { _id: ObjectId('54b7aa6699923fa639b3c710'), bob: \"bad\" } ], ordered: true }"
  }
})
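
Every refused operation above reports error code 13. If you want to check for this in a script, here is a rough sketch. It works on a plain object shaped like the printed output above (a writeError field with a code), which is an assumption on my part; in the shell itself, WriteResult.hasWriteError() is the documented way to test for a write error. The isNotAuthorized helper and the shortened error message are my own illustrations.

```javascript
// Error code reported for "not authorized" in the output above.
var UNAUTHORIZED = 13;

// Works on a plain object shaped like the printed WriteResult,
// e.g. { nInserted: 1 } or { writeError: { code: 13, errmsg: "..." } }.
function isNotAuthorized(result) {
  return Boolean(result.writeError) && result.writeError.code === UNAUTHORIZED;
}

var ok = { nInserted: 1 };
var denied = { writeError: { code: 13, errmsg: "not authorized (message shortened)" } };

console.log(isNotAuthorized(ok));      // false
console.log(isNotAuthorized(denied));  // true
```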

Conclusion

Using these techniques, it should be possible to set up a simple set of users/roles to manage basic access to your database and application.
MongoDB provides many more roles, e.g. for managing access to clusters, but these are beyond the scope of this simple introduction.
See the MongoDB manual for comprehensive advice on security.

Thursday 11 December 2014

Scala Exchange 2014 - impressions

Well, I just got back from Scala Exchange 2014 and I enjoyed the conference so much I decided to resurrect this dormant blog and write about it.

What's Scala Exchange?

Scala Exchange is billed as the biggest annual Scala conference in Europe, which is probably true as there aren't many competitors for that particular niche anyway. I read somewhere that there were around 500 attending this year, with around 50 speakers, and it seems to be growing every year. The event was organised by Skills Matter, who are very active on the London tech scene, supporting conferences, meet-ups, training courses etc, and this experience showed. The event ran pretty smoothly and the venue was well chosen, although a couple of talks were standing-room-only. I thought the conference was very good overall, but you can check out the videos of the talks for yourself as Skills Matter is putting them online as "skillscasts" (free registration may be required to access them).

Keynotes

Martin Odersky talking about work on the Scala compiler to improve binary compatibility across versions.
Lars Hupel talking about the Type Level project, which provides alternatives and extensions to the standard Scala libraries.
Dean Wampler on big data and Scala.
Runar Bjarnason on why functional programming is good for you.

This range of topics was fairly representative of the conference as a whole, with plenty of stuff for the hard-core FP language nerds, but also a reasonable selection of talks aimed at the "unwashed masses" (like yours truly) who are just beginning their exploration of Scala.

Scala meets the unwashed masses

Scala has a (not entirely unfair) reputation as a difficult language with a steep learning curve and some parts of the Scala community have occasionally appeared rather academic or unwelcoming to novices. I think this is largely historical baggage and doesn't really represent my own experience as a relative newcomer to Scala, but Scala is certainly harder to start out with than, say, Python.

This tension between the potential complexity and sophistication of many aspects of the language (I found the talk on type classes entirely mystifying!), and the practical challenges for ordinary application developers trying to learn Scala and apply it to relatively mundane applications, remains a prominent issue for the community. During the panel discussion, there were a number of comments on the need to encourage wider adoption of Scala and how best to achieve this.

Martin Odersky pointed out that this is partly a consequence of Scala's increasing adoption by the "unwashed masses", an ironic reference to the many thousands of people who have been encouraged to explore Scala, not least through his own massively successful course on Coursera. Odersky clearly recognises that as Scala moves away from its academic roots and into the wider industry, it's inevitably going to encounter a new audience of developers with a far broader range of skills, experience and expectations.

This discussion also revealed an interesting divergence of opinion between some of the panel members, where Miles Sabin (Type Level guru) felt strongly that Scala should drop the standard library and allow users simply to pick their own libraries, while Martin Odersky was equally firm in his view that Scala should come with "batteries included". As a new Scala developer and one of the "unwashed masses", I'm definitely with Martin on this one - the last thing I want when I'm learning a language like Scala is to have to build my own set of libraries before I can even start doing anything practical.

Scala is the new Linux?

To be honest, this debate reminded me of the way some hard-core Linux hackers look down on people who want a GUI front-end on their Linux machines - "Why do you want a pre-packaged GUI desktop when you can pick your own?". But a computer is just a tool, and its only value is in how well it enables the user to achieve their goals. If a nice GUI - or a standard library - will help me deliver what my customers need more easily and more effectively, then that's what the tool should provide.

On the plus side, there were several talks specifically aimed at people who are only just starting to adopt Scala, with some useful tips on key language features to explore, and those to ignore for now. When in doubt, Dick Wall's three laws of code seemed like a pretty good rule of thumb:

1. Code must work
2. Code must be simple
3. Code can be clever, but not at the expense of rules 1 and 2

I think quite a few Scala gurus could benefit from those simple rules too!

Scaling down with Scala

One feature of a number of talks was the focus on using Scala for relatively low-end applications. We've seen plenty of talk about the benefits of using Scala for large scale concurrency and Big Data, but it was refreshing to hear about people who are using it for smaller applications:

Rebecca Grenier talked about her experience at "Eating Well" magazine in Vermont, where they replaced a traditional CMS (PHP, MySQL, Drupal) with the Typesafe stack (Scala Play etc) using Slick for database access.
Gary Higham described how BBC Children's websites are being moved from a PHP-based platform to Scala and Play, with a small team of developers who'd never worked with Scala (or even Java in most cases).
Peter Hilton (who co-wrote Play For Scala) presented a couple of case studies on using Play for rapidly implementing internal web applications for clients in the Netherlands, taking a pragmatic approach of selecting the easiest default options wherever possible e.g. no fancy JavaScript front-end, DB access with Slick, regular Play MVC approach etc.

I think this is an interesting potential niche for Scala i.e. for applications that might benefit from the stability and flexibility of the JVM platform, but where traditional Java EE would be massive overkill. Of course, there are other JVM-based tools that might offer this flexibility e.g. Grails, but Play offers a combination of reasonably easy implementation (if you stick to the happy path as Peter Hilton suggests) with a robust and scalable platform. As an old Oracle Forms developer, marooned for years in the wilderness of enterprise Java bloatware, I'm encouraged to see Rapid Application Development finally emerging again on the JVM.

But where are the women?

The Scala Exchange conference was a very positive experience from my perspective, but I was struck by the narrow demographic represented by the audience, who were almost entirely male and 25 to 50 years of age. And there were a lot of beards (mine included)!

This is a general problem in the IT industry, and I strongly suspect Scala is even more extreme in its demographic than the industry as a whole. If you look at the many online talks about Scala, there are very few women presenting and - if Scala Exchange is typical - very few (<5%?) women even attending these events.

I think the community still needs to do more to reach out to the "unwashed masses", and to women in particular, by breaking down some of the prejudices that still exist both within and towards the Scala community, so that next year's Scala Exchange audience might start to be more representative of the industry and wider world, and perhaps a little less like a 1970s Jethro Tull concert!

Saturday 7 May 2011

Love and graffiti

Graffiti seen today in Norwich: "Practise unconditional love"

Written underneath: "I will if you will"

Wednesday 27 April 2011

Eye contact

Female grizzly. Nekite River, BC, Canada

They say you shouldn't make eye contact with strangers in the big city, and don't stare at people – it's rude.

So there is a certain frisson when North America's second largest predator lifts its head from a leisurely breakfast of fresh sedge grass, along the banks of a tidal inlet in the Nekite River estuary, and calmly looks you straight in the eye. Suddenly you are no longer the big kid on the block, as that appraising gaze looks you up and down and finds you of no great significance to its owner's plans for the day.

The female grizzly – 300 pounds of muscle honed by ferocious natural selection, with a naturally inquisitive mind trained by her mother's own formidable example – is in fact largely vegetarian at this time of year, springtime in the Great Bear Rainforest of coastal British Columbia. After months of hibernation, she is more concerned with feeding up on nutritious young plants than fulfilling tired stereotypes of the ravenous predator.

And she has another admirer. A young male grizzly, flushed with adolescent optimism, has been following her as she moves along the bank, hoping to be the lucky suitor when – if – she comes into season in the next few days. She seems largely untroubled by his attentions, but perhaps he is beginning to annoy her just a little, because she stops feeding and pauses to consider her options. And we are part of her calculations.

Responsible eco-tourism operators are careful to minimise the impact of visitors on the lives of the creatures we come to see. Yet sometimes those creatures surprise us with their own adaptability. Here on the Nekite, guides suggest that grizzly feeding patterns may have shifted slightly to take advantage of the morning and evening boat trips from the nearby floating lodge. Females and younger bears seem happy to feed on the estuary while being observed discreetly by visitors, but the larger males who normally dominate the feeding grounds seem less certain of how to respond to these feeble boat-borne intruders and tend to avoid peak viewing times. So it may be that carefully managed tourism can actually benefit the sections of the bear population that are most vulnerable to starvation early in the season.

And our female has another trick up her sleeve. The decisive moment has arrived, and she carefully lowers herself into the water and launches into a brisk doggy paddle across our bows towards the opposite bank. Just as she expected, her admirer seems torn between his own desire to follow her and a certain caution over approaching the visitors too closely, a fact the female hopes to exploit as she makes her getaway. But he is young and foolish and after a brief moment of doggy puzzlement, sat back on his haunches like a confused retriever, he too launches into the river and pursues his love to pastures new, disappearing after her into the shoulder-high grasses of the opposite shore.

Omnia vincit amor.